Human vs Computer
In my post Disruptive Programming Languages, I snared Ian Griffiths into my trap. Ian Griffiths is a .NET author and instructor at DevelopMentor and writes lots of low-level .NET articles on the web. I own his book .NET Windows Forms in a Nutshell. Maybe he’s getting back at me after I was serendipitously able to score one up on him on weak delegates in a prior post. Thanks, Ian, for the opportunity to present myself as your peer and pair off my sophomoric ideas against your experience and research.
Here’s a simple exercise: How many words can you come up with by rearranging the three letters in the word TEA, including TEA itself? Even though there are only six possible arrangements of the letters, it’s likely that you will not have found every valid word. You can also try this for many other words with my WordFinder application, which I’ll post later. A computer can search for and find every instance of a valid word each time without fail. A human may miss some words that he knows as well as some he doesn’t know.
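A sketch of the exhaustive search that a tool like WordFinder performs might look like this (the tiny word list below is a hypothetical stand-in for a real dictionary file):

```python
from itertools import permutations

# Hypothetical mini word list standing in for a real dictionary file.
WORDS = {"tea", "tae", "eta", "eat", "ate"}

def word_finder(letters):
    """Generate every arrangement of the letters and keep the valid words.
    The machine checks all of them, every time, without fail."""
    candidates = {"".join(p) for p in permutations(letters)}
    return sorted(candidates & WORDS)

print(word_finder("tea"))  # prints ['ate', 'eat', 'eta', 'tae', 'tea']
```

A human solving the same puzzle relies on recall and may skip an arrangement; the exhaustive search cannot.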
I often think about Mathematica, which I use to solve all kinds of symbolic math problems. The results that I get from this program are better than I can obtain by hand: much faster, often instantaneous, and without errors. It’s a good example of artificial intelligence, where the computer consistently outperforms humans in intelligence tasks. I also believe that is the ultimate destination of all AI.
When I was speaking about compiler output surpassing human output earlier, I was viewing this as part of a long-run trend. Most compilers don’t utilize logic programming, which is useful for emulating human reasoning, although we might be seeing some of that with the “whole program optimization” feature in C++.
Also, when I talk about compilers, I am speaking rather generally. In some sense, I am comparing a computer-managed process against a human-managed one. When we entrust more and more work to a computer, there may be more overhead initially, but, over time with more optimization and because of the computer’s advantages in speed, the computer-based approach will likely far surpass the manual labor.
I would include .NET garbage collection as a computer-managed process that competes well with C++’s deterministic finalization, a manual process, even though the latter has more domain information. Over time, garbage collection will improve through better algorithms and coevolution. Through coevolution, processors have evolved to work better with compilers than with humans. Future architectures will be optimized for garbage collection. (I am reminded of processors that were optimized for Lisp decades ago.)
I believe future languages will be declarative and symbolic. Current control constructs will be replaced by various forms of search, which optimize away to conditionals and loops in the simplest case. Future languages will probably resemble functional and logical languages like Prolog more so than the imperative languages most developers use today. I suspect that our present day of writing low-level control constructs leads us to less sophisticated programs, because programming proceeds at a human pace rather than at a computer’s pace.
I wrote earlier that compilers can currently emit C code that can run faster than handwritten assembly code. Ian responded:
“Not that old chestnut!” This statement is only true for small values of developer capability.
Intel still release hand-crafted performance libraries. The ones I've used (for image processing) still beat the socks off anything produced by a compiler - and we're not talking something that's a few percent faster. These libraries run several times faster than the same algorithms written in C and compiled by the compiler.
Ian states that the statement is true only for small values of developer capability, but that probably still represents an overwhelming majority of developers, I imagine, and not just those who don’t know assembly language. He means you, mainstream developer.
I admit that the current C# and JIT compilers aren’t very good optimizers. The C++ compilers are probably better optimizers on the Windows platform, though C and C++ have some difficulties with aliasing because of pointers.
I don’t think the Intel libraries are a good example. The libraries themselves could almost be considered part of the compiler. Intel develops its own optimizing compilers with its finer understanding of its processors, since it doesn’t have access to the commercial compilers of other vendors, which are more platform-agnostic. Intel contributes to these other compilers in the only way it can: by providing libraries. The code in Intel’s performance libraries may very well be handwritten; however, it is written by the chip designers themselves, very likely aided by extensive computer analysis, and has probably undergone lengthy development and testing. Also, these libraries can make additional assumptions that a general compiler cannot. Users are not likely to write equivalent code without the deep level of knowledge that Intel programmers possess.
I also wrote that if the compiler has the same information about a problem as a human, the compiler should theoretically always be able to produce superior code.
That's patently bogus. The human will always be able to do at least as well as the compiler because if all else fails, the human can work through the same set of rules the compiler is using. Indeed, there's an age-old technique for guaranteeing that you do at least as well as the compiler: get the compiler to write your first version for you, tweak what it gives you and always compare against the original when benchmarking. You are guaranteed to do at least as well as the compiler, because if you make any changes that make things worse, you just back them out again.
Here’s why: if a compiler encodes all the rules that a human would apply in optimizing the code, and both the human and the computer have the same information, then the compiler can successively apply rules until no more can be applied. Since the application of one rule can affect the subsequent application of another, the compiler may have to backtrack to consider alternative orderings of rules. Assuming the computer has applied all rules, a human could not perform better, because the computer, using the same rules, has already considered any enhancement the human could add.
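To make the argument concrete, here is a toy sketch of that rule-driven process. The two peephole rules and the instruction format are entirely made up for illustration; real compilers work over far richer representations, but the shape of the search is the same: apply rules to a fixed point, backtracking over rule orderings.

```python
from itertools import permutations

# Toy "instructions" are (op, arg) pairs; two hypothetical peephole rules.
def fold_add_zero(code):
    """Remove additions of zero, which are no-ops. Returns None if not applicable."""
    out = [ins for ins in code if ins != ("add", 0)]
    return out if out != code else None

def fuse_adds(code):
    """Merge two adjacent additions into one. Returns None if not applicable."""
    for i in range(len(code) - 1):
        if code[i][0] == "add" and code[i + 1][0] == "add":
            return code[:i] + [("add", code[i][1] + code[i + 1][1])] + code[i + 2:]
    return None

RULES = [fold_add_zero, fuse_adds]

def optimize(code):
    """Try every ordering of the rules, applying each ordering to a fixed
    point, and keep the shortest result: the backtracking described above."""
    best = code
    for order in permutations(RULES):
        cur, changed = code, True
        while changed:
            changed = False
            for rule in order:
                nxt = rule(cur)
                if nxt is not None:
                    cur, changed = nxt, True
        if len(cur) < len(best):
            best = cur
    return best

print(optimize([("add", 2), ("add", 0), ("add", 3)]))  # prints [('add', 5)]
```

Once every ordering has been exhausted, a human armed with only the same rules has nothing left to apply.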
Because humans built computers, my statement contains a few circularities. Anything produced by a compiler is also indirectly produced by a human. The other circularity that Ian mentions is that a human could modify the output of the compiler to produce better code than the compiler, unless the compiler has already considered all of the human’s enhancements prior to output. If a compiler supports inlined assembly, as Visual C++ does, you could also use the same argument in reverse: feed the human-modified code back into the compiler, and the resulting output will be at least as good as the human’s.
Optimized code is notoriously difficult for humans to navigate. Something as simple as reusing a variable for a different purpose in assembly can make code very hard to follow. When modifying optimized code, humans often introduce subtle errors. Many times, an “obvious” improvement to optimized code actually reduces performance.
In many cases, it will not be feasible for a human to reasonably understand the compiler output. Code may be generated as a fully optimized state machine, for example. The output of a neural network (used in machine vision and speech recognition) or a genetic algorithm is very hard to modify, if either technique was used in the compiler.
Processors are also increasingly being designed for compilers, not for humans. I haven’t written in assembly since the early 1990s, but I know that one now considers not only instruction cycles but a host of new constraints like instruction ordering.
Ian then asserts that my axiom "if the compiler has the same information about a problem as a human" has never been valid.
The compiler doesn't have the same information. There is almost always domain-specific information that can be brought to bear on the problem which cannot be put into source code. (And in any case, compilers still don't make use of all the information that is technically available to them today.)
There are at least two situations where the compiler can extract additional domain information, without the human having to include explicit invariants in the code.
1) Profile-guided optimizations. In this case, the compiler uses information from profile data from an instrumented executable to reorder and optimize code. Profiling provides information to the compiler about how people actually use the product, about which code paths actually get hit often.
2) Whole program optimization. The compiler, in this case, considers the relationships between functions by producing theorems about each function, and then going back to each function and reoptimizing based on those theorems.
These two optimizations can allow a compiler to indirectly deduce domain-specific information. Furthermore, the information obtained can be even more valuable, because the compiler can generate a far greater amount of information, most of which may be nonobvious. These kinds of optimizations also make it difficult for a human to modify code without adversely affecting its performance, because the assumptions that the compiler was able to make are not explicit in the code output.
So your statement is untrue. It's also not really relevant. The important thing is that compilers are good enough. They certainly do a better job in the time they take than a human could do in the same time, so it's all about saving developer time, not machine cycles. Using higher level languages offers a host of benefits, but the quality of the code is one of the costs, not one of the benefits - we trade quality of code for speed of development.
Of course the other reason it's not all that relevant is that an awful lot of code isn't limited by raw CPU speed. The usual killer is sitting around waiting for stuff to come out of main memory and into the cache. That's something that you can't fix by tweaking the compiled code - you need to address either the data structures or the algorithms or both.
So compiler output really isn't that hot. The key thing is that most of the time it doesn't need to be - it simply has to be good enough.
I agree that the bottlenecks today have very little to do with instruction speed. The bottlenecks are memory access and the efficiency of the algorithms and data structures used. The compiler may still be able to help with the latter two, but it probably makes more sense for a company to focus on other productivity features. This only strengthens my argument that languages need to get higher-level. I don’t believe that quality of code necessarily worsens with higher-level languages. You are more likely to write higher-quality code in C than in assembly, and in C++ than in C.
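A quick, hypothetical micro-benchmark illustrates why the data structure, not the compiled code, is the lever here. No amount of tweaking the generated instructions for a linear scan closes the asymptotic gap against a hash lookup:

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Searching for a missing value forces the list to scan every element (O(n));
# the hash set answers in roughly constant time (O(1) on average).
t_list = timeit.timeit(lambda: -1 in data_list, number=100)
t_set = timeit.timeit(lambda: -1 in data_set, number=100)

print(t_set < t_list)  # prints True
```

The fix is to change the algorithm or data structure, which is exactly the kind of decision that lives above the level of the emitted code.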
I also want to point out that compiler output is not limited to assembly; it can also be an intermediate language. In such cases, the benefits of compiler-generated code over human-generated code may be more real in terms of performance, not just productivity.
By the way, there are five words that can be made from the exercise earlier: every combination of the letters except AET.