When compilers surprise you

37 points by eranb 20 hours ago on lobsters | 5 comments

BinaryIgor | 6 hours ago

Fascinating - given the nature of computing, there is probably a near-infinite supply of arbitrary, case-specific optimizations one can implement.

I wonder what gems one can find in interpreter-compiler hybrids of a JVM variety :)

chinmay | 2 minutes ago

when i read the first half my brain immediately went "i swear i've seen compilers just do n(n+1)/2 before", then i saw the second half and clang did :)

smlckz | 12 hours ago

From LLVM documentation about indvars pass:

Any use outside of the loop of an expression derived from the indvar is changed to compute the derived value outside of the loop, eliminating the dependence on the exit value of the induction variable. If the only purpose of the loop is to compute the exit value of some derived expression, this transformation will make the loop dead.


Sigh, compilers really need to understand before performing transformations like these. I'll take this chance to ask two things. First: could we have benefited from expressing the intent directly*, rather than using "idioms" and letting the compiler guess the intent by "pattern matching" over them? Second: is undefined behaviour worth leaving undefined? Now that we take matters of "safety" more seriously†, what benefits does undefined behaviour still have? This kind of tacit (potential) misunderstanding arising from inadvertent use of undefined behaviour leaves one apprehensive.


* At a "comfortable" level of abstraction appropriate for the particular language; so preferably not expanding/reifying downward, and certainly not using inline assembly or intrinsics directly. Some way we could pass more of the information available to us along to the compiler, so that it could make more informed decisions during optimization.

† Or so it seems: while safety has long been a matter of prudence and discipline, we can now afford better tools, constructs and mechanisms for ensuring somewhat more of it; which doesn't mean we can bear much difficulty in using them: consider the (lack of) popularity of formal methods.

Isn’t this the step that enables vectorization in a later pass? Vectorization by pattern matching seems a lot easier than expressing the intent directly, especially since the exact vectorization capability (both operations and width) will be processor-dependent.

aw1621107 | 4 hours ago

From LLVM documentation about indvars pass:

Probably worth noting that the quoted bit is preceded by "If the trip count of a loop is computable, this pass also makes the following changes". To me that makes the quoted change more like basic constant folding/loop invariant hoisting.

could we have benefited from expressing the intent directly

Two potential responses off the top of my head:

  • Who's to say that what's in the code isn't already the author's preferred way to express intent?
  • Perhaps a more direct expression would be preferable in a vacuum, but if the language does not provide the required feature and no code in the wild uses such a thing then trying to match commonly used idioms is not the worst alternative you can choose. Making a compiler extension is another option but that has its own tradeoffs as well.

is undefined behaviour worth leaving undefined? Now that we take matters of "safety" more seriously†, what benefits does undefined behaviour still have?

As with many things, it's a question of tradeoffs and how they apply to your particular use case. At a high level, the more UB you have the more freedom the compiler/optimizer/runtime has to ~~break your program~~ do its thing, but it also increases the risk of bugs due to said UB. Inversely, if you eliminate "too much" UB you may preclude otherwise desirable/important optimizations.

Also consider that most (all?) otherwise "safe" languages technically still have UB, since they provide escape hatches of one sort or another for those cases where what the language/runtime provides isn't sufficient.