Duncan,
Thanks for the thoughtful response. Some follow up:
From: Duncan Sands [mailto:[email protected]]
Sent: Tuesday, April 17, 2012 11:53 AM
To: Harris, Kevin
Cc: [email protected]
Subject: Re: [LLVMdev] Representing -ffast-math at the IR level
Hi Kevin,
1. Most compiler and back-end control of floating point behavior appears to be
motivated by controlling the loss or gain of a few low bits of precision on
a whole module scale. In fact, these concerns are usually insignificant for
programmers of floating-point intensive applications. The inputs to most
floating point computations have far fewer significant digits than the
precision of the computations themselves, and therefore there is precision
to burn. So the
vast majority of such app developers would happily trade precision for
performance, even as the default behavior. However, the place where trouble
DOES occur is with overflow and underflow behavior at critical points.
Changing the order of operations, or combining operations, can cause
overflows or underflows to occur that wouldn't otherwise occur, and vice
versa.
for the moment I'm distinguishing (mentally) between transformations that
introduce a uniformly bounded relative error, for example x+0 -> x, or
x/constant -> x * (1/constant) if constant and 1/constant are normal (and
not denormal), and those that can introduce an unbounded relative error.
Reassociation is an example of a transformation that can introduce unbounded
relative error, for example (1 + epsilon) - 1 -> 0 if epsilon is small enough,
while (1 - 1) + epsilon -> epsilon. I'm basically assuming that everyone is
happy with the transforms that introduce a bounded relative error - it sounds
to me like this is the distinction that you are making too. Transforms that
introduce unbounded relative error (like reassociation) are a can of worms, and
I'm not sure how best to handle them. So for the moment I'm not planning to
handle them, just gather ideas and discuss.
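As a concrete IR-level sketch of the unbounded case (the function and value
names here are purely illustrative):

    define float @sum_then_cancel(float %eps) {
    entry:
      ; As written, this computes (1.0 + %eps) - 1.0.  If %eps is much
      ; smaller than the float machine epsilon, the fadd rounds to exactly
      ; 1.0 and the function returns 0.0, losing %eps entirely.
      %t = fadd float 1.000000e+00, %eps
      %r = fsub float %t, 1.000000e+00
      ret float %r
    }
    ; The reassociated form (1.0 - 1.0) + %eps returns %eps exactly, so the
    ; relative error introduced by reassociating is unbounded.
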
This is a reasonable distinction. How you could enforce it across the various
optimization passes is not obvious. Loss of precision problems are difficult
to diagnose even when strong fp correctness goals and methods are in place.
Sometimes such a change in overflow or underflow behavior is beneficial, but
it is almost always unexpected.
Underflows may sound less important in this regard, but they can be worse
than overflows, because they can mostly or completely eliminate the
significant bits, in complete silence, leaving the entire computation
worthless. Much of numerical analysis, especially in writing floating point
library functions, concerns the precise control of overflow and loss of
significance in specific operations. Optimizations that make such control
difficult or impossible can render a compiler or backend unusable for that
purpose.
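As a small sketch of how silently this can happen (illustrative names, not
anything from the patch):

    define float @scale_then_unscale(float %x, float %tiny) {
    entry:
      ; Mathematically (%x * %tiny) / %tiny == %x.  But if %x * %tiny
      ; underflows to 0.0, or to a denormal with only a few significant
      ; bits left, the result is 0.0 or badly damaged instead of %x, and
      ; no trap or error is raised in the default FP environment.
      %p = fmul float %x, %tiny
      %q = fdiv float %p, %tiny
      ret float %q
    }

An optimization that rewrites this as %x * (%tiny / %tiny) removes the
underflow, and the reverse rewrite introduces it; either way the numerical
behavior changes even though both forms are equivalent in real arithmetic.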
2. While the use of metadata for control of LLVM behavior is attractive for its
simplicity and power, the philosophy that it can be safely ignored or even
removed in some optimization passes would seem to doom its effectiveness for
controlling floating point optimizations. For anyone trying to use source
language and compiler option mechanisms to control fp overflow and
underflow, this approach would seem ill-conceived.
I think there may be a misunderstanding here. True, the design of metadata is
that it is not wrong to drop it. However, the compiler isn't trying to drop it;
it tries hard not to drop it: any case of pointlessly dropped metadata is a
bug. In this respect fpmath metadata is analogous to tbaa (type based alias analysis
metadata): if it is dropped you get conservatively correct results, but some
optimizations are missed. Compiler writers don't like missing optimizations!
If you see any cases of fpmath metadata being dropped then please report it.
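For concreteness, this is roughly how the proposed !fpmath metadata attaches
to an individual operation, as I understand the patch (the 2.5 ULP figure is
just an illustrative accuracy):

    define float @approx_div(float %x, float %y) {
    entry:
      ; The metadata permits a result within 2.5 ULPs of the exact one,
      ; e.g. allowing a cheaper reciprocal-based lowering.  If a pass drops
      ; the !fpmath node, the division simply reverts to full accuracy: a
      ; missed optimization, not a wrong answer, just as with tbaa.
      %d = fdiv float %x, %y, !fpmath !0
      ret float %d
    }

    !0 = metadata !{ float 2.5 }
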
Yes, I (and others, obviously) have been confused before about the extent to
which metadata can be ignored or dropped. I think its use for providing
additional information that allows optimizations that would otherwise be
invalid is well motivated and reasonably straightforward. And your proposal
doesn't change that usage. Any attempt to provide tighter restrictions for fp
optimizations, however, would seem to muddy the situation, since it
would violate the basic assumption that the undecorated IR is the "most
conservative".
For the purpose of
providing a Front-End developer with a powerful platform for supporting
fp-intensive programming,
Let me just say up front that it is not clear to me that this is a goal of LLVM.
I realize that good fp precision and control is a fairly specialized niche, esp.
for an open source compiler. This is the main reason why I hesitated to comment
initially. I didn't even necessarily mean to inject any additional goals in this
space. But since you had made an effort in this direction, and generated some
thoughtful discussion already, I wanted to alert the community to some practical
issues they might not have considered.
the primary requirement is that the Front-end
should be able to precisely control optimizations that can change the fp
intermediate results under all optimization levels for each individual fp
operation specified in the IR. The vast majority of such usage can and
should default to high performance behavior. But it should be
possible for the front-end to precisely control IR re-ordering, operation
combining (including exploitation of mul-add hardware support), and
reactions to overflow and underflow conditions (using the exception handling
conditions and the underlying hardware support). Providing this power in
the IR allows a Front-end developer to reliably support source language
mechanisms (e.g. use of parentheses) and front-end recognized compiler
options (e.g. for fp exception handling) to respond to the needs of the
source language programmer for fp-intensive applications.
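To make the mul-add case concrete (a sketch; @llvm.fma.f32 is the existing
fma intrinsic, and the function names are illustrative):

    declare float @llvm.fma.f32(float, float, float)

    ; Separately rounded: the product %a * %b is rounded to float before
    ; the add, as strict source semantics may require.
    define float @mul_then_add(float %a, float %b, float %c) {
    entry:
      %m = fmul float %a, %b
      %r = fadd float %m, %c
      ret float %r
    }

    ; Fused: a single rounding at the end.  Often faster and more accurate
    ; on fma hardware, but it changes the intermediate result and can
    ; change whether an overflow or underflow occurs at all.
    define float @fused_mul_add(float %a, float %b, float %c) {
    entry:
      %r = call float @llvm.fma.f32(float %a, float %b, float %c)
      ret float %r
    }

Whether the compiler may turn the first form into the second is exactly the
kind of per-operation decision the front-end needs to be able to dictate.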
Given that LLVM doesn't even properly support rounding modes, I think you are
going to have to wait a few years at least before we are anywhere near something
like this. That said, we'd get there sooner (assuming we actually want to go
there) if you help - patches welcome!
Point taken. I definitely have fantasies in this area, but won't likely have
extra cycles to devote to it in the near future.
Regarding rounding modes specifically, in spite of the hardware support for these,
I think they are an even more specialized area than controlling for overflow /
underflow. They are almost never useful outside the context of fp library function
authorship, and there are several commercial compilers that support library
development adequately.
It should be possible to define one or more attribute flags for FP operations in
the IR with semantics that guarantee allowance or suppression of optimizations
that might create or eliminate overflow, underflow, or significant precision
loss. The implementation of such semantics in the existing optimization passes
might take a fair amount of work, I admit. But that is exactly what Front-End
developers and their source language programmers would most benefit from.
I'm pretty sure that building lots of flags into floating point operations is
not going to fly at this stage. Metadata allows us to grow lots of flags if we
want without much impact on the compiler. Once the metadata approach has
matured and shown its usefulness or limitations then we can consider baking
things into the IR or other such approaches. But that's a long way off.
The role of metadata as a prototyping vehicle is clear, and may indeed continue
to be useful in this space. Clarifying the role of metadata in cases where it
would restrict optimizations rather than permit them would seem to be a step in
the right direction.
-Kevin