[cairo] Speeding up _cairo_fixed_from_double
daniel.amelang at gmail.com
Thu Oct 26 13:26:55 PDT 2006
[Seems that this little function deserves its own thread]
The current implementation is slow on Intel PPro and ARM (-VFP):
Intel: Most (all?) floor implementations (including the mathinline
version) on most (all?) platforms involve the dreaded fldcw
instruction, which is dreadfully slow on PPro arch.
ARM: Since FP is emulated in software, and since the current
implementation uses a number of FP instructions, this function is also
dreadfully slow on ARM.
"Magic number" solution
As pointed out by various people involved with cairo, there is a trick
that takes advantage of the IEEE754 FP standard that achieves what we
need in a much faster way than the current implementation. There are
at least three patches floating around that do pretty much the same
thing. The most notable difference is that the more "correct" ones
use a union instead of a cast, which ensures correct behavior even
when strict aliasing is enabled (e.g. when the -fno-strict-aliasing
flag is missing in GCC).
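For readers who haven't seen the trick, here is a minimal sketch of the union variant for 16.16 fixed point. It is not any of the patches mentioned above. The names sketch_fixed_t, sketch_fixed_from_double, SKETCH_MAGIC_16_16 and SKETCH_FLOAT_WORDS_BIGENDIAN are made up for illustration, and it assumes IEEE754 doubles and that the low mantissa word comes first in memory unless the macro is defined.

```c
#include <stdint.h>

typedef int32_t sketch_fixed_t;   /* hypothetical 16.16 fixed-point type */

/* 1.5 * 2^(52 - 16): adding this pushes the 16.16 value into the low
 * 32 bits of the double's 52-bit fraction field. */
#define SKETCH_MAGIC_16_16 (103079215104.0)

static sketch_fixed_t
sketch_fixed_from_double (double d)
{
    /* A union rather than a pointer cast, so the read does not rely on
     * strict aliasing being disabled. */
    union {
        double  d;
        int32_t i[2];
    } u;

    /* Rounding happens in the addition and follows the current FP
     * rounding mode (normally round-to-nearest-even). */
    u.d = d + SKETCH_MAGIC_16_16;

#ifdef SKETCH_FLOAT_WORDS_BIGENDIAN   /* hypothetical word-order flag */
    return u.i[1];                    /* low word is stored second */
#else
    return u.i[0];                    /* low word is stored first */
#endif
}
```

With those assumptions, sketch_fixed_from_double (1.0) yields 65536 and sketch_fixed_from_double (-1.0) yields -65536. Inputs outside the 16.16 range are not handled by this sketch.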
See here for one of the "correct" versions:
See here for some theory behind the magic trick:
The "magic number" solution is great, but it has its downsides.
1) It assumes that the platform uses an IEEE754 double representation.
I don't know of any platforms that we care about that don't. The VFP
for ARM is said to be "near" compliant, but I think that that refers
to the arithmetic operations, not the binary representation, so we
should be OK here. Autoconf doesn't even offer a flag/check for
this (they say it's safe to assume this on modern systems). Personally,
I don't think that we need any type of check for this, but
Carl disagrees, so I guess a quick configure-time spot check (i.e.
just check one double value) is probably good enough. So we can
introduce a new flag (DOUBLES_ARE_IEEE754 or something?) and a
configure-time test and we'll have overcome this issue.
It's important to note that IEEE754 conformance is the _only_ check
necessary to know if this trick works. I get the impression from
previous discussions that Carl (and others?) thought that this trick
depended on more than that, and thus needed a configure-time check.
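A one-value check of the kind described above could look roughly like the following. This is only a sketch: the function name is hypothetical, the probe value -1.5 is an arbitrary choice, and the check deliberately accepts either word order so that it tests the representation alone.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the double -1.5 is stored with the IEEE754 binary64 bit
 * pattern (high word 0xBFF80000, low word 0), in either word order.
 * Returns 0 otherwise. */
static int
sketch_doubles_look_ieee754 (void)
{
    const double probe = -1.5;
    uint32_t w[2];

    if (sizeof (double) != sizeof (w))
        return 0;

    /* memcpy sidesteps aliasing questions when inspecting the bits. */
    memcpy (w, &probe, sizeof (w));

    return (w[0] == 0xBFF80000u && w[1] == 0) ||
           (w[1] == 0xBFF80000u && w[0] == 0);
}
```

A configure test could run something like this and define the flag when it returns 1. Note that a run-time probe like this does not help when cross-compiling.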
If you're interested, there's an ad-hoc way of detecting IEEE754 from
other defined symbols (or by using a C99 symbol) found here:
2) It needs to be _word_ endian aware. You'd think that a
WORDS_BIGENDIAN check would be enough to solve this one, but I'd like
to raise a warning flag before this is considered solved.
WORDS_BIGENDIAN (according to the docs) refers to the byte order,
_not_ the word order. The magic number approach needs to take word
order into consideration, not byte order. And unfortunately, on some
platforms (including ARM), the word ordering can be different than the
byte ordering. I think we need something like the
__IEEE_BIG_ENDIAN/__IEEE_LITTLE_ENDIAN flags found in newlib's
ieeefp.h. So we need a new flag/check in cairo to pick this up.
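To make the byte order vs. word order distinction concrete, here is a rough sketch of probing the two independently. The function names are hypothetical, and the word-order probe assumes IEEE754 doubles (1.0 has high word 0x3FF00000 and low word 0).

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if 32-bit integers are stored most significant byte first.
 * This is the property a byte-order check reports. */
static int
sketch_bytes_bigendian (void)
{
    const uint32_t one = 1;
    unsigned char b[4];

    memcpy (b, &one, sizeof (b));
    return b[0] == 0;
}

/* Returns the index (0 or 1) of the 32-bit word holding the low half of
 * a double's mantissa. This is the property the magic number trick
 * actually depends on. */
static int
sketch_double_low_word_index (void)
{
    const double probe = 1.0;
    uint32_t w[2];

    memcpy (w, &probe, sizeof (w));
    return w[0] == 0 ? 0 : 1;
}
```

On a platform where the two orderings agree, the low word index is 1 exactly when bytes are big-endian. On a mixed platform the two results disagree, which is the case a byte-order check alone would get wrong.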
Am I missing anything else that is blocking this patch?