[Pixman] [PATCH 0/4] Meet the FPU-based implementation of the core pixel pipeline

Tue Sep 21 19:43:53 PDT 2010

>> I think it's worth mentioning explicitly that this patch series isn't
>> the whole implementation we have here. I'm going to post more
>> code in the coming days. However, following the "release early,
>> release often" motto, I decided to share now the part that is ready
>> at this moment.

> There is already a 64bit pipeline that uses a16r16g16b16 intermediate
> pixels; it is used whenever the 10bpc formats are involved. However,
> it is also somewhat neglected in that transformations and gradients
> don't use it, and it is somewhat slow. If we are going to have a
> floating point pipeline, then it's pretty tempting to get rid of the
> 64bit one and just use the floating point one instead.

That's certainly my impression.  Already the only combiners which are
faster in 64-bit than in floating-point are the CLEAR and unmasked SRC
operators, and that is only because they go via the format converters
unconditionally (or, if already FP, are memory bottlenecked) in the
floating-point version.

> You could argue that the general implementation should simply be
> falling back to the floating point one for formats that the general
> one couldn't handle. Also, the more cracktastic operators such as the
> conjoint/disjoint ones, could be done in floating point, and the 32
> bit versions could be deleted.

While the combiner part of the pipeline isn't in the patch just
submitted, rest assured that it does exist - along with a "fastpath"
that puts it together with the fetchers.  :-)

Actually, I took the trouble to optimise some of the PDF operators
while converting them, with the intended side-effect of also making
them more readable.  While doing so, I realised that a component-alpha
implementation of even the HSL operators was quite reasonable - just
calculate the resultant colour independently of the alpha, and mask
off the result in the same way as the other PDF operators.  This
neatly fills in a generality hole.
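A minimal sketch of that component-alpha approach, using the PDF "luminosity" operator as the example: the blended colour B is computed once from the unpremultiplied source and destination, ignoring alpha and mask, and the per-channel mask only enters the final premultiplied combine. The pixel layout (premultiplied float RGBA, alpha at index 3), the function names, and taking the result alpha from the mask's alpha channel are assumptions for illustration; this is not the combiner code from the patch series.

```c
#include <assert.h>

/* Premultiplied RGBA in floating point; c[3] is alpha. */
typedef struct { float c[4]; } pixel_f;

/* PDF Lum(): weighted luminosity of an unpremultiplied colour. */
static float lum(const float c[3])
{
    return 0.3f * c[0] + 0.59f * c[1] + 0.11f * c[2];
}

/* PDF ClipColor(): pull channels back into [0, 1] while keeping Lum(). */
static void clip_color(float c[3])
{
    float l = lum(c), n = c[0], x = c[0];
    for (int i = 1; i < 3; i++) {
        if (c[i] < n) n = c[i];
        if (c[i] > x) x = c[i];
    }
    for (int i = 0; i < 3; i++) {
        if (n < 0.0f)
            c[i] = l + (c[i] - l) * l / (l - n);
        if (x > 1.0f)
            c[i] = l + (c[i] - l) * (1.0f - l) / (x - l);
    }
}

/* PDF SetLum(): shift c to luminosity l, then clip into gamut. */
static void set_lum(float c[3], float l)
{
    float d = l - lum(c);
    for (int i = 0; i < 3; i++)
        c[i] += d;
    clip_color(c);
}

/* Component-alpha LUMINOSITY (hypothetical helper). Per channel k:
 *   r_k = (1 - sa*m_k)*d_k + (1 - da)*s_k*m_k + sa*m_k*da*B_k
 * where B = SetLum(D/da, Lum(S/sa)) is computed once, independent of the mask. */
static pixel_f blend_luminosity_ca(pixel_f src, pixel_f mask, pixel_f dst)
{
    float sa = src.c[3], da = dst.c[3];
    assert(sa >= 0.0f && sa <= 1.0f && da >= 0.0f && da <= 1.0f);

    float s[3], b[3];
    for (int i = 0; i < 3; i++) {                 /* unpremultiply, guarding 0 */
        s[i] = sa > 0.0f ? src.c[i] / sa : 0.0f;
        b[i] = da > 0.0f ? dst.c[i] / da : 0.0f;
    }
    set_lum(b, lum(s));      /* backdrop hue/saturation, source luminosity */

    pixel_f r;
    for (int i = 0; i < 3; i++) {
        float m = mask.c[i];
        float sa_m = sa * m;                      /* per-channel source alpha */
        r.c[i] = (1.0f - sa_m) * dst.c[i] + (1.0f - da) * src.c[i] * m
               + sa_m * da * b[i];
    }
    float sa_a = sa * mask.c[3];                  /* assumption: alpha uses mask alpha */
    r.c[3] = sa_a + da - sa_a * da;
    return r;
}
```

Note that the mask never touches B: with a zero mask the destination comes back unchanged, and with a full mask the usual PDF formula is recovered, so the HSL operators get component alpha by the same final masking step as the separable ones.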

> So basically, I think it would be interesting to think of making
> floating point pipe as the new 'canonical' one, deleting the 64 bit
> one, and considering the 32 bit one a 'fast path' that can be taken in
> some cases. This is just what I happen to be thinking...

That would certainly be interesting.

> In any case, we'll almost certainly want to accelerate this pipeline
> not only with NEON, but also SSE and AVX, so regardless of how it
> eventually gets integrated, that's worth keeping in mind. To do this
> properly, we'll need to solve the problem of how to install CPU
> specific fetchers.
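One possible shape for installing CPU-specific fetchers, sketched under assumptions: each implementation carries a per-format table of float fetchers plus a pointer to a more general delegate, and lookup walks the chain, so a SIMD implementation only has to supply the formats it actually accelerates. Every name here (format_t, implementation_t, select_impl, lookup_fetcher and the fetchers) is hypothetical, and the "SIMD" fetcher is plain C standing in for a NEON/SSE/AVX version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format list; not pixman's pixman_format_code_t. */
typedef enum { FMT_A8R8G8B8, FMT_R5G6B5, FMT_A2R10G10B10, FMT_COUNT } format_t;

/* Fetch `width` pixels into premultiplied float RGBA (4 floats per pixel). */
typedef void (*fetch_float_fn)(const void *src, float *dst, int width);

typedef struct implementation {
    const char *name;
    fetch_float_fn fetch[FMT_COUNT];        /* NULL: not handled at this level */
    const struct implementation *delegate;  /* more general fallback, or NULL */
} implementation_t;

static void general_fetch_a8r8g8b8(const void *src, float *dst, int width)
{
    const uint32_t *p = src;
    for (int i = 0; i < width; i++, dst += 4) {
        dst[0] = (float)((p[i] >> 16) & 0xff) / 255.0f;
        dst[1] = (float)((p[i] >> 8) & 0xff) / 255.0f;
        dst[2] = (float)(p[i] & 0xff) / 255.0f;
        dst[3] = (float)(p[i] >> 24) / 255.0f;
    }
}

static void general_fetch_r5g6b5(const void *src, float *dst, int width)
{
    const uint16_t *p = src;
    for (int i = 0; i < width; i++, dst += 4) {
        dst[0] = (float)(p[i] >> 11) / 31.0f;
        dst[1] = (float)((p[i] >> 5) & 0x3f) / 63.0f;
        dst[2] = (float)(p[i] & 0x1f) / 31.0f;
        dst[3] = 1.0f;                      /* opaque format */
    }
}

/* Stand-in for a NEON/SSE/AVX fetcher; plain C so the sketch builds anywhere. */
static void simd_fetch_a8r8g8b8(const void *src, float *dst, int width)
{
    general_fetch_a8r8g8b8(src, dst, width);
}

static const implementation_t general_impl = {
    "general-float",
    { [FMT_A8R8G8B8] = general_fetch_a8r8g8b8, [FMT_R5G6B5] = general_fetch_r5g6b5 },
    NULL
};

static const implementation_t simd_impl = {
    "simd",
    { [FMT_A8R8G8B8] = simd_fetch_a8r8g8b8 },  /* only what this CPU path speeds up */
    &general_impl
};

/* cpu_has_simd would come from runtime CPU detection (not shown). */
static const implementation_t *select_impl(int cpu_has_simd)
{
    return cpu_has_simd ? &simd_impl : &general_impl;
}

/* Walk from the most specialised implementation down to the general one. */
static fetch_float_fn lookup_fetcher(const implementation_t *imp, format_t fmt)
{
    assert(fmt < FMT_COUNT);
    for (; imp != NULL; imp = imp->delegate)
        if (imp->fetch[fmt] != NULL)
            return imp->fetch[fmt];
    return NULL;                            /* no implementation handles fmt */
}
```

The same chain would also accommodate the fallback idea above: a 32-bit fast path could sit in front of the floating-point general implementation and defer to it for anything it does not handle.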

This may be a question of whether, on a specific CPU, using a
vector-int-to-float conversion is faster than the three or four table
lookups as implemented here.  A scalar int-to-float conversion is
almost certainly slower.
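To make the comparison concrete, here is a sketch of the table-lookup approach for a8r8g8b8 next to a scalar conversion that produces identical values. The table and helper names are hypothetical and the patch's own tables may be laid out differently; a vector version would instead convert all four channels with a single SIMD int-to-float instruction, which is not shown. Which variant wins is CPU-dependent and would have to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit channel value -> float in [0, 1]; filled once at start-up. */
static float unorm8_to_float[256];
static int unorm8_table_ready;

static void init_unorm8_table(void)
{
    for (int i = 0; i < 256; i++)
        unorm8_to_float[i] = (float)i / 255.0f;
    unorm8_table_ready = 1;
}

/* a8r8g8b8 -> float RGBA with four table lookups (three for x8r8g8b8,
 * where alpha is simply 1.0f); no int-to-float conversion per pixel. */
static void fetch_a8r8g8b8_lut(uint32_t p, float out[4])
{
    assert(unorm8_table_ready);
    out[0] = unorm8_to_float[(p >> 16) & 0xff];
    out[1] = unorm8_to_float[(p >> 8) & 0xff];
    out[2] = unorm8_to_float[p & 0xff];
    out[3] = unorm8_to_float[p >> 24];
}

/* Same result using a scalar int-to-float conversion per channel. */
static void fetch_a8r8g8b8_cvt(uint32_t p, float out[4])
{
    out[0] = (float)((p >> 16) & 0xff) / 255.0f;
    out[1] = (float)((p >> 8) & 0xff) / 255.0f;
    out[2] = (float)(p & 0xff) / 255.0f;
    out[3] = (float)(p >> 24) / 255.0f;
}
```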

Either way, we will need to solve the "GCC is crap at optimising"
problem.  One potential solution is to use LLVM directly, as opposed
to trying to use the various C front-ends associated with it -
remembering that C is useless at expressing vector operations in a way
that is accessible to optimisers.

> Another thing to think about with floating-point pixels is perhaps
> doing the math in *linear* floating point, i.e. with values where
> multiplying by N makes the pixel N times brighter, no matter what the
> color. This requires gamma correction when converting the output for
> the screen.

An interesting idea, and certainly easier to do in floating-point than
in fixed-point, but...

I think the reason why fonts would look "thin" is that the coverage
computation for glyph rasterising is done as though the mask were in
linear space, while you seem to assume that the mask would also be
gamma corrected.  So for correct results with current font
rasterisers, the colour source and destination could be gamma
corrected but the mask would have to remain linear.  At present
there's no way to express this, so gamma correction would have to be
requested explicitly somehow - perhaps in the same way that
component-alpha masking is presently requested.
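A sketch of the arrangement suggested above, where source and destination are gamma corrected but the mask stays linear. To avoid depending on libm it assumes a pure power-law gamma of 2.0 (real sRGB is closer to 2.2 with a short linear segment), and it handles a single channel of an opaque solid source over an opaque destination; the table and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: gamma 2.0, so decoding is c*c. Encoded 8-bit -> linear light. */
static float decode_table[256];

static void init_gamma_tables(void)
{
    for (int i = 0; i < 256; i++) {
        float c = (float)i / 255.0f;
        decode_table[i] = c * c;
    }
}

/* Re-encode linear light to 8 bits: nearest entry in the (increasing) table. */
static uint8_t encode_linear(float l)
{
    int lo = 0, hi = 255;
    while (lo < hi) {                 /* first index with decode_table[i] >= l */
        int mid = (lo + hi) / 2;
        if (decode_table[mid] < l)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo > 0 && l - decode_table[lo - 1] < decode_table[lo] - l)
        lo--;                         /* the lower neighbour is closer */
    return (uint8_t)lo;
}

/* One channel, opaque solid source OVER opaque destination. src and dst are
 * gamma encoded and get linearised; coverage comes straight from the glyph
 * rasteriser and is used as-is, i.e. the mask is NOT gamma corrected. */
static uint8_t over_linear_light(uint8_t src, float coverage, uint8_t dst)
{
    assert(coverage >= 0.0f && coverage <= 1.0f);
    float s = decode_table[src];
    float d = decode_table[dst];
    float r = s * coverage + d * (1.0f - coverage);
    return encode_linear(r);
}
```

With black over white at 50% coverage this gives 180, rather than the roughly 128 a blend of the encoded values would give, so partially covered edge pixels come out lighter.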

 - Jonathan Morton