[Openicc] Xorg low level buffers - colour conversion

Gerhard Fuernkranz nospam456 at gmx.de
Fri Mar 7 18:37:44 PST 2008


Tomas Carnecky wrote:
> If the color conversion is just a matrix transformation, then that can be very easily done in a few lines of a shader program. However if it involves lookup tables and such additional data then the shader becomes a bit more complicated.
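
For the simple case I agree -- per pixel it is then essentially just
one 3x3 multiply, e.g. in plain C:

/* 3x3 matrix applied to one RGB pixel -- a minimal sketch of the
 * simple case mentioned above (assuming linear-light float values) */
static void matrix_transform(const float m[3][3],
                             const float in[3], float out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = m[r][0]*in[0] + m[r][1]*in[1] + m[r][2]*in[2];
}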

IMO the general case (applying a device link) typically involves
(tetrahedral) interpolation of multi-dimensional lookup tables (rather
big ones, on the order of say 10000..100000 table entries), and only
special cases can be handled in a simpler way (e.g. TRC -> matrix ->
TRC), though I'm not sure whether this simpler computation will really
be much faster in the end.
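
To make the general case concrete, here is a minimal sketch of
tetrahedral interpolation of a 3D LUT for one pixel, in plain C (the
grid size and table layout are made up for illustration; real code
like Argyll's IMDI uses integer arithmetic and precomputed offsets
instead):

#include <stddef.h>

/* Illustrative 3D LUT: n grid points per axis, 3 output channels,
 * stored as data[((r*n + g)*n + b)*3 + channel], values in [0,1] */
typedef struct {
    int n;              /* grid points per axis, e.g. 17 or 33 */
    const float *data;  /* n*n*n*3 floats */
} Lut3D;

/* Tetrahedral interpolation for one pixel; in[0..2] in [0,1] */
static void lut3d_tetra(const Lut3D *lut, const float in[3], float out[3])
{
    int n = lut->n;
    int ix[3];
    float f[3];

    for (int k = 0; k < 3; k++) {
        float v = in[k] * (n - 1);
        ix[k] = (int)v;
        if (ix[k] > n - 2) ix[k] = n - 2;   /* clamp to the last cell */
        f[k] = v - ix[k];
    }

    /* offsets of the cell corners in the flat array */
    size_t dr = (size_t)n * n * 3, dg = (size_t)n * 3, db = 3;
    const float *p = lut->data + ix[0]*dr + ix[1]*dg + ix[2]*db;

    /* choose one of the six tetrahedra by ordering the fractions,
     * which fixes the path V000 -> corner1 -> corner2 -> V111 */
    size_t o1, o2;
    float w1, w2, w3;
    if (f[0] >= f[1]) {
        if (f[1] >= f[2])      { o1 = dr; o2 = dr+dg; w1 = f[0]; w2 = f[1]; w3 = f[2]; }
        else if (f[0] >= f[2]) { o1 = dr; o2 = dr+db; w1 = f[0]; w2 = f[2]; w3 = f[1]; }
        else                   { o1 = db; o2 = dr+db; w1 = f[2]; w2 = f[0]; w3 = f[1]; }
    } else {
        if (f[0] >= f[2])      { o1 = dg; o2 = dg+dr; w1 = f[1]; w2 = f[0]; w3 = f[2]; }
        else if (f[1] >= f[2]) { o1 = dg; o2 = dg+db; w1 = f[1]; w2 = f[2]; w3 = f[0]; }
        else                   { o1 = db; o2 = dg+db; w1 = f[2]; w2 = f[1]; w3 = f[0]; }
    }

    size_t o3 = dr + dg + db;               /* the V111 corner */
    for (int c = 0; c < 3; c++)
        out[c] = p[c] + w1 * (p[o1+c] - p[c])
                      + w2 * (p[o2+c] - p[o1+c])
                      + w3 * (p[o3+c] - p[o2+c]);
}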

Btw, can GPUs only do massively parallel floating point operations, or
also massively parallel integer operations (which is of course only of
interest if the latter are even faster than the FP operations)?

Sorry for my ignorance, but I'm also wondering: does one just need to
write the shader program for the color transformation of a single
pixel, and this program then gets vectorized and applied to each pixel
automatically (and in parallel) by OpenGL and the GPU?
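
If that is the case, then I'd imagine the conceptual equivalent in
plain C is just a loop like the one below (reusing the lut3d_tetra
sketch from above), with the loop body playing the role of the shader,
except that the GPU runs the iterations in parallel:

/* Conceptual host-side equivalent, not actual GL code: the
 * per-pixel routine is written once and mapped over all pixels;
 * the GPU would execute the iterations in parallel */
static void transform_image(const Lut3D *lut, const float *src,
                            float *dst, size_t npixels)
{
    for (size_t i = 0; i < npixels; i++)
        lut3d_tetra(lut, &src[3*i], &dst[3*i]);   /* the "shader body" */
}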

> If you want to have the shader running on the GPU, you first have to 
> upload the data to the graphics card memory, then run the shader, and 
> then copy the result back to RAM. That adds some delay, especially the 
> reading back to system RAM, which is slow on the AGP bus (much faster
> on PCI-Express).

Even if reading back is not so fast, I'm wondering whether processing
a complete image on the GPU may still be faster than doing the
multi-dimensional interpolation for each pixel on the CPU. For
comparison, for 8-bit 3D color transformations I get about 14 Mpixel/s
with Argyll's IMDI routines on my Mobile AMD Athlon(tm) 64 4000+, and
for 16-bit 3D transformations about 3 Mpixel/s -- and the IMDI
routines are certainly pretty fast integer interpolation routines
(they don't use SIMD instructions, though).

In order to beat the 14 Mpixel/s, it would be necessary to copy
3*14 = 42 Mbyte (at 3 bytes per 8-bit RGB pixel) to the graphics card
and to copy the same amount of data back in less than one second
(less, actually, since the interpolation takes some GPU time too). Is
this reasonable? (For the 16-bit transform we'd only need to beat
3 Mpixel/s, which implies copying 6*3 = 18 Mbyte/s forth and back,
plus the interpolation on the GPU.)
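
Spelled out as a back-of-the-envelope check in a few lines of C (the
Mpixel/s figures are my IMDI measurements from above; the bytes per
pixel assume packed RGB):

#include <stdio.h>

int main(void)
{
    /* CPU throughput measured with Argyll's IMDI (see above), and
     * the resulting bus transfer budget a GPU path would have to
     * beat -- each way, upload and readback */
    struct { const char *name; double mpix_per_s; int bytes_per_pixel; }
    cases[] = {
        { "8-bit 3D transform",  14.0, 3 },   /* 3 bytes per RGB pixel */
        { "16-bit 3D transform",  3.0, 6 },   /* 6 bytes per RGB pixel */
    };

    for (int i = 0; i < 2; i++) {
        double mbyte_per_s = cases[i].mpix_per_s * cases[i].bytes_per_pixel;
        printf("%s: needs >= %.0f Mbyte/s each way,\n"
               "  and in practice more, since the interpolation itself\n"
               "  also takes GPU time\n", cases[i].name, mbyte_per_s);
    }
    return 0;
}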

Regards,
Gerhard



