[Openicc] Xorg low level buffers - colour conversion

Gerhard Fuernkranz nospam456 at gmx.de
Sat Mar 8 06:27:26 PST 2008


Tomas Carnecky wrote:
> The fragment shader (which is what is of interest here) is executed
> for every fragment (~pixel) separately. A simple (no-op) fragment
> shader looks like this:
>
> void main()
> {
>     gl_FragColor = gl_Color;
> }

... where I guess that gl_FragColor and gl_Color are both 3-element
vectors (R, G, B)?

Is gl_Color fixed to be a 3-element vector, or can gl_Color also be a
vector of a different length (e.g. 4 for CMYK), which the shader could
then convert to RGB (while the image is sent to OpenGL directly in,
e.g., the CMYK color space)?

> AGP is not full-duplex and has a bandwidth of ~2GB/s (AGP 8x),
> PCI-Express is full-duplex and has a bandwidth of 4GB/s (each
> direction). My card which is on a PCI-Express x8 bus can transfer
> ~1GB/s of raw data. If you use some of the OpenGL extensions to do
> that asynchronously you can gain a bit of speed if you're processing
> lots of different images sequentially (say, video).
> Processing 42MB of data within less than one second is very much possible.
>
> Btw, these transformation engines in the CM systems, do they use
> mmx/sse or are the routines otherwise optimized in assembler? Or is it
> all written in C?

The particular engine (IMDI) I was referring to as an example is C code
(created by a generator program), with no assembler or SIMD (MMX/SSE)
instructions. So the speed certainly depends on the optimization
quality of the compiler as well. Nevertheless this engine _is_ pretty
fast, particularly for 8-bit transformations. The LCMS engine has some
assembler code (at least if compiled under Windows), but I seem to
remember that Marti mentioned he even wants to get rid of the
assembler code, since today's compilers already do a good job anyway. I
don't know what commercial engines do and use. And the example
numbers I mentioned are single-threaded; i.e. on a dual- or quad-core
CPU, a larger image could possibly be broken into chunks that are
processed in parallel by separate threads, which might again multiply
the throughput.

One more question: if one does not want to use the complete rendering
pipeline, but only wants to do the color transformation on the GPU
(i.e. send image data to the GPU, do the transformation, and copy the
data back), is it still possible to do this via OpenGL and
GPU-independent shader programs? Or is proprietary GPU programming
necessary then?

Thanks,
Gerhard
