[Pixman] pixman general_composite_rect
Soeren Sandmann
sandmann at daimi.au.dk
Fri Nov 12 07:48:28 PST 2010
kb pachauri <kb.pachauri at gmail.com> writes:
> I am working on developing pixman with OpenCL (only the compositing function),
> and got it working, but performance is very, very slow.
>
> The main reason for the slow performance is that compositing is done line by
> line:
> 1) I think launching a kernel per scanline is too much overhead,
> 2) I think copying memory scanline by scanline (from host to device) is also
> overhead. [The general recommendation in GPGPU computing is to copy all the
> data once, then do the task.]
If you have to copy data back and forth between host and device all
the time, that is certainly going to ruin any performance advantage
from doing the processing on the GPU. It may be interesting instead to
look into keeping pixman_images in device memory at all times, and only
mapping them into host memory when someone explicitly asks for it.
This will require some redesign of the pixman API, but we have known
for a long time that this was coming.
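
As a rough illustration of the "keep it on the device, map on request"
idea, here is a minimal sketch in plain C. Everything in it is
hypothetical: gpu_image_t and the gpu_image_* functions are not pixman
or OpenCL API, and the "device" memory is simulated with an ordinary
heap allocation so the example runs anywhere. The only point is that
device-side operations never cause a transfer, and a copy happens only
when the caller explicitly maps or unmaps the image.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical image that lives in (simulated) device memory. */
typedef struct {
    uint32_t *device_bits; /* stand-in for a device allocation */
    uint32_t *host_bits;   /* non-NULL only while the image is mapped */
    size_t    n_pixels;
    unsigned  transfers;   /* number of host<->device copies so far */
} gpu_image_t;

gpu_image_t *gpu_image_create(size_t n_pixels)
{
    gpu_image_t *img = calloc(1, sizeof *img);
    if (!img)
        return NULL;
    img->device_bits = calloc(n_pixels, sizeof *img->device_bits);
    if (!img->device_bits) {
        free(img);
        return NULL;
    }
    img->n_pixels = n_pixels;
    return img;
}

/* "Device-side" operation: touches only device_bits, so no transfer. */
void gpu_image_fill(gpu_image_t *img, uint32_t color)
{
    for (size_t i = 0; i < img->n_pixels; i++)
        img->device_bits[i] = color;
}

/* Explicit request for host access: this is the only device->host copy. */
uint32_t *gpu_image_map(gpu_image_t *img)
{
    if (!img->host_bits) {
        size_t size = img->n_pixels * sizeof *img->host_bits;
        img->host_bits = malloc(size);
        if (!img->host_bits)
            return NULL;
        memcpy(img->host_bits, img->device_bits, size);
        img->transfers++;
    }
    return img->host_bits;
}

/* Write host changes back and drop the host copy. */
void gpu_image_unmap(gpu_image_t *img)
{
    if (!img->host_bits)
        return;
    memcpy(img->device_bits, img->host_bits,
           img->n_pixels * sizeof *img->host_bits);
    img->transfers++;
    free(img->host_bits);
    img->host_bits = NULL;
}

void gpu_image_destroy(gpu_image_t *img)
{
    if (!img)
        return;
    free(img->host_bits);
    free(img->device_bits);
    free(img);
}
```

With this shape, a sequence of fills or composites leaves the transfer
counter at zero, and only a gpu_image_map()/gpu_image_unmap() pair
pays for copies. A real implementation would presumably replace the
memcpy calls with the device API's own map or read/write operations.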
> 3) also, the kernels are not optimized much (the kernels are just the combine
> functions which are there in pixman_composite_32.c)
>
> After that I changed it to launch the kernel for the whole rectangle
> (width*height), basically removing the height for-loop in
> general_composite_rect
> (i.e. get all the src, dest, mask data for the width & height (whole
> rectangular area), then do the compositing).
>
> Now performance is much better than in my last attempt (still not very
> promising, though).
>
> How can I handle the case where the store var is not null, i.e. fetch dest,
> composite line & store dest?
I know almost nothing about OpenCL, but in Open*GL*, a fragment shader
can't read from the render target, so unless the combine function can
be implemented through glBlendFunc, a copy would have to be made of
the destination before running the kernel.
Looking over the OpenCL spec, it seems that it has the same
restriction in that an image can't be both read and written. However,
that doesn't seem to apply to "buffers", so maybe such an object could
be used?
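
To make the buffer idea concrete, here is a minimal sketch in plain C
of what one work item could do if src and dest were buffers. It is not
OpenCL and not pixman code: composite_over_item and
composite_over_rect are hypothetical names, the pixel format is
assumed to be premultiplied a8r8g8b8, and OVER is picked only as an
example operator. The nested loop stands in for a single 2-D launch
over the whole rectangle. Each item reads and writes only its own
destination pixel, so no separate copy of the destination is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* OVER for one premultiplied a8r8g8b8 pixel:
 * result = src + dest * (255 - src_alpha) / 255, per channel. */
static uint32_t over_pixel(uint32_t src, uint32_t dest)
{
    uint8_t  inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t)(dest >> shift), inv_alpha);
        uint32_t c = s + d;
        if (c > 255)
            c = 255; /* saturate */
        out |= c << shift;
    }
    return out;
}

/* Body of one hypothetical work item at (x, y): fetch dest, combine,
 * store dest, all on the same buffer element. */
static void composite_over_item(const uint32_t *src, uint32_t *dest,
                                size_t stride, size_t x, size_t y)
{
    size_t i = y * stride + x;
    dest[i] = over_pixel(src[i], dest[i]);
}

/* Host-side stand-in for one launch over width x height items.
 * stride is in pixels and is assumed equal for src and dest. */
void composite_over_rect(const uint32_t *src, uint32_t *dest,
                         size_t width, size_t height, size_t stride)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            composite_over_item(src, dest, stride, x, y);
}
```

Because no item touches another item's pixel, the order in which the
items run does not matter here. That would not hold for an operation
in which one output pixel depends on neighbouring destination pixels.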
Soren