[Pixman] [PATCH/RFC] Use OpenMP for bilinear scaled fast paths
sandmann at cs.au.dk
Mon Jun 25 11:29:04 PDT 2012
Chris Wilson <chris at chris-wilson.co.uk> writes:
> On Mon, 25 Jun 2012 02:00:27 +0300, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
>> Does it actually make sense? I remember somebody was strongly opposing
>> the idea of spawning threads in pixman in the past, but can't find
>> this e-mail right now.
You may be remembering an IRC discussion about it, where Joonas was
opposed to libraries spawning threads:
My main concern is making sure that it doesn't cause issues in
the X server, which is known to do wacky things with signals and
possibly threads. But the answer to that is to just put it in and get it
tested.
> The only caveat from my point of view is that pixman_image_composite()
> must be atomic as the current cairo_image_surface_t is meant to be
> synchronous. Or at least API added so that I can serialise the
> operations within cairo_image_surface_t. In the past, I believe we've
> suggested grander schemes that would require us to expose the
> asynchronous nature to the user. However, simply using OpenMP to
> parallelise the kernels should not leak across the interface and so it is
> acceptable. So it just boils down to whether this makes maintenance
> harder and interferes with future plans...
At some point, I think grander schemes will be useful, where a grander
scheme might mean rolling our own thread pool and/or adding an
asynchronous API to pixman.
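An asynchronous API would change the interface itself: the caller starts an operation and must wait before reading the destination. A minimal sketch with POSIX threads, assuming hypothetical names (fill_job_t, fill_async_start, fill_async_wait) that are not part of pixman. It spawns one thread per job for brevity; a real thread pool would keep workers alive and queue jobs to them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical asynchronous job: nothing like this exists in pixman. */
typedef struct
{
    pthread_t thread;
    int      *dst;
    size_t    n;
    int       value;
} fill_job_t;

/* Worker body: does the actual "rendering" (here, a plain fill). */
static void *
fill_worker (void *arg)
{
    fill_job_t *job = arg;
    for (size_t i = 0; i < job->n; ++i)
        job->dst[i] = job->value;
    return NULL;
}

/* Starts the job and returns immediately; 0 on success, like
 * pthread_create. The caller must not touch dst until fill_async_wait. */
static int
fill_async_start (fill_job_t *job, int *dst, size_t n, int value)
{
    job->dst = dst;
    job->n = n;
    job->value = value;
    return pthread_create (&job->thread, NULL, fill_worker, job);
}

/* Blocks until the job has finished writing dst. */
static void
fill_async_wait (fill_job_t *job)
{
    int err = pthread_join (job->thread, NULL);
    assert (err == 0);
    (void) err;
}
```

The start/wait pair is exactly the "asynchronous nature" that would leak to users, which is why it needs an API change while OpenMP inside a kernel does not.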
One case is radial gradients. These are generated through iterators, and
I am not sure that OpenMP is up to the task of parallelizing those. That
is, it doesn't seem likely that OpenMP can deal with code like this:
    iter_init (&src_iter, height);
    iter_init (&dest_iter, height);
    for (i = 0; i < height; ++i)
    {
        /* fetch the next scanline from each iterator, combine, write back */
    }
But that doesn't mean that OpenMP can't be used for the things that it
will deal with.
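A toy illustration of the difference, using a hypothetical row iterator (row_iter_t, iter_next_row) rather than pixman's real iterator API. OpenMP's parallel for needs iterations that can run in any order, which a loop driven by a stateful "give me the next scanline" iterator does not provide. One way around it would be iterators that can be positioned at an arbitrary row, so each thread could start its own.

```c
#include <assert.h>

/* Hypothetical iterator that, like a scanline iterator, carries state from
 * one call to the next: each call returns the next row. */
typedef struct
{
    int next_row;
} row_iter_t;

static int
iter_next_row (row_iter_t *it)
{
    return it->next_row++;
}

/* Sequential form: iteration i depends on the iterator state left by
 * iteration i - 1. Marking this loop "omp parallel for" would make the
 * threads race on it.next_row. */
static void
fill_rows_with_iter (int *rows, int height)
{
    assert (height >= 0);
    row_iter_t it = { 0 };
    for (int i = 0; i < height; ++i)
    {
        int y = iter_next_row (&it);
        rows[y] = y * y;
    }
}

/* Row-indexed form: each iteration derives its row from i alone, so the
 * iterations are independent and can be split across threads. */
static void
fill_rows_indexed (int *rows, int height)
{
    assert (height >= 0);
#pragma omp parallel for
    for (int i = 0; i < height; ++i)
        rows[i] = i * i;
}
```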
> Is there a way to hint to OpenMP how many threads to use? As we know the
> memory characteristics for most of the routines, do we not want to hint to
> OMP not to use more threads than required to saturate memory bw?
We know the memory characteristics, but the arithmetic characteristics
are less predictable. If some operation is doing a lot of arithmetic, we
want more threads for it.
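OpenMP does accept such hints per loop: the num_threads() clause caps the team size for one parallel region, and the if() clause can keep small jobs on the calling thread. A sketch assuming a hypothetical per-operation hint (threads_for_op, multiply_alpha); the cap of 2 threads for memory-bound operations is a placeholder, not a measurement.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical per-operation hint: memory-bound operations get a small
 * placeholder cap, arithmetic-heavy ones may use every available thread. */
static int
threads_for_op (int memory_bound)
{
#ifdef _OPENMP
    int max = omp_get_max_threads ();
#else
    int max = 1;
#endif
    int want = memory_bound ? 2 : max;
    return want < max ? want : max;
}

/* dst = src * alpha / 255, rounded; one image row per loop iteration. */
static void
multiply_alpha (const unsigned char *src, unsigned char *dst,
                int width, int height, unsigned alpha, int memory_bound)
{
    assert (width >= 0 && height >= 0 && alpha <= 255);
    int nthreads = threads_for_op (memory_bound);
    (void) nthreads; /* only read by the pragma when OpenMP is enabled */

    /* num_threads() applies to this loop only; if() keeps small images on
     * the calling thread. Without OpenMP both clauses are ignored. */
#pragma omp parallel for num_threads(nthreads) if(height >= 64)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[y * width + x] =
                (unsigned char) ((src[y * width + x] * alpha + 127) / 255);
}
```

The OMP_NUM_THREADS environment variable still bounds omp_get_max_threads(), so a user-level override keeps working on top of such a hint.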
What would be the performance impact of just parallelizing as much as
possible? I suppose if one thread can saturate the memory bandwidth,
having more threads would just pointlessly occupy more cores that could
be used for other purposes. I don't know how much of a concern that
actually is though.
I suppose a JIT compiler might be able to make an estimate of the number
of cycles per cache line accessed for the code it generated.
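Such an estimate would turn into a thread count under a crude model: if one thread spends c cycles of work per cache line and memory can deliver a line every m cycles across all cores, then about c/m threads (rounded up) keep memory busy, and more only occupy cores. For example, c = 40 and m = 10 gives 4 threads. A sketch of that arithmetic; threads_to_saturate and both inputs are hypothetical, not values pixman computes.

```c
#include <assert.h>

/* cycles_per_line: one thread's compute cost per cache line, as a JIT
 * might estimate for its generated code. mem_cycles_per_line: how often,
 * in cycles, memory delivers a cache line in aggregate. Assumed inputs. */
static int
threads_to_saturate (double cycles_per_line, double mem_cycles_per_line,
                     int ncores)
{
    assert (cycles_per_line > 0 && mem_cycles_per_line > 0 && ncores > 0);

    /* n threads consume n / cycles_per_line lines per cycle; memory supplies
     * 1 / mem_cycles_per_line. They meet at n = cycles_per_line /
     * mem_cycles_per_line, rounded up and clamped to [1, ncores]. */
    double q = cycles_per_line / mem_cycles_per_line;
    if (q >= ncores)
        return ncores;
    int n = (int) q;
    if (n < q)
        n++;
    return n < 1 ? 1 : n;
}
```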
> Otherwise it's a big win for such a tiny patch! Just need to cross-check
> that we don't introduce regression on the older single-core no-cache
> chips. :(
Even if there is a small performance regression on single-core chips, I
still think it's worth it. Single-core chips are quickly becoming a
thing of the past, and we could offer a --disable-omp configure argument
for embedded systems where the CPU is known to be single-core ahead of
time.
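Autoconf's AC_OPENMP macro already provides a --disable-openmp configure switch and an OPENMP_CFLAGS substitution, so one option (an assumption about how the build could be wired, not what the patch does) would be to reuse it. Either way, the C side only has to build correctly with and without OpenMP. A sketch with hypothetical names (fast_path_threads, fill_rows), adding a runtime single-processor check as well.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* How many threads the fast paths may use. Built without OpenMP flags,
 * _OPENMP is undefined, every "#pragma omp" is ignored, and this is 1. */
static int
fast_path_threads (void)
{
#ifdef _OPENMP
    /* Runtime fallback too: on a single-processor machine, stay serial. */
    return omp_get_num_procs () > 1 ? omp_get_max_threads () : 1;
#else
    return 1;
#endif
}

static void
fill_rows (unsigned *dst, int width, int height, unsigned value)
{
    assert (width >= 0 && height >= 0);
    int nthreads = fast_path_threads ();
    (void) nthreads; /* only read by the pragma when OpenMP is enabled */

    /* if() keeps the loop on the calling thread when only one is allowed,
     * avoiding thread start-up cost on single-core systems. */
#pragma omp parallel for num_threads(nthreads) if(nthreads > 1)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[y * width + x] = value;
}
```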