[Pixman] [PATCH 0/18] Add iterators to images

Andrea Canciani ranma42 at gmail.com
Sat Jan 15 03:47:29 PST 2011

On Fri, Jan 14, 2011 at 1:46 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Wednesday 12 January 2011 12:46:39 Soeren Sandmann wrote:
>> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>> > >  15 files changed, 743 insertions(+), 470 deletions(-)
>> >
>> > On the other hand, I'm not so happy about the implementation. The number
>> > of lines of code as well as the complexity has increased, and the
>> > performance seems to have dropped a little bit in some of the cases.
>> The performance numbers you posted seem very close to noise to me; in
>> some cases the new code is faster. Considering that this was measured
>> with the CPU specific fast paths turned off, most users won't actually
>> see even this small slowdown.
> Performance numbers for some of the tests look like they are more or less
> statistically relevant. It's not a big slowdown, but if such changes keep
> accumulating, then we got a problem.
>> > I really want to see some practical benefits which depend on these
>> > added iterators in the near future.
>> It isn't clear to me what specifically you are suggesting.
> More like I'm trying to figure out the purpose of all these changes and
> their
> place in the long term plans. If this code is only related to general
> implementation which just needs to be simple, clean and maintainable, then
> it
> is perfectly fine. If the iterators are also going to be used in the fast
> path
> functions eventually, then they are better to be also really fast.
>> One practical benefit of the patch set is that it fixes the bug where
>> destination properties aren't ignored as they should be. This bug
>> could probably be fixed some other way, but it doesn't seem to be
>> possible with separating source- and destination accessors in some
>> way. This separation is where most of the additional new lines are
>> coming from.
>> In fact, as far as I can tell, the iterators make the code _less_
>> complex. Compare the new loop in pixman-general.c:
>>     for (i = 0; i < height; ++i)
>>     {
>>         uint32_t *s, *m, *d;
>>         m = mask_iter.get_scanline (&mask_iter, NULL);
>>         s = src_iter.get_scanline (&src_iter, m);
>>         d = dest_iter.get_scanline (&dest_iter, NULL);
>>         compose (imp->toplevel, op, d, s, m, width);
>>         dest_iter.write_back (&dest_iter);
>>     }
>> with the old one:
>>     for (i = 0; i < height; ++i)
>>     {
>>         /* fill first half of scanline with source */
>>         if (fetch_src)
>>         {
>>             if (fetch_mask)
>>             {
>>                 /* fetch mask before source so that fetching of
>>                    source can be optimized */
>>                 fetch_mask (mask, mask_x, mask_y + i,
>>                             width, (void *)mask_buffer, 0);
>>                 if (mask_class == SOURCE_IMAGE_CLASS_HORIZONTAL)
>>                     fetch_mask = NULL;
>>             }
>>             if (src_class == SOURCE_IMAGE_CLASS_HORIZONTAL)
>>             {
>>                 fetch_src (src, src_x, src_y + i,
>>                            width, (void *)src_buffer, 0);
>>                 fetch_src = NULL;
>>             }
>>             else
>>             {
>>                 fetch_src (src, src_x, src_y + i,
>>                            width, (void *)src_buffer, (void
>>             *)mask_buffer);
>>             }
>>         }
>>         else if (fetch_mask)
>>         {
>>             fetch_mask (mask, mask_x, mask_y + i,
>>                         width, (void *)mask_buffer, 0);
>>         }
>>         if (store)
>>         {
>>             /* fill dest into second half of scanline */
>>             if (fetch_dest)
>>             {
>>                 fetch_dest (dest, dest_x, dest_y + i,
>>                             width, (void *)dest_buffer, 0);
>>             }
>>             /* blend */
>>             compose (imp->toplevel, op,
>>                      (void *)dest_buffer,
>>                      (void *)src_buffer,
>>                      (void *)mask_buffer,
>>                      width);
>>             /* write back */
>>             store (&(dest->bits), dest_x, dest_y + i, width,
>>                    (void *)dest_buffer);
>>         }
>>         else
>>         {
>>             /* blend */
>>             compose (imp->toplevel, op,
>>                      bits + (dest_y + i) * stride + dest_x,
>>                      (void *)src_buffer, (void *)mask_buffer, width);
>>         }
>>     }
> The old code had more lines of code for this part, but on the other hand all
> this code was in a single place as opposed to being spread around. In some
> ways, the new code is a bit more limiting. For example, fetching directly to
> destination without intermediate buffers for SRC operator might be more
> tricky to implement.

I tried to implement this and it looks much easier than I had expected.
To support what you're suggesting I need to pass a buffer for each iteration
of the source, so I'm changing the interface a little:
 - get_scanline receives an iterator, a buffer pointer and a mask pointer
 - the iterator initializers return the flag mask they respected. They
are allowed
   to ignore some of the flags (on a per-flag basis, we can still specify that
   some other flags have to always be respected).
 - I add a new flag ITER_ON_DEST, If this flag is not set, get_scanline can use
   any destination buffer as long as it returns a pointer to it; if
this flag is set,
   it *must* write the data on it (and return a pointer to it as well)
 - The various src_iterators now return the normal flags unless they
are get_noop
   (or direct bits access, but we never do that) in which case we leave the
   memcpy'ing work to the compositing function by clearing the ON_DEST flag.
 - The caller of a get_scanline function should always pass iter->buffer as
   the buffer unless the ON_DEST flag is enabled. In this case it can pass any
   destination buffer.

I got this far:

The new flag can probably be used in a fast path for source compositing when
the mask is null (or opaque).

I also tried to implement the "parameter, then color" idea:

It is currently far from perfect (in particular the "-1" hardcoded to clear...),
but it seems to provide a small but consistent performance improvement.
I put some TODOs related both to performance and correctness in the
commit message. I forgot to mention that this would make it easy to
really compute 16 bits components (thus having true wide support,
instead of expanding the narrow result) by having two gradient walker
parameter->color functions.

> Overall, the patch set looks good. It's just not everything seems to be rosy
> to me, I can see (and accept) some of the compromises it introduces.

I've been playing around with the iterators4 branch and it looks very
promising to me, but I noticed two minor coding issues.

In 9f3e73a2f147265020466ee094abd868a77f52ed the new get_scanline
functions  {conical,radial,linear_gradient}_get_scanline_* are added.
I think that a more consistent naming would be better.
In the same commit, _pixman_solid_fill_iter_init () is added twice in


More information about the Pixman mailing list