[Pixman] [PATCH 2/2] sse2, mmx: Remove initial unaligned loops in fetchers
siarhei.siamashka at gmail.com
Wed Aug 28 19:13:16 PDT 2013
On Thu, 29 Aug 2013 01:27:08 +0200
sandmann at cs.au.dk (Søren Sandmann) wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> > On Wed, 28 Aug 2013 16:01:27 -0400
> > Søren Sandmann <sandmann at cs.au.dk> wrote:
> >> From: Søren Sandmann Pedersen <ssp at redhat.com>
> >> Now that the general implementation guarantees that the iter buffers
> >> are aligned to 16 bytes, there is no longer any reason for the initial
> >> loop to bring the destination buffer up to an aligned position.
> > What about the direct fetch to destination optimization?
> With this patch, there is a new rule that "iterators may assume that
> iter->buffer is aligned", but the old rule that "the buffer returned by
> the iterator can be any alignment that the iterator wants" is still in
If the iterator can't work with the 4 byte aligned buffers, then it has
to work with an intermediate temporary buffer. There is nothing wrong
with this, but the performance is not going to be great in some cases.
> Since the direct fetch to destination optimization works by iterators
> returning a pointer directly into the image bits and since the combiners
> are still not allowed to make assumptions about the alignment of the
> buffers they are given, this still works.
Right, the combiners are working just fine directly with the destination
> There has been some talk in the past  about have an even-more-direct
> optimization, where a pointer to the destination image would be stored
> in iter->buffer so that SRC operations could be done by just running the
> source iterator.
Oops, I somehow assumed that it has been implemented already. Now I
see that this is not the case. It's always nice to have some room for
> With this new alignment assumption, such an optimization becomes even more
Implementing this optimization does not seem to be too difficult in
principle. I tried to hack a bit and here is the result:
Basically, we want to be able to update the pointer to the iterator
output buffer at any time in the middle of loop instead of setting it
in stone at the time of iterator creation.
Updating this temporary buffer pointer could be added as a new iterator
method call (extending the existing get_scanline/write_back/fini). But
then supporting "only compatible with the aligned buffer" iterators
would be a little bit more tricky. Either way, the required code
modifications seem to be rather small.
> but in any it was always going to require a rather complex internal API
> to work because iterators are not required to use iter->buffer at all.
Well, iter->buffer (or iter->direct_dest_buffer_ptr) is more like a
hint. If the iterator uses it, that's great. If it does not, then we
still can compare the returned pointer with the pointer to the current
destination image scanline. And fallback to using a combiner if the
pointers don't match.
The first patch in your series (aligning the temporary buffer) is fine.
It's just that the second patch is dropping the handling of unaligned
pointers in the fetchers, and this code still could have some use:
More information about the Pixman