[Pixman] [PATCH 06/12] vmx: implement fast path vmx_composite_over_n_8888_8888_ca

Wed Jul 15 06:11:37 PDT 2015

On Wed, Jul 15, 2015 at 4:06 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> On Wed, 15 Jul 2015 15:41:19 +0300
> Oded Gabbay <oded.gabbay at gmail.com> wrote:
>
>> >> > Or if we don't care about that, why?
>> >> I think that the speedups in this specific patch are more substantial
>> >> than the slowdowns. If it was the other way around, then I would have
>> >> removed this patch, like I did with another patch, which Siarhei
>> >> rejected because of it.
>> >
>> > But in theory, you should not get any slowdowns, right? Or did you
>> > actually expect that some things will slow down?
>> I'm not so sure I won't get any slowdowns. I guess it depends on the
>> size of the image and the amount of alignment that needs to be done
>> for that image.
>>
>> e.g. if we have many small images, and for each image we need to do
>> unaligned operations first to make sure we are 16-byte aligned, then
>> the unaligned operations may take more cycles than the cycles that are
>> saved from doing the vmx operations. There could be extreme cases,
>> where there is one vmx operation on aligned data and all the rest of
>> the operations are unaligned. Now, in the C implementation, you don't
>> care about alignment, so you always work in 4-byte quantities.
>>
>> Does that make sense?
>
> Yeah, this kind of speculation is what I'm after. A plausible and
> acceptable reason why some things may slow down.
>
> I suppose I'm too new here to understand that without pointing it out.
>
> However, I think lowlevel-blt-bench's tests should be hitting some of
> those ugly tiny image cases, yet even the worst case there has +22%
> performance. Of course, it's possible that Cairo benchmarks happen to
> hit the ugly tiny images consistently badly, while llbb aims to cover
> all kinds of alignment equally.

Because llbb is a synthetic test, I would assume it hits far fewer
alignment issues than a "real-world" scenario such as the cairo
benchmarks, which are basically recorded traces of real application
activity.
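
To make the alignment cost a bit more concrete, here is a rough sketch
in plain C of how a scanline gets split into an unaligned head, an
aligned vector body and a leftover tail. This is only an illustration:
split_scanline() and scanline_split_t are hypothetical names, not
pixman code, and I am assuming 32-bit pixels, a 4-byte aligned
destination and 4 pixels per 16-byte vector operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only, not part of pixman. */
typedef struct
{
    size_t head; /* pixels done one at a time until dst is 16-byte aligned */
    size_t body; /* pixels done four at a time by the vector loop */
    size_t tail; /* leftover pixels done one at a time */
} scanline_split_t;

static scanline_split_t
split_scanline (uintptr_t dst_addr, size_t width)
{
    scanline_split_t s;
    size_t misalign;

    /* Assumption: a8r8g8b8 pixels, so dst is at least 4-byte aligned. */
    assert ((dst_addr & 3) == 0);

    /* How many pixels past the previous 16-byte boundary we start. */
    misalign = (dst_addr & 15) / 4;

    s.head = misalign ? 4 - misalign : 0;
    if (s.head > width)
        s.head = width;

    /* Round the remaining width down to a multiple of 4 pixels. */
    s.body = (width - s.head) & ~(size_t) 3;
    s.tail = width - s.head - s.body;

    return s;
}
```

With these assumptions, an 8-pixel scanline starting at address 0x1004
splits into head=3, body=4, tail=1, i.e. a single vector operation and
four scalar pixels, which is the kind of extreme case I described. The
same scanline starting at 0x1000 is all body.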

>
> It's all up to judgement, but I think one does need to at least ask the
> question "This is slightly odd... is something actually wrong?"
>
I agree, and that's why I'll add something like the above explanation
to the commit message.

Thanks,

   Oded

> Ben and I definitely did find something very strange about the
> Raspberry Pi 1 CPU.
>
>
> Thanks,
> pq