[Pixman] [PATCH 1/1] vmx: workarounds to fix powerpc little endian particularities

Fri May 29 06:04:38 PDT 2015

On 05/29/2015 08:45 AM, Siarhei Siamashka wrote:
> Sorry, I did not explain it properly earlier. I did not mean
> to replicate the whole 'pix_multiply' function in the little
> endian code path.
No worries. What I did felt odd indeed. Let us see :)

>
> The point is that we need to take care of some differences
> between big and little endian powerpc variants. And we want
> to isolate these differences in the smallest possible unit,
> while still keeping the code readable and maintainable.
>
> In your first patch, you treated the vec_mergel/vec_mergeh
> intrinsics as such smallest units and replaced them with the
> MERGEL/MERGEH macros. It works correctly in this context, but
> my complaint was that the code is not very clean this way. If
> the vec_mergel/vec_mergeh intrinsics were supposed to be endian
> agnostic and behave like the MERGEL/MERGEH macros, then I
> guess that the vec_mergel/vec_mergeh intrinsics would have
> been implemented this way in the first place. Redefining the
> meaning of the standard intrinsics looks ugly. Basically, the
> individual intrinsics are too small for using them for isolating
> big/little endian differences in a non-controversial way.
Got it. Yes, the problem is not actually what the intrinsics themselves do, but
how the vector settles when looking at it byte wise.
>
> Now in this new patch the whole 'pix_multiply' function is
> replicated. But it is too large. And duplicating a lot of
> code is not nice either.
>
> What I actually wanted to see was a new static force_inline function.
> This function can isolate the big/little endian differences, be
> reasonably small and do something meaningful by itself. For example,
> the SSE2 code has the 'unpack_128_2x128' function for this:
>
>      http://cgit.freedesktop.org/pixman/tree/pixman/pixman-sse2.c?id=pixman-0.32.6#n68
>
>      static force_inline void
>      unpack_128_2x128 (__m128i data, __m128i* data_lo, __m128i* data_hi)
>      {
>          *data_lo = _mm_unpacklo_epi8 (data, _mm_setzero_si128 ());
>          *data_hi = _mm_unpackhi_epi8 (data, _mm_setzero_si128 ());
>      }
>
> This function takes a single 128-bit vector with 8-bit elements as
> the input argument and produces two 128-bit vectors (low and high part)
> with 16-bit elements as the output. The code is simple and it is easy
> to see what it does.
>
> It would be great to have something similar in the powerpc code too.
> The 'force_inline' attribute ensures that the compiler inlines this
> function, so there is no call overhead. You can (and are also
> encouraged to) verify that the performance is not regressed by
> using the 'lowlevel-blt-bench' program in the 'test' directory.
Right, I will try to elaborate a solution to that point here.

Thanks for the review!
-- 

Fernando Seiti Furusato
IBM Linux Technology Center