[Pixman] [PATCH 1/1] pixman: Fixing endianness problems for powerpc archs

Siarhei Siamashka siarhei.siamashka at gmail.com
Tue May 26 20:05:58 PDT 2015


On Mon, 25 May 2015 21:43:39 -0400
Fernando Seiti Furusato <ferseiti at linux.vnet.ibm.com> wrote:

> I have made some changes to the file pixman-vmx.c, which uses vmx (aka altivec)
> to optimize pixman. Basically, what I did:
>  Changed the usage of vec_perm, vec_mergeh and vec_mergel. They were giving
> weird results when running on little endian. That was because the integer
> vectors were being cast to char, which caused them to be ordered and permuted
> byte by byte.
>  Replaced the usage of vec_lvsl with a direct unaligned assignment operation
> (=). That is because, according to the Power ABI Specification, the usage of
> lvsl is deprecated on ppc64le.
> After those changes, all tests passed on both ppc64le and powerpc VMs.
> 
> Signed-off-by: Fernando Seiti Furusato <ferseiti at linux.vnet.ibm.com>

Thanks for investigating the problem and proposing a patch. It looks
like the needed changes are not very invasive, which is a good thing.

The commit summary does not need the "pixman: " prefix. It would make
sense to use the "vmx: " prefix instead. Also explicitly stating in
the summary line that the problems are related to little endian powerpc 
would make it more descriptive.

Regarding the code itself, please check some comments below. If they
are addressed, I think that the fix will be good enough to be applied.

> ---
>  pixman/pixman-vmx.c | 61 +++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 45 insertions(+), 16 deletions(-)
> 
> diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c
> index c33631c..109104f 100644
> --- a/pixman/pixman-vmx.c
> +++ b/pixman/pixman-vmx.c
> @@ -34,13 +34,25 @@
>  
>  #define AVV(x...) {x}
>  
> +#ifndef __LITTLE_ENDIAN__

For the sake of consistency with the rest of the pixman code, can
we use "#ifdef WORDS_BIGENDIAN"?

> +#define MERGEH(a,b) vec_mergeh(a,b)
> +#define MERGEL(a,b) vec_mergel(a,b)
> +#define SPLAT_VEC AVV(						\
> +		0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,	\
> +		0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C)
> +#else
> +#define MERGEH(a,b) vec_mergeh(b,a)
> +#define MERGEL(a,b) vec_mergel(b,a)

I'm not sure if overloading the 'mergeh'/'mergel' primitives
with a different meaning is a good idea. It looks a bit like
a "#define TRUE FALSE" approach, which may do the job but is
still confusing for people trying to read the code (yes, I'm
exaggerating here a bit).

> +#define SPLAT_VEC AVV(             				\
> +		0x03, 0x03, 0x03, 0x03, 0x07, 0x07, 0x07, 0x07,	\
> +		0x0B, 0x0B, 0x0B, 0x0B, 0x0F, 0x0F, 0x0F, 0x0F)
> +#endif
> +
>  static force_inline vector unsigned int
>  splat_alpha (vector unsigned int pix)
>  {
>      return vec_perm (pix, pix,
> -		     (vector unsigned char)AVV (
> -			 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x04, 0x04,
> -			 0x08, 0x08, 0x08, 0x08, 0x0C, 0x0C, 0x0C, 0x0C));
> +		     (vector unsigned char)SPLAT_VEC);
>  }
>  
>  static force_inline vector unsigned int
> @@ -50,11 +62,11 @@ pix_multiply (vector unsigned int p, vector unsigned int a)
>  
>      /* unpack to short */
>      hi = (vector unsigned short)
> -	vec_mergeh ((vector unsigned char)AVV (0),
> +	MERGEH ((vector unsigned char)AVV (0),
>  		    (vector unsigned char)p);

This part of the code takes 8-bit elements from the high half of a
vector register, zero-extends them to 16 bits and stores the result
in another vector register.

Can we introduce a new static force_inline function specifically for
this primitive operation and take care of the endian dependent stuff
there without introducing the MERGEH macro?
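
For illustration, a helper along these lines is what I have in mind
(just a rough sketch; the 'unpackh' name is mine, and the swapped
operands in the little endian branch simply mirror what your
MERGEH(a,b) -> vec_mergeh(b,a) macro already does):

static force_inline vector unsigned short
unpackh (vector unsigned int p)
{
#ifdef WORDS_BIGENDIAN
    /* interleave zeroes with the 8 high bytes of p, i.e. zero-extend
     * them to 16-bit elements */
    return (vector unsigned short)
	vec_mergeh ((vector unsigned char)AVV (0),
		    (vector unsigned char)p);
#else
    /* the same operation with the operands swapped for little endian */
    return (vector unsigned short)
	vec_mergeh ((vector unsigned char)p,
		    (vector unsigned char)AVV (0));
#endif
}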

>      mod = (vector unsigned short)
> -	vec_mergeh ((vector unsigned char)AVV (0),
> +	MERGEH ((vector unsigned char)AVV (0),
>  		    (vector unsigned char)a);
>
>
>      hi = vec_mladd (hi, mod, (vector unsigned short)
> @@ -67,10 +79,10 @@ pix_multiply (vector unsigned int p, vector unsigned int a)
>  
>      /* unpack to short */
>      lo = (vector unsigned short)
> -	vec_mergel ((vector unsigned char)AVV (0),
> +	MERGEL ((vector unsigned char)AVV (0),
>  		    (vector unsigned char)p);
>      mod = (vector unsigned short)
> -	vec_mergel ((vector unsigned char)AVV (0),
> +	MERGEL ((vector unsigned char)AVV (0),
>  		    (vector unsigned char)a);

And do the same for MERGEL.
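
A matching helper for the low half would then wrap vec_mergel in the
same way (again just a sketch, the 'unpackl' name is only an example):

static force_inline vector unsigned short
unpackl (vector unsigned int p)
{
#ifdef WORDS_BIGENDIAN
    return (vector unsigned short)
	vec_mergel ((vector unsigned char)AVV (0),
		    (vector unsigned char)p);
#else
    return (vector unsigned short)
	vec_mergel ((vector unsigned char)p,
		    (vector unsigned char)AVV (0));
#endif
}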

>      lo = vec_mladd (lo, mod, (vector unsigned short)
> @@ -125,25 +137,31 @@ over (vector unsigned int src,
>  }
>  
>  /* in == pix_multiply */
> +#ifndef __LITTLE_ENDIAN__
> +#define MASK_SHIFT(pointer) vec_lvsl(0,pointer)
> +#else
> +#define MASK_SHIFT(pointer) *((typeof(pointer ## _mask)*)pointer)

The big endian MASK_SHIFT variant calculates the permutation
pattern for doing unaligned reads, and the result depends on the
alignment of the 'pointer' argument relative to 16-byte boundaries
in memory.

But the little endian MASK_SHIFT variant just performs a memory read
instead, which does not make much sense.

From what I can see, little endian altivec can just do unaligned
reads natively and has no need for constructing the permutation
pattern variables at all. The COMPUTE_SHIFT_MASK macros should
just be no-ops there because the little endian LOAD_VECTORS macros
do not use the 'vec_perm' intrinsic.

Could you please remove these redundant bits from the little endian
code path and also add a comment explaining the differences in
unaligned vector read handling between the big endian and little
endian powerpc variants?
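
Something along these lines is what I have in mind (only a sketch,
the comment wording is just an example):

#ifdef WORDS_BIGENDIAN
/*
 * Big endian: vec_ld can only load from 16-byte aligned addresses,
 * so unaligned data has to be fetched as two aligned vectors and
 * combined with vec_perm, using a permutation mask computed by
 * vec_lvsl from the pointer value.
 */
#define COMPUTE_SHIFT_MASK(source)					\
    source ## _mask = vec_lvsl (0, source);
#else
/*
 * Little endian: unaligned vector reads can be done natively, so the
 * LOAD_VECTORS* macros dereference the pointers directly and no
 * permutation mask is needed.
 */
#define COMPUTE_SHIFT_MASK(source)
#endif

And likewise for COMPUTE_SHIFT_MASKS and COMPUTE_SHIFT_MASKC.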

> +#endif
> +
>  #define in_over(src, srca, mask, dest)					\
>      over (pix_multiply (src, mask),					\
>            pix_multiply (srca, mask), dest)
>  
> -
>  #define COMPUTE_SHIFT_MASK(source)					\
> -    source ## _mask = vec_lvsl (0, source);
> +    source ## _mask = MASK_SHIFT(source);
>  
>  #define COMPUTE_SHIFT_MASKS(dest, source)				\
> -    source ## _mask = vec_lvsl (0, source);
> +    source ## _mask = MASK_SHIFT(source);
>  
>  #define COMPUTE_SHIFT_MASKC(dest, source, mask)				\
> -    mask ## _mask = vec_lvsl (0, mask);					\
> -    source ## _mask = vec_lvsl (0, source);
> +    mask ## _mask = MASK_SHIFT(mask);					\
> +    source ## _mask = MASK_SHIFT(source);
>  
>  /* notice you have to declare temp vars...
>   * Note: tmp3 and tmp4 must remain untouched!
>   */
>  
> +#ifndef __LITTLE_ENDIAN__
>  #define LOAD_VECTORS(dest, source)			  \
>      tmp1 = (typeof(tmp1))vec_ld (0, source);		  \
>      tmp2 = (typeof(tmp2))vec_ld (15, source);		  \
> @@ -162,11 +180,22 @@ over (vector unsigned int src,
>      v ## mask = (typeof(v ## mask))			  \
>  	vec_perm (tmp1, tmp2, mask ## _mask);
>  
> -#define LOAD_VECTORSM(dest, source, mask)				\
> -    LOAD_VECTORSC (dest, source, mask)					\
> -    v ## source = pix_multiply (v ## source,				\
> -                                splat_alpha (v ## mask));
> +#else
> +#define LOAD_VECTORS(dest, source)                        \
> +    v ## source = *((typeof(v ## source)*)source);        \
> +    v ## dest = *((typeof(v ## dest)*)dest);
> +
> +#define LOAD_VECTORSC(dest, source, mask)                 \
> +    v ## source = *((typeof(v ## source)*)source);        \
> +    v ## dest = *((typeof(v ## dest)*)dest);              \
> +    v ## mask = *((typeof(v ## mask)*)mask);
>  
> +#endif
> +
> +#define LOAD_VECTORSM(dest, source, mask)                               \
> +    LOAD_VECTORSC (dest, source, mask)                                  \
> +    v ## source = pix_multiply (v ## source,                            \
> +                                splat_alpha (v ## mask));
>  #define STORE_VECTOR(dest)						\
>      vec_st ((vector unsigned int) v ## dest, 0, dest);
>  

-- 
Best regards,
Siarhei Siamashka

