[cairo] Pushing vmx patches in pixman

Tue Apr 22 16:26:42 PDT 2008

Luca Barbato <lu_zero at gentoo.org> writes:

> > In addition to mailing the patches, it's also convenient if you push
> > your git branch to a public place. And this is really required before
> > you push directly to cairo or pixman repositories so we can ensure
> > your commits look exactly the way they should.
> >
> 
> I published the branch as you suggested and here is the current diff
> from master.

I don't have a PowerPC, so I can't test this myself. The minimum
amount of testing that needs to be done is:

        (a) An X server linked to pixman works and displays a GNOME
        desktop correctly, specifically without artefacts in the icon
        rendering.

        (b) Rendercheck run against the X server generally
        passes. There are a number of options to this program. The
        most important formats to test are 16 and 24 bit RGB formats,
        and 8 bit alpha formats. The most important operations are
        BLEND, OVER, ADD, IN, CLEAR.

        (c) The cairo test suite passes when run like this: 

                env CAIRO_TEST_TARGET=image make test

It would also be good to see cairo-perf results to verify that this is
actually a speed-up.

Below are some comments on the code, but I don't know the VMX
instruction set, so there is a limit to how much I can comment.

One general thing I noticed is that the code uses 'inline' without any
qualifiers. In my experience this causes gcc to not always inline the
functions. 

In the MMX code we have this:

#ifdef __GNUC__
#    ifdef __ICC
#        define MC(x)  M64(c.mmx_##x)
#    else
#        define MC(x) ((__m64)c.mmx_##x)
#    endif
#    define inline __inline__ __attribute__ ((__always_inline__))
#endif

Maybe the Altivec code needs it too? If you did this, maybe the
LOAD_VECTORS() etc. macros could become functions so you didn't have
to declare local variables everywhere you use them?

This:

> +#ifdef USE_VMX
> +
> +       if (pixman_have_vmx())
> +           info = get_fast_path (vmx_fast_paths, op, pSrc, pMask,
> pDst, pixbuf);
> +       if (!info)
> +#endif

annoys me because the "if (!info)"'s magically apply to a following if
statement.

Can we just do this instead:

        #ifdef USE_VMX
        if (!info && pixman_have_vmx())
                info = get_fast_path (...);
        #endif

for all the architecture specific clauses? (I know this was not
introduced by you, but I think we should get this fixed before it
spreads further).

> +static sigjmp_buf jmp;
> +static volatile sig_atomic_t in_test = 0;
> +
> +static void vmx_test (int sig) {
> +    if (! in_test) {
> +        signal (sig, SIG_DFL);
> +        raise (sig);
> +    }
> +    in_test = 0;
> +    siglongjmp (jmp, 1);
> +}
> +
> +pixman_bool_t pixman_have_vmx (void) {
> +    signal (SIGILL, vmx_test);
> +    if (sigsetjmp (jmp, 1)) {
> +        signal (SIGILL, SIG_DFL);
> +    } else {
> +        in_test = 1;
> +        asm volatile ( "vor 0, 0, 0" );
> +        signal (SIGILL, SIG_DFL);
> +        return 1;
> +    }
> +    return 0;
> +}

This strikes me a overly complicated. Is there any reason we can't
just have a global variable that gets set in the signal handler, so we
don't have to think about longjumps?  Also, I really don't think
raising a signal on every compositing operation is a good idea.

Can we do something simple like having a couple of global variables:

        pixman_bool_t initialized;
        pixman_bool_t have_vmx = TRUE;

then do

        if (!initialized) {
                signal (SIGILL, vmx_sigill_handler);
                asm volatile ("vor 0, 0, 0");
                signal (SIGILL, SIG_DFL);

                initialized = TRUE;
        }

        return have_vmx;

and have the vmx_sigill_handller() just set the have_vmx variable to
FALSE?

I also think the same thing should be done in the __APPLE__ branch
since a system call per compositing operation is likely going to be
pretty slow.

> +    VMX = 0x4,

I don't think you use this VMX constant anywhere, and the CPUFeatures
enum is more intended for x86 CPU's anyway. In any case it certainly
shouldn't alias with SSE ...

> #define LOAD_VECTOR (source) \
>         tmp1 = (typeof(v ## source))vec_ld(0, source); \
>         tmp2 = (typeof(v ## source))vec_ld(15, source); \
>         v ## source = (typeof(v ## source)) \
>                        vec_perm(tmp1, tmp2, source ## _mask);

This macro is not used at all, and it wouldn't work if it were because
of the space between LOAD_VECTOR and (source).

Finally, you have cutted and pasted all the fbMulAdd macros. I can see
why you did that, but it would be better if we can make the combine.pl
generate them in their own headerfiles (pixman-arithmetic{32,64}.h,
maybe) so we don't have to have this code duplicated.

Thanks,
Soren