[Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path

Taekyun Kim podain77 at gmail.com
Wed Apr 13 02:11:59 PDT 2011


For a given pixel block, we need to perform the following steps
(for the over_n_8888_0565 case):

1. fetch dest
2. fetch mask
3. combine_mask_ca
4. convert dest to x888
5. combine_over_ca part A
6. combine_over_ca part B
7. convert result to 0565
8. store result
(put cache preload somewhere)
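
For reference, the eight steps can be written out for a single pixel in
scalar C. This is only a sketch under stated assumptions: the function names
are made up for illustration, the arithmetic is the usual component-alpha
OVER (result = src*mask + dest*(255 - mask*src_alpha), with rounded division
by 255), and the 0565 expansion uses bit replication, which may differ from
the NEON code in the low bits.

```c
#include <stdint.h>

/* Rounded (a * b) / 255 for 8-bit channels. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Saturating 8-bit add. */
static uint8_t add_sat_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a + b;
    return (uint8_t)(t > 255 ? 255 : t);
}

/* Steps 3, 5 and 6 for one channel.
 * s = source channel, sa = source alpha, m = mask channel, d = dest channel. */
static uint8_t over_ca_channel(uint8_t s, uint8_t sa, uint8_t m, uint8_t d)
{
    uint8_t s_in_m = mul_un8(s, m);                    /* step 3: src * mask */
    uint8_t m_a = mul_un8(m, sa);                      /* step 3: mask * src alpha */
    uint8_t d_out = mul_un8(d, (uint8_t)(255 - m_a));  /* steps 5 and 6 */
    return add_sat_un8(s_in_m, d_out);                 /* step 6: final add */
}

/* Hypothetical per-pixel reference; src and mask are a8r8g8b8, dst is r5g6b5. */
uint16_t over_n_8888_0565_ca_pixel(uint32_t src, uint32_t mask, uint16_t dst)
{
    uint8_t sa = (uint8_t)(src >> 24), sr = (uint8_t)(src >> 16);
    uint8_t sg = (uint8_t)(src >> 8),  sb = (uint8_t)src;
    uint8_t mr = (uint8_t)(mask >> 16), mg = (uint8_t)(mask >> 8);
    uint8_t mb = (uint8_t)mask;        /* steps 1 and 2 are the arguments */

    /* step 4: convert dest to x888 (bit replication, an assumption) */
    unsigned r5 = (dst >> 11) & 0x1f, g6 = (dst >> 5) & 0x3f, b5 = dst & 0x1f;
    uint8_t dr = (uint8_t)((r5 << 3) | (r5 >> 2));
    uint8_t dg = (uint8_t)((g6 << 2) | (g6 >> 4));
    uint8_t db = (uint8_t)((b5 << 3) | (b5 >> 2));

    uint8_t r = over_ca_channel(sr, sa, mr, dr);
    uint8_t g = over_ca_channel(sg, sa, mg, dg);
    uint8_t b = over_ca_channel(sb, sa, mb, db);

    /* step 7: convert result to 0565; step 8 is the return value */
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

With a zero mask the destination comes back unchanged, and an opaque source
under a full mask replaces it, which is a quick sanity check for any
reordering of the steps.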

Your version corresponds to
head = (3, 4, 5)
tail = (6, 7)
tail_head = (6, 7, 3, 4, 5, 8), with 1 and 2 placed in the middle of block 6
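
To make the head/tail/tail_head split concrete, the sketch below writes out
the order in which the steps run across consecutive pixel blocks. It assumes
the usual software-pipelined loop shape: initial fetches plus head as the
prologue, one tail_head per loop iteration, and tail plus the final store as
the epilogue. It also assumes that inside tail_head, steps 6, 7 and 8 belong
to the current block while steps 1 to 5 belong to the next one. The function
and helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Append "step@block " to buf (hypothetical helper). */
static void emit(char *buf, size_t cap, int step, int block)
{
    size_t len = strlen(buf);
    if (len < cap)
        snprintf(buf + len, cap - len, "%d@%d ", step, block);
}

/* Write the step order for nblocks pixel blocks into buf. */
void pipeline_trace(int nblocks, char *buf, size_t cap)
{
    int k, s;

    if (cap == 0)
        return;
    buf[0] = '\0';
    if (nblocks <= 0)
        return;

    /* Prologue: initial fetches, then head = (3, 4, 5) for block 0. */
    for (s = 1; s <= 5; s++)
        emit(buf, cap, s, 0);

    /* Steady state: one tail_head per iteration. */
    for (k = 0; k + 1 < nblocks; k++) {
        emit(buf, cap, 6, k);          /* tail part of block k */
        emit(buf, cap, 1, k + 1);      /* fetches for the next block, */
        emit(buf, cap, 2, k + 1);      /* interleaved into step 6     */
        emit(buf, cap, 7, k);
        for (s = 3; s <= 5; s++)       /* head part of block k + 1 */
            emit(buf, cap, s, k + 1);
        emit(buf, cap, 8, k);          /* store block k */
    }

    /* Epilogue: tail = (6, 7) for the last block, then its store. */
    for (s = 6; s <= 8; s++)
        emit(buf, cap, s, nblocks - 1);
}
```

For two blocks this gives the order 1@0 2@0 3@0 4@0 5@0, then 6@0 1@1 2@1
7@0 3@1 4@1 5@1 8@0, then 6@1 7@1 8@1.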

We can work out the input, output and temporary registers of each block,
and from those identify the dependency chain and the critical path.
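
As a rough illustration of that analysis at step granularity, the sketch
below computes the critical path through the eight steps. The dependency
table is an assumption based on a plain reading of the steps (4 reads 1,
3 reads 2, 5 reads 3 and 4, then 6, 7 and 8 in sequence), and the per-step
costs passed in are hypothetical numbers, not measured cycle counts.

```c
#define NSTEPS 8

/* deps[i][j] != 0 means step i+1 reads the output of step j+1 (assumed). */
static const unsigned char deps[NSTEPS][NSTEPS] = {
    {0},                    /* 1. fetch dest */
    {0},                    /* 2. fetch mask */
    {0, 1},                 /* 3. combine_mask_ca reads 2 */
    {1},                    /* 4. convert dest to x888 reads 1 */
    {0, 0, 1, 1},           /* 5. combine_over_ca part A reads 3 and 4 */
    {0, 0, 0, 0, 1},        /* 6. combine_over_ca part B reads 5 */
    {0, 0, 0, 0, 0, 1},     /* 7. convert result to 0565 reads 6 */
    {0, 0, 0, 0, 0, 0, 1},  /* 8. store result reads 7 */
};

/* Length of the longest dependency chain, given a cost for each step.
 * Steps are already numbered in a valid topological order. */
int critical_path(const int cost[NSTEPS])
{
    int finish[NSTEPS];
    int i, j, longest = 0;

    for (i = 0; i < NSTEPS; i++) {
        int start = 0;
        for (j = 0; j < i; j++)
            if (deps[i][j] && finish[j] > start)
                start = finish[j];
        finish[i] = start + cost[i];
        if (finish[i] > longest)
            longest = finish[i];
    }
    return longest;
}
```

With unit costs the chain has length 6 (for example 2, 3, 5, 6, 7, 8), so
steps 1 and 4 have slack whenever step 3 is the expensive one.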

Let's look at the core tail_head block:

.macro  n_8888_0565_ca_tail_head
    /* 6. combine_over_ca part B */
        vrshr.u16   q10, q6, #8
        vrshr.u16   q14, q7, #8
            /* 1. fetch dest */
        vrshr.u16   q15, q11, #8
        vraddhn.u16 d16, q10, q6
        vraddhn.u16 d17, q14, q7
        vraddhn.u16 d18, q15, q11
            /* 2. fetch mask */
        /* bubble if above block 2 does not exist */
        vqadd.u8    q8,  q0, q8
        /* bubble if above block 2 does not exist */
        vqadd.u8    d18, d2, d18
        /* bubble with following block 7 */

    /* 7. convert result to 0565 */
        vshll.u8    q14, d18, #8
        vshll.u8    q10, d17, #8
        vshll.u8    q15, d16, #8
        vsri.u16    q14, q10, #5
        /* bubble */
        vsri.u16    q14, q15, #11

    cache_preload 8, 8
    /* 3. combine_mask_ca */
    /* 4. convert dest to x888 */
    /* 5. combine_over_ca part A */
    /* 8. store destination */
.endm
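
To show what the instructions in steps 6 and 7 compute, here is a scalar,
one-lane C model of them. The helper names mirror the NEON mnemonics but are
ordinary C functions written for this sketch, not intrinsics. The reading of
the second vqadd operand (q0 or d2) as the source-times-mask term, and of
d18/d17/d16 as the red/green/blue results, is an assumption drawn from how
the values are packed in step 7.

```c
#include <stdint.h>

/* vrshr.u16 #8: rounding shift right by 8. */
static uint16_t vrshr_u16_8(uint16_t x)
{
    return (uint16_t)(((uint32_t)x + 128) >> 8);
}

/* vraddhn.u16: rounding add, keep bits 15:8 of the 16-bit sum. */
static uint8_t vraddhn_u16(uint16_t a, uint16_t b)
{
    return (uint8_t)(((uint32_t)a + b + 128) >> 8);
}

/* vqadd.u8: saturating add. */
static uint8_t vqadd_u8(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a + b;
    return (uint8_t)(t > 255 ? 255 : t);
}

/* vshll.u8 #8: widen and shift left by 8. */
static uint16_t vshll_u8_8(uint8_t x)
{
    return (uint16_t)((unsigned)x << 8);
}

/* vsri.u16 #n: keep the top n bits of d, insert s >> n below them. */
static uint16_t vsri_u16(uint16_t d, uint16_t s, unsigned n)
{
    uint16_t inserted = (uint16_t)(0xFFFFu >> n);
    return (uint16_t)((d & (uint16_t)~inserted) | (uint16_t)(s >> n));
}

/* vrshr + vraddhn pair: rounded division of a 16-bit product by 255. */
uint8_t div255_neon(uint16_t prod)
{
    return vraddhn_u16(vrshr_u16_8(prod), prod);
}

/* Step 6 for one lane: divide the part A product, then saturating add. */
uint8_t over_part_b_lane(uint16_t prod, uint8_t src_in_mask)
{
    return vqadd_u8(src_in_mask, div255_neon(prod));
}

/* Step 7 for one pixel: three vshll and two vsri pack r, g, b into 0565. */
uint16_t pack_0565_neon(uint8_t r, uint8_t g, uint8_t b)
{
    uint16_t q14 = vshll_u8_8(r);
    uint16_t q10 = vshll_u8_8(g);
    uint16_t q15 = vshll_u8_8(b);

    q14 = vsri_u16(q14, q10, 5);   /* green goes below the top 5 red bits */
    q14 = vsri_u16(q14, q15, 11);  /* blue goes into the low 5 bits */
    return q14;
}
```

The second vsri reads the q14 written by the first one, which is the
dependency behind the bubble marked in step 7.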

I have marked the bubbles that I could find.
With proper register allocation, step 3 can be made independent of (or at
least less dependent on) steps 6 and 7 above, so some instructions from
step 3 can be moved into those bubble positions.
The output of step 1 (fetch dest) is read in step 4, and the output of
step 2 (fetch mask) is read in step 3.
So I think you can fetch the mask first and then the dest at the beginning
of the tail_head block, and fill the remaining bubbles with instructions
from step 3.
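
The effect of filling a bubble can be illustrated with a toy model of a
single-issue, in-order pipeline. This is a deliberately simplified sketch,
not a model of any particular ARM core: the struct, the function name and
the latencies used with it are hypothetical, and the NEON pipeline is more
complicated in practice.

```c
#define NREGS 16   /* q0..q15, indexed by register number */

/* One instruction: destination and source registers (-1 = unused) and the
 * number of cycles until the result can be read (hypothetical latency). */
struct insn {
    int dst;
    int src0;
    int src1;
    int latency;
};

/* Count the cycles lost waiting for source registers to become ready. */
int count_stalls(const struct insn *prog, int n)
{
    int ready[NREGS] = {0};   /* cycle at which each register is available */
    int cycle = 0, stalls = 0, i;

    for (i = 0; i < n; i++) {
        int issue = cycle;

        if (prog[i].src0 >= 0 && ready[prog[i].src0] > issue)
            issue = ready[prog[i].src0];
        if (prog[i].src1 >= 0 && ready[prog[i].src1] > issue)
            issue = ready[prog[i].src1];

        stalls += issue - cycle;          /* bubble cycles before issue */
        if (prog[i].dst >= 0)
            ready[prog[i].dst] = issue + prog[i].latency;
        cycle = issue + 1;                /* one instruction per cycle */
    }
    return stalls;
}
```

As a usage example, take the two vsri instructions at the end of step 7,
written as {14, 14, 10, 2} and {14, 14, 15, 2} with an assumed latency of 2.
Back to back they cost one stall cycle in this model. Placing an unrelated
instruction between them, such as {6, 4, 0, 2} standing in for some
instruction from step 3 that does not touch q14, brings the count to zero.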

This may not work, and there may well be better ways to achieve optimal
performance.

-- 
Best Regards,
Taekyun Kim