[Pixman] An iwmmxt optimised shadowUpdateRotate16_270YX, with problems.
dromede at gmail.com
Fri Aug 26 11:54:11 PDT 2011
On Fri, Aug 26, 2011 at 2:24 PM, Siarhei Siamashka <
siarhei.siamashka at gmail.com> wrote:
> On Fri, Aug 26, 2011 at 2:21 PM, Marko Katić <dromede at gmail.com> wrote:
> > Hi there!
> > I'm trying to optimise shadow copies from landscape oriented 16bit shadowfb
> > to portrait 16bit fb proper. This is done in shadowUpdateRotate16_270YX
> > which is located in miext/shadow/shrotpackYX.h. My optimisation is aimed at
> > pxa27x and xscale3 arm processors only, since it uses iwmmxt asm code. I
> > guess it could easily be ported to x86 mmx code too.
> > As I understand, the current implementation copies a single pixel at a
> > time, like this:
> > *win = *sha++;
> > win += WINSTEPX(winStride);
> > This also means that we're stepping over entire cachelines since every pixel
> > of a single shadowfb line has to be copied to a new line of fb proper.
> > My patch tries to copy 4x4 pixel blocks prerotated to portrait orientation.
> > Basically, it takes 4 lines of shadowfb and divides it into 4x4 blocks.
> > Then it rotates them and copies them to fb proper. This way, instead of
> > copying a single pixel per fb proper line, it copies four. The rotation is
> > done in iwmmxt asm and takes about 0.9 instructions per pixel (assuming the
> > 4x4 block is already in iwmmxt registers). 4x4 blocks imply that the
> > rectangle to be copied is width and height aligned to 4 pixels. If not, the
> > patch reduces the rectangle to proper alignment with single pixel copies
> > for width and height.
> > It doesn't really work and I can't find a reason why. The initial Xfbdev
> > screen is looking fine, but when I start moving the pointer or windows,
> > all I get is garbage.
> > The patch was tested on kdrive 126.96.36.199 running in qemu and on a Zaurus
> > C-1000.
> > If anyone has any suggestions, please do tell.
> To get the best performance, it is important to take cache and TLB
> misses into account. The 4x4 pixel blocks may be a bit too small.
> I think it may be interesting to integrate iwmmxt optimizations into
> pixman and then use pixman for doing these rotations in xserver. Right
> now there is a more or less cache friendly C implementation for rotation
> in pixman:
> Rotation is at least partially covered in the pixman test suite ('make
> check'), so detecting and fixing the most obvious bugs could be a bit
> easier than watching for image corruption on real use. Also the
> 'affine-test' program from the test suite can be used as an example of
> doing rotations on memory buffers with pixman:
> Best regards,
> Siarhei Siamashka
I chose 4x4 block size for my initial implementation. First I wanted to get
it working. The block size could easily be changed to 8x4, 8x8 or maybe even
12x4. TLB misses could be completely avoided or at least cut to less than
half: I'm also working on an improved pxafb driver that splits the
framebuffer between internal sram (256K on a pxa270) and ram. The ram part
is placed in a section mapping, greatly reducing the number of TLB entries.
It may be possible to place the internal sram in a section mapping too; I
have to try this. If it's possible, the fb proper and the shadowfb would
only need 2 tlb entries.
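For reference, here is a minimal plain C sketch of the blocking idea with an
adjustable block size. It is only a stand-in for the iwmmxt code, not the
patch itself. The names rotate_simple, rotate_tiled and TILE are hypothetical,
strides are counted in pixels, and the quarter-turn mapping used here
(src pixel (x, y) goes to dst row x, column h-1-y) is an assumption that may
not match the direction used by shadowUpdateRotate16_270YX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size; 4 corresponds to the 4x4 blocks discussed above. */
#define TILE 4

/* Reference version: one pixel at a time. Every store lands on a
 * different destination line, so each one touches a new cacheline. */
static void rotate_simple(const uint16_t *src, size_t src_stride,
                          uint16_t *dst, size_t dst_stride,
                          size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[x * dst_stride + (h - 1 - y)] = src[y * src_stride + x];
}

/* Blocked version: same pixel mapping, but the source is walked in
 * TILE x TILE blocks so that up to TILE adjacent pixels are written
 * to each destination line before moving on. Edge blocks are clamped,
 * so w and h do not have to be multiples of TILE. */
static void rotate_tiled(const uint16_t *src, size_t src_stride,
                         uint16_t *dst, size_t dst_stride,
                         size_t w, size_t h)
{
    for (size_t y0 = 0; y0 < h; y0 += TILE) {
        size_t y1 = (y0 + TILE < h) ? y0 + TILE : h;
        for (size_t x0 = 0; x0 < w; x0 += TILE) {
            size_t x1 = (x0 + TILE < w) ? x0 + TILE : w;
            for (size_t x = x0; x < x1; x++)
                for (size_t y = y0; y < y1; y++)
                    dst[x * dst_stride + (h - 1 - y)] =
                        src[y * src_stride + x];
        }
    }
}
```

Comparing the output of the blocked version against the simple one on small
odd-sized rectangles is a cheap way to catch indexing mistakes before looking
at the screen.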
Please understand that I could barely write my own name in C (or any other
language) less than a year ago. Pixman sources look very scary to me and I
don't think I'm up to the task of reimplementing or optimising anything in
pixman. Also, it seems to me that armv5te or older architectures aren't the
main targets for optimisations in pixman. My main objective is to create a
decent performing Kdrive for my Zaurus. To this end, I have written some
patches and also adapted some of your own armv6 code which you wrote for
the Maemo Kdrive.
But I digress...
Could alignment be the problem with my patch? The shadowfb and fb proper
lines are probably not 64 bit aligned before calling iwmmxt_rotate_copy...
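
One way to test the alignment theory is to check the pointers in C before
handing them to the asm routine. The sketch below assumes the routine uses
64-bit loads and stores that expect 8-byte aligned addresses; the patch is
not shown here, so that is a guess. The helper names are hypothetical, and
pixels are assumed to be 16 bits wide and at least 2-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p sits on an 8-byte (64-bit) boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Number of leading 16-bit pixels that would have to be copied one at
 * a time before p reaches an 8-byte boundary (0 if already aligned).
 * Assumes p is at least 2-byte aligned. */
static size_t pixels_to_align8(const uint16_t *p)
{
    size_t bytes = (size_t)((8u - ((uintptr_t)p & 7u)) & 7u);
    return bytes / sizeof(uint16_t);
}

/* Nonzero if every line of a buffer starts 8-byte aligned: the base
 * must be aligned and the stride (in bytes) must be a multiple of 8. */
static int lines_stay_aligned8(const void *base, size_t stride_bytes)
{
    return is_aligned8(base) && (stride_bytes % 8u) == 0;
}
```

If either the shadowfb or the fb proper fails lines_stay_aligned8, or the
update rectangle starts at an x offset that is not a multiple of 4 pixels,
the 64-bit accesses would start at unaligned addresses.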