[Pixman] [cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions

Mon Nov 22 11:22:35 PST 2010

On Nov 21, 2010, at 9:47 AM, Siarhei Siamashka wrote:

> On Monday 13 September 2010 19:21:13 Georgi Beloev wrote:
>> On Sep 12, 2010, at 5:08 AM, Soeren Sandmann wrote:
>>> I'm hoping someone with more experience with MIPS than I have can give
>>> more detailed comments on the assembly.
>> 
>> Thanks for all the comments! I'll update the code per your comments and
>> submit another patch. Should I mail it just to the pixman list or CC both
>> cairo and pixman?
> 
> Hi, any updates for your pixman MIPS optimization patches? Maybe some parts are
> more ready than the others and can be submitted separately?

Hi Siarhei,
This has been on the back burner for the past few weeks -- I was busy with other projects recently. I hope to get back to wrapping up this work soon.

> Since that time, I also got a MIPS-based device for my home gadgets
> collection :) It's a router with a plain and boring "MIPS 24Kc V7.4" CPU which
> only supports MIPS32R2 ISA (without any DSP or SIMD extensions).

Congrats -- it's always fun to play with new gadgets :)

> Indeed, using PrepareForStore prefetch for the destination buffer is *very*
> useful, providing approximately 1.5x-2x improvement for memcpy-like code
> (blits and composite operations) and approximately 3x performance
> improvement for memset-like code (fills).
> 
> Because "Write-back with write allocation" caching policy seems to be used
> by default in MIPS Linux, bulk linear stores of pixel data to memory are slow
> (data is first *read* into cache, gets modified there and is written back
> later). And PrepareForStore prefetch solves this problem, preventing
> unnecessary allocation. Though if the caching policy is changed to
> "Write-through, no write allocate", then PrepareForStore prefetch actually
> hurts performance, which is kind of sad. Nevertheless, the following
> (admittedly very old) post implies that only modes 2 and 3 are relevant for
> MIPS ("Uncached" and "Write-back with write allocation"):
> http://www.spinics.net/lists/mips/msg11750.html

Yes, using prefetch instructions is somewhat tricky. They often improve performance but always cost cycles (at least on simple scalar CPUs). My take is that their use is only justified if you know enough about the target system to be sure there will be a real benefit. Something like pixman can be used on any embedded system, ranging from a simple PIC32 to a gigahertz 74K core, and the results of using prefetch instructions will vary a lot.
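
To make the discussion concrete, here is a minimal sketch of a fill loop that uses the PrepareForStore hint (hint 30 of the MIPS `pref` instruction). It is not taken from the patch. The names `prepare_for_store` and `fill32_prepare_for_store` and the 32-byte cache line size are assumptions for illustration. Because PrepareForStore may allocate a line without reading its old contents, the sketch issues the hint only for cache lines that are overwritten completely and uses plain stores for the partial lines at either end. On non-MIPS targets the helper compiles to nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines; adjust for the actual target. */
#define CACHE_LINE_BYTES 32u
#define WORDS_PER_LINE   (CACHE_LINE_BYTES / sizeof(uint32_t))

/* Hypothetical helper: issue a PrepareForStore prefetch (hint 30) on MIPS
 * and do nothing on every other architecture. */
static inline void prepare_for_store(void *line)
{
#if defined(__mips__) && defined(__mips_isa_rev)
    __asm__ volatile ("pref 30, 0(%0)" : : "r" (line) : "memory");
#else
    (void) line;
#endif
}

/* Hypothetical fill routine; dst must be 4-byte aligned. */
void fill32_prepare_for_store(uint32_t *dst, uint32_t value, size_t count)
{
    size_t i;

    /* Head: plain stores until dst reaches a cache line boundary. */
    while (count > 0 && ((uintptr_t) dst % CACHE_LINE_BYTES) != 0) {
        *dst++ = value;
        count--;
    }

    /* Body: only lines that are overwritten completely get the hint. */
    while (count >= WORDS_PER_LINE) {
        prepare_for_store(dst);
        for (i = 0; i < WORDS_PER_LINE; i++)
            dst[i] = value;
        dst += WORDS_PER_LINE;
        count -= WORDS_PER_LINE;
    }

    /* Tail: plain stores for the final partial line. */
    while (count > 0) {
        *dst++ = value;
        count--;
    }
}
```

Whether the hint pays off still depends on the caching policy and the core, so a routine like this would need to be benchmarked on each target before being enabled.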

> So what are the next plans? Realistically, optimizing prefetches is the only
> source of major performance improvement that I would expect from pixman
> assembly optimizations on MIPS32R2.

I have to address the comments I received when I submitted the patch the first time. There are no plans beyond this point at the moment since the project I worked on is now over.

> PS. It's interesting how all this write-allocate stuff is handled on different
> architectures: x86 can use special non-temporal store instructions and ARM
> just implements delayed allocation in Cortex-A8.

Yes, and I've seen great performance improvements on x86 memcpy/memset operations. Caching is nice and easy to use -- because it is mostly invisible to the programmer -- but sometimes gets in the way :)
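
For comparison, the x86 non-temporal stores mentioned in the quoted text can be sketched as follows. This is only an illustration, not code from pixman. The function name `fill32_nontemporal` is hypothetical, and the sketch assumes a compiler that provides the SSE2 intrinsics in `<emmintrin.h>`. Without SSE2 it falls back to ordinary stores.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

/* Hypothetical fill routine; dst must be 4-byte aligned. */
void fill32_nontemporal(uint32_t *dst, uint32_t value, size_t count)
{
    size_t i;

#if defined(__SSE2__)
    /* Non-temporal stores write around the cache instead of first
     * pulling the destination lines into it. */
    for (i = 0; i < count; i++)
        _mm_stream_si32((int *) (dst + i), (int) value);

    /* Make the streamed data globally visible before returning. */
    _mm_sfence();
#else
    /* Fallback (assumption: no SSE2 available): ordinary stores. */
    for (i = 0; i < count; i++)
        dst[i] = value;
#endif
}
```

Non-temporal stores tend to help only when the destination is not read again soon, so they are usually reserved for large fills and copies.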

Cheers,
-- Georgi