EXA and damage performance problem

Maarten Maathuis madman2003 at gmail.com
Tue Nov 29 14:36:27 PST 2011


On Tue, Nov 29, 2011 at 11:29 PM, Christoph Bartoschek
<bartoschek at or.uni-bonn.de> wrote:
> On 29.11.2011 23:19, Maarten Maathuis wrote:
>>
>> On Tue, Nov 29, 2011 at 2:33 PM, Christoph Bartoschek
>> <bartoschek at or.uni-bonn.de>  wrote:
>>>
>>> Hi,
>>>
>>> I am moving the thread "EXA performance problem" from xorg to xorg-devel
>>> and hope to get some help here.
>>>
>>> To sum up the problem: We use an application that displays vector
>>> pictures, mostly pictures with millions of rectangles. On our old X11
>>> thin clients (XFree86) the performance was acceptable: about 1 million
>>> rectangles per second. After upgrading to newer thin clients (Xorg) the
>>> performance dropped significantly.
>>>
>>> I have a test case where displaying the picture now takes 90 seconds. It
>>> was below one second on the older thin clients.
>>>
>>> The profiler says that 95% of the runtime is spent in pixman region
>>> operations.
>>>
>>> The application draws polyRectangle most of the time, and I see that
>>> nearly 100% of the time is spent in damagePolyRectangle and the functions
>>> below it.
>>>
>>> 33% of the time in damagePolyRectangle is spent in the while loop that
>>> constructs the damage region. The algorithm runs in O(N^2) because it adds
>>> one rectangle at a time. This can be fixed by constructing the damage
>>> region in one step. The attached patch does this.
>>>
>>> However, after fixing this, most of the time is spent in
>>> ExaCheckPolylines, which is called by this chain:
>>>
>>> damagePolyRectangle -> miPolyRectangle -> exaPolylines ->
>>> ExaCheckPolylines
>>>
>>> I've measured the runtime of the steps in ExaCheckPolylines:
>>>
>>>
>>> void
>>> ExaCheckPolylines (DrawablePtr pDrawable, GCPtr pGC,
>>>                  int mode, int npt, DDXPointPtr ppt)
>>> {
>>>  EXA_PRE_FALLBACK_GC(pGC);
>>>  EXA_FALLBACK(("to %p (%c), width %d, mode %d, count %d\n",
>>>                pDrawable, exaDrawableLocation(pDrawable),
>>>                pGC->lineWidth, mode, npt));
>>>
>>>  exaPrepareAccess (pDrawable, EXA_PREPARE_DEST);       // Step1: 55 s
>>>  exaPrepareAccessGC (pGC);                             // Step2: 2.4 s
>>>  pGC->ops->Polylines (pDrawable, pGC, mode, npt, ppt); // Step3: 2.4 s
>>>  exaFinishAccessGC (pGC);                              // Step4: 2.2 s
>>>  exaFinishAccess (pDrawable, EXA_PREPARE_DEST);        // Step5: 2.2 s
>>>  EXA_POST_FALLBACK_GC(pGC);
>>> }
>>>
>>> We see that exaPrepareAccess needs most of the time. Is that expected?
>>
>> I don't know which driver this is (and which type of EXA), but worst
>> case scenario the destination is a tiled frontbuffer that gets copied
>> back and forth for every operation (you want to see the framebuffer,
>> so you can't wait). If it's done using a hardware copy the software
>> needs to wait for the copy to be finished. The other way around can be
>> faster (and relatively non-blocking) depending on how it's
>> implemented. I think the interfaces inside the xserver are the main
>> reason it's done this way. The truth is that the whole thing was never
>> designed for modern hardware, so EXA can only do so much. You could
>> define new interfaces inside the xserver, but if your app does a call
>> for each rectangle, then that won't help much. At some point it
>> becomes easier to change the app if you can (rendering to a pixmap
>> instead of the frontbuffer should help a lot already if you are
>> bottlenecked by frontbuffer copies).
>>
>>>
>>> Inside there are several operations on the damage region. This makes
>>> damagePolyRectangle a quadratic algorithm.
>>>
>>> For N rectangles the damage region has O(N) rectangles, and for each
>>> rectangle there are operations on the damage region. The result is
>>> O(N^2).
>>>
>>> Is it necessary to call exaPrepareAccess for each of the rectangles?
>>
>> No, but unless the app gives you all rectangles at once I don't see
>> any other way.
>
> I do not know whether it gives all rectangles at once. But I see that
> damagePolyRectangle is called with chunks of 2044 rectangles.

Then consider adding a multiPolylines or multiPolyRectangle interface
(or something along those lines); you could then override the mi
implementation in EXA and handle the whole chunk in one go.
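
To make the idea concrete, here is a rough, self-contained sketch. It is
a toy model only, not xserver code: Rect, prepare_access(),
finish_access(), draw_outline() and both poly_rectangle_*() functions
are made-up stand-ins for illustration. It just counts how often the
expensive prepare step would run when each rectangle falls back on its
own, compared with one fallback per chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of this is the real EXA API. */
typedef struct { short x, y; unsigned short width, height; } Rect;

static unsigned long prepare_calls;   /* counts the expensive prepare step */
static unsigned long outlines_drawn;  /* counts rectangles actually drawn */

static void prepare_access(void) { prepare_calls++; }
static void finish_access(void)  { /* would release the mapping again */ }
static void draw_outline(const Rect *r) { (void)r; outlines_drawn++; }

/* Per-rectangle fallback: every rectangle pays for prepare/finish. */
static void poly_rectangle_per_rect(const Rect *rects, size_t n)
{
    assert(rects != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        prepare_access();
        draw_outline(&rects[i]);
        finish_access();
    }
}

/* Batched fallback: one prepare/finish pair around the whole chunk. */
static void poly_rectangle_batched(const Rect *rects, size_t n)
{
    assert(rects != NULL || n == 0);
    if (n == 0)
        return;
    prepare_access();
    for (size_t i = 0; i < n; i++)
        draw_outline(&rects[i]);
    finish_access();
}
```

With a chunk of 2044 rectangles the first variant runs the prepare step
2044 times and the second runs it once. Whether a real batched hook
would save that much depends on the driver and on what prepare access
actually has to do.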

>
> It is miPolyRectangle that iterates over all rectangles.
>
> Christoph
>
>



-- 
Far away from the primal instinct, the song seems to fade away, the
river get wider between your thoughts and the things we do and say.

