EXA and damage performance problem

Tue Nov 29 14:36:27 PST 2011

On Tue, Nov 29, 2011 at 11:29 PM, Christoph Bartoschek
<bartoschek at or.uni-bonn.de> wrote:
> On 29.11.2011 23:19, Maarten Maathuis wrote:
>>
>> On Tue, Nov 29, 2011 at 2:33 PM, Christoph Bartoschek
>> <bartoschek at or.uni-bonn.de>  wrote:
>>>
>>> Hi,
>>>
>>> I am moving the thread "EXA performance problem" from xorg to xorg-devel
>>> and hope to get some help here.
>>>
>>> To sum up the problem: We use an application that displays vector
>>> pictures. We use it mostly to display pictures with millions of
>>> rectangles. Using our old X11 thin clients (XFree86) the performance
>>> was acceptable. The speed was about 1 million rectangles per second.
>>> After upgrading to newer thin clients (Xorg) the performance dropped
>>> significantly.
>>>
>>> I have a testcase where displaying the picture now takes 90 seconds. It
>>> was below one second on the older thin clients.
>>>
>>> The profiler says that 95% of the runtime is spent in pixman region
>>> operations.
>>>
>>> The application draws PolyRectangle most of the time. And I see that
>>> nearly 100% of the time is spent in damagePolyRectangle and the
>>> functions it calls.
>>>
>>> 33% of the time in damagePolyRectangle is spent in the while loop to
>>> construct the damage region. The algorithm runs in O(n^2) because it adds
>>> one rectangle at a time. This can be fixed by constructing the damage
>>> region in one step. The attached patch does this.
>>>
>>> However, after fixing this, most of the time is spent in
>>> ExaCheckPolylines, which is called by this chain:
>>>
>>> damagePolyRectangle -> miPolyRectangle -> exaPolylines -> ExaCheckPolylines
>>>
>>> I've measured the runtime of the steps in ExaCheckPolylines:
>>>
>>>
>>> void
>>> ExaCheckPolylines (DrawablePtr pDrawable, GCPtr pGC,
>>>                  int mode, int npt, DDXPointPtr ppt)
>>> {
>>>  EXA_PRE_FALLBACK_GC(pGC);
>>>  EXA_FALLBACK(("to %p (%c), width %d, mode %d, count %d\n",
>>>                pDrawable, exaDrawableLocation(pDrawable),
>>>                pGC->lineWidth, mode, npt));
>>>
>>>  exaPrepareAccess (pDrawable, EXA_PREPARE_DEST);       // Step1: 55 s
>>>  exaPrepareAccessGC (pGC);                             // Step2: 2.4 s
>>>  pGC->ops->Polylines (pDrawable, pGC, mode, npt, ppt); // Step3: 2.4 s
>>>  exaFinishAccessGC (pGC);                              // Step4: 2.2 s
>>>  exaFinishAccess (pDrawable, EXA_PREPARE_DEST);        // Step5: 2.2 s
>>>  EXA_POST_FALLBACK_GC(pGC);
>>> }
>>>
>>> We see that exaPrepareAccess takes most of the time. Is that expected?
>>
>> I don't know which driver this is (and which type of EXA), but worst
>> case scenario the destination is a tiled frontbuffer that gets copied
>> back and forth for every operation (you want to see the framebuffer,
>> so you can't wait). If it's done using a hardware copy the software
>> needs to wait for the copy to be finished. The other way around can be
>> faster (and relatively non-blocking) depending on how it's
>> implemented. I think the interfaces inside the xserver are the main
>> reason it's done this way. The truth is that the whole thing was never
>> designed for modern hardware, so EXA can only do so much. You could
>> define new interfaces inside the xserver, but if your app does a call
>> for each rectangle, then that won't help much. At some point it
>> becomes easier to change the app if you can (rendering to a pixmap
>> instead of the frontbuffer should help a lot already if you are
>> bottlenecked by frontbuffer copies).
>>
>>>
>>> Inside exaPrepareAccess there are several operations on the damage
>>> region. This makes damagePolyRectangle a quadratic algorithm.
>>>
>>> For N rectangles the damage region has O(N) rectangles. And for each
>>> rectangle there are operations on the damage region. The result is
>>> O(N^2).
>>>
>>> Is it necessary to call exaPrepareAccess for each of the rectangles?
>>
>> No, but unless the app gives you all rectangles at once I don't see
>> any other way.
>
> I do not know whether it gives all rectangles at once. But I see that
> damagePolyRectangle is called with chunks of 2044 rectangles.

Then consider making a multiPolylines or multiPolyRectangle interface
or something like that; then you can override the mi implementation in
exa.
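
To illustrate why handling a whole chunk at once should matter, here is
a minimal cost model in C. It is not xserver code and calls no EXA or
pixman functions. It only assumes, as described earlier in the thread,
that the damage region holds about one box per rectangle and that each
prepare-access step walks that whole region once. The function names
cost_prepare_per_rectangle and cost_prepare_per_batch are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical cost model (an assumption, not measured EXA behaviour):
 * the damage region for a chunk of nrects rectangles holds nrects boxes,
 * and every modeled prepare-access visits each of those boxes once.
 */

/* One modeled prepare-access per rectangle, as when the mi code
 * iterates over the rectangles and falls back for each of them. */
static long cost_prepare_per_rectangle(long nrects)
{
    long region_boxes = nrects;   /* boxes in the modeled damage region */
    long boxes_visited = 0;       /* total work over all prepare steps */

    for (long i = 0; i < nrects; i++)
        boxes_visited += region_boxes;

    return boxes_visited;         /* grows as nrects * nrects */
}

/* One modeled prepare-access for the whole chunk, as a batched
 * (multi-rectangle) entry point could do. */
static long cost_prepare_per_batch(long nrects)
{
    long region_boxes = nrects;   /* same modeled damage region */

    return region_boxes;          /* grows linearly with nrects */
}
```

For a chunk of 2044 rectangles this model gives 4177936 visited boxes
when preparing per rectangle and 2044 when preparing once per chunk. The
real ratio depends on the driver and on what the prepare step actually
does with the region.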

>
> It is miPolyRectangle that iterates over all rectangles.
>
> Christoph
>
>

-- 
Far away from the primal instinct, the song seems to fade away, the
river get wider between your thoughts and the things we do and say.