[Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.

Mario Kleiner mario.kleiner.de at gmail.com
Sun May 6 17:56:58 UTC 2018


On Sun, May 6, 2018 at 1:51 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Hi,
>
> fyi: there is another bugreport #106372 [1], where i bisected the problem in
> the xserver and found a problematic commit, with code which can easily be
> reverted (patch in the bugreport), maybe you could check if that fixes the
> issue as well!

Hi Tobias,

thanks for the info. Yes, that's consistent with the Mesa bug, and it
explains why this apparently happens only with the 1.20 modesetting-ddx,
or infrequently enough on other ddx'en that nobody made the connection.

1. Mesa feeds far too large targetMsc values (way in the future,
>> 2^32) into the PresentPixmap request, due to the Mesa bug.

2. Other ddx'en truncate the way too large targetMsc back to < 2^32 when
using the old drmWaitVblank ioctl to queue a vblank event, and due to
the magic of integer 32 bit truncation, most or all of the damage is
undone. Maybe no glitch, or only a hang of a few frames duration, or
only very infrequent long hangs, depending on the exact timing of
client vs. server execution, what and how much drawing plasmashell
does, etc.

3. modesetting-ddx directly queues the too large targetMsc via the new
drmCrtcQueueSequence ioctl if running on Linux 4.15 or later, and the
kernel dutifully waits forever -> Hang.
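
To illustrate points 2 and 3, here is a minimal C sketch of how 32 bit
truncation can hide a bogus target. It is not code from any ddx or from
the kernel: the helper names (truncate_msc, frames_to_wait_32,
frames_to_wait_64) and the sample values are made up for illustration,
and it assumes the old path only ever sees the low 32 bits of the MSC.

```c
#include <stdint.h>

/* Hypothetical helper: models a path that passes only the low 32 bits
 * of a 64-bit target MSC into a 32-bit vblank sequence field. */
static uint32_t truncate_msc(uint64_t target_msc)
{
    return (uint32_t)target_msc;
}

/* Hypothetical helper: vblanks left to wait when both counters are
 * compared with wrap-around 32-bit arithmetic. */
static int32_t frames_to_wait_32(uint64_t current_msc, uint64_t target_msc)
{
    return (int32_t)(truncate_msc(target_msc) - truncate_msc(current_msc));
}

/* Hypothetical helper: vblanks left to wait when the full 64-bit target
 * is queued unmodified. Assumes target_msc >= current_msc. */
static uint64_t frames_to_wait_64(uint64_t current_msc, uint64_t target_msc)
{
    return target_msc - current_msc;
}
```

With a current MSC of 1000 and a bogus target of (5 << 32) + 1002, the
32 bit variant waits 2 vblanks, while the 64 bit variant waits
21474836482 vblanks, which at 60 Hz is more than a decade.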

I think in Michel's debug patch, only applying the #if 0 for the
ms_queue_vblank() function should be enough for the ddx to work around
the Mesa bug. Fixing client bugs in the server is probably not a good
idea though, given that we know it is a Mesa bug.

I think i found - and hopefully fixed - three other bugs in the
modesetting-ddx vblank handling, but they would only help for other
issues, not this specific one.

thanks,
-mario

>
> PS: I looked into bugzilla last weekend where i bisected this issue and did
> not recheck when opening the actual bugreport (sorry for that)
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=106372
>
> Greetings,
>
> Tobias
>
>
>
> On 5/4/18 3:45 PM, Mario Kleiner wrote:
>>
>> Two patches, solving the same problem in two different ways, the 1st
>> one ready to go, the 2nd one would need the debug statements removed.
>>
>> Only apply one of those for testing, the 2nd one will be useless with
>> the 1st one applied, but demonstrates the problem.
>>
>> So X-Server 1.20 RC + modesetting-ddx with DRI3/Present hangs at least
>> KDE-5's plasmashell and makes KDE-5 unusable with that setup.
>>
>> As KDE's plasmashell uses Qt 5's QtQuick OpenGL based rendering APIs
>> to render scene-graphs, this bug might affect other QT applications
>> as well.
>>
>> This fix works, but it points to some problems in modesetting-ddx's
>> current vblank handling, because other ddx'en seem to be mostly
>> unaffected by this Mesa bug.
>>
>> The problem is that neither of these two fixes is a proper final
>> solution, but better than nothing. It leaves the OML_sync_control
>> extensions glXWaitForSbcOML(), glXWaitForMscOML() calls and the
>> SGI_video_sync glXWaitVideoSyncSGI() functions broken for some
>> use patterns.
>>
>> The real problem, if i understand it correctly, is the way the life-time
>> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
>> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
>> assign new/different GLXDrawables to the same context (ie. one context
>> reused for drawing into many different drawables, as opposed to using
>> one dedicated context for each drawable), we destroy the underlying
>> DRIDrawables/dri3_drawables/loader_dri3_drawables and they lose all
>> state wrt. pending bufferswaps, msc, sbc, ust.
>>
>> Nothing in the specs says that clients should expect to lose such
>> state on a GLXDrawable d1 whenever they reassign drawables other than
>> d1 to a GL context. A sequence like...
>>
>> 1.glXMakeCurrent(dpy, drawable1, context);
>> 2.draw draw draw
>> 3.glXSwapBuffers(dpy, drawable1);
>> 4.glXMakeCurrent(dpy, drawable2, context); // drawable 1 loses all state!
>> 5.glXWaitForSbcOML(dpy, drawable1, ...);
>>
>> ... would probably cause a hang of the client in glXWaitForSbcOML, as
>> the function requires information stored in the "original" drawable1
>> up to step 3, but lost in step 4 due to dri3_drawable destruction.
>>
>> Patch 1 has a potentially large performance impact when switching
>> drawables on a given context, due to the enforced wait on swap
>> completion, but might save OML clients which do waits for sbc, msc on
>> a separate thread, whereas patch 2 doesn't have a performance impact,
>> but doesn't even partially solve trouble with OML_sync_control.
>>
>> However, i'm totally out of time atm. and probably not the right person
>> to think about a better solution, and by dumb luck, my own application
>> doesn't recycle the same context for different drawables, but uses a
>> dedicated context for each drawable, so it dodges this bullet.
>>
>> Therefore one of these patches is either a good enough fix for the KDE
>> hang problems atm. or a diagnosis of the problem as a starting point for
>> brighter people to deal with the root cause ;-)
>>
>> Thanks,
>> -mario
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

