linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

Mario Kleiner mario.kleiner.de at gmail.com
Wed Jan 20 21:31:26 PST 2016


On 01/21/2016 04:43 AM, Michel Dänzer wrote:
> On 21.01.2016 05:32, Mario Kleiner wrote:
>>
>> So the problem is that AMD's hardware frame counters reset to
>> zero during a modeset. The old DRM code dealt with drivers doing that by
>> keeping vblank irqs enabled during modesets and incrementing the vblank
>> count by one during each vblank irq; I think that's what
>> drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.
>
> Right, looks like there's been a regression breaking this. I suspect the
> problem is that vblank->last isn't getting updated from
> drm_vblank_post_modeset. Not sure which change broke that though, or how
> to fix it. Ville?
>

The whole logic has changed: the software counter updates are now
always driven by the hw counter, so a hw counter that resets during a
modeset throws the software count off.

>
> BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
> exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
> vblank counters"). I've been meaning to track that down since then; one
> of these days hopefully, but if anybody has any ideas offhand...
>
>

I spent the last few hours reading through the drm and radeon code, and I
think what should probably work is to replace the
drm_vblank_pre/post_modeset calls in radeon/amdgpu with drm_vblank_off/on
calls. These are apparently meant for drivers whose hw counters reset
during modeset; they seem to reinitialize the vblank state properly and
release clients' queued vblank events to avoid blocking. Not tested so
far, I've just looked at the code.

Once drm_vblank_off is called, drm_vblank_get will no-op and return an
error, so clients can't enable vblank irqs during the modeset. The
pageflip ioctl and the waitvblank ioctl would fail while a modeset is in
progress; hopefully userspace handles this correctly everywhere.

It would also mean that radeon's power management doesn't sync its
actions to vblank if it gets invoked during a modeset, but that seems to
be handled by a 200 msec timeout and should hopefully only cause visual
glitches, or invisible glitches while the crtc is blanked during the
modeset?

There could be another tiny race with the new "vblank counter bumping"
logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts
..."). It would hit if drm_update_vblank_counter() were called several
times in quick succession within the "radeon_crtc->lb_vblank_lead_lines"
scanlines before the start of real vblank, while at the same time a
modeset set radeon_crtc->lb_vblank_lead_lines to a smaller value due to
a change in horizontal mode resolution. That needs a modeset to a higher
horizontal resolution to happen exactly when the scanout is in the right
5 or so scanlines, and some client calling drm_vblank_get() to enable
vblank irqs at the same time. Not that likely to happen often, but if it
did it would cause the same hang; still not nice, and there's Murphy's
law... If we could switch to drm_vblank_off/on instead of
drm_vblank_pre/post_modeset, we could remove that race as well by
forbidding any vblank irq related activity during a modeset.

I'll hack up a patch for demonstration now.


More information about the dri-devel mailing list