Handling pageflip timeouts

Xaver Hugl xaver.hugl at kde.org
Wed Mar 13 14:45:47 UTC 2024


Hi all,

This was already discussed on IRC, but I think it should be on the
mailing list as well, so that we reach a more official conclusion
that's written down somewhere.

Recently I experienced a GPU reset. The system recovered from it
successfully, but the display stayed stuck: amdgpu hit a pageflip
timeout, which left the compositor waiting for a pageflip event that
will never come. Some experiments I did earlier showed that even if
the compositor tries submitting new atomic commits after a timeout,
those commits are rejected with EBUSY, presumably because the
timed-out pageflip is still considered "pending" on the kernel side.
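
To make the stuck state concrete, here is a toy model in Python. It is
not real DRM code: ToyCrtc and its methods are made-up names, and the
only behaviour it encodes is what I observed (no event after a timeout,
EBUSY on further commits).

```python
import errno


class ToyCrtc:
    """Hypothetical stand-in for the kernel-side flip state of one CRTC."""

    def __init__(self):
        self.pending_frame = None  # frame whose flip has not completed yet
        self.events = []           # flip events queued for userspace

    def atomic_commit(self, frame):
        # Assumption from the observation above: while a flip is pending,
        # further commits are refused with EBUSY.
        if self.pending_frame is not None:
            raise OSError(errno.EBUSY, "previous pageflip still pending")
        self.pending_frame = frame

    def flip_done(self):
        # Normal path: queue the completion event and clear the pending flip.
        self.events.append(("flip_complete", self.pending_frame))
        self.pending_frame = None

    def flip_timed_out(self):
        # Modelled current behaviour: no event is queued and the flip
        # stays pending.
        pass


crtc = ToyCrtc()
crtc.atomic_commit(1)
crtc.flip_timed_out()

# A retry from the compositor fails, and no event ever shows up.
try:
    crtc.atomic_commit(2)
    retry_errno = None
except OSError as err:
    retry_errno = err.errno

print(errno.errorcode.get(retry_errno), crtc.events)
```

In this model the compositor can neither wait its way out (the event
list stays empty) nor commit its way out (the retry fails), which
matches what I saw until the compositor was restarted.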

After restarting the compositor, everything continued to work
correctly, so this state can be recovered from. Because of that, I
think the kernel should handle pageflip timeouts differently. It
should
- signal the pageflip's completion to userspace
- maybe have a new event for "pageflip failed", to give userspace more
  accurate information in the future
- allow new commits to happen afterwards
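
The three points above could look roughly like this in a toy Python
model. Again, this is only a sketch: ToyCrtc, handle_next_event and the
"flip_failed" event name are hypothetical and do not correspond to any
existing kernel or libdrm interface.

```python
import errno


class ToyCrtc:
    """Hypothetical model of the proposed kernel-side behaviour."""

    def __init__(self):
        self.pending_frame = None
        self.events = []

    def atomic_commit(self, frame):
        if self.pending_frame is not None:
            raise OSError(errno.EBUSY, "previous pageflip still pending")
        self.pending_frame = frame

    def _finish(self, kind):
        # Always queue an event and always clear the pending flip.
        self.events.append((kind, self.pending_frame))
        self.pending_frame = None

    def flip_done(self):
        self._finish("flip_complete")

    def flip_timed_out(self):
        # "flip_failed" stands for the suggested new event type.
        self._finish("flip_failed")


def handle_next_event(crtc, next_frame):
    """Compositor side: consume one event, then submit the next frame."""
    kind, frame = crtc.events.pop(0)
    # On "flip_failed" the compositor knows `frame` was never shown and
    # can simply present again; no restart is needed.
    crtc.atomic_commit(next_frame)
    return kind, frame


crtc = ToyCrtc()
crtc.atomic_commit(1)
crtc.flip_timed_out()
result = handle_next_event(crtc, next_frame=2)
```

The point of the separate event kind is that the compositor's normal
event loop keeps running after a timeout, while it can still tell that
the frame in question never reached the screen.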

Another case discussed was when the device is completely removed.
Right now, if a pageflip is pending when that happens, userspace never
gets the event for pageflip completion, just like with the GPU reset.
KWin ignores pending pageflips on hotunplug, and because the device is
gone anyway, that's not a big issue. uAPI-wise, though, I would expect
a pageflip event to arrive for every commit that requests one, no
matter what. If that is not possible or desirable, the uAPI has to be
changed, for example by introducing the "pageflip failed" event
mentioned above.
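
For the hotunplug case, the expectation "one event per commit that
requested it" can be sketched the same way. ToyDevice and the
"flip_failed" event are hypothetical names used only for illustration;
this is not how any driver currently behaves.

```python
class ToyDevice:
    """Hypothetical model of a DRM device that can be hot-unplugged."""

    def __init__(self):
        self.pending = {}   # crtc id -> user data of the pending flip
        self.events = []    # events queued for userspace

    def atomic_commit(self, crtc_id, user_data):
        self.pending[crtc_id] = user_data

    def unplug(self):
        # Expected behaviour: every commit that requested an event gets one,
        # here as the hypothetical "flip_failed" event.
        for crtc_id in sorted(self.pending):
            self.events.append(("flip_failed", crtc_id, self.pending[crtc_id]))
        self.pending.clear()


dev = ToyDevice()
outstanding = {"frame-a", "frame-b"}   # compositor-side bookkeeping
dev.atomic_commit(crtc_id=0, user_data="frame-a")
dev.atomic_commit(crtc_id=1, user_data="frame-b")
dev.unplug()

# The compositor drains events as usual and ends up with nothing outstanding.
for kind, crtc_id, user_data in dev.events:
    outstanding.discard(user_data)
```

With that guarantee, a compositor would not need a special case for
pending pageflips on removed devices.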

Looking forward to some answers,
Xaver Hugl

