[Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail

bugzilla-daemon at bugzilla.kernel.org
Tue Jul 28 03:21:54 UTC 2020


https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #103 from mnrzk at protonmail.com ---
(In reply to Nicholas Kazlauskas from comment #95)
> Created attachment 290583
> 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> 
> So the sequence looks like the following:
> 
> 1. Non-blocking commit #1 requested, checked, swaps state and deferred to
> work queue.
> 
> 2. Non-blocking commit #2 requested, checked, swaps state and deferred to
> work queue.
> 
> Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs,
> Planes, Connectors) so Commit #2 does not stall for Commit #1. DRM Private
> Objects have always been avoided in stall checks, so we have no safety from
> DRM core in this regard.
> 
> 3. Due to system load, commit #2 executes first and finishes its commit tail
> work. At the end of commit tail, as part of DRM core, it calls
> drm_atomic_state_put().
> 
> Since this was the pageflip IOCTL, we likely already dropped the reference on
> the state held by the IOCTL itself. So it's going to actually free at this
> point.
> 
> This eventually calls drm_atomic_state_clear() which does the following:
> 
> obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);
> 
> Note that it clears "state" here. Commit sets "state" to the following:
> 
> state->private_objs[i].state = old_obj_state;
> obj->state = new_obj_state;
> 
> Since Commit #1 swapped first this means Commit #2 actually does free Commit
> #1's private object.
> 
> 4. Commit #1 then executes and we get a use after free.
> 
> Same bug; it's just that this was never corrupted before the slab changes.
> It's been sitting dormant from 5.0 through 5.8.
> 
> Attached is a patch that might help resolve this.

So I just got around to testing this patch, and so far it's not very promising.

Right now I can't say whether the bug in question was resolved, but the patch
introduced some new critical bugs for me.

I first tried this on my bare-metal system with my RX 480, and it booted into
LightDM just fine. As soon as I logged in and started XFCE, however, one of
my two monitors went black (the monitor reported being asleep), although my
cursor still seemed to move into the other monitor just fine. I then checked
the display settings, and both monitors were detected. I re-enabled the off
monitor, and after that both monitors worked fine.

After that, another bug appeared: I now had two cursors, one that only worked
on my right monitor and another that stayed stuck in one position.

At this point, I recompiled the kernel and regenerated the initramfs, and sure
enough, I got the same issues. This time, however, changing the display
settings didn't "fix" the blank monitor; the off monitor came on, but the
previously working one just froze.

I also tried this in my VM with the GPU passed through via vfio-pci and saw
similar issues. LightDM worked fine, but when I started KDE Plasma, the
display started flashing white and one of my monitors went blank. This time,
I couldn't enable the blank display from the settings; it simply didn't show
up. xrandr also showed only one output, and switching HDMI outputs still only
let me use the monitor on the "working" HDMI port.

I don't really know how I would go about debugging this, since there are too
many bugs to count. I also don't know whether it would be worth it at all.

Do you have any idea why this would occur? This patch only seems to force
synchronisation, so I don't quite understand why it would break my system so
badly.

More information about the dri-devel mailing list