[Bug 207383] [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Tue Jul 28 03:39:18 UTC 2020


https://bugzilla.kernel.org/show_bug.cgi?id=207383

--- Comment #104 from mnrzk at protonmail.com ---
(In reply to mnrzk from comment #103)
> (In reply to Nicholas Kazlauskas from comment #95)
> > Created attachment 290583 [details]
> > 0001-drm-amd-display-Force-add-all-CRTCs-to-state-when-us.patch
> > 
> > So the sequence looks like the following:
> > 
> > 1. Non-blocking commit #1 requested, checked, swaps state and deferred to
> > work queue.
> > 
> > 2. Non-blocking commit #2 requested, checked, swaps state and deferred to
> > work queue.
> > 
> > Commits #1 and #2 don't touch any of the same core DRM objects (CRTCs,
> > Planes, Connectors) so Commit #2 does not stall for Commit #1. DRM Private
> > Objects have always been avoided in stall checks, so we have no safety from
> > DRM core in this regard.
> > 
> > 3. Due to system load commit #2 executes first and finishes its commit tail
> > work. At the end of commit tail, as part of DRM core, it calls
> > drm_atomic_state_put().
> > 
> > Since this was the pageflip IOCTL we likely already dropped the reference
> on
> > the state held by the IOCTL itself. So it's going to actually free at this
> > point.
> > 
> > This eventually calls drm_atomic_state_clear() which does the following:
> > 
> > obj->funcs->atomic_destroy_state(obj, state->private_objs[i].state);
> > 
> > Note that it clears "state" here. Commit sets "state" to the following:
> > 
> > state->private_objs[i].state = old_obj_state;
> > obj->state = new_obj_state;
> > 
> > Since Commit #1 swapped first this means Commit #2 actually does free
> Commit
> > #1's private object.
> > 
> > 4. Commit #1 then executes and we get a use after free.
> > 
> > Same bug, it's just this was never corrupted before by the slab changes.
> > It's been sitting dormant for 5.0~5.8.
> > 
> > Attached is a patch that might help resolve this.
> 
> So I just got around to testing this patch and so far, not very promising.
> 
> Right now I can't comment on if the bug in question was resolved but this
> just introduced some new critical bugs for me.
> 
> I first tried this on my bare metal system w/ my RX 480 and it boots into
> lightdm just fine. As soon as I log in and start up XFCE however, one of my
> two monitors goes black (monitor reports being asleep) but my cursor seems
> to drift into the other monitor just fine. So after that, I check the
> display settings and both monitors are detected. So I tried re-enabling the
> off monitor and then both monitors work fine.
> 
> After that, another bug: I now have two cursors, one only works on my right
> monitor and the other only stays in one position.
> 
> At this point, I recompiled and remade the initramfs, and sure enough, same
> issues. This time, however, changing the display settings didn't "fix" the
> issue with one monitor being blank; the off monitor activated, but the
> previously working one just froze.
> 
> I also tried this on my VM passing through my GPU w/ vfio-pci; similar
> issues. Lightdm worked fine but when I started KDE Plasma, it started
> flashing white and one of my monitors just became blank. This time, I
> couldn't enable the blank display from the settings, it just didn't show
> up. Xrandr only showed one output as well; switching HDMI outputs still
> only lets me use the monitor on the "working" HDMI port.
> 
> I don't exactly know how I would go about debugging this since there's just
> too many bugs to count. I also don't know if it would be worth it at all.
> 
> Do you have any idea why this would occur? This patch only seems to force
> synchronisation, I don't quite know why it would break my system so much.

This just gets even weirder the more I test it out. Swapping the two
monitors (i.e. swapping the HDMI ports used for each monitor) seems to fix
the issue completely on my VM (at least from 1 minute of testing), but on
the host it fixes some of the issues (my cursor still disappears on one of
my monitors).

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


More information about the dri-devel mailing list