[PATCH v2] drm/atomic-helpers: Invoke end_fb_access while owning plane state
Alyssa Ross
hi at alyssa.is
Wed Nov 29 13:49:36 UTC 2023

Thomas Zimmermann <tzimmermann at suse.de> writes:
> Hi
>
> Am 27.11.23 um 17:25 schrieb Alyssa Ross:
>> Thomas Zimmermann <tzimmermann at suse.de> writes:
>>
>>> Invoke drm_plane_helper_funcs.end_fb_access before
>>> drm_atomic_helper_commit_hw_done(). The latter function hands over
>>> ownership of the plane state to the following commit, which might
>>> free it. Releasing resources in end_fb_access then operates on undefined
>>> state. This bug has been observed with non-blocking commits when they
>>> are being queued up quickly.
>>>
>>> Here is an example stack trace from the bug report. The plane state has
>>> been free'd already, so the pages for drm_gem_fb_vunmap() are gone.
>>>
>>> Unable to handle kernel paging request at virtual address 0000000100000049
>>> [...]
>>> drm_gem_fb_vunmap+0x18/0x74
>>> drm_gem_end_shadow_fb_access+0x1c/0x2c
>>> drm_atomic_helper_cleanup_planes+0x58/0xd8
>>> drm_atomic_helper_commit_tail+0x90/0xa0
>>> commit_tail+0x15c/0x188
>>> commit_work+0x14/0x20
>>>
>>> For aborted commits, it is still ok to run end_fb_access as part of the
>>> plane's cleanup. Add a test to drm_atomic_helper_cleanup_planes().
>>>
>>> v2:
>>> * fix test in drm_atomic_helper_cleanup_planes()
>>>
>>> Reported-by: Alyssa Ross <hi at alyssa.is>
>>> Closes: https://lore.kernel.org/dri-devel/87leazm0ya.fsf@alyssa.is/
>>> Suggested-by: Daniel Vetter <daniel at ffwll.ch>
>>> Fixes: 94d879eaf7fb ("drm/atomic-helper: Add {begin,end}_fb_access to plane helpers")
>>> Signed-off-by: Thomas Zimmermann <tzimmermann at suse.de>
>>> Cc: <stable at vger.kernel.org> # v6.2+
>>> ---
>>> drivers/gpu/drm/drm_atomic_helper.c | 17 +++++++++++++++++
>>> 1 file changed, 17 insertions(+)
>>
>> Got this basically immediately. :(
>
> I've never seen such problems on other systems. Is there anything
> different about the Mac systems? How do you trigger these errors?

My understanding is that all sorts of things are different, but I don't
know too much about the details. There's of course a chance that there
could be some other change in the Asahi Linux kernel that causes this
problem to surface — as I said, I reviewed the diff with mainline and
didn't see anything that looked relevant, but I could well have missed
something. I don't think I can test mainline directly, as it doesn't
yet support enough of the hardware — for slightly older Apple Silicon
Mac models, I think enough is upstream that this would be possible, but
I don't have access to any.

I started off encountering these errors every few days. I noticed them
because they would sometimes result in my system either starting to
freeze for 10 seconds at a time, or until I switched VT. They seem to
correlate with the system being under high CPU load. I was also able to
substantially increase the frequency with which they occurred by adding
logging to the kernel — even just drm.debug=0x10 makes a big difference,
and when I also added a few dump_backtrace() calls when I was trying to
understand the code and diagnose the problem, I would relatively
consistently encounter an Oops within a few minutes of load.

BTW: v3 is looking good so far. I've only been testing it since this
morning, though, so I'll keep trying it out for a bit longer before I
declare the problem to have been solved and send a Tested-by.