[PATCH] drm/amd/display: Fix overlay validation by considering cursors
Michel Dänzer
michel at daenzer.net
Tue Jun 8 07:47:14 UTC 2021
On 2021-06-07 8:45 p.m., Sean Paul wrote:
>
>
> On Mon, Jun 7, 2021 at 2:37 PM Harry Wentland <harry.wentland at amd.com <mailto:harry.wentland at amd.com>> wrote:
>
> On 2021-06-07 2:19 p.m., Sean Paul wrote:
> > On Tue, May 18, 2021 at 2:58 PM Rodrigo Siqueira
> > <Rodrigo.Siqueira at amd.com <mailto:Rodrigo.Siqueira at amd.com>> wrote:
> >>
> >> On 05/14, Mark Yacoub wrote:
> >>> On Fri, May 14, 2021 at 12:31 PM Mark Yacoub <markyacoub at google.com <mailto:markyacoub at google.com>> wrote:
> >>>>
> >>>> On Fri, May 14, 2021 at 11:28 AM Harry Wentland <harry.wentland at amd.com <mailto:harry.wentland at amd.com>> wrote:
> >>>>>
> >>>>> On 2021-05-14 7:47 a.m., Rodrigo Siqueira wrote:
> >>>>>> A few weeks ago, we saw a two cursor issue in a ChromeOS system. We
> >>>>>> fixed it in the commit:
> >>>>>>
> >>>>>> drm/amd/display: Fix two cursor duplication when using overlay
> >>>>>> (read the commit message for more details)
> >>>>>>
> >>>>>> After this change, we noticed that some IGT subtests related to
> >>>>>> kms_plane and kms_plane_scaling started to fail. After investigating
> >>>>>> this issue, we noticed that all subtests that fail have a primary plane
> >>>>>> covering the overlay plane, which is currently rejected by amdgpu dm.
> >>>>>> Fail those IGT tests highlight that our verification was too broad and
> >>>>>> compromises the overlay usage in our drive. This patch fixes this issue
> >>> nit: s/drive/driver?
> >>>>>> by ensuring that we only reject commits where the primary plane is not
> >>>>>> fully covered by the overlay when the cursor hardware is enabled. With
> >>>>>> this fix, all IGT tests start to pass again, which means our overlay
> >>>>>> support works as expected.
> >>>>>>
> >>>>>> Cc: Tianci.Yin <tianci.yin at amd.com <mailto:tianci.yin at amd.com>>
> >>>>>> Cc: Harry Wentland <harry.wentland at amd.com <mailto:harry.wentland at amd.com>>
> >>>>>> Cc: Nicholas Choi <nicholas.choi at amd.com <mailto:nicholas.choi at amd.com>>
> >>>>>> Cc: Bhawanpreet Lakha <bhawanpreet.lakha at amd.com <mailto:bhawanpreet.lakha at amd.com>>
> >>>>>> Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas at amd.com <mailto:Nicholas.Kazlauskas at amd.com>>
> >>>>>> Cc: Mark Yacoub <markyacoub at google.com <mailto:markyacoub at google.com>>
> >>>>>> Cc: Daniel Wheeler <daniel.wheeler at amd.com <mailto:daniel.wheeler at amd.com>>
> >>>>>>
> >>>>>> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira at amd.com <mailto:Rodrigo.Siqueira at amd.com>>
> >>>>>> ---
> >>>>>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 +++++++++-
> >>>>>> 1 file changed, 9 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>>>>> index ccd67003b120..9c2537a17a7b 100644
> >>>>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>>>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>>>>> @@ -10067,7 +10067,7 @@ static int validate_overlay(struct drm_atomic_state *state)
> >>>>>> int i;
> >>>>>> struct drm_plane *plane;
> >>>>>> struct drm_plane_state *old_plane_state, *new_plane_state;
> >>>>>> - struct drm_plane_state *primary_state, *overlay_state = NULL;
> >>>>>> + struct drm_plane_state *primary_state, *cursor_state, *overlay_state = NULL;
> >>>>>>
> >>>>>> /* Check if primary plane is contained inside overlay */
> >>>>>> for_each_oldnew_plane_in_state_reverse(state, plane, old_plane_state, new_plane_state, i) {
> >>>>>> @@ -10097,6 +10097,14 @@ static int validate_overlay(struct drm_atomic_state *state)
> >>>>>> if (!primary_state->crtc)
> >>>>>> return 0;
> >>>>>>
> >>>>>> + /* check if cursor plane is enabled */
> >>>>>> + cursor_state = drm_atomic_get_plane_state(state, overlay_state->crtc->cursor);
> >>>>>> + if (IS_ERR(cursor_state))
> >>>>>> + return PTR_ERR(cursor_state);
> >>>>>> +
> >>>>>> + if (drm_atomic_plane_disabling(plane->state, cursor_state))
> >>>>>> + return 0;
> >>>>>> +
> >>>>>
> >>>>> I thought this breaks Chrome's compositor since it can't handle an atomic commit rejection
> >>>>> based solely on whether a cursor is enabled or not.
> >>> For context: To use overlays (the old/current async way), Chrome tests
> >>> if an overlay strategy will work by doing an atomic commit with a TEST
> >>> flag to check if the combination would work. If it works, it flags
> >>> this planes configuration as valid. As it's valid, it performs an
> >>> atomic page flip (which could also include the cursor).
> >>> So to Harry's point, you would pass an atomic test but fail on an
> >>> atomic page flip with the exact same configuration that's been flagged
> >>> as valid. Failing a pageflip causes Chrome to crash (Reset the GPU
> >>> process cause something went horribly wrong when it shouldn't).
> >>
> >> Hi Mark and Sean,
> >>
> >> I don't know if I fully comprehended the scenario which in my patch
> >> might cause a ChromeOS crash, but this is what I understood:
> >>
> >
> > Chrome compositor only does an atomic test when the layout geometry
> > changes (ie: plane is added/removed/resized/moved). This does not take
> > into consideration the cursor. Once the atomic test has been validated
> > for a given layout geometry (set of planes/fbs along with their sizes
> > and locations), Chrome assumes this will continue to be valid.
> >
> > So the situation I'm envisioning is if the cursor is hidden, an
> > overlay-based strategy will pass in the kernel. If at any time the
> > cursor becomes visible, the kernel will start failing commits which
> > causes Chrome's GPU process to crash.
> >
> > In general I'm uneasy with using the cursor in the atomic check since
> > it feels like it could be racy independent of the issue above. I
> > haven't dove into the helper code enough to prove it, just a
> > spidey-sense.
> >
>
> Isn't the cursor supposed to be just another plane? If so, shouldn't
> adding/removing the cursor trigger an atomic test?
>
>
> Chrome is using legacy cursor.
>
> Yes it will trigger an atomic test in the kernel, and that atomic test will fail. Unfortunately Chrome does not expect a failure so it will crash.
The solution is probably indeed for atomic check to reject state which couldn't work if the cursor was enabled, even if the cursor is currently disabled. Otherwise one can hit various surprising errors via legacy APIs, as described in b836a274b797 "drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC is".
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
More information about the amd-gfx
mailing list