[Intel-gfx] [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks

Maxime Ripard maxime at cerno.tech
Thu Apr 28 08:08:47 UTC 2022


Hi Daniel,

On Wed, Apr 13, 2022 at 01:20:11PM +0200, Daniel Vetter wrote:
> On Wed, 13 Apr 2022 at 01:36, Abhinav Kumar <quic_abhinavk at quicinc.com> wrote:
> > On 4/8/2022 9:04 PM, Abhinav Kumar wrote:
> > >
> > >
> > > On 4/7/2022 4:12 PM, Rob Clark wrote:
> > >> On Thu, Apr 7, 2022 at 3:59 PM Abhinav Kumar
> > >> <quic_abhinavk at quicinc.com> wrote:
> > >>>
> > >>> Hi Rob and Daniel
> > >>>
> > >>> On 4/7/2022 3:51 PM, Rob Clark wrote:
> > >>>> On Wed, Apr 6, 2022 at 6:27 PM Jessica Zhang
> > >>>> <quic_jesszhan at quicinc.com> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 3/31/2022 8:20 AM, Daniel Vetter wrote:
> > >>>>>> The stuff never really worked, and leads to lots of fun because it
> > >>>>>> out-of-order frees atomic states. Which upsets KASAN, among other
> > >>>>>> things.
> > >>>>>>
> > >>>>>> For async updates we now have a more solid solution with the
> > >>>>>> ->atomic_async_check and ->atomic_async_commit hooks. Support for
> > >>>>>> that
> > >>>>>> for msm and vc4 landed. nouveau and i915 have their own commit
> > >>>>>> routines, doing something similar.
> > >>>>>>
> > >>>>>> For everyone else it's probably better to remove the use-after-free
> > >>>>>> bug, and encourage folks to use the async support instead. The
> > >>>>>> affected drivers which register a legacy cursor plane and don't
> > >>>>>> either
> > >>>>>> use the new async stuff or their own commit routine are: amdgpu,
> > >>>>>> atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and
> > >>>>>> vmwgfx.
> > >>>>>>
> > >>>>>> Inspired by an amdgpu bug report.
> > >>>>>>
> > >>>>>> v2: Drop RFC, I think with amdgpu converted over to use
> > >>>>>> atomic_async_check/commit done in
> > >>>>>>
> > >>>>>> commit 674e78acae0dfb4beb56132e41cbae5b60f7d662
> > >>>>>> Author: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
> > >>>>>> Date:   Wed Dec 5 14:59:07 2018 -0500
> > >>>>>>
> > >>>>>>        drm/amd/display: Add fast path for cursor plane updates
> > >>>>>>
> > >>>>>> we don't have any driver anymore where we have userspace expecting
> > >>>>>> solid legacy cursor support _and_ they are using the atomic
> > >>>>>> helpers in
> > >>>>>> their fully glory. So we can retire this.
> > >>>>>>
> > >>>>>> v3: Paper over msm and i915 regression. The complete_all is the only
> > >>>>>> thing missing afaict.
> > >>>>>>
> > >>>>>> v4: Fixup i915 fixup ...
> > >>>>>>
> > >>>>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=199425
> > >>>>>> References:
> > >>>>>> https://lore.kernel.org/all/20220221134155.125447-9-maxime@cerno.tech/
> > >>>>>>
> > >>>>>> References: https://bugzilla.kernel.org/show_bug.cgi?id=199425
> > >>>>>> Cc: Maxime Ripard <maxime at cerno.tech>
> > >>>>>> Tested-by: Maxime Ripard <maxime at cerno.tech>
> > >>>>>> Cc: mikita.lipski at amd.com
> > >>>>>> Cc: Michel Dänzer <michel at daenzer.net>
> > >>>>>> Cc: harry.wentland at amd.com
> > >>>>>> Cc: Rob Clark <robdclark at gmail.com>
> > >>>>>
> > >>>>> Hey Rob,
> > >>>>>
> > >>>>> I saw your tested-by and reviewed-by tags on Patchwork. Just curious,
> > >>>>> what device did you test on?
> > >>>>
> > >>>> I was testing on strongbad.. v5.18-rc1 + patches (notably, revert
> > >>>> 80253168dbfd ("drm: of: Lookup if child node has panel or bridge")
> > >>>>
> > >>>> I think the display setup shouldn't be significantly different than
> > >>>> limozeen (ie. it's an eDP panel).  But I didn't do much start/stop
> > >>>> ui.. I was mostly looking to make sure cursor movements weren't
> > >>>> causing fps drops ;-)
> > >>>>
> > >>>> BR,
> > >>>> -R
> > >>>
> > >>> start ui/ stop ui is a basic operation for us to use IGT on msm-next.
> > >>> So we cannot let that break.
> > >>>
> > >>> I think we need to check whats causing this splat.
> > >>>
> > >>> Can we hold back this change till then?
> > >>
> > >> Can you reproduce on v5.18-rc1 (plus 80253168dbfd)?  I'm running a
> > >> loop of stop ui / start ui and hasn't triggered a splat yet.
> > >>
> > >>   Otherwise maybe you can addr2line to figure out where it crashed?
> > >>
> > >> BR,
> > >> -R
> > >
> > > So this is not a crash. Its a warning splat coming from
> > >
> > > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c#L785
> > >
> > >
> > > Looks like the complete_commit() which should signal the event has not
> > > happened before the next cursor commit.
> > >
> > > Somehow this change is affecting the flow to miss the event signaling
> > > that the event is done.
> > >
> > > We tried a couple of approaches but couldnt still fix the warning.
> > >
> > > Will continue to check further next week.
> > >
> > >>
> > >>> Thanks
> > >>>
> > >>> Abhinav
> >
> > After checking this more this week, I think the current patch needs to
> > be changed a bit.
> >
> > So, here you are removing the complete_all part and leaving that to the
> > individual drivers, which is fine.
> >
> > But, you are also removing the continue part which I think seems
> > incorrect and causing these warnings for MSM driver.
> >
> > @@ -2135,12 +2128,6 @@  int drm_atomic_helper_setup_commit(struct
> > drm_atomic_state *state,
> >                         continue;
> >                 }
> >
> > -               /* Legacy cursor updates are fully unsynced. */
> > -               if (state->legacy_cursor_update) {
> > -                       complete_all(&commit->flip_done);
> > -                       continue;
> > -               }
> > -
> >
> > Thats because MSM driver thinks that if the previous crtc_state->event
> > was not consumed, then something went wrong and throws a warning.
> >
> >         if (!new_crtc_state->event) {
> >              commit->event = kzalloc(sizeof(*commit->event),
> >                          GFP_KERNEL);
> >              if (!commit->event)
> >                  return -ENOMEM;
> >
> >              new_crtc_state->event = commit->event;
> >          }
> >
> > But for a cursor update, we should not or need not populate the event at
> > all because it is async.
> >
> > So i think we should still keep the continue, rest of the patch is fine.
> >
> > @@ -2128,6 +2128,9 @@ int drm_atomic_helper_setup_commit(struct
> > drm_atomic_state *state,
> > continue;
> > }
> >
> > + if (state->legacy_cursor_update)
> > +      continue;
> > +
> >
> > Let me know your comments.
> 
> Thanks a lot for your excellent analysis. I need to think this through
> some more and figure out what exactly we should be doing.

We integrated this in the (downstream) RaspberryPi kernel, and it seems
to trigger some weird regressions:

  - If we move the cursor under X, the primary plane update is stuck:
    https://github.com/raspberrypi/linux/issues/4988

  - Switching back and forth between VT gets the kernel stuck (with a
    locking issue in fb_release?)
    https://github.com/raspberrypi/linux/issues/5011

I haven't been able to look into it at this point, I'll let you know if
I find anything relevant

Maxime
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20220428/2e147329/attachment-0001.sig>


More information about the Intel-gfx mailing list