[PATCH] drm/atomic_helper: duplicate state for drm_private_obj

Daniel Vetter daniel at ffwll.ch
Fri Jun 26 13:46:42 UTC 2020


On Thu, Jun 25, 2020 at 09:24:38AM -0700, Rob Clark wrote:
> On Thu, Jun 25, 2020 at 8:55 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> >
> > On Thu, Jun 25, 2020 at 4:17 PM Rob Clark <robdclark at gmail.com> wrote:
> > >
> > > On Thu, Jun 25, 2020 at 5:35 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> > > >
> > > > On Thu, Jun 25, 2020 at 1:58 PM Shawn Guo <shawnguo at kernel.org> wrote:
> > > > >
> > > > > From: Shawn Guo <shawn.guo at linaro.org>
> > > > >
> > > > > The msm/mdp5 driver uses drm_private_obj as its global atomic state,
> > > > > which keeps the assignment of hwpipe to plane.  With drm_private_obj
> > > > > missing from duplicate state call, mdp5 suspend works with no problem
> > > > > only for the very first time.  Any subsequent suspend will hit the
> > > > > following warning, because hwpipe assignment doesn't get duplicated for
> > > > > suspend state.  Adding drm_private_obj handling for duplicate state call
> > > > > fixes the problem.
> > > >
> > > > If the driver needs a private state, it's supposed to duplicate that
> > > > in its atomic_check functionality. This isn't the helper's job.
> > > >
> > > > If this is a bug in msm code, then pretty sure if you try hard enough,
> > > > you can hit the exact same bug from userspace too. Maybe good idea to
> > > > try and reproduce this with igt or something.
> > >
> > > The problem is how duplicate_state is used by the atomic
> > > suspend/resume helpers.  They duplicate the running state on suspend,
> > > forgetting to duplicate the global state.  Then everything is
> > > disabled, the driver correctly duplicates and updates it's global
> > > atomic state releasing the hwpipe.
> > >
> > > But then on resume, we are re-applying plane state that thinks it
> > > already has a hwpipe assigned (because that is part of the plane state
> > > that was duplicated), without reapplying the matching global state.
> > >
> > > On a normal atomic commit, we would duplicate the plane state that has
> > > the hwpipe disabled, which would be in sync with the drivers global
> > > state.  But since that is not what the atomic resume helper does, we
> > > hit the situation where the plane state and the global state are out
> > > of sync.
> > >
> > > So the driver is dtrt, the problem really is with the helpers.  I
> > > think this patch is the right thing to do.  It is incorrect for the
> > > suspend/resume helpers to assume that they can re-apply duplicated
> > > state without including the global state.
> >
> > Hm, this is a bit awkward. Generally the assumption is that you should
> > be recomputing the derived state (like hwpipe) no matter what. If your
> > driver doesn't do that, then all kinds of things can leak from the
> > pre-resume to the post-resume side of things, that's kinda why I'm not
> > thrilled with this patch, I think it has good potential to open up a
> > can of worms. Iirc this patch has come up in the past, and in those
> > cases it was a driver bug.
> >
> > For this case, why is msm reusing a hw pipe assignment of a disabled plane?
> 
> Because it is part of the plane state that is being restored.
> 
> Since resume uses the old state saved before the
> drm_atomic_helper_disable_all(), rather than duplicating the current
> state, we end up with this mismatch between global and plane state.  I
> think stashing away the old state is probably ok, but we can't just do
> it piecemeal without including the global state.
> 
> I suppose part of the problem is the hwpipe (and other such
> dynamically assigned resources) touch both private and plane (and
> crtc) state.  The global state object knows which resources are
> assigned to which plane/crtc.  But the plane/crtc state knows which of
> the (potentially) two hwpipe/mixers is "left" (primary) and "right"
> (secondary).

Yeah I get all that, what I meant is: Why don't you just blindly recompute
the hwpipe assignment every time a full modeset is done? Caching it for
pure plane flips makes sense, but drm_crtc_needs_modset == true and just
throw it all overboard and assign new ones I think would also solve this
problem. Since the hwpipe global state would indicate that all pipes are
unallocated that should work (I hope).

Imo just recomputing state is a good atomic pattern, it avoids drivers
getting stuck in a corner somewhere you can't reset them out of anymore.

My question here was, why can't you do that?

> > Unfortunately we can't copy the drm state into the overall structure
> > to solve this, since that would miss driver-private properties. So
> > unfortunately we can't make this match a real atomic commit from
> > userspace perfectly.
> 
> I'm not sure I understand this?  The driver private properties
> would/should be part of one of the state objs (plane/crtc/global)?  If
> the atomic state (including global) represents the entirety of the hw
> state, you should be able to stash off N different versions of them
> and re-apply them in any order you want because they are all
> self-consistent.

So for normal atomic commit we have:

1. duplicate current state
2. set properties

But for resume helpers it's some random older state, so the expectations
break a bit. We could approximate that using something like:

1. duplicate current state into curr_state
2. set properties using a memcpy of the drm core state structure, leaving
the driver private stuff as-is.

But a) there's also some non-property state in drm state structures and b)
properties which are driver extensions and set into the driver part of the
state would get lost.

So also not great.

> > Another option would be if msm just copies the private state it needs
> > to not go boom.
> >
> > Doing this unconditionally might break other drivers that rely on
> > private state not being duplicated, but I guess that would also be
> > somewhat of a driver bug.
> 
> I guess we could duplicate our own version of
> drm_atomic_helper_suspend().. or maybe add a 'duplicate_global' param
> to drm_atomic_helper_suspend().
> 
> I'm not too sure how many drivers these days are using global atomic
> state, so not sure how many would be potentially broken, but opting in
> to duplicating global state seems reasonable if necessary.

dp mst uses it to track it's stuff at least, and I think it's spreading
quite a bit into drivers using them to track all kinds of things, not just
a single global state.
-Daniel

> 
> BR,
> -R
> 
> > -Daniel
> >
> > >
> > > BR,
> > > -R
> > >
> > > > -Daniel
> > > >
> > > >
> > > > > $ echo mem > /sys/power/state
> > > > > [   38.111144] PM: suspend entry (deep)
> > > > > [   38.111185] PM: Syncing filesystems ... done.
> > > > > [   38.114630] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > > > > [   38.115912] OOM killer disabled.
> > > > > [   38.115914] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> > > > > [   38.122170] ------------[ cut here ]------------
> > > > > [   38.122212] WARNING: CPU: 0 PID: 1747 at drivers/gpu/drm/msm/disp/mdp5/mdp5_pipe.c:145 mdp5_pipe_release+0x90/0xc0
> > > > > [   38.122215] Modules linked in:
> > > > > [   38.122222] CPU: 0 PID: 1747 Comm: sh Not tainted 4.19.107-00515-g9d5e4d7a33ed-dirty #323
> > > > > [   38.122224] Hardware name: Square, Inc. T2 Devkit (DT)
> > > > > [   38.122228] pstate: 40000005 (nZcv daif -PAN -UAO)
> > > > > [   38.122230] pc : mdp5_pipe_release+0x90/0xc0
> > > > > [   38.122233] lr : mdp5_pipe_release+0x90/0xc0
> > > > > [   38.122235] sp : ffff00000d13b7f0
> > > > > [   38.122236] x29: ffff00000d13b7f0 x28: 0000000000000000
> > > > > [   38.122240] x27: 0000000000000002 x26: ffff800079adce00
> > > > > [   38.122243] x25: ffff800079405200 x24: 0000000000000000
> > > > > [   38.122246] x23: ffff80007a78cc08 x22: ffff80007b1cc018
> > > > > [   38.122249] x21: ffff80007b1cc000 x20: ffff80007b317080
> > > > > [   38.122252] x19: ffff80007a78ce80 x18: 0000000000020000
> > > > > [   38.122255] x17: 0000000000000000 x16: 0000000000000000
> > > > > [   38.122258] x15: 00000000fffffff0 x14: ffff000008c3fb48
> > > > > [   38.122261] x13: ffff000008cdac4a x12: ffff000008c3f000
> > > > > [   38.122264] x11: 0000000000000000 x10: ffff000008cda000
> > > > > [   38.122267] x9 : 0000000000000000 x8 : ffff000008ce4a40
> > > > > [   38.122269] x7 : 0000000000000000 x6 : 0000000039ea41a9
> > > > > [   38.122272] x5 : 0000000000000000 x4 : 0000000000000000
> > > > > [   38.122275] x3 : ffffffffffffffff x2 : c7580c109cae4500
> > > > > [   38.122278] x1 : 0000000000000000 x0 : 0000000000000024
> > > > > [   38.122281] Call trace:
> > > > > [   38.122285]  mdp5_pipe_release+0x90/0xc0
> > > > > [   38.122288]  mdp5_plane_atomic_check+0x2c0/0x448
> > > > > [   38.122294]  drm_atomic_helper_check_planes+0xd0/0x208
> > > > > [   38.122298]  drm_atomic_helper_check+0x38/0xa8
> > > > > [   38.122302]  drm_atomic_check_only+0x3e8/0x630
> > > > > [   38.122305]  drm_atomic_commit+0x18/0x58
> > > > > [   38.122309]  __drm_atomic_helper_disable_all.isra.12+0x15c/0x1a8
> > > > > [   38.122312]  drm_atomic_helper_suspend+0x80/0xf0
> > > > > [   38.122316]  msm_pm_suspend+0x4c/0x70
> > > > > [   38.122320]  dpm_run_callback.isra.6+0x20/0x68
> > > > > [   38.122323]  __device_suspend+0x110/0x308
> > > > > [   38.122326]  dpm_suspend+0x100/0x1f0
> > > > > [   38.122329]  dpm_suspend_start+0x64/0x70
> > > > > [   38.122334]  suspend_devices_and_enter+0x110/0x500
> > > > > [   38.122336]  pm_suspend+0x268/0x2c0
> > > > > [   38.122339]  state_store+0x88/0x110
> > > > > [   38.122345]  kobj_attr_store+0x14/0x28
> > > > > [   38.122352]  sysfs_kf_write+0x3c/0x50
> > > > > [   38.122355]  kernfs_fop_write+0x118/0x1e0
> > > > > [   38.122360]  __vfs_write+0x30/0x168
> > > > > [   38.122363]  vfs_write+0xa4/0x1a8
> > > > > [   38.122366]  ksys_write+0x64/0xe8
> > > > > [   38.122368]  __arm64_sys_write+0x18/0x20
> > > > > [   38.122374]  el0_svc_common+0x6c/0x178
> > > > > [   38.122377]  el0_svc_compat_handler+0x1c/0x28
> > > > > [   38.122381]  el0_svc_compat+0x8/0x18
> > > > > [   38.122383] ---[ end trace 24145b7d8545345b ]---
> > > > > [   38.491552] Disabling non-boot CPUs ...
> > > > >
> > > > > Signed-off-by: Shawn Guo <shawn.guo at linaro.org>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++++++++++++
> > > > >  1 file changed, 16 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > index 85d163f16801..024985a92156 100644
> > > > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > > > @@ -3140,6 +3140,7 @@ drm_atomic_helper_duplicate_state(struct drm_device *dev,
> > > > >         struct drm_atomic_state *state;
> > > > >         struct drm_connector *conn;
> > > > >         struct drm_connector_list_iter conn_iter;
> > > > > +       struct drm_private_obj *priv_obj;
> > > > >         struct drm_plane *plane;
> > > > >         struct drm_crtc *crtc;
> > > > >         int err = 0;
> > > > > @@ -3184,6 +3185,16 @@ drm_atomic_helper_duplicate_state(struct drm_device *dev,
> > > > >         }
> > > > >         drm_connector_list_iter_end(&conn_iter);
> > > > >
> > > > > +       drm_for_each_privobj(priv_obj, dev) {
> > > > > +               struct drm_private_state *priv_state;
> > > > > +
> > > > > +               priv_state = drm_atomic_get_private_obj_state(state, priv_obj);
> > > > > +               if (IS_ERR(priv_state)) {
> > > > > +                       err = PTR_ERR(priv_state);
> > > > > +                       goto free;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > >         /* clear the acquire context so that it isn't accidentally reused */
> > > > >         state->acquire_ctx = NULL;
> > > > >
> > > > > @@ -3278,6 +3289,8 @@ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
> > > > >         struct drm_connector_state *new_conn_state;
> > > > >         struct drm_crtc *crtc;
> > > > >         struct drm_crtc_state *new_crtc_state;
> > > > > +       struct drm_private_state *new_priv_state;
> > > > > +       struct drm_private_obj *priv_obj;
> > > > >
> > > > >         state->acquire_ctx = ctx;
> > > > >
> > > > > @@ -3290,6 +3303,9 @@ int drm_atomic_helper_commit_duplicated_state(struct drm_atomic_state *state,
> > > > >         for_each_new_connector_in_state(state, connector, new_conn_state, i)
> > > > >                 state->connectors[i].old_state = connector->state;
> > > > >
> > > > > +       for_each_new_private_obj_in_state(state, priv_obj, new_priv_state, i)
> > > > > +               state->private_objs[i].old_state = priv_obj->state;
> > > > > +
> > > > >         ret = drm_atomic_commit(state);
> > > > >
> > > > >         state->acquire_ctx = NULL;
> > > > > --
> > > > > 2.17.1
> > > > >
> > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the dri-devel mailing list