[Intel-gfx] [PATCH v3 21/22] drm/atomic: Introduce drm_atomic_helper_duplicate_commited_state()

Fri Jul 7 15:18:12 UTC 2017

On Fri, Jul 07, 2017 at 04:05:28PM +0200, Daniel Vetter wrote:
> On Fri, Jul 7, 2017 at 3:21 PM, Ville Syrjälä
> <ville.syrjala at linux.intel.com> wrote:
> > On Fri, Jul 07, 2017 at 02:03:38PM +0200, Daniel Vetter wrote:
> >> On Thu, Jul 06, 2017 at 11:24:41PM +0300, ville.syrjala at linux.intel.com wrote:
> >> > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> >> >
> >> > For i915 GPU reset handling we'll want to be able to duplicate the state
> >> > that was last commited to the hardware. For that purpose let's start to
> >> > track the commited state for each object and provide a way to duplicate
> >> > the commmited state into a new drm_atomic_state. The locking for
> >> > .commited_state must to be provided by the driver.
> >> >
> >> > drm_atomic_helper_duplicate_commited_state() duplicates the state
> >> > to both old_state and new_state. For the purposes of i915 GPU reset we
> >> > would only need one of them, but we actually need two top level states;
> >> > one for disabling everything (which would need the duplicated state to
> >> > be old_state), and another to reenable everything (which would need the
> >> > duplicated state to be new_state). So to make it less comples I figured
> >> > I'd just always duplicate both. Might want to rethink this if for no
> >> > other reason that reducing the chances of memory allocation failure.
> >> > Due to the double state duplication we need
> >> > drm_atomic_helper_clean_commited_state() to clean up the duplicated
> >> > old_state since that's not handled by the normal drm_atomic_state
> >> > cleanup code.
> >> >
> >> > TODO: do we want this in the helper, or maybe it should be just in i915?
> >> >
> >> > v2: s/commited/committed/ everywhere (checkpatch)
> >> >     Handle state duplication errors better
> >> > v3: Even more care in dealing with memory allocation errors
> >> >     Handle private objs too
> >> >     Deal with the potential ordering issues between swap_state()
> >> >     and hw_done() by keeping track of which state was swapped in
> >> >     last
> >> >
> >> > Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> >>
> >> I still don't get why we need to duplicate the committed state for gpu
> >> reset. When I said I'm not against adding all that complexity long-term I
> >> meant when we actually really need it. Imo g4x gpu reset isn't a good
> >> justification for that, reworking the atomic world for that seems
> >> massively out of proportion.
> >
> > Well, I still don't see what's so "massive" about a couple of extra state
> > pointers hanging around.
> >
> > Also while doing that state duplication stuff, my old idea of
> > splitting the crtc disable and enable phases into separate atomic
> > commits popped up again in my head. For that being able to duplicate
> > arbitrary states would seem like a nice thing to have. So the
> > refactoring I had to do can have other uses.
> 
> I fully realize that you're unhappy with how atomic ended up getting
> merged and that you think it's a grave mistake. And maybe it is, maybe
> it isn't, I have no idea.

I don't think I ever said that. I've said that it has certain design
problems that we ought to fix. This one being one, and another being
to separate the user state from the internal state. The latter I think
we'll have to tackle rather soon thanks to some new hardware in the
pipeline, or we need to come up with some other way to achieve the
same effect.

> But right now we have _nothing_ asking for
> that reorg afaik, and using gen4 reset to justify it is in my opinion
> simply not solid engineering practice. Maybe we need this in the
> future, and then we can add it, but not before. Refactoring stuff to
> prettify the architecture isn't really useful work.

Neither is having to throw out code that already exists and works. If
you're so worried about time being wasted on pre-g4x GPU reset, then
we could just as well merge my code and move on to more productive
endeavors.

> 
> >> Why exactly can't we do this simpler? I still don't get that part. Is
> >> there really no solution that doesn't break atomic's current assumption
> >> that commits are fully ordered on a given crtc?
> >
> > From the point of view of the old and new states it doesn't actually
> > break that. The commits done from the reset path are essentially
> > invisible to the normal modeset operation.
> 
> You insert something into a fully ordered queue. That does break the
> entire concept and needs a pile of locks and stuff to make it work.

Exactly one lock. Well two if you could the spinlock to protect the
committed_state pointer update from parallel updates touching the same
kms object. That latter one could be removed if atomic wouldn't allow
parallel commits to touch the same object.

> Yes it's doable, but it's a redesign with all the implications of
> subtle breakage all over.

What? It doesn't really even do anything unless you do the
duplicate_committed state(). Everything else is just assigning pointers.
So unless there's some really obvious bug somewhere it can't break
anything outside the GPU reset path. And really the only way to break
to GPU reset path is to have actual bugs in the normal display commit
code.

> While we do have open bugs for the current
> design. Rewriting the world to fix a bug needs seriously better
> justification imo.
> 
> > The one alternative proposed idea of allowing gem and kms sides go
> > out of whack scares me a bit. I think that might land us in more
> > trouble when I finally get around to making the video overlay a
> > drm_plane.
> 
> We've run perfectly fine with this idea for years.

Not perfectly. I've had to fix it several times. And I don't think I was
the only one.

> 
> > And I think trying to keep the GPU reset paths as similar as possible
> > between all the platforms would be a nice thing. Just whacking
> > everything on the head with a hammer on one platform but not on
> > another one seems to me like extra variation in behaviour that we
> > don't necessarily want.
> >
> > But like I said, if someone can come up with a better solution I
> > probably wouldn't object too much. It's not going to be coming from me
> > though since I have plenty of other things to do and vacation time is
> > coming up very soon. So unless someone else comes up with something nice
> > soon I think we should just go with my solution because a) it's already
> > available, and b) works quite decently from what I can see.
> 
> I guess I'll have to retype the old thing in the new world, but it
> really shouldn't be more than the quick draft I've laid down in the
> old thread. This here is imo no-go with all the core changes, and even
> just done within i915 I think it's highly dubious that it provides a
> real benefit, since defacto it means we'll have to abandon the atomic
> helpers entirely.

Now I think you're just being difficult for the sake of it. Have you
looked at the code at all? It's fully done from the atomic helpers
right now. And even moving the committed state tracking to i915
wouldn't require abandoning the helpers. We could just update the
committed state pointers when we call hw_done(), and we'd have to
update the state seqno/age timestamp when we call swap_state().
That's all there is to this.

-- 
Ville Syrjälä
Intel OTC