[Intel-gfx] [PATCH 24/51] drm/i915: Update deferred context creation to do explicit request management

Fri Feb 27 05:50:54 PST 2015

On Fri, Feb 27, 2015 at 12:45:19PM +0000, John Harrison wrote:
> On 25/02/2015 21:15, Daniel Vetter wrote:
> >On Wed, Feb 18, 2015 at 03:27:38PM +0000, John Harrison wrote:
> >>On 13/02/2015 12:15, Chris Wilson wrote:
> >>>On Fri, Feb 13, 2015 at 11:48:33AM +0000, John.C.Harrison at Intel.com wrote:
> >>>>From: John Harrison <John.C.Harrison at Intel.com>
> >>>>
> >>>>In execlist mode, context initialisation is deferred until first use of the
> >>>>given context. This is because execlist mode has many more contexts than legacy
> >>>>mode and many are never actually used.
> >>>That's not correct. There are no more contexts in execlists than legacy.
> >>>There are more ringbuffers, or rather the contexts have an extra state
> >>>object associated with them.
> >>Okay, I should have said sub-contexts. Or context state objects. Or
> >>something.
> >per-engine ctx state? Naming stuff is hard ;-)
> >
> >>>>>Previously, the initialisation commands
> >>>>were written to the ring and tagged with some random request structure via the
> >>>>OLR. This seemed to be causing a null pointer deference bug under certain
> >>>>circumstances (BZ:40112).
> >>>>
> >>>>This patch adds explicit request creation and submission to the deferred
> >>>>initialisation code path. Thus removing any reliance on or randomness caused by
> >>>>the OLR.
> >>>This is upside down though. The request should be referencing the
> >>>context (thus instantiating it on demand) and nothing in the context
> >>>allocation requires the request. The initialisation here should be during
> >>>i915_request_switch_context(), since it can be entirely shared with
> >>>legacy.
> >>>-Chris
> >>The request does reference the context - the alloc_reques() function takes a
> >>context object as a parameter. Thus it is impossible for the request to be
> >>used/supplied/required during context creation. The issue here is the lazy
> >>initialisation of the per ring context state which requires sending commands
> >>to the ring on first usage of the given context object on the given ring.
> >>
> >>One problem is that the initialisation request and the batch buffer request
> >>cannot be merged at the moment. They both use request->batch_obj for
> >>tracking the command object. Thus this patch only works due to the deferred
> >>intialisation occurring during the i915_gem_validate_context() call very
> >>early on in execbuffer() rather than as part of the context switch within
> >>the batch buffer execution which is much later.
> >My request struct doesn't have a batch_obj pointer. Where is that from and
> >why do we need it? Atm just chasing Chris' comments, haven't read the full
> >series yet.
> It should do! It was added way back in June 2013 by Mika in 'drm/i915: add
> batch bo to i915_add_request()'. So unless someone has removed it again
> since I last fetched a tree, you should definitely have it.

I must have been blind. Now I've found both the batch_obj and how it's
used in the renderstate. And that usage looks wrong since
request->batch_obj is used to assign blame to userspace batches for
arb_robustness gpu hang stats. The render state init batch is created by
the kernel, per definition userspace can't be responsible for hangs in
there.

Imo we should replace the batch obj argument in i915_gem_render_state_init
with NULL.

Would this allow you to resolve the inversion Chris brought up? Or is
there something else which would prevent us from putting the deferred
render ring init into the same request as the first execbuf?

> >>I'm not sure what you mean by i915_request_switch_context(). The existing
> >>i915_switch_context() does now take just a request structure rather than a
> >>ring/ringbuf/context mixture. However, it is not really a good idea to do
> >>the context switch automatically as part of creating the request. The
> >>request creation and request execution could be quite separated in time,
> >>especially with a scheduler.
> >>
> >>It should be possible to move the deferred initialisation within the context
> >>switch if the object tracking can be resolved. Thus they could share the
> >>same request and there would not be effectively two separate execution calls
> >>at the hardware level. Again, that's potentially work that could be done as
> >>a follow up task of improving the context management independent of the
> >>current task of removing the OLR.
> >I think the biggest risk with adding a separate request for the lrc
> >deferred init is in accidentally nesting request when someone moves around
> >the lrc validation. Atm it's at the top of execbuf but we tend to shuffle
> >things around a lot.
> >
> >Is there some simple WARN_ON we could smash into the alloc function to
> >make sure this never happens? ring->olr would be it, but since we want to
> >kill that that's not great. Or do I see risks which aren't really there?
> >-Daniel
> 
> There is a WARN_ON that the batch object in the request structure does not
> get overwritten. Although, that could only happen if the lazy setup and the
> batch buffer execution were sharing the same request. Nesting multiple
> requests shouldn't really be a problem. Without the OLR, there is no
> globally shared state. Having multiple requests being created in parallel is
> fine. The lazy context setup must be completed before the execbuffer is
> executed otherwise Bad Things are going to happen irrespective of request
> usage. So even if the execbuffer request is created first, the lazy setup
> will still happen at the right time without breaking the execbuffer's
> request.

Ok, sounds like we'd be covered. I just wanted to make sure we're not
ignoring potential issues here, but if requests are free-standing already
as-is (well after this series) we're hopefully fine.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch