[Intel-gfx] [PATCH 07/32] drm/i915: Store the reset counter when constructing a request
Chris Wilson
chris at chris-wilson.co.uk
Wed Dec 16 02:19:31 PST 2015
On Wed, Dec 16, 2015 at 10:44:19AM +0100, Daniel Vetter wrote:
> On Fri, Dec 11, 2015 at 11:33:03AM +0000, Chris Wilson wrote:
> > As the request is only valid during the same global reset epoch, we can
> > record the current reset_counter when constructing the request and reuse
> > it when waiting upon that request in future. This removes a very hairy
> > atomic check serialised by the struct_mutex at the time of waiting and
> > allows us to transfer those waits to a central dispatcher for all
> > waiters and all requests.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>
> "global reset epoch" got me thinking about what the implications for TDR
> are. Now the current TDR patches iirc are still somewhat tacked on the
> side of the existing reset handling, and so don't end up incrementing the
> reset counter.
>
> But I don't agree with that design decision, and I think any reset must
> increment the counter (and do the reset-in-progress flag dance), since
> that's the only way to guarantee that waiters will get off locks. Even
> though we have fewer of those with every kernel release.
>
> But then there's the problem that at least for some requests they'll still
> be valid after the reset, and so would break with this design here. That can
> easily be fixed by incrementing the reset epoch for every request we
> decide should keep running (or which gets re-scheduled/emitted for another
> attempt), and doing that explicitly seems like a good idea.
>
> The only implication I see is that for per-ring reset we might want to go
> to a per-engine reset epoch counter.
Yes, I think per-engine epochs will be required. (I've been trying to
choose my words carefully so that it is clear that this only applies to
today and not to a TDR future!)
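
To make that concrete, here is a rough standalone sketch of what per-engine
epochs could look like (all the names below are invented for illustration,
this is not i915 code):

#include <stdbool.h>

/* Standalone model of per-engine reset epochs; not i915 code. */
struct sketch_engine {
	unsigned int reset_counter;	/* bumped on every reset of this engine */
};

struct sketch_request {
	struct sketch_engine *engine;
	unsigned int reset_counter;	/* engine epoch sampled at construction */
};

/* Sample the epoch once, when the request is constructed... */
static void sketch_request_init(struct sketch_request *req,
				struct sketch_engine *engine)
{
	req->engine = engine;
	req->reset_counter = engine->reset_counter;
}

/* ...so a waiter only compares its private copy against the engine's
 * current counter; a reset on a different engine leaves this wait alone. */
static bool sketch_request_hit_reset(const struct sketch_request *req)
{
	return req->reset_counter != req->engine->reset_counter;
}

With the epoch stored in the request (and kept per engine), the wait path
does not need the struct_mutex to decide whether its epoch has gone stale.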
> So given all that I think this is a solid idea. But one comment below as a
> fallout.
>
> > ---
> > drivers/gpu/drm/i915/i915_drv.h | 2 +-
> > drivers/gpu/drm/i915/i915_gem.c | 40 +++++++++++----------------------
> > drivers/gpu/drm/i915/intel_display.c | 7 +-----
> > drivers/gpu/drm/i915/intel_lrc.c | 7 ------
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 6 -----
> > 5 files changed, 15 insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 1043ddd670a5..f30c305a6889 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2178,6 +2178,7 @@ struct drm_i915_gem_request {
> > /** On Which ring this request was generated */
> > struct drm_i915_private *i915;
> > struct intel_engine_cs *ring;
> > + unsigned reset_counter;
> >
> > /** GEM sequence number associated with the previous request,
> > * when the HWS breadcrumb is equal to this the GPU is processing
> > @@ -3059,7 +3060,6 @@ void __i915_add_request(struct drm_i915_gem_request *req,
> > #define i915_add_request_no_flush(req) \
> > __i915_add_request(req, NULL, false)
> > int __i915_wait_request(struct drm_i915_gem_request *req,
> > - unsigned reset_counter,
> > bool interruptible,
> > s64 *timeout,
> > struct intel_rps_client *rps);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 27e617b76418..b17cc0e42a4f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1214,7 +1214,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> > /**
> > * __i915_wait_request - wait until execution of request has finished
> > * @req: duh!
> > - * @reset_counter: reset sequence associated with the given request
> > * @interruptible: do an interruptible wait (normally yes)
> > * @timeout: in - how long to wait (NULL forever); out - how much time remaining
> > *
> > @@ -1229,7 +1228,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> > * errno with remaining time filled in timeout argument.
> > */
> > int __i915_wait_request(struct drm_i915_gem_request *req,
> > - unsigned reset_counter,
> > bool interruptible,
> > s64 *timeout,
> > struct intel_rps_client *rps)
> > @@ -1288,7 +1286,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> >
> > /* We need to check whether any gpu reset happened in between
> > * the caller grabbing the seqno and now ... */
> > - if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
> > + if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
>
> READ_ONCE(req->reset_counter) as future-proofing for TDR, maybe? Or punt
> that to TDR? It could be confusing to READ_ONCE something that is (with the
> current code at least) invariant.
Right, my plan was to solve this in TDR! At the moment, I am thinking
that we generate a new request for each one that is resubmitted (which
requires patching the seqno embedded in the ring before restarting the
ring), but otherwise we can just duplicate the contents of the original
request.
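
As a rough standalone sketch of that plan (again with invented names, not
i915 code): rather than patching the surviving request in place, its
contents are copied into a fresh request that naturally samples the
post-reset epoch.

#include <stdlib.h>

/* Same invented types as the earlier sketch, repeated so this stands
 * alone; none of this is i915 code. */
struct sketch_engine {
	unsigned int reset_counter;
};

struct sketch_request {
	struct sketch_engine *engine;
	unsigned int reset_counter;
};

/* Duplicate a surviving request after a reset: the copy samples the
 * engine's post-reset epoch, while the original keeps its stale epoch
 * and so releases any waiters via the reset check. */
static struct sketch_request *
sketch_request_resubmit(const struct sketch_request *old)
{
	struct sketch_request *req = malloc(sizeof(*req));

	if (!req)
		return NULL;

	*req = *old;					 /* duplicate the contents */
	req->reset_counter = old->engine->reset_counter; /* new epoch */

	return req;
}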
-Chris
--
Chris Wilson, Intel Open Source Technology Centre