[Intel-gfx] [PATCH] drm/i915: Prevent recursion by retiring requests when the ring is full

Thu Feb 6 17:44:05 CET 2014

On Tue, Jan 28, 2014 at 11:25:41AM +0000, Chris Wilson wrote:
> On Tue, Jan 28, 2014 at 01:15:35PM +0200, Ville Syrjälä wrote:
> > On Mon, Jan 27, 2014 at 10:43:07PM +0000, Chris Wilson wrote:
> > > As the VM do not track activity of objects and instead use a large
> > > hammer to forcibly idle and evict all of their associated objects when
> > > one is released, it is possible for that to cause a recursion when we
> > > need to wait for free space on a ring and call retire requests.
> > > (intel_ring_begin -> intel_ring_wait_request ->
> > > i915_gem_retire_requests_ring -> i915_gem_context_free ->
> > > i915_gem_evict_vm -> i915_gpu_idle -> intel_ring_begin etc)
> > > 
> > > In order to remove the requirement for calling retire-requests from
> > > intel_ring_wait_request, we have to inline a couple of steps from
> > > retiring requests, notably we have to record the position of the request
> > > we wait for and use that to update the available ring space.
> > > 
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > Looks good to me.
> > Reviewed-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > 
> > I do have a couple of questions about request->tail though.
> > 
> > We set it to -1 in intel_ring_wait_request(). Isn't that going to cause
> > problems for i915_request_guilty()?
> > 
> > When not -1, request->tail points to just before the commands that
> > .add_request() adds to the ring. So that means intel_ring_wait_request()
> > might have to wait for one extra request, and I guess more importantly
> > if the GPU hangs inside the .add_request() commands, we won't attribute
> > the hang to the request in question. Was it designe to be that way, or
> > is there a bug here?
> 
> Ah good questions. I completely forgot about the behaviour here when we
> adjusted this for hangstats...
> 
> Setting it to -1 should not confuse the guilty search since that should
> be done (or at least changed so that is) based on a completion search.
> After making that change, we should be able to set request->tail back to
> being just after the request. Also I think we are quite safe to drop the
> manipulation of request->tail inside intel_ring_wait_request().

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch