[Intel-gfx] [PATCH] drm/i915: Prevent recursion by retiring requests when the ring is full

Chris Wilson chris at chris-wilson.co.uk
Tue Jan 28 12:25:41 CET 2014


On Tue, Jan 28, 2014 at 01:15:35PM +0200, Ville Syrjälä wrote:
> On Mon, Jan 27, 2014 at 10:43:07PM +0000, Chris Wilson wrote:
> > As the VM do not track activity of objects and instead use a large
> > hammer to forcibly idle and evict all of their associated objects when
> > one is released, it is possible for that to cause a recursion when we
> > need to wait for free space on a ring and call retire requests.
> > (intel_ring_begin -> intel_ring_wait_request ->
> > i915_gem_retire_requests_ring -> i915_gem_context_free ->
> > i915_gem_evict_vm -> i915_gpu_idle -> intel_ring_begin etc)
> > 
> > In order to remove the requirement for calling retire-requests from
> > intel_ring_wait_request, we have to inline a couple of steps from
> > retiring requests, notably we have to record the position of the request
> > we wait for and use that to update the available ring space.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> 
> Looks good to me.
> Reviewed-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> 
> I do have a couple of questions about request->tail though.
> 
> We set it to -1 in intel_ring_wait_request(). Isn't that going to cause
> problems for i915_request_guilty()?
> 
> When not -1, request->tail points to just before the commands that
> .add_request() adds to the ring. So that means intel_ring_wait_request()
> might have to wait for one extra request, and I guess more importantly
> if the GPU hangs inside the .add_request() commands, we won't attribute
> the hang to the request in question. Was it designe to be that way, or
> is there a bug here?

Ah good questions. I completely forgot about the behaviour here when we
adjusted this for hangstats...

Setting it to -1 should not confuse the guilty search since that should
be done (or at least changed so that is) based on a completion search.
After making that change, we should be able to set request->tail back to
being just after the request. Also I think we are quite safe to drop the
manipulation of request->tail inside intel_ring_wait_request().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre



More information about the Intel-gfx mailing list