[Intel-gfx] [PATCH] drm/i915/ppgtt: Limit guilty hunt inside of relevant vm

Fri Jan 17 15:37:57 CET 2014

On Fri, Jan 17, 2014 at 04:29:31PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > On Fri, Jan 17, 2014 at 12:03:24PM +0200, Mika Kuoppala wrote:
> >> With full ppgtt, ACTHD is only relevant inside one context
> >> (address space). Trying to find the guilty batch by relying
> >> only on ACTHD results in false positives, as ACTHD can point
> >> inside batches in different address spaces.
> >> 
> >> Filter out unrelated contexts by checking which vm the
> >> ring was running on when the hang happened. Only after
> >> finding the relevant vm, use ACTHD to find the guilty
> >> batch inside it.
> >
> > Alternatively (or in addition) you could walk the request
> > list backwards and stop searching for guilty requests after
> > the first hit.
> 
> I took this idea and posted a patchset as a separate thread.
> 
> The approach you suggested feels more 'right' as it is a lot
> less complex and we need neither ACTHD nor knowledge about
> address spaces to find the guilty request.
> 
> The only drawback I can think of now is that if the GPU hangs
> just after writing the seqno to the hardware status page, we end
> up blaming the wrong request. But if this is a problem we could
> double check with ACTHD that they point to the same req.

In that eventuality, neither batch is guilty. Instead it is the driver
that is at fault for not working around the broken hardware.

Not sure how we would debug that other than through random trial and
error.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
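
For readers following the thread, here is a minimal sketch of the seqno-based search discussed above. It is not the i915 code or the posted patchset: struct request, seqno_passed() and find_guilty() are hypothetical names, and it assumes the request array is ordered oldest to newest and that completed_seqno is the last value read back from the hardware status page. Walking from the newest request towards the oldest and stopping at the first completed one leaves the oldest incomplete request as the candidate, without consulting ACTHD or any address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified request: only the seqno matters here. */
struct request {
	uint32_t seqno;
};

/* Wrap-safe test: true if seqno a is at or after seqno b. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * reqs[] is assumed to be ordered oldest to newest. Walk it backwards
 * and stop at the first request the hardware has already completed.
 * The last incomplete request seen before stopping is the oldest
 * incomplete one, which is the candidate to blame. Returns NULL if
 * every request has completed (or the list is empty).
 */
static const struct request *
find_guilty(const struct request *reqs, size_t count,
	    uint32_t completed_seqno)
{
	const struct request *guilty = NULL;

	for (size_t i = count; i-- > 0; ) {
		if (seqno_passed(completed_seqno, reqs[i].seqno))
			break;
		guilty = &reqs[i];
	}

	return guilty;
}
```

With requests 10, 11, 12, 13 queued and a completed seqno of 11, the sketch picks request 12. This also shows the corner case raised in the thread: if the hang happened just after seqno 11 was written but before request 12 really started executing, request 12 is still the one selected.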