[Intel-gfx] [CI 3/6] drm/i915: Stop the machine whilst capturing the GPU crash dump

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 13 15:04:26 UTC 2016


On Thu, Oct 13, 2016 at 04:57:39PM +0200, Daniel Vetter wrote:
> On Wed, Oct 12, 2016 at 10:05:19AM +0100, Chris Wilson wrote:
> > The error state is purposefully racy as we expect it to be called at any
> > time and so have avoided any locking whilst capturing the crash dump.
> > However, with multi-engine GPUs and multiple CPUs, those races can
> > manifest as OOPSes when we chase dangling pointers freed on other
> > CPUs. Many ways to slow down normal operation in order to protect the
> > post-mortem error capture are under discussion, but what if we take
> > the opposite approach and freeze the machine whilst the error capture
> > runs (note the GPU may still be running, but as long as we don't
> > process any of the results the driver's bookkeeping will be static).
> > 
> > Note that, by itself, this is not a complete fix. It also depends on
> > the compiler barriers in list_add/list_del to prevent traversing the
> > lists into the void. We also depend on requiring state only from
> > carefully controlled sources, i.e. all the state we require for
> > post-mortem debugging should be reachable from the request itself, so
> > that we only have to worry about retrieving the request carefully. Once
> > we have the request, we know that all pointers from it are intact.
> > 
> > v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
> > stop_machine() itself for its wbinvd fallback.
> 
> Hm, won't this hurt us really badly on any Atom with ppgtt? Maybe a big
> gen check with a scary comment about why we can't call drm_clflush_pages()
> on old machines? IIRC gen3+ should all be able to flush without
> stop_machine().

:) Patch 2 switched to using coherent reads through the GTT for all platforms.
Everyone is now equal (and the nice part about that was that it
uncovered the WC bug from kernel 4.0!)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
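A minimal sketch of the freeze-then-capture idea, written as a userspace C analogue rather than the i915 code. In the patch, stop_machine() runs the capture while every other online CPU is held, so nothing can retire and free a request mid-walk. Here, as an assumption, pthreads stand in for the engines/CPUs and a per-engine linked list stands in for the request lists; every name (struct engine, engine_step(), capture_error_state(), run_demo(), NENGINES, MAX_INFLIGHT) is hypothetical. Each engine thread mutates its own list without locking and checks a flag at a safe point between updates; the capturer raises the flag, waits until every engine has parked, walks all the lists, then releases them. This cooperative safe-point scheme is a swapped-in technique: the kernel's stop_machine() instead schedules a high-priority stopper thread on every online CPU.

```c
/* Userspace analogue of freezing the machine; all names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#define NENGINES 4
#define MAX_INFLIGHT 8
struct request { int seqno; struct request *next; };
struct engine {                  /* written only by the owning engine thread */
    struct request *head, *tail; /* in-flight requests, oldest first */
    int submitted, retired;      /* list holds seqnos retired+1 .. submitted */
};
static struct engine engines[NENGINES];
static pthread_mutex_t freeze_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t parked_cv = PTHREAD_COND_INITIALIZER;
static pthread_cond_t resume_cv = PTHREAD_COND_INITIALIZER;
static atomic_bool stop_requested, done;
static int parked;               /* engines parked at a safe point; protected by freeze_lock */
/* Normal operation: submit a request, retire the oldest if too many are in flight. */
static void engine_step(struct engine *e)
{
    struct request *rq = calloc(1, sizeof(*rq));
    if (!rq) abort();
    rq->seqno = ++e->submitted;
    *(e->tail ? &e->tail->next : &e->head) = rq; /* append at the tail */
    e->tail = rq;
    if (e->submitted - e->retired > MAX_INFLIGHT) {
        struct request *old = e->head;
        e->head = old->next;     /* never empties: MAX_INFLIGHT requests remain */
        e->retired++;
        free(old);
    }
}

static void *engine_thread(void *arg)
{
    struct engine *e = arg;
    while (!atomic_load(&done)) {
        if (atomic_load(&stop_requested)) { /* safe point: fast path is one load */
            pthread_mutex_lock(&freeze_lock);
            parked++;
            pthread_cond_signal(&parked_cv);
            while (atomic_load(&stop_requested))
                pthread_cond_wait(&resume_cv, &freeze_lock);
            parked--;
            pthread_mutex_unlock(&freeze_lock);
        }
        engine_step(e);
    }
    return NULL;
}

/* Park every engine, check each list against its counters, then resume them. */
static bool capture_error_state(void)
{
    bool consistent = true;
    pthread_mutex_lock(&freeze_lock);
    atomic_store(&stop_requested, true);
    while (parked < NENGINES)
        pthread_cond_wait(&parked_cv, &freeze_lock);
    assert(parked == NENGINES);
    for (int i = 0; i < NENGINES; i++) { /* bookkeeping is static here */
        const struct engine *e = &engines[i];
        int expect = e->retired + 1;
        for (const struct request *rq = e->head; rq; rq = rq->next)
            consistent &= rq->seqno == expect++;
        consistent &= expect == e->submitted + 1;
    }
    atomic_store(&stop_requested, false);
    pthread_cond_broadcast(&resume_cv);
    pthread_mutex_unlock(&freeze_lock);
    return consistent;
}

/* Run the engines, take ncaptures snapshots, report whether all were consistent. */
bool run_demo(int ncaptures)
{
    pthread_t tids[NENGINES];
    bool ok = true;
    memset(engines, 0, sizeof(engines));
    atomic_store(&done, false);
    for (int i = 0; i < NENGINES; i++)
        if (pthread_create(&tids[i], NULL, engine_thread, &engines[i]) != 0)
            abort();
    for (int n = 0; n < ncaptures; n++)
        ok = capture_error_state() && ok;
    atomic_store(&done, true);
    for (int i = 0; i < NENGINES; i++) {
        pthread_join(tids[i], NULL);
        for (struct request *rq = engines[i].head, *next; rq; rq = next) {
            next = rq->next;     /* free whatever is still in flight */
            free(rq);
        }
    }
    return ok;
}
```

Calling run_demo(200) from a main() should return true, since every capture sees each list holding exactly the sequence numbers retired+1 through submitted. Normal operation pays only one atomic load per step, while the capture stalls every engine until it finishes, which is the trade-off the patch makes. The sketch also hints at why v2 keeps drm_clflush_pages() out of stop_machine(): calling capture_error_state() again from inside a capture would try to freeze threads that are already frozen (here it would relock the default, non-recursive pthread mutex, which is undefined behaviour).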

