[Intel-gfx] [PATCH v2] drm/i915: Restore inhibiting the load of the default context

Daniel Vetter daniel at ffwll.ch
Thu Dec 10 06:03:36 PST 2015


On Thu, Dec 10, 2015 at 01:37:56PM +0000, Chris Wilson wrote:
> On Thu, Dec 10, 2015 at 03:24:52PM +0200, Francisco Jerez wrote:
> > Mika Kuoppala <mika.kuoppala at linux.intel.com> writes:
> > 
> > > Chris Wilson <chris at chris-wilson.co.uk> writes:
> > >
> > >> Following a GPU reset, we may leave the context in a poorly defined
> > >> state, and reloading from that context will leave the GPU flummoxed. For
> > >> secondary contexts, this will lead to that context being banned - but
> > >> currently it is also causing the default context to become banned,
> > >> leading to turmoil in the shared state.
> > >>
> > >> This is a regression from
> > >>
> > >> commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> > >> Author: Ben Widawsky <benjamin.widawsky at intel.com>
> > >> Date:   Mon Mar 16 16:00:58 2015 +0000
> > >>
> > >>     drm/i915: Initialize all contexts
> > >>
> > >> which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> > >> default context.
> > >>
> > 
> > AFAICT the removal of MI_RESTORE_INHIBIT in that commit seemed
> > justified.  Ben explained that it was needed to fix a pagefault in the
> > default context under certain conditions.  I don't know the details of
> > the original change (Ben CC'ed), but it seems like this would be trading
> > one bug for another?
> 
> A bug in a feature that never worked and isn't enabled?
>  
> > Other than that this opens the door again to subtle state leaks between
> > processes, as I learned recently while implementing L3 partitioning in
> > Mesa.  Mesa now changes the L3 configuration in ways that will break
> > assumptions from processes that use the default context (the DDX).  The
> > DDX assumes, for instance, that the URB size is set according to the
> > hardware defaults, but it doesn't actually program the URB size itself,
> > so in a way the DDX relies on MI_RESTORE_INHIBIT *not* to be set for the
> > default context for correct operation.  This commit breaks that
> > assumption and the kernel ABI.
> 
> Wrong the ABI is the other way around and has been for 10 years. Note
> the kernel isn't also ensuring that the default context has the default
> hardware values either. The "golden renderstate" also doesn't set all
> defaults and is also a recent introduction.
> 
> It changes the ABI back to what it was....

Well it seems folks like the new ABI, even if somewhat accidental. And
more isolation between clients can't hurt I think (well it does hurt perf
somewhat). So I guess we need a respun patch that just makes sure that we
restore the default context stuff after reset, probably by marking it as
unitialized? And then just the one the kernel has I guess?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list