[Intel-gfx] [PATCH] drm/i915: Prevent TLB error on first execution on SNB

Daniel Vetter daniel at ffwll.ch
Mon Feb 23 14:54:41 PST 2015


On Fri, Feb 13, 2015 at 02:12:48PM +0000, Chris Wilson wrote:
> On Fri, Feb 13, 2015 at 02:43:40PM +0100, Daniel Vetter wrote:
> > On Fri, Feb 13, 2015 at 12:59:45PM +0000, Chris Wilson wrote:
> > > Long ago I found that I was getting sporadic errors when booting SNB,
> > > with the symptom being that the first batch died with IPEHR != *ACTHD,
> > > typically caused by the TLB being invalid. These magically disappeared
> > > if I held the forcewake during the entire ring initialisation sequence.
> > > (It can probably be shortened to a short critical section, but the whole
> > > initialisation is full of register writes and so we would be taking and
> > > releasing forcewake almost continually, and so holding it over the
> > > entire sequence will probably be a net win!)
> > > 
> > > Note that some of the kernels on which I encountered the issue already
> > > had the deferred forcewake release, so it is still relevant.
> > > 
> > > I know that there have been a few other reports with similar failure
> > > conditions on SNB, such as (I think)
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=80913
> > > 
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > Given that we've already added a forcewake critical section around
> > individual ring inits this makes maybe a bit too much sense. But I do
> > wonder whether we don't need the same for resume and gpu resets?
> > 
> > With the split into hw/sw setup we could get that by pushing the
> > forcewake_get/put into i915_gem_init_hw. Does the magic still work with
> > that? And if we put it there, the fw_get/put in init_ring_common is fully
> > redundant and could be removed.
> 
> Hmm, my original thought was to keep the engine alive from the first
> programming of CTL up until we fed in the first request (which is the
> ppgtt/context init). We can add a second forcewake layer into init_hw to
> give the same security blanket for resume/reset. Sound reasonable?

With the split into sw/hw setup init_hw should be all that's needed for
coverage, nothing touches the hw outside of it. Hence I think the original
outer layer is redundant. Does the magic still work if we drop that part,
or have I missed some hw access (there really shouldn't be any)?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

