[Intel-gfx] [PATCH] drm/i915: Evict CS TLBs between batches

Mon Sep 8 12:36:30 CEST 2014

On Mon, Sep 08, 2014 at 09:15:50AM +0100, Chris Wilson wrote:
> On Mon, Sep 08, 2014 at 10:03:51AM +0200, Daniel Vetter wrote:
> > On Sun, Sep 07, 2014 at 09:08:31AM +0100, Chris Wilson wrote:
> > > Running igt, I was encountering the invalid TLB bug on my 845g, despite
> > > that it was using the CS workaround. Examining the w/a buffer in the
> > > error state, showed that the copy from the user batch into the
> > > workaround itself was suffering from the invalid TLB bug (the first
> > > cacheline was broken with the first two words reversed). Time to try a
> > > fresh approach. This extends the workaround to write into each page of
> > > our scratch buffer in order to overflow the TLB and evict the invalid
> > > entries. This could be refined to only do so after we update the GTT,
> > > but for simplicity, we do it before each batch.
> > > 
> > > I suspect this supersedes our current workaround, but for safety keep
> > > doing both.
> > 
> > I suspect that we might end up with just an elaborate delay
> > implementation, but if it works then it's good. One nitpick below, with
> > that addressed this is Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> One way to test that is simply comparing 64x4096 byte writes in the same
> page vs 64x4 byte writes in 64 different pages. That should be roughly
> the same latency (though with TLB fetches you can never be too sure) and
> demonstrate that it is either the TLB or the delay that's the factor.

Quick update:

I wrote 256k into a single page (instead of a 4-byte write into each of
64 pages), which should hopefully test the delay theory, and found that
it did not prevent the corruption/hang. That suggests the delay alone is
not what helps.
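
For anyone following along, here is a rough sketch of the two access
patterns being compared. It is plain userspace C, not the actual
workaround code: it only computes the offsets each variant would write
to. The 4096-byte page size and all names (write_plan, PAGE_SIZE_BYTES,
and so on) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size */
#define NUM_WRITES      64u   /* number of writes in either variant */

/* Hypothetical description of one test variant. */
struct write_plan {
	uint32_t offset[NUM_WRITES]; /* byte offset of each write in the scratch buffer */
	uint32_t len;                /* bytes written per write */
};

/* TLB variant: one 4-byte write at the start of each of 64 pages. */
static void plan_one_write_per_page(struct write_plan *p)
{
	p->len = 4;
	for (uint32_t i = 0; i < NUM_WRITES; i++)
		p->offset[i] = i * PAGE_SIZE_BYTES;
}

/* Delay variant: 64 full-page writes, all landing in the same page. */
static void plan_same_page(struct write_plan *p)
{
	p->len = PAGE_SIZE_BYTES;
	for (uint32_t i = 0; i < NUM_WRITES; i++)
		p->offset[i] = 0;
}

/* Count how many distinct pages a plan touches. */
static unsigned int distinct_pages(const struct write_plan *p)
{
	unsigned int count = 0;

	/* Each write must stay inside a single page for the count to hold. */
	assert(p->len <= PAGE_SIZE_BYTES);

	for (uint32_t i = 0; i < NUM_WRITES; i++) {
		uint32_t page = p->offset[i] / PAGE_SIZE_BYTES;
		int seen = 0;

		assert(p->offset[i] % PAGE_SIZE_BYTES == 0);
		for (uint32_t j = 0; j < i; j++)
			if (p->offset[j] / PAGE_SIZE_BYTES == page)
				seen = 1;
		if (!seen)
			count++;
	}
	return count;
}

/* Total number of bytes a plan writes. */
static uint32_t total_bytes(const struct write_plan *p)
{
	return NUM_WRITES * p->len;
}
```

The same-page variant moves far more data (so it should take at least as
long) while touching only one page, so if a pure delay were enough it
ought to have worked as well.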

Now trying to refine the estimate on the number of TLBs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre