[Intel-gfx] [PATCH 1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno

Imre Deak imre.deak at intel.com
Wed Jun 10 08:26:58 PDT 2015


On ke, 2015-06-10 at 08:10 -0700, Jesse Barnes wrote:
> On 06/10/2015 03:59 AM, Imre Deak wrote:
> > I think the discussion here is about two separate things:
> > 1. Possible ordering issue between the seqno store and the completion
> > interrupt
> > 2. Coherency issue that leaves the CPU with a stale view of the seqno
> > indefinitely, which this patch works around
> > 
> > I'm confident that in my case the problem is not due to ordering. If it
> > was "only" ordering then the value would show up eventually. This is not
> > the case though, __wait_for_request will see the stale value
> > indefinitely even though it gets woken up periodically afterwards by the
> > lost IRQ logic (with hangcheck disabled).
> 
> Yeah, based on your workaround it sounds like the write from the CS is
> landing in memory but failing to invalidate the associated CPU
> cacheline.  I assume mapping the HWSP as uncached also works around this
> issue?

I assume it would, but it would of course have a bigger overhead. Based
on my testing the coherency problem happens only occasionally, so for
the rest of the times we still would want to benefit from cached reads.
See especially __i915_spin_request().

> I wonder if this is just an issue with GGTT mappings on BXT.  If we had
> per context HSWPs using PPGTT (and maybe even 48 bit PPGTT) mappings,
> the snoop may be performed correctly...  Looks like we don't have a
> store_dword variant for explicit coherent or incoherent buffer writes
> (though it does test PPGTT writes at least); that would make this easy
> to test.

Well, I tried to play around with the GGTT PTE/PAT flags to see if we're
not setting something or there is a bit that is significant despite the
specification. The spec says it's only the snoop flag that matters and
everything maps to PAT index 0. That looks correct in our code. In any
case I would expect that the MI_STORE_DATA_IMM would be coherent since
this is explicitly stated in the spec.

I also added a new testcase to gem_storedw_batches_loop that tests the
same thing via PPGTT mappings, it fails the same way.

--Imre



More information about the Intel-gfx mailing list