[Intel-gfx] [PATCH 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem

Chris Wilson chris at chris-wilson.co.uk
Wed Sep 16 01:17:53 PDT 2015


On Tue, Sep 15, 2015 at 09:30:20PM +0300, Imre Deak wrote:
> The execlist context object is mapped with a CPU/GPU coherent mapping
> everywhere, but on BXT A stepping a HW issue means the coherency is not
> guaranteed. To work around this, flush the CPU cache after any change
> the CPU makes to the context object. Note that this also includes
> changes made by the VM core rather than the driver, when reading from
> backing store or bzeroing the pages.
> 
> I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> opcode value. I couldn't find this value on the ring but looking at the
> contents of the active context object it turned out to be a parameter
> dword of a bigger command there. The original command opcode itself
> was zeroed out; based on the above, I assume this was due to a CPU
> writeback of the corresponding cacheline. When restoring the context,
> the GPU would jump over the zeroed-out opcode and hang when trying to
> execute the above parameter dword.
> 
> I could easily reproduce this by running igt/gem_render_copy_redux and
> gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> anything involving frequent switches between two separate contexts. With
> this workaround I couldn't reproduce the problem.
> 
> Note that I also considered using set_pages_array_uc/wc on the context
> object but this wouldn't work with kmap_atomic which always returns a WB
> mapping, at least on HIGHMEM. The alternative would be keeping a UC/WC
> kernel mapping around whenever the context object is pinned, but this
> would be a bigger change. Since I'm not sure if there would be any
> benefit in using set_pages_array, I chose the simpler clflush method.

Nope. Fix execlists to use correct GEM domain management. From
experience the whole context object needs to be flushed if no longer
coherent.
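
For illustration only, a rough userspace sketch of what flushing a whole
object at cacheline granularity looks like. This is not the i915 code from
the patch: flush_range() and CACHELINE_SIZE are hypothetical names, the
64-byte line size is an assumption, and on targets without SSE2 the loop
only counts lines instead of flushing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_clflush(), _mm_mfence() */
#endif

/* Assumption: 64-byte cachelines, as on current x86 parts. */
#define CACHELINE_SIZE 64u

/*
 * Hypothetical helper: flush every cacheline overlapping
 * [addr, addr + len) and return how many lines were visited.
 * The fences order the flushes against surrounding stores.
 */
static size_t flush_range(const void *addr, size_t len)
{
	uintptr_t start, end, p;
	size_t lines = 0;

	assert(addr != NULL);
	if (len == 0)
		return 0;

	/* Round down so a partially covered first line is included. */
	start = (uintptr_t)addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
	end = (uintptr_t)addr + len;

#if defined(__SSE2__)
	_mm_mfence();
#endif
	for (p = start; p < end; p += CACHELINE_SIZE) {
#if defined(__SSE2__)
		_mm_clflush((const void *)p);
#endif
		lines++;
	}
#if defined(__SSE2__)
	_mm_mfence();
#endif
	return lines;
}
```

Passing the base address and full size of the object covers every line,
which is the "whole context object" case; a smaller range only covers the
lines that range overlaps.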

Are you absolutely sure that you want to enable snooping on those pages
since that historically would be bogus? I would expect some strong
bspec reference saying that it is legal.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

