[Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
Jesse Barnes
jbarnes at virtuousgeek.org
Thu May 21 13:29:25 PDT 2015
On 05/21/2015 06:00 AM, Chris Wilson wrote:
> On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
>> On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
>>> On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
>>>> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
>>>>> With the advent of mmap(wc), we have a path to write directly into
>>>>> active GPU buffers. When combined with async updates (i.e. avoiding the
>>>>> explicit domain management along with the memory barriers and GPU
>>>>> stalls) we start to see the GPU read the wrong values from memory - i.e.
>>>>> we have insufficient memory barriers along the execbuffer path. Writes
>>>>> through the GTT should have been naturally serialised with execution
>>>>> through the GTT as well and so the impact only seems to be from the WC
>>>>> paths.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Cc: Akash Goel <akash.goel at intel.com>
>>>>> Cc: stable at vger.kernel.org
>>>>
>>>> Do we have a nasty igt for this? Bugzilla?
>>>
>>> I've added igt/gem_streaming_writes.
>>>
>>> That wmb() is not enough for !llc. Since the wmb() made piglit happy it
>>> is quite possible I haven't hit the same path exactly, but it's going to
>>> take some investigation to see if igt/gem_streaming_writes can possibly
>>> work on !llc.
>>
>> Humbug.
>>
>> Found the bug in gem_streaming_writes, even though I still think the
>> wmb() is strictly required, it runs fine without (presumably I haven't
>> managed to avoid all barriers in the execbuffer path yet). However, I
>> think I can improve the stress by inserting extra gpu load -- that should
>> help make the CPU writes / GPU reads of the buffer concurrent?
>
> Just a small update. I haven't found a way to reproduce this in igt yet,
> but I can still observe the effect using vbo-map-unsync and the fix
> there is the above patch to make the wmb() unconditional.
>
> We need to put this into stable@ reasonably quickly (I suspect some of
> the 4.0 mmap(wc) regressions are due to this as well).

So the symptom is that the GPU picks up older values from memory, and
your theory is that the wmb() kicks the values out of the store buffer
or WC buffer prior to the execbuf?

I think that's reasonable, and I'm hoping "globally visible" is spec'd
to include the GPU and other system agents in the sfence definition.
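
For illustration only, the ordering under discussion can be sketched in
userspace C as below. This is not the i915 code: write_barrier(),
fill_and_submit() and sum_buffer() are hypothetical names, the "GPU" is
just a callback that reads the buffer, and ordinary cached memory stands
in for a WC mapping. The sketch assumes GCC or Clang; on x86-64 it issues
an sfence, which is what wmb() expands to there, and elsewhere it falls
back to a full compiler-provided barrier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's wmb(): sfence on x86-64,
 * a full barrier from the compiler builtin on other targets. */
static inline void write_barrier(void)
{
#if defined(__x86_64__)
	__asm__ __volatile__("sfence" ::: "memory");
#else
	__sync_synchronize();
#endif
}

/* Hypothetical consumer, standing in for the GPU reading the buffer. */
typedef uint32_t (*submit_fn)(const volatile uint32_t *buf, size_t n);

/* Fill the buffer, flush the writes, then hand it to the consumer. */
static uint32_t fill_and_submit(volatile uint32_t *buf, size_t n,
				uint32_t value, submit_fn submit)
{
	for (size_t i = 0; i < n; i++)
		buf[i] = value;

	/* Issued unconditionally, not only when a domain change is
	 * tracked, so earlier stores are drained before submission. */
	write_barrier();

	return submit(buf, n);
}

/* Example consumer: sums the buffer contents. */
static uint32_t sum_buffer(const volatile uint32_t *buf, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}
```

Calling fill_and_submit(buf, 8, 3, sum_buffer) on an 8-entry buffer
returns 24. A single-threaded run like this cannot show the stale-read
symptom itself; it only shows where the barrier sits relative to the
writes and the submission.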
Jesse