[Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
Daniel Vetter
daniel at ffwll.ch
Sat Jun 8 23:22:00 CEST 2013
On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde <C.Emde at osadl.org> wrote:
> On 04/08/2013 11:54 AM, Daniel Vetter wrote:
>>
>> On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote:
>>>
>>> On Thu, 4 Apr 2013 21:31:03 +0100
>>> Chris Wilson <chris at chris-wilson.co.uk> wrote:
>>>
>>>> In order to fully serialize access to the fenced region and the update
>>>> to the fence register we need to take extreme measures on SNB+, and
>>>> manually flush writes to memory prior to writing the fence register in
>>>> conjunction with the memory barriers placed around the register write.
>>>>
>>>> Fixes i-g-t/gem_fence_thrash
>>>>
>>>> v2: Bring a bigger gun
>>>> v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
>>>> v4: Remove changes for working generations.
>>>> v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
>>>> v6: Rewrite comments to elide forgotten history.
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>> Cc: Jon Bloomfield <jon.bloomfield at intel.com>
>>>> Tested-by: Jon Bloomfield <jon.bloomfield at intel.com> (v2)
>>>> Cc: stable at vger.kernel.org
>>>> ---
>>>> drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++-----
>>>> 1 file changed, 23 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index fa4ea1a..8f7739e 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -2689,17 +2689,35 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
>>>>          return fence - dev_priv->fence_regs;
>>>>  }
>>>>
>>>> +static void i915_gem_write_fence__ipi(void *data)
>>>> +{
>>>> +        wbinvd();
>>>> +}
>>>> +
>>>>  static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
>>>>                                           struct drm_i915_fence_reg *fence,
>>>>                                           bool enable)
>>>>  {
>>>> -        struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
>>>> -        int reg = fence_number(dev_priv, fence);
>>>> -
>>>> -        i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
>>>> +        struct drm_device *dev = obj->base.dev;
>>>> +        struct drm_i915_private *dev_priv = dev->dev_private;
>>>> +        int fence_reg = fence_number(dev_priv, fence);
>>>> +
>>>> +        /* In order to fully serialize access to the fenced region and
>>>> +         * the update to the fence register we need to take extreme
>>>> +         * measures on SNB+. In theory, the write to the fence register
>>>> +         * flushes all memory transactions before, and coupled with the
>>>> +         * mb() placed around the register write we serialise all memory
>>>> +         * operations with respect to the changes in the tiler. Yet, on
>>>> +         * SNB+ we need to take a step further and emit an explicit wbinvd()
>>>> +         * on each processor in order to manually flush all memory
>>>> +         * transactions before updating the fence register.
>>>> +         */
>>>> +        if (HAS_LLC(obj->base.dev))
>>>> +                on_each_cpu(i915_gem_write_fence__ipi, NULL, 1);
>>>> +        i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL);
>>>>
>>>>          if (enable) {
>>>> -                obj->fence_reg = reg;
>>>> +                obj->fence_reg = fence_reg;
>>>>                  fence->obj = obj;
>>>>                  list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
>>>>          } else {
>>>
>>>
>>> I almost wish x86 just gave up and went fully weakly ordered. At least
>>> then we'd know we need barriers everywhere, rather than just in
>>> mystical places.
>>>
>>> Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
>>
>>
>> Queued for -next, thanks for the patch. I am a bit unhappy that the
>> testcase doesn't work too well and can't reliably reproduce this on all
>> affected machines. I've tried to improve it, but it's very fickle, so I
>> guess we just have to accept that :(
>
>
> Under real-time conditions, when we expect a deterministic response to
> external and internal events at any time within a couple of
> microseconds, invalidating and flushing the entire cache on a running
> system is unacceptable. It introduces latencies of several
> milliseconds, as clearly shown in RT regression tests on a kernel with
> this patch applied (https://www.osadl.org/?id=1543#c7602). We therefore
> have to revert it in the RT patch queue - kind of a workaround for a
> workaround.
>
> It would simply be wonderful if we could get rid of the hateful wbinvd().
I'm somewhat surprised people haven't started to scream about this one
earlier ;-)
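
To spell out what the w/a actually does: on_each_cpu() fires an IPI at
every online cpu, runs the callback there, and with the last argument
set to 1 waits until all of them have finished, so every cache in the
system has been written back and invalidated before the fence register
is touched. The latency Carsten measured is exactly that cost: wbinvd
stalls each cpu for as long as it takes to write back its entire cache.
A minimal standalone sketch of the same pattern, as a hypothetical
x86-only test module (module and function names here are made up for
illustration, not part of the patch):

#include <linux/module.h>
#include <linux/smp.h>
#include <asm/special_insns.h>

/* Runs on each cpu in IPI context: write back and invalidate that
 * cpu's caches, just as the patch does before updating the fence
 * register.
 */
static void flush_caches__ipi(void *ignored)
{
        wbinvd();
}

static int __init wbinvd_ipi_init(void)
{
        /* wait == 1: do not return until every cpu has completed the
         * callback, i.e. until all caches are known to be flushed.
         */
        on_each_cpu(flush_caches__ipi, NULL, 1);
        pr_info("wbinvd executed on all cpus\n");
        return 0;
}

static void __exit wbinvd_ipi_exit(void)
{
}

module_init(wbinvd_ipi_init);
module_exit(wbinvd_ipi_exit);
MODULE_LICENSE("GPL");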
We're trying to figure out whether there's a less costly w/a (we have
some benchmarks where it kills performance, too), but until that's done
we pretty much need to stick with it. If you want to avoid any bad
side-effects of the w/a, you can alternatively pin all gpu rendering
tasks to the same cpu core/thread.
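
In practice the pinning is just taskset(1) on the shell, or a
sched_setaffinity() call early in the process; a minimal userspace
sketch (the choice of cpu 0 is arbitrary, the point is that every
rendering task picks the same one):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);       /* pin to cpu 0 (arbitrary choice) */

        /* pid 0 == the calling process */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
                perror("sched_setaffinity");
                return EXIT_FAILURE;
        }

        /* ... start or continue gpu rendering work here ... */
        return EXIT_SUCCESS;
}

With all rendering confined to one cpu, no dirty lines for the fenced
region can be scattered across other cpus' caches, which presumably is
why the original incoherence doesn't bite there.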
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch