[Intel-gfx] [PATCH] drm/i915: Increase context alignment requirement for Sandybridge

Wed Mar 23 22:42:37 UTC 2016

On Tue, Mar 22, 2016 at 02:07:24PM +0000, Chris Wilson wrote:
> In bugzilla, there are some very weird bugs on SNB GT1 whereby the
> seqno stops being written, but the GPU is otherwise functional (well, the
> command streamer at least). However, since the seqno was not being
> updated, any waits upon rendering results hung, triggering the GPU hang
> detector.
> 
> I found a very similar hang when running igt/gem_exec_whisper on a SNB
> GT1 and after playing around came to the conclusion that:
> 
> (a) it depends on timing, enabling debug and other slowdowns masks the
> bug;
> 
> (b) it was not context size, as increasing the allocation to 128KiB made
> no difference;
> 
> (c) it depended upon placement as restricting the binding to the
> mappable region works;
> 
> (d) it depended upon alignment of the context binding, though the bspec
> still only lists the restriction as 4k
> 
> Changing the alignment constraint seems to be least intrusive, and
> though I have not been able to reproduce this on snb-gt2 and all the
> recent bugs to the best of my knowledge have been snb-gt1, it is safer
> to apply the constraint to all snb. Though I am still a little wary that
> it is merely a side-effect that is papering over the issue (for example, it
> may be placement of the context next to another object that is causing
> the issue, or it may be finding the new alignment slows down context
> switches enough etc).

So, the hang came back (after switching to using an iomap rather than
the vmap).

An alternative workaround is to avoid the first page of the GTT, either
by subtracting the first page from the drm_mm or by applying a bias to
the ringbuffer:

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0715bb7..f75e9a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2726,6 +2726,7 @@ static int i915_gem_setup_global_gtt(struct drm_device *dev,
 
        BUG_ON(mappable_end > end);
 
+       start += PAGE_SIZE;
        ggtt_vm->start = start;
 
        /* Subtract the guard page before address space initialization to

or

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ce59850..f789f92 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2109,10 +2109,11 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 {
        struct drm_i915_private *dev_priv = to_i915(dev);
        struct drm_i915_gem_object *obj = ringbuf->obj;
+       unsigned flags = PIN_OFFSET_BIAS | PAGE_SIZE;
        int ret;
 
        if (HAS_LLC(dev_priv) && !obj->stolen) {
-               ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, 0);
+               ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, flags);
                if (ret)
                        return ret;
 
@@ -2128,7 +2129,7 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
                        return -ENOMEM;
                }
        } else {
-               ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
+               ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, flags | PIN_MAPPABLE);
                if (ret)
                        return ret;
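
For anyone puzzled by "PIN_OFFSET_BIAS | PAGE_SIZE" in the second diff: as
far as I recall, the pin flags word carries the bias itself in its
page-aligned upper bits, with the flag bits proper in the low 12 bits. A
minimal standalone sketch of that encoding follows. The constant values are
assumptions modelled on i915_drv.h of this era, and min_offset_from_flags()
is a hypothetical helper for illustration, not a driver function.

```c
#include <stdint.h>

/* Assumed values modelled on i915_drv.h; check them against your tree. */
#define SKETCH_PAGE_SIZE        4096u
#define SKETCH_PIN_MAPPABLE     (1u << 0)
#define SKETCH_PIN_OFFSET_BIAS  (1u << 3)
#define SKETCH_PIN_OFFSET_MASK  (~4095u)   /* page-aligned upper bits */

/*
 * Hypothetical helper: return the lowest GTT offset a pin request may
 * be given. With the BIAS flag set, the upper bits of the flags word
 * hold the minimum offset; without it, offset 0 is allowed.
 */
static uint32_t min_offset_from_flags(uint32_t flags)
{
	if (flags & SKETCH_PIN_OFFSET_BIAS)
		return flags & SKETCH_PIN_OFFSET_MASK;
	return 0;
}
```

Under those assumptions a ringbuffer pinned with the biased flags cannot
land at offset 0, and OR-ing in the mappable flag leaves the bias intact
because it only touches the low bits.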

> References: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=93262

That bug has the same RING_START==0 symptom, so this is still promising. Ideas?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre