[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: request ring to be pinned above GUC_WOPCM_TOP

Chris Wilson chris at chris-wilson.co.uk
Fri Dec 23 07:44:54 UTC 2016


On Thu, Dec 22, 2016 at 03:15:03PM -0800, Daniele Ceraolo Spurio wrote:
> 
> 
> On 22/12/16 14:23, Patchwork wrote:
> >== Series Details ==
> >
> >Series: drm/i915: request ring to be pinned above GUC_WOPCM_TOP
> >URL   : https://patchwork.freedesktop.org/series/17147/
> >State : failure
> >
> >== Summary ==
> >
> >Series 17147v1 drm/i915: request ring to be pinned above GUC_WOPCM_TOP
> >https://patchwork.freedesktop.org/api/1.0/series/17147/revisions/1/mbox/
> >
> >Test gem_busy:
> >        Subgroup basic-busy-default:
> >                pass       -> FAIL       (fi-hsw-4770)
> >                pass       -> FAIL       (fi-hsw-4770r)
> >                pass       -> FAIL       (fi-byt-j1900)
> >        Subgroup basic-hang-default:
> >                pass       -> FAIL       (fi-hsw-4770)
> >                pass       -> FAIL       (fi-hsw-4770r)
> >Test gem_wait:
> >        Subgroup basic-busy-all:
> >                pass       -> FAIL       (fi-ivb-3770)
> >                pass       -> FAIL       (fi-hsw-4770)
> >                pass       -> FAIL       (fi-ivb-3520m)
> >        Subgroup basic-wait-all:
> >                pass       -> FAIL       (fi-ivb-3770)
> >                pass       -> FAIL       (fi-hsw-4770)
> >                pass       -> FAIL       (fi-byt-j1900)
> 
> Clearly moving the ring for legacy submission as well was a bad
> idea, although looking at the logs I'm not clear as to why we're
> getting these hangs, and I don't have any pre-Gen8 device to try and
> reproduce locally. I'll wait until tomorrow to see if there are any
> comments on the patch and then I'll submit v2 with the change in
> offset gated by HAS_GUC_SCHED.

Well, they are all gen7, which had the issue of MI_STORE_DWORD_IMM
stopping after a ring wrap to address 0; that's why there's a 4096 bias
in there. Now, it is likely that you've made a hole large enough for
something else to creep into address 0, which is equally upsetting to
the GPU. No rational explanation, just a theory.

But for the sake of not fragmenting our aperture space needlessly (and
these will be allocated bottom-up in the future), we should compute the
bias per-gen. My approach would be to compute a value in the context,
alongside ring->size, and pass it in. kernel_context would then have a
conservative bias, due to enable_execlists being unresolved at that
point, but later contexts would not be as restricted (for !guc). It
should then be possible to apply that bias to both contexts/rings
equally, or at least consistently.
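
Something along these lines, purely as a rough standalone sketch of the
per-gen bias computation (the constants, struct and helper names below
are placeholders for illustration, not the actual i915 definitions):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FAKE_PAGE_SIZE     4096u    /* stand-in for PAGE_SIZE */
    #define FAKE_GUC_WOPCM_TOP 0x80000u /* placeholder for GUC_WOPCM_TOP */

    struct ctx_ring {
            uint32_t size; /* ring->size equivalent */
            uint32_t bias; /* minimum GGTT offset to pin the ring at */
    };

    /* Compute the bias once, when the context/ring is created. */
    static uint32_t ring_ggtt_bias(int gen, bool uses_guc)
    {
            /* GuC submission needs the ring above the WOPCM region. */
            if (uses_guc)
                    return FAKE_GUC_WOPCM_TOP;

            /*
             * Gen7 hangs if MI_STORE_DWORD_IMM runs just after the ring
             * wraps to GGTT address 0, so keep the ring off the first
             * page there; later gens without GuC need no restriction.
             */
            return gen < 8 ? FAKE_PAGE_SIZE : 0;
    }

    int main(void)
    {
            struct ctx_ring ring = { .size = 32 * 1024 };

            ring.bias = ring_ggtt_bias(7, false);
            printf("gen7 legacy ring bias: 0x%x\n", (unsigned int)ring.bias);

            ring.bias = ring_ggtt_bias(9, true);
            printf("guc ring bias: 0x%x\n", (unsigned int)ring.bias);

            return 0;
    }

The stored bias would then be applied at pin time for both the context
state and the ring, so the two stay consistent.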
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

