[Intel-gfx] [PATCH 3/3] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga)
Chris Wilson
chris at chris-wilson.co.uk
Thu Jan 10 16:32:34 UTC 2019
Quoting Ville Syrjälä (2019-01-10 16:03:21)
> On Thu, Jan 10, 2019 at 10:38:07AM +0000, Chris Wilson wrote:
> > Broadwater and the rest of gen4 do support being able to saving and
> > reloading context specific registers between contexts, providing isolation
> > of the basic GPU state (as programmable by userspace). This allows
> > userspace to assume that the GPU retains their state from one batch to the
> > next, minimising the amount of state it needs to reload and manually save
> > across batches.
> >
> > v2: CONSTANT_BUFFER woes
> >
> > Running through piglit turned up an interesting issue, a GPU hang inside
> > the context load. The context image includes the CONSTANT_BUFFER command
> > that loads an address into a on-gpu buffer, and the context load was
> > executing that immediately. However, since it was reading from the GTT
> > there is no guarantee that the GTT retains the same configuration as
> > when the context was saved, resulting in stray reads and a GPU hang.
> >
> > Having tried issuing a CONSTANT_BUFFER (to disable the command) from the
> > ring before saving the context to no avail, we resort to patching out
> > the instruction inside the context image before loading.
> >
> > This does impose that gen4 always reissues CONSTANT_BUFFER commands on
> > each batch, but due to the use of a shared GTT that was and will remain
> > a requirement.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > Reviewed-by: Ville Syrjälä <ville.syrjala at linux.intel.com> #v1
> > ---
> > drivers/gpu/drm/i915/intel_engine_cs.c | 2 +-
> > drivers/gpu/drm/i915/intel_gpu_commands.h | 3 +++
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 17 +++++++++++++++++
> > 3 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index f89b8f199e3f..88109e0de051 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -219,6 +219,7 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
> > return round_up(GEN6_CXT_TOTAL_SIZE(cxt_size) * 64,
> > PAGE_SIZE);
> > case 5:
> > + case 4:
> > /*
> > * There is a discrepancy here between the size reported
> > * by the register and the size of the context layout
> > @@ -235,7 +236,6 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
> > cxt_size * 64,
> > cxt_size - 1);
> > return round_up(cxt_size * 64, PAGE_SIZE);
> > - case 4:
> > case 3:
> > case 2:
> > /* For the special day when i810 gets merged. */
> > diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> > index 105e2a9e874a..00c0175c37ed 100644
> > --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> > @@ -266,6 +266,9 @@
> > #define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS \
> > ((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x47<<16))
> >
> > +#define GFX_OP_CONSTANT_BUFFER \
> > + (0x3 << 29 | 0x0 << 27 | 0x0 << 24 | 0x2 << 16)
> > +
> > #define MFX_WAIT ((0x3<<29)|(0x1<<27)|(0x0<<16))
> >
> > #define COLOR_BLT ((0x2<<29)|(0x40<<22))
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 889f3de79dd0..21bd71cf2e94 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1632,6 +1632,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> > len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> > else if (IS_GEN(i915, 5))
> > len += 2;
> > + else if (IS_GEN(i915, 4))
> > + len += 4;
> > if (flags & MI_FORCE_RESTORE) {
> > GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> > flags &= ~MI_FORCE_RESTORE;
> > @@ -1668,6 +1670,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> > * this should never take effect and so be a no-op!
> > */
> > *cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN;
> > + } else if (IS_GEN(i915, 4)) {
> > + /*
> > + * Disable CONSTANT_BUFFER before it is loaded from the context
> > + * image. For as it is loaded, it is executed and the stored
> > + * address may no longer be valid, leading to a GPU hang.
> > + *
> > + * This imposes the requirement that userspace reload their
> > + * CONSTANT_BUFFER on every batch, fortunately a requirement
> > + * they are already accustomed to from before contexts were
> > + * enabled.
> > + */
> > + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
> > + *cs++ = 0;
> > + *cs++ = i915_ggtt_offset(rq->hw_context->state) + 0x1d4;
>
> Is that offset correct for ctg/elk? The docs suggest that it is not.
> Though it looks like it'd land inside 3DSTATE_SAMPLER_PALETTE_LOAD_1
> so probably not something anyone would notice.
I just checked piglit didn't break. Admittedly a bit blaise :)
> > + *cs++ = GFX_OP_CONSTANT_BUFFER; /* inactive */
>
> I'm thinking there are a few ways in which we could even end up
> with an inconsistent curbe setup in the context.
>
> Eg. something like this:
> CS_URB_STATE(num>0)
> URB_FENCE(cs>0)
> CONSTANT_BUFFER(len>0)
> CS_URB_STATE(num=0)
> URB_FENCE(cs=0)
> ctx save
> ctx restore
>
> The restore would end up trying to issue CONSTANT_BUFFER with
> len > 0 even though nothing is allocated for the CS unit in
> the URB.
>
> That didn't seem to be the case in your earlier hang though
> so not quite sure why it was hanging. And I'm not sure this
> would even cause a hang. The spec just says "undefined".
>
> Bspec has this to say about CONSTANT_BUFFER:
> "Programming Notes
> Constant Buffers can only be allocated in linear (not tiled) graphics memory
> Constant Buffers can only be mapped to Main Memory (UC)"
>
> Maybe we were violating one of those? Though I don't think we could
> even violate the tiled vs. linear restriction unless the memory access
> goes through the gtt fence for some reason. I didn't think gen4+ do
> that anymore. So that maybe leaves the possibility that a snooped
> bo got mapped to the same address. But again I'm not sure if that
> would cause a hang or just potentiall stale data being loaded into
> the CURBE.
Agreed that tiling shouldn't be an issue with the gen4 architecture (and
that's probably just cut'n'paste from older). snooped is also unlikely
since we/mesa are definitely not allocating snooped buffers on the fly,
so the only snoop buffers should be the static HWSP -- and that's not
even in the GGTT for Crestline.
So perhaps it is an overlapping fence?
> Another mystery is why g4x/ilk wouldn't need this. There's nothing in
> the docs to suggest that it should behave any differently. Hmm. Maybe
> MI_INVALIDATE_ISP could have something to do with this? Maybe that
> forces CONSTANT_BUFFER to be saved with valid==0 always? I guess it
> might be semi-interesting to peek at the resulting context image
> for.
The ctg/ilk context image is ugly and looks much more like line noise
(like there's a magically 255 dword 3DSTATE command full of junk.)
It could just be with the larger GGTT for g4x/ilk, piglit run gpu wasn't
able to trigger the same issue.
> After a bit of digging I found a few more potentially related
> tidbits:
>
> ECOSPKD[4] Constant Buffer Save/Restore Disable [DevBW-C1+]
>
> ILK_Chicken_1[7] ILK-C0: Curbe 8HW drop ECO
> 0: Fix is enabled to complete the 8HW CURBE restore FIFO clear
> 1: Fix is disable. Behavior is same as ILK B0
>
> ILK_Chicken_1[0] [DevILK-B0]
> “1” CS will force URB modify_enables set after context restore
> “0” CS will not force URB modify_enables set after context restore
>
> All of those default to 0 AFAICS, so only ILK_Chicken_1[7] seems like it
> could have any real effect. So wouldn't explain g4x working.
That does look interesting. Thanks,
-Chris
More information about the Intel-gfx
mailing list