[Intel-gfx] [PATCH 3/3] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga)

Chris Wilson chris at chris-wilson.co.uk
Thu Jan 10 16:32:34 UTC 2019


Quoting Ville Syrjälä (2019-01-10 16:03:21)
> On Thu, Jan 10, 2019 at 10:38:07AM +0000, Chris Wilson wrote:
> > Broadwater and the rest of gen4  do support being able to saving and
> > reloading context specific registers between contexts, providing isolation
> > of the basic GPU state (as programmable by userspace). This allows
> > userspace to assume that the GPU retains their state from one batch to the
> > next, minimising the amount of state it needs to reload and manually save
> > across batches.
> > 
> > v2: CONSTANT_BUFFER woes
> > 
> > Running through piglit turned up an interesting issue, a GPU hang inside
> > the context load. The context image includes the CONSTANT_BUFFER command
> > that loads an address into a on-gpu buffer, and the context load was
> > executing that immediately. However, since it was reading from the GTT
> > there is no guarantee that the GTT retains the same configuration as
> > when the context was saved, resulting in stray reads and a GPU hang.
> > 
> > Having tried issuing a CONSTANT_BUFFER (to disable the command) from the
> > ring before saving the context to no avail, we resort to patching out
> > the instruction inside the context image before loading.
> > 
> > This does impose that gen4 always reissues CONSTANT_BUFFER commands on
> > each batch, but due to the use of a shared GTT that was and will remain
> > a requirement.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Cc: Kenneth Graunke <kenneth at whitecape.org>
> > Reviewed-by: Ville Syrjälä <ville.syrjala at linux.intel.com> #v1
> > ---
> >  drivers/gpu/drm/i915/intel_engine_cs.c    |  2 +-
> >  drivers/gpu/drm/i915/intel_gpu_commands.h |  3 +++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c   | 17 +++++++++++++++++
> >  3 files changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index f89b8f199e3f..88109e0de051 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -219,6 +219,7 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
> >                       return round_up(GEN6_CXT_TOTAL_SIZE(cxt_size) * 64,
> >                                       PAGE_SIZE);
> >               case 5:
> > +             case 4:
> >                       /*
> >                        * There is a discrepancy here between the size reported
> >                        * by the register and the size of the context layout
> > @@ -235,7 +236,6 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
> >                                        cxt_size * 64,
> >                                        cxt_size - 1);
> >                       return round_up(cxt_size * 64, PAGE_SIZE);
> > -             case 4:
> >               case 3:
> >               case 2:
> >               /* For the special day when i810 gets merged. */
> > diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> > index 105e2a9e874a..00c0175c37ed 100644
> > --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> > @@ -266,6 +266,9 @@
> >  #define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS \
> >       ((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x47<<16))
> >  
> > +#define GFX_OP_CONSTANT_BUFFER \
> > +     (0x3 << 29 | 0x0 << 27 | 0x0 << 24 | 0x2 << 16)
> > +
> >  #define MFX_WAIT  ((0x3<<29)|(0x1<<27)|(0x0<<16))
> >  
> >  #define COLOR_BLT     ((0x2<<29)|(0x40<<22))
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 889f3de79dd0..21bd71cf2e94 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1632,6 +1632,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >               len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> >       else if (IS_GEN(i915, 5))
> >               len += 2;
> > +     else if (IS_GEN(i915, 4))
> > +             len += 4;
> >       if (flags & MI_FORCE_RESTORE) {
> >               GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> >               flags &= ~MI_FORCE_RESTORE;
> > @@ -1668,6 +1670,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >                * this should never take effect and so be a no-op!
> >                */
> >               *cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN;
> > +     } else if (IS_GEN(i915, 4)) {
> > +             /*
> > +              * Disable CONSTANT_BUFFER before it is loaded from the context
> > +              * image. For as it is loaded, it is executed and the stored
> > +              * address may no longer be valid, leading to a GPU hang.
> > +              *
> > +              * This imposes the requirement that userspace reload their
> > +              * CONSTANT_BUFFER on every batch, fortunately a requirement
> > +              * they are already accustomed to from before contexts were
> > +              * enabled.
> > +              */
> > +             *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
> > +             *cs++ = 0;
> > +             *cs++ = i915_ggtt_offset(rq->hw_context->state) + 0x1d4;
> 
> Is that offset correct for ctg/elk? The docs suggest that it is not.
> Though it looks like it'd land inside 3DSTATE_SAMPLER_PALETTE_LOAD_1
> so probably not something anyone would notice.

I just checked piglit didn't break. Admittedly a bit blaise :)

> > +             *cs++ = GFX_OP_CONSTANT_BUFFER; /* inactive */
> 
> I'm thinking there are a few ways in which we could even end up
> with an inconsistent curbe setup in the context.
> 
> Eg. something like this:
> CS_URB_STATE(num>0)
> URB_FENCE(cs>0)
> CONSTANT_BUFFER(len>0)
> CS_URB_STATE(num=0)
> URB_FENCE(cs=0)
> ctx save
> ctx restore
> 
> The restore would end up trying to issue CONSTANT_BUFFER with
> len > 0 even though nothing is allocated for the CS unit in
> the URB.
> 
> That didn't seem to be the case in your earlier hang though
> so not quite sure why it was hanging. And I'm not sure this
> would even cause a hang. The spec just says "undefined".
> 
> Bspec has this to say about CONSTANT_BUFFER:
> "Programming Notes
>  Constant Buffers can only be allocated in linear (not tiled) graphics memory
>  Constant Buffers can only be mapped to Main Memory (UC)"
> 
> Maybe we were violating one of those? Though I don't think we could
> even violate the tiled vs. linear restriction unless the memory access
> goes through the gtt fence for some reason. I didn't think gen4+ do
> that anymore. So that maybe leaves the possibility that a snooped
> bo got mapped to the same address. But again I'm not sure if that
> would cause a hang or just potentiall stale data being loaded into
> the CURBE.

Agreed that tiling shouldn't be an issue with the gen4 architecture (and
that's probably just cut'n'paste from older). snooped is also unlikely
since we/mesa are definitely not allocating snooped buffers on the fly,
so the only snoop buffers should be the static HWSP -- and that's not
even in the GGTT for Crestline.

So perhaps it is an overlapping fence?

> Another mystery is why g4x/ilk wouldn't need this. There's nothing in
> the docs to suggest that it should behave any differently. Hmm. Maybe
> MI_INVALIDATE_ISP could have something to do with this? Maybe that
> forces CONSTANT_BUFFER to be saved with valid==0 always? I guess it
> might be semi-interesting to peek at the resulting context image
> for.

The ctg/ilk context image is ugly and looks much more like line noise
(like there's a magically 255 dword 3DSTATE command full of junk.)

It could just be with the larger GGTT for g4x/ilk, piglit run gpu wasn't
able to trigger the same issue.

> After a bit of digging I found a few more potentially related
> tidbits:
> 
> ECOSPKD[4] Constant Buffer Save/Restore Disable [DevBW-C1+]
> 
> ILK_Chicken_1[7] ILK-C0: Curbe 8HW drop ECO
>  0: Fix is enabled to complete the 8HW CURBE restore FIFO clear
>  1: Fix is disable. Behavior is same as ILK B0
> 
> ILK_Chicken_1[0] [DevILK-B0]
>  “1” CS will force URB modify_enables set after context restore
>  “0” CS will not force URB modify_enables set after context restore
> 
> All of those default to 0 AFAICS, so only ILK_Chicken_1[7] seems like it
> could have any real effect. So wouldn't explain g4x working.

That does look interesting. Thanks,
-Chris


More information about the Intel-gfx mailing list