[Intel-gfx] [PATCH 15/69] drm/i915/gt: Track all timelines created using the HWSP
Chris Wilson
chris at chris-wilson.co.uk
Tue Dec 15 17:16:30 UTC 2020
Quoting Mika Kuoppala (2020-12-15 17:09:39)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > We assume that the contents of the HWSP are lost across suspend, and so
> > upon resume we must restore critical values such as the timeline seqno.
> > Keep track of every timeline allocated that uses the HWSP as its storage
> > and so we can then reset all seqno values by walking that list.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 ++++-
> > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 6 ++++
> > drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 +
> > .../drm/i915/gt/intel_execlists_submission.c | 11 ++++--
> > .../gpu/drm/i915/gt/intel_ring_submission.c | 35 +++++++++++++++++++
> > drivers/gpu/drm/i915/gt/intel_timeline.h | 13 +++++--
> > .../gpu/drm/i915/gt/intel_timeline_types.h | 2 ++
> > 7 files changed, 71 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 71bd052628f4..6c08e74edcae 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -648,6 +648,8 @@ static int init_status_page(struct intel_engine_cs *engine)
> > void *vaddr;
> > int ret;
> >
> > + INIT_LIST_HEAD(&engine->status_page.timelines);
> > +
> > /*
> > * Though the HWS register does support 36bit addresses, historically
> > * we have had hangs and corruption reported due to wild writes if
> > @@ -936,6 +938,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
> > fput(engine->default_state);
> >
> > if (engine->kernel_context) {
> > + list_del(&engine->kernel_context->timeline->engine_link);
> > intel_context_unpin(engine->kernel_context);
> > intel_context_put(engine->kernel_context);
> > }
> > @@ -1281,8 +1284,12 @@ void intel_engines_reset_default_submission(struct intel_gt *gt)
> > struct intel_engine_cs *engine;
> > enum intel_engine_id id;
> >
> > - for_each_engine(engine, gt, id)
> > + for_each_engine(engine, gt, id) {
> > + if (engine->sanitize)
> > + engine->sanitize(engine);
> > +
> > engine->set_default_submission(engine);
> > + }
> > }
> >
> > bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > index 99574378047f..1e5bad0b9a82 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > @@ -60,6 +60,12 @@ static int __engine_unpark(struct intel_wakeref *wf)
> >
> > /* Scrub the context image after our loss of control */
> > ce->ops->reset(ce);
> > +
> > + CE_TRACE(ce, "reset { seqno:%x, *hwsp:%x, ring:%x }\n",
> > + ce->timeline->seqno,
> > + READ_ONCE(*ce->timeline->hwsp_seqno),
> > + ce->ring->emit);
> > + GEM_BUG_ON(ce->timeline->seqno != *ce->timeline->hwsp_seqno);
>
> Compiler should be satified but could still have been READ_ONCE,
> for the reader and for the fine bug on which might get delivered to console.
Yup, I hesitated as it meant a newline :)
> But main thing is that now coherency is enforced from the get go.
> > +static void xcs_sanitize(struct intel_engine_cs *engine)
> > +{
> > + /*
> > + * Poison residual state on resume, in case the suspend didn't!
> > + *
> > + * We have to assume that across suspend/resume (or other loss
> > + * of control) that the contents of our pinned buffers has been
> > + * lost, replaced by garbage. Since this doesn't always happen,
> > + * let's poison such state so that we more quickly spot when
> > + * we falsely assume it has been preserved.
> > + */
> > + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> > + memset(engine->status_page.addr, POISON_INUSE, PAGE_SIZE);
> > +
> > + /*
> > + * The kernel_context HWSP is stored in the status_page. As above,
> > + * that may be lost on resume/initialisation, and so we need to
> > + * reset the value in the HWSP.
> > + */
> > + sanitize_hwsp(engine);
> > +
> > + /* And scrub the dirty cachelines for the HWSP */
> > + clflush_cache_range(engine->status_page.addr, PAGE_SIZE);
>
> The flush could be part of the actual writing of the seqno with
> that range. But then you would need to track the debug so. Better
> to make sure to transfer everything to be visible.
Yes. It's also just part of the general 'sanitize' paranoia; we scrub
everything until we are convinced that we only see our own bugs
reflected back at us.
-Chris
More information about the Intel-gfx
mailing list