[Intel-gfx] [PATCH] drm/i915/gt: Incrementally check for rewinding

Chris Wilson chris at chris-wilson.co.uk
Wed Jun 10 14:03:58 UTC 2020


Quoting Chang, Bruce (2020-06-10 05:25:39)
> On 6/9/2020 8:17 AM, Chris Wilson wrote:
> > In commit 5ba32c7be81e ("drm/i915/execlists: Always force a context
> > reload when rewinding RING_TAIL"), we placed the check for rewinding a
> > context on actually submitting the next request in that context. This
> > was so that we only had to check once, and could do so with precision,
> > avoiding as many forced restores as possible. For example, to ensure
> > that we can resubmit the same request a couple of times, we include a
> > small wa_tail such that on the next submission, the ring->tail will
> > appear to move forwards when resubmitting the same request. This is very
> > common as it will happen for every lite-restore to fill the second port
> > after a context switch.
> >
> > However, intel_ring_direction() is limited in precision to movements of
> > up to half the ring size. The consequence is that if we tried to
> > unwind many requests, we could exceed half the ring and flip the sense
> > of the direction, so missing a force restore. As no request can be
> > greater than half the ring (i.e. 2048 bytes in the smallest case), we
> > can check for rollback incrementally. As we check against the tail that
> > would be submitted, we do not lose any sensitivity and allow lite
> > restores for the simple case. We still need to double check upon
> > submitting the context, to allow for multiple preemptions and
> > resubmissions.
> >
> > Fixes: 5ba32c7be81e ("drm/i915/execlists: Always force a context reload when rewinding RING_TAIL")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> > Cc: <stable at vger.kernel.org> # v5.4+
> 
> Verified this has fixed the issue regarding the GPU hang with incomplete 
> error state.

But it does not entirely... tgl b0 still has the issue where a lite
restore is processed while it is doing an [implicit] semaphore wait at
just the wrong time, and it dies (or something that looks suspiciously
like that). That can be reproduced without any preemption rollback, so I
suspect a placebo effect.
-Chris
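
To illustrate the half-ring precision limit described in the quoted commit
message, here is a minimal sketch. It is a simplified model and not the i915
implementation: ring_direction() and rewound_incrementally() are hypothetical
helpers, the ring size is assumed to be a power of two, and each unwind step
is assumed to be shorter than half the ring.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model, not the i915 code: classify the move from prev to next
 * on a power-of-two ring as forwards (+1), backwards (-1) or none (0).
 * Any wrapped distance of half the ring or more is read as backwards.
 */
static int ring_direction(uint32_t size, uint32_t next, uint32_t prev)
{
	uint32_t delta = (next - prev) & (size - 1);

	if (delta == 0)
		return 0;
	return delta < size / 2 ? 1 : -1;
}

/*
 * Incremental check: tails[] lists the tail after each unwind step,
 * starting from the tail that was last submitted. Each step is assumed
 * to be shorter than half the ring, so its direction is never ambiguous.
 */
static bool rewound_incrementally(uint32_t size, const uint32_t *tails,
				  size_t count)
{
	for (size_t i = 1; i < count; i++) {
		if (ring_direction(size, tails[i], tails[i - 1]) < 0)
			return true;
	}
	return false;
}
```

With a 4096-byte ring, comparing only the end points of a rewind from 3000 to
500 gives ring_direction(4096, 500, 3000) > 0, so the rewind of 2500 bytes is
mistaken for forward movement. Walking the tails 3000, 2000, 1000, 500 one
step at a time reports the rewind at the first step.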