[Intel-gfx] [PATCH] drm/i915: Restore the wait for idle engine after flushing interrupts
Chris Wilson
chris at chris-wilson.co.uk
Fri Nov 10 12:06:59 UTC 2017
Quoting Mika Kuoppala (2017-11-10 12:00:38)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
>
> > So it appears that commit 5427f207852d ("drm/i915: Bump wait-times for
> > the final CS interrupt before parking") was a little overoptimistic in
> > its belief that it had successfully waited for all residual activity on
> > the engines before parking. Numerous sightings in CI since then of
> >
> > <7>[ 52.542886] [IGT] core_auth: executing
> > <3>[ 52.561013] [drm:intel_engines_park [i915]] *ERROR* vcs0 is not idle before parking
> > <7>[ 52.561215] intel_engines_park vcs0
> > <7>[ 52.561229] intel_engines_park current seqno 98, last 98, hangcheck 0 [-247449 ms], inflight 0
> > <7>[ 52.561238] intel_engines_park Reset count: 0
> > <7>[ 52.561266] intel_engines_park Requests:
> > <7>[ 52.561363] intel_engines_park RING_START: 0x00000000 [0x00000000]
> > <7>[ 52.561377] intel_engines_park RING_HEAD: 0x00000000 [0x00000000]
> > <7>[ 52.561390] intel_engines_park RING_TAIL: 0x00000000 [0x00000000]
> > <7>[ 52.561406] intel_engines_park RING_CTL: 0x00000000
> > <7>[ 52.561422] intel_engines_park RING_MODE: 0x00000200 [idle]
> > <7>[ 52.561442] intel_engines_park ACTHD: 0x00000000_00000000
> > <7>[ 52.561459] intel_engines_park BBADDR: 0x00000000_00000000
> > <7>[ 52.561474] intel_engines_park Execlist status: 0x00000301 00000000
> > <7>[ 52.561489] intel_engines_park Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no
> > <7>[ 52.561500] intel_engines_park ELSP[0] idle
> > <7>[ 52.561510] intel_engines_park ELSP[1] idle
> > <7>[ 52.561519] intel_engines_park HW active? 0x0
> > <7>[ 52.561608] intel_engines_park Idle? yes
> > <7>[ 52.561617] intel_engines_park
> >
> > on Braswell, which indicates that the engine just needs that little bit
> > longer after flushing the tasklet to settle. So give it a few more
> > milliseconds before declaring an emergency and applying the emergency
> > brake.
> >
>
> Because the print above indicates that it did go idle straight
> afterwards?

Indeed.

> Just pondering here what the key non-idleness was that
> led to this. What raced?

Still pondering myself. This is basically to shut CI up so that the random
failures stop sowing confusion with false positives.

My guess is that it is a residual ELSP tasklet and the ring registers
taking a moment to idle. But I am not sure: on the way here we gave it
long enough to settle with no new work coming in, so it should not need
another 10ms on top of the 200ms it already had!
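
For illustration only, the "200ms, then another 10ms" pattern can be sketched
in plain userspace C. This is not the i915 code: struct fake_engine,
fake_engine_is_idle(), wait_for_idle(), park_engine() and the simulated clock
are all hypothetical names made up for the sketch, and the 200/10 figures are
simply the ones quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct fake_engine {
	int settle_ms;	/* simulated time at which the engine reports idle */
};

/* Simulated clock in milliseconds, advanced by each failed poll. */
static int sim_clock_ms;

static bool fake_engine_is_idle(const struct fake_engine *e)
{
	return sim_clock_ms >= e->settle_ms;
}

/* Poll until idle or until timeout_ms elapses: 0 on idle, -1 on timeout. */
static int wait_for_idle(const struct fake_engine *e, int timeout_ms)
{
	int deadline = sim_clock_ms + timeout_ms;

	for (;;) {
		/* Sample expiry first so the condition gets one last check. */
		bool expired = sim_clock_ms >= deadline;

		if (fake_engine_is_idle(e))
			return 0;
		if (expired)
			return -1;
		sim_clock_ms++;	/* stands in for sleeping 1 ms */
	}
}

/* Main wait, then a short grace period before declaring an error. */
static int park_engine(const struct fake_engine *e)
{
	if (wait_for_idle(e, 200) == 0)
		return 0;
	/* Assumed: residual work has been flushed by this point. */
	return wait_for_idle(e, 10);
}
```

With the clock starting at 0, an engine that settles at 205 ms fails the first
wait but passes the grace period, while one that settles at 300 ms still trips
the error path.
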
-Chris