[Intel-gfx] [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker

Chris Wilson chris at chris-wilson.co.uk
Thu Jul 21 11:04:26 UTC 2016


On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:
> 
> On 21/07/16 11:10, Chris Wilson wrote:
> >On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 21/07/16 07:57, Chris Wilson wrote:
> >>>During the idle-worker we disable the hangcheck and so kick any waiters
> >>>that should have been completed (since the GPU is now idle). Unlike the
> >>>hangcheck, we do not take any care to avoid the race between the irq
> >>>handler and ourselves, and so it is possible for us to declare a missed
> >>>interrupt even as the bottom-half is being scheduled to run. Let's
> >>>ignore this race to stop a potential false-positive error.
> >>
> >>If the bottom half is scheduled to run then then..
> >>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >>>Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
> >>>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>index 40047eb48826..9e826585edb2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >>>  	rearm_hangcheck = false;
> >>>
> >>>  	stuck_engines = intel_kick_waiters(dev_priv);
> >>
> >>... this will not return a stucked engine since the there is a bh
> >>task assigned all until the bh exits.
> >
> >It reports if it wakes up a waiter on any engine. If the bh is already
> >running, we cannot know if it has missed the seqno update. If it isn't
> >running yet, we cannot know if it is about to be run.
> 
> Oh I read the logic as completely opposite than what it is.
> 
> Since the idle worker runs 100ms after last retirement, that would
> mean a really slow waiter or what?

It is dubious. But the idle worker runs 100ms after the first time we
detect all engines are idle and may be running as we detect all engines
are idle again. The only thing for sure is that in some cases that bdw-u
is reaching the idle-worker with an unwoken engine (and that there is
a race here in declaring it as a missed interrupt). I wasn't that
concerned about the race because of that 100ms delay where eveything
should have been idle, but on reflection that 100ms is not guarranteed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list