[Intel-gfx] [PATCH v2 3/3] drm/i915: Clean up GPU hang message
Chris Wilson
chris at chris-wilson.co.uk
Mon Dec 14 03:39:47 PST 2015
On Mon, Dec 14, 2015 at 11:28:39AM +0000, Dave Gordon wrote:
> On 11/12/15 22:59, Chris Wilson wrote:
> >Remove some redundant kernel messages as we deduce a hung GPU and
> >capture the error state.
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_irq.c | 16 ++++++----------
> > 1 file changed, 6 insertions(+), 10 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 4cfbd694b3a8..365d4872a67d 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -2963,7 +2963,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> > struct drm_device *dev = dev_priv->dev;
> > struct intel_engine_cs *ring;
> > int i;
> >- int busy_count = 0, rings_hung = 0;
> >+ int busy_count = 0;
> > bool stuck[I915_NUM_RINGS] = { 0 };
> > #define BUSY 1
> > #define KICK 5
> >@@ -3057,17 +3057,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> > }
> >
> > for_each_ring(ring, dev_priv, i) {
> >- if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
> >- DRM_INFO("%s on %s\n",
> >- stuck[i] ? "stuck" : "no progress",
> >- ring->name);
> >- rings_hung++;
> >- }
> >+ if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG)
> >+ return i915_handle_error(dev, true,
> >+ "%s on %s",
> >+ stuck[i] ? "No progress" : "Hang",
> >+ ring->name);
> > }
> >
> >- if (rings_hung)
> >- return i915_handle_error(dev, true, "Ring hung");
> >-
> > /* Reset timer in case GPU hangs without another request being added */
> > if (busy_count)
> > i915_queue_hangcheck(dev_priv);
>
> This version provides less information (in dmesg & syslog) in the
> case that multiple rings have (been detected as) hung. Does this
> ever happen?
Not often. And the reduction is intended, since that information is
already in the error state.
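
For illustration only, the difference being discussed can be modelled
outside the driver. The sketch below is not i915 code: the function
names and the MODEL_SCORE_RING_HUNG value are assumptions made up for
this example. The first helper mirrors the old loop, which visited every
ring and counted those over the threshold; the second mirrors the
patched loop, which stops at the first such ring.

```c
#include <stddef.h>

/* Assumed placeholder value; the real threshold is defined by the driver. */
#define MODEL_SCORE_RING_HUNG 31

/* Old behaviour (modelled): walk all rings and count every hung one. */
static int count_hung_rings(const int *score, size_t n)
{
	int hung = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (score[i] >= MODEL_SCORE_RING_HUNG)
			hung++;

	return hung;
}

/*
 * New behaviour (modelled): report only the first hung ring.
 * Returns its index, or -1 if no ring crossed the threshold.
 */
static int first_hung_ring(const int *score, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (score[i] >= MODEL_SCORE_RING_HUNG)
			return (int)i;

	return -1;
}
```

With two rings over the threshold, the first helper reports both while
the second names only the lower-numbered one; the remainder is left to
the error state.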
> Is the original more useful for finding bugs in
> hangcheck itself?
No. See i915_hangcheck_info.
> If this is a test that has disabled error state
> capture (because it's *trying* to hang one or more rings) can we
> still know how many rings have been diagnosed as hung?
If you have a use case, then you can look at the interface you require.
i915_hangcheck_info should be sufficient in most cases, or at least a
good starting point. But you may want a more specific debugfs file to
avoid parsing its output.
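
As a starting point for a test, a minimal sketch of reading that file
from userspace might look like the following. It assumes debugfs is
mounted at /sys/kernel/debug and that the device is DRM minor 0, so the
path would be /sys/kernel/debug/dri/0/i915_hangcheck_info; both are
assumptions about the test machine. The helper name dump_debugfs_file is
hypothetical, and the file contents are passed through unparsed since
the output format is not described here.

```c
#include <stdio.h>

/*
 * Copy the file at 'path' to 'out' without interpreting it.
 * Returns the number of bytes copied, or -1 if the file cannot be
 * opened (for example, debugfs not mounted or insufficient privileges).
 */
static long dump_debugfs_file(const char *path, FILE *out)
{
	char buf[4096];
	size_t len;
	long total = 0;
	FILE *in = fopen(path, "r");

	if (!in)
		return -1;

	while ((len = fread(buf, 1, sizeof(buf), in)) > 0) {
		fwrite(buf, 1, len, out);
		total += (long)len;
	}

	fclose(in);
	return total;
}
```

A test could call dump_debugfs_file() with the path above and stdout,
typically as root, and treat a return of -1 as "interface unavailable".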
-Chris
--
Chris Wilson, Intel Open Source Technology Centre