[Intel-gfx] [PATCH 1/6] drm/i915: add interface to simulate gpu hangs

Eugeni Dodonov eugeni at dodonov.net
Thu Apr 26 02:03:05 CEST 2012


On Wed, Apr 25, 2012 at 08:57, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:

> gpu reset is a very important piece of our infrastructure.
> Unfortunately we only really test it by actually hanging the gpu,
> which often has bad side-effects for the entire system. And the gpu
> hang handling code is one of the rather complicated pieces of code we
> have, consisting of
> - hang detection
> - error capture
> - actual gpu reset
> - reset of all the gem bookkeeping
> - reinitialization of the entire gpu
>
> This patch adds a debugfs interface to selectively stop rings by ceasing to
> update the hw tail pointer, which will result in the gpu no longer
> updating its head pointer and eventually in the hangcheck firing.
> This way we can exercise the gpu hang code under controlled conditions
> without a dying gpu taking down the entire system.
>
> Patch motivated by me forgetting to properly reinitialize ppgtt after
> a gpu reset.
>
> Usage:
>
> echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
>
> echo 0xffffffff > i915_ring_stop # stops all, future-proof version
>
> then run whatever test load is desired. i915_ring_stop automatically
> resets after a gpu hang is detected to avoid hanging the gpu too fast
> and declaring it wedged.
>
> v2: Incorporate feedback from Chris Wilson.
>
> v3: Add the missing cleanup.
>
> Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>
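
For readers following along, the mechanism described in the commit message
(stop publishing the tail, so the head stops moving, so hangcheck fires) can
be sketched as a small user-space model. This is only an illustration under
stated assumptions: struct ring_model, ring_advance(), gpu_consume() and
ring_looks_hung() are hypothetical names and do not come from the patch or
from the i915 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of one ring; not taken from the patch. */
struct ring_model {
	unsigned int id;   /* ring number, used as the bit index in the stop mask */
	uint32_t sw_tail;  /* how far software has queued commands */
	uint32_t hw_tail;  /* tail value last published to the modelled hardware */
	uint32_t hw_head;  /* how far the modelled GPU has consumed */
};

/* Queue work; publish the new tail only if this ring's stop bit is clear. */
static void ring_advance(struct ring_model *ring, uint32_t stop_mask,
			 uint32_t bytes)
{
	assert(ring->id < 32);

	ring->sw_tail += bytes;
	if (stop_mask & (UINT32_C(1) << ring->id))
		return; /* stopped: the hardware never sees the new tail */
	ring->hw_tail = ring->sw_tail;
}

/* The modelled GPU consumes everything up to the published tail. */
static void gpu_consume(struct ring_model *ring)
{
	ring->hw_head = ring->hw_tail;
}

/* Hangcheck-like test: work is pending, but the head has not moved. */
static bool ring_looks_hung(const struct ring_model *ring,
			    uint32_t last_seen_head)
{
	return ring->hw_head == last_seen_head &&
	       ring->hw_head != ring->sw_tail;
}
```

With a stop mask of 0 the head keeps up with the tail and the check stays
quiet; with the ring's bit set, queued work piles up behind a head that no
longer advances, which is the condition a hang detector looks for.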

I think I sent my R-b for the previous series, but just in case, for the
new one:
Reviewed-By: Eugeni Dodonov <eugeni.dodonov at intel.com>

With small bikeshed below:

> +static ssize_t
> +i915_ring_stop_read(struct file *filp,
> +                   char __user *ubuf,
> +                   size_t max,
> +                   loff_t *ppos)
> +{
> +       struct drm_device *dev = filp->private_data;
> +       drm_i915_private_t *dev_priv = dev->dev_private;
> +       char buf[80];
>

buf is 80 characters here, but


> +static ssize_t
> +i915_ring_stop_write(struct file *filp,
> +                    const char __user *ubuf,
> +                    size_t cnt,
> +                    loff_t *ppos)
> +{
> +       struct drm_device *dev = filp->private_data;
> +       struct drm_i915_private *dev_priv = dev->dev_private;
> +       char buf[20];
>

here it is 20. I don't think we'll need more than 20 characters the way it
is supposed to work, so maybe standardize the 80 above to 20 as well, for
consistency?
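
To back up the size argument: the longest 32-bit mask, written as
"0xffffffff" or as decimal 4294967295, is 10 characters, so with a trailing
newline and the NUL terminator it needs 12 bytes and fits in a 20-byte
buffer. Below is a rough user-space sketch of a bounded copy-and-parse.
parse_ring_mask() and RING_STOP_BUF_LEN are hypothetical names, and strtoul()
stands in for whatever the kernel handler actually uses, which is not shown
in the quoted hunks.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RING_STOP_BUF_LEN 20 /* same size as buf in the write handler */

/*
 * Hypothetical helper: copy at most RING_STOP_BUF_LEN - 1 bytes of input,
 * NUL-terminate, and parse the result as a 32-bit ring mask.
 * Returns 0 on success and -EINVAL on malformed or out-of-range input.
 */
static int parse_ring_mask(const char *src, size_t cnt, uint32_t *mask)
{
	char buf[RING_STOP_BUF_LEN];
	char *end;
	unsigned long val;
	size_t len;

	assert(src != NULL && mask != NULL);

	/* Clamp the copy so there is always room for the terminator. */
	len = cnt < sizeof(buf) - 1 ? cnt : sizeof(buf) - 1;
	memcpy(buf, src, len);
	buf[len] = '\0';

	errno = 0;
	val = strtoul(buf, &end, 0); /* base 0 accepts decimal and 0x hex */
	if (end == buf || errno == ERANGE || val > UINT32_MAX)
		return -EINVAL;
	if (*end != '\0' && *end != '\n')
		return -EINVAL; /* trailing garbage after the number */

	*mask = (uint32_t)val;
	return 0;
}
```

Input longer than 19 bytes is truncated by the clamp rather than overflowing
the buffer, so a smaller buffer only limits how much padding or garbage is
tolerated, not which valid masks can be written.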

-- 
Eugeni Dodonov
<http://eugeni.dodonov.net/>