[Intel-gfx] [RFC] [PATCH 0/7] arb robustness enablers

Fri Feb 8 11:35:06 CET 2013

On Mon, Feb 04, 2013 at 04:04:36PM +0200, Mika Kuoppala wrote:
> Hi,
> 
> This patchset adds ioctl and related changes to allow
> userspace to query about the context loss status of the
> specified context. The aim is to provide enabler
> for the ARB_Robustness GL extension.
> 
> I post this as RFC to get ack/nack on the interface.
> The interface is in 0004-drm-i915-add-i915_get_reset_status_ioctl.patch
> 
> Also any feedback on the other parts of the patchset is most
> welcomed.
> 
> The i-g-t testcase can be found here:
> https://github.com/mkuoppal/intel-gpu-tools/commits/arb-robustness

Ok, I've only quickly scrolled through this but it looks rather sane, and
the interface seems to be a fit for arb robustness. So tentative ack on
the ioctl ;-) A few comments:

The first patch imo makes sense conceptually, even though it's not
required to implement the ioctl (only the test). This needs critical
review from Chris though since he's really picky about the hangcheck
misfiring or not detecting hangs. Might be that we also have some races
left which have been papered over by the acthd checks ... Best to split
this out for maximum scrunity.

For the remaining stuff, I see three pieces:
- refcounts for hw contexts: Imo this is rather independent and should go
  in early. If it doesn't apply without the other I think we should move
  it ahead in a rebase.
- Infrastructure to figure out whom to blame for a hang. One thing I'd
  really love to see on top of your patches is adjusting the "hanging too
  fast" detection to only disable command execution for the offending fd
  and not kill the entire gpu unnecessarily. This should help a lot
  already today, without any userspace support. I think it'd be also good
  to have an i-g-t test for this which tries to hang the gpu with the
  hangman on e.g.  the blit constantly until the gpu dies. And then open a
  new fd to check whether things still work.
- Putting that blame to the right hw contexts and exposing it through the
  ioctl. Usual interface rules applies: We need to have userspace support
  more or less ready before merging, i.e. a rough cut at the robustness
  extension patches for mesa. Probably best if you can do this in
  collaboration with one of the gl guys in Helsinki.

Cheers, Daniel

> 
> Thanks,
> --Mika
> 
> Mika Kuoppala (7):
>   drm/i915: detect infinite loops in hang check
>   drm/i915: add struct i915_reset_stats
>   drm/i915: add reset_status for hw_contexts
>   drm/i915: add i915_get_reset_status_ioctl
>   drm/i915: add batch object and context to i915_add_request()
>   drm/i915: reference count for i915_hw_contexts
>   drm/i915: find guilty batch buffer on ring resets
> 
>  drivers/gpu/drm/i915/i915_dma.c            |    3 +-
>  drivers/gpu/drm/i915/i915_drv.c            |   42 +++++++++
>  drivers/gpu/drm/i915/i915_drv.h            |   41 +++++++--
>  drivers/gpu/drm/i915/i915_gem.c            |  119 ++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/i915_gem_context.c    |   66 +++++++++++++--
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   11 ++-
>  drivers/gpu/drm/i915/i915_irq.c            |  127 ++++++++++++++--------------
>  drivers/gpu/drm/i915/intel_overlay.c       |    5 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |    4 +
>  include/uapi/drm/i915_drm.h                |   19 +++++
>  11 files changed, 351 insertions(+), 88 deletions(-)
> 
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch