[Intel-gfx] [PATCH 1/1] drm/i915: track first and last processes that touch gem objects

Chris Wilson chris at chris-wilson.co.uk
Fri Feb 3 19:02:38 CET 2012


On Fri,  3 Feb 2012 12:43:25 -0200, Eugeni Dodonov <eugeni.dodonov at intel.com> wrote:
> This allows to hopefully find out who was responsible for the GPU death.
> We record the 1st and last process to touch each object, to keep track of
> the process which created the object originally and the last process to
> touch it.
> 
> To simplify post-mortem analysis, we also search for the processes names
> when gathering the i915_error_state and when peeking at the list of active
> gem objects in debugfs. This is not perfect for tracking all the
> processes, as they can quit or die before their batchbuffers got executed,
> but having to track them during the entire object lifetime would be
> excessively memcpy hungry.

I think you've slightly missed the mark here. Tracking who created a
buffer and who last used it is interesting, but you really also need to
track on whose behalf each request (i.e. each batch) is executing.

For the goal of recording creator, you could just use:

  obj->creator = current ? current->pid : 0;

in i915_gem_object_init with 0 as the special value for objects created by
the driver outside of process context. And similarly for i915_add_request,
though I'd associate those with the owner of the file_priv.  The important
point here is that a buffer may be associated with multiple batches
submitted by one or more clients before a hang is detected, and so unless
the dispatch pid is tracked you do not know who submitted the erroneous
batch. (Even a single batch may be submitted more than once, by many
clients, given sufficient pathology.) So adding the request queue to the
i915_error_state would also be interesting, especially with the jiffies
and ring->tail of each request.
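
A rough userspace sketch of the per-request tracking described above. This
is not driver code: request_record, request_log, log_add_request and
log_find_suspect are hypothetical names, the fixed-size array stands in for
the real request list, and seqno wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REQUESTS 16

/* One entry per submitted batch. All names here are hypothetical. */
struct request_record {
    uint32_t seqno;        /* submission order, assumed monotonically increasing */
    int pid;               /* submitting client; 0 = no process context */
    unsigned long jiffies; /* time of submission */
    uint32_t tail;         /* ring tail after the request was emitted */
};

struct request_log {
    struct request_record req[MAX_REQUESTS];
    size_t count;
};

/* Record one submission. Returns 0 on success, -1 if the log is full. */
static int log_add_request(struct request_log *log, uint32_t seqno, int pid,
                           unsigned long jiffies, uint32_t tail)
{
    assert(log != NULL);
    if (log->count == MAX_REQUESTS)
        return -1;
    log->req[log->count++] = (struct request_record){
        .seqno = seqno, .pid = pid, .jiffies = jiffies, .tail = tail,
    };
    return 0;
}

/*
 * Return the oldest request that had not completed when the hang was
 * detected, i.e. the first entry with a seqno above the last completed
 * one, or NULL if everything in the log has completed.
 */
static const struct request_record *
log_find_suspect(const struct request_log *log, uint32_t last_completed_seqno)
{
    assert(log != NULL);
    for (size_t i = 0; i < log->count; i++) {
        if (log->req[i].seqno > last_completed_seqno)
            return &log->req[i];
    }
    return NULL;
}
```

Dumping every record from the suspect onwards into the error state would
then show which pid dispatched each outstanding batch, rather than only who
created or last touched a buffer.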

Also note that there is no direct link between i915_gem_fault() and usage
of the object. The point at which you want to add the obj->last_used_by
tracking is domain management, which catches the usage of CPU mappings
as well as move-to-active.
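
A minimal userspace model of hooking the tracking into domain changes,
assuming every CPU or GPU use of an object passes through one set-domain
helper. The struct, the DOMAIN_* flags and both functions are hypothetical
and only illustrate the placement, not the driver's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical domain flags, for illustration only. */
#define DOMAIN_CPU 0x1u
#define DOMAIN_GPU 0x2u

struct tracked_object {
    int creator;           /* pid at creation; 0 = created by the driver */
    int last_used_by;      /* pid of the last domain change; 0 = none yet */
    uint32_t read_domains;
    uint32_t write_domain;
};

/* Creation: remember who allocated the object. */
static void object_init(struct tracked_object *obj, int current_pid)
{
    assert(obj != NULL);
    obj->creator = current_pid;
    obj->last_used_by = 0;
    obj->read_domains = DOMAIN_CPU;
    obj->write_domain = 0;
}

/*
 * Single choke point for usage: both CPU mappings and GPU activity change
 * the object's domains, so recording the caller here covers both paths.
 */
static void object_set_domain(struct tracked_object *obj, int current_pid,
                              uint32_t read_domains, uint32_t write_domain)
{
    assert(obj != NULL);
    obj->read_domains = read_domains;
    obj->write_domain = write_domain;
    obj->last_used_by = current_pid;
}
```

In this model a page fault on its own would leave last_used_by untouched;
only the domain transition it may trigger updates the field.
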
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre