[Intel-gfx] [PATCH 1/1] drm/i915: track first and last processes that touch gem objects
eric at anholt.net
Mon Feb 6 23:59:11 CET 2012
On Mon, 6 Feb 2012 17:15:44 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Fri, Feb 03, 2012 at 06:02:38PM +0000, Chris Wilson wrote:
> > On Fri, 3 Feb 2012 12:43:25 -0200, Eugeni Dodonov <eugeni.dodonov at intel.com> wrote:
> > > This will hopefully let us find out who was responsible for the GPU death.
> > > We record the first and last process to touch each object, to keep track of
> > > the process which created the object originally and the last process to
> > > touch it.
> > >
> > > To simplify post-mortem analysis, we also search for the process names
> > > when gathering the i915_error_state and when peeking at the list of active
> > > gem objects in debugfs. This is not perfect for tracking all the
> > > processes, as they can quit or die before their batchbuffers are executed,
> > > but having to track them during the entire object lifetime would be
> > > excessively memcpy hungry.
> > I think you've slightly missed the mark here. Tracking who created a buffer is
> > interesting and who last used it, but you really need to also track
> > on whose behalf the request (i.e. each batch) is executing.
> > For the goal of recording creator, you could just use:
> > obj->creator = current ? current->pid : 0;
> > in i915_gem_object_init with 0 as the special value for objects created by
> > the driver outside of process context. And similarly for i915_add_request,
> > though I'd associate those with the owner of the file_priv. The important
> > point here is that a buffer may be associated with multiple batches
> > submitted by one or more clients before a hang is detected, and so unless
> > the dispatch pid is tracked you do not know who submitted the erroneous
> > batch. (Even a batch may be submitted more than once by many clients,
> > given sufficient pathology.) So adding the request queue to the
> > i915_error_state would also be interesting, especially with the jiffies
> > and ring->tail.
> > Also note that there is no direct link between i915_gem_fault() and usage
> > of the object, the point at which you want to add the obj->last_used_by
> > tracking to is domain management - which catches the usage of CPU
> > mappings as well as move-to-active.
> I'll second Chris here - I think the interesting stuff is to add some kind
> of cheap ownership tracking, not who exactly created the buffer. The
> latter is imo only really interesting for resource accounting, and that
> would require it to be somewhat more solid. And we don't do any resource
> accounting atm anyway.
Having the creator associated with the buffer should be nice. I agree
that for hang debugging, making the pid association part of the request
struct makes more sense than tracking it per-object. With those two, I
don't see much use for "last writer/executer" with the buffer.
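
To make the split concrete, here is a rough userspace sketch of the two
associations discussed above: the creator pid stored on the object, and the
submitting pid stored on the request. None of this is driver code. The
sketch_* structs and helpers are hypothetical stand-ins, not the real i915
structures, and the pid is passed in explicitly rather than read from
current.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for a GEM object; not the real i915 struct. */
struct sketch_gem_object {
    pid_t creator;  /* 0 = created by the driver outside process context */
};

/* Hypothetical stand-in for a request; not the real i915 struct. */
struct sketch_request {
    unsigned int seqno;
    pid_t submitter;  /* pid on whose behalf this batch executes */
    const struct sketch_gem_object *batch;
};

/* Record the creator once, at object init time. */
static void sketch_object_init(struct sketch_gem_object *obj, pid_t pid)
{
    assert(obj != NULL);
    obj->creator = pid;
}

/* Record the submitter per request, since one buffer may be used by
 * several batches from different clients before a hang is detected. */
static void sketch_add_request(struct sketch_request *rq, unsigned int seqno,
                               pid_t submitter,
                               const struct sketch_gem_object *batch)
{
    assert(rq != NULL && batch != NULL);
    rq->seqno = seqno;
    rq->submitter = submitter;
    rq->batch = batch;
}

/* Format one line of the kind an error-state dump might carry.
 * Returns the snprintf result (length that was or would be written). */
static int sketch_describe_request(char *buf, size_t len,
                                   const struct sketch_request *rq)
{
    assert(buf != NULL && rq != NULL);
    return snprintf(buf, len, "seqno %u submitted by pid %ld (batch created by pid %ld)",
                    rq->seqno, (long)rq->submitter, (long)rq->batch->creator);
}
```

With the pid on the request, a hang can be attributed to the submitter even
when the batch object itself was created by the driver or another process.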