[Intel-gfx] [PATCH 1/1] drm/i915: track first and last processes that touch gem objects
Ben Widawsky
ben at bwidawsk.net
Tue Feb 7 09:49:45 CET 2012
On Mon, Feb 06, 2012 at 11:59:11PM +0100, Eric Anholt wrote:
> On Mon, 6 Feb 2012 17:15:44 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Fri, Feb 03, 2012 at 06:02:38PM +0000, Chris Wilson wrote:
> > > On Fri, 3 Feb 2012 12:43:25 -0200, Eugeni Dodonov <eugeni.dodonov at intel.com> wrote:
> > > > This should hopefully let us find out who was responsible for the GPU death.
> > > > We record the 1st and last process to touch each object, to keep track of
> > > > the process which created the object originally and the last process to
> > > > touch it.
> > > >
> > > > To simplify post-mortem analysis, we also search for the process names
> > > > when gathering the i915_error_state and when peeking at the list of active
> > > > gem objects in debugfs. This is not perfect for tracking all the
> > > > processes, as they can quit or die before their batchbuffers got executed,
> > > > but having to track them during the entire object lifetime would be
> > > > excessively memcpy hungry.
> > >
> > > I think you've slightly missed the mark here. Tracking who created a buffer is
> > > interesting and who last used it, but you really need to also track
> > > on whose behalf the request (i.e. each batch) is executing.
> > >
> > > For the goal of recording creator, you could just use:
> > >
> > > obj->creator = current ? current->pid : 0;
> > >
> > > in i915_gem_object_init with 0 as the special value for objects created by
> > > the driver outside of process context. And similarly for i915_add_request,
> > > though I'd associate those with the owner of the file_priv. The important
> > > point here is that a buffer may be associated with multiple batches
> > > submitted by one or more clients before a hang is detected, and so unless
> > > the dispatch pid is tracked you do not know who submitted the erroneous
> > > batch. (Even a batch may be submitted more than once by many clients,
> > > given sufficient pathology.) So adding the request queue to the
> > > i915_error_state would also be interesting, especially with the jiffies
> > > and ring->tail.
> > >
> > > Also note that there is no direct link between i915_gem_fault() and usage
> > > of the object, the point at which you want to add the obj->last_used_by
> > > tracking to is domain management - which catches the usage of CPU
> > > mappings as well as move-to-active.
> >
> > I'll second Chris here - I think the interesting stuff is to add some kind
> > of cheap ownership tracking, not who exactly created the buffer. The
> > latter is imo only really interesting for resource accounting, and that
> > would require it to be somewhat more solid. And we don't do any resource
> > accounting atm anyway.
>
> Having the creator associated with the buffer should be nice. I agree
> that for hang debugging, making the pid association part of the request
> struct makes more sense than tracking it per-object. With those two, I
> don't see much use for "last writer/executor" with the buffer.
Could I recommend storing the drm_file instead of the PID? That is what I
have, and it is required for forced throttling. You should be able to get to a
pid from the file descriptor.
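
To make the shape of the proposal concrete, here is a rough user-space
sketch of the two pieces discussed in this thread: a creator pid stored on
the object, and a per-request pointer to the submitting file from which a
pid can be recovered after a hang. Every name below (sketch_file,
sketch_object, sketch_request, sketch_ring and the helper functions) is
hypothetical and stands in for the real drm_file, GEM object and request
structures; none of it is actual i915 code. It also assumes, purely for
illustration, that the oldest request not yet completed is the one to
blame, and it ignores seqno wraparound.

```c
/* User-space sketch only: these are NOT the real i915/DRM structures. */
#include <assert.h>
#include <stddef.h>

#define MAX_REQUESTS 8

/* Hypothetical stand-in for struct drm_file: one per open of the device. */
struct sketch_file {
    int pid;                          /* process that opened the device */
};

/* Hypothetical stand-in for a GEM object. */
struct sketch_object {
    int creator_pid;                  /* 0 means "created by the driver" */
};

/* Hypothetical stand-in for a request, i.e. one submitted batch. */
struct sketch_request {
    const struct sketch_file *file;   /* submitter; NULL for driver-internal */
    unsigned int seqno;               /* increases with each request */
    unsigned int tail;                /* ring tail when the request was added */
};

struct sketch_ring {
    struct sketch_request reqs[MAX_REQUESTS];
    size_t count;
};

/* Record the creator once, at object creation time. */
static void sketch_object_init(struct sketch_object *obj,
                               const struct sketch_file *file)
{
    obj->creator_pid = file ? file->pid : 0;
}

/* Queue a request and remember on whose behalf it will execute. */
static void sketch_add_request(struct sketch_ring *ring,
                               const struct sketch_file *file,
                               unsigned int seqno, unsigned int tail)
{
    assert(ring->count < MAX_REQUESTS);
    ring->reqs[ring->count++] = (struct sketch_request){ file, seqno, tail };
}

/*
 * Return the pid owning the oldest request that has not completed:
 * 0 if that request is driver-internal, -1 if nothing is pending.
 * Assumption: seqnos do not wrap, so a plain comparison is enough.
 */
static int sketch_find_hung_pid(const struct sketch_ring *ring,
                                unsigned int last_completed_seqno)
{
    for (size_t i = 0; i < ring->count; i++) {
        const struct sketch_request *req = &ring->reqs[i];

        if (req->seqno > last_completed_seqno)
            return req->file ? req->file->pid : 0;
    }
    return -1;
}
```

Keeping a file pointer rather than a bare pid on the request means the
same field could also serve per-client bookkeeping such as throttling,
with the pid looked up only when an error state is actually captured.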