[Intel-gfx] [RFC PATCH 00/12] AubCrash

Chris Wilson chris at chris-wilson.co.uk
Fri Oct 27 19:10:42 UTC 2017


Quoting Oscar Mateo (2017-10-27 19:45:39)
> 
> 
> On 10/27/2017 11:30 AM, Chris Wilson wrote:
> > Quoting Oscar Mateo (2017-10-27 19:01:03)
> >> AubCrash is a companion to i915_gpu_error. It gives us the possibility to
> >> dump an AUB file that describes the state of the system at the point of
> >> the crash (GTTs, contexts, BBs, BOs, etc...). Being an AUB file, it can be
> >> used by a number of already existing tools (graphical AUB file browsers,
> >> simulators, emulators, etc...) that facilitate debugging (an improvement
> >> over the current text-based crash dump).
> > Since it captures everything in progress, but only the kernel side of
> > it, why put it in the kernel? Is this absolutely required for
> > post-mortem debugging, or should we focus on capturing the death throes
> > of userspace much better (an aubcapture flight-data-recorder, plus
> > client annotations more akin to apitrace)?
> >
> > Sell me with the bugzilla references.
> > -Chris
> 
> An aubcapture flight-data-recorder is the next logical step. Like 
> i-g-t's intel_aubdump tool, but at the kernel level, so that it includes 
> everything: contexts, WA BBs, virtual GPU addresses, pagetables, etc... 
> The trojan horse for that is "drm/i915: Add an AUB file format writer". 
> Now you only have to add a couple of debugfs entries (one to start/stop 
> the capture, one to retrieve the AUB file as it gets created via 'relay 
> channel') and a number of hooks around i915 to capture everything that 
> can be interesting.

But we don't need to do that at the kernel level, as the ioctl interface
is the defining uABI. The only thing we can't snoop is the real physical
addresses, but afaik for the replay aspect you don't need real
addresses, just consistent ones.
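
To illustrate what "consistent, not real" could look like for a
userspace capture tool, here is a minimal sketch. Everything in it is
hypothetical (the class name, the base address, the 4 KiB page size);
it does not call any i915 or AUB interface, it only shows that a
capture can hand out stable fake page addresses per (handle, page).

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class SyntheticAddressSpace:
    """Hands out fake but stable 'physical' page addresses.

    Hypothetical helper: a userspace capture cannot see real physical
    addresses, so it invents one per (handle, page_index) and always
    returns the same value for the same page.
    """

    def __init__(self, base=0x100000):
        self._next = base
        self._pages = {}  # (handle, page_index) -> synthetic address

    def address_of(self, handle, page_index):
        key = (handle, page_index)
        if key not in self._pages:
            # First time this page is seen: allocate the next fake address.
            self._pages[key] = self._next
            self._next += PAGE_SIZE
        return self._pages[key]
```

A replay only needs every reference to a given page to resolve to the
same value, which the lookup table guarantees.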

Don't do anything in the kernel that can be done in userspace, because
we can never get it out again. We really do need compelling arguments as
to why it is impossible to do what needs to be done from userspace. And
however we put it, we can't just leak physical addresses or other
low-level information that opens us up to abuse, or to one client
snooping on another. At least not without a very good defense to hide behind
when it is spotted. The argument has to be really compelling if you want
us to maintain this for all platforms for the next decade+.

One suggestion is that we put all the dodgy stuff in an auxiliary
module, not even just hiding behind a module option. Of course that
makes the post-mortem aspect impossible.

(I'm not saying I have the answers, just that it's a high bar we have
to pass.)
-Chris

