[Intel-gfx] [PATCH 02/13] drm/i915: Copy user requested buffers into the error state

Sun Apr 2 08:51:57 UTC 2017

On Sat, Apr 01, 2017 at 05:48:55PM -0700, Matt Turner wrote:
> On Wed, Mar 29, 2017 at 8:56 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may
> > use to indicate that it wants the contents of this buffer preserved in
> > the error state (/sys/class/drm/cardN/error) following a GPU hang
> > involving this batch.
> >
> > Use this at your discretion, the contents of the error state. although
> > compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all
> > eternity (until the error state is destroyed).
> >
> > Based on an earlier patch by Ben Widawsky <ben at bwidawsk.net>
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Ben Widawsky <ben at bwidawsk.net>
> > Cc: Matt Turner <mattst88 at gmail.com>
> > Acked-by: Ben Widawsky <ben at bwidawsk.net>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> 
> Thank you, Chris. With this in place (and a few patches from Ben
> rebased for libdrm and Mesa) I can disassemble the shader program from
> an error state.
> 
> In this case, I turned off the end-of-thread bit on the sendc in order
> to cause a hang:
> 
> render ring --- user = 0x00000000 fff75000
> pln(8)          g124<1>F        g4<0,1,0>F      g2<8,8,1>F      {
> align1 1Q compacted };
> pln(8)          g125<1>F        g4.4<0,1,0>F    g2<8,8,1>F      {
> align1 1Q compacted };
> pln(8)          g126<1>F        g5<0,1,0>F      g2<8,8,1>F      {
> align1 1Q compacted };
> pln(8)          g127<1>F        g5.4<0,1,0>F    g2<8,8,1>F      {
> align1 1Q compacted };
> sendc(8)        null<1>UW       g124<8,8,1>F
>                             render RT write SIMD8 LastRT Surface = 0
> mlen 4 rlen 0 { align1 1Q };
> nop                                                             ;
> pln(16)         g120<1>F        g6<0,1,0>F      g2<8,8,1>F      {
> align1 1H compacted };
> pln(16)         g122<1>F        g6.4<0,1,0>F    g2<8,8,1>F      {
> align1 1H compacted };
> pln(16)         g124<1>F        g7<0,1,0>F      g2<8,8,1>F      {
> align1 1H compacted };
> pln(16)         g126<1>F        g7.4<0,1,0>F    g2<8,8,1>F      {
> align1 1H compacted };
> sendc(16)       null<1>UW       g120<8,8,1>F
>                             render RT write SIMD16 LastRT Surface = 0
> mlen 8 rlen 0 { align1 1H };
> illegal(1)                                                      { align1 1N };
> 
> Presumably we would like to save more than just instruction buffers.
> Do we have a good way of discerning what each blob of data in the
> error state is?

The prechosen set are named (batch, ring, HW context, HW status,
semaphore). The user ones just have a nondescript 'user'. My thinking
was that either there would be an additional debug only (aub-esque)
buffer added to the execbuf that contained all the useful info to index
the other buffers captured, or userspace puts a header/footer into its
captured batches. I did consider the possibility of adding a tag through
the execobject, maybe 8-bits inside flags, but I prefer the approach
of embedding information into the buffers (much more flexibile).

It is also possible to take the simulator route and decode the buffers
according to the current GPU state, the link between relocation
addresses and buffer address should be sufficient?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre