[Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Rogovin, Kevin kevin.rogovin at intel.com
Wed Sep 27 10:10:59 UTC 2017


Hi,

 In spirit, stuffing data into MI_NOOP is nicer, since one can then rely on aubinator to read that data and go to town. The main issues I see are the following.

 1. One now needs to insert MI_NOOPs into the command buffer in order to carry the strings. This changes what is sent to the GPU (though one can argue that MI_NOOPs should not really matter). The big nasty potential change is when the command buffer approaches full: with the extra MI_NOOPs it fills faster, which dramatically changes what the driver sends to the GPU, since starting a new batchbuffer triggers more state emission and -FLUSHES-.

2. It means more modifications to the driver in order to insert the messages.

3. The driver needs to somehow get a call-id from the application in order to know what value to place in the MI_NOOP. 

The worst issue (for me) is #1; #3 is solvable-ish by making a function pointer available to set the value to stuff into the MI_NOOP's unused bits. Issue #2 is quite icky because I have more in mind for the logger than Mesa/i965, and I want to keep the work needed to add it to a driver to a bare minimum.
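
For what it is worth, a minimal sketch of what that hook and the emit side could look like (names are illustrative; the bit layout assumed here is the documented MI_NOOP identification-number field, bit 22 as the write enable and bits 21:0 as the value):

   #include <stdint.h>

   /* MI_NOOP has its command type/opcode bits all zero; bit 22 is the
    * identification-number write enable, bits 21:0 the identification
    * number. */
   #define MI_NOOP            0u
   #define MI_NOOP_WRITE_ID   (1u << 22)
   #define MI_NOOP_ID_MASK    ((1u << 22) - 1u)

   /* hypothetical hook the tool would expose so that apitrace (or the
    * application) can hand the driver the current call ID */
   typedef uint32_t (*get_call_id_fn)(void *user_data);

   static void
   emit_call_id_noop(uint32_t *batch, uint32_t *dw_count,
                     get_call_id_fn get_call_id, void *user_data)
   {
      uint32_t id = get_call_id(user_data) & MI_NOOP_ID_MASK;
      batch[(*dw_count)++] = MI_NOOP | MI_NOOP_WRITE_ID | id;
   }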

FWIW, when I started this, I wanted to do it via the aub-dumper and aubinator, having them produce auxiliary files with the data needed to know which part of their output came from where. But the more I looked at the issues I wanted to solve, the trickier it seemed to use the aub-dumper and aubinator to accomplish that.

-----Original Message-----
From: Landwerlin, Lionel G 
Sent: Wednesday, September 27, 2017 12:35 PM
To: Rogovin, Kevin <kevin.rogovin at intel.com>; Chris Wilson <chris at chris-wilson.co.uk>; mesa-dev at lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

A few months ago I implemented debug messages in the command stream by stuffing them into the unused bits of MI_NOOP:

https://patchwork.freedesktop.org/series/26079/

Aubinator would then read the bits and print the messages.

We might be able to reuse a similar idea to do away with any external interface.
Instead of strings of characters, we could put in a marker carrying the handle of a BO that stores all of the metadata about a particular draw call.
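
Roughly, a decoder walking the batch would then do something like this (a sketch of the idea only, not code from the series; GEM handles are assumed to fit in the 22 available bits):

   #include <stdbool.h>
   #include <stdint.h>

   static bool
   decode_metadata_marker(uint32_t dword, uint32_t *bo_handle)
   {
      const uint32_t write_id_bit = 1u << 22;
      const uint32_t id_mask      = (1u << 22) - 1u;

      /* MI_NOOP: command type and opcode bits (31:23) are all zero */
      if ((dword >> 23) != 0 || !(dword & write_id_bit))
         return false;

      /* low 22 bits carry the handle of the BO holding the draw call's
       * metadata; the caller would then look that BO up */
      *bo_handle = dword & id_mask;
      return true;
   }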

What do you think?

-
Lionel

On 27/09/17 07:53, Rogovin, Kevin wrote:
> Hi,
>
>   Right now the way the thing works is that it walks the batchbuffer just after the kernel returns from the ioctl, updating its internal view of the GPU state as it walks and emitting the data to the log file. The log for a single batchbuffer is (essentially) just a list of call IDs from the apitrace, together with where in the batchbuffer each call started.
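>
> Concretely, the per-batchbuffer log boils down to a sequence of entries like this (field names are illustrative, not the actual file format):
>
>   #include <stdint.h>
>
>   struct logged_call {
>      uint32_t apitrace_call_id; /* call ID from the apitrace */
>      uint32_t batch_offset;     /* dword offset in the batchbuffer where the call's commands start */
>   };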
>
>   I confess that I had not realized the potential of using something like this to help diagnose GPU hangs! I think it is a really good idea. What I could do is the following (and it is not terribly hard to do):
>
>     1. -BEFORE- issuing the ioctl, the logger walks just the API markers in its log of the batchbuffer, makes a new GEM BO filled with apitrace data (call ID, and maybe GL function data), and modifies the ioctl to carry that extra buffer (sketched below).
>
>    2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM BO. In order to read the GPU state more accurately I need to walk the log and update the GPU state after the ioctl (mostly out of paranoia about values copied from BOs to pipeline registers).
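>
> For the "extra buffer" part of step 1, a sketch against the stock execbuffer2 uAPI (no new kernel interface assumed; the batch is taken to be the last object in the list, which is the default; creating and filling the metadata BO happens elsewhere):
>
>   #include <stdint.h>
>   #include <string.h>
>   #include <i915_drm.h>   /* via libdrm's include path */
>
>   static void
>   add_apitrace_bo(struct drm_i915_gem_execbuffer2 *execbuf,
>                   struct drm_i915_gem_exec_object2 *objects, /* has room for one more entry */
>                   uint32_t apitrace_bo_handle)
>   {
>      unsigned batch_idx = execbuf->buffer_count - 1; /* batch is last by default */
>
>      /* keep the batch at the end of the list: shift it down one slot and
>       * put the metadata BO where it was; note that with
>       * I915_EXEC_HANDLE_LUT any relocs that target the batch by index
>       * would need fixing up, since the batch's index just moved */
>      objects[batch_idx + 1] = objects[batch_idx];
>      memset(&objects[batch_idx], 0, sizeof(objects[batch_idx]));
>      objects[batch_idx].handle = apitrace_bo_handle;
>
>      execbuf->buffer_count++;
>      execbuf->buffers_ptr = (uintptr_t)objects;
>   }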
>
> What would happen is that if a batchbuffer made the GPU hang, you would then know all the GL commands (trace IDs from the apitrace) that put work into that batchbuffer. One could then go back to the apitrace of the troublesome application and have a much better starting place for debugging.
>
> We could also do something evil-looking and add another modification to apitrace so that it takes a list of call ranges and inserts a glFinish() after each call in those ranges. Those glFinish() calls would then force the ioctl for the exact troublesome draw call without needing to tell i965 to flush after every draw call.
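>
> On the apitrace side that would be a tiny change in the retrace loop, something along these lines (the range list and helper are illustrative, not existing apitrace code):
>
>   #include <stdbool.h>
>
>   struct call_range { unsigned first, last; };
>
>   static bool
>   in_finish_ranges(unsigned call_no,
>                    const struct call_range *ranges, unsigned n)
>   {
>      for (unsigned i = 0; i < n; i++)
>         if (call_no >= ranges[i].first && call_no <= ranges[i].last)
>            return true;
>      return false;
>   }
>
>   /* in the retrace loop, after replaying call 'call_no':
>    *    if (in_finish_ranges(call_no, finish_ranges, n_finish_ranges))
>    *       glFinish();
>    */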
>
> Just to make sure: you want the "apitrace" data (call ID list, maybe function names) in a GEM BO? Where in the buffer list should that BO go, so that the kernel debug code knows which one to dump? I would guess that if the batchbuffer is the first buffer, the metadata BO would be the last one; if the batchbuffer is the last one, I guess it would go just before it, but that might mess up the reloc data if any of the relocs in the batchbuffer refer to the batchbuffer itself. I can also emit the data to a file and close it before the ioctl, then delete the file once the ioctl returns (assuming a GPU hang always stops the process, a hang would leave the file behind).
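>
> The file fallback would look roughly like this (a sketch; the path choice and error handling are glossed over):
>
>   #include <fcntl.h>
>   #include <stddef.h>
>   #include <unistd.h>
>
>   static int
>   dump_before_ioctl(const char *path, const void *data, size_t size)
>   {
>      int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
>      if (fd < 0)
>         return -1;
>      (void)write(fd, data, size);   /* sketch: not checking for short writes */
>      fsync(fd);   /* make sure the data is on disk before the GPU can hang */
>      close(fd);
>      return 0;
>   }
>
>   /* around the execbuffer:
>    *    dump_before_ioctl(path, log_data, log_size);
>    *    ret = drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
>    *    if (ret == 0)
>    *       unlink(path);   -- ioctl returned, discard the dump
>    */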
>
> Let me know what is best, and I will do it.
>
> -Kevin
>
>
> -----Original Message-----
> From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> Sent: Tuesday, September 26, 2017 11:20 PM
> To: Rogovin, Kevin <kevin.rogovin at intel.com>; 
> mesa-dev at lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for 
> Intel GPU
>
> Quoting Rogovin, Kevin (2017-09-26 10:35:44)
>> Hi,
>>
>>    Attached to this message are the following:
>>       1. a file giving example usage of the tool with a modified 
>> apitrace to produce JSON output
>>
>>       2. the patches to apitrace to make it BatchbufferLogger aware
>>
>>       3. the JSON files (gzipped) made from the example.
>>
>>
>> I encourage (and hope) people will take a look at the JSON to see the potential of the tool.
> The automatic apitrace-esque logging seems very useful. How easy would 
> it be to write that trace into a bo and associate with the execbuffer 
> (from my pov, it should be that hard)? That way you could get the most 
> recent actions before a GPU hang, attach them to a bug and decode them 
> at leisure. (An extension may be to keep a ring of the last N traces 
> so that you can see some setup a few batches ago that triggered a hang 
> in this one.)
>
> I presume you already have such a plan, and I'm just preaching to the choir.
> -Chris
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev



