[Freedreno] [RFC 0/4] drm/msm: GPU crash state

Jordan Crouse jcrouse at codeaurora.org
Fri Jan 5 22:11:19 UTC 2018


On Fri, Jan 05, 2018 at 06:32:22PM +0000, Chris Wilson wrote:
> Quoting Jordan Crouse (2018-01-05 18:00:17)
> > This is a request for comment on code to store and dump the GPU state
> > after a hang, with inspiration from the very good i915 GPU error state and
> > the binary GPU snapshot in the downstream kernel.
> > 
> > The goal is to store and provide enough information to debug software
> > and hardware issues on the Adreno hardware in a semi human-readable
> > format that can also be parsed by scripts.
> > 
> > The goal for this request for comment is to get some consensus
> > about the format and work through some of the technical issues.
> 
> My biggest regret for i915/error is that we didn't adopt a sensible file
> format and organically grew it from dmesg-style logging. This is quite a
> hindrance when it comes to trying to improve the capture whilst
> maintaining compatibility with the existing tools. Switching to json/yaml
> at this point wouldn't be too difficult (the change in format is easy to
> spot), just a large chunk of technical debt to pay off. So I would
> recommend you pick an adaptable, human-readable file format for ease of
> tool development.

This is a really great suggestion. The downstream qcom kernel uses a strictly
binary format which is also problematic for other reasons. I like the idea of
having something standard and extensible while remaining human readable without
tools.
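To make the "standard and extensible" point concrete, here is a minimal sketch of a self-describing crash-state record. It assumes JSON as the container (YAML would work the same way), and every field name (`format-version`, `gpu`, `registers`, `ringbuffer`) is hypothetical, not the layout the msm driver uses. The point it illustrates: a new producer can add sections, and an older tool that only looks up the keys it knows keeps working.

```python
import json


def dump_state(version, gpu, registers, extra=None):
    """Serialize a (hypothetical) crash-state record as indented JSON."""
    state = {
        "format-version": version,
        "gpu": gpu,
        # Offsets and values as hex strings so the file stays readable by eye.
        "registers": {f"0x{off:04x}": f"0x{val:08x}" for off, val in registers},
    }
    if extra:
        # Newer producers can add whole new sections without a format break.
        state.update(extra)
    return json.dumps(state, indent=2, sort_keys=True)


def read_registers(text):
    """An 'old' tool: it reads only the section it knows and ignores the rest."""
    state = json.loads(text)
    return {int(k, 16): int(v, 16) for k, v in state["registers"].items()}


dump = dump_state(
    1,
    "example-gpu",
    [(0x10, 0xDEADBEEF), (0x14, 0)],
    extra={"ringbuffer": {"rptr": 12, "wptr": 40}},
)
print(dump)
print(read_registers(dump))
```

The same property is what makes it painless to grow the capture later: adding a section never changes how existing sections are parsed.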

> The second important feature for capturing error state is to include as
> much user information as possible. You want to be able to identify which
> library generated the hang in a post-mortem dump from a user in 6-12
> months' time, and just as importantly, why the library did what it did. I
> like the idea of userspace being able to attach buffers that are
> included in the error state (supplied as auxiliary information to the
> guilty command stream) to provide a flight-data-recorder from the user's
> pov. So design your interface with a view to extending to include blobs.

I love the ascii85 and compression stuff that i915 does, and that would fit in
well with a nice file format.
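A rough sketch of the compress-then-ascii85 idea for embedding binary blobs (for example userspace-attached buffers) in a text file. It deflates the buffer with zlib, pads it to whole 32-bit words, and encodes each word as five printable characters, abbreviating an all-zero word to `z`. This follows standard Ascii85 (big-endian groups, digits offset from `!`); it is written in the spirit of what i915 does, not a copy of its routine, and the function names are hypothetical.

```python
import base64
import zlib

A85_FIRST = 33  # '!' encodes base-85 digit 0


def ascii85_word(word):
    """Encode one 32-bit value as 5 Ascii85 digits; zero becomes 'z'."""
    if word == 0:
        return "z"
    digits = []
    for _ in range(5):
        word, rem = divmod(word, 85)
        digits.append(chr(A85_FIRST + rem))
    return "".join(reversed(digits))  # most significant digit first


def encode_blob(data, compress=True):
    """Optionally deflate, pad to 32-bit words, then Ascii85-encode."""
    if compress:
        data = zlib.compress(data)
    # Zero padding; for compressed data the zlib stream marks its own end.
    data += b"\0" * (-len(data) % 4)
    words = (int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4))
    return "".join(ascii85_word(w) for w in words)


def decode_blob(text):
    """Inverse of encode_blob(compress=True)."""
    padded = base64.a85decode(text)  # understands the 'z' shorthand
    # decompressobj leaves the trailing pad bytes in unused_data.
    return zlib.decompressobj().decompress(padded)


# A mostly-zero buffer, typical of GPU memory, compresses very well.
buffer = bytes(4096) + b"\x01\x02\x03\x04" * 16
text = encode_blob(buffer)
print(len(buffer), "bytes ->", len(text), "characters")
assert decode_blob(text) == buffer
```

Compressing before encoding matters because Ascii85 alone grows data by 25%; deflate usually more than pays that back on sparse GPU buffers.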

> It would be interesting to have a common file format... While
> interpreting the data is going to be highly specific to a gpu/driver, the
> data itself will be similar between drivers. If we had a common file
> format, we could extend something like mesa/intel/aubinator_error_decode
> and throw in a bunch of xml descriptors for the different gpus. Just a
> thought...

I'm definitely open to this. There is never anything wrong with improved
debugging for everybody.
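The split Chris describes (a driver-neutral dump plus per-GPU descriptors for interpretation) might look roughly like this. The XML schema, the `example-gpu` name, and the register names are all invented for illustration; this is not how aubinator_error_decode or any existing register database is structured. The generic part handles raw offset/value pairs from any driver, and only the naming comes from the GPU-specific file.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-GPU descriptor: schema and names are made up.
EXAMPLE_GPU_XML = """
<gpu name="example-gpu">
  <reg offset="0x0010" name="EXAMPLE_STATUS"/>
  <reg offset="0x0014" name="EXAMPLE_FAULT"/>
</gpu>
"""


def load_descriptor(xml_text):
    """Return (gpu_name, {offset: register_name}) from a descriptor file."""
    root = ET.fromstring(xml_text.strip())
    names = {int(r.get("offset"), 16): r.get("name") for r in root.iter("reg")}
    return root.get("name"), names


def annotate(registers, descriptor):
    """Attach names to raw (offset, value) pairs; unknown offsets stay visible."""
    _, names = descriptor
    return [(names.get(off, f"unknown_0x{off:04x}"), val) for off, val in registers]


descriptor = load_descriptor(EXAMPLE_GPU_XML)
for name, value in annotate([(0x10, 0x1), (0x14, 0xDEADBEEF), (0x18, 0)], descriptor):
    print(f"{name:20} 0x{value:08x}")
```

Supporting a new GPU then means shipping a new descriptor file rather than a new decoder.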

Thanks,
Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
