[Intel-gfx] [PATCH 0/2] Add support for dumping error captures via kernel logging
John Harrison
john.c.harrison at intel.com
Tue Apr 11 16:41:04 UTC 2023
On 4/11/2023 07:41, Rodrigo Vivi wrote:
> On Mon, Apr 10, 2023 at 12:25:21PM -0700, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> Sometimes, the only effective way to debug an issue is to dump all the
>> interesting information at the point of failure. So add support for
>> doing that.
> No! Please no!
> We have some of this on Xe and I'm hating it. I'm going to try to remove
> it from there soon. It is horrible when you lose the ability to use dmesg
> directly because the output exceeds the number of lines it saves, or when,
> even with dmesg -w, it exceeds the number of lines of your terminal,
> or when ssh and serial consoles crawl while printing a bunch of information.
>
> We probably want to be able to capture multiple error states and be
> able to correlate them with a kernel timeline, but definitely not flood
> our log terminals.
I think you are missing the point.
This is the emergency backup plan for when nothing else works. It is not
on by default. It should never happen on an end user system unless we
specifically request them to run with a patched kernel to enable a dump
at a specific point.
But there are (many) times when nothing else works. In those instances,
it is extremely useful to be able to dump the system state in this manner.
It is code we have been using internally for some time and it has helped
resolve a number of difficult-to-debug bugs. As our Xe
generation platforms are now out in the wild and no longer just
internal, it is also proving important to have this facility available
in upstream trees as well. And having it merged rather than floating
around as random patches passed from person to person is far easier to
manage and would also help reduce the internal tree burden.
John.
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>>
>>
>> John Harrison (2):
>>   drm/i915: Dump error capture to kernel log
>>   drm/i915/guc: Dump error capture to dmesg on CTB error
>>
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  53 +++++++++
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |   6 +
>>  drivers/gpu/drm/i915/i915_gpu_error.c     | 130 ++++++++++++++++++++++
>>  drivers/gpu/drm/i915/i915_gpu_error.h     |   8 ++
>>  4 files changed, 197 insertions(+)
>>
>> --
>> 2.39.1
>>