[PATCH 3/4] drm/amdgpu: add reset register trace dump function
Sharma, Shashank
shashank.sharma at amd.com
Mon Jan 24 17:10:10 UTC 2022
On 1/24/2022 6:07 PM, Christian König wrote:
>
>
> Am 24.01.22 um 18:00 schrieb Sharma, Shashank:
>>
>>
>> On 1/24/2022 5:48 PM, Christian König wrote:
>>> Am 24.01.22 um 17:45 schrieb Sharma, Shashank:
>>>> Hello Christian,
>>>> Thank for your comments, please fine mine inline:
>>>>
>>>> On 1/24/2022 8:15 AM, Christian König wrote:
>>>>> Am 21.01.22 um 21:34 schrieb Sharma, Shashank:
>>>>>> From 1c5c552eeddaffd9fb3e7d45ece1b2b28fccc575 Mon Sep 17 00:00:00
>>>>>> 2001
>>>>>> From: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>>>>>> Date: Fri, 21 Jan 2022 14:19:10 +0530
>>>>>> Subject: [PATCH 3/4] drm/amdgpu: add reset register trace dump
>>>>>> function for
>>>>>> gfx_v10_0
>>>>>>
>>>>>> Implementation of register trace dump function on the AMD GPU resets
>>>>>>
>>>>>> Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 ++++
>>>>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 53
>>>>>> ++++++++++++++++++++++-
>>>>>> drivers/gpu/drm/amd/include/amd_shared.h | 1 +
>>>>>> 3 files changed, 60 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>>> index d855cb53c7e0..c97b53b54333 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>>>> @@ -537,6 +537,14 @@ TRACE_EVENT(amdgpu_ib_pipe_sync,
>>>>>> __entry->seqno)
>>>>>> );
>>>>>>
>>>>>> +TRACE_EVENT(gfx_v10_0_reset_reg_dumps,
>>>>>> + TP_PROTO(char *reg_dumps),
>>>>>> + TP_ARGS(reg_dumps),
>>>>>> + TP_STRUCT__entry(__string(dumps, reg_dumps)),
>>>>>> + TP_fast_assign(__assign_str(dumps, reg_dumps);),
>>>>>> + TP_printk("amdgpu register dump {%s}", __get_str(dumps))
>>>>>> +);
>>>>>> +
>>>>>> #undef AMDGPU_JOB_GET_TIMELINE_NAME
>>>>>> #endif
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>>> index 16dbe593cba2..05974ed5416d 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>>>>> @@ -46,7 +46,7 @@
>>>>>> #include "v10_structs.h"
>>>>>> #include "gfx_v10_0.h"
>>>>>> #include "nbio_v2_3.h"
>>>>>> -
>>>>>> +#include "amdgpu_trace.h"
>>>>>> /*
>>>>>> * Navi10 has two graphic rings to share each graphic pipe.
>>>>>> * 1. Primary ring
>>>>>> @@ -188,6 +188,12 @@
>>>>>> #define RLCG_ERROR_REPORT_ENABLED(adev) \
>>>>>> (amdgpu_sriov_reg_indirect_mmhub(adev) ||
>>>>>> amdgpu_sriov_reg_indirect_gc(adev))
>>>>>>
>>>>>> +#define N_REGS (17)
>>>>>> +#define DUMP_REG(addr) do { \
>>>>>> + (dump)[i][0] = (addr); \
>>>>>> + (dump)[i++][1] = RREG32_SOC15_IP(GC, addr); \
>>>>>> + } while (0)
>>>>>
>>>>> Those macros need an AMDGPU_ prefix and maybe some better names.
>>>>
>>>> Agree, @Amar pls check this out.
>>>>>
>>>>> From the design POV I'm really wondering if it wouldn't be better
>>>>> if userspace defines the registers we want to dump in case of a crash.
>>>>>
>>>>
>>>> I am not so sure about it. This means we have to communicate with
>>>> the userspace and get the list of registers it wishes to see, do I
>>>> smell a new IOCTL or DRM property ? I also feel that we have better
>>>> filtering tools for a trace event in userspace than in kernel.
>>>>
>>>> What do you think ?
>>>
>>> Just a writeable debugfs file should probably do it. We have a list
>>> of known registers filled in during boot (just reg offsets) and
>>> userspace can read/write the file to update it.
>>>
>>
>> Ok, so in this case, what would be our state machine ?
>> 1. gpu_reset happened
>> 2. uevent_sent from work queue
>> 3. userspace goes and writes into debugfs files
>> 4. kernel collects the desired register values
>>
>> 5 (a). kernel send the trace_event with desired values
>> 6 (a). kernel sends another uevent to indicate trace_event
>>
>> Or
>>
>> 5 (a+b). kernel loads the reg values via debugfs itself.
>>
>> In both of these case, wouldn't it be too late to collect register
>> values by the time userspace list arrives at kernel ?
>>
>> Or do you have a better state machine in mind ?
>
> 1. system boots, kernel loads a default register list to dump.
> 2. Userspace optionally updates this register list.
> ....
> 3. GPU reset happens, we dump the registers into the trace log.
> 4. We send udev event to signal the GPU reset to umr or other tool to
> save and store the trace log.
>
Got it, let us rework in those lines.
- Shashank
> Regards,
> Christian.
More information about the amd-gfx
mailing list