[PATCH v4 2/4] drm/amdgpu: Rework coredump to use memory dynamically

André Almeida andrealmeid at igalia.com
Thu Aug 17 15:17:35 UTC 2023



Em 17/08/2023 12:04, Shashank Sharma escreveu:
> 
> On 17/08/2023 15:45, André Almeida wrote:
>> Hi Shashank,
>>
>> Em 17/08/2023 03:41, Shashank Sharma escreveu:
>>> Hello Andre,
>>>
>>> On 15/08/2023 21:50, André Almeida wrote:
>>>> Instead of storing coredump information inside amdgpu_device struct,
>>>> move if to a proper separated struct and allocate it dynamically. This
>>>> will make it easier to further expand the logged information.
>>>>
>>>> Signed-off-by: André Almeida <andrealmeid at igalia.com>
>>>> ---
>>>> v4: change kmalloc to kzalloc
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 14 +++--
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 63 
>>>> ++++++++++++++--------
>>>>   2 files changed, 49 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 9c6a332261ab..0d560b713948 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1088,11 +1088,6 @@ struct amdgpu_device {
>>>>       uint32_t                        *reset_dump_reg_list;
>>>>       uint32_t            *reset_dump_reg_value;
>>>>       int                             num_regs;
>>>> -#ifdef CONFIG_DEV_COREDUMP
>>>> -    struct amdgpu_task_info         reset_task_info;
>>>> -    bool                            reset_vram_lost;
>>>> -    struct timespec64               reset_time;
>>>> -#endif
>>>>       bool                            scpm_enabled;
>>>>       uint32_t                        scpm_status;
>>>> @@ -1105,6 +1100,15 @@ struct amdgpu_device {
>>>>       uint32_t            aid_mask;
>>>>   };
>>>> +#ifdef CONFIG_DEV_COREDUMP
>>>> +struct amdgpu_coredump_info {
>>>> +    struct amdgpu_device        *adev;
>>>> +    struct amdgpu_task_info         reset_task_info;
>>>> +    struct timespec64               reset_time;
>>>> +    bool                            reset_vram_lost;
>>>> +};
>>>
>>> The patch looks good to me in general, but I would recommend slightly 
>>> different arrangement and segregation of GPU reset information.
>>>
>>> Please consider a higher level structure adev->gpu_reset_info, and 
>>> move everything related to reset dump info into that, including this 
>>> new coredump_info structure, something like this:
>>>
>>> struct amdgpu_reset_info {
>>>
>>>      uint32_t *reset_dump_reg_list;
>>>
>>>      uint32_t *reset_dump_reg_value;
>>>
>>>      int num_regs;
>>>
>>
>> Right, I can encapsulate there reset_dump members,
>>
>>> #ifdef CONFIG_DEV_COREDUMP
>>>
>>>     struct amdgpu_coredump_info *coredump_info;/* keep this dynamic 
>>> allocation */
>>
>> but we don't need a pointer for amdgpu_coredump_info inside 
>> amdgpu_device or inside of amdgpu_device->gpu_reset_info, right?
> 
> I think it would be better if we keep all of the GPU reset related data 
> in the same structure, so adev->gpu_reset_info->coredump_info sounds 
> about right to me.
> 

But after patch 2/4, we don't need to store a coredump_info pointer 
inside adev, this is what I meant. What would be the purpose of having 
this pointer? It's freed by amdgpu_devcoredump_free(), so we don't need 
to keep track of it.

> - Shashank
> 
>>
>>>
>>> #endif
>>>
>>> }
>>>
>>>
>>> This will make sure that all the relevant information is at the same 
>>> place.
>>>
>>> - Shashank
>>>
>>        amdgpu_inc_vram_lost(tmp_adev);


More information about the dri-devel mailing list