[PATCH] drm/amdgpu: add ring reset messages
Lazar, Lijo
lijo.lazar at amd.com
Mon Oct 28 15:40:52 UTC 2024
On 10/28/2024 8:11 PM, Alex Deucher wrote:
> Ping?
>
> On Fri, Oct 18, 2024 at 11:47 AM Alex Deucher <alexdeucher at gmail.com> wrote:
>>
>> Ping?
>>
>> On Tue, Oct 15, 2024 at 2:28 PM Alex Deucher <alexander.deucher at amd.com> wrote:
>>>
>>> Add messages to make it clear when a per ring reset
>>> happens. This is helpful for debugging and aligns with
>>> other reset methods.
>>>
>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 102742f1faa2..2d60552a13ac 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -137,6 +137,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>>> /* attempt a per ring reset */
>>> if (amdgpu_gpu_recovery &&
>>> ring->funcs->reset) {
>>> + dev_err(adev->dev, "Starting %s ring reset\n", s_job->sched->name);
Is dev_err intentional, or is dev_info good enough? Also, I suggest adding
the ring name to the fail/pass messages.
Thanks,
Lijo
>>> /* stop the scheduler, but don't mess with the
>>> * bad job yet because if ring reset fails
>>> * we'll fall back to full GPU reset.
>>> @@ -150,8 +151,10 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>>> amdgpu_fence_driver_force_completion(ring);
>>> if (amdgpu_ring_sched_ready(ring))
>>> drm_sched_start(&ring->sched);
>>> + dev_err(adev->dev, "Ring reset success\n");
>>> goto exit;
>>> }
>>> + dev_err(adev->dev, "Ring reset failure\n");
>>> }
>>>
>>> if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>> --
>>> 2.46.2
>>>
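To make the two review suggestions above concrete (include the ring name in the pass/fail messages, and reserve the error level for an actual failure), here is a minimal userspace sketch. It is not the amdgpu code: `ring_reset_msg()`, `ring_reset_is_error()`, and `enum ring_reset_phase` are hypothetical names invented for illustration, and the message wording is an assumption. In the driver the equivalent would presumably be `dev_info()`/`dev_err()` calls that pass `s_job->sched->name`, as the "Starting" message in the patch already does.

```c
/*
 * Userspace sketch only. The kernel would call dev_info()/dev_err()
 * directly; the enum and helpers below are hypothetical and exist only
 * to illustrate the message format discussed in the review.
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum ring_reset_phase {
	RING_RESET_START,
	RING_RESET_SUCCESS,
	RING_RESET_FAILURE,
};

/*
 * Format a reset message that always carries the ring name.
 * Returns the snprintf() result, or 0 for an unknown phase.
 */
static int ring_reset_msg(char *buf, size_t len, const char *ring_name,
			  enum ring_reset_phase phase)
{
	assert(buf != NULL && len > 0);

	/* Fall back to a placeholder if no ring name is available. */
	if (ring_name == NULL || strlen(ring_name) == 0)
		ring_name = "unknown";

	switch (phase) {
	case RING_RESET_START:
		return snprintf(buf, len, "Starting %s ring reset\n", ring_name);
	case RING_RESET_SUCCESS:
		return snprintf(buf, len, "Ring %s reset success\n", ring_name);
	case RING_RESET_FAILURE:
		return snprintf(buf, len, "Ring %s reset failure\n", ring_name);
	}

	buf[0] = '\0';
	return 0;
}

/*
 * Assumed policy: only a failed reset is logged at error level;
 * start and success are informational.
 */
static int ring_reset_is_error(enum ring_reset_phase phase)
{
	return phase == RING_RESET_FAILURE;
}
```

Whether start and success should really drop to the info level is the open question in this review; the sketch only shows one possible answer.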