[PATCH 1/3] drm/amdgpu: Forward soft recovery errors to userspace

Christian König ckoenig.leichtzumerken at gmail.com
Fri Aug 2 08:30:08 UTC 2024


Am 01.08.24 um 17:17 schrieb Friedrich Vock:
> Hi,
>
> I happened to come across an issue just now again where soft recovery
> fails to get reported to userspace properly, causing apps to submit
> hanging work in a loop (which ended up hanging the entire machine) - it
> seems like this patch never made it into amd-staging-drm-next. Given
> that it has a Reviewed-by and everything, was this just an oversight or
> are there some blockers to pushing it that I missed?
>
> If not, I'd be grateful if the patch could get merged.

Sorry that was my fault, I've forgotten about it because Alex usually 
picks up stuff for amd-staging-drm-next.

Thanks for the reminder, just pushed it.

Regards,
Christian.

>
> Thanks,
> Friedrich
>
> On 08.03.24 09:33, Christian König wrote:
>> Am 07.03.24 um 20:04 schrieb Joshua Ashton:
>>> As we discussed before[1], soft recovery should be
>>> forwarded to userspace, or we can get into a really
>>> bad state where apps will keep submitting hanging
>>> command buffers cascading us to a hard reset.
>>
>> Marek you are in favor of this like forever.  So I would like to request
>> you to put your Reviewed-by on it and I will just push it into our
>> internal kernel branch.
>>
>> Regards,
>> Christian.
>>
>>>
>>> 1:
>>> https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/ 
>>>
>>> Signed-off-by: Joshua Ashton <joshua at froggi.es>
>>>
>>> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>>> Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>> Cc: Christian König <christian.koenig at amd.com>
>>> Cc: André Almeida <andrealmeid at igalia.com>
>>> Cc: stable at vger.kernel.org
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 4b3000c21ef2..aebf59855e9f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -262,9 +262,8 @@ amdgpu_job_prepare_job(struct drm_sched_job
>>> *sched_job,
>>>       struct dma_fence *fence = NULL;
>>>       int r;
>>> -    /* Ignore soft recovered fences here */
>>>       r = drm_sched_entity_error(s_entity);
>>> -    if (r && r != -ENODATA)
>>> +    if (r)
>>>           goto error;
>>>       if (!fence && job->gang_submit)
>>



More information about the amd-gfx mailing list