[PATCH 1/3] drm/amdgpu: Forward soft recovery errors to userspace

Friedrich Vock friedrich.vock at gmx.de
Thu Aug 1 15:17:34 UTC 2024


Hi,

I just ran into this issue again: a soft recovery is not reported to
userspace properly, so apps keep submitting hanging work in a loop
(which ended up hanging the entire machine). It seems this patch never
made it into amd-staging-drm-next. Given that it has a Reviewed-by and
everything, was this just an oversight, or are there blockers to
pushing it that I missed?

If nothing is blocking it, I'd be grateful if the patch could get merged.

Thanks,
Friedrich

On 08.03.24 09:33, Christian König wrote:
> On 07.03.24 at 20:04, Joshua Ashton wrote:
>> As we discussed before[1], soft recovery should be
>> forwarded to userspace; otherwise we can get into a really
>> bad state where apps keep submitting hanging
>> command buffers, cascading into a hard reset.
>
> Marek, you have been in favor of this for a long time, so I would like
> to ask you to put your Reviewed-by on it, and I will then push it into
> our internal kernel branch.
>
> Regards,
> Christian.
>
>>
>> [1]: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
>> Signed-off-by: Joshua Ashton <joshua at froggi.es>
>>
>> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>> Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: André Almeida <andrealmeid at igalia.com>
>> Cc: stable at vger.kernel.org
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 4b3000c21ef2..aebf59855e9f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -262,9 +262,8 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
>>       struct dma_fence *fence = NULL;
>>       int r;
>> -    /* Ignore soft recovered fences here */
>>       r = drm_sched_entity_error(s_entity);
>> -    if (r && r != -ENODATA)
>> +    if (r)
>>           goto error;
>>       if (!fence && job->gang_submit)
>
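
For anyone skimming the thread, the behavioral difference in the hunk
above can be modeled in plain userspace C. This is only a rough sketch,
not kernel code: the function names are hypothetical, and the
entity_error parameter stands in for the value returned by
drm_sched_entity_error(). It assumes, as the removed comment suggests,
that -ENODATA is the error carried by soft-recovered fences.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check in amdgpu_job_prepare_job().
 * entity_error stands in for the return value of
 * drm_sched_entity_error(); 0 means the entity has no error.
 */

/* Before the patch: -ENODATA (assumed: soft recovery) is ignored. */
bool job_rejected_before_patch(int entity_error)
{
	return entity_error && entity_error != -ENODATA;
}

/* After the patch: any non-zero entity error rejects the job. */
bool job_rejected_after_patch(int entity_error)
{
	return entity_error != 0;
}

/* Compare both variants for the three interesting inputs. */
void compare_checks(void)
{
	/* No error: both variants let the job through. */
	assert(!job_rejected_before_patch(0));
	assert(!job_rejected_after_patch(0));

	/* Soft recovery: only the patched check rejects the job. */
	assert(!job_rejected_before_patch(-ENODATA));
	assert(job_rejected_after_patch(-ENODATA));

	/* Any other error (here -EIO as an example): both reject. */
	assert(job_rejected_before_patch(-EIO));
	assert(job_rejected_after_patch(-EIO));
}
```

Calling compare_checks() from a small main() should pass silently. The
only input where the two variants disagree is -ENODATA, which is the
case this patch is about: with the old check, later jobs from the same
entity are still prepared after a soft recovery, so the error is not
surfaced through this path.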

