[PATCH 1/3] drm/amdgpu: Forward soft recovery errors to userspace
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Aug 2 08:30:08 UTC 2024
On 01.08.24 at 17:17, Friedrich Vock wrote:
> Hi,
>
> I happened to come across an issue just now again where soft recovery
> fails to get reported to userspace properly, causing apps to submit
> hanging work in a loop (which ended up hanging the entire machine) - it
> seems like this patch never made it into amd-staging-drm-next. Given
> that it has a Reviewed-by and everything, was this just an oversight or
> are there some blockers to pushing it that I missed?
>
> If not, I'd be grateful if the patch could get merged.
Sorry, that was my fault. I forgot about it because Alex usually
picks up patches for amd-staging-drm-next.
Thanks for the reminder, I just pushed it.
Regards,
Christian.
>
> Thanks,
> Friedrich
>
> On 08.03.24 09:33, Christian König wrote:
>> On 07.03.24 at 20:04, Joshua Ashton wrote:
>>> As we discussed before[1], soft recovery should be
>>> forwarded to userspace, or we can get into a really
>>> bad state where apps will keep submitting hanging
>>> command buffers cascading us to a hard reset.
>>
>> Marek, you have been in favor of this forever. So I would like to ask
>> you to put your Reviewed-by on it, and I will just push it into our
>> internal kernel branch.
>>
>> Regards,
>> Christian.
>>
>>>
>>> 1:
>>> https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
>>>
>>> Signed-off-by: Joshua Ashton <joshua at froggi.es>
>>>
>>> Cc: Friedrich Vock <friedrich.vock at gmx.de>
>>> Cc: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
>>> Cc: Christian König <christian.koenig at amd.com>
>>> Cc: André Almeida <andrealmeid at igalia.com>
>>> Cc: stable at vger.kernel.org
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 4b3000c21ef2..aebf59855e9f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -262,9 +262,8 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
>>>  	struct dma_fence *fence = NULL;
>>>  	int r;
>>> -	/* Ignore soft recovered fences here */
>>>  	r = drm_sched_entity_error(s_entity);
>>> -	if (r && r != -ENODATA)
>>> +	if (r)
>>>  		goto error;
>>>  	if (!fence && job->gang_submit)
>>
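For readers skimming the hunk above, the effect of the one-line change can be sketched as two standalone predicates. This is only an illustrative model, not driver code: `rejects_job_before()` and `rejects_job_after()` are hypothetical names, and `entity_error` stands in for the value returned by `drm_sched_entity_error(s_entity)`. Going by the removed comment, `-ENODATA` is assumed to be the error that marks a soft recovered fence.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check in amdgpu_job_prepare_job().
 * entity_error stands in for the result of drm_sched_entity_error().
 */

/* Before the patch: -ENODATA (soft recovery) was ignored, so the
 * next job from the same entity was still allowed through. */
static bool rejects_job_before(int entity_error)
{
	return entity_error && entity_error != -ENODATA;
}

/* After the patch: any recorded entity error rejects the job, so
 * the error path is taken after a soft recovery as well. */
static bool rejects_job_after(int entity_error)
{
	return entity_error != 0;
}
```

With these helpers, `rejects_job_before(-ENODATA)` is false while `rejects_job_after(-ENODATA)` is true; for every other input, including 0, the two predicates agree.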