[PATCH v3 3/4] drm/amdgpu: use drm_file_err in logging to also dump process information

Khatri, Sunil sukhatri at amd.com
Wed Apr 16 12:16:59 UTC 2025


On 4/16/2025 5:37 PM, Pierre-Eric Pelloux-Prayer wrote:
> Hi,
>
> Le 16/04/2025 à 12:01, Khatri, Sunil a écrit :
>>
>> On 4/16/2025 12:56 PM, Tvrtko Ursulin wrote:
>>>
>>> On 15/04/2025 19:43, Sunil Khatri wrote:
>>>> add process and pid information in the userqueue error
>>>> logging to make it more useful in resolving the error
>>>> by logs.
>>>>
>>>> Sample log:
>>>> [   42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] 
>>>> *ERROR* Timed out waiting for fence f=000000001c74d978 for 
>>>> comm:Xwayland pid:3427
>>>> [   42.444669] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not 
>>>> suspending userqueue, timeout waiting for comm:Xwayland pid:3427
>>>> [   42.824729] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] 
>>>> *ERROR* Timed out waiting for fence f=0000000074407d3e for 
>>>> comm:systemd-logind pid:1058
>>>> [   42.825082] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not 
>>>> suspending userqueue, timeout waiting for comm:systemd-logind pid:1058
>>>>
>>>> Signed-off-by: Sunil Khatri <sunil.khatri at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 14 ++++++++------
>>>>   1 file changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/ amdgpu_userqueue.c
>>>> index 1867520ba258..05c1ee27a319 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>>>> @@ -43,7 +43,7 @@ amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr 
>>>> *uq_mgr,
>>>>       if (f && !dma_fence_is_signaled(f)) {
>>>>           ret = dma_fence_wait_timeout(f, true, 
>>>> msecs_to_jiffies(100));
>>>>           if (ret <= 0) {
>>>> -            DRM_ERROR("Timed out waiting for fence f=%p\n", f);
>>>> +            drm_file_err(uq_mgr->file, "Timed out waiting for 
>>>> fence f=%p\n", f);
>>>
>>> You decided to leave %p after all?
>>
>> Yes we are printing the fence ptr here to see which fence is timing 
>> out. Anyways right now intention of this patch is to add additional 
>> process information along with existing information like fence here.
>>
>
> I agree with Tvrtko, "fence=%llu:%llu" would be better to identify 
> "which fence is timing out".

I agree to it for sure, just that there are other places also where we 
are printing fence ptr and will take that up in another patch.

Regards
Sunil Khatri

>
>
> Pierre-Eric
>
>
>> regards
>> Sunil
>>
>>>
>>>>               return;
>>>>           }
>>>>       }
>>>> @@ -440,7 +440,8 @@ amdgpu_userqueue_resume_all(struct 
>>>> amdgpu_userq_mgr *uq_mgr)
>>>>       }
>>>>         if (ret)
>>>> -        DRM_ERROR("Failed to map all the queues\n");
>>>> +        drm_file_err(uq_mgr->file, "Failed to map all the queue\n");
>>>
>>> You lost the plural by accident.
>> Yes i will add 's'. Noted.
>>>
>> I am also not sure "all the queues" makes sense in this context 
>> versus "all queues" but it's inconsequential really.
>> Regards
>> Sunil
>>> Yes it all queues from a uq_mgr.
>>>> +
>>>>       return ret;
>>>>   }
>>>>   @@ -598,7 +599,8 @@ amdgpu_userqueue_suspend_all(struct 
>>>> amdgpu_userq_mgr *uq_mgr)
>>>>       }
>>>>         if (ret)
>>>> -        DRM_ERROR("Couldn't unmap all the queues\n");
>>>> +        drm_file_err(uq_mgr->file, "Couldn't unmap all the 
>>>> queues\n");
>>>> +
>>>>       return ret;
>>>>   }
>>>>   @@ -615,7 +617,7 @@ amdgpu_userqueue_wait_for_signal(struct 
>>>> amdgpu_userq_mgr *uq_mgr)
>>>>               continue;
>>>>           ret = dma_fence_wait_timeout(f, true, 
>>>> msecs_to_jiffies(100));
>>>>           if (ret <= 0) {
>>>> -            DRM_ERROR("Timed out waiting for fence f=%p\n", f);
>>>> +            drm_file_err(uq_mgr->file, "Timed out waiting for 
>>>> fence f=%p\n", f);
>>>>               return -ETIMEDOUT;
>>>>           }
>>>>       }
>>>> @@ -634,13 +636,13 @@ amdgpu_userqueue_suspend(struct 
>>>> amdgpu_userq_mgr *uq_mgr,
>>>>       /* Wait for any pending userqueue fence work to finish */
>>>>       ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
>>>>       if (ret) {
>>>> -        DRM_ERROR("Not suspending userqueue, timeout waiting for 
>>>> work\n");
>>>> +        drm_file_err(uq_mgr->file, "Not suspending userqueue, 
>>>> timeout waiting\n");
>>>>           return;
>>>>       }
>>>>         ret = amdgpu_userqueue_suspend_all(uq_mgr);
>>>>       if (ret) {
>>>> -        DRM_ERROR("Failed to evict userqueue\n");
>>>> +        drm_file_err(uq_mgr->file, "Failed to evict userqueue\n");
>>>>           return;
>>>
>>> It is pre-existing but strikes me as odd that failure to 
>>> amdgpu_userqueue_suspend_all() logs a failure to *evict* instead of 
>>> suspend (as the previous log does). Anyway, I did not look at the 
>>> surrounding code so just thinking out loud.
>>
>> Yes suspend failed as all the fences were not evicted and thats why 
>> suspend failed. Anyways there are already alex patches which will 
>> change this to unmap as a code reorganisation for suspend/resume is 
>> in pipeline.
>>
>> regards
>>
>> Sunil
>>
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>       }
>>>


More information about the dri-devel mailing list