[PATCH v3 3/4] drm/amdgpu: use drm_file_err in logging to also dump process information
Khatri, Sunil
sukhatri at amd.com
Wed Apr 16 10:01:27 UTC 2025
On 4/16/2025 12:56 PM, Tvrtko Ursulin wrote:
>
> On 15/04/2025 19:43, Sunil Khatri wrote:
>> add process and pid information in the userqueue error
>> logging to make it more useful in resolving the error
>> by logs.
>>
>> Sample log:
>> [ 42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]]
>> *ERROR* Timed out waiting for fence f=000000001c74d978 for
>> comm:Xwayland pid:3427
>> [ 42.444669] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not
>> suspending userqueue, timeout waiting for comm:Xwayland pid:3427
>> [ 42.824729] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]]
>> *ERROR* Timed out waiting for fence f=0000000074407d3e for
>> comm:systemd-logind pid:1058
>> [ 42.825082] [drm:amdgpu_userqueue_suspend [amdgpu]] *ERROR* Not
>> suspending userqueue, timeout waiting for comm:systemd-logind pid:1058
>>
>> Signed-off-by: Sunil Khatri <sunil.khatri at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 14 ++++++++------
>> 1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> index 1867520ba258..05c1ee27a319 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
>> @@ -43,7 +43,7 @@ amdgpu_userqueue_cleanup(struct amdgpu_userq_mgr
>> *uq_mgr,
>> if (f && !dma_fence_is_signaled(f)) {
>> ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>> if (ret <= 0) {
>> - DRM_ERROR("Timed out waiting for fence f=%p\n", f);
>> + drm_file_err(uq_mgr->file, "Timed out waiting for fence
>> f=%p\n", f);
>
> You decided to leave %p after all?
Yes we are printing the fence ptr here to see which fence is timing out.
Anyways right now intention of this patch is to add additional process
information along with existing information like fence here.
regards
Sunil
>
>> return;
>> }
>> }
>> @@ -440,7 +440,8 @@ amdgpu_userqueue_resume_all(struct
>> amdgpu_userq_mgr *uq_mgr)
>> }
>> if (ret)
>> - DRM_ERROR("Failed to map all the queues\n");
>> + drm_file_err(uq_mgr->file, "Failed to map all the queue\n");
>
> You lost the plural by accident.
Yes i will add 's'. Noted.
>
I am also not sure "all the queues" makes sense in this context versus
"all queues" but it's inconsequential really.
Regards
Sunil
> Yes it all queues from a uq_mgr.
>> +
>> return ret;
>> }
>> @@ -598,7 +599,8 @@ amdgpu_userqueue_suspend_all(struct
>> amdgpu_userq_mgr *uq_mgr)
>> }
>> if (ret)
>> - DRM_ERROR("Couldn't unmap all the queues\n");
>> + drm_file_err(uq_mgr->file, "Couldn't unmap all the queues\n");
>> +
>> return ret;
>> }
>> @@ -615,7 +617,7 @@ amdgpu_userqueue_wait_for_signal(struct
>> amdgpu_userq_mgr *uq_mgr)
>> continue;
>> ret = dma_fence_wait_timeout(f, true, msecs_to_jiffies(100));
>> if (ret <= 0) {
>> - DRM_ERROR("Timed out waiting for fence f=%p\n", f);
>> + drm_file_err(uq_mgr->file, "Timed out waiting for fence
>> f=%p\n", f);
>> return -ETIMEDOUT;
>> }
>> }
>> @@ -634,13 +636,13 @@ amdgpu_userqueue_suspend(struct
>> amdgpu_userq_mgr *uq_mgr,
>> /* Wait for any pending userqueue fence work to finish */
>> ret = amdgpu_userqueue_wait_for_signal(uq_mgr);
>> if (ret) {
>> - DRM_ERROR("Not suspending userqueue, timeout waiting for
>> work\n");
>> + drm_file_err(uq_mgr->file, "Not suspending userqueue,
>> timeout waiting\n");
>> return;
>> }
>> ret = amdgpu_userqueue_suspend_all(uq_mgr);
>> if (ret) {
>> - DRM_ERROR("Failed to evict userqueue\n");
>> + drm_file_err(uq_mgr->file, "Failed to evict userqueue\n");
>> return;
>
> It is pre-existing but strikes me as odd that failure to
> amdgpu_userqueue_suspend_all() logs a failure to *evict* instead of
> suspend (as the previous log does). Anyway, I did not look at the
> surrounding code so just thinking out loud.
Yes suspend failed as all the fences were not evicted and thats why
suspend failed. Anyways there are already alex patches which will change
this to unmap as a code reorganisation for suspend/resume is in pipeline.
regards
Sunil
>
> Regards,
>
> Tvrtko
>
>> }
>
More information about the dri-devel
mailing list