[PATCH v1] drm/amdgpu: give each kernel job a unique id

Christian König christian.koenig at amd.com
Wed Jun 11 14:57:50 UTC 2025


On 6/11/25 16:25, Danilo Krummrich wrote:
> (Cc: dri-devel)
> 
> On Tue, Jun 10, 2025 at 03:05:34PM +0200, Christian König wrote:
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> index 5a8bc6342222..6108a6f9dba7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
>>> @@ -44,6 +44,22 @@
>>>  struct amdgpu_fence;
>>>  enum amdgpu_ib_pool_type;
>>>  
>>> +/* Internal kernel job ids. (decreasing values, starting from U64_MAX). */
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE              (18446744073709551615ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_PDES         (18446744073709551614ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_UPDATE_RANGE        (18446744073709551613ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VM_PT_CLEAR            (18446744073709551612ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_MAP_BUFFER         (18446744073709551611ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_ACCESS_MEMORY_SDMA (18446744073709551610ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER        (18446744073709551609ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE       (18446744073709551608ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_MOVE_BLIT              (18446744073709551607ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER       (18446744073709551606ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_CLEANER_SHADER         (18446744073709551605ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_FLUSH_GPU_TLB          (18446744073709551604ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_KFD_GART_MAP           (18446744073709551603ULL)
>>> +#define AMDGPU_KERNEL_JOB_ID_VCN_RING_TEST          (18446744073709551602ULL)
> 
> Why not
> 
> 	(U64_MAX - {1, 2, ...})?

That's what Pierre came up with as well, but I thought that this is ugly because it makes it hard to match the numbers from the trace back to something in the code.

>> Mhm, reiterating our internal discussion on the mailing list.
>>
>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>>
>> @Philip and @Danilo any opinion on that?
> 
> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
> why we need client_id to be a u64, wouldn't a u32 not be enough?

That can trivially overflow on long running boxes.

> Anyways, if client_id remains to be a u64, we could add a global DRM constant
> instead, e.g.
> 
> 	#define DRM_CLIENT_ID_MAX	0x0fffffffffffffff
> 	#define DRM_KERNEL_ID_BASE	(DRM_CLIENT_ID_MAX + 1)
> 
> And in drm_file_alloc() fail if we're out of IDs.

Mhm, I wouldn't mind printing the client id as hex and then always setting the highest bit for kernel submissions.

But when we keep printing them as base 10 it kind of becomes unreadable.

Christian.

> 
> I think this would be much cleaner.



More information about the dri-devel mailing list