[PATCH v1] drm/amdgpu: give each kernel job a unique id
Pierre-Eric Pelloux-Prayer
pierre-eric at damsy.net
Fri Jun 13 07:51:27 UTC 2025
Hi,
Le 13/06/2025 à 01:48, Danilo Krummrich a écrit :
> On Thu, Jun 12, 2025 at 09:00:34AM +0200, Christian König wrote:
>> On 6/11/25 17:11, Danilo Krummrich wrote:
>>>>>> Mhm, reiterating our internal discussion on the mailing list.
>>>>>>
>>>>>> I think it would be nicer if we could use negative values for the kernel submissions and positive for userspace. But as discussed internally we would need to adjust the scheduler trace points for that once more.
>>>>>>
>>>>>> @Philip and @Danilo any opinion on that?
>>>>>
>>>>> Both, the U64_MAX and the positive-negative approach, are a bit hacky. I wonder
>>>>> why we need client_id to be a u64, wouldn't a u32 not be enough?
>>>>
>>>> That can trivially overflow on long running boxes.
>>>
>>> I don't know if "trivially" is the word of choice given that the number is
>>> 4,294,967,295.
>>>
>>> But I did indeed miss that this is a for ever increasing atomic. Why is it an
>>> atomic? Why is it not an IDA?
>>
>> Well IDA has some extra overhead compared to an ever increasing atomic, additional to that it might not be the best choice to re-use numbers for clients in a trace log.
>
> I think the overhead is not relevant at all, this is called from
> drm_file_alloc(). The only path I can see where this is called is
> drm_client_init(), which isn't high frequent stuff at all, is it?
>
> It seems to me that we should probably use IDA here.
>
>> On the other hand using smaller numbers is usually nicer for manual inspection.
Re-using IDs might bring confusion in the trace logs, for instance when tracing lots of short lived
processes.
>
> Another option is to just add an interface to get a kernel client_id from the
> same atomic / IDA.
Honestly I don't really see the problem with the current patch: using the last N ids for the kernel
jobs requires no other changes and works fine.
https://gitlab.freedesktop.org/drm/amd/-/issues/4260 has a screenshot of the UMR tool using these ids.
If the theoretical overlap with client drm id is a concern, adding code to drm_file_alloc() to not
use the last 0xff ids would be easy.
btw maybe other drivers also use kernel jobs with the same semantics, in which case we might want to
move the special IDs definition to a shared place.
Thanks,
Pierre-Eric
More information about the dri-devel
mailing list