[PATCH] drm/amdkfd: Have kfd driver use same PASID values from graphic driver
Felix Kuehling
felix.kuehling at amd.com
Thu Nov 21 15:22:00 UTC 2024
On 2024-11-20 22:58, Chen, Xiaogang wrote:
>>> @@ -1822,15 +1804,20 @@ struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid)
>>> {
>>> struct kfd_process *p, *ret_p = NULL;
>>> unsigned int temp;
>>> + int i;
>>> int idx = srcu_read_lock(&kfd_processes_srcu);
>>> hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> - if (p->pasid == pasid) {
>>> - kref_get(&p->ref);
>>> - ret_p = p;
>>> - break;
>>> + for (i = 0; i < p->n_pdds; i++) {
>>> + if (p->pdds[i]->pasid == pasid) {
>>> + kref_get(&p->ref);
>>> + ret_p = p;
>>> + break;
>>> + }
>> I think this won't work correctly. The same PASID can be used for different processes on different GPUs because each adev manages its own PASID->amdgpu_vm lookup table. So kfd_lookup_process_by_pasid needs a new parameter that identifies the GPU adev, and you should only compare pasids, if the adev matches.
>
> I think it is the main concern here: the pasid used here is global in driver by amdgpu_pasid_alloc(16) at amdgpu_driver_open_kms. Each time a render node(partition) got opened, a new pasid value is generated. Its lifetime is until render node got closed. A pdd just uses this pasid. And each adev has its own xarray which saves pasids for this adev.
I think I had a misunderstanding here. I saw the xarray in adev and assumed that each adev allocated PASIDs independently. But there is also a global amdgpu_pasid_ida that I missed. If the PASID allocation is global in the amdgpu_pasid_ida, then each PASID uniquely identifies a VM your code should be fine.
@Christian, we discussed the number of PASIDs consumed on systems with many GPUs and partitions. If the PASIDs are managed globally, then running out of PASIDs is a concern. Do you think we should change this?
Regards,
Felix
>
> Regards
>
> Xiaogang
More information about the amd-gfx
mailing list