[PATCH 2/2] drm/amdkfd: Improve chances of unique gpu_id

Felix Kuehling felix.kuehling at amd.com
Thu May 2 03:47:05 UTC 2024



On 2024-05-01 21:08, Harish Kasiviswanathan wrote:
> gpu_id needs to be unique for user space to identify GPUs via KFD
> interface. Do a single pass search to detect collision. If
> detected, increment gpu_id by one.
> 
> Probability of collisons are very rare. Hence, no more complexity is
> added to ensure uniqueness.> 
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan at amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index b93913934b03..f2d1e82e7bed 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1095,6 +1095,8 @@ static uint32_t kfd_generate_gpu_id(struct kfd_node *gpu)
>  	uint32_t hashout;
>  	uint32_t buf[8];
>  	uint64_t local_mem_size;
> +	struct kfd_topology_device *dev;
> +	bool is_unique = true;
>  	int i;
>  
>  	if (!gpu)
> @@ -1115,7 +1117,13 @@ static uint32_t kfd_generate_gpu_id(struct kfd_node *gpu)
>  	for (i = 0, hashout = 0; i < 8; i++)
>  		hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
>  
> -	return hashout;
> +	down_read(&topology_lock);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (dev->gpu && dev->gpu_id == hashout)
> +			is_unique = false;

You can break early here.

> +	}
> +	up_read(&topology_lock);
> +	return is_unique ? hashout : ++hashout;

We should make sure that hashout stays within the KFD_GPU_ID_HASH_WIDTH. And if we're already adding a collision check, we may as well make it air-tight. It should be easy enough by wrapping it in a do-while loop. While we're at it, can we also check that the hash is not 0, because that value is used for non-GPU nodes? I think this would satisfy all my requests:

	do {
		if (!hashout)
			hashout++;
		is_unique = true;
		list_for_each_entry(dev, &topology_device_list, list) {
			if (dev->gpu && dev->gpu_id == hashout) {
				is_unique = false;
				hashout = (hashout + 1) &
					  ((1U << KFD_GPU_ID_HASH_WIDTH) - 1);
				break;
			}
		}
	} while (!is_unique);

Regards,
  Felix


>  }
>  /* kfd_assign_gpu - Attach @gpu to the correct kfd topology device. If
>   *		the GPU device is not already present in the topology device
> @@ -1946,7 +1954,6 @@ int kfd_topology_add_device(struct kfd_node *gpu)
>  	struct amdgpu_gfx_config *gfx_info = &gpu->adev->gfx.config;
>  	struct amdgpu_cu_info *cu_info = &gpu->adev->gfx.cu_info;
>  
> -	gpu_id = kfd_generate_gpu_id(gpu);
>  	if (gpu->xcp && !gpu->xcp->ddev) {
>  		dev_warn(gpu->adev->dev,
>  			 "Won't add GPU to topology since it has no drm node assigned.");
> @@ -1969,6 +1976,7 @@ int kfd_topology_add_device(struct kfd_node *gpu)
>  	if (res)
>  		return res;
>  
> +	gpu_id = kfd_generate_gpu_id(gpu);
>  	dev->gpu_id = gpu_id;
>  	gpu->id = gpu_id;
>  


More information about the amd-gfx mailing list