[PATCH] Fix Incorrect VMIDs passed to HWS

Felix Kuehling felix.kuehling at amd.com
Thu Mar 17 21:05:45 UTC 2022


Am 2022-03-17 um 16:57 schrieb Tushar Patel:
> Removed dev_error message for incorrect VMIDs
>
> Fix Incorrect VMIDs passed to HWS

This could use more of an explanation. The problem here was, that the 
previous default was based on an outdated number of VMIDs. On Arcturus 
and Aldebaran we reserve more VMIDs for KFD. That was never reflected in 
the maximum concurrency setting for HWS. This patch fixes that by making 
the default dependent on the number of VMIDs per GPU.


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 +++---------
>   2 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4c20c23d6ba0..bda1b5132ee8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -680,7 +680,7 @@ MODULE_PARM_DESC(sched_policy,
>    * Maximum number of processes that HWS can schedule concurrently. The maximum is the
>    * number of VMIDs assigned to the HWS, which is also the default.
>    */
> -int hws_max_conc_proc = 8;
> +int hws_max_conc_proc = -1;
>   module_param(hws_max_conc_proc, int, 0444);
>   MODULE_PARM_DESC(hws_max_conc_proc,
>   	"Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))");
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 339e12c94cff..66074e1abc79 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -483,15 +483,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>   	}
>   
>   	/* Verify module parameters regarding mapped process number*/
> -	if ((hws_max_conc_proc < 0)
> -			|| (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
> -		dev_err(kfd_device,
> -			"hws_max_conc_proc %d must be between 0 and %d, use %d instead\n",
> -			hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
> -			kfd->vm_info.vmid_num_kfd);
> -		kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
> -	} else
> -		kfd->max_proc_per_quantum = hws_max_conc_proc;
> +	kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;

I'd move that into an else-branch of the if-statement. That would make 
the logic clearer.


> +	if (hws_max_conc_proc != -1)

Change this condition to "hws_max_conc_proc >= 0". We never want to set 
kfd->max_proc_per_quantum to something negative.

With those issues fixed, the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


> +		kfd->max_proc_per_quantum = min(hws_max_conc_proc, kfd->vm_info.vmid_num_kfd)
>   
>   	/* calculate max size of mqds needed for queues */
>   	size = max_num_of_queues_per_device *


More information about the amd-gfx mailing list