[PATCH] Fix Incorrect VMIDs passed to HWS
Felix Kuehling
felix.kuehling at amd.com
Thu Mar 17 21:05:45 UTC 2022
Am 2022-03-17 um 16:57 schrieb Tushar Patel:
> Removed dev_error message for incorrect VMIDs
>
> Fix Incorrect VMIDs passed to HWS
This could use more of an explanation. The problem here was, that the
previous default was based on an outdated number of VMIDs. On Arcturus
and Aldebaran we reserve more VMIDs for KFD. That was never reflected in
the maximum concurrency setting for HWS. This patch fixes that by making
the default dependent on the number of VMIDs per GPU.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 +++---------
> 2 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4c20c23d6ba0..bda1b5132ee8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -680,7 +680,7 @@ MODULE_PARM_DESC(sched_policy,
> * Maximum number of processes that HWS can schedule concurrently. The maximum is the
> * number of VMIDs assigned to the HWS, which is also the default.
> */
> -int hws_max_conc_proc = 8;
> +int hws_max_conc_proc = -1;
> module_param(hws_max_conc_proc, int, 0444);
> MODULE_PARM_DESC(hws_max_conc_proc,
> "Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))");
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 339e12c94cff..66074e1abc79 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -483,15 +483,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> }
>
> /* Verify module parameters regarding mapped process number*/
> - if ((hws_max_conc_proc < 0)
> - || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
> - dev_err(kfd_device,
> - "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n",
> - hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
> - kfd->vm_info.vmid_num_kfd);
> - kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
> - } else
> - kfd->max_proc_per_quantum = hws_max_conc_proc;
> + kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
I'd move that into an else-branch of the if-statement. That would make
the logic clearer.
> + if (hws_max_conc_proc != -1)
Change this condition to "hws_max_conc_proc >= 0". We never want to set
kfd->max_proc_per_quantum to something negative.
With those issues fixed, the patch is
Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> + kfd->max_proc_per_quantum = min(hws_max_conc_proc, kfd->vm_info.vmid_num_kfd)
>
> /* calculate max size of mqds needed for queues */
> size = max_num_of_queues_per_device *
More information about the amd-gfx
mailing list