[PATCH] drm/amdgpu: add option params to enforce process isolation between graphics and compute

Christian König christian.koenig at amd.com
Tue Jun 6 10:33:39 UTC 2023


On 01.06.23 at 13:14, Chong Li wrote:
> enforce process isolation between graphics and compute via using the same reserved vmid.
>
> Signed-off-by: Chong Li <chongli2 at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  9 +++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 13 ++++++++++++-
>   3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index ce196badf42d..48c5c547d85a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -215,6 +215,7 @@ extern int amdgpu_force_asic_type;
>   extern int amdgpu_smartshift_bias;
>   extern int amdgpu_use_xgmi_p2p;
>   extern int amdgpu_mtype_local;
> +extern int enforce_isolation;
>   #ifdef CONFIG_HSA_AMD
>   extern int sched_policy;
>   extern bool debug_evictions;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 3d91e123f9bd..2e0ebd92b4cf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -973,6 +973,15 @@ MODULE_PARM_DESC(
>   						4 = AMDGPU_CPX_PARTITION_MODE)");
>   module_param_named(user_partt_mode, amdgpu_user_partt_mode, uint, 0444);
>   
> +
> +/**
> + * DOC: enforce_isolation (int)
> + * enforce process isolation between graphics and compute via using the same reserved vmid.
> + */
> +int enforce_isolation = 0;

Please move that to the other declarations above.

> +module_param(enforce_isolation, int, 0444);

IIRC you can also use bool here.

> +MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics and compute . 1 = On, 0 = Off");

This way you can drop the "1 = On, 0 = Off" part from the description,
because "enforce_isolation=on" should then be accepted on the kernel
command line as well.
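
Roughly what I have in mind, as an untested sketch. It is a fragment for
amdgpu_drv.c and not a standalone program; it assumes the variable itself
is moved up to the other declarations as "bool enforce_isolation;" and
that the extern in amdgpu.h is changed to bool to match:

```c
/* Sketch only: assumes "bool enforce_isolation;" is declared further up
 * in amdgpu_drv.c and "extern bool enforce_isolation;" in amdgpu.h.
 */

/**
 * DOC: enforce_isolation (bool)
 * enforce process isolation between graphics and compute via using the same reserved vmid.
 */
module_param(enforce_isolation, bool, 0444);
MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics and compute");
```

The checks in amdgpu_ids.c can stay as they are, since a bool works in
"if (enforce_isolation)" just like the int did.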

> +
>   /* These devices are not supported by amdgpu.
>    * They are supported by the mach64, r128, radeon drivers
>    */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index c991ca0b7a1c..33efa17d08ff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -409,7 +409,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
>   	if (r || !idle)
>   		goto error;
>   
> -	if (vm->reserved_vmid[vmhub]) {
> +	if (vm->reserved_vmid[vmhub] || (enforce_isolation && (vmhub == AMDGPU_GFXHUB(0)))) {
>   		r = amdgpu_vmid_grab_reserved(vm, ring, job, &id, fence);
>   		if (r || !id)
>   			goto error;
> @@ -578,6 +578,17 @@ void amdgpu_vmid_mgr_init(struct amdgpu_device *adev)
>   			list_add_tail(&id_mgr->ids[j].list, &id_mgr->ids_lru);
>   		}
>   	}
> +
> +	if (enforce_isolation) {
> +		struct amdgpu_vmid_mgr *id_mgr = &adev->vm_manager.id_mgr[AMDGPU_GFXHUB(0)];
> +		struct amdgpu_vmid *id = NULL;

Please add an empty line between the declarations and the code.

> +		++id_mgr->reserved_use_count;
> +		id = list_first_entry(&id_mgr->ids_lru, struct amdgpu_vmid,
> +					list);
> +		/* Remove from normal round robin handling */
> +		list_del_init(&id->list);
> +		id_mgr->reserved = id;

It would be good not to duplicate this hunk here and in
amdgpu_vmid_alloc_reserved().

We should probably clean up amdgpu_vmid_alloc_reserved() a bit and move
the check for vm->reserved_vmid into amdgpu_vm_ioctl().

This way we could call amdgpu_vmid_alloc_reserved() here as well.
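
To illustrate the shape of that cleanup, here is a small userspace model.
It is only a sketch: every model_* name is a hypothetical stand-in for
the real driver structures, an array replaces the LRU list, and locking
and error handling are left out. The point is just that one helper pulls
an ID out of round robin handling, and both the ioctl path and the
manager init call it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. All model_* names are hypothetical stand-ins for
 * the driver structures; locking and error handling are left out.
 */
#define MODEL_NUM_VMID 4

struct model_vmid_mgr {
	int lru[MODEL_NUM_VMID];	/* IDs in normal round robin handling */
	size_t lru_count;
	int reserved;			/* reserved ID, 0 while none is reserved */
	unsigned int reserved_use_count;
};

struct model_vm {
	bool reserved_vmid;
};

/* The one place that takes an ID out of round robin handling. */
static void model_vmid_alloc_reserved(struct model_vmid_mgr *mgr)
{
	size_t i;

	++mgr->reserved_use_count;
	if (mgr->reserved)
		return;

	assert(mgr->lru_count > 0);
	mgr->reserved = mgr->lru[0];
	for (i = 1; i < mgr->lru_count; i++)
		mgr->lru[i - 1] = mgr->lru[i];
	mgr->lru_count--;
}

/* Caller 1: the per-VM check now lives in the ioctl-like caller. */
static void model_vm_ioctl_reserve(struct model_vmid_mgr *mgr,
				   struct model_vm *vm)
{
	if (vm->reserved_vmid)
		return;

	model_vmid_alloc_reserved(mgr);
	vm->reserved_vmid = true;
}

/* Caller 2: manager init reuses the same helper for enforce_isolation. */
static void model_vmid_mgr_init(struct model_vmid_mgr *mgr,
				bool enforce_isolation)
{
	size_t i;

	mgr->lru_count = MODEL_NUM_VMID;
	mgr->reserved = 0;
	mgr->reserved_use_count = 0;
	for (i = 0; i < MODEL_NUM_VMID; i++)
		mgr->lru[i] = (int)i + 1;	/* model IDs 1..MODEL_NUM_VMID */

	if (enforce_isolation)
		model_vmid_alloc_reserved(mgr);
}
```

With the per-VM check in the caller, the helper no longer needs a VM
argument at all, which is what makes it usable from the init path.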

Apart from that, this looks good from the technical side.

Regards,
Christian.

> +	}
>   }
>   
>   /**


