[PATCH] drm/amdgpu: add option params to enforce process isolation between graphics and compute
Christian König
christian.koenig at amd.com
Tue Jun 6 10:33:39 UTC 2023
On 01.06.23 at 13:14, Chong Li wrote:
> enforce process isolation between graphics and compute via using the same reserved vmid.
>
> Signed-off-by: Chong Li <chongli2 at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 13 ++++++++++++-
> 3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index ce196badf42d..48c5c547d85a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -215,6 +215,7 @@ extern int amdgpu_force_asic_type;
> extern int amdgpu_smartshift_bias;
> extern int amdgpu_use_xgmi_p2p;
> extern int amdgpu_mtype_local;
> +extern int enforce_isolation;
> #ifdef CONFIG_HSA_AMD
> extern int sched_policy;
> extern bool debug_evictions;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 3d91e123f9bd..2e0ebd92b4cf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -973,6 +973,15 @@ MODULE_PARM_DESC(
> 4 = AMDGPU_CPX_PARTITION_MODE)");
> module_param_named(user_partt_mode, amdgpu_user_partt_mode, uint, 0444);
>
> +
> +/**
> + * DOC: enforce_isolation (int)
> + * enforce process isolation between graphics and compute via using the same reserved vmid.
> + */
> +int enforce_isolation = 0;
Please move that to the other declarations above.
> +module_param(enforce_isolation, int, 0444);
IIRC you can also use bool here.
> +MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics and compute . 1 = On, 0 = Off");
With a bool you can drop the "1 = On, 0 = Off" part from the description,
because "enforce_isolation=on" should then be accepted on the kernel
command line as well.
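To illustrate why the on/off spellings would work, here is a rough user-space sketch of how a bool parameter string is interpreted. It is modeled on my recollection of kstrtobool() and covers only the y/n, 1/0 and on/off forms; parse_bool_param() is a made-up name, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of bool module parameter parsing.
 * Returns 0 and stores the value in *res on success, -1 on invalid input.
 */
int parse_bool_param(const char *s, bool *res)
{
	if (s == NULL || res == NULL)
		return -1;

	switch (s[0]) {
	case 'y':
	case 'Y':
	case '1':
		*res = true;
		return 0;
	case 'n':
	case 'N':
	case '0':
		*res = false;
		return 0;
	case 'o':
	case 'O':
		/* "on" / "off": the second character decides. */
		switch (s[1]) {
		case 'n':
		case 'N':
			*res = true;
			return 0;
		case 'f':
		case 'F':
			*res = false;
			return 0;
		default:
			break;
		}
		break;
	default:
		break;
	}

	return -1;
}
```

In the driver itself this would only mean declaring the variable as bool and passing bool as the type to module_param().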
> +
> /* These devices are not supported by amdgpu.
> * They are supported by the mach64, r128, radeon drivers
> */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index c991ca0b7a1c..33efa17d08ff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -409,7 +409,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
> if (r || !idle)
> goto error;
>
> - if (vm->reserved_vmid[vmhub]) {
> + if (vm->reserved_vmid[vmhub] || (enforce_isolation && (vmhub == AMDGPU_GFXHUB(0)))) {
> r = amdgpu_vmid_grab_reserved(vm, ring, job, &id, fence);
> if (r || !id)
> goto error;
> @@ -578,6 +578,17 @@ void amdgpu_vmid_mgr_init(struct amdgpu_device *adev)
> list_add_tail(&id_mgr->ids[j].list, &id_mgr->ids_lru);
> }
> }
> +
> + if (enforce_isolation) {
> + struct amdgpu_vmid_mgr *id_mgr = &adev->vm_manager.id_mgr[AMDGPU_GFXHUB(0)];
> + struct amdgpu_vmid *id = NULL;
Empty line between declaration and code please.
> + ++id_mgr->reserved_use_count;
> + id = list_first_entry(&id_mgr->ids_lru, struct amdgpu_vmid,
> + list);
> + /* Remove from normal round robin handling */
> + list_del_init(&id->list);
> + id_mgr->reserved = id;
It would be good not to duplicate this hunk here and in
amdgpu_vmid_alloc_reserved().
We should probably clean up amdgpu_vmid_alloc_reserved() a bit and move
the check for vm->reserved_vmid into amdgpu_vm_ioctl().
This way we could call amdgpu_vmid_alloc_reserved() here as well.
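As a rough user-space sketch of that idea (not the actual amdgpu code): one helper takes the reference and pulls the first LRU entry out of round-robin handling, so both the ioctl path and the init path could call it. struct vmid, struct vmid_mgr, lru_init(), lru_add_tail() and vmid_reserve() are simplified, hypothetical stand-ins, and locking is left out.

```c
#include <stddef.h>

/* Simplified stand-ins for the amdgpu structures; not the real definitions. */
struct vmid {
	struct vmid *prev, *next;	/* circular LRU links */
	unsigned int hw_id;
};

struct vmid_mgr {
	struct vmid lru;		/* list head, carries no ID itself */
	struct vmid *reserved;
	unsigned int reserved_use_count;
};

void lru_init(struct vmid_mgr *mgr)
{
	mgr->lru.prev = mgr->lru.next = &mgr->lru;
	mgr->reserved = NULL;
	mgr->reserved_use_count = 0;
}

void lru_add_tail(struct vmid_mgr *mgr, struct vmid *id)
{
	id->prev = mgr->lru.prev;
	id->next = &mgr->lru;
	mgr->lru.prev->next = id;
	mgr->lru.prev = id;
}

/*
 * Hypothetical common helper: take a reference on the reserved VMID and,
 * on first use, remove the first LRU entry from round-robin handling.
 * Returns 0 on success, -1 if no ID is available.
 */
int vmid_reserve(struct vmid_mgr *mgr)
{
	struct vmid *id;

	if (mgr->reserved_use_count > 0) {
		/* Already reserved, just count the additional user. */
		mgr->reserved_use_count++;
		return 0;
	}

	if (mgr->lru.next == &mgr->lru)
		return -1;	/* LRU is empty */

	id = mgr->lru.next;
	/* Unlink and re-initialize, like list_del_init() would. */
	id->prev->next = id->next;
	id->next->prev = id->prev;
	id->prev = id->next = id;

	mgr->reserved = id;
	mgr->reserved_use_count = 1;
	return 0;
}
```

In the driver, the equivalent helper would be amdgpu_vmid_alloc_reserved() itself, once the vm->reserved_vmid check has moved to the caller.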
Apart from that, this looks good from the technical side.
Regards,
Christian.
> + }
> }
>
> /**