[PATCH] drm/amd/amdgpu: vm entities should have kernel priority
Liu, Monk
Monk.Liu at amd.com
Mon Jul 19 09:40:02 UTC 2021
[AMD Official Use Only]
If move jobs are clashing there, we probably need to fix the bugs in those move jobs.
As I believe you also remember, we previously agreed to always trust kernel jobs, especially paging jobs.
Without setting the paging jobs' priority to KERNEL level, how can we keep that protocol? Do you have a better idea?
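To make the "always trust kernel jobs" protocol concrete, here is a minimal standalone sketch of the rule as described in the commit message below: on timeout a job is marked guilty only if it does not have KERNEL priority, and only non-guilty jobs are resubmitted. This is a simplified model, not the DRM scheduler code; the names (enum prio, struct job, handle_timeout, should_resubmit) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical priority levels, modeling the NORMAL/KERNEL split
 * discussed in this thread. */
enum prio { PRIO_NORMAL, PRIO_KERNEL };

/* Hypothetical, simplified job record. */
struct job {
    const char *name;
    enum prio prio;
    bool guilty;
};

/* On timeout, blame the job unless it has KERNEL priority
 * (kernel jobs are trusted and never marked guilty). */
void handle_timeout(struct job *j)
{
    if (j->prio != PRIO_KERNEL)
        j->guilty = true;
}

/* After a reset, only jobs that were not marked guilty are resubmitted. */
bool should_resubmit(const struct job *j)
{
    return !j->guilty;
}
```

Under this model, an innocent paging job with PRIO_NORMAL whose timeout is seen first gets marked guilty and is dropped, while the same job with PRIO_KERNEL is always resubmitted.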
Thanks
------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Monday, July 19, 2021 4:25 PM
To: Chen, JingWen <JingWen.Chen2 at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Chen, Horace <Horace.Chen at amd.com>; Liu, Monk <Monk.Liu at amd.com>
Subject: Re: [PATCH] drm/amd/amdgpu: vm entities should have kernel priority
On 19.07.21 at 07:57, Jingwen Chen wrote:
> [Why]
> Currently, vm_pte entities have NORMAL priority. In the SR-IOV
> multi-VF use case, the VF FLR happens first and the job timeout is
> only found afterwards. Several jobs can time out within a very small
> time slice. If an innocent SDMA job's timeout is found before that of
> the real bad job, the innocent SDMA job will be marked guilty because
> it only has NORMAL priority. This leads to a page fault after the job
> is resubmitted.
>
> [How]
> sdma should always have KERNEL priority. The kernel job will always be
> resubmitted.
I'm not sure that is a good idea. We intentionally didn't give the page table updates kernel priority, to avoid clashing with the move jobs.
Christian.
>
> Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 358316d6a38c..f7526b67cc5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2923,13 +2923,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
> INIT_LIST_HEAD(&vm->done);
>
> /* create scheduler entities for page table updates */
> - r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL,
> + r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_KERNEL,
> adev->vm_manager.vm_pte_scheds,
> adev->vm_manager.vm_pte_num_scheds, NULL);
> if (r)
> return r;
>
> - r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_NORMAL,
> + r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_KERNEL,
> adev->vm_manager.vm_pte_scheds,
> adev->vm_manager.vm_pte_num_scheds, NULL);
> if (r)