[PATCH] drm/amd/amdgpu: vm entities should have kernel priority

Christian König ckoenig.leichtzumerken at gmail.com
Tue Jul 20 09:28:46 UTC 2021


I would check (job->vm == NULL) instead, but yes something like that 
should work.

Regards,
Christian.

Am 20.07.21 um 11:14 schrieb Chen, JingWen:
> [AMD Official Use Only]
>
> Hi Christian,
>
> I agree that changing the priority is not the right way.
> So to not consider paging job guilty, do you think it's OK to not increase_karma for it(if (job->vmid == 0))?
>
> Best Regards,
> JingWen Chen
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Tuesday, July 20, 2021 4:29 PM
> To: Chen, JingWen <JingWen.Chen2 at amd.com>; Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Chen, Horace <Horace.Chen at amd.com>
> Subject: Re: [PATCH] drm/amd/amdgpu: vm entities should have kernel priority
>
> Hi JingWen,
>
> you can look at the job->vm pointer to distinct an userspace submission
> from a kernel submission.
>
> The priority is not really related to the submission type, we just
> happen to treat paging jobs with the highest priority since they are
> important for the system as a whole.
>
> Regards,
> Christian.
>
> Am 20.07.21 um 09:49 schrieb Chen, JingWen:
>> [AMD Official Use Only]
>>
>> Hi Christian,
>>
>> Even if this is a userspace mapping, it's still packaged by the kernel, so it's always assumed to be correct. In which case modifying the priority should have no side effects. May I know the detailed reason for your concern?
>>
>> And if we eventually decide not to change the priority, do you have any suggestions about how to make sure the paging jobs are not considered guilty?
>>
>> Best Regards,
>> JingWen Chen
>>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Monday, July 19, 2021 7:10 PM
>> To: Liu, Monk <Monk.Liu at amd.com>; Chen, JingWen <JingWen.Chen2 at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Chen, Horace <Horace.Chen at amd.com>
>> Subject: Re: [PATCH] drm/amd/amdgpu: vm entities should have kernel priority
>>
>> Am 19.07.21 um 11:42 schrieb Liu, Monk:
>>> [AMD Official Use Only]
>>>
>>> Besides, I think our current KMD have three types of kernel sdma jobs:
>>> 1) adev->mman.entity, it is already a KERNEL priority entity
>>> 2) vm->immediate
>>> 3) vm->delay
>>>
>>> Do you mean now vm->immediate or delay are used as moving jobs instead of mman.entity ?
>> No, exactly that's the point. vm->immediate and vm->delayed are not for kernel paging jobs.
>>
>> Those are used for userspace page table updates.
>>
>> I agree that those should probably not considered guilty, but modifying the priority of them is not the right way of doing that.
>>
>> Regards,
>> Christian.
>>
>>> Thanks
>>>
>>> ------------------------------------------
>>> Monk Liu | Cloud-GPU Core team
>>> ------------------------------------------
>>>
>>> -----Original Message-----
>>> From: Liu, Monk
>>> Sent: Monday, July 19, 2021 5:40 PM
>>> To: 'Christian König' <ckoenig.leichtzumerken at gmail.com>; Chen,
>>> JingWen <JingWen.Chen2 at amd.com>; amd-gfx at lists.freedesktop.org
>>> Cc: Chen, Horace <Horace.Chen at amd.com>
>>> Subject: RE: [PATCH] drm/amd/amdgpu: vm entities should have kernel
>>> priority
>>>
>>> [AMD Official Use Only]
>>>
>>> If there is move jobs clashing there we probably need to fix the bugs
>>> of those move jobs
>>>
>>> Previously I believe you also remember that we agreed to always trust
>>> kernel jobs especially paging jobs,
>>>
>>> Without set paging jobs' priority to KERNEL level how can we keep that protocol ? do you have a better idea?
>>>
>>> Thanks
>>>
>>> ------------------------------------------
>>> Monk Liu | Cloud-GPU Core team
>>> ------------------------------------------
>>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Sent: Monday, July 19, 2021 4:25 PM
>>> To: Chen, JingWen <JingWen.Chen2 at amd.com>;
>>> amd-gfx at lists.freedesktop.org
>>> Cc: Chen, Horace <Horace.Chen at amd.com>; Liu, Monk <Monk.Liu at amd.com>
>>> Subject: Re: [PATCH] drm/amd/amdgpu: vm entities should have kernel
>>> priority
>>>
>>> Am 19.07.21 um 07:57 schrieb Jingwen Chen:
>>>> [Why]
>>>> Current vm_pte entities have NORMAL priority, in SRIOV multi-vf use
>>>> case, the vf flr happens first and then job time out is found.
>>>> There can be several jobs timeout during a very small time slice.
>>>> And if the innocent sdma job time out is found before the real bad
>>>> job, then the innocent sdma job will be set to guilty as it only has
>>>> NORMAL priority. This will lead to a page fault after resubmitting
>>>> job.
>>>>
>>>> [How]
>>>> sdma should always have KERNEL priority. The kernel job will always
>>>> be resubmitted.
>>> I'm not sure if that is a good idea. We intentionally didn't gave the page table updates kernel priority to avoid clashing with the move jobs.
>>>
>>> Christian.
>>>
>>>> Signed-off-by: Jingwen Chen <Jingwen.Chen2 at amd.com>
>>>> ---
>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
>>>>      1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 358316d6a38c..f7526b67cc5d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -2923,13 +2923,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>        INIT_LIST_HEAD(&vm->done);
>>>>
>>>>        /* create scheduler entities for page table updates */
>>>> -    r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL,
>>>> +    r = drm_sched_entity_init(&vm->immediate,
>>>> +DRM_SCHED_PRIORITY_KERNEL,
>>>>                                  adev->vm_manager.vm_pte_scheds,
>>>>                                  adev->vm_manager.vm_pte_num_scheds, NULL);
>>>>        if (r)
>>>>                return r;
>>>>
>>>> -    r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_NORMAL,
>>>> +    r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_KERNEL,
>>>>                                  adev->vm_manager.vm_pte_scheds,
>>>>                                  adev->vm_manager.vm_pte_num_scheds, NULL);
>>>>        if (r)



More information about the amd-gfx mailing list