[PATCH 2/2] drm/amdkfd: change svm range evict
Felix Kuehling
felix.kuehling at amd.com
Thu Jun 30 16:09:00 UTC 2022
On 2022-06-30 11:19, Eric Huang wrote:
>
> On 2022-06-29 19:29, Felix Kuehling wrote:
>> On 2022-06-29 18:53, Eric Huang wrote:
>>>
>>>
>>> On 2022-06-29 18:20, Felix Kuehling wrote:
>>>> On 2022-06-28 17:43, Eric Huang wrote:
>>>>> Two changes:
>>>>> 1. Reduce unnecessary evict/unmap when the range is not mapped to the GPU.
>>>>> 2. Always evict when the always_mapped flag is set.
>>>>>
>>>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
>>>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>>> index 4bf2f75f853b..76e817687ef9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>>> @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>>>>> struct kfd_process *p;
>>>>> int r = 0;
>>>>> + if (!prange->mapped_to_gpu)
>>>>> + return 0;
>>>>
>>>> This feels like an unrelated optimization that should be in a
>>>> separate patch.
>>>>
>>>> But I'm not sure this is correct, because it doesn't consider child
>>>> ranges. svm_range_unmap_from_gpus already contains this check, so
>>>> ranges should not be unmapped unnecessarily either way. Is there
>>>> any other benefit to this change that I'm missing?
>>> I will send another patch separately that considers child ranges.
>>
>> I think this should only be done in the XNACK-off case. For XNACK-on
>> it's already handled in svm_range_unmap_from_gpus.
> Yes, and it is also done when KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set.
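Roughly what I had in mind, as a sketch only (the svm_range_is_mapped helper
is made up here, it is not existing code): keep the early return on the paths
that actually preempt queues, and let it look at child ranges too:

	/* Sketch: skip the queue eviction only when neither the parent range
	 * nor any child range is mapped to a GPU. */
	static bool svm_range_is_mapped(struct svm_range *prange)
	{
		struct svm_range *pchild;

		if (prange->mapped_to_gpu)
			return true;
		list_for_each_entry(pchild, &prange->child_list, child_list)
			if (pchild->mapped_to_gpu)
				return true;
		return false;
	}

	/* in svm_range_evict(), inside the XNACK-off/always-mapped branch: */
	if (!p->xnack_enabled ||
	    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
		if (!svm_range_is_mapped(prange))
			return 0;
		/* ... evict queues as before ... */
	}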
>>
>>
>>> The benefit is that it reduces unnecessary queue evictions when
>>> allocating context save memory, which is not mapped to the GPU.
>>
>> That sounds wrong. The context save area should never be unmapped
>> from GPU. That's the whole point of setting the
>> KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening
>> while migrating the context save area to VRAM for the first time,
>> even before it's mapped to GPU?
> Yes. It happens the first time, when registering the svm range and
> migrating it to VRAM are done together; at that moment the range is not
> yet mapped to the GPU.
>>
>> I think there may be another eviction, when the CWSR header is
>> initialized by the CPU. That would also migrate it back to system
>> memory. To avoid that, you should probably register the context save
>> area only after the header has been initialized.
> Yes. I am using that approach. Please look at patch 4/4.
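Right. Just to spell out the ordering I meant, as a sketch with made-up
helper names (your patch 4/4 has the real change):

	/* Sketch only: write the CWSR header with the CPU first, then
	 * register the range, so the later registration/migration does not
	 * trigger an extra eviction to pull the page back to system memory. */
	init_cwsr_header(ctx_save_cpu_va);		/* hypothetical helper */
	register_svm_range(ctx_save_cpu_va, ctx_save_size,
			   KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED);	/* hypothetical helper */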
>>
>> I think avoiding an eviction when memory is migrated at first
>> registration is worthwhile, not just for CWSR.
>>
>>
>>> It is for efficiency reasons. On the other hand, without this
>>> optimization KFDCWSRTest.InterruptRestore fails with a queue
>>> preemption error.
>>
>> What do you mean by "queue preemption error"? Does HWS hang?
> HWS doesn't hang immediately, so there is no fence-timeout error ("The
> cp might be in an unrecoverable state due to an unsuccessful queues
> preemption"). The error is "HIQ MQD's queue_doorbell_id0 is not 0, Queue
> preemption time out" after checking the mqd manager, which means HWS
> abandoned the unmap-queue request without returning a timeout error to
> the driver. After this error, the following test fails at queue creation
> because HWS hangs.
OK, that sounds like the kind of bug the InterruptRestore test is meant
to catch. I think you just created a better test by causing more
preemptions. ;)
So we should do two things:
* Avoid unnecessary preemptions in KFD
* Improve the test to reproduce this hang even without unnecessary
preemptions in KFD, so we can investigate the issue
Regards,
Felix
>>
>>
>>> I think the reason is that the extra queue evictions make HWS too busy
>>> to preempt existing queues. There is one unmap_queue packet sent to HWS
>>> in the current code, and there will be three unmap_queue packets with
>>> unified memory allocation.
>>
>> When queues of a process are already evicted, they should not get
>> evicted again. That's handled by the qpd->evicted counter. There
>> should never be multiple unmap_queues packets in flight at the same
>> time. If you're seeing three unmap_queues, you should also see queues
>> restored three times.
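To spell out the counting I mean, a rough sketch (not the exact KFD code;
the helper names here are made up):

	/* Sketch: only the 0 -> 1 transition sends an unmap_queues packet and
	 * only the 1 -> 0 transition restores, so nested evictions never put
	 * a second unmap_queues in flight. */
	static int process_evict_queues(struct qcm_process_device *qpd)
	{
		if (qpd->evicted++ > 0)		/* already evicted, just count */
			return 0;
		return send_unmap_queues(qpd);	/* hypothetical helper */
	}

	static int process_restore_queues(struct qcm_process_device *qpd)
	{
		if (--qpd->evicted > 0)		/* still evicted by another caller */
			return 0;
		return map_queues(qpd);		/* hypothetical helper */
	}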
>>
>> HWS should never be too busy to evict queues. If you're seeing
>> preemptions fail, you may have found a bug.
> The restore delay worker behaves differently in terms of timing. It
> could restore queues before the next unmap_queues, so the situation is
> too complicated to debug in a multiple-queue evict/restore environment.
> The error definitely means there is a bug; from the driver's point of
> view nothing is wrong even with the extra queue eviction, so I try to
> avoid the extra eviction and keep the behavior as before. The bottom
> line is to make sure a unified svm range for the context save area
> doesn't cause any failure in kfdtest, so I can theoretically assume the
> extra queue eviction/restoring causes the HWS failure.
>
> Regards,
> Eric
>>
>> Regards,
>> Felix
>>
>>
>>> So this optimization will keep only one unmap_queue as before.
>>>
>>> Regards,
>>> Eric
>>>>
>>>> Regards,
>>>> Felix
>>>>
>>>>
>>>>> +
>>>>> p = container_of(svms, struct kfd_process, svms);
>>>>> pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx
>>>>> 0x%lx]\n",
>>>>> svms, prange->start, prange->last, start, last);
>>>>> - if (!p->xnack_enabled) {
>>>>> + if (!p->xnack_enabled ||
>>>>> + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
>>>>> int evicted_ranges;
>>>>> list_for_each_entry(pchild, &prange->child_list, child_list) {
>>>>> @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>>>>> if (r)
>>>>> goto out_unlock_range;
>>>>> - if (migrated && !p->xnack_enabled) {
>>>>> + if (migrated && (!p->xnack_enabled ||
>>>>> + (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
>>>>> + prange->mapped_to_gpu) {
>>>>> pr_debug("restore_work will update mappings of
>>>>> GPUs\n");
>>>>> mutex_unlock(&prange->migrate_mutex);
>>>>> continue;
>>>
>