[PATCH 1/1] drm/amdkfd: Migrate in CPU page fault use current mm
Felix Kuehling
felix.kuehling at amd.com
Fri Sep 9 16:35:48 UTC 2022
On 2022-09-09 11:42, Philip Yang wrote:
>
> On 2022-09-09 11:13, Felix Kuehling wrote:
>> On 2022-09-09 09:12, Philip Yang wrote:
>>> migrate_vma_setup shows the warning below because we don't hold the
>>> other process's mm mmap_lock. We should use the current vmf->vma->vm_mm
>>> instead; the caller already holds the current mmap lock inside the CPU
>>> page fault handler.
>>>
>>> WARNING: CPU: 10 PID: 3054 at include/linux/mmap_lock.h:155 find_vma
>>> Call Trace:
>>> walk_page_range+0x76/0x150
>>> migrate_vma_setup+0x18a/0x640
>>> svm_migrate_vram_to_ram+0x245/0xa10 [amdgpu]
>>> svm_migrate_to_ram+0x36f/0x470 [amdgpu]
>>> do_swap_page+0xcfe/0xec0
>>> __handle_mm_fault+0x96b/0x15e0
>>> handle_mm_fault+0x13f/0x3e0
>>> do_user_addr_fault+0x1e7/0x690
>>>
>>> Fixes: 5e5bbf36a2c0 ("drm/amdkfd: handle CPU fault on COW mapping")
>>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>>
>> For a quick fix, this looks OK.
>>
>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>
>> For a better fix, I notice that svm_migrate_vram_to_ram only uses the
>> mm to look up the vma. But you already know the vma here, so that
>> lookup is completely unnecessary. Could you just call
>> svm_migrate_vma_to_ram directly? Then you don't need the mm at all,
>> and you save yourself an unnecessary vma lookup from the virtual
>> address.
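For illustration, roughly what that direct call could look like in
svm_migrate_to_ram (a hypothetical sketch, not a tested change; the
svm_migrate_vma_to_ram signature and the page-shift arithmetic are
assumed from how svm_migrate_vram_to_ram uses it in kfd_migrate.c):

	/*
	 * Hypothetical sketch: skip the mm/find_vma() lookup entirely and
	 * hand the faulting VMA straight to svm_migrate_vma_to_ram().
	 * Assumes the whole prange lies inside vmf->vma, and that adev is
	 * obtained the same way svm_migrate_vram_to_ram() obtains it today.
	 */
	unsigned long start = prange->start << PAGE_SHIFT;
	unsigned long end = (prange->last + 1) << PAGE_SHIFT;

	r = svm_migrate_vma_to_ram(adev, prange, vmf->vma, start, end,
				   KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);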
>
> Thanks, calling svm_migrate_vma_to_ram directly works, and that was my
> first approach, but the prange we want to migrate may include multiple
> VMAs, so it is safer to call svm_migrate_vram_to_ram to handle this case.
Is that because we always try to migrate 2MB at a time, which may span
more than one VMA? I think we could probably limit this to a single VMA
for the page fault case. Separate VMAs are most likely separate
allocations. Migrating them together due to a single page fault is
likely unintentional.
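A hypothetical sketch of what limiting it to the faulting VMA could look
like (names and the svm_migrate_vma_to_ram signature are assumed, not a
tested change):

	/*
	 * Hypothetical: migrate only the part of prange that overlaps the
	 * faulting VMA instead of walking every VMA the prange spans.
	 */
	struct vm_area_struct *vma = vmf->vma;
	unsigned long start, end;

	start = max_t(unsigned long, prange->start << PAGE_SHIFT, vma->vm_start);
	end = min_t(unsigned long, (prange->last + 1) << PAGE_SHIFT, vma->vm_end);
	r = svm_migrate_vma_to_ram(adev, prange, vma, start, end,
				   KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);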
But I agree, that should be a separate patch if we want to change that.
We can tackle that as part of a bigger migration optimization. There is
another item on my TODO list to get rid of splitting ranges by
granularity and use partial migrations instead.
Thanks,
Felix
>
> Regards,
>
> Philip
>
>>
>> Regards,
>> Felix
>>
>>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> index f62c4561f0f4..1cfa4fcd28b3 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> @@ -947,7 +947,8 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
>>>  		goto out_unlock_prange;
>>>  	}
>>> 
>>> -	r = svm_migrate_vram_to_ram(prange, mm, KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);
>>> +	r = svm_migrate_vram_to_ram(prange, vmf->vma->vm_mm,
>>> +				    KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);
>>>  	if (r)
>>>  		pr_debug("failed %d migrate svms 0x%p range 0x%p [0x%lx 0x%lx]\n",
>>>  			 r, prange->svms, prange, prange->start, prange->last);