[PATCH] drm/amdkfd: Get prange->offset after svm_range_vram_node_new

Felix Kuehling felix.kuehling at amd.com
Thu Mar 9 15:24:30 UTC 2023


Am 2023-03-08 um 13:39 schrieb Chen, Xiaogang:
>
> On 3/8/2023 11:11 AM, Felix Kuehling wrote:
>> On 2023-03-08 02:45, Xiaogang.Chen wrote:
>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>
>>> During migration to VRAM, prange->offset is only valid after the
>>> VRAM buffer is located, either by reusing an old one or allocating
>>> a new one. Move svm_range_vram_node_new before the per-vma
>>> migration loop to get a valid prange->offset.
>>>
>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>>
>> I'd prefer to keep svm_range_vram_node_new in 
>> svm_migrate_copy_to_vram. Logically the memory allocation should be 
>> after migrate_vma_setup. If migrate_vma_setup finds that there is 
>> nothing to migrate, we should not allocate any memory.
>>
>> Does this fix a real issue, or is this a theoretical fix? I think it 
>> should probably work correctly without this patch. 
>> svm_range_vram_node_new sets prange->offset to 0. If no VRAM was 
>> previously allocated, it should already be 0, so nothing changes. 
>> Maybe we just need a fix to set prange->offset = 0 in 
>> svm_range_vram_node_free.
>
> A real issue is when the same prange migrates vram->cpu, then 
> cpu->vram. During vram->cpu the prange got split, so prange->offset 
> changed; then the vram node got freed by svm_range_vram_node_free, 
> but prange->offset was not updated. This is the case in 
> KFDSVMRangeTest.MigrateTest. I will check by setting 
> prange->offset = 0 in svm_range_vram_node_free.
>
> In theory, getting prange->offset after svm_range_vram_node_new makes 
> the code logically clearer: svm_range_vram_node_new handles different 
> cases, and we cannot be sure what prange->offset would be before 
> calling it.

I think you're right. It's more complicated than it appears at first 
glance. Even if memory is already allocated, it may be on the wrong GPU. 
And svm_range_vram_node_free doesn't really free the BO if it's shared 
with other pranges. svm_range_vram_node_new can reuse the existing BO if 
it is still referenced by other ranges. So we must remember 
prange->offset even after svm_range_vram_node_free was called.
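
To illustrate the point, the reuse vs. fresh-allocation cases could be 
sketched roughly like this (hypothetical, simplified pseudo-code of the 
logic described above; svm_bo_still_shared is a made-up helper name, 
not the actual kfd_svm.c implementation):

	/* Sketch only: svm_bo_still_shared() stands in for the real
	 * reuse check in svm_range_vram_node_new.
	 */
	if (svm_bo_still_shared(prange)) {
		/* The BO is still referenced by other pranges, so it
		 * is reused. prange->offset from an earlier split must
		 * still be valid here, which is why clearing it in
		 * svm_range_vram_node_free would be wrong.
		 */
		return 0;
	}
	/* Fresh allocation: the range starts at offset 0 of the new BO. */
	prange->offset = 0;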


>
> If migrate_vma_setup fails for a vma, we can call 
> svm_range_vram_node_free to release the vram buffer obtained from 
> svm_range_vram_node_new.

I agree. There is this condition at the end of svm_migrate_ram_to_vram:

	if (cpages)
		prange->actual_loc = best_loc;

You can add

	else
		svm_range_vram_node_free(prange);
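
so that the end of svm_migrate_ram_to_vram would read:

	if (cpages)
		prange->actual_loc = best_loc;
	else
		svm_range_vram_node_free(prange);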

With that addition, the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


>
>>
>> Regards,
>>   Felix
>>
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 ++++++------
>>>   1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> index fd54a00e7229..15791490c23e 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> @@ -310,12 +310,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>>>       src = scratch;
>>>       dst = (uint64_t *)(scratch + npages);
>>>   -    r = svm_range_vram_node_new(adev, prange, true);
>>> -    if (r) {
>>> -        dev_dbg(adev->dev, "fail %d to alloc vram\n", r);
>>> -        goto out;
>>> -    }
>>> -
>>>       amdgpu_res_first(prange->ttm_res, ttm_res_offset,
>>>                npages << PAGE_SHIFT, &cursor);
>>>       for (i = j = 0; i < npages; i++) {
>>> @@ -525,6 +519,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
>>>         start = prange->start << PAGE_SHIFT;
>>>       end = (prange->last + 1) << PAGE_SHIFT;
>>> +
>>> +    r = svm_range_vram_node_new(adev, prange, true);
>>> +    if (r) {
>>> +        dev_dbg(adev->dev, "fail %d to alloc vram\n", r);
>>> +        return r;
>>> +    }
>>>       ttm_res_offset = prange->offset << PAGE_SHIFT;
>>>         for (addr = start; addr < end;) {