[PATCH] drm/amdkfd: Get prange->offset after svm_range_vram_node_new
Felix Kuehling
felix.kuehling at amd.com
Thu Mar 9 15:24:30 UTC 2023
On 2023-03-08 at 13:39, Chen, Xiaogang wrote:
>
> On 3/8/2023 11:11 AM, Felix Kuehling wrote:
>> On 2023-03-08 02:45, Xiaogang.Chen wrote:
>>> From: Xiaogang Chen <xiaogang.chen at amd.com>
>>>
>>> During migration to VRAM, prange->offset is only valid after the VRAM
>>> buffer is located, either by reusing an old one or by allocating a
>>> new one. Move svm_range_vram_node_new before the per-vma migration so
>>> that a valid prange->offset is available.
>>>
>>> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
>>
>> I'd prefer to keep svm_range_vram_node_new in
>> svm_migrate_copy_to_vram. Logically the memory allocation should be
>> after migrate_vma_setup. If migrate_vma_setup finds that there is
>> nothing to migrate, we should not allocate any memory.
>>
>> Does this fix a real issue, or is this a theoretical fix? I think it
>> should probably work correctly without this patch.
>> svm_range_vram_node_new sets prange->offset to 0. If no VRAM was
>> previously allocated, it should already be 0, so nothing changes.
>> Maybe we just need a fix to set prange->offset = 0 in
>> svm_range_vram_node_free.
>
> A real issue: the same prange migrates vram->cpu, then cpu->vram.
> During the vram->cpu migration the prange gets split, so prange->offset
> changes; the vram node is then freed by svm_range_vram_node_free, but
> prange->offset is not updated. This is the case hit by
> KFDSVMRangeTest.MigrateTest. I will check by setting prange->offset = 0
> in svm_range_vram_node_free.
>
> Also, reading prange->offset after svm_range_vram_node_new makes the
> code logically clearer: svm_range_vram_node_new handles several
> different cases, and we cannot be sure what prange->offset holds before
> calling it.
I think you're right. It's more complicated than it appears at first
glance. Even if memory is already allocated, it may be on the wrong GPU.
And svm_range_vram_node_free doesn't really free the BO if it's shared
with other pranges. svm_range_vram_node_new can reuse the existing BO if
it is still referenced by other ranges. So we must remember
prange->offset even after svm_range_vram_node_free has been called.
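The shared-BO scenario can be modeled in user space: the VRAM BO may back several pranges (e.g. after a range split), so "free" only drops a reference, and a later allocation call may hand back the same still-live BO. The range's own offset is then the only record of where its pages sit inside that BO, so it must survive the free. All names here are hypothetical stand-ins for the kernel ones.

```c
#include <assert.h>
#include <stdlib.h>

/* Refcounted stand-in for the shared VRAM buffer object. */
struct vram_bo {
	int refcount;
};

struct range_model {
	struct vram_bo *bo;
	unsigned long offset;   /* page offset of this range inside bo */
};

static void range_vram_free(struct range_model *r)
{
	if (r->bo && --r->bo->refcount == 0)
		free(r->bo);
	r->bo = NULL;           /* note: r->offset is deliberately kept */
}

/* Reuse a sibling's still-referenced BO if there is one, else allocate. */
static void range_vram_new(struct range_model *r, struct range_model *sibling)
{
	if (sibling && sibling->bo) {
		r->bo = sibling->bo;
		r->bo->refcount++;      /* existing r->offset stays valid */
	} else {
		r->bo = calloc(1, sizeof(*r->bo));
		r->bo->refcount = 1;
		r->offset = 0;          /* fresh BO starts at offset 0 */
	}
}
```

A split range that frees and then re-acquires the shared BO keeps pointing at its own pages only because the offset was remembered across the free.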
>
> If migrate_vma_setup fails for a vma, we can call
> svm_range_vram_node_free to release the vram buffer obtained from
> svm_range_vram_node_new.
I agree. There is this condition at the end of svm_migrate_ram_to_vram:
    if (cpages)
            prange->actual_loc = best_loc;

You can add

    else
            svm_range_vram_node_free(prange);
With that addition, the patch is
Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
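The suggested error path can be modeled in user space as follows: at the end of svm_migrate_ram_to_vram, if no pages were collected for migration (cpages == 0), release the VRAM allocated up front instead of leaving it attached to the range. The types and function names are hypothetical stand-ins for the kernel ones.

```c
#include <assert.h>
#include <stdbool.h>

struct migrate_range {
	bool vram_allocated;    /* stands in for holding a VRAM node */
	int actual_loc;         /* 0 means system memory */
};

static void vram_node_free_model(struct migrate_range *r)
{
	r->vram_allocated = false;
}

/* Tail of the migration path with the suggested else branch added. */
static void migrate_ram_to_vram_tail(struct migrate_range *prange,
				     unsigned long cpages, int best_loc)
{
	if (cpages)
		prange->actual_loc = best_loc;  /* pages were migrated */
	else
		vram_node_free_model(prange);   /* nothing to migrate */
}
```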
>
>>
>> Regards,
>> Felix
>>
>>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 12 ++++++------
>>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> index fd54a00e7229..15791490c23e 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>>> @@ -310,12 +310,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>>> src = scratch;
>>> dst = (uint64_t *)(scratch + npages);
>>> - r = svm_range_vram_node_new(adev, prange, true);
>>> - if (r) {
>>> - dev_dbg(adev->dev, "fail %d to alloc vram\n", r);
>>> - goto out;
>>> - }
>>> -
>>> amdgpu_res_first(prange->ttm_res, ttm_res_offset,
>>> npages << PAGE_SHIFT, &cursor);
>>> for (i = j = 0; i < npages; i++) {
>>> @@ -525,6 +519,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
>>> start = prange->start << PAGE_SHIFT;
>>> end = (prange->last + 1) << PAGE_SHIFT;
>>> +
>>> + r = svm_range_vram_node_new(adev, prange, true);
>>> + if (r) {
>>> + dev_dbg(adev->dev, "fail %d to alloc vram\n", r);
>>> + return r;
>>> + }
>>> ttm_res_offset = prange->offset << PAGE_SHIFT;
>>> for (addr = start; addr < end;) {