[PATCH] drm/amdkfd: Add sync after creating vram bo

Philip Yang yangp at amd.com
Mon Jan 9 20:34:17 UTC 2023


On 2023-01-09 15:23, Felix Kuehling wrote:
> Am 2023-01-09 um 15:18 schrieb Philip Yang:
>>
>> On 2023-01-09 14:27, Eric Huang wrote:
>>> There will be data corruption on vram allocated by svm
>>> if initialization is not being done. Adding sync is to
>>> resolve this issue.
>>>
>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index b8c9753a4818..344e20306635 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -574,6 +574,13 @@ svm_range_vram_node_new(struct amdgpu_device 
>>> *adev, struct svm_range *prange,
>>>           goto reserve_bo_failed;
>>>       }
>>
>> Thanks for catching this bug, we could optimize as clear VRAM is only 
>> needed for partial migration, ex pass the clear flag clear = (cpages 
>> != npages) from svm_migrate_vma_to_vram -> svm_migrate_copy_to_vram 
>> -> svm_range_vram_node_new.
>
> I only see one call to svm_range_vram_node_new, and it passes "true" 
> unconditionally. What am I missing?
I just realize that if all range pages are migrated to VRAM, we don't 
need clear VRAM before migration.
>
> That said, if VRAM is not cleared, there will be no fence to wait for, 
> so the amdgpu_bo_sync_wait call is basically a no-op with a little bit 
> of overhead for creating and destroying an empty sync object.

We pass true unconditionally right now, change it to conditional for 
partial migration only, then sync will be a no-op if all pages of range 
are migrated.

Regards,

Philip

>
> Regards,
>   Felix
>
>
>>
>> Regards,
>>
>> Philip
>>
>>> +    r = amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
>>> +    if (r) {
>>> +        pr_debug("failed %d to sync bo\n", r);
>>> +        amdgpu_bo_unreserve(bo);
>>> +        goto reserve_bo_failed;
>>> +    }
>>> +
>>>       r = dma_resv_reserve_fences(amdkcl_ttm_resvp(&bo->tbo), 1);
>>>       if (r) {
>>>           pr_debug("failed %d to reserve bo\n", r);


More information about the amd-gfx mailing list