[PATCH] drm/amdkfd: Add sync after creating vram bo

Felix Kuehling felix.kuehling at amd.com
Mon Jan 9 20:23:00 UTC 2023


Am 2023-01-09 um 15:18 schrieb Philip Yang:
>
> On 2023-01-09 14:27, Eric Huang wrote:
>> There will be data corruption on vram allocated by svm
>> if initialization is not being done. Adding sync is to
>> resolve this issue.
>>
>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index b8c9753a4818..344e20306635 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -574,6 +574,13 @@ svm_range_vram_node_new(struct amdgpu_device 
>> *adev, struct svm_range *prange,
>>           goto reserve_bo_failed;
>>       }
>
> Thanks for catching this bug, we could optimize as clear VRAM is only 
> needed for partial migration, ex pass the clear flag clear = (cpages 
> != npages) from svm_migrate_vma_to_vram -> svm_migrate_copy_to_vram -> 
> svm_range_vram_node_new.

I only see one call to svm_range_vram_node_new, and it passes "true" 
unconditionally. What am I missing?

That said, if VRAM is not cleared, there will be no fence to wait for, 
so the amdgpu_bo_sync_wait call is basically a no-op with a little bit 
of overhead for creating and destroying an empty sync object.

Regards,
   Felix


>
> Regards,
>
> Philip
>
>> +    r = amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
>> +    if (r) {
>> +        pr_debug("failed %d to sync bo\n", r);
>> +        amdgpu_bo_unreserve(bo);
>> +        goto reserve_bo_failed;
>> +    }
>> +
>>       r = dma_resv_reserve_fences(amdkcl_ttm_resvp(&bo->tbo), 1);
>>       if (r) {
>>           pr_debug("failed %d to reserve bo\n", r);


More information about the amd-gfx mailing list