[PATCH] drm/amdkfd: Add sync after creating vram bo

Felix Kuehling felix.kuehling at amd.com
Mon Jan 9 20:26:57 UTC 2023


Am 2023-01-09 um 15:23 schrieb Felix Kuehling:
> Am 2023-01-09 um 15:18 schrieb Philip Yang:
>>
>> On 2023-01-09 14:27, Eric Huang wrote:
>>> There will be data corruption on vram allocated by svm
>>> if initialization is not being done. Adding sync is to
>>> resolve this issue.
>>>
>>> Signed-off-by: Eric Huang <jinhuieric.huang at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index b8c9753a4818..344e20306635 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -574,6 +574,13 @@ svm_range_vram_node_new(struct amdgpu_device 
>>> *adev, struct svm_range *prange,
>>>           goto reserve_bo_failed;
>>>       }
>>
>> Thanks for catching this bug, we could optimize as clear VRAM is only 
>> needed for partial migration, ex pass the clear flag clear = (cpages 
>> != npages) from svm_migrate_vma_to_vram -> svm_migrate_copy_to_vram 
>> -> svm_range_vram_node_new.
>
> I only see one call to svm_range_vram_node_new, and it passes "true" 
> unconditionally. What am I missing?

I think you're suggesting an optimization in svm_migrate_copy_to_vram. I 
think that makes sense. We don't need to initialize VRAM if we know it's 
going to be overwritten immediately with data migrated from system 
memory. But that's outside the scope of this fix.

Regards,
   Felix


>
> That said, if VRAM is not cleared, there will be no fence to wait for, 
> so the amdgpu_bo_sync_wait call is basically a no-op with a little bit 
> of overhead for creating and destroying an empty sync object.
>
> Regards,
>   Felix
>
>
>>
>> Regards,
>>
>> Philip
>>
>>> +    r = amdgpu_bo_sync_wait(bo, AMDGPU_FENCE_OWNER_KFD, false);
>>> +    if (r) {
>>> +        pr_debug("failed %d to sync bo\n", r);
>>> +        amdgpu_bo_unreserve(bo);
>>> +        goto reserve_bo_failed;
>>> +    }
>>> +
>>>       r = dma_resv_reserve_fences(amdkcl_ttm_resvp(&bo->tbo), 1);
>>>       if (r) {
>>>           pr_debug("failed %d to reserve bo\n", r);


More information about the amd-gfx mailing list