[PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

Christian König ckoenig.leichtzumerken at gmail.com
Fri Jul 12 07:36:20 UTC 2024


Hi River,

well that is complete nonsense.

The BOs are not meant to be used with on demand faulting, so their VM 
state absolutely shouldn't matter to the KFD.

Regards,
Christian.

Am 12.07.24 um 04:54 schrieb YuanShang Mao (River):
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> We need to make sure that all BOs of an active kfd process validated. Moving buffer will trigger process eviction.
> If mark it as invalided before process eviction, related kfd process is still active and may attempt to access this invalidated BO.
>
> Agree with Trigger. Seems kfd eviction should been synced to move notify, not the move action.
>
> Thanks
> River
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Thursday, July 11, 2024 8:39 PM
> To: Huang, Trigger <Trigger.Huang at amd.com>; YuanShang Mao (River) <YuanShang.Mao at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved
>
> Yeah, completely agree. This patch doesn't really make sense.
>
> Please explain why you would want to do this?
>
> Regards,
> Christian.
>
> Am 11.07.24 um 13:56 schrieb Huang, Trigger:
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>> This patch seems to be wrong.
>> Quite a lot of preparations have been done in amdgpu_bo_move_notify
>> For example, amdgpu_bo_kunmap() will be called to prevent the BO from being accessed by CPU. If not called, the CPU may attempt to access the BO while it is being moved.
>>
>> Thanks,
>> Trigger
>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>> YuanShang
>>> Sent: Thursday, July 11, 2024 5:10 PM
>>> To: amd-gfx at lists.freedesktop.org
>>> Cc: YuanShang Mao (River) <YuanShang.Mao at amd.com>; YuanShang Mao
>>> (River) <YuanShang.Mao at amd.com>
>>> Subject: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved
>>>
>>> Caution: This message originated from an External Source. Use proper
>>> caution when opening attachments, clicking links, or responding.
>>>
>>>
>>> It leads to race condition if amdgpu_bo is marked as invalid before
>>> it is really moved.
>>>
>>> Signed-off-by: YuanShang <YuanShang.Mao at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++++-----
>>>    1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 29e4b5875872..a29d5132ad3d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -519,8 +519,8 @@ static int amdgpu_bo_move(struct
>>> ttm_buffer_object *bo, bool evict,
>>>
>>>           if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
>>>                            bo->ttm == NULL)) {
>>> -               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   ttm_bo_move_null(bo, new_mem);
>>> +               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   return 0;
>>>           }
>>>           if (old_mem->mem_type == AMDGPU_GEM_DOMAIN_DGMA || @@ -
>>> 530,8 +530,8 @@ static int amdgpu_bo_move(struct ttm_buffer_object
>>> *bo, bool evict,
>>>           if (old_mem->mem_type == TTM_PL_SYSTEM &&
>>>               (new_mem->mem_type == TTM_PL_TT ||
>>>                new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
>>> -               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   ttm_bo_move_null(bo, new_mem);
>>> +               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   return 0;
>>>           }
>>>           if ((old_mem->mem_type == TTM_PL_TT || @@ -542,9 +542,9 @@
>>> static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
>>>                           return r;
>>>
>>>                   amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
>>> -               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   ttm_resource_free(bo, &bo->resource);
>>>                   ttm_bo_assign_mem(bo, new_mem);
>>> +               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   return 0;
>>>           }
>>>
>>> @@ -557,8 +557,8 @@ static int amdgpu_bo_move(struct
>>> ttm_buffer_object *bo, bool evict,
>>>               new_mem->mem_type == AMDGPU_PL_OA ||
>>>               new_mem->mem_type == AMDGPU_PL_DOORBELL) {
>>>                   /* Nothing to save here */
>>> -               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   ttm_bo_move_null(bo, new_mem);
>>> +               amdgpu_bo_move_notify(bo, evict, new_mem);
>>>                   return 0;
>>>           }
>>>
>>> @@ -583,11 +583,11 @@ static int amdgpu_bo_move(struct
>>> ttm_buffer_object *bo, bool evict,
>>>                   return -EMULTIHOP;
>>>           }
>>>
>>> -       amdgpu_bo_move_notify(bo, evict, new_mem);
>>>           if (adev->mman.buffer_funcs_enabled)
>>>                   r = amdgpu_move_blit(bo, evict, new_mem, old_mem);
>>>           else
>>>                   r = -ENODEV;
>>> +       amdgpu_bo_move_notify(bo, evict, new_mem);
>>>
>>>           if (r) {
>>>                   /* Check that all memory is CPU accessible */
>>> --
>>> 2.25.1



More information about the amd-gfx mailing list