[PATCH] drm/amdkfd: add pinned BOs to kfd_bo_list

Christian König christian.koenig at amd.com
Wed Jun 1 06:21:48 UTC 2022



Am 31.05.22 um 17:22 schrieb Felix Kuehling:
> Am 2022-05-31 um 04:34 schrieb Lang Yu:
>> The kfd_bo_list is used to restore process BOs after
>> evictions. As page tables could be destroyed during
>> evictions, we should also update pinned BOs' page tables
>> during restoring to make sure they are valid.
>>
>> So for pinned BOs,
>> 1, Don't validate them, but update their page tables.
>> 2, Don't add eviction fence for them.
>>
>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>> ---
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 39 ++++++++++---------
>>   1 file changed, 20 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 34ba9e776521..67c4bf1944d2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -1953,9 +1953,6 @@ int 
>> amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,
>>           return -EINVAL;
>>       }
>>   -    /* delete kgd_mem from kfd_bo_list to avoid re-validating
>> -     * this BO in BO's restoring after eviction.
>> -     */
>>       mutex_lock(&mem->process_info->lock);
>>         ret = amdgpu_bo_reserve(bo, true);
>> @@ -1978,7 +1975,6 @@ int 
>> amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,
>>         amdgpu_amdkfd_remove_eviction_fence(
>>           bo, mem->process_info->eviction_fence);
>> -    list_del_init(&mem->validate_list.head);
>>         if (size)
>>           *size = amdgpu_bo_size(bo);
>> @@ -2481,24 +2477,26 @@ int 
>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence 
>> **ef)
>>           uint32_t domain = mem->domain;
>>           struct kfd_mem_attachment *attachment;
>>   -        total_size += amdgpu_bo_size(bo);
>> +        if (!bo->tbo.pin_count) {
>
> I think this special case is not necessary. Validating pinned BOs that 
> are already valid should have very low overhead. So adding a special 
> case to avoid that is not really worth it.

I would put this check into amdgpu_amdkfd_bo_validate(). Otherwise you 
get a nice error if a BO is pinned to VRAM and you try to validate it 
into GTT for example.

On the other hand that error might be exactly what you want in this case.

Anyway, with or without this, feel free to add an Acked-by: Christian 
König <christian.koenig at amd.com> to this patch.

Regards,
Christian.

>
> Other than that, this patch looks good to me.
>
> Regards,
>   Felix
>
>
>> +            total_size += amdgpu_bo_size(bo);
>>   -        ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
>> -        if (ret) {
>> -            pr_debug("Memory eviction: Validate BOs failed\n");
>> -            failed_size += amdgpu_bo_size(bo);
>> -            ret = amdgpu_amdkfd_bo_validate(bo,
>> -                        AMDGPU_GEM_DOMAIN_GTT, false);
>> +            ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
>> +            if (ret) {
>> +                pr_debug("Memory eviction: Validate BOs failed\n");
>> +                failed_size += amdgpu_bo_size(bo);
>> +                ret = amdgpu_amdkfd_bo_validate(bo, 
>> AMDGPU_GEM_DOMAIN_GTT, false);
>> +                if (ret) {
>> +                    pr_debug("Memory eviction: Try again\n");
>> +                    goto validate_map_fail;
>> +                }
>> +            }
>> +            ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving);
>>               if (ret) {
>> -                pr_debug("Memory eviction: Try again\n");
>> +                pr_debug("Memory eviction: Sync BO fence failed. Try 
>> again\n");
>>                   goto validate_map_fail;
>>               }
>>           }
>> -        ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving);
>> -        if (ret) {
>> -            pr_debug("Memory eviction: Sync BO fence failed. Try 
>> again\n");
>> -            goto validate_map_fail;
>> -        }
>> +
>>           list_for_each_entry(attachment, &mem->attachments, list) {
>>               if (!attachment->is_mapped)
>>                   continue;
>> @@ -2544,10 +2542,13 @@ int 
>> amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence 
>> **ef)
>>         /* Attach new eviction fence to all BOs */
>>       list_for_each_entry(mem, &process_info->kfd_bo_list,
>> -        validate_list.head)
>> +        validate_list.head) {
>> +        if (mem->bo->tbo.pin_count)
>> +            continue;
>> +
>>           amdgpu_bo_fence(mem->bo,
>>               &process_info->eviction_fence->base, true);
>> -
>> +    }
>>       /* Attach eviction fence to PD / PT BOs */
>>       list_for_each_entry(peer_vm, &process_info->vm_list_head,
>>                   vm_list_node) {



More information about the amd-gfx mailing list