[PATCH 3/3] drm/amdgpu: Fix pinned GART area accounting and fdinfo reporting

Mon Apr 29 13:45:56 UTC 2024

On 29/04/2024 12:11, Christian König wrote:
> Am 29.04.24 um 11:43 schrieb Tvrtko Ursulin:
>>
>> On 26/04/2024 23:24, Felix Kuehling wrote:
>>>
>>> On 2024-04-26 12:43, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>>
>>>> When commit b453e42a6e8b ("drm/amdgpu: Add new placement for 
>>>> preemptible
>>>> SG BOs") added a new TTM region it missed to notice the conceptual
>>>> imbalance in GART pin size accounting as done in amdgpu_bo_pin/unpin.
>>>>
>>>> That imbalance leads to such objects getting accounted against the
>>>> resource, but are not un-accounted when unpinned.
>>>
>>> AMDGPU_PL_PREEMPT is mostly used for userptr BOs, which cannot be 
>>> pinned. In any case you should make sure that the accounting is 
>>> consistent between amdgpu_bo_pin_restricted and amdgpu_bo_unpin. This 
>>> patch breaks that consistency.
>>
>> You mean amdgpu_bo_pin(_restricted) and amdgpu_bo_unpin do not run for 
>> such objects, or something else?
>>
>> If they run, then at the end of pin there is:
>>
>>     domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
>> ...
>>     } else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
>>         atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
>>
>> And unpin has no handling for AMDGPU_PL_PREEMPT.
>>
>> Ah I see.. does it rely on amdgpu_mem_type_to_domain returning 0 for 
>> AMDGPU_PL_PREEMPT? My confusion was I misread the pinning check as 
>> checking the domain as stored in the bo at creation time.
>>
>> Although I am still confused by the statement userptr BOs are not 
>> pinned. It is not needed to map them via GART on AMD hardware for GPU 
>> to be able to access them?
> 
> No, a GART mapping is only needed if you want to scanout from them or 
> otherwise use them from the kernel on the GPU.
> 
> Background is that the kernel doesn't has VM with page tables..

Got it, thanks!

Presumably somewhere else in the code then it is prevented to call 
pin/unpin on those?

What to do, if anything, with the attempt to address the asymmetry in 
the accounting criteria between the pin and unpin?

I mean domain based on pin:

	domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
		atomic64_add(amdgpu_bo_size(bo), &adev->vram_pin_size);
		atomic64_add(amdgpu_vram_mgr_bo_visible_size(bo),
			     &adev->visible_pin_size);
	} else if (domain == AMDGPU_GEM_DOMAIN_GTT) {
		atomic64_add(amdgpu_bo_size(bo), &adev->gart_pin_size);
	}

Versus placement based on unpin:

	if (bo->tbo.resource->mem_type == TTM_PL_VRAM) {
		atomic64_sub(amdgpu_bo_size(bo), &adev->vram_pin_size);
		atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
			     &adev->visible_pin_size);
	} else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
		atomic64_sub(amdgpu_bo_size(bo), &adev->gart_pin_size);
	}

The fact amdgpu_mem_type_to_domain never translates back to 
AMDGPU_PL_PREEMPT means there is indeed currently no bug.

Is 2/3 still desirable to convert the check in pin to me mem_type based?

>>>> Fix by extending the accounting criteria in amdgpu_bo_unpin.
>>>>
>>>> What also aappears needs fixing is not reporting their size from the
>>>> amdgpu_bo_get_memory, which is used to implement fdinfo stats, so 
>>>> they are
>>>> not mixed with the regular userspace created and driver owned objects.
>>>
>>> I think that's true. It's a very fine distinction. AMDGPU_PL_PREEMPT 
>>> does use system memory and it is GPU accessible, just like GTT. The 
>>> only difference is, that it's not subject to the GTT limits because 
>>> their eviction is handled by callbacks other than TTM evictions and 
>>> doesn't need to wait for fences.
>>
>> As in you think those two hunks of the patch are correct?
> 
> I think so as well, yes. But we still need a name for preemptible BOs 
> while printing them in debugfs.

Currently it looks the name is 'CPU':

amdgpu_bo_print_info()
...
		case AMDGPU_GEM_DOMAIN_CPU:
		default:
			placement = "CPU";
			break;

Also, where to account them in struct amdgpu_mem_stats?

Regards,

Tvrtko

> 
> Regards,
> Christian.
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>
>>> Regards,
>>>    Felix
>>>
>>>
>>>>
>>>> And also amdgpu_bo_print_info for debugfs reporting.
>>>>
>>>> Note that the patch depends on the previous one which broke down the
>>>> relevant checks from the domain based to placement based.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>> Fixes: b453e42a6e8b ("drm/amdgpu: Add new placement for preemptible 
>>>> SG BOs")
>>>> Cc: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> Cc: Christian König <christian.koenig at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
>>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> index fb984669fc3a..5a2bbc793953 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> @@ -1032,7 +1032,8 @@ void amdgpu_bo_unpin(struct amdgpu_bo *bo)
>>>>           atomic64_sub(amdgpu_bo_size(bo), &adev->vram_pin_size);
>>>>           atomic64_sub(amdgpu_vram_mgr_bo_visible_size(bo),
>>>>                    &adev->visible_pin_size);
>>>> -    } else if (bo->tbo.resource->mem_type == TTM_PL_TT) {
>>>> +    } else if (bo->tbo.resource->mem_type == TTM_PL_TT ||
>>>> +           bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT) {
>>>>           atomic64_sub(amdgpu_bo_size(bo), &adev->gart_pin_size);
>>>>       }
>>>> @@ -1298,7 +1299,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
>>>>               stats->vram_shared += size;
>>>>           break;
>>>>       case TTM_PL_TT:
>>>> -    case AMDGPU_PL_PREEMPT:
>>>>           stats->gtt += size;
>>>>           if (shared)
>>>>               stats->gtt_shared += size;
>>>> @@ -1599,7 +1599,6 @@ u64 amdgpu_bo_print_info(int id, struct 
>>>> amdgpu_bo *bo, struct seq_file *m)
>>>>                   placement = "VRAM";
>>>>               break;
>>>>           case TTM_PL_TT:
>>>> -        case AMDGPU_PL_PREEMPT:
>>>>               placement = "GTT";
>>>>               break;
>>>>           case TTM_PL_SYSTEM:
>