[PATCH] drm/etnaviv: Print error message if inserting IOVA address range fails

Tue Oct 1 19:26:46 UTC 2024

Hi,

On 2024/10/1 16:27, Lucas Stach wrote:
> Hi Sui,
>
> Am Dienstag, dem 01.10.2024 um 06:17 +0800 schrieb Sui Jingfeng:
>> Etnaviv assumes that GPU page size is 4KiB, yet on some systems, the CPU
>> page size is 16 KiB. The size of etnaviv buffer objects will be aligned
>> to CPU page size on kernel side, however, userspace still assumes the
>> page size is 4KiB and doing allocation with 4KiB page as unit. This
>> results in softpin(userspace managed per-process address spaces) fails.
>> Because kernel side BO takes up bigger address space than user space
>> assumes whenever the size of a BO is not CPU page size aligned.
>>
> Seems we need to track the GPU and CPU allocation sizes separately.

The idea is a good one, and I have already tried it.

I added a 'user_size' member to struct etnaviv_gem_object and used it
to track the size that user space thinks the BO has (in other words,
the size the user is actually allowed to use).
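
To illustrate why the two sizes diverge, here is a minimal userspace
sketch of the arithmetic, using the addresses from the glmark2-drm log
quoted below. It is not driver code: align_up(), ranges_overlap() and
softpin_collides() are hypothetical helpers made up for this mail, and
the 16 KiB CPU page size is the assumption from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round size up to a multiple of page_size (must be a power of two). */
static uint64_t align_up(uint64_t size, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (size + page_size - 1) & ~(page_size - 1);
}

/* True if [a, a + a_size) and [b, b + b_size) intersect. */
static bool ranges_overlap(uint64_t a, uint64_t a_size,
			   uint64_t b, uint64_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

/*
 * Hypothetical helper: would a BO that user space placed at 'va' with
 * 'user_size' collide with an already mapped neighbour once its size
 * is rounded up to 'page_size'?
 */
static bool softpin_collides(uint64_t va, uint64_t user_size,
			     uint64_t page_size,
			     uint64_t nb_va, uint64_t nb_size)
{
	uint64_t kernel_size = align_up(user_size, page_size);

	return ranges_overlap(va, kernel_size, nb_va, nb_size);
}
```

With a 16 KiB page, the 0x1000 byte BO at 0xfd38b000 grows to 0x4000
and runs into the 0x100000 byte BO at 0xfd38c000; with a 4 KiB page it
ends exactly where its neighbour starts.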

Using 'user_size' when pinning partly solves the VA address space
collisions under softpin. It partly worked in my hasty test. But ...

> Userspace is correct in assuming that the GPU page size is 4K and
> buffers are aligned to this granule.

Vivante GPUs support both 4KB and 64KB GPU page sizes.

>   There should be no need to waste GPU VA space

We have nearly 4GB of GPU VA space and, as far as I can see, we only use a
small part of it. So isn't it true that we have plenty of VA space?

> just because the CPU page size is larger than that and we
> need to overallocate buffers to suit the CPU.

A single CPU page has a single caching property, so I imagine that a
single VA range should occupy at least one entire CPU page.

Otherwise, it is possible that 4 GPU VAs share a single CPU page. If each
GPU VA is mapped with a different caching property from the others,
coherency requirements come into play.
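
As a rough sketch of the sharing I mean (plain userspace arithmetic,
not driver code; gpu_pages_per_cpu_page() and same_cpu_page() are
hypothetical names, and the offsets are byte offsets into the backing
memory of the buffers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of GPU pages that fit into one CPU page. */
static uint64_t gpu_pages_per_cpu_page(uint64_t cpu_page_size,
				       uint64_t gpu_page_size)
{
	assert(gpu_page_size != 0 && cpu_page_size % gpu_page_size == 0);
	return cpu_page_size / gpu_page_size;
}

/* True if two byte offsets are backed by the same CPU page. */
static bool same_cpu_page(uint64_t off_a, uint64_t off_b,
			  uint64_t cpu_page_size)
{
	assert(cpu_page_size != 0);
	return off_a / cpu_page_size == off_b / cpu_page_size;
}
```

With 16 KiB CPU pages and 4 KiB GPU pages, four GPU pages land in one
CPU page, so offsets 0x0000 and 0x3000 are backed by the same CPU page
while 0x4000 is not.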

>> Insert an error message to help debug when such an issue happen.
>>
>> Signed-off-by: Sui Jingfeng <sui.jingfeng at linux.dev>
>> ---
>> For example, when running glmark2-drm:
>>
>> [kernel space debug log]
>>
>>   etnaviv 0000:03:00.0: Insert bo failed, va: fd38b000, size: 4000
>>   etnaviv 0000:03:00.0: Insert bo failed, va: fd38a000, size: 4000
>>
>> [user space debug log]
>>
>> bo->va = 0xfd48c000, bo->size=100000
>> bo->va = 0xfd38c000, bo->size=100000
>> bo->va = 0xfd38b000, bo->size=1000   <-- Insert IOVA fails started at here.
>> bo->va = 0xfd38a000, bo->size=1000
>> bo->va = 0xfd389000, bo->size=1000
>>
>> [texture] texture-filter=nearest:MESA: error: etna_cmd_stream_flush:238: submit failed: -28 (No space left on device)
>> ---
>>   drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
>> index 1661d589bf3e..682f27b27d59 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c
>> @@ -310,8 +310,12 @@ int etnaviv_iommu_map_gem(struct etnaviv_iommu_context *context,
>>   	else
>>   		ret = etnaviv_iommu_find_iova(context, node,
>>   					      etnaviv_obj->base.size);
>> -	if (ret < 0)
>> +	if (ret < 0) {
>> +		dev_err(context->global->dev,
>> +			"Insert iova failed, va: %llx, size: %zx\n",
>> +			va, etnaviv_obj->base.size);
> As this might happen for a lot of buffers in a single submit and
> userspace might be unimpressed by the submit failure and keep pushing
> new submits, this has a potential to spam the logs. Please use
> dev_err_ratelimited. Other than that, this patch looks good.
>
> Regards,
> Lucas
>
>>   		goto unlock;
>> +	}
>>   
>>   	mapping->iova = node->start;
>>   	ret = etnaviv_iommu_map(context, node->start, sgt,

-- 
Best regards,
Sui