[PATCH v3 0/8] drm: Introduce sparse GEM shmem
Christian König
christian.koenig at amd.com
Fri Apr 11 08:04:07 UTC 2025
On 10.04.25 at 20:41, Boris Brezillon wrote:
> On Thu, 10 Apr 2025 14:01:03 -0400
> Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:
>
>>>>> In Panfrost and Lima, we don't have this concept of "incremental
>>>>> rendering", so when we fail the allocation, we just fail the GPU job
>>>>> with an unhandled GPU fault.
>>>> To be honest I think that this is enough to mark those two drivers as
>>>> broken. It's documented that this approach is a no-go for upstream
>>>> drivers.
>>>>
>>>> How widely is that used?
>>> It exists in lima and panfrost, and I wouldn't be surprised if a similar
>>> mechanism was used in other drivers for tiler-based GPUs (etnaviv,
>>> freedreno, powervr, ...), because ultimately that's how tilers work:
>>> the amount of memory needed to store per-tile primitives (and metadata)
>>> depends on what the geometry pipeline feeds the tiler with, and that
>>> can't be predicted. If you over-provision, that's memory the system won't
>>> be able to use while rendering takes place, even though only a small
>>> portion might actually be used by the GPU. If your allocation is too
>>> small, it will either trigger a GPU fault (for HW not supporting an
>>> "incremental rendering" mode) or under-perform (because flushing
>>> primitives has a huge cost on tilers).
>> Yes and no.
>>
>> Although we can't allocate more memory for /this/ frame, we know the
>> required size is likely roughly constant across the application's
>> lifetime. That gives a simple heuristic to manage the tiler heap
>> efficiently without allocations - even fallible ones - in the fence
>> signal path:
>>
>> * Start with a small fixed size tiler heap
>> * Try to render, let incremental rendering kick in when it's too small.
>> * When cleaning up the job, check if we used incremental rendering.
>> * If we did - double the size of the heap the next time we submit work.
>>
>> The tiler heap still grows dynamically - it just does so over the span
>> of a couple frames. In practice that means a tiny hit to startup time as
>> we dynamically figure out the right size, incurring extra flushing at
>> the start, without needing any "grow-on-page-fault" heroics.
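>>
>> A rough sketch of that heuristic, with made-up struct and function
>> names rather than actual panfrost/panthor code:
>>
>>     /* Per-context tiler heap bookkeeping (hypothetical). */
>>     struct tiler_heap_state {
>>             size_t size;     /* current heap size in bytes */
>>             size_t max_size; /* bound so the heap can't grow forever */
>>     };
>>
>>     /* Call at job cleanup; 'used_incremental' would come from a
>>      * HW/FW flag saying the tiler ran out of heap space and had
>>      * to flush primitives mid-frame. */
>>     static void tiler_heap_update(struct tiler_heap_state *heap,
>>                                   bool used_incremental)
>>     {
>>             if (used_incremental && heap->size < heap->max_size) {
>>                     heap->size *= 2;
>>                     if (heap->size > heap->max_size)
>>                             heap->size = heap->max_size;
>>             }
>>     }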
>>
>> This should solve the problem completely for CSF/panthor. So it's only
>> hardware that architecturally cannot do incremental rendering (older
>> Mali: panfrost/lima) where we need this mess.
> OTOH, if we need something
> for Utgard(Lima)/Midgard/Bifrost/Valhall(Panfrost), why not use the same
> thing for CSF, since CSF is arguably the sanest of all the HW
> architectures listed above: allocation can fail/be non-blocking,
> because there's a fallback to incremental rendering when it fails.
Yeah, that is a rather interesting point Alyssa noted here.
So basically you could just as well implement it like this:
1. Userspace makes a submission.
2. HW finds the buffer is not large enough, sets an error code and completes the submission.
3. Userspace detects the error and re-allocates the buffer with an increased size.
4. Userspace re-submits to incrementally complete the submission.
5. Repeat until the submission has fully completed.
That would work, but it is likely not the most performant solution. So faulting in memory on demand is basically just an optimization, and that is fine as far as I can see.
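A minimal sketch of that loop on the userspace side (the submission
struct and the drm_submit_and_wait()/heap_realloc() wrappers are made
up for illustration, not an existing uAPI):

    /* Submit, growing the tiler heap and retrying on exhaustion. */
    static int submit_with_heap_retry(int fd, struct submission *sub)
    {
            for (;;) {
                    int ret = drm_submit_and_wait(fd, sub);

                    if (ret != -ENOMEM)
                            return ret; /* done, or an unrelated error */

                    /* Heap was too small: double it and re-submit so
                     * the HW can incrementally finish the frame. */
                    sub->heap_size *= 2;
                    if (heap_realloc(fd, sub))
                            return -ENOMEM;
            }
    }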
That is then a rather good justification for your work, Boris: a common component makes it possible to implement common fault-injection functionality to make sure that the fallback path is properly exercised in testing.
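For example, the common layer could expose a fault-injection knob along
these lines (entirely hypothetical, just to illustrate the idea; the
knob would be toggled from debugfs by the test harness):

    /* Set by the test harness to force the next allocation to fail. */
    static atomic_t inject_alloc_failure;

    /* In the common sparse GEM shmem fault path (sketch). */
    static struct page *sparse_gem_get_page(struct drm_gem_shmem_object *shmem,
                                            gfp_t gfp)
    {
            /* Pretend we are under memory pressure so that CI
             * exercises the fallback path (incremental rendering
             * or job termination). */
            if (atomic_xchg(&inject_alloc_failure, 0))
                    return NULL;

            return alloc_page(gfp);
    }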
Regards,
Christian.