[PATCH v3 0/8] drm: Introduce sparse GEM shmem

Christian König <christian.koenig at amd.com>
Fri Apr 11 10:55:57 UTC 2025


Am 11.04.25 um 10:38 schrieb Boris Brezillon:
> On Fri, 11 Apr 2025 10:04:07 +0200
> Christian König <christian.koenig at amd.com> wrote:
>
>> Am 10.04.25 um 20:41 schrieb Boris Brezillon:
>>> On Thu, 10 Apr 2025 14:01:03 -0400
>>> Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:
>>>  
>>>>>>> In Panfrost and Lima, we don't have this concept of "incremental
>>>>>>> rendering", so when we fail the allocation, we just fail the
>>>>>>> GPU job with an unhandled GPU fault.      
>>>>>> To be honest I think that this is enough to mark those two
>>>>>> drivers as broken.  It's documented that this approach is a
>>>>>> no-go for upstream drivers.
>>>>>>
>>>>>> How widely is that used?    
>>>>> It exists in lima and panfrost, and I wouldn't be surprised if a
>>>>> similar mechanism was used in other drivers for tiler-based GPUs
>>>>> (etnaviv, freedreno, powervr, ...), because ultimately that's how
>>>>> tilers work: the amount of memory needed to store per-tile
>>>>> primitives (and metadata) depends on what the geometry pipeline
>>>>> feeds the tiler with, and that can't be predicted. If you
>>>>> over-provision, that's memory the system won't be able to use
>>>>> while rendering takes place, even though only a small portion
>>>>> might actually be used by the GPU. If your allocation is too
>>>>> small, it will either trigger a GPU fault (for HW not supporting
>>>>> an "incremental rendering" mode) or under-perform (because
>>>>> flushing primitives has a huge cost on tilers).    
>>>> Yes and no.
>>>>
>>>> Although we can't allocate more memory for /this/ frame, we know
>>>> the required size is probably constant across its lifetime. That
>>>> gives a simple heuristic to manage the tiler heap efficiently
>>>> without allocations - even fallible ones - in the fence signal
>>>> path:
>>>>
>>>> * Start with a small, fixed-size tiler heap.
>>>> * Try to render, let incremental rendering kick in when it's too
>>>> small.
>>>> * When cleaning up the job, check if we used incremental rendering.
>>>> * If we did - double the size of the heap the next time we submit
>>>> work.
>>>>
>>>> The tiler heap still grows dynamically - it just does so over the
>>>> span of a couple frames. In practice that means a tiny hit to
>>>> startup time as we dynamically figure out the right size,
>>>> incurring extra flushing at the start, without needing any
>>>> "grow-on-page-fault" heroics.
>>>>
>>>> This should solve the problem completely for CSF/panthor. So it's
>>>> only on hardware that architecturally cannot do incremental rendering
>>>> (older Mali: panfrost/lima) that we need this mess.  
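
As a sketch, the heuristic Alyssa describes could look like this (the
struct, helper names and size bounds are made up for illustration, not
actual panthor code):

#include <linux/minmax.h>
#include <linux/sizes.h>

/* Hypothetical bounds; a real driver would tune these. */
#define TILER_HEAP_INITIAL_SIZE        SZ_512K
#define TILER_HEAP_MAX_SIZE            SZ_64M

struct tiler_heap_ctx {
        size_t next_size;       /* heap size to use for the next submission */
};

static void tiler_heap_init(struct tiler_heap_ctx *ctx)
{
        ctx->next_size = TILER_HEAP_INITIAL_SIZE;
}

/*
 * Called from job cleanup, i.e. outside the fence signalling path, so the
 * (re)allocation happening at the next submission is allowed to block.
 */
static void tiler_heap_tune(struct tiler_heap_ctx *ctx,
                            bool used_incremental_rendering)
{
        /*
         * The required size is roughly constant across the workload's
         * lifetime: if the last job had to fall back to incremental
         * rendering, the heap was too small, so double it for the next
         * submission.
         */
        if (used_incremental_rendering)
                ctx->next_size = min_t(size_t, ctx->next_size * 2,
                                       TILER_HEAP_MAX_SIZE);
}
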
>>> OTOH, if we need something
>>> for Utgard(Lima)/Midgard/Bifrost/Valhall(Panfrost), why not use the
>>> same thing for CSF, since CSF is arguably the sanest of all the HW
>>> architectures listed above: allocation can fail/be non-blocking,
>>> because there's a fallback to incremental rendering when it fails.  
>> Yeah, that is a rather interesting point Alyssa noted here.
>>
>> So basically you could as well implement it like this:
>> 1. Userspace makes a submission.
>> 2. HW finds the buffer is not large enough, sets an error code and
>> completes the submission.
>> 3. Userspace detects the error, re-allocates the buffer with an
>> increased size.
>> 4. Userspace re-submits to incrementally complete the submission.
>> 5. Repeat until the submission fully completes.
>>
>> That would work, but is likely just not the most performant solution.
>> So faulting in memory on demand is basically just an optimization and
>> that is ok as far as I can see.
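
In userspace, that loop would look roughly like this (the wrappers and
the use of ENOSPC as the "heap too small" error are hypothetical, not
an existing uAPI):

#include <errno.h>
#include <stddef.h>

struct tiler_heap {
        size_t size;
};

/* Hypothetical wrappers around the driver's submit/alloc ioctls. */
int submit_job(int fd, void *job, struct tiler_heap *heap);
int grow_tiler_heap(int fd, struct tiler_heap *heap, size_t new_size);

static int submit_until_complete(int fd, void *job, struct tiler_heap *heap)
{
        int ret;

        /*
         * Submit; the HW completes the job with an error code when the
         * tiler heap is too small. In that case, re-allocate a bigger
         * heap and re-submit so the job completes incrementally, until
         * it eventually completes fully.
         */
        while ((ret = submit_job(fd, job, heap)) == -ENOSPC) {
                int err = grow_tiler_heap(fd, heap, heap->size * 2);

                if (err)
                        return err;
        }

        return ret;
}
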
> Yeah, Alyssa's suggestion got me thinking too, and I think I can come
> up with a plan where we try non-blocking allocation first, and if it
> fails, we trigger incremental rendering, and queue a blocking
> heap-chunk allocation on a separate workqueue, such that the next time the
> tiler heap hits an OOM, it has a chunk (or multiple chunks) readily
> available if the blocking allocation completed in the meantime. That's
> basically what Alyssa suggested, with an optimization if the system is
> not under memory pressure, and without userspace being involved (so no
> uAPI changes).
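
Concretely, that scheme could look something like this (the chunk-pool
structures and alloc_heap_chunk() helper are hypothetical, not existing
panthor code):

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct heap_chunk {
        struct list_head node;
        /* backing pages, GPU mapping, ... */
};

/* Hypothetical chunk allocator honouring the given gfp flags. */
struct heap_chunk *alloc_heap_chunk(gfp_t gfp);

struct tiler_heap {
        struct list_head spare_chunks;  /* pre-allocated, ready to hand out */
        spinlock_t lock;
        struct work_struct refill_work;
};

/*
 * Runs on a workqueue, outside the fence signalling path, so it is
 * allowed to block on reclaim.
 */
static void heap_refill_work(struct work_struct *work)
{
        struct tiler_heap *heap = container_of(work, struct tiler_heap,
                                               refill_work);
        struct heap_chunk *chunk = alloc_heap_chunk(GFP_KERNEL);

        if (!chunk)
                return;

        spin_lock(&heap->lock);
        list_add_tail(&chunk->node, &heap->spare_chunks);
        spin_unlock(&heap->lock);
}

/* Called from the tiler OOM path: must never block. */
static struct heap_chunk *heap_get_chunk(struct tiler_heap *heap)
{
        struct heap_chunk *chunk;

        /* Grab a spare chunk if a previous background refill completed. */
        spin_lock(&heap->lock);
        chunk = list_first_entry_or_null(&heap->spare_chunks,
                                         struct heap_chunk, node);
        if (chunk)
                list_del(&chunk->node);
        spin_unlock(&heap->lock);

        if (!chunk)
                chunk = alloc_heap_chunk(GFP_NOWAIT | __GFP_NOWARN);

        /*
         * Still nothing: the caller falls back to incremental rendering,
         * and a blocking refill is queued so the next OOM finds a chunk
         * readily available.
         */
        if (!chunk)
                queue_work(system_unbound_wq, &heap->refill_work);

        return chunk;
}
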

That sounds like it most likely won't work. In an OOM situation, the blocking allocation would just add more memory pressure while the system is waiting for your rendering to complete so it can free up memory.

> I guess this leaves older GPUs that don't support incremental rendering
> in a bad place though.

Well, what's the handling there currently? Just crash when you're OOM?

Regards,
Christian.

>
>> That is then a rather good justification for your work, Boris, because
>> a common component makes it possible to implement common fault
>> injection functionality to make sure that the fallback path is
>> properly exercised in testing.
> I can also add a fault injection mechanism to validate that, yep.
>
> Thanks,
>
> Boris
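
For that, the kernel's generic fault-inject infrastructure
(linux/fault-inject.h) would be a natural fit; a sketch, reusing the
hypothetical alloc_heap_chunk() helper from above and a made-up
attribute name:

#include <linux/debugfs.h>
#include <linux/fault-inject.h>

/* Tunable through debugfs when CONFIG_FAULT_INJECTION_DEBUG_FS is set. */
static DECLARE_FAULT_ATTR(fail_tiler_chunk_alloc);

/*
 * Wraps the non-blocking chunk allocation so the incremental-rendering
 * fallback can be forced in testing.
 */
static struct heap_chunk *heap_alloc_chunk_nowait(void)
{
        if (should_fail(&fail_tiler_chunk_alloc, 1))
                return NULL;

        return alloc_heap_chunk(GFP_NOWAIT | __GFP_NOWARN);
}

static void heap_debugfs_init(struct dentry *root)
{
        fault_create_debugfs_attr("fail_tiler_chunk_alloc", root,
                                  &fail_tiler_chunk_alloc);
}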


