[PATCH v3 0/8] drm: Introduce sparse GEM shmem

Thu Apr 10 18:41:55 UTC 2025

On Thu, 10 Apr 2025 14:01:03 -0400
Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:

> > > > In Panfrost and Lima, we don't have this concept of "incremental
> > > > rendering", so when we fail the allocation, we just fail the GPU job
> > > > with an unhandled GPU fault.    
> > > 
> > > To be honest I think that this is enough to mark those two drivers as
> > > broken.  It's documented that this approach is a no-go for upstream
> > > drivers.
> > > 
> > > How widely is that used?  
> > 
> > It exists in lima and panfrost, and I wouldn't be surprised if a similar
> > mechanism was used in other drivers for tiler-based GPUs (etnaviv,
> > freedreno, powervr, ...), because ultimately that's how tilers work:
> > the amount of memory needed to store per-tile primitives (and metadata)
> > depends on what the geometry pipeline feeds the tiler with, and that
> > can't be predicted. If you over-provision, that's memory the system won't
> > be able to use while rendering takes place, even though only a small
> > portion might actually be used by the GPU. If your allocation is too
> > small, it will either trigger a GPU fault (for HW not supporting an
> > "incremental rendering" mode) or under-perform (because flushing
> > primitives has a huge cost on tilers).  
> 
> Yes and no.
> 
> Although we can't allocate more memory for /this/ frame, we know the
> required size is probably constant across its lifetime. That gives a
> simple heuristic to manage the tiler heap efficiently without
> allocations - even fallible ones - in the fence signal path:
> 
> * Start with a small fixed size tiler heap
> * Try to render, let incremental rendering kick in when it's too small.
> * When cleaning up the job, check if we used incremental rendering.
> * If we did - double the size of the heap the next time we submit work.
> 
> The tiler heap still grows dynamically - it just does so over the span
> of a couple frames. In practice that means a tiny hit to startup time as
> we dynamically figure out the right size, incurring extra flushing at
> the start, without needing any "grow-on-page-fault" heroics.
> 
> This should solve the problem completely for CSF/panthor. So it's only
> hardware that architecturally cannot do incremental rendering (older
> Mali: panfrost/lima) where we need this mess.
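
For reference, the four steps of the heuristic quoted above can be sketched as
plain C. This is only an illustration under stated assumptions, not code from
any driver: struct tiler_heap, every function name, and the max_size cap are
hypothetical, and the "incremental rendering kicked in" condition is modelled
as a simple size comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context tiler heap state; not a real driver structure. */
struct tiler_heap {
	size_t size;     /* heap size used for the next submission, in bytes */
	size_t max_size; /* assumed upper bound so the doubling cannot run away */
};

/* Step 1: start with a small fixed-size tiler heap. */
static void tiler_heap_init(struct tiler_heap *heap, size_t initial, size_t max)
{
	assert(initial > 0 && initial <= max);
	heap->size = initial;
	heap->max_size = max;
}

/*
 * Steps 3 and 4: called at job cleanup, i.e. outside the fence signalling
 * path. If the job had to fall back to incremental rendering, double the
 * heap size used for the next submission, clamped to max_size.
 */
static void tiler_heap_job_done(struct tiler_heap *heap, bool used_incremental)
{
	if (!used_incremental)
		return;

	if (heap->size > heap->max_size / 2)
		heap->size = heap->max_size;
	else
		heap->size *= 2;
}

/*
 * Step 2, as a toy model: incremental rendering kicks in whenever the heap
 * is smaller than what the frame needs.
 */
static bool job_needs_incremental(const struct tiler_heap *heap, size_t needed)
{
	return heap->size < needed;
}

/* Count how many frames hit incremental rendering before the heap fits. */
static unsigned int frames_until_fit(struct tiler_heap *heap, size_t needed)
{
	unsigned int frames = 0;

	while (job_needs_incremental(heap, needed) &&
	       heap->size < heap->max_size) {
		tiler_heap_job_done(heap, true);
		frames++;
	}

	return frames;
}
```

In this model, a heap starting at 64 KiB with a frame needing 1 MiB takes four
frames of incremental rendering before it stops growing; the resize itself
happens at cleanup/submit time, never in the fault path.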

OTOH, if we need something for Utgard (Lima)/Midgard/Bifrost/Valhall
(Panfrost) anyway, why not use the same thing for CSF? CSF is arguably the
sanest of all the HW architectures listed above: the allocation can fail or
be non-blocking, because there's a fallback to incremental rendering when
it fails.