[PATCH v3 0/8] drm: Introduce sparse GEM shmem

Simona Vetter simona.vetter at ffwll.ch
Fri Apr 11 18:24:33 UTC 2025


On Fri, Apr 11, 2025 at 12:55:57PM +0200, Christian König wrote:
> Am 11.04.25 um 10:38 schrieb Boris Brezillon:
> > On Fri, 11 Apr 2025 10:04:07 +0200
> > Christian König <christian.koenig at amd.com> wrote:
> >
> >> Am 10.04.25 um 20:41 schrieb Boris Brezillon:
> >>> On Thu, 10 Apr 2025 14:01:03 -0400
> >>> Alyssa Rosenzweig <alyssa at rosenzweig.io> wrote:
> >>>  
> >>>>>>> In Panfrost and Lima, we don't have this concept of "incremental
> >>>>>>> rendering", so when we fail the allocation, we just fail the
> >>>>>>> GPU job with an unhandled GPU fault.      
> >>>>>> To be honest I think that this is enough to mark those two
> >>>>>> drivers as broken.  It's documented that this approach is a
> >>>>>> no-go for upstream drivers.
> >>>>>>
> >>>>>> How widely is that used?    
> >>>>> It exists in lima and panfrost, and I wouldn't be surprised if a
> >>>>> similar mechanism was used in other drivers for tiler-based GPUs
> >>>>> (etnaviv, freedreno, powervr, ...), because ultimately that's how
> >>>>> tilers work: the amount of memory needed to store per-tile
> >>>>> primitives (and metadata) depends on what the geometry pipeline
> >>>>> feeds the tiler with, and that can't be predicted. If you
> >>>>> over-provision, that's memory the system won't be able to use
> >>>>> while rendering takes place, even though only a small portion
> >>>>> might actually be used by the GPU. If your allocation is too
> >>>>> small, it will either trigger a GPU fault (for HW not supporting
> >>>>> an "incremental rendering" mode) or under-perform (because
> >>>>> flushing primitives has a huge cost on tilers).    
> >>>> Yes and no.
> >>>>
> >>>> Although we can't allocate more memory for /this/ frame, we know
> >>>> the required size is probably constant across its lifetime. That
> >>>> gives a simple heuristic to manage the tiler heap efficiently
> >>>> without allocations - even fallible ones - in the fence signal
> >>>> path:
> >>>>
> >>>> * Start with a small fixed size tiler heap
> >>>> * Try to render, let incremental rendering kick in when it's too
> >>>> small.
> >>>> * When cleaning up the job, check if we used incremental rendering.
> >>>> * If we did - double the size of the heap the next time we submit
> >>>> work.
> >>>>
> >>>> The tiler heap still grows dynamically - it just does so over the
> >>>> span of a couple frames. In practice that means a tiny hit to
> >>>> startup time as we dynamically figure out the right size,
> >>>> incurring extra flushing at the start, without needing any
> >>>> "grow-on-page-fault" heroics.
> >>>>
> >>>> This should solve the problem completely for CSF/panthor. So it's
> >>>> only hardware that architecturally cannot do incremental rendering
> >>>> (older Mali: panfrost/lima) where we need this mess.  
> >>> OTOH, if we need something
> >>> for Utgard(Lima)/Midgard/Bifrost/Valhall(Panfrost), why not use the
> >>> same thing for CSF, since CSF is arguably the sanest of all the HW
> >>> architectures listed above: allocation can fail/be non-blocking,
> >>> because there's a fallback to incremental rendering when it fails.  
> >> Yeah that is a rather interesting point Alyssa noted here.
> >>
> >> So basically you could as well implement it like this:
> >> 1. Userspace makes a submission.
> >> 2. HW finds the buffer is not large enough, sets an error code and
> >> completes the submission.
> >> 3. Userspace detects the error, re-allocates the buffer with an
> >> increased size.
> >> 4. Userspace re-submits to incrementally complete the submission.
> >> 5. Repeat until fully completed.
> >>
> >> That would work but is likely just not the most performant solution.
> >> So faulting in memory on demand is basically just an optimization and
> >> that is ok as far as I can see.
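Christian's userspace-driven loop can be modeled in a few lines. Everything here is a stand-in (the submit function, the error convention, the sizes) just to show the control flow, not a real uAPI:

```c
/* Model of the userspace retry loop from steps 1-5 above.
 * fake_submit() stands in for the real submission ioctl. */
#include <assert.h>
#include <stddef.h>

#define JOB_NEEDS  (3u * 4096u) /* what the geometry actually requires */

/* Step 2: the "HW" completes the submission either way, but reports an
 * error when the tiler buffer was too small. */
static void fake_submit(size_t buf_size, int *error)
{
	*error = (buf_size < JOB_NEEDS) ? 1 : 0;
}

/* Steps 1 and 3-5: submit, and on error re-allocate with a doubled
 * buffer and re-submit until the job fully completes. */
static int render_with_retries(size_t buf_size, int *attempts)
{
	int err;

	*attempts = 0;
	do {
		(*attempts)++;
		fake_submit(buf_size, &err);
		if (err)
			buf_size *= 2; /* step 3: grow and try again */
	} while (err);
	return 0;
}
```

As Christian notes, this works but costs extra round trips per retry, which is why faulting memory in on demand is attractive purely as an optimization.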
> > Yeah, Alyssa's suggestion got me thinking too, and I think I can come
> > up with a plan where we try non-blocking allocation first, and if it
> > fails, we trigger incremental rendering, and queue a blocking
> > heap-chunk allocation on separate workqueue, such that next time the
> > tiler heap hits an OOM, it has a chunk (or multiple chunks) readily
> > available if the blocking allocation completed in the meantime. That's
> > basically what Alyssa suggested, with an optimization if the system is
> > not under memory pressure, and without userspace being involved (so no
> > uAPI changes).

Please no background task that tries to find memory, you're just
reinventing the background kswapd shrinking, which even a GFP_NORECLAIM
allocation should kick off anyway.

Instead just rely on kswapd to hopefully get you through the current job
without undue amounts of tiler flushing.

Then just grow the dynamic memory synchronously, with some heuristics, in
the next CS ioctl; that gives you appropriate amounts of throttling, no
issues with error reporting, and you can just use GFP_KERNEL.
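A rough model of what "grow synchronously in the next CS ioctl" looks like, with hypothetical names and realloc() standing in for a blocking GFP_KERNEL allocation (this is a userspace sketch, not kernel code):

```c
/* Sketch: all heap growth happens in the submit (CS) ioctl path, never
 * in the fence-signalling path. Names are hypothetical; realloc()
 * stands in for a blocking GFP_KERNEL allocation. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct tiler_heap {
	size_t size;
	bool prev_job_overflowed; /* recorded at job cleanup time */
	void *backing;
};

/* Runs in the CS ioctl, where blocking is fine: memory pressure
 * throttles the submitter, and failure is a plain -ENOMEM return. */
static int tiler_heap_presubmit(struct tiler_heap *h)
{
	size_t want = h->size;
	void *mem;

	if (h->prev_job_overflowed)
		want *= 2; /* same doubling heuristic as earlier */

	mem = realloc(h->backing, want); /* may block, may fail cleanly */
	if (!mem)
		return -ENOMEM; /* propagated to userspace by the ioctl */

	h->backing = mem;
	h->size = want;
	h->prev_job_overflowed = false;
	return 0;
}
```

The point of doing it here rather than in a worker is that the CS ioctl is allowed to sleep and to return errors, so throttling and ENOMEM reporting come for free.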

> That sounds like it most likely won't work. In an OOM situation the
> blocking allocation would just cause more pressure to complete your
> rendering to free up memory.

The issue isn't OOM, the issue is that an allocation without GFP_RECLAIM
can't even throw out clean caches, so you're limited to watermarks and
what kswapd manages to clean out in parallel.

Real OOM is much, much nastier, and there you should just get an ENOMEM
from the CS ioctl. Ideally at least, because that gives you throttling and
all that nice stuff for free (for some value of "nice"; lots of folks
really despise the stalls it introduces).

Cheers, Sima

> > I guess this leaves older GPUs that don't support incremental rendering
> > in a bad place though.
> 
> Well what's the handling there currently? Just crash when you're OOM?
> 
> Regards,
> Christian.
> 
> >
> >> That is then a rather good justification for your work Boris. Because
> >> a common component makes it possible to implement a common fault
> >> injection functionality to make sure that the fallback path is
> >> properly exercised in testing.
> > I can also add an fault injection mechanism to validate that, yep.
> >
> > Thanks,
> >
> > Boris
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
