[PATCH v3 0/8] drm: Introduce sparse GEM shmem
Boris Brezillon
boris.brezillon at collabora.com
Thu Apr 10 17:20:54 UTC 2025
Hi Christian,
On Thu, 10 Apr 2025 18:43:29 +0200
Christian König <christian.koenig at amd.com> wrote:
> Hi Boris,
>
> Am 10.04.25 um 17:53 schrieb Boris Brezillon:
> > Hi Christian,
> >
> > On Thu, 10 Apr 2025 17:05:07 +0200
> > Christian König <christian.koenig at amd.com> wrote:
> >
> >> Hi Boris,
> >>
> >> thanks for looping me in. Can you also send the full patch set to
> >> me since I don't see that on the mailing list (yet maybe).
> >>
> >> Am 10.04.25 um 16:48 schrieb Boris Brezillon:
> >>> +Christian, Alyssa and Faith, as suggested by Sima. I'll make
> >>> sure to Cc you on v4, but before that, I'd like to get your
> >>> opinion on the drm-gem/drm-gem-shmem changes to see if sending a
> >>> v4 is actually desirable or if I should go back to the drawing
> >>> board.
> >>>
> >>> On Fri, 4 Apr 2025 11:26:26 +0200
> >>> Boris Brezillon <boris.brezillon at collabora.com> wrote:
> >>>
> >>>> This patch series is a proposal for implementing sparse page
> >>>> allocations for shmem objects. It was initially motivated by a
> >>>> kind of BO managed by the Panfrost/Panthor and Lima drivers, the
> >>>> tiler heap, which grows on demand every time the GPU faults on a
> >>>> virtual address within the heap BO range.
> >> Oh, wait a second! GPU faults and DMA fences are usually
> >> fundamentally incompatible.
> >>
> >> So stuff like filling in GEM objects on demand like you suggest
> >> here is usually seen as illegal. All resources must be pre-pinned
> >> before the submission.
> > Unfortunately, that's already how it's done in lima, panfrost and
> > panthor.
> >
> >> Faulting is only legal when you have something like HMM, SVM or
> >> whatever you call it. And then you can just use a plain shmem
> >> object to provide you with backing pages.
> >>
> >> I mean we could in theory allow faulting on GEM objects as well,
> >> but we would need to take very strict precautions which we
> >> currently don't have as far as I know.
> > We only use this mechanism for very specific allocations: tiler
> > memory whose maximum size can't be guessed upfront, because tile
> > binning is by nature unpredictable.
> >
> >> So could you explain how this works in the first place?
> > I can explain how this works in Panthor, yes. You get an initial
> > amount of memory that the tiler can use; when it runs out of
> > memory, it will first ask the system for more. If the allocation
> > fails, it will fall back to what they call "incremental rendering",
> > where the already binned primitives are flushed to the FB in order
> > to free memory, and the rendering starts over from there, with the
> > memory that has been freed.
> >
> > In Panthor, this on-demand allocation scheme is something that
> > allows us to speed up rendering when there's memory available, but
> > we can still make progress when there isn't, hence the failable
> > allocation scheme I'm proposing here.
>
> Puh, that sounds like it can potentially work but is also very very
> fragile.
>
> You must have a code comment when you select the GFP flags how and
> why that works.
+ /*
+  * We want non-blocking allocations; if we're OOM, we just fail the
+  * job to unblock things.
+  */
+ ret = drm_gem_shmem_sparse_populate_range(&bo->base, page_offset,
+                                           NUM_FAULT_PAGES, page_gfp,
+                                           __GFP_NORETRY | __GFP_NOWARN);
That's what I have right now in the failable allocation path. The
tiler chunk allocation uses the same flags, but doesn't have a
comment explaining that a fallback exists when the allocation fails.
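To make the intended policy concrete, here is a tiny userspace model of
the alloc-or-fallback decision described above. All names are made up
for illustration (this is not the Panthor code); the stand-in allocator
plays the role of a GFP_NOWAIT-style non-blocking allocation:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an on-demand tiler heap. */
struct tiler_heap {
	size_t chunk_size;
	size_t allocated;
	size_t limit;        /* simulates the system running out of memory */
	bool   incremental;  /* HW supports incremental rendering */
};

enum grow_result {
	GROW_OK,             /* new chunk handed to the tiler */
	GROW_INCREMENTAL,    /* flush binned primitives, then resume */
	GROW_FAULT,          /* no fallback: fail the job with a GPU fault */
};

/*
 * Stand-in for a __GFP_NORETRY | __GFP_NOWARN allocation: it must
 * never block or trigger heavy reclaim, so the job fence can still be
 * signaled in finite time.
 */
static bool try_alloc_chunk(struct tiler_heap *heap)
{
	if (heap->allocated + heap->chunk_size > heap->limit)
		return false;    /* the kernel would return NULL here */
	heap->allocated += heap->chunk_size;
	return true;
}

enum grow_result tiler_heap_grow(struct tiler_heap *heap)
{
	if (try_alloc_chunk(heap))
		return GROW_OK;

	/*
	 * OOM in the fault path: never retry or wait for reclaim (that
	 * could deadlock against the driver's own shrinker). Either fall
	 * back to incremental rendering, or fail the job.
	 */
	return heap->incremental ? GROW_INCREMENTAL : GROW_FAULT;
}
```

In this model, Panthor corresponds to `incremental = true`, while
panfrost and lima are the `GROW_FAULT` case being discussed.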
>
> > In Panfrost and Lima, we don't have this concept of "incremental
> > rendering", so when we fail the allocation, we just fail the GPU job
> > with an unhandled GPU fault.
>
> To be honest I think that this is enough to mark those two drivers as
> broken. It's documented that this approach is a no-go for upstream
> drivers.
>
> How widely is that used?
It exists in lima and panfrost, and I wouldn't be surprised if a similar
mechanism was used in other drivers for tiler-based GPUs (etnaviv,
freedreno, powervr, ...), because ultimately that's how tilers work:
the amount of memory needed to store per-tile primitives (and metadata)
depends on what the geometry pipeline feeds the tiler with, and that
can't be predicted. If you over-provision, that's memory the system won't
be able to use while rendering takes place, even though only a small
portion might actually be used by the GPU. If your allocation is too
small, it will either trigger a GPU fault (for HW not supporting an
"incremental rendering" mode) or under-perform (because flushing
primitives has a huge cost on tilers).
Calling these drivers broken without having a plan to fix them is
also not an option.
>
> > And that's how it is today: the
> > alloc-on-fault mechanism is being used in at least 3 drivers, and
> > all I'm trying to do here is standardize it and document the
> > constraints that come with this model, constraints that are
> > currently being ignored. Like the fact that allocations in the
> > fault handler path shouldn't block, so we're guaranteed to signal
> > the job fence in finite time and we don't risk a deadlock between
> > the driver shrinker and the job triggering the fault.
>
> Well, on one hand it's good that we document the pitfalls, but I
> clearly think that we should *not* spread that into common code.
Ack on not encouraging people to use that; but having a clear path
describing how this should be done for HW that has no other option,
with helpers and extensive documentation, is IMHO better than letting
new drivers reproduce the mistakes that were made in the past.
Because, if you tell people "don't do that" but have no alternative
to propose, they'll end up doing it anyway.
>
> That would give the impression that this actually works, and to be
> honest, I should start charging money for rejecting stuff like that.
> It would probably get me rich.
>
> > I'm well aware of the implications of what I'm proposing here, but
> > ignoring the fact that some drivers already violate the rules
> > doesn't make them disappear. So I'd rather work with you and
> > others to clearly state what the alloc-in-fault-path rules are,
> > and enforce them in some helper functions, than pretend these ugly
> > things don't exist. :D
>
> Yeah, but this kind of says to others that it's ok to do this,
> which it clearly isn't as far as I can see.
Not really. At least not if we properly review any driver that uses
these APIs, and clearly state in the doc that this is provided as a
last resort for HW that can't do without on-fault allocation, because
it is designed to work this way.
>
> What we should do instead is to add this as not ok approaches to the
> documentation and move on.
Well, that's what happened with panfrost, lima and panthor, and see
where we are now: 3 drivers that you consider broken (probably
rightfully), and more to come if we don't come up with a plan for HW
that has the same requirements (as I said, I wouldn't be surprised
if most tilers relied on the same kind of on-demand allocation).
As always, I appreciate your valuable input, and I'd be happy to try
anything you think might be better suited than what is proposed here,
but saying "this is broken HW/driver, so let's ignore it" is not
really an option, at least not if you don't want the bad design
pattern to spread.
Regards,
Boris