[PATCH v3 0/8] drm: Introduce sparse GEM shmem
Christian König
christian.koenig at amd.com
Thu Apr 10 18:52:27 UTC 2025
On 10.04.25 at 19:20, Boris Brezillon wrote:
> [SNIP]
>>>> Faulting is only legal when you have something like HMM, SVM or
>>>> whatever you call it. And then you can just use a plain shmem
>>>> object to provide you with backing pages.
>>>>
>>>> I mean we could in theory allow faulting on GEM objects as well,
>>>> but we would need to take very strict precautions that we
>>>> currently don't have, as far as I know.
>>> We only use this mechanism for very specific allocations: tiler
>>> memory whose maximum size can't be guessed upfront, because tile
>>> binning is by nature unpredictable.
>>>
>>>> So could you explain how this works in the first place?
>>> I can explain how this works in Panthor, yes. You get an initial
>>> amount of memory that the tiler can use; when it runs out of
>>> memory, it first asks the system for more, and if that
>>> allocation fails, it falls back to what they call "incremental
>>> rendering", where the already-binned primitives are flushed to the
>>> FB in order to free memory, and the rendering starts over from
>>> there with the memory that has been freed.
>>>
>>> In Panthor, this on-demand allocation scheme is something that
>>> allows us to speed up rendering when there's memory available, but
>>> still lets us make progress when there isn't, hence the failable
>>> allocation scheme I'm proposing here.
>> Puh, that sounds like it can potentially work but is also very very
>> fragile.
>>
>> You must have a code comment when you select the GFP flags how and
>> why that works.
> +	/* We want non-blocking allocations; if we're OOM, we just fail
> +	 * the job to unblock things.
> +	 */
> +	ret = drm_gem_shmem_sparse_populate_range(&bo->base, page_offset,
> +						  NUM_FAULT_PAGES, page_gfp,
> +						  __GFP_NORETRY | __GFP_NOWARN);
>
> That's what I have right now in the failable allocation path. The
> tiler chunk allocation uses the same flags, but doesn't have a
> comment explaining that a fallback exists when the allocation fails.
We agreed to use GFP_NOWAIT for such types of allocations to at least wake up kswapd on the low water mark.
IIRC even using __GFP_NORETRY here was illegal, but I need to double-check the discussions again.
From the comment, this at minimum needs an explanation of what influence this has on the submission, and of how you can still guarantee fence forward progress.
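[Editor's note: a sketch of what such a comment could look like,
building on the snippet quoted above. drm_gem_shmem_sparse_populate_range
is the helper proposed in this series, and whether __GFP_NOWARN remains
appropriate alongside GFP_NOWAIT is an assumption to verify, not a
settled conclusion of the thread.]

```c
/*
 * This allocation sits in the fence-signalling path: blocking on
 * reclaim here could deadlock against the driver's shrinker, so we
 * must not wait.  GFP_NOWAIT still wakes kswapd on the low watermark,
 * letting background reclaim catch up.  If the allocation fails
 * anyway, the job is failed (or falls back to incremental rendering),
 * so the job fence is still guaranteed to signal in finite time.
 */
ret = drm_gem_shmem_sparse_populate_range(&bo->base, page_offset,
					  NUM_FAULT_PAGES, page_gfp,
					  GFP_NOWAIT | __GFP_NOWARN);
```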
>>> In Panfrost and Lima, we don't have this concept of "incremental
>>> rendering", so when we fail the allocation, we just fail the GPU job
>>> with an unhandled GPU fault.
>> To be honest I think that this is enough to mark those two drivers as
>> broken. It's documented that this approach is a no-go for upstream
>> drivers.
>>
>> How widely is that used?
> It exists in lima and panfrost, and I wouldn't be surprised if a similar
> mechanism was used in other drivers for tiler-based GPUs (etnaviv,
> freedreno, powervr, ...), because ultimately that's how tilers work:
> the amount of memory needed to store per-tile primitives (and metadata)
> depends on what the geometry pipeline feeds the tiler with, and that
> can't be predicted. If you over-provision, that's memory the system won't
> be able to use while rendering takes place, even though only a small
> portion might actually be used by the GPU. If your allocation is too
> small, it will either trigger a GPU fault (for HW not supporting an
> "incremental rendering" mode) or under-perform (because flushing
> primitives has a huge cost on tilers).
>
> Calling these drivers broken without having a plan to fix them is
> not an option either.
Yeah, agreed, we need some kind of alternative. It's just that at the moment I can't think of any.
>>> And that's how it is today: the
>>> alloc-on-fault mechanism is used in at least 3 drivers, and
>>> all I'm trying to do here is standardize it and document the
>>> constraints that come with this model, constraints that are
>>> currently being ignored. Like the fact that allocations in the fault
>>> handler path shouldn't block, so that we're guaranteed to signal the
>>> job fence in finite time and don't risk a deadlock between the
>>> driver shrinker and the job triggering the fault.
>> Well, on one hand it's good that we document the pitfalls, but I
>> clearly think that we should *not* spread that into common code.
> Ack on not encouraging people to use that; but having a clear path
> describing how this should be done for HW that has no other
> options, with helpers and extensive documentation, is IMHO better than
> letting new drivers reproduce the mistakes that were made in the past.
> Because if you tell people "don't do that" but don't have an
> alternative to propose, they'll end up doing it anyway.
Just to be clear: we have already completely rejected code from going upstream because of that!
And we are not talking about anything small, but rather a full-blown framework, even one developed by a major player.
In addition, both i915 and amdgpu had code which used this approach, and we reverted it because it simply doesn't work reliably.
>> That would give the impression that this actually works, and to be
>> honest, I should start charging money for rejecting stuff like that.
>> It would probably make me rich.
>>
>>> I'm well aware of the implications of what I'm proposing here, but
>>> ignoring the fact that some drivers already violate the rules doesn't
>>> make them disappear. So I'd rather work with you and others to clearly
>>> state what the alloc-in-fault-path rules are, and enforce them in
>>> some helper functions, than pretend these ugly things don't exist.
>>> :D
>> Yeah, but this kind of says to others that it's OK to do this, which
>> it clearly isn't, as far as I can see.
> Not really. At least not if we properly review any driver that uses
> these APIs, and clearly state in the doc that this is provided as a
> last resort for HW that can't do without on-fault allocation, because
> it is designed to work this way.
>
>> What we should do instead is document this as a not-OK approach
>> and move on.
> Well, that's what happened with panfrost, lima and panthor, and see
> where we are now: 3 drivers that you consider broken (probably
> rightfully so), and more to come if we don't come up with a plan for HW
> that has the same requirements (as I said, I wouldn't be surprised
> if most tilers relied on the same kind of on-demand allocation).
Well, we have HW features in major drivers which we simply don't use because of that.
> As always, I appreciate your valuable input, and I would be happy to
> try anything you think might be better suited than what is proposed
> here, but saying "this is broken HW/driver, so let's ignore it" is
> not really an option, at least not if you don't want the bad design
> pattern to spread.
Yeah, I'm not sure what to do either. We *know* from experience that this approach will fail.
We have already tried that.
Regards,
Christian.
>
> Regards,
>
> Boris