[PATCH v3 0/8] drm: Introduce sparse GEM shmem

Steven Price steven.price at arm.com
Mon Apr 14 15:34:35 UTC 2025


On 14/04/2025 13:47, Boris Brezillon wrote:
> On Fri, 11 Apr 2025 16:39:02 +0200
> Boris Brezillon <boris.brezillon at collabora.com> wrote:
> 
>> On Fri, 11 Apr 2025 15:13:26 +0200
>> Christian König <christian.koenig at amd.com> wrote:
>>
>>>>    
>>>>> Background is that you don't get a crash, nor error message, nor
>>>>> anything indicating what is happening.    
>>>> The job times out at some point, but we might get stuck in the fault
>>>> handler waiting for memory, which is pretty close to a deadlock, I
>>>> suspect.    
>>>
>>> I don't know those drivers that well, but at least for amdgpu the
>>> problem would be that the timeout handling would need to grab some of
>>> the locks the memory management is holding waiting for the timeout
>>> handling to do something....
>>>
>>> So that basically perfectly closes the circle. With a bit of luck you
>>> get a message after some time that the kernel is stuck, but since
>>> those are all sleeping locks I strongly doubt it.
>>>
>>> As an immediate action please provide patches which change those
>>> GFP_KERNEL allocations into GFP_NOWAIT.
>>
>> Sure, I can do that.
> 
> Hm, I might have been too quick to claim this was doable. In
> practice, doing that might regress Lima and Panfrost in situations
> where trying harder than GFP_NOWAIT would free up some memory. I'm not
> saying it was right to use GFP_KERNEL in the first place, but some
> expectations were set by this original mistake, so I'll probably need
> the Lima developers to vouch for this change after they've done some
> testing on a system under high memory pressure, and I'd need to do the
> same kind of testing for Panfrost and ask Steve if he's okay with that
> too.

It's a tricky one. The ideal would be to teach user space how to recover
from memory allocation failures (even on older GPUs it's still in theory
possible to break up the work and do incremental rendering). But that
ship sailed long ago, so this would be a regression.

I did consider GFP_ATOMIC as an option, but really we shouldn't be
accessing "memory reserves" in such a situation.

For background: in the "kbase" driver (the Linux kernel driver for the
closed source user space 'DDK' driver for Midgard/Bifrost GPUs) we had a
"JIT memory allocator". The idea was that if you have multiple
renderings in flight you can attempt to share the tiler heap memory
between them. In that case, when we couldn't allocate more memory but
knew there was another tiler heap which was going to be freed by a
fragment job that was already running, we could block knowing the memory
was going to become available.

It was designed to do the same thing as CSF's incremental rendering -
allow us to opportunistically allocate memory but not fail the rendering
if it wasn't available.

But it was a nightmare to have any confidence that it was deadlock free,
and the implementation was frankly terrible - which is ultimately why
CSF GPUs have the ability in hardware to perform incremental rendering
without failing the job. Effectively that approach requires allocating
just enough memory for one complete tiler heap to ensure forward
progress, while opportunistically allowing extra memory to give a
performance boost.

TL;DR: I think we should try switching to GFP_NOWAIT in Panfrost and do
some testing under memory pressure. It might be acceptable (and an
occasional job failure is better than an occasional lock-up). If it
turns out it's too easy to trigger job failures like this then we'll
need to rethink.
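
To make the intent concrete, the change I have in mind looks roughly like
the snippet below. This is an illustrative sketch only - the function and
structure names are made up and it's not the actual Panfrost heap-growth
path:

    /*
     * Illustrative sketch: grow the tiler heap from the GPU fault
     * handler without ever sleeping for reclaim.
     */
    static int heap_grow_on_fault(struct heap *heap, u64 fault_addr)
    {
            struct page *page;

            /*
             * GFP_NOWAIT wakes kswapd (background reclaim) but never
             * blocks in direct reclaim, so it can't stall the fault /
             * fence-signalling path. __GFP_NOWARN because the failure
             * is handled below rather than being a kernel bug.
             */
            page = alloc_page(GFP_NOWAIT | __GFP_NOWARN | __GFP_ZERO);
            if (!page)
                    return -ENOMEM; /* fail the job rather than deadlock */

            return heap_map_chunk(heap, fault_addr, page);
    }

The key point is that the only failure mode becomes -ENOMEM propagating
back and the job being failed, rather than the fault handler sleeping in
reclaim while the timeout handler waits for it.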

> For Panthor, I'm less worried, because we have the incremental rendering
> fallback, and assuming GFP_NOWAIT tries hard enough to reclaim the
> low-hanging fruit, performance shouldn't suffer much more than it
> would today with GFP_KERNEL allocations potentially delaying tiling
> operations longer than they would have been with a primitive flush.

Yes, for Panthor I think the approach is obvious - I believe GFP_NOWAIT
should trigger background reclaim, so over the course of a few frames it
should make the memory available (assuming there is sufficient 'free'
memory). Indeed it might even give a performance boost if it stops us
getting blocked in direct reclaim.

Thanks,
Steve


