[PATCH v3 0/8] drm: Introduce sparse GEM shmem

Boris Brezillon boris.brezillon at collabora.com
Mon Apr 14 12:47:14 UTC 2025


On Fri, 11 Apr 2025 16:39:02 +0200
Boris Brezillon <boris.brezillon at collabora.com> wrote:

> On Fri, 11 Apr 2025 15:13:26 +0200
> Christian König <christian.koenig at amd.com> wrote:
> 
> > >    
> > >> Background is that you don't get a crash, nor error message, nor
> > >> anything indicating what is happening.    
> > > The job times out at some point, but we might get stuck in the fault
> > > handler waiting for memory, which is pretty close to a deadlock, I
> > > suspect.    
> > 
> > I don't know those drivers that well, but at least for amdgpu the
> > problem would be that the timeout handling would need to grab some of
> > the locks the memory management is holding waiting for the timeout
> > handling to do something....
> > 
> > So that basically perfectly closes the circle. With a bit of luck you
> > get a message after some time that the kernel is stuck, but since
> > those are all sleeping locks I strongly doubt so.
> > 
> > As an immediate action please provide patches which change those
> > GFP_KERNEL into GFP_NOWAIT.  
> 
> Sure, I can do that.

Hm, I might have been too quick to claim this was doable. In
practice, doing that might regress Lima and Panfrost in situations
where trying harder than GFP_NOWAIT would free up some memory. I'm not
saying it was right to use GFP_KERNEL in the first place, but some
expectations were set by this original mistake, so I'll probably need
Lima developers to vouch for this change after they've done some
testing on a system under high memory pressure, and I'd need to do the
same kind of testing for Panfrost and ask Steve if he's okay with that
too.

For Panthor, I'm less worried, because we have the incremental rendering
fallback, and assuming GFP_NOWAIT tries hard enough to reclaim
low-hanging fruit, performance shouldn't suffer much more than it
would today with GFP_KERNEL allocations potentially delaying tiling
operations longer than a primitive flush would have.
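
To make the trade-off concrete, here is a rough userspace model of the
policy being discussed. It is only a sketch, not driver code: every name
in it (heap_model, model_alloc_page, grow_heap_on_fault, the ALLOC_* and
GROW_* values) is made up for illustration, and "reclaimable_pages" is a
crude stand-in for memory that only a sleeping allocation could free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GFP_NOWAIT and GFP_KERNEL; not kernel flags. */
enum alloc_mode { ALLOC_NOWAIT, ALLOC_MAY_SLEEP };

/* Outcome of a heap-growth attempt from the (modelled) fault handler. */
enum grow_result { GROW_OK, GROW_FLUSH_NEEDED };

/* Toy memory model, assumed for illustration only. */
struct heap_model {
	size_t free_pages;        /* pages available without any reclaim */
	size_t reclaimable_pages; /* pages only a sleeping allocation frees */
};

/* Allocate one page; only ALLOC_MAY_SLEEP is allowed to reclaim first. */
static bool model_alloc_page(struct heap_model *h, enum alloc_mode mode)
{
	assert(h != NULL);

	if (h->free_pages == 0 && mode == ALLOC_MAY_SLEEP &&
	    h->reclaimable_pages > 0) {
		/* This is the step that may block in a real system. */
		h->reclaimable_pages--;
		h->free_pages++;
	}

	if (h->free_pages == 0)
		return false;

	h->free_pages--;
	return true;
}

/*
 * Fault-handler policy: never sleep. If the non-blocking attempt fails,
 * report that the caller should fall back (e.g. flush primitives and
 * render incrementally) instead of waiting for memory.
 */
static enum grow_result grow_heap_on_fault(struct heap_model *h)
{
	if (model_alloc_page(h, ALLOC_NOWAIT))
		return GROW_OK;

	return GROW_FLUSH_NEEDED;
}
```

Starting from one free page and one reclaimable page, the first call to
grow_heap_on_fault() succeeds and the second returns GROW_FLUSH_NEEDED
without touching the reclaimable page, whereas a direct
model_alloc_page(h, ALLOC_MAY_SLEEP) at that point would still succeed.
That second case is the regression risk for drivers without a fallback.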

