[PATCH] drm/ttm: stop warning on TT shrinker failure

Michal Hocko mhocko at suse.com
Tue Mar 23 07:38:33 UTC 2021


On Mon 22-03-21 20:34:25, Christian König wrote:
> Am 22.03.21 um 18:02 schrieb Daniel Vetter:
> > On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko at suse.com> wrote:
> > > On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
> > > > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > > > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > > > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > > > > <ckoenig.leichtzumerken at gmail.com> wrote:
> > > > > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > > > > 
> > > > > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > > > > the shrinker.
> > > > > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > > > > 
> > > > > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > > > > situation where there's really no other way out. For anything where you
> > > > > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > > > > My impression is just the other way around. You should try to avoid the
> > > > > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > > > > Where did you get that idea?
> > > > > > Well from the kernel comment on GFP_NOFS:
> > > > > > 
> > > > > > >   * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
> > > > > > >   * Please try to avoid using this flag directly and instead use
> > > > > > >   * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
> > > > > > >   * recurse into the FS layer with a short explanation why. All allocation
> > > > > > >   * requests will inherit GFP_NOFS implicitly.
> > > > > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > > > > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > > > > GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
> > > > > 
> > > > > Adding them, maybe I got confused.
> > > > My impression is that the scoped API is preferred these days.
> > > > 
> > > > https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> > > > 
> > > > I'd probably need to spend a few months learning the DRM subsystem to
> > > > have a more detailed opinion on whether passing GFP flags around explicitly
> > > > or using the scope API is the better approach for your situation.
> > > yes, in an ideal world we would have a clearly defined scope of the
> > > reclaim recursion wrt FS/IO associated with it. I've got back to
> > > https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
> > > and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
> > > really require NOFS semantic? And why does it play with
> > > fs_reclaim_acquire?
> > It's our shrinker. shrink_show simply triggers that specific shrinker
> > asking it to shrink everything it can, which helps a lot with testing
> > without having to drive the entire system against the OOM wall.

Yes I figured that much. But...

> > fs_reclaim_acquire is there to make sure lockdep understands that this
> > is a shrinker and that it checks all the dependencies for us like if
> > we'd be in real reclaim. There is some drop caches interfaces in proc
> > iirc, but those drop everything, and they don't have the fs_reclaim
> > annotations to teach lockdep about what we're doing.

... I really do not follow this. You shouldn't really care whether this
is a reclaim interface or not. Or maybe I just do not understand this...
 
> To summarize the debugfs code is basically to test if that stuff really
> works with GFP_NOFS.

What do you mean by testing GFP_NOFS? Do you mean to test that a GFP_NOFS
context is sufficiently powerful to reclaim enough objects given some
internal constraints?

> My only concern is that if I could rely on memalloc_no* being used we could
> optimize this quite a bit further.

Yes, you can use the scope API, and you are guaranteed that _any_
allocation from the enclosed context will inherit GFP_NO* semantics.
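
To illustrate what "inherit" means here, below is a minimal userspace
model of the scoped API. It is a sketch, not kernel code: the function
names mirror memalloc_nofs_save(), memalloc_nofs_restore() and
current_gfp_context() from the kernel, but the flag values are made up,
a plain static variable stands in for current->flags, and
nested_alloc_mask() and mask_inside_scope() are hypothetical helpers
added only for the example.

```c
/* Userspace model only. NOT kernel code; flag values are invented. */
#include <assert.h>

typedef unsigned int gfp_t;

#define __GFP_IO       0x1u
#define __GFP_FS       0x2u
#define __GFP_RECLAIM  0x4u
#define GFP_KERNEL     (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS       (__GFP_RECLAIM | __GFP_IO)

#define PF_MEMALLOC_NOFS 0x1u

/* Stands in for current->flags (assumption: single-threaded model). */
static unsigned int task_flags;

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static unsigned int memalloc_nofs_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOFS;

	task_flags |= PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope, putting back whatever state was saved. */
static void memalloc_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOFS) | old;
}

/* Applied to every request: strip __GFP_FS while inside a scope. */
static gfp_t current_gfp_context(gfp_t gfp)
{
	if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}

/*
 * Hypothetical callee deep in a call chain. It asks for GFP_KERNEL and
 * knows nothing about the caller's constraints.
 */
static gfp_t nested_alloc_mask(void)
{
	return current_gfp_context(GFP_KERNEL);
}

/* Hypothetical caller that marks the whole scope as NOFS. */
static gfp_t mask_inside_scope(void)
{
	unsigned int old = memalloc_nofs_save();
	gfp_t mask = nested_alloc_mask();

	memalloc_nofs_restore(old);
	return mask;
}
```

In this model the callee's GFP_KERNEL request is reduced to GFP_NOFS
while the scope is active and is left alone once it has been restored,
without any gfp argument being passed down the call chain.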

-- 
Michal Hocko
SUSE Labs

