[Intel-gfx] [PATCH v2] drm/i915: Remove __GFP_NORETRY from our buffer allocator

Chris Wilson chris at chris-wilson.co.uk
Mon Jun 5 13:39:17 UTC 2017


Quoting Michal Hocko (2017-06-05 14:08:10)
> On Mon 05-06-17 13:49:38, Chris Wilson wrote:
> > Quoting Michal Hocko (2017-06-05 13:26:30)
> > > On Mon 05-06-17 11:35:12, Chris Wilson wrote:
> > > > I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
> > > > struggles with handling reclaim via kswapd (through inconsistency within
> > > > throttle_direct_reclaim() and even then the race between multiple
> > > > allocators makes the two step of reclaim then allocate fragile), and as
> > > > our buffers are always dirty (with very few exceptions), we required
> > > > kswapd to perform pageout on them. The only effective means of waiting
> > > > on kswapd is to retry the allocations (i.e. not set __GFP_NORETRY). That
> > > > leaves us with the dilemma of invoking the oomkiller instead of
> > > > propagating the allocation failure back to userspace where it can be
> > > > handled more gracefully (one hopes).  In the future we may have
> > > > __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory
> > > > and the oomkiller would have been invoked. Until then, let the oomkiller
> > > > wreck havoc.
> > > > 
> > > > v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
> > > > 
> > > > Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
> > > > Testcase: igt/gem_tiled_swapping
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > Cc: Michal Hocko <mhocko at suse.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c | 15 ++++++++++++++-
> > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index 7286f5dd3e64..845df6067e90 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -2406,7 +2406,20 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
> > > >                       if (!*s) {
> > > >                               /* reclaim and warn, but no oom */
> > > >                               gfp = mapping_gfp_mask(mapping);
> > > > -                             gfp |= __GFP_NORETRY;
> > > > +
> > > > +                             /* Our bo are always dirty and so we require
> > > > +                              * kswapd to reclaim our pages (direct reclaim
> > > > +                              * performs no swapping on its own). However,
> > > 
> > > Not sure whether this is exactly what you mean. The only pageout the
> > > direct reclaim is allowed to the swap partition (so anonymous and
> > > shmem). So the above is not 100% correct.
> > 
> > Hmm, I didn't see anything that allows direct reclaim to perform
> > writeback into swap.

Hmm, maybe I was looking too hard at reclaim_clean_pages_from_list(), or
at least I was looking for ways I could start pageout earlier.
 
> shrink_page_list
>   add_to_swap
>     add_to_swap_cache (gains mapping see page_mapping)
>   pageout
>     mapping->a_ops->writepage = shmem_writepage
>       swap_writepage

Right, if we have sc->write_page it should be hitting here the first
time it sees a dirty page on a reclaim list. Hmm, I interpreted the
"only if nonblocking" to mean that we wouldn't swapout for direct
reclaim. Perhaps a better question then is why doesn't shmemfs make
better progress for writeback under direct reclaim?

> note that the regular writeback is not allowed by 
> shrink_page_list:
>                         if (page_is_file_cache(page) &&
>                             (!current_is_kswapd() || !PageReclaim(page) ||
>                              !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
>                                 /*
>                                  * Immediately reclaim when written back.
>                                  * Similar in principal to deactivate_page()
>                                  * except we already have the page isolated
>                                  * and know it's dirty
>                                  */
>                                 inc_node_page_state(page, NR_VMSCAN_IMMEDIATE);
>                                 SetPageReclaim(page);
> 
>                                 goto activate_locked;
>                         }
> called right before pageout()

At one point I thought that was just there to make my life more
miserable. But it's purpose is clear, no need to write a clean copy of a
file to swap when it can be read back in from the file later.

> > The issue for us (i915) is that our buffers are
> > almost exclusively dirty, so even after we unpin them, in order to make
> > room they need to be paged out. Afaict, throttle_direct_reclaim() is
> > supposed to be the point at which direct reclaim waits for writeback via
> > kswapd and doesn't invoke writeback directly.
> 
> Well, throttle_direct_reclaim is mostly about preventing over reclaim
> than anything else. It doesn't check the amount of dirty data and such.

It's the only mechanism I could see there that did try and wait on
kswapd. __GFP_KSWAPD_RECLAIM just means to kick kswapd and not actually
wait for progress from kswapd. I felt misled by that flag.
-Chris


More information about the Intel-gfx mailing list