[PATCH 03/27] drm/i915: Remove __GFP_NORETRY from our buffer allocator

Chris Wilson chris at chris-wilson.co.uk
Thu Jun 1 17:55:31 UTC 2017

I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
is not: direct reclaim struggles to hand work off to kswapd (through
inconsistency within throttle_direct_reclaim(), and even then the race
between multiple allocators makes the two-step sequence of reclaim then
allocate fragile). As our buffers are always dirty (with very few
exceptions), we require kswapd to perform pageout on them. The only
effective means of waiting on kswapd is to retry the allocation (i.e. not
set __GFP_NORETRY). That leaves us with the dilemma of invoking the
oomkiller instead of propagating the allocation failure back to userspace,
where it can be handled more gracefully (one hopes). We cheat and note
that __GFP_THISNODE has the side effect of preventing oom and has no
consequence for our final attempt at allocation.

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
 drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
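
The trade-off described in the commit message can be sketched as a small
userspace model. This is an illustration only, not kernel code: the
SKETCH_* names and their bit values are hypothetical, and the two rules
encoded (retrying is what lets us wait for kswapd; __GFP_THISNODE
suppresses the oomkiller) are taken from the description above, not from
the page allocator itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define SKETCH_GFP_RECLAIM  (1u << 0)
#define SKETCH_GFP_NORETRY  (1u << 1)
#define SKETCH_GFP_THISNODE (1u << 2)

/* Outcome the commit message attributes to a given flag combination. */
struct sketch_outcome {
	bool waits_for_kswapd; /* allocation is retried, so pageout can finish */
	bool may_oom;          /* failure may escalate to the oomkiller */
};

static struct sketch_outcome sketch_classify(unsigned int gfp)
{
	struct sketch_outcome out;
	bool reclaim = (gfp & SKETCH_GFP_RECLAIM) != 0;
	bool noretry = (gfp & SKETCH_GFP_NORETRY) != 0;
	bool thisnode = (gfp & SKETCH_GFP_THISNODE) != 0;

	/* Only a retrying allocation effectively waits on kswapd. */
	out.waits_for_kswapd = reclaim && !noretry;
	/* Retrying risks the oomkiller unless THISNODE is set. */
	out.may_oom = out.waits_for_kswapd && !thisnode;
	return out;
}

static void sketch_self_check(void)
{
	struct sketch_outcome o;

	/* Before this patch: fail fast, dirty pages are never paged out. */
	o = sketch_classify(SKETCH_GFP_RECLAIM | SKETCH_GFP_NORETRY);
	assert(!o.waits_for_kswapd && !o.may_oom);

	/* Simply dropping NORETRY: reclaim works, but oom is possible. */
	o = sketch_classify(SKETCH_GFP_RECLAIM);
	assert(o.waits_for_kswapd && o.may_oom);

	/* This patch: retry for kswapd, yet fail instead of oom. */
	o = sketch_classify(SKETCH_GFP_RECLAIM | SKETCH_GFP_THISNODE);
	assert(o.waits_for_kswapd && !o.may_oom);
}
```

Calling sketch_self_check() from a main() exercises the three cases: the
old mask, the mask with __GFP_NORETRY simply dropped, and the mask used
by this patch.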

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 129515d8482a..53c51787d2ed 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2406,7 +2406,21 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			if (!*s) {
 				/* reclaim and warn, but no oom */
 				gfp = mapping_gfp_mask(mapping);
-				gfp |= __GFP_NORETRY;
+				/* Our bo are always dirty and so we require
+				 * kswapd to reclaim our pages (direct reclaim
+				 * performs no swapping on its own). Direct
+				 * reclaim is meant to wait for kswapd when
+				 * under pressure, but this is broken. As a
+				 * result __GFP_RECLAIM is unreliable and fails
+				 * to actually reclaim dirty pages -- unless
+				 * you try over and over again with
+				 * !__GFP_NORETRY. However, we still want to
+				 * fail this allocation rather than trigger
+				 * the out-of-memory killer, and so we
+				 * subvert __GFP_THISNODE for that side effect.
+				 */
+				gfp |= __GFP_THISNODE;
 		} while (1);

