[Intel-gfx] [PATCH] drm/i915: Selectively enable self-reclaim
torvalds at linux-foundation.org
Thu Jul 1 01:07:01 CEST 2010
On Wed, Jun 30, 2010 at 12:05 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> Reviewing the patch again, we no longer set the default gfpmask on the
> inode to contain NORETRY and instead add the NORETRY at the one spot in
> the code where we are trying to do a large allocation and our shrinker
> would be prevented from running (due to contention on struct_mutex).
> I do not know how this causes memory corruption across hibernate and would
> appreciate any insights.
Hmm. More likely is the __GFP_MOVABLE flag, I think.
That commit changes the page cache allocation to use
+ mapping_gfp_mask (mapping) |
+ __GFP_COLD |
if I read it right. And the default mapping_gfp_mask() is
GFP_HIGHUSER_MOVABLE, so I think you get all of
(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | __GFP_HIGHMEM | __GFP_MOVABLE)
set by default.
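
The flag arithmetic above can be sketched in plain C. This is only an illustration: the SK_* bit values are made up and are not the kernel's real encodings, the sk_* helper names are hypothetical, and the "extra" argument stands in for whatever per-call bits the truncated hunk ORs in (NORETRY, going by the quoted description). It suggests that a mask built from GFP_HIGHUSER_MOVABLE carries the movable bit, while the mask the old code installed (quoted further down) does not.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Illustrative bit positions only; NOT the kernel's actual flag values. */
#define SK_WAIT        (1u << 0)
#define SK_IO          (1u << 1)
#define SK_FS          (1u << 2)
#define SK_HARDWALL    (1u << 3)
#define SK_HIGHMEM     (1u << 4)
#define SK_MOVABLE     (1u << 5)
#define SK_COLD        (1u << 6)
#define SK_RECLAIMABLE (1u << 7)
#define SK_NORETRY     (1u << 8)
#define SK_NOWARN      (1u << 9)

/* Composite masks, mirroring how the flags are listed in the mail. */
#define SK_HIGHUSER \
	(SK_WAIT | SK_IO | SK_FS | SK_HARDWALL | SK_HIGHMEM)
#define SK_HIGHUSER_MOVABLE (SK_HIGHUSER | SK_MOVABLE)

/* Mask the old code set once at object creation (per the removed hunk). */
static inline sk_gfp_t sk_old_i915_mask(void)
{
	return SK_HIGHUSER | SK_COLD | SK_FS | SK_RECLAIMABLE |
	       SK_NORETRY | SK_NOWARN;
}

/* Mask the new code ends up with: mapping default | COLD | per-call bits. */
static inline sk_gfp_t sk_new_alloc_mask(sk_gfp_t mapping_mask, sk_gfp_t extra)
{
	return mapping_mask | SK_COLD | extra;
}

/* True if pages allocated with this mask may be migrated. */
static inline bool sk_is_movable(sk_gfp_t mask)
{
	return (mask & SK_MOVABLE) != 0;
}

/* Small self-check of the two cases discussed in the mail. */
static inline void sk_self_check(void)
{
	assert(!sk_is_movable(sk_old_i915_mask()));
	assert(sk_is_movable(sk_new_alloc_mask(SK_HIGHUSER_MOVABLE, SK_NORETRY)));
}
```

Calling sk_self_check() exercises both cases: the old initial mask is not movable, and the new default-derived mask is.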
The old code didn't just play games with ~__GFP_NORETRY and change
that at runtime (which was buggy - no locking, no protection, no
nothing), it also initialized the gfp mask. And that code also got removed:
- /* Basically we want to disable the OOM killer and handle ENOMEM
- * ourselves by sacrificing pages from cached buffers.
- * XXX shmem_file_[gs]et_gfp_mask()
- GFP_HIGHUSER |
- __GFP_COLD |
- __GFP_FS |
- __GFP_RECLAIMABLE |
- __GFP_NORETRY |
- __GFP_NOWARN |
(and note how it doesn't have __GFP_MOVABLE set).
So I wonder if we shouldn't re-instate that mapping_set_gfp_mask() for
the _initial_ setting when the file descriptor is created. That part
wasn't the bug - the bug was the code that used to try to do that
whole per-allocation dance with the bits incorrectly (ie this part of
- gfp = i915_gem_object_get_page_gfp_mask(obj);
- i915_gem_object_set_page_gfp_mask(obj, gfp & ~__GFP_NORETRY);
- ret = i915_gem_object_get_pages(obj);
- i915_gem_object_set_page_gfp_mask (obj, gfp);
in that patch).
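
To illustrate why that save/modify/restore dance is fragile, here is a minimal single-threaded sketch. All names (demo_obj, demo_alloc_mutating, demo_alloc_per_call, the DEMO_* bits) are hypothetical and do not correspond to i915 or kernel APIs. It contrasts mutating the shared per-object mask, where any unlocked concurrent reader could observe the temporary value, with computing a per-call mask in a local variable.

```c
#include <assert.h>

/* Illustrative bits only; NOT the kernel's actual flag values. */
#define DEMO_NORETRY (1u << 0)
#define DEMO_OTHER   (1u << 1)

/* Stand-in for an object whose mask is shared by every allocation path. */
struct demo_obj {
	unsigned int gfp_mask;
};

/*
 * Old pattern: clear NORETRY in the shared mask, allocate, then restore.
 * *seen_by_others records what an unlocked concurrent reader would see
 * during the window between the modification and the restore.
 */
static inline unsigned int demo_alloc_mutating(struct demo_obj *obj,
					       unsigned int *seen_by_others)
{
	unsigned int saved = obj->gfp_mask;
	unsigned int used;

	obj->gfp_mask = saved & ~DEMO_NORETRY;	/* shared state changes here */
	*seen_by_others = obj->gfp_mask;
	used = obj->gfp_mask;			/* mask the allocation would use */
	obj->gfp_mask = saved;			/* restore; the window is now closed */
	assert(obj->gfp_mask == saved);
	return used;
}

/*
 * Per-call pattern: derive the mask locally and leave the object alone,
 * so other readers always see the mask set at creation time.
 */
static inline unsigned int demo_alloc_per_call(const struct demo_obj *obj,
					       unsigned int *seen_by_others)
{
	unsigned int used = obj->gfp_mask & ~DEMO_NORETRY;

	*seen_by_others = obj->gfp_mask;
	return used;
}
```

Both helpers allocate with the same effective mask; the difference is only in what the rest of the system can observe while the allocation is in flight.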
I could easily see that something would get very unhappy and corrupt
memory if the suspend-to-disk phase ends up compacting memory and
moving the pages around from under the i915 driver.
I dunno. But that seems more likely than the __GFP_NORETRY flag, which
should have no semantic meaning (except making it more likely for
allocations to fail, of course, but that's what the i915 code wants).