i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)

Linus Torvalds torvalds at linux-foundation.org
Sat Mar 17 15:52:15 PDT 2012


On Sat, Mar 17, 2012 at 3:09 PM, Hugh Dickins <hughd at google.com> wrote:
>
> I've got to go out for an hour: I'll digest more and think more about
> this when I get back.  If someone could explain the original problem
> with _MOVABLE, that would help me:

I do not believe we actually ever uncovered the original problem with
_MOVABLE: the problem was bisected down to the stable-backported
version of commit 4bdadb978569 ("drm/i915: Selectively enable
self-reclaim"), and I looked at the changes and decided that one of
the main ones was the removal of the mapping_set_gfp_mask() - which
resulted in __GFP_MOVABLE being on for the mapping.

So we just tried re-introducing that, and it fixed the problem.

Exactly *why* movable pages are a problem was never all that clear.
The assumption was (and I guess still is) that there are physical
pointers to the pages in the DRM pages themselves. Quoting from the
original thread:

 "I could easily see that something would get very unhappy and corrupt
  memory if the suspend-to-disk phase ends up compacting memory and
  moving the pages around from under the i915 driver."

but I didn't actually see why the i915 page pinning would be defeated
by __GFP_MOVABLE. The code does get a reference to them afaik.

So no, we don't have a real first cause. We just have a band-aid for
the symptom. It might be a pure bug in hibernation that just assumes
that any pages in MOVABLE zones can always be moved without even
checking any refcounts, so not using __GFP_MOVABLE might just be
working around problems elsewhere.

Or it's entirely possible that the DRM layer does *not* increment the
refcounts of the pages properly, and that hibernate with movable pages
is just a very efficient way of showing the problems that results in.

> A factor which might turn out to be relevant: swapin_readahead() (or
> swapoff) brings in pages from swap before it knows who they belong to,
> so the swapped-in swapcache pages don't necessarily have the limitations
> mapping_set_gfp_mask() has asked for.  It's been discussed with Alan,
> we know it will need fixing for gma500 when eventually that comes to
> be used on machines with more than 4GB, but for now I wasn't aware of
> a problem.

So i915 shouldn't much care about highmem vs non-highmem, it's just
that it does hold on to pages and take their physical address. But as
far as I can tell, whenever the physical address is in use, it does
have a refcount to the page.

So for example, i915_gem_object_get_pages_gtt() will use
shmem_read_mapping_page_gfp() which will increment the page count for
the page it gets, so all the obj->pages[] entries should have properly
incremented page counts. And they get released by
i915_gem_object_put_pages_gtt(), but maybe that is called too early
while the pages are still in use by the GFX unit...

If there is a refcounting issue, hibernate + __GFP_MOVABLE might just
be a great way to see it.

I don't really know the code, though.

                             Linus


More information about the dri-devel mailing list