i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
torvalds at linux-foundation.org
Sat Mar 17 15:52:15 PDT 2012
On Sat, Mar 17, 2012 at 3:09 PM, Hugh Dickins <hughd at google.com> wrote:
> I've got to go out for an hour: I'll digest more and think more about
> this when I get back. If someone could explain the original problem
> with _MOVABLE, that would help me:
I do not believe we actually ever uncovered the original problem with
_MOVABLE: the problem was bisected down to the stable-backported
version of commit 4bdadb978569 ("drm/i915: Selectively enable
self-reclaim"), and I looked at the changes and decided that one of
the main ones was the removal of the mapping_set_gfp_mask() - which
resulted in __GFP_MOVABLE being on for the mapping.
So we just tried re-introducing that, and it fixed the problem.
Exactly *why* movable pages are a problem was never all that clear.
The assumption was (and I guess still is) that there are physical
pointers to the pages in the DRM pages themselves. Quoting from the
"I could easily see that something would get very unhappy and corrupt
memory if the suspend-to-disk phase ends up compacting memory and
moving the pages around from under the i915 driver."
but I didn't actually see why the i915 page pinning would be defeated
by __GFP_MOVABLE. The code does get a reference to them afaik.
So no, we don't have a real first cause. We just have a band-aid for
the symptom. It might be a pure bug in hibernation that just assumes
that any pages in MOVABLE zones can always be moved without even
checking any refcounts, so not using __GFP_MOVABLE might just be
working around problems elsewhere.
Or it's entirely possible that the DRM layer does *not* increment the
refcounts of the pages properly, and that hibernate with movable pages
is just a very efficient way of showing the problems that results in.
> A factor which might turn out to be relevant: swapin_readahead() (or
> swapoff) brings in pages from swap before it knows who they belong to,
> so the swapped-in swapcache pages don't necessarily have the limitations
> mapping_set_gfp_mask() has asked for. It's been discussed with Alan,
> we know it will need fixing for gma500 when eventually that comes to
> be used on machines with more than 4GB, but for now I wasn't aware of
> a problem.
So i915 shouldn't much care about highmem vs non-highmem, it's just
that it does hold on to pages and take their physical address. But as
far as I can tell, whenever the physical address is in use, it does
have a refcount to the page.
So for example, i915_gem_object_get_pages_gtt() will use
shmem_read_mapping_page_gfp() which will increment the page count for
the page it gets, so all the obj->pages entries should have properly
incremented page counts. And they get released by
i915_gem_object_put_pages_gtt(), but maybe that is called too early
while the pages are still in use by the GFX unit...
If there is a refcounting issue, hibernate + __GFP_MOVABLE might just
be a great way to see it.
I don't really know the code, though.
More information about the dri-devel