i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
hughd at google.com
Sat Mar 17 18:44:18 PDT 2012
On Sat, 17 Mar 2012, Keith Packard wrote:
> On Sat, 17 Mar 2012 15:52:15 -0700, Linus Torvalds <torvalds at linux-foundation.org> wrote:
> > I do not believe we actually ever uncovered the original problem with
> > _MOVABLE: the problem was bisected down to the stable-backported
> > version of commit 4bdadb978569 ("drm/i915: Selectively enable
> > self-reclaim"), and I looked at the changes and decided that one of
> > the main ones was the removal of the mapping_set_gfp_mask() - which
> > resulted in __GFP_MOVABLE being on for the mapping.
> Can anyone explain what __GFP_MOVABLE even does? I can't understand what
> this flag would be for; if the page is locked (with get_page), then the
> page cannot move. If it isn't locked, then it's subject to swapping, in
> which case the page will almost certainly move when it returns from
> disk. Is it that the page won't move if it isn't swapped? That doesn't
> seem all that useful to me.
__GFP_MOVABLE is a hint to page allocation that there's a good likelihood
that this logical page can be migrated elsewhere in physical memory later
on if mm wants, so it's a good idea to allocate it from a physical area of
similarly MOVABLE pages; then if later on someone wants a large contiguous
area for something (or wants to hot-unplug that memory), it should be easy
to clear the whole area out, moving existing pages elsewhere. (I think
that's right: several questions come to me as I write it, but now is not
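the time to research all those details.) Page migration can only be done
later if it can account for every reference included in page_count(page):
an extra reference it cannot explain means someone may still be using the
old physical page, so the page must stay where it is.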
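A minimal sketch of that accounting in C follows. It assumes a deliberately simplified page model: struct page_model, accounted_refs() and can_migrate() are hypothetical names, not kernel API, and the real check in the migration code differs in detail. It shows only the principle: migration compares the reference count with the references it can explain, and backs off if there are more.

```c
#include <stdbool.h>

/* Hypothetical, simplified page model for illustration only;
 * this is not the kernel's struct page. */
struct page_model {
    int refcount;   /* stands in for page_count(page) */
    int mapcount;   /* user page-table mappings (migration can unmap these) */
    bool in_cache;  /* page is held by a file/shmem page cache */
};

/* References the migration code can explain: one for the page cache,
 * one per user mapping, and one it took itself to isolate the page. */
static int accounted_refs(const struct page_model *p)
{
    return (p->in_cache ? 1 : 0) + p->mapcount + 1;
}

/* Migration is only safe when every reference is accounted for; any
 * extra reference (e.g. a driver's get_page() pin) means someone may
 * still be using the old physical page, so migration must back off. */
static bool can_migrate(const struct page_model *p)
{
    return p->refcount == accounted_refs(p);
}
```

With only the page cache reference and the isolation reference, a count of 2 can migrate; a driver holding one extra get_page() reference raises it to 3 and migration is refused.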
> > but I didn't actually see why the i915 page pinning would be defeated
> > by __GFP_MOVABLE. The code does get a reference to them afaik.
> GTT mapping and page locking are done in lock-step in the driver:
>
>   1. pins the pages
>   2. maps to GTT
>   3. unmaps from GTT
>   4. unpins the pages.
>
> There are no other code paths to unmapping objects from the GTT or
> unpinning the pages that I can find.
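The lock-step ordering described above can be modelled as a small state machine enforcing "mapped implies pinned". This is a hypothetical sketch: struct gem_obj_model and the obj_* functions are illustrative names, not the i915 driver's functions. Each step refuses to run if it would leave the GTT pointing at pages that are no longer pinned.

```c
#include <stdbool.h>

/* Hypothetical object state; names are illustrative, not i915's. */
struct gem_obj_model {
    bool pinned;  /* backing pages hold an extra reference */
    bool mapped;  /* pages are bound into the GTT */
};

/* Each step returns 0 on success, or -1 if it would break the
 * invariant "mapped implies pinned" (or is out of order). */
static int obj_pin(struct gem_obj_model *o)
{
    if (o->pinned)
        return -1;
    o->pinned = true;
    return 0;
}

static int obj_map_gtt(struct gem_obj_model *o)
{
    if (!o->pinned || o->mapped)
        return -1;  /* never map pages that are not pinned */
    o->mapped = true;
    return 0;
}

static int obj_unmap_gtt(struct gem_obj_model *o)
{
    if (!o->mapped)
        return -1;
    o->mapped = false;
    return 0;
}

static int obj_unpin(struct gem_obj_model *o)
{
    if (!o->pinned || o->mapped)
        return -1;  /* never unpin while the GTT still points at the pages */
    o->pinned = false;
    return 0;
}
```

Running the four steps in order succeeds; trying to unpin before unmapping, or to map before pinning, returns -1.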
> > So for example, i915_gem_object_get_pages_gtt() will use
> > shmem_read_mapping_page_gfp() which will increment the page count for
> > the page it gets, so all the obj->pages entries should have properly
> > incremented page counts. And they get released by
> > i915_gem_object_put_pages_gtt(), but maybe that is called too early
> > while the pages are still in use by the GFX unit...
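The suspected failure can be shown with a toy reference-count model, assuming each backing page starts with one page cache reference and each get_pages call adds one reference per page. struct obj_pages_model and the model_* functions are hypothetical, not the i915 code; gpu_busy stands in for whatever tells the driver that the hardware has finished with the pages.

```c
#include <stdbool.h>

#define NPAGES 4

/* Hypothetical model of an object's backing pages; not the i915 code. */
struct obj_pages_model {
    int page_count[NPAGES];  /* stands in for page_count() of each page */
    bool have_pages;         /* obj->pages populated */
    bool gpu_busy;           /* hardware may still access the pages */
};

static void model_init(struct obj_pages_model *m)
{
    for (int i = 0; i < NPAGES; i++)
        m->page_count[i] = 1;  /* page cache reference only */
    m->have_pages = false;
    m->gpu_busy = false;
}

/* Like reading each page from shmem: take one extra reference per page. */
static void model_get_pages(struct obj_pages_model *m)
{
    for (int i = 0; i < NPAGES; i++)
        m->page_count[i]++;
    m->have_pages = true;
}

/* Drop the references. Returns true if this happened while the GPU was
 * still busy: from then on nothing holds the pages in place, so they may
 * be reclaimed or migrated even though the hardware may still touch them. */
static bool model_put_pages(struct obj_pages_model *m)
{
    bool premature = m->gpu_busy;
    for (int i = 0; i < NPAGES; i++)
        m->page_count[i]--;
    m->have_pages = false;
    return premature;
}
```

After a premature put, each page is back to the page cache reference alone, which is exactly the state in which the mm is free to move it.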
> This seems the most likely problem -- there are so many caches and
> buffers involved. However, we're seeing troubles on hibernate resume, at
> which point there isn't any acceleration going on, just two fbdev
> drivers poking the hardware. Which really reduces the complexity quite a
> bit; it's just CPU reads/writes through the GTT aperture created for the
> two console frame buffers. That makes this an interesting place to look
> for trouble; we can ignore vast areas within the driver that deal with
> acceleration, at least for this case.
I keep worrying about the sequence when the machine is powered on again
after hibernation: can i915 get up to anything before it is resumed from
the hibernation image? Could it start using certain pages at that stage,
then continue to poke at them after the hibernation image is restored
(which changes the story of which pages are free and what the others are
used for), for lack of some kind of flush?