i915 modeset memory corruption issues? (Fwd: Oops in ext3_block_to_path.isra.40+0x26/0x11b)
Rafael J. Wysocki
rjw at sisk.pl
Sun Mar 18 04:55:44 PDT 2012
On Sunday, March 18, 2012, Hugh Dickins wrote:
> Added Rafael to the Cc: Rafael, we're pondering over one or more of these
> recurrent threads about corruption after resume, seemingly related to i915.
Thanks for letting me know. :-)
I actually have a confirmation that the issue isn't present if the KMS is
> On Sat, 17 Mar 2012, Keith Packard wrote:
> > On Sat, 17 Mar 2012 18:44:18 -0700 (PDT), Hugh Dickins <hughd at google.com> wrote:
> > > I keep worrying about the sequence when the machine is powered on again
> > > after hibernation: can i915 get up to anything before it is resumed from
> > > the hibernation image?
> > Well, the frame buffer is presumably still using whatever mapping it had
> > before suspend occurred; is there any way it could be writing through
> > that before the graphics driver was resumed?
> It's hibernation restore here, so I don't think it could be using the
> mapping from before hibernation until after resuming from hibernation
> snapshot: it would be using the rebooting kernel's mapping until then.
> > What I don't understand is the relationship between the boot kernel and
> > the resumed kernel; when does the boot kernel stop writing to the
> > console, and how does it hand off control of the frame buffer at that
> > time.
> I believe the handoff point comes in the late initcall software_resume():
> which loads the image and calls hibernation_restore -> resume_target_kernel
> -> swsusp_arch_resume, which emerges into the restored hibernation image.
That's correct too. It may, however, also be done through the
SNAPSHOT_ATOMIC_RESTORE ioctl in kernel/power/user.c, which calls
hibernation_restore() directly.
In either case, hibernation_restore() calls suspend_console(), which is the
point at which the console is supposed to be left alone.
Later, it calls dpm_suspend_start(), which executes the .freeze() callbacks
of all device drivers, and then resume_target_kernel(), which executes the
drivers' .freeze_noirq() callbacks. If that goes well, swsusp_arch_resume()
is run, which jumps into the image kernel, and execution then continues in
create_image() at the restore_processor_state() call.
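
The ordering described above can be summarized with a toy model. The sketch
below is ordinary user-space Python, not kernel code: it only records the
order of the steps named in this message. The helper name
hibernation_restore_order() and the driver names passed to it are
hypothetical, and the relative order of individual drivers within each
phase is an arbitrary choice of the model, not a statement about the kernel.

```python
# Toy user-space model of the restore ordering described in this message.
# It does NOT call into the kernel; it only lists the steps in order.

def hibernation_restore_order(driver_names):
    """Return the ordered list of steps for the given (hypothetical) drivers."""
    # hibernation_restore() first suspends the console.
    steps = ["suspend_console"]
    # dpm_suspend_start(): .freeze() callbacks of all device drivers.
    steps += [f"{name}.freeze" for name in driver_names]
    # resume_target_kernel(): the drivers' .freeze_noirq() callbacks.
    steps += [f"{name}.freeze_noirq" for name in driver_names]
    # swsusp_arch_resume() jumps into the image kernel ...
    steps.append("swsusp_arch_resume")
    # ... which continues in create_image() at restore_processor_state().
    steps.append("create_image: restore_processor_state")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(hibernation_restore_order(["i915", "ahci"]), 1):
        print(number, step)
```

The point of the model is only that every driver's .freeze() and
.freeze_noirq() callbacks are supposed to have run before control passes to
the image kernel, so anything the boot kernel's graphics driver leaves
active past that point is not undone by this sequence itself.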
> As a late initcall, I imagine some work has already been done via the
> framebuffer, but I have no conception of what kind of mappings that
> involves (would shmem objects come into it at all? and is that even
> a relevant question, could enough damage be done without them?), nor
> whether they're properly torn down before emerging into the hibernimage.
During resume from hibernation we're very careful to restore the entire
contents of RAM and to switch the CPU (there's only one of them executing
code at that point) to the pre-hibernation page tables. As a result, all
of the CPU's memory mappings will be the same as before the hibernation
when that's been completed. However, I'm not sure what the contents of
the graphics registers are at this point, or whether the graphics adapter
may access "wrong" memory regions through DMA.
> > It would be great if we could separate out the boot kernel access to the
> > graphics system from the resumed system -- if the boot kernel was run
> > without the i915 driver loaded at all, and just used VGA text mode, then
> > any damage as a result of resume wouldn't be caused by the boot kernel
> > GTT mappings getting used at the wrong time.
That shouldn't be very difficult to verify, if i915 is built as a module and
is not loaded until software_resume() is run.
> But you're giving my worry more credence than it deserves there:
> we don't have any evidence that this is where the problem lies,
> that's just a suspicion of mine at the moment.
Well, pretty much the only explanation of the observed symptoms I can imagine
is some kind of "leakage" of the boot kernel's memory mappings through the
graphics adapter into the post-restore system.