[Intel-gfx] i915 corrupting memory on shutdown since 4.4?

Chris Wilson chris at chris-wilson.co.uk
Fri Jan 29 04:37:22 PST 2016


On Fri, Jan 29, 2016 at 12:02:26AM +0000, Chris Bainbridge wrote:
> Hi,
> 
> I'm using 4.5-rc1 and before that 4.4. Twice in the past month I've
> rebooted and the root btrfs partition has become corrupted and
> unbootable. I was wondering if the cause could be i915 and if so is
> there any better way to track it down?
> 
> In testing, 200 reboots gave only 3 errors, but the third error did corrupt
> the btrfs partition and it became unbootable. But I still don't have a
> reliable way to automatically reproduce this. I'm not sure, but it may
> be related to use of Chrome or Iceweasel (Debian Firefox) with many open
> tabs. The type of display does not seem to matter - this testing was
> with external displays, but the fs corruption previously happened with
> laptop display only.
> 
> While searching I came across some links which sound similar:
> http://lists.freedesktop.org/archives/intel-gfx/2014-June/046476.html (i915_gem_shrink oom)
> http://codemonkey.org.uk/tag/i915 (i915/hibernate memory corruption)
> 
> Relevant log extracts (from beginning of problems) follow. The reason I
> suspect i915 is that in two of the cases the first trace included
> i915_gem_shrinker_oom, and after turning on debugging I got odebug
> warnings about a timer error between underrun errors from the gpu code.

Neither of the i915 suspects here is significant. The
i915_gem_shrinker_oom is just an oom notifier that runs to free up
memory from the GPU upon an oom. The underruns are just a symptom from
the internals of the display subsystem. The corruption doesn't have the
telltale signs of pixel data - but we don't yet have any evidence of
what pattern the corruption follows.

The timer error is definitely interesting as it implies that the backing
storage device is corrupt, which correlates well with the swap errors.
I'd turn up the use-after-free debugging, redzoning, poisoning (which
hopefully would let us see the corruption) etc., and if at all possible
run through with kmemcheck/kasan.
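
As a rough sketch only, the debugging options mentioned above might map
onto a config fragment like the one below. The symbol names and boot
parameters are assumptions for a 4.x-era kernel built with SLUB and are
not taken from this thread; dependencies differ between versions and
architectures, so check the Kconfig help in your own tree.

```
# Assumed .config fragment, for illustration only.

# Slab debugging support; required for the slub_debug= boot parameter.
CONFIG_SLUB_DEBUG=y

# Catches use-after-free of whole pages by unmapping them when freed.
CONFIG_DEBUG_PAGEALLOC=y

# Compiler-instrumented detection of use-after-free and out-of-bounds
# accesses; needs architecture and compiler support.
CONFIG_KASAN=y

# Assumed kernel command line additions:
#   slub_debug=FZPU      F: sanity checks, Z: red zoning,
#                        P: poisoning, U: alloc/free user tracking
#   debug_pagealloc=on   enables CONFIG_DEBUG_PAGEALLOC at boot
```

kmemcheck is left out of the sketch because it generally cannot be
combined with the other options; it would need a separate build.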
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

