[Intel-gfx] [BUG] 2.6.38-rc1-git1: hard lockup related to i915 / automated cgroup scheduling

Linus Torvalds torvalds at linux-foundation.org
Thu Jan 20 18:58:42 CET 2011


On Thu, Jan 20, 2011 at 9:29 AM, Knut Petersen
<Knut_Petersen at t-online.de> wrote:
> Kernel 2.6.38-rc1 and -git1 will lock my AOpen i915GMm-HFS
> at the end of KDE startup if automatic process group scheduling
> is activated in the kernel config. A hard reset is necessary.
> Without automatic process group scheduling everything is ok.

Interesting. Most likely timing-related, but maybe there's some actual
memory corruption. Adding the scheduler guys just in case.

It might be interesting to see if enabling SLUB debugging makes any
difference. Interesting for two reasons:

 - it may just make the problem go away because it changes timings
radically enough (which is the bad case, since that doesn't really
help us very much)

 - maybe it's not timing-related, and instead shows some slab misuse
and corruption that explains the problem.

I dunno.

> Reproducibility of bug: 100 %
> System: AOpen i915GMm-Hfs, 2GB, Pentium M
> Distribution: openSuSE 11.3
>
> cu,
>  Knut
>
> Jan 20 17:57:07 golem kernel: [   58.087054] ------------[ cut here ]------------
> Jan 20 17:57:07 golem kernel: [   58.087117] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3254!

Grr. Hate people who do BUG_ON() calls that kill the machine and make
things harder to debug.

What happens if you replace that

  BUG_ON(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT);

with a

  if (WARN_ON_ONCE(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
    return -ENOMEM;

or similar? Does it limp along? I'm not suggesting that as a fix
(obviously), but I do think that we have way too many BUG_ON()s, and
too few people thinking about "how can I make the machine possibly
limp on so that the oops is easier to see and report".
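A minimal userspace sketch of the "warn once and back out" pattern suggested above, assuming a GCC or Clang compiler (the macro uses a GNU statement expression, as the kernel's own macros do). `fake_obj`, `fake_pin` and `MAX_PIN_COUNT` are hypothetical stand-ins, not the i915 driver's code, and this `WARN_ON_ONCE` only imitates the kernel macro's behavior: it prints a warning the first time the condition is true and returns the condition so the caller can fail gracefully.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limit; the real value of DRM_I915_GEM_OBJECT_MAX_PIN_COUNT
 * is not given in this thread. */
#define MAX_PIN_COUNT 3

/* Userspace imitation of the kernel's WARN_ON_ONCE(): evaluates the
 * condition, warns on stderr only the first time it is true at this call
 * site, and yields the condition's truth value. */
#define WARN_ON_ONCE(cond) ({                                   \
	static bool once_warned_;                               \
	bool once_ret_ = !!(cond);                              \
	if (once_ret_ && !once_warned_) {                       \
		once_warned_ = true;                            \
		fprintf(stderr, "WARNING: %s at %s:%d\n",       \
			#cond, __FILE__, __LINE__);             \
	}                                                       \
	once_ret_;                                              \
})

/* Hypothetical stand-in for a GEM object. */
struct fake_obj {
	int pin_count;
};

/* Instead of BUG_ON() on overflow, report it once and return an error
 * so the caller (and the machine) can keep going. */
static int fake_pin(struct fake_obj *obj)
{
	if (WARN_ON_ONCE(obj->pin_count == MAX_PIN_COUNT))
		return -ENOMEM;
	obj->pin_count++;
	return 0;
}
```

Pinning past the limit prints a single warning and then returns `-ENOMEM` on every further attempt, leaving `pin_count` at the limit, so the program keeps running instead of oopsing the way `BUG_ON()` does.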

                     Linus


