[Intel-gfx] I've got the RC6 bug

Fri Jan 20 11:46:07 CET 2012

On Fri, Jan 20, 2012 at 11:30:24AM +0100, Daniel Vetter wrote:
> On Wed, Jan 18, 2012 at 01:24:26AM +0100, Daniel Vetter wrote:
> > On Wed, Jan 18, 2012 at 01:16:02AM +0100, CC wrote:
> > > On Mon, Jan 16, 2012 at 5:36 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> > > 
> > > > On Mon, Jan 16, 2012 at 05:18:17PM +0100, CC wrote:
> > > > > Hi,
> > > > >
> > > > > I've heard that you need users having the RC6 bug.
> > > > >
> > > > > I have the following setup:
> > > > > CPU: Intel Core i5-2500K
> > > > > Mainboard: ASRock Z68 Pro3-M
> > > > > Memory: Corsair Vengeance CMZ8GX3M2A1866C9
> > > > >
> > > > > Although the CPU doesn't support VT-d, I disabled all virtualization
> > > > > support in the UEFI setup.
> > > > >
> > > > > I use Arch Linux and Gnome 3 in the fallback mode. The problem is more
> > > > > drastic without fallback mode, however.
> > > > >
> > > > > Whenever I enable RC6, I get a few of these errors in dmesg:
> > > > >
> > > > > [   48.900000] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387
> > > > > __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
> > > > > [   48.900002] Hardware name: To Be Filled By O.E.M.
> > > > > [   48.900002] Modules linked in: ipv6 fuse ext2 snd_hda_codec_hdmi
> > > > > snd_hda_codec_realtek mei(C) joydev r8169 shpchp pci_hotplug usbhid hid
> > > > > snd_hda_intel iTCO_wdt mii iTCO_vendor_support i2c_i801 snd_hda_codec
> > > > > processor snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc
> > > > > psmouse
> > > > > serio_raw pcspkr evdev ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore
> > > > > i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core
> > > > > video sd_mod ahci libahci libata scsi_mod
> > > > > [   48.900019] Pid: 623, comm: Xorg Tainted: G        WC  3.1.9-2-ARCH #1
> > > > > [   48.900020] Call Trace:
> > > > > [   48.900023]  [<ffffffff81061bef>] warn_slowpath_common+0x7f/0xc0
> > > > > [   48.900025]  [<ffffffff81061c4a>] warn_slowpath_null+0x1a/0x20
> > > > > [   48.900028]  [<ffffffffa00e0764>] __gen6_gt_wait_for_fifo+0x94/0xa0
> > > > > [i915]
> > > > > [   48.900032]  [<ffffffffa015d2d5>] ring_write_tail+0x65/0x120 [i915]
> > > > > [   48.900036]  [<ffffffffa01619bc>] render_ring_flush+0xbc/0xe0 [i915]
> > > > > [   48.900040]  [<ffffffffa010b803>] i915_gem_flush_ring+0x43/0x250
> > > > > [i915]
> > > > > [   48.900044]  [<ffffffffa0112b50>]
> > > > > i915_gem_do_execbuffer.isra.7+0x1020/0x16d0 [i915]
> > > > > [   48.900048]  [<ffffffffa01136bb>] i915_gem_execbuffer2+0x8b/0x240
> > > > > [i915]
> > > > > [   48.900051]  [<ffffffffa0098434>] drm_ioctl+0x3e4/0x4c0 [drm]
> > > > > [   48.900053]  [<ffffffff810746cb>] ? recalc_sigpending+0x1b/0x50
> > > > > [   48.900057]  [<ffffffffa0113630>] ? i915_gem_execbuffer+0x430/0x430
> > > > > [i915]
> > > > > [   48.900059]  [<ffffffff8101e9b1>] ? fpu_finit+0x21/0x40
> > > > > [   48.900061]  [<ffffffff8116fddf>] do_vfs_ioctl+0x8f/0x500
> > > > > [   48.900063]  [<ffffffff81014beb>] ? sys_rt_sigreturn+0x1eb/0x200
> > > > > [   48.900064]  [<ffffffff811702e1>] sys_ioctl+0x91/0xa0
> > > > > [   48.900066]  [<ffffffff8140c3c2>] system_call_fastpath+0x16/0x1b
> > > > > [   48.900067] ---[ end trace 9a23b8b32b16a424 ]---
> > > >
> > > > This is a known side-effect of a dying gpu. It essentially means that the
> > > > gpu refuses to wake up from deep-sleep states.
> > > >
> > > > > and then
> > > > >
> > > > > [   53.163526] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > > > elapsed... GPU hung
> > > > > [   53.165046] [drm] capturing error event; look for more information in
> > > > > /debug/dri/0/i915_error_state
> > > > > [   53.177356] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > > > -11 (awaiting 1593 at 1592, next 1594)
> > > > > [   53.181979] [drm:init_ring_common] *ERROR* render ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [   53.185522] [drm:init_ring_common] *ERROR* gen6 bsd ring
> > > > > initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [   53.188558] [drm:init_ring_common] *ERROR* blt ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [   55.330146] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > > > elapsed... GPU hung
> > > > > [   55.332202] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > > > -11 (awaiting 1594 at 1591, next 1595)
> > > > > [   55.333258] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring
> > > > > wedged!
> > > > > [   55.333260] [drm:i915_reset] *ERROR* Failed to reset chip.
> > > > >
> > > > > Of course, I'd be willing to test out stuff. I'd need a bit of guidance,
> > > > > however.
> > > >
> > > > Can you please attach i915_error_state from debugfs (you need to retrigger
> > > > the issue)? It contains a gpu dump which is useful to diagnose the bug.
> > > >
> > > > Yours, Daniel
> > > > --
> > > > Daniel Vetter
> > > > Mail: daniel at ffwll.ch
> > > > Mobile: +41 (0)79 365 57 48
> > > >
> > > 
> > > I attached the error state.
> > 
> > Nice one, your gpu seems to have simply disappeared. And the ringbuffer
> > contains a rather peculiar cmd sequence. Putting Chris (maybe he
> > recognizes the pattern) and Ben (he's got a patch in the works to dump a
> > debug register that might be interesting here) on cc. It's too late atm
> > for me to think about this some more.
> 
> Chris and I looked some more at this one and it's a keeper. Can you
> please file a bug report on bugs.freedesktop.org against drm/i915 with the
> usual details and these 2 error_states attached?

Chris just had a new idea that would explain your error_state rather
neatly. Can you try the latest drm-intel-fixes branch from

https://git.kernel.org/?p=linux/kernel/git/keithp/linux.git;a=summary

That branch contains a forcewake locking fix; the lack of it would explain
all the 0s in the registers of your dump (assuming the gpu went to sleep
for whatever reason).
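
To compare dmesg output before and after trying that branch, a small script
can check whether every ring register in the "ring initialization failed"
messages reads back as 0. What follows is only a rough sketch, not part of
the driver or any i915 tool: the function name, the regex and the SAMPLE
text are assumptions made for illustration, based on the log lines quoted
above.

```python
import re

# Matches the register dump printed with "ring initialization failed",
# e.g. "ctl 00000000 head 00000000 tail 00000000 start 00000000".
# The ring names are the three that appear in the quoted dmesg output.
RING_REGS = re.compile(
    r"(?P<ring>render|gen6 bsd|blt) ring\s+initialization\s+failed\s+"
    r"ctl (?P<ctl>[0-9a-f]{8}) head (?P<head>[0-9a-f]{8}) "
    r"tail (?P<tail>[0-9a-f]{8}) start (?P<start>[0-9a-f]{8})"
)


def all_zero_rings(log_text):
    """Return the names of rings whose four registers all read as zero."""
    # Collapse whitespace so messages wrapped over two lines still match.
    flat = " ".join(log_text.split())
    zero = []
    for m in RING_REGS.finditer(flat):
        regs = [int(m.group(n), 16) for n in ("ctl", "head", "tail", "start")]
        if not any(regs):
            zero.append(m.group("ring"))
    return zero


# Hypothetical sample, copied from the messages quoted earlier in the thread.
SAMPLE = """
[   53.181979] [drm:init_ring_common] *ERROR* render ring initialization
failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[   53.188558] [drm:init_ring_common] *ERROR* blt ring initialization
failed ctl 00000000 head 00000000 tail 00000000 start 00000000
"""

print(all_zero_rings(SAMPLE))
```

If the list is non-empty with RC6 enabled, the registers still read as 0
at ring init, which is the pattern the missing forcewake lock would explain.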

Thanks, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48