[Intel-gfx] I've got the RC6 bug
Daniel Vetter
daniel at ffwll.ch
Fri Jan 20 11:46:07 CET 2012
On Fri, Jan 20, 2012 at 11:30:24AM +0100, Daniel Vetter wrote:
> On Wed, Jan 18, 2012 at 01:24:26AM +0100, Daniel Vetter wrote:
> > On Wed, Jan 18, 2012 at 01:16:02AM +0100, CC wrote:
> > > On Mon, Jan 16, 2012 at 5:36 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > > On Mon, Jan 16, 2012 at 05:18:17PM +0100, CC wrote:
> > > > > Hi,
> > > > >
> > > > > I've heard that you need users having the RC6 bug.
> > > > >
> > > > > I have the following setup:
> > > > > CPU: Intel Core i5-2500K
> > > > > Mainboard: ASRock Z68 Pro3-M
> > > > > Memory: Corsair Vengeance CMZ8GX3M2A1866C9
> > > > >
> > > > > Although the CPU doesn't support VT-d, I disabled all virtualization
> > > > > support in the UEFI setup.
> > > > >
> > > > > I use Arch Linux and Gnome 3 in the fallback mode. The problem is more
> > > > > drastic without fallback mode, however.
> > > > >
> > > > > Whenever I enable RC6, I get a few of these errors in dmesg:
> > > > >
> > > > > [ 48.900000] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387
> > > > > __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
> > > > > [ 48.900002] Hardware name: To Be Filled By O.E.M.
> > > > > [ 48.900002] Modules linked in: ipv6 fuse ext2 snd_hda_codec_hdmi
> > > > > snd_hda_codec_realtek mei(C) joydev r8169 shpchp pci_hotplug usbhid hid
> > > > > snd_hda_intel iTCO_wdt mii iTCO_vendor_support i2c_i801 snd_hda_codec
> > > > > processor snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc psmouse
> > > > > serio_raw pcspkr evdev ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore
> > > > > i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core
> > > > > video sd_mod ahci libahci libata scsi_mod
> > > > > [ 48.900019] Pid: 623, comm: Xorg Tainted: G WC 3.1.9-2-ARCH #1
> > > > > [ 48.900020] Call Trace:
> > > > > [ 48.900023] [<ffffffff81061bef>] warn_slowpath_common+0x7f/0xc0
> > > > > [ 48.900025] [<ffffffff81061c4a>] warn_slowpath_null+0x1a/0x20
> > > > > [ 48.900028] [<ffffffffa00e0764>] __gen6_gt_wait_for_fifo+0x94/0xa0
> > > > > [i915]
> > > > > [ 48.900032] [<ffffffffa015d2d5>] ring_write_tail+0x65/0x120 [i915]
> > > > > [ 48.900036] [<ffffffffa01619bc>] render_ring_flush+0xbc/0xe0 [i915]
> > > > > [ 48.900040] [<ffffffffa010b803>] i915_gem_flush_ring+0x43/0x250 [i915]
> > > > > [ 48.900044] [<ffffffffa0112b50>]
> > > > > i915_gem_do_execbuffer.isra.7+0x1020/0x16d0 [i915]
> > > > > [ 48.900048] [<ffffffffa01136bb>] i915_gem_execbuffer2+0x8b/0x240 [i915]
> > > > > [ 48.900051] [<ffffffffa0098434>] drm_ioctl+0x3e4/0x4c0 [drm]
> > > > > [ 48.900053] [<ffffffff810746cb>] ? recalc_sigpending+0x1b/0x50
> > > > > [ 48.900057] [<ffffffffa0113630>] ? i915_gem_execbuffer+0x430/0x430
> > > > > [i915]
> > > > > [ 48.900059] [<ffffffff8101e9b1>] ? fpu_finit+0x21/0x40
> > > > > [ 48.900061] [<ffffffff8116fddf>] do_vfs_ioctl+0x8f/0x500
> > > > > [ 48.900063] [<ffffffff81014beb>] ? sys_rt_sigreturn+0x1eb/0x200
> > > > > [ 48.900064] [<ffffffff811702e1>] sys_ioctl+0x91/0xa0
> > > > > [ 48.900066] [<ffffffff8140c3c2>] system_call_fastpath+0x16/0x1b
> > > > > [ 48.900067] ---[ end trace 9a23b8b32b16a424 ]---
> > > >
> > > > This is a known side-effect of a dying gpu. It essentially means that the
> > > > gpu refuses to wake up from deep-sleep states.
> > > >
> > > > > and then
> > > > >
> > > > > [ 53.163526] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > > > elapsed... GPU hung
> > > > > [ 53.165046] [drm] capturing error event; look for more information in
> > > > > /debug/dri/0/i915_error_state
> > > > > [ 53.177356] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > > > -11 (awaiting 1593 at 1592, next 1594)
> > > > > [ 53.181979] [drm:init_ring_common] *ERROR* render ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [ 53.185522] [drm:init_ring_common] *ERROR* gen6 bsd ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [ 53.188558] [drm:init_ring_common] *ERROR* blt ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [ 55.330146] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > > > elapsed... GPU hung
> > > > > [ 55.332202] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > > > -11 (awaiting 1594 at 1591, next 1595)
> > > > > [ 55.333258] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring
> > > > > wedged!
> > > > > [ 55.333260] [drm:i915_reset] *ERROR* Failed to reset chip.
> > > > >
> > > > > Of course, I'd be willing to test out stuff. I'd need a bit of guidance,
> > > > > however.
> > > >
> > > > Can you please attach i915_error_state from debugfs (you need to retrigger
> > > > the issue)? It contains a gpu dump which is useful to diagnose the bug.
> > > >
> > > > Yours, Daniel
> > > > --
> > > > Daniel Vetter
> > > > Mail: daniel at ffwll.ch
> > > > Mobile: +41 (0)79 365 57 48
> > > >
> > >
> > > I attached the error state.
> >
> > Nice one, your gpu seems to have simply disappeared. And the ringbuffer
> > contains a rather peculiar cmd sequence. Putting Chris (maybe he
> > recognizes the pattern) and Ben (he's got a patch in the works to dump a
> > debug register that might be interesting here) on cc. It's too late atm
> > for me to think about this some more.
>
> Chris and I looked some more at this one and it's a keeper. Can you
> please file a bug report on bugs.freedesktop.org against drm/i915 with the
> usual details and these 2 error_states attached?
Chris just had a new idea that would explain your error_state rather
neatly. Can you try the latest drm-intel-fixes branch from the tree below?
https://git.kernel.org/?p=linux/kernel/git/keithp/linux.git;a=summary
That branch contains a forcewake locking fix, the lack of which would
explain all the 0s in the registers of your dump (assuming the gpu went to
sleep for whatever reason).
Thanks, Daniel
--
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48