[Bug 69340] New: Recent mesa git revisions cause frequent gpu hangs on radeonsi

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Fri Sep 13 16:10:45 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=69340

          Priority: medium
            Bug ID: 69340
          Assignee: dri-devel at lists.freedesktop.org
           Summary: Recent mesa git revisions cause frequent gpu hangs on
                    radeonsi
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: j.suarez.agapito at gmail.com
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: git
         Component: Drivers/Gallium/radeonsi
           Product: Mesa

After installing mesa git 395b9410 (from oibaf's ppa on Kubuntu raring) I am
experiencing gpu hangs and kernel panics when launching "somewhat complex" 3D
games. For example, glxgears and supertuxkart do not produce the gpu hang, but
these games do: speed-dreams2 (it hangs when the game should show your car in
order to drive), L4D2 (just after the Valve logo video, when the game intro
movie should start playing) and Crusader Kings II (at the very beginning, when
the loading screen should come up).

The last mesa git version I had installed was 505fad04, which works correctly.

Moreover, the crashes happen both with radeon.dpm=1 and radeon.dpm=0.

I have managed to get some dmesg outputs of the crashes:

Crash #1

[  334.162270] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  334.162280] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000160ea)
[  334.162289] radeon 0000:01:00.0: failed to get a new IB (-35)
[  334.162291] [TTM] Failed to expire sync object before buffer eviction
[  334.162299] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
[  334.162378] [TTM] Failed to expire sync object before buffer eviction
[  334.172123] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
[  334.381742] radeon 0000:01:00.0: Saved 97917 dwords of commands on ring 0.
[  334.381879] radeon 0000:01:00.0: GPU softreset: 0x00000049
[  334.381882] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5D04028
[  334.381884] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xEE400000
[  334.381886] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xEE400000
[  334.381889] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  334.382000] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  334.382002] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  334.382004] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[  334.382006] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00408002
[  334.382009] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84038643
[  334.382011] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  334.382013] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  334.382016] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  334.382018] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  334.386528] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[  334.386582] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  334.387728] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
[  334.387730] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[  334.387731] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[  334.387733] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  334.387844] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  334.387846] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  334.387848] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  334.387850] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  334.387852] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  334.387854] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  334.387856] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  334.387981] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  334.415260] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[  334.415264] [drm] PCIE gen 2 link speeds already enabled
[  334.417417] [drm] PCIE GART of 512M enabled (table at 0x0000000000276000).
[  334.417520] radeon 0000:01:00.0: WB enabled
[  334.417522] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff880412af4c00
[  334.417524] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff880412af4c04
[  334.417526] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff880412af4c08
[  334.417528] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff880412af4c0c
[  334.417530] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff880412af4c10
[  334.418521] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90011db5a18
[  334.436919] [drm] ring test on 0 succeeded in 3 usecs
[  334.436924] [drm] ring test on 1 succeeded in 1 usecs
[  334.436929] [drm] ring test on 2 succeeded in 1 usecs
[  334.436992] [drm] ring test on 3 succeeded in 2 usecs
[  334.437003] [drm] ring test on 4 succeeded in 1 usecs
[  334.612443] [drm] ring test on 5 succeeded in 2 usecs
[  334.612447] [drm] UVD initialized successfully.
[  334.657863] [drm] ib test on ring 0 succeeded in 0 usecs
[  334.658379] [drm] ib test on ring 1 succeeded in 0 usecs
[  334.658543] [drm] ib test on ring 2 succeeded in 0 usecs
[  334.658587] [drm] ib test on ring 3 succeeded in 0 usecs
[  334.658627] [drm] ib test on ring 4 succeeded in 1 usecs

Crash #2

[  768.143440] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  768.143452] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000015f1 last fence id 0x00000000000015ed)
[  768.642649] radeon 0000:01:00.0: GPU lockup CP stall for more than 10500msec
[  768.642659] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000073db9)
[  768.642666] radeon 0000:01:00.0: failed to get a new IB (-35)
[  768.642689] BUG: unable to handle kernel paging request at 0000100000000018
[  768.642756] IP: [<ffffffffa014f13d>] radeon_ib_sync_to+0x1d/0x40 [radeon]
[  768.642862] PGD 0
[  768.642883] Oops: 0000 [#1] SMP
[  768.642915] Modules linked in: snd_hrtimer parport_pc ppdev bnep rfcomm
bluetooth binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek sp5100_tco
eeepc_wmi asus_wmi sparse_keymap video snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm snd_page_alloc arc4 snd_seq_midi snd_seq_midi_event rt61pci snd_rawmidi
rt2x00pci rt2x00mmio rt2x00lib mac80211 snd_seq snd_seq_device snd_timer
cfg80211 edac_core snd psmouse eeprom_93cx6 edac_mce_amd crc_itu_t serio_raw
fam15h_power k10temp i2c_piix4 ohci_pci soundcore mac_hid it87 hwmon_vid lp
parport hid_generic usbhid hid mxm_wmi radeon i2c_algo_bit ttm e1000e
drm_kms_helper ahci ptp drm pps_core libahci wmi
[  768.643499] CPU: 4 PID: 3886 Comm: ck2 Not tainted 3.11.0-031100-generic #201309021735
[  768.643565] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Crosshair V Formula, BIOS 1605 09/21/2012
[  768.643650] task: ffff8803c7cac650 ti: ffff88037be68000 task.ti: ffff88037be68000
[  768.643712] RIP: 0010:[<ffffffffa014f13d>]  [<ffffffffa014f13d>] radeon_ib_sync_to+0x1d/0x40 [radeon]
[  768.643830] RSP: 0018:ffff88037be69a08  EFLAGS: 00210206
[  768.643875] RAX: 0000100000000000 RBX: ffff880415251900 RCX: 0000000000000000
[  768.643934] RDX: 0000000000000000 RSI: ffff88037bdca8c0 RDI: ffff88037be69a30
[  768.643992] RBP: ffff88037be69a08 R08: 0000000000000006 R09: 0000000000001000
[  768.644051] R10: 0000000000000005 R11: 0000000000000000 R12: ffff88037c644ea0
[  768.644109] R13: ffff88041218c000 R14: 0000000000000000 R15: 0000000000000000
[  768.644169] FS:  00007ff86a366740(0000) GS:ffff88042ed00000(0000) knlGS:00000000e7a5fb40
[  768.644236] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  768.644283] CR2: 0000100000000018 CR3: 00000003cb38d000 CR4: 00000000000407e0
[  768.644342] Stack:
[  768.644360]  ffff88037be69ad8 ffffffffa013cb35 ffff880400000016 ffff88042ecd4580
[  768.644428]  ffff88037be69a48 0000000000000000 ffff880300000010 ffff8804130ba6e8
[  768.644496]  ffff88037be69a58 ffffffff8106a164 003fe00024200002 0000006000000000
[  768.644563] Call Trace:
[  768.644624]  [<ffffffffa013cb35>] radeon_vm_bo_update_pte+0x165/0x270 [radeon]
[  768.644691]  [<ffffffff8106a164>] ? local_bh_enable+0x94/0xa0
[  768.644776]  [<ffffffffa013cc9b>] radeon_vm_bo_rmv+0x5b/0xf0 [radeon]
[  768.644870]  [<ffffffffa014e203>] radeon_gem_object_close+0xf3/0x110 [radeon]
[  768.644951]  [<ffffffffa0028502>] drm_gem_object_release_handle+0x72/0xf0 [drm]
[  768.645016]  [<ffffffff81368051>] idr_for_each+0xa1/0xf0
[  768.645079]  [<ffffffffa0028490>] ? drm_gem_handle_create+0xf0/0xf0 [drm]
[  768.645140]  [<ffffffff81728c6d>] ? mutex_lock+0x1d/0x41
[  768.645203]  [<ffffffffa0028a14>] drm_gem_release+0x24/0x40 [drm]
[  768.645271]  [<ffffffffa00270e2>] drm_release+0x482/0x520 [drm]
[  768.645326]  [<ffffffff811b3f1a>] __fput+0xba/0x240
[  768.645371]  [<ffffffff811b40ee>] ____fput+0xe/0x10
[  768.645415]  [<ffffffff81085848>] task_work_run+0xc8/0xf0
[  768.645463]  [<ffffffff81067ede>] do_exit+0x19e/0x480
[  768.645510]  [<ffffffff81068254>] do_group_exit+0x44/0xa0
[  768.645558]  [<ffffffff810782a1>] get_signal_to_deliver+0x231/0x480
[  768.645615]  [<ffffffff81013be7>] do_signal+0x47/0x140
[  768.645662]  [<ffffffff81712eda>] ? is_prefetch.isra.12.part.13+0x1a4/0x1ff
[  768.645724]  [<ffffffff8109d134>] ? vtime_account_user+0x74/0x90
[  768.645777]  [<ffffffff81013d68>] do_notify_resume+0x88/0xc0
[  768.645827]  [<ffffffff8172cdbc>] retint_signal+0x48/0x8c
[  768.645873] Code: 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 44 00 00 66 66 66 66 90 55 48 85 f6 48 89 e5 74 25 8b 4e 18 89 ca 48 8b 44 d7 40 48 85 c0 74 1b <3b> 48 18 75 1b 48 8b 48 10 48 39 4e 10 48 0f 47 c6 48 89 44 d7
[  768.646149] RIP  [<ffffffffa014f13d>] radeon_ib_sync_to+0x1d/0x40 [radeon]
[  768.646246]  RSP <ffff88037be69a08>
[  768.646276] CR2: 0000100000000018
[  768.666026] ---[ end trace 50e00cc0d778d510 ]---
[  768.666030] Fixing recursive fault but reboot is needed!
[  769.141752] radeon 0000:01:00.0: GPU lockup CP stall for more than 11000msec
[  769.141760] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000073db9)
[  769.141768] radeon 0000:01:00.0: failed to get a new IB (-35)
[  769.141772] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
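
A note on reading the oops in Crash #2: the faulting address in CR2 can be cross-checked against the register dump. The sketch below is only an illustration, not a diagnosis. It assumes the three bytes starting at the `<3b>` marker in the Code: line form one instruction of the shape opcode, ModRM, 8-bit displacement (on x86-64 that would be a `cmp` of ECX against memory at RAX plus 0x18). The helper name `decode_modrm_disp8` is made up for this example.

```python
# Values copied from the Crash #2 register dump.
RAX = 0x0000100000000000
CR2 = 0x0000100000000018

# Bytes starting at the faulting RIP (the "<3b> 48 18" part of the Code: line).
FAULT_BYTES = bytes([0x3B, 0x48, 0x18])


def decode_modrm_disp8(insn: bytes):
    """Split a 3-byte 'opcode modrm disp8' instruction into its fields.

    Hypothetical helper: it assumes a small positive disp8 and does not
    handle prefixes, SIB bytes, or any other addressing form.
    """
    opcode, modrm, disp = insn
    mod = modrm >> 6          # 1 means "register + 8-bit displacement"
    reg = (modrm >> 3) & 0x7  # 1 is ECX
    rm = modrm & 0x7          # 0 is RAX
    return opcode, mod, reg, rm, disp


opcode, mod, reg, rm, disp = decode_modrm_disp8(FAULT_BYTES)
fault_addr = RAX + disp

print(f"base RAX = {RAX:#018x}, displacement = {disp:#x}")
print(f"computed address = {fault_addr:#018x}, CR2 = {CR2:#018x}")
```

If the computed address matches CR2, the paging fault comes from dereferencing whatever pointer was in RAX. That value does not look like the 0xffff88... kernel pointers elsewhere in the dump, so it may be a stale or corrupted pointer, but that is a guess.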

Crash #3

[  125.411256] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  125.411265] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000014894)
[  125.411273] radeon 0000:01:00.0: failed to get a new IB (-35)
[  125.411278] [drm:radeon_cs_ib_vm_chunk] *ERROR* Failed to get ib !
[  125.430578] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
[  125.640276] radeon 0000:01:00.0: Saved 96301 dwords of commands on ring 0.
[  125.640416] radeon 0000:01:00.0: GPU softreset: 0x000000CD
[  125.640418] radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5D04028
[  125.640420] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xEE400000
[  125.640423] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0xEE400000
[  125.640425] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200046C0
[  125.640535] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  125.640538] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  125.640540] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
[  125.640542] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00408002
[  125.640544] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x84038643
[  125.640546] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x60C83146
[  125.640548] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  125.640551] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[  125.640553] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[  125.645027] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
[  125.645080] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00108100
[  125.646226] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
[  125.646228] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
[  125.646230] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
[  125.646232] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200008C0
[  125.646343] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  125.646345] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  125.646347] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  125.646349] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  125.646351] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  125.646353] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  125.646355] radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
[  125.646479] radeon 0000:01:00.0: GPU reset succeeded, trying to resume

Although the last log stated the GPU reset was successful, the system never
recovered from the crash.
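
To compare crashes, the lockup and reset messages can be pulled out of a saved dmesg dump. This is a minimal sketch: the regular expressions only match the two message formats seen in the logs above, and the names `summarize` and `SAMPLE` are made up for this example. A "GPU reset succeeded" line only records what the driver reported, which, as noted, did not mean the system recovered.

```python
import re

# Patterns for the two radeon messages that appear in the logs above.
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] radeon (\S+): GPU lockup CP stall for more than (\d+)msec"
)
RESET_OK_RE = re.compile(r"\[\s*(\d+\.\d+)\] radeon (\S+): GPU reset succeeded")


def summarize(dmesg_text: str) -> dict:
    """Collect (timestamp, stall in msec) pairs and reported-reset timestamps."""
    stalls = [(float(t), int(ms)) for t, _dev, ms in LOCKUP_RE.findall(dmesg_text)]
    resets = [float(t) for t, _dev in RESET_OK_RE.findall(dmesg_text)]
    return {"stalls": stalls, "resets_reported_ok": resets}


# Two lines copied from Crash #3.
SAMPLE = """\
[  125.411256] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  125.646479] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
"""

summary = summarize(SAMPLE)
print(summary)
```

In practice the input would be the full output of `dmesg` saved to a file rather than the two sample lines.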

Between those two mesa commits, based on the kernel log output, I see only one
likely culprit: a81beee37e0dd7b75422448420e8e8b0b4b76c1e.
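
The suspicion could be confirmed by bisecting mesa between the two revisions. The sketch below only builds the git command lines. It assumes a local mesa checkout (the `repo_dir` argument is a placeholder), and the function names `bisect_plan` and `run_plan` are made up for this example. Each bisect step would still need a manual rebuild and a test with one of the affected games.

```python
import subprocess

GOOD = "505fad04"   # last revision that worked
BAD = "395b9410"    # first revision seen to hang


def bisect_plan(good: str, bad: str) -> list:
    """Return the git commands to list the candidate commits and start a bisect."""
    return [
        ["git", "log", "--oneline", f"{good}..{bad}"],
        ["git", "bisect", "start", bad, good],
    ]


def run_plan(repo_dir: str, plan: list) -> None:
    """Run each command inside the checkout; stops on the first failure."""
    for argv in plan:
        subprocess.run(argv, cwd=repo_dir, check=True)


plan = bisect_plan(GOOD, BAD)
for argv in plan:
    print(" ".join(argv))
```

After each build and test, `git bisect good` or `git bisect bad` marks the result, and `git bisect reset` returns to the original checkout at the end.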

My PC specs are as follows (taken from steam's system info):

Processor Information:
    Vendor:  AuthenticAMD
    CPU Family:  0x15
    CPU Model:  0x1
    CPU Stepping:  0x2
    CPU Type:  0x0
    Speed: 3600 Mhz
    8 logical processors
    8 physical processors
    HyperThreading:  Unsupported
    FCMOV:  Supported
    SSE2:  Supported
    SSE3:  Supported
    SSSE3:  Supported
    SSE4a:  Supported
    SSE41:  Supported
    SSE42:  Supported

Network Information:
    Network Speed:  

Operating System Version:
    Ubuntu 13.04 (64 bit)
    Kernel Name: Linux
    Kernel Version: 3.11.0-031100-generic
    X Server Vendor: The X.Org Foundation
    X Server Release: 11303000
    X Window Manager: KWin
    Steam Runtime Version: steam-runtime-release_2013-09-05

Video Card:
    Driver:  X.Org Gallium 0.4 on AMD PITCAIRN

    Driver Version: 2.1 Mesa 9.3.0-devel (git-505fad0 raring-oibaf-ppa)
    OpenGL Version: 2.1
    Desktop Color Depth: 24 bits per pixel
    Monitor Refresh Rate: 60 Hz
    VendorID: 0x1002
    DeviceID: 0x6818
    Number of Monitors: 1
    Number of Logical Video Cards: 1
    Primary Display Resolution: 1920 x 1080
    Desktop Resolution: 1920 x 1080
    Primary Display Size: 18.78" x 10.55"  (21.54" diag)
                                            47.7cm x 26.8cm  (54.7cm diag)
    Primary VRAM Not Detected

Sound Card:
    Audio Device: Realtek ALC889

Memory:
    RAM:  15993 Mb

Miscellaneous:
    UI Language:  Spanish
    LANG:  es_ES.UTF-8
    Microphone:  Not set
    Total Hard Disk Space Available: 469324 MB
    Largest Free Hard Disk Block: 187784 MB

Installed Software:

Recent Failure Reports:

The GPU is a Radeon HD 7870. VRAM is 2 GB. llvm's version is 3.3-5ubuntu1~r~gd,
and libdrm is at version 2.4.46+git1309121700.b6da44, both installed from
oibaf's ppa.
