[Bug 99264] Deterministic crash on RX460 "NULL pointer dereference"

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Jan 3 20:44:15 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=99264

            Bug ID: 99264
           Summary: Deterministic crash on RX460 "NULL pointer
                    dereference"
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: daniel.mantione at freepascal.org

Created attachment 128734
  --> https://bugs.freedesktop.org/attachment.cgi?id=128734&action=edit
Complete syslog data from boot to crash

Hello,

I have been in the process of migrating from a Radeon HD6670 to an RX 460 for
quite a few months now. I regularly fit the RX460, but keep running into issues,
crashes and others, that force me to install the HD6670 again whenever I need my
computer for serious work or even a more stable gaming situation. However, I am
making progress identifying the issues, and it looks like there are 3 different
causes for the crashes. One of them I can now reproduce very easily, and it
smells like a real driver bug, so I would like to report it.

My hardware is as follows:
 Xeon E5-2650v2 CPU (once it was an Opteron, but you stopped making new ones :( )
 Supermicro X9SRE-3F mainboard
 32GB RAM
 HIS Radeon RX 460 2GB
 3 * HP LP2065 1600x1200 monitor
  - One connected via active DP to DVI converter
  - One connected via DVI
  - One connected via HDMI to DVI cable

My software configuration is as follows:
 openSUSE 13.1 with the following modifications:
  - amd-staging-4.7 kernel as of 21 December 2016 (compiled it myself)
      (DAL is needed to use all 3 of my monitors)
  - Xorg upgraded to 7.6 (via
http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_13.1/ )
  - Mesa upgraded to 13.0.1 (via
http://download.opensuse.org/repositories/X11:/XOrg/openSUSE_13.1/ )

How to reproduce?

Using the game The Great Whale Road, I have captured the OpenGL command stream
with Apitrace:

http://apitrace.github.io/

I have uploaded it here; be warned that this is a 770MB download:

http://www.freepascal.org/~daniel/greatwhaleroad.trace.bz2

Then:

bunzip2 greatwhaleroad.trace.bz2
apitrace replay greatwhaleroad.trace
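
For convenience, the steps above could be wrapped in a small shell function.
This is only a sketch: the names replay_trace and trace_name are made up for
illustration, and it assumes wget, bunzip2 and apitrace are installed.

```shell
# Sketch only: fetch, unpack and replay the trace in one step.
# replay_trace and trace_name are illustrative helper names.
TRACE_URL="http://www.freepascal.org/~daniel/greatwhaleroad.trace.bz2"

# Strip the .bz2 suffix to get the file name bunzip2 will produce.
trace_name() {
    printf '%s\n' "${1%.bz2}"
}

replay_trace() {
    archive=${1:-greatwhaleroad.trace.bz2}
    trace=$(trace_name "$archive")
    # Download only if the archive is not already present.
    [ -f "$archive" ] || wget -O "$archive" "$TRACE_URL" || return 1
    # -k keeps the archive, so a retry does not need a new download.
    [ -f "$trace" ] || bunzip2 -k "$archive" || return 1
    # Flush pending writes first; the displays are lost once the replay ends.
    sync
    apitrace replay "$trace"
}
```

Calling replay_trace without arguments uses the file name from the download
link above.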

At the end of the replay, all monitors lose signal and go black. Because my
mainboard has a small Aspeed onboard VGA controller, I can switch my monitor
input to that VGA controller, log in on the Linux VGA text console and recover
some information. The stack trace below is visible in dmesg. You can also see
that the X server and game processes are still running, but they hang inside
the kernel, so they cannot be killed.
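
The information can be collected from the text console roughly as follows. This
is a sketch, not the exact commands used for this report: the helper names
extract_oops and filter_dstate are illustrative, and it assumes that processes
stuck inside the kernel show up in uninterruptible sleep (state D), which is
the usual case.

```shell
# Sketch: small filters for use on the text console after the crash.
# extract_oops and filter_dstate are illustrative names.

# Print only the oops, from the "BUG:" line to the "end trace" marker.
extract_oops() {
    sed -n '/BUG: unable to handle kernel/,/end trace/p'
}

# Keep the header line plus every process whose state starts with D
# (uninterruptible sleep), reading "ps" output with STAT in column 2.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the affected machine (dmesg may need root):
#   dmesg | extract_oops > oops.txt
#   ps -eo pid,stat,wchan:32,comm | filter_dstate
```

The wchan column shows the kernel function each process is sleeping in, which
helps to tell whether the hung processes are waiting inside the driver.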

Best regards,

Daniël Mantione

[ 1631.286172] BUG: unable to handle kernel NULL pointer dereference at
0000000000000030
[ 1631.333419] IP: [<ffffffffa08e3e7a>] amdgpu_gtt_mgr_alloc+0x2a/0x150
[amdgpu]
[ 1631.367823] PGD b9adf067 PUD b8ef5067 PMD 0
[ 1631.402734] Oops: 0000 [#1] SMP
[ 1631.436707] Modules linked in: ppdev parport zram lz4_compress
lz4_decompress fuse af_packet k8temp hwmon_vid sr_mod cdrom amdkfd amd_iommu_v2
amdgpu x86_pkg_temp_thermal
intel_powerclamp coretemp snd_seq_dummy snd_seq_oss snd_emu10k1_synth
snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_seq_midi
snd_seq_midi_event kvm_intel
snd_hda_codec_hdmi snd_emu10k1 snd_hda_intel kvm snd_hda_codec snd_rawmidi
snd_hda_core snd_ac97_codec ipmi_ssif ac97_bus snd_pcm_oss snd_pcm irqbypass
crct10dif_pclmul isci
crc32_pclmul crc32c_intel ttm ghash_clmulni_intel snd_util_mem snd_hwdep
snd_seq drbg iTCO_wdt iTCO_vendor_support igb ansi_cprng drm_kms_helper libsas
aesni_intel
snd_seq_device snd_timer ablk_helper cryptd lrw snd_mixer_oss gf128mul mei_me
ptp glue_helper drm snd emu10k1_gp usb_storage mei scsi_transport_sas
[ 1631.592832]  aes_x86_64 pps_core joydev md_mod gameport ioatdma backlight
soundcore fb_sys_fops pcspkr serio_raw shpchp sysimgblt i2c_i801 sysfillrect
syscopyarea lpc_ich
dca i2c_algo_bit mfd_core wmi ipmi_si ipmi_msghandler button binfmt_misc sg
dm_mod autofs4 ext4 mbcache jbd2 crc16 hid_generic usbhid ehci_pci ehci_hcd
usbcore sd_mod
usb_common xenbus_probe_frontend reiserfs fan thermal ahci libahci libata
scsi_mod [last unloaded: parport_pc]
[ 1631.721604] CPU: 0 PID: 3948 Comm: glretrace Not tainted 4.7.0-2-default+ #1
[ 1631.764334] Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2a
08/31/2015
[ 1631.785241] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
last signaled seq=10998, last emitted seq=10998
[ 1631.785245] [drm] IP block:tonga_ih is hung!
[ 1631.785518] [drm] Atomic commit: RESET. crtc id 0:[ffff880816a06000]
[ 1631.785540] [drm] Atomic commit: RESET. crtc id 1:[ffff880817700000]
[ 1631.785559] [drm] Atomic commit: RESET. crtc id 2:[ffff88081c3e4000]
[ 1631.785578] [drm] dc_commit_targets: 0 targets
[ 1632.074160] task: ffff88080564ccc0 ti: ffff8808057c0000 task.ti:
ffff8808057c0000
[ 1632.119037] RIP: 0010:[<ffffffffa08e3e7a>]  [<ffffffffa08e3e7a>]
amdgpu_gtt_mgr_alloc+0x2a/0x150 [amdgpu]
[ 1632.164304] RSP: 0018:ffff8808057c3a10  EFLAGS: 00010282
[ 1632.210314] RAX: ffff880808fe1970 RBX: ffff880818f1f890 RCX:
7fffffffffffffff
[ 1632.256563] RDX: 0000000000000000 RSI: ffff880818f1f858 RDI:
ffff880808fe1970
[ 1632.302370] RBP: ffff8808057c3a70 R08: 0000000000000001 R09:
ffff8806ccb7b928
[ 1632.348805] R10: ffff880811ebe540 R11: 0000000000000287 R12:
0000000000000000
[ 1632.394486] R13: ffff880818f1f890 R14: ffff880818f1f800 R15:
ffff8807bfaf4d80
[ 1632.439637] FS:  00007fed53c56700(0000) GS:ffff88081f200000(0000)
knlGS:0000000000000000
[ 1632.485411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1632.531172] CR2: 0000000000000030 CR3: 00000000355cf000 CR4:
00000000001406f0
[ 1632.577066] Stack:
[ 1632.622940]  ffff8806ccb7b000 0000000000000680 0000000000000246
ffff8808057c3a50
[ 1632.669400]  ffffffff810a1799 ffff880808fe8608 ffff880808fe8608
ffff8807de2293c0
[ 1632.719718]  ffff8807de229470 ffff880818f1f890 ffff880818f1f800
ffff8807bfaf4d80
[ 1632.769794] Call Trace:
[ 1632.824132]  [<ffffffff810a1799>] ? __might_sleep+0x49/0x80
[ 1632.879008]  [<ffffffffa08c7afb>] amdgpu_ttm_bind+0x5b/0x150 [amdgpu]
[ 1632.932452]  [<ffffffffa08df45d>] amdgpu_vm_update_page_directory+0x7d/0x480
[amdgpu]
[ 1632.978788]  [<ffffffff811b019b>] ? krealloc+0x2b/0xa0
[ 1633.025524]  [<ffffffffa040ff54>] ? ttm_eu_reserve_buffers+0x184/0x330 [ttm]
[ 1633.072382]  [<ffffffffa08ce70b>] amdgpu_gem_va_update_vm+0x13b/0x180
[amdgpu]
[ 1633.119681]  [<ffffffffa0409c99>] ? ttm_bo_add_to_lru+0x89/0xe0 [ttm]
[ 1633.167081]  [<ffffffffa08cf7af>] amdgpu_gem_va_ioctl+0x1df/0x2a0 [amdgpu]
[ 1633.215410]  [<ffffffff810a1799>] ? __might_sleep+0x49/0x80
[ 1633.262823]  [<ffffffffa050062d>] drm_ioctl+0x25d/0x510 [drm]
[ 1633.310491]  [<ffffffff8122ea93>] ? touch_atime+0x23/0xa0
[ 1633.358466]  [<ffffffffa08cf5d0>] ? amdgpu_gem_metadata_ioctl+0x1f0/0x1f0
[amdgpu]
[ 1633.406696]  [<ffffffffa08b504b>] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
[ 1633.454146]  [<ffffffff81224896>] do_vfs_ioctl+0x96/0x690
[ 1633.501757]  [<ffffffff81003246>] ? do_audit_syscall_entry+0x66/0x70
[ 1633.549670]  [<ffffffff81003729>] ? syscall_trace_enter_phase1+0xf9/0x110
[ 1633.597796]  [<ffffffff81224f09>] SyS_ioctl+0x79/0x90
[ 1633.645903]  [<ffffffff81003a79>] do_syscall_64+0x69/0x110
[ 1633.694445]  [<ffffffff81600925>] entry_SYSCALL64_slow_path+0x25/0x25
[ 1633.742299] Code: 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53
48 89 cb 48 83 ec 38 4c 8b 21 48 b9 ff ff ff ff ff ff ff 7f 4c 8b 57 30 <49> 39
4c 24 30 74 11
31 c0 48 83 c4 38 5b 41 5c 41 5d 41 5e 41
[ 1633.842789] RIP  [<ffffffffa08e3e7a>] amdgpu_gtt_mgr_alloc+0x2a/0x150
[amdgpu]
[ 1633.894212]  RSP <ffff8808057c3a10>
[ 1633.944739] CR2: 0000000000000030
[ 1633.994974] ---[ end trace d06de6dc7a13ea3e ]---
[ 1634.047545] [drm] dc_link_handle_hpd_rx_irq: Got short pulse HPD on link 0
[ 1634.047549] amdgpu 0000:04:00.0: SRBM_SOFT_RESET=0x00000400
[ 1635.184392] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.184446] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.184501] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.184543] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.184585] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.184623] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.184673] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :4
[ 1635.184717] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :5
[ 1635.184758] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :6
[ 1635.184799] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :7
[ 1635.184838] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :8
[ 1635.184879] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :9
[ 1635.184917] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :10
[ 1635.184955] [drm:amdgpu_dm_set_crtc_irq_state [amdgpu]] *ERROR*
amdgpu_dm_set_crtc_irq_state: crtc is NULL at id :11
[ 1635.184998] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.185038] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.185078] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.185116] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_set: called for non-implemented irq source
[ 1635.185158] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_ack: called for non-implemented irq source
[ 1635.185195] [drm:log_to_debug_console [amdgpu]] *ERROR*
dal_irq_service_dummy_set: called for non-implemented irq source
