Kernel crash/Null pointer dereference on vblank

Martin Babutzka martin.babutzka at online.de
Sun Nov 19 21:54:15 UTC 2017


Dear AMD Developers,

First of all, congratulations on the DC code submission to the 4.15 kernel.
Unfortunately, the major regression that I reported on 29.09., 06.10.,
02.11. and 05.11. still exists. This time I have additional
debugging information; maybe it helps to fix it.

Summary: I am running Xubuntu 17.10 with the amd-staging-drm-next
kernel patched to 4.14.0. The latest build I tested includes all
commits up to now (including the 2017-11-17 19:51:57 (GMT) commit
85d09ce5e5039644487e9508d6359f9f4cf64427).

Some vblank operations make the kernel crash and hang the whole
system. The error is reproducible by enabling the screen lock or
suspend mode. The system cannot return to a proper state from either of
these (although I am not 100% sure it is the same error in both cases).
Debugging is easier with the screen lock. Attached you can find the kernel
crash log and the dce110_vblank_set function with some added kernel
prints. It looks like the function is called twice and does not work the
second time. The whole code around dce110_vblank_set also looks
interrupt-driven; could this be a race condition or timing problem? Are
objects being cleared from memory and then accessed by dce110_vblank_set?

Bug reports on this issue:
https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/37
https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/29

Many regards,
Martin (M-bab)
-------------- next part --------------
bool dce110_vblank_set(
                struct irq_service *irq_service,
                const struct irq_source_info *info,
                bool enable)
{
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);
        struct dc_context *dc_ctx = irq_service->ctx;
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);
        struct dc *core_dc = irq_service->ctx->dc;
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);
        enum dc_irq_source dal_irq_src = dc_interrupt_to_irq_source(
                        irq_service->ctx->dc,
                        info->src_id,
                        info->ext_id);
        uint8_t pipe_offset = dal_irq_src - IRQ_TYPE_VBLANK;
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);

        struct timing_generator *tg =
                        core_dc->current_state->res_ctx.pipe_ctx[pipe_offset].stream_res.tg;
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);

        if (enable) {
                if (!tg->funcs->arm_vert_intr(tg, 2)) {
                        DC_ERROR("Failed to get VBLANK!\n");
                        return false;
                }
        }
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);

        dal_irq_service_set_generic(irq_service, info, enable);
printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);
        return true;

}
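
The prints above only record that a line was reached. A possible
extension, sketched below as a userspace mock so that it compiles outside
the kernel, also records pipe_offset and whether the timing generator
pointer is NULL. The helper name format_vblank_debug and its parameters
are hypothetical and not part of amdgpu; in the driver the same format
string would be passed to printk instead of snprintf.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of amdgpu): builds a debug line that
 * names the call site, the pipe index, and whether tg is NULL.
 * Returns the snprintf result (the length the full message needs).
 */
int format_vblank_debug(char *buf, size_t len,
                        const char *func, int line,
                        unsigned int pipe_offset, const void *tg)
{
        return snprintf(buf, len, "DEBUG: %s %d pipe_offset=%u tg=%s",
                        func, line, pipe_offset, tg ? "set" : "NULL");
}
```

Emitted right after the tg assignment (the print that shows up as line 222
in the logs), such a message would show directly whether tg is NULL on the
second call and which pipe index was computed.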


"normal" vblank during boot:
Nov 19 22:33:10 Main-PC kernel: [   17.605100] DEBUG: Passed dce110_vblank_set 208 
Nov 19 22:33:10 Main-PC kernel: [   17.605102] DEBUG: Passed dce110_vblank_set 210 
Nov 19 22:33:10 Main-PC kernel: [   17.605103] DEBUG: Passed dce110_vblank_set 212 
Nov 19 22:33:10 Main-PC kernel: [   17.605104] DEBUG: Passed dce110_vblank_set 218 
Nov 19 22:33:10 Main-PC kernel: [   17.605104] DEBUG: Passed dce110_vblank_set 222 
Nov 19 22:33:10 Main-PC kernel: [   17.605108] DEBUG: Passed dce110_vblank_set 230 
Nov 19 22:33:10 Main-PC kernel: [   17.605110] DEBUG: Passed dce110_vblank_set 233 

vblank on screen lock in kernel.log/syslog:
Nov 19 22:34:10 Main-PC kernel: [   78.664890] DEBUG: Passed dce110_vblank_set 208 
Nov 19 22:34:10 Main-PC kernel: [   78.664892] DEBUG: Passed dce110_vblank_set 210 
Nov 19 22:34:10 Main-PC kernel: [   78.664893] DEBUG: Passed dce110_vblank_set 212 
Nov 19 22:34:10 Main-PC kernel: [   78.664894] DEBUG: Passed dce110_vblank_set 218 
Nov 19 22:34:10 Main-PC kernel: [   78.664894] DEBUG: Passed dce110_vblank_set 222 
Nov 19 22:34:10 Main-PC kernel: [   78.664895] DEBUG: Passed dce110_vblank_set 230 
Nov 19 22:34:10 Main-PC kernel: [   78.664896] DEBUG: Passed dce110_vblank_set 233 
Nov 19 22:34:27 Main-PC kernel: [   96.113426] DEBUG: Passed dce110_vblank_set 208 
Nov 19 22:34:27 Main-PC kernel: [   96.113433] DEBUG: Passed dce110_vblank_set 210 
Nov 19 22:34:27 Main-PC kernel: [   96.113435] DEBUG: Passed dce110_vblank_set 212 
Nov 19 22:34:27 Main-PC kernel: [   96.113438] DEBUG: Passed dce110_vblank_set 218 
Nov 19 22:34:27 Main-PC kernel: [   96.113440] DEBUG: Passed dce110_vblank_set 222 
Nov 19 22:34:27 Main-PC kernel: [   96.113448] BUG: unable to handle kernel NULL pointer dereference at           (null)
Nov 19 22:34:27 Main-PC kernel: [   96.113521] IP: dce110_vblank_set+0xe2/0x160 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.113524] PGD 0 P4D 0 
Nov 19 22:34:27 Main-PC kernel: [   96.113531] Oops: 0000 [#1] SMP
Nov 19 22:34:27 Main-PC kernel: [   96.113535] Modules linked in: rfcomm bnep binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_pcm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_seq_midi pcbc dm_crypt snd_seq_midi_event aesni_intel snd_rawmidi aes_x86_64 crypto_simd glue_helper snd_seq cryptd snd_seq_device snd_timer intel_cstate intel_rapl_perf snd btusb serio_raw joydev input_leds soundcore btrtl hci_uart mei_me shpchp btbcm mei serdev btqca btintel bluetooth ecdh_generic intel_lpss_acpi intel_lpss acpi_als mac_hid kfifo_buf acpi_pad tpm_infineon industrialio parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic uas usb_storage usbhid amdkfd amd_iommu_v2
Nov 19 22:34:27 Main-PC kernel: [   96.113614]  amdgpu chash i2c_algo_bit ttm drm_kms_helper e1000e syscopyarea sysfillrect sysimgblt fb_sys_fops ptp r8169 pps_core drm ahci mii libahci wmi pinctrl_sunrisepoint video i2c_hid pinctrl_intel hid
Nov 19 22:34:27 Main-PC kernel: [   96.113643] CPU: 2 PID: 1462 Comm: xfwm4 Not tainted 4.14.0+ #3
Nov 19 22:34:27 Main-PC kernel: [   96.113645] Hardware name: Gigabyte Technology Co., Ltd. B250-HD3P/B250-HD3P-CF, BIOS F3 12/07/2016
Nov 19 22:34:27 Main-PC kernel: [   96.113649] task: ffff998d53040000 task.stack: ffffa59103150000
Nov 19 22:34:27 Main-PC kernel: [   96.113710] RIP: 0010:dce110_vblank_set+0xe2/0x160 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.113713] RSP: 0018:ffffa59103153b28 EFLAGS: 00010002
Nov 19 22:34:27 Main-PC kernel: [   96.113717] RAX: 0000000000000024 RBX: ffff998d5c3d4300 RCX: 0000000000000006
Nov 19 22:34:27 Main-PC kernel: [   96.113720] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff998d6ec8dc90
Nov 19 22:34:27 Main-PC kernel: [   96.113723] RBP: ffffa59103153b58 R08: 0000000000000000 R09: 00000000000003ff
Nov 19 22:34:27 Main-PC kernel: [   96.113726] R10: 00007ffebd2bebc0 R11: ffffffffa354feed R12: ffffffffc052b3e0
Nov 19 22:34:27 Main-PC kernel: [   96.113728] R13: 0000000000000001 R14: ffff998d51695100 R15: 0000000000000000
Nov 19 22:34:27 Main-PC kernel: [   96.113732] FS:  00007f4e2f002a80(0000) GS:ffff998d6ec80000(0000) knlGS:0000000000000000
Nov 19 22:34:27 Main-PC kernel: [   96.113735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 19 22:34:27 Main-PC kernel: [   96.113738] CR2: 0000000000000000 CR3: 00000004181e5001 CR4: 00000000003606e0
Nov 19 22:34:27 Main-PC kernel: [   96.113741] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 19 22:34:27 Main-PC kernel: [   96.113744] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 19 22:34:27 Main-PC kernel: [   96.113746] Call Trace:
Nov 19 22:34:27 Main-PC kernel: [   96.113807]  dal_irq_service_set+0x49/0x90 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.113863]  dc_interrupt_set+0x24/0x30 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.113933]  amdgpu_dm_set_crtc_irq_state+0x35/0x60 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.113989]  amdgpu_irq_update+0x58/0xa0 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.114041]  amdgpu_irq_get+0x49/0x60 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.114076]  amdgpu_enable_vblank_kms+0x27/0x30 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.114091]  drm_vblank_enable+0x84/0x100 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114104]  drm_vblank_get+0x92/0xb0 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114116]  drm_wait_vblank_ioctl+0xb4/0x580 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114123]  ? unix_stream_recvmsg+0x51/0x70
Nov 19 22:34:27 Main-PC kernel: [   96.114127]  ? __unix_insert_socket+0x40/0x40
Nov 19 22:34:27 Main-PC kernel: [   96.114140]  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114152]  drm_ioctl_kernel+0x5d/0xb0 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114163]  drm_ioctl+0x31b/0x3d0 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114174]  ? drm_legacy_modeset_ctl_ioctl+0x100/0x100 [drm]
Nov 19 22:34:27 Main-PC kernel: [   96.114180]  ? do_iter_write+0xe1/0x190
Nov 19 22:34:27 Main-PC kernel: [   96.114215]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
Nov 19 22:34:27 Main-PC kernel: [   96.114222]  do_vfs_ioctl+0xa5/0x610
Nov 19 22:34:27 Main-PC kernel: [   96.114227]  ? __sys_recvmsg+0x51/0x90
Nov 19 22:34:27 Main-PC kernel: [   96.114231]  ? __sys_recvmsg+0x51/0x90
Nov 19 22:34:27 Main-PC kernel: [   96.114237]  SyS_ioctl+0x79/0x90
Nov 19 22:34:27 Main-PC kernel: [   96.114243]  entry_SYSCALL_64_fastpath+0x1e/0xa9
Nov 19 22:34:27 Main-PC kernel: [   96.114247] RIP: 0033:0x7f4e2b64dea7
Nov 19 22:34:27 Main-PC kernel: [   96.114250] RSP: 002b:00007ffebd2bec08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 19 22:34:27 Main-PC kernel: [   96.114254] RAX: ffffffffffffffda RBX: 0000562e1f5938c0 RCX: 00007f4e2b64dea7
Nov 19 22:34:27 Main-PC kernel: [   96.114257] RDX: 00007ffebd2bec80 RSI: 00000000c018643a RDI: 0000000000000006
Nov 19 22:34:27 Main-PC kernel: [   96.114259] RBP: 0000562e1f620ce0 R08: 00000000006001e5 R09: 0000000000000000
Nov 19 22:34:27 Main-PC kernel: [   96.114262] R10: 00007ffebd2bebc0 R11: 0000000000000246 R12: 0000000000000000
Nov 19 22:34:27 Main-PC kernel: [   96.114264] R13: 0000000000000007 R14: 0000000000000007 R15: 0000562e1f5938c0
Nov 19 22:34:27 Main-PC kernel: [   96.114268] Code: 48 89 d0 48 c1 e0 05 48 01 d0 ba de 00 00 00 48 c1 e0 05 49 03 87 30 01 00 00 4c 8b b8 78 02 00 00 e8 c4 c2 04 e2 45 84 ed 74 38 <49> 8b 07 be 02 00 00 00 4c 89 ff ff 90 e0 00 00 00 84 c0 75 23 
Nov 19 22:34:27 Main-PC kernel: [   96.114392] RIP: dce110_vblank_set+0xe2/0x160 [amdgpu] RSP: ffffa59103153b28
Nov 19 22:34:27 Main-PC kernel: [   96.114394] CR2: 0000000000000000
Nov 19 22:34:27 Main-PC kernel: [   96.114399] ---[ end trace 4160248d2f91cb42 ]---
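
Reading the oops above: the last print that appears is the one at line
222, directly after tg is assigned, and the byte marked <49> in the Code
line starts the instruction 49 8b 07, which loads from the address held in
R15. R15 and CR2 are both zero and R13 (apparently the enable flag) is 1,
so the fault is consistent with tg being NULL when tg->funcs is read.

The sketch below is a self-contained userspace mock, not driver code: the
struct definitions are reduced stand-ins and vblank_set_guarded is a
hypothetical name. It shows a defensive check that would turn the crash
into a failed call. This is only an assumption about a possible
workaround; it does not explain why the pipe has no timing generator at
that point, and it is not presented as the upstream fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for the DC structures; only the fields used here. */
struct timing_generator;

struct timing_generator_funcs {
        bool (*arm_vert_intr)(struct timing_generator *tg, unsigned char width);
};

struct timing_generator {
        const struct timing_generator_funcs *funcs;
};

/*
 * Hypothetical guarded variant of the enable path: returns false
 * instead of dereferencing a NULL tg (or a missing hook).
 */
bool vblank_set_guarded(struct timing_generator *tg, bool enable)
{
        if (enable) {
                if (!tg || !tg->funcs || !tg->funcs->arm_vert_intr)
                        return false;   /* pipe has no usable timing generator */
                if (!tg->funcs->arm_vert_intr(tg, 2))
                        return false;   /* same failure path as the original */
        }
        /* The original calls dal_irq_service_set_generic() here. */
        return true;
}

/* Mock hook that always succeeds, for exercising the guard. */
bool mock_arm_vert_intr(struct timing_generator *tg, unsigned char width)
{
        (void)tg;
        (void)width;
        return true;
}
```

With a check like this, the second call in the screen-lock log would
return false rather than oops, which could help narrow down whether the
missing timing generator is the only problem.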


