[Bug 109181] Mesa git causes AMDGPU hang, Tonga Firepro chip W7170M MXM

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Dec 30 08:48:27 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=109181

            Bug ID: 109181
           Summary: Mesa git causes AMDGPU hang, Tonga Firepro chip W7170M
                    MXM
           Product: Mesa
           Version: git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/Gallium/radeonsi
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: Babblebones at gmail.com
        QA Contact: dri-devel at lists.freedesktop.org

Hello,

I've run into a bug where AMDGPU hangs when Mesa 19 does something it doesn't
like in particular applications. Mesa 18.3 from the Padoka stable PPA is
totally fine. The Padoka unstable and Oibaf PPAs both crash.
OpenGL applications like Team Fortress 2 seem to be fine, but Vulkan (DXVK)
wrapped applications will just bury the whole GPU.
Even Steam itself seems to hard lock when I start it, which I can sidestep by
allowing GPU recovery with the kernel parameter and then proceeding. The GPU
does not recover when I start something Vulkan-based and graphically intensive
from Steam itself.



Below is my dmesg from the card. This may be an issue with Mesa or it may be
AMDGPU. I am very curious as to which, since this has affected me for about a
month now across both Arch Linux and my new Ubuntu install, making Mesa git
unusable on my new card.

To make matters worse, the EDID is messed up on boot with amdgpu.dc=1, which is
worth mentioning in case it is part of a deeper issue in AMDGPU.
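
For anyone trying to reproduce this, it may help to confirm which amdgpu
module parameters are actually in effect. The following is a minimal sketch,
not something taken from this report: it assumes the GPU recovery parameter
mentioned above is amdgpu.gpu_recovery (the report does not name it) and that
the kernel exposes the parameters under /sys/module/amdgpu/parameters. The
helper name read_module_params is hypothetical.

```python
from pathlib import Path


def read_module_params(names, base="/sys/module/amdgpu/parameters"):
    """Return {name: value} for each requested module parameter.

    A value of None means the file could not be read, e.g. the module is
    not loaded or the parameter is not exposed on this kernel.
    """
    result = {}
    for name in names:
        path = Path(base) / name
        try:
            result[name] = path.read_text().strip()
        except OSError:
            result[name] = None
    return result


if __name__ == "__main__":
    # "gpu_recovery" is an assumed parameter name; "dc" comes from the
    # amdgpu.dc=1 setting mentioned in the report.
    for name, value in read_module_params(["gpu_recovery", "dc"]).items():
        print(f"amdgpu.{name} = {value}")
```

Attaching this output next to the dmesg would show whether the hang was
captured with recovery enabled and with the DC display code active.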




[   55.671991] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=1614, emitted seq=1616
[   55.671996] amdgpu 0000:01:00.0: GPU reset begin!
[   55.678313] amdgpu 0000:01:00.0: GPU pci config reset
[   55.682867] amdgpu 0000:01:00.0: GPU reset succeeded, trying to resume
[   55.683678] [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9000).
[   55.687097] amdgpu: [powerplay] dpm has been enabled
[   55.741072] [drm] UVD initialized successfully.
[   55.950146] [drm] VCE initialized successfully.
[   55.952966] [drm] recover vram bo from shadow start
[   55.955698] [drm] recover vram bo from shadow done
[   55.955746] WARNING: CPU: 5 PID: 120 at
/build/linux-liquorix-eJ9K8E/linux-liquorix-4.19/include/linux/dma-fence.h:503
drm_sched_job_recovery+0x1db/0x1e0 [gpu_sched]
[   55.955747] Modules linked in: rfcomm fuse ccm ext4 jbd2 fscrypto af_packet
cmac bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_common btusb btrtl btbcm btintel videodev bluetooth media
ecdh_generic crc16 nls_utf8 nls_cp437 vfat fat ext2 mbcache squashfs loop
snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel
snd_hda_codec snd_hda_core arc4 snd_hwdep intel_rapl snd_pcm
x86_pkg_temp_thermal intel_powerclamp snd_seq_dummy coretemp snd_seq_oss
snd_seq_midi kvm_intel snd_seq_midi_event ath9k ath9k_common snd_rawmidi
ath9k_hw kvm snd_seq ath irqbypass crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel snd_seq_device mac80211 snd_timer pcbc aesni_intel
aes_x86_64 crypto_simd cryptd glue_helper joydev input_leds snd cfg80211
tpm_infineon hp_wmi sparse_keymap
[   55.955775]  serio_raw wmi_bmof sg lpc_ich rfkill soundcore tpm_tis
tpm_tis_core tpm rng_core hp_accel lis3lv02d input_polldev evdev pcc_cpufreq
acpi_cpufreq battery ac hp_wireless sch_fq_codel parport_pc ppdev lp parport
ip_tables x_tables ipv6 crc_ccitt autofs4 btrfs xor raid6_pq libcrc32c
crc32c_generic bcache crc64 sr_mod cdrom sd_mod hid_generic usbhid amdkfd
amd_iommu_v2 amdgpu chash gpu_sched ahci i2c_algo_bit libahci ttm sdhci_pci
libata cqhci drm_kms_helper sdhci ehci_pci crc32c_intel firewire_ohci xhci_pci
drm psmouse i2c_i801 firewire_core scsi_mod mmc_core crc_itu_t e1000e ehci_hcd
i2c_core xhci_hcd thermal wmi rtc_cmos video button
[   55.955806] CPU: 5 PID: 120 Comm: kworker/5:1 Not tainted
4.19.0-13.1-liquorix-amd64 #1 liquorix 4.19-8ubuntu1~bionic
[   55.955807] Hardware name: Hewlett-Packard /176C, BIOS 68IAV Ver. F.70
07/30/2018
[   55.955809] Workqueue: events drm_sched_job_timedout [gpu_sched]
[   55.955811] RIP: 0010:drm_sched_job_recovery+0x1db/0x1e0 [gpu_sched]
[   55.955812] Code: ff ff ff 48 8b 3c 24 48 83 c4 20 5b 5d 41 5c 41 5d 41 5e
41 5f e9 d5 ab 71 e1 4c 89 f6 4c 89 ff e8 5a fd ff ff e9 33 ff ff ff <0f> 0b eb
93 90 55 53 48 89 fb 48 8b 46 10 48 89 f7 48 8b 68 08 48
[   55.955813] RSP: 0018:ffffc900038e7de0 EFLAGS: 00210202
[   55.955814] RAX: 0000000000000523 RBX: ffff888811704df0 RCX:
0000000000000001
[   55.955814] RDX: ffff88878c5d8050 RSI: ffff888107c18c00 RDI:
0000000000200286
[   55.955815] RBP: ffff888811704d10 R08: 0000000000000000 R09:
0000000000000001
[   55.955816] R10: ffffc90003213dd0 R11: 0000000000000026 R12:
ffff88881491cb00
[   55.955816] R13: ffff888811704e28 R14: ffff88878c5d8000 R15:
ffff888811704c98
[   55.955817] FS:  0000000000000000(0000) GS:ffff88881db40000(0000)
knlGS:0000000000000000
[   55.955818] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.955819] CR2: 00000000c12ec008 CR3: 000000000320a003 CR4:
00000000001606e0
[   55.955819] Call Trace:
[   55.955855]  amdgpu_device_gpu_recover+0x3bd/0xa30 [amdgpu]
[   55.955860]  process_one_work+0x1f5/0x420
[   55.955862]  worker_thread+0x43/0x490
[   55.955864]  ? rescuer_thread+0x490/0x490
[   55.955865]  kthread+0x153/0x170
[   55.955866]  ? kthread_park+0x80/0x80
[   55.955869]  ret_from_fork+0x35/0x40
[   55.955870] ---[ end trace 2f9f5d70a335c56f ]---
[   55.955875] [drm] Skip scheduling IBs!
[   56.541150] amdgpu 0000:01:00.0: GPU reset(1) succeeded!
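
The ring timeout line at the top of the log above carries the two fence
sequence numbers that matter for triage. As a rough sketch (the function name
find_ring_timeouts and the "pending" field are illustrative, not part of any
driver tooling), the values can be pulled out of a saved dmesg like this,
including when the mail client has wrapped the line:

```python
import re

# Matches the amdgpu job timeout message; \s+ also spans a wrapped line break.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+"
    r"signaled seq=(?P<signaled>\d+),\s+"
    r"emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return one dict per ring timeout message found in the log text."""
    timeouts = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        timeouts.append({
            "ring": match["ring"],
            "signaled": signaled,
            "emitted": emitted,
            # Fences emitted to the ring that never signaled completion.
            "pending": emitted - signaled,
        })
    return timeouts


sample = (
    "[   55.671991] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,\n"
    "signaled seq=1614, emitted seq=1616\n"
)

if __name__ == "__main__":
    for timeout in find_ring_timeouts(sample):
        print(timeout)
```

For the log in this report, that gives the gfx ring with 1614 signaled and
1616 emitted, so two submissions were still outstanding when the timeout
fired.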
