<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:kugel@rockbox.org" title="Thomas Martitz <kugel@rockbox.org>"> <span class="fn">Thomas Martitz</span></a>
</span> changed
<a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [4.17-rc1] RIP: smu7_populate_single_firmware_entry.isra.6+0x57/0xc0 [amdgpu] RSP: ffffa17901efb930"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105760">bug 105760</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">Attachment #140592 is obsolete</td>
<td>
</td>
<td>1
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [4.17-rc1] RIP: smu7_populate_single_firmware_entry.isra.6+0x57/0xc0 [amdgpu] RSP: ffffa17901efb930"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105760#c42">Comment # 42</a>
on <a class="bz_bug_link
bz_status_REOPENED "
title="REOPENED - [4.17-rc1] RIP: smu7_populate_single_firmware_entry.isra.6+0x57/0xc0 [amdgpu] RSP: ffffa17901efb930"
href="https://bugs.freedesktop.org/show_bug.cgi?id=105760">bug 105760</a>
from <span class="vcard"><a class="email" href="mailto:kugel@rockbox.org" title="Thomas Martitz <kugel@rockbox.org>"> <span class="fn">Thomas Martitz</span></a>
</span></b>
<pre>Created <span class=""><a href="attachment.cgi?id=140611" name="attach_140611" title="dmesg with 0001-workaround-v2.patch + 0001-drm-amdgpu-add-ATPX-quirk-for-a-polaris-12-laptop.patch">attachment 140611</a> <a href="attachment.cgi?id=140611&action=edit" title="dmesg with 0001-workaround-v2.patch + 0001-drm-amdgpu-add-ATPX-quirk-for-a-polaris-12-laptop.patch">[details]</a></span>
dmesg with 0001-workaround-v2.patch +
0001-drm-amdgpu-add-ATPX-quirk-for-a-polaris-12-laptop.patch
Sorry to say, but this patch actually makes things *worse*.
First, by accident, I applied your latest patch on top of my previous workaround
v2. This gives working suspend/resume but many more error messages in dmesg. In
particular, a WARN() triggers:
[ 385.996911] Modules linked in: cmac rfcomm ccm arc4 snd_hda_codec_hdmi
snd_hda_codec_conexant snd_hda_codec_generic joydev intel_rapl mousedev
x86_pkg_temp_thermal intel_powerclamp bnep coretemp iwlmvm snd_soc_skl
snd_soc_skl_ipc hid_multitouch snd_soc_sst_ipc mac80211 snd_soc_sst_dsp
hid_generic kvm snd_hda_ext_core mei_wdt snd_soc_core nls_iso8859_1 i915
nls_cp437 iwlwifi btusb irqbypass btrtl vfat btbcm crct10dif_pclmul
snd_compress btintel iTCO_wdt crc32_pclmul iTCO_vendor_support snd_soc_acpi
ghash_clmulni_intel fat bluetooth pcbc intel_wmi_thunderbolt hp_wmi
sparse_keymap snd_hda_intel wmi_bmof snd_hda_codec cfg80211 crc16 snd_hwdep
ecdh_generic aesni_intel snd_hda_core aes_x86_64 crypto_simd snd_pcm cryptd
e1000e snd_timer glue_helper intel_cstate intel_uncore intel_rapl_perf uvcvideo
idma64
[ 385.996938] tpm_crb input_leds led_class videobuf2_vmalloc snd psmouse
videobuf2_memops mei_me i2c_i801 ptp mei videobuf2_v4l2 pps_core
processor_thermal_device ucsi_acpi i2c_hid videobuf2_common typec_ucsi
intel_lpss_pci soundcore typec rfkill intel_pch_thermal wmi intel_gtt
intel_lpss intel_soc_dts_iosf hid videodev tpm_tis tpm_tis_core int3403_thermal
int340x_thermal_zone rtc_cmos media evdev tpm ac int3400_thermal mac_hid
battery acpi_thermal_rel rng_core hp_wireless sg scsi_mod crypto_user ip_tables
x_tables btrfs libcrc32c crc32c_generic xor zstd_decompress zstd_compress
xxhash serio_raw raid6_pq atkbd libps2 xhci_pci xhci_hcd crc32c_intel usbcore
usb_common i8042 serio
[ 385.996964] CPU: 4 PID: 215 Comm: kworker/4:2 Tainted: G U W
4.18.0-rc3-custom+ #73
[ 385.996965] Hardware name: HP HP ZBook 14u G5/83B2, BIOS Q78 Ver. 01.00.05
01/25/2018
[ 385.996968] Workqueue: pm pm_runtime_work
[ 385.996970] RIP: 0010:generic_reg_wait+0xe7/0x160
[ 385.996970] Code: 44 24 58 8b 54 24 48 89 de 44 89 4c 24 08 48 8b 4c 24 50
48 c7 c7 f8 29 19 bd e8 c4 24 e5 ff 83 7d 20 01 44 8b 4c 24 08 74 02 <0f> 0b 48
83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 0f
[ 385.996989] RSP: 0018:ffffb45681fefbd8 EFLAGS: 00010297
[ 385.996990] RAX: 000000000000006b RBX: 000000000000000a RCX:
0000000000000001
[ 385.996991] RDX: 0000000080000001 RSI: ffffffffbd1151a6 RDI:
00000000ffffffff
[ 385.996991] RBP: ffff97f4e37e3240 R08: ffffffffbc499790 R09:
00000000ffffffff
[ 385.996992] R10: 0000000000000004 R11: ffffffffbdab8f2d R12:
0000000000000bb9
[ 385.996992] R13: 0000000000004ea4 R14: 0000000000010000 R15:
0000000000000000
[ 385.996993] FS: 0000000000000000(0000) GS:ffff97f4ef500000(0000)
knlGS:0000000000000000
[ 385.996994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 385.996994] CR2: 00007ff68c6ef000 CR3: 00000001ee40a002 CR4:
00000000003606e0
[ 385.996995] Call Trace:
[ 385.997000] dce110_stream_encoder_dp_blank+0x11c/0x180
[ 385.997002] power_down_all_hw_blocks+0x3d/0x1c0
[ 385.997003] dce110_power_down+0xe/0x20
[ 385.997005] dc_set_power_state+0x1b/0x70
[ 385.997007] dm_suspend+0x4a/0x60
[ 385.997009] amdgpu_device_ip_suspend+0xe4/0x170
[ 385.997011] amdgpu_device_suspend+0x251/0x3a0
[ 385.997013] amdgpu_pmops_runtime_suspend+0x44/0xb0
[ 385.997015] pci_pm_runtime_suspend+0x64/0x180
[ 385.997017] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 385.997019] vga_switcheroo_runtime_suspend+0x24/0xb0
[ 385.997020] __rpm_callback+0x75/0x1b0
[ 385.997022] ? __switch_to_asm+0x30/0x60
[ 385.997024] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 385.997025] rpm_callback+0x1f/0x70
[ 385.997026] ? vga_switcheroo_runtime_resume+0x60/0x60
[ 385.997028] rpm_suspend+0x12a/0x610
[ 385.997030] ? finish_task_switch+0x83/0x2e0
[ 385.997031] ? __switch_to_asm+0x24/0x60
[ 385.997032] pm_runtime_work+0x7d/0xa0
[ 385.997034] process_one_work+0x1eb/0x3c0
[ 385.997035] worker_thread+0x2d/0x3d0
[ 385.997037] ? process_one_work+0x3c0/0x3c0
[ 385.997038] kthread+0x112/0x130
[ 385.997039] ? kthread_flush_work_fn+0x10/0x10
[ 385.997041] ret_from_fork+0x35/0x40
[ 385.997043] ---[ end trace 04724a7f4f9fccf6 ]---
Then there are new fatal error messages like these (the last line is new with
your patch):
[ 436.030371] amdgpu: [powerplay]
failed to send message 261 ret is 65535
[ 436.030394] amdgpu: [powerplay]
last message was failed ret is 65535
[ 436.030410] amdgpu: [powerplay]
failed to send message 261 ret is 65535
[ 436.030433] amdgpu: [powerplay]
last message was failed ret is 65535
[ 436.030448] amdgpu: [powerplay]
failed to send message 261 ret is 65535
[ 436.030471] amdgpu: [powerplay]
last message was failed ret is 65535
[ 436.030487] amdgpu: [powerplay]
failed to send message 261 ret is 65535
[ 436.145782] amdgpu 0000:01:00.0: GPU pci config reset
[ 437.106049] [drm:amdgpu_device_suspend] *ERROR* amdgpu asic reset failed
I'm also quite sure I haven't seen the following before:
[ 370.888835] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 0 test failed
(scratch(0xC040)=0xFFFFFFFF)
[ 370.888839] [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block
<gfx_v8_0> failed -22
[ 370.888841] [drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume
failed (-22).
Most importantly, reading toc->num_entries still returns -1, as I observed
before:
[ 368.991914] amdgpu: [powerplay] smu7_request_smu_load_fw: 10
ffffb456a0081000 0 1
[ 368.991927] amdgpu: [powerplay] smu7_request_smu_load_fw: 20
ffffb456a0081000 -1 -1
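For illustration, the values quoted above (-1 for toc->num_entries, "ret is
65535", scratch 0xFFFFFFFF) are all what an all-ones bit pattern looks like at
different widths. A minimal Python sketch of that arithmetic follows. The helper
names are hypothetical, and the idea that these reads come back as all ones
because the device is not responding is an assumption, not something the logs
confirm.

```python
import struct


def as_signed32(raw):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    # Pack as unsigned, unpack as signed; "=" selects standard sizes.
    return struct.unpack("=i", struct.pack("=I", raw % 2**32))[0]


def is_all_ones(raw, width_bits):
    """Return True if raw has every bit set for the given width."""
    return raw == 2**width_bits - 1


# Values taken from the log excerpts in this comment.
print(as_signed32(0xFFFFFFFF))      # signed view of the scratch register value
print(is_all_ones(65535, 16))       # "ret is 65535" as a 16-bit value
print(is_all_ones(0xFFFFFFFF, 32))  # scratch(0xC040)=0xFFFFFFFF
```

If that assumption holds, the -1 would be a symptom of the read itself failing
rather than a bad value stored in the TOC.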
Then, after I noticed my workaround was still applied, I tried without it.
Unfortunately, with just your patch I can't get past the SDDM login screen.
The laptop freezes once the KDE session loads (I'm assuming starting X causes
the freeze).</pre>
</div>
</p>
</body>
</html>