[Bug 108781] 4.19 Regression - Hawaii (R9 390) boot failure - Invalid PCC GPIO / invalid powerlevel state / Fatal error during GPU init

bugzilla-daemon at freedesktop.org
Sat Nov 17 21:25:58 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=108781

            Bug ID: 108781
           Summary: 4.19 Regression - Hawaii (R9 390) boot failure -
                    Invalid PCC GPIO / invalid powerlevel state / Fatal
                    error during GPU init
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: critical
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: jamespharvey20 at gmail.com

Created attachment 142499
  --> https://bugs.freedesktop.org/attachment.cgi?id=142499&action=edit
dmesg (journalctl) of failure on 4.19.2.arch1-1

Arch Linux kernel 4.18.16.arch1-1 works with these kernel parameters:

 radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1

After upgrading to 4.19.2.arch1-1, boot started failing.  Downgrading to
4.19.arch1-1 still fails, so the regression is already present in the initial
4.19 release.
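
When comparing the working and failing boots, it can help to confirm that both
were started with the same module parameters. The following is a minimal
sketch, not part of the original report: the function name
parse_module_params is hypothetical, and it assumes simple unquoted
module.param=value tokens such as the ones listed above. On a running system
the input string could be read from /proc/cmdline.

```python
def parse_module_params(cmdline):
    """Group 'module.param=value' tokens from a kernel command line by module.

    Simplification (assumption): values are unquoted and contain no spaces.
    Tokens without a 'module.' prefix or without '=' are ignored.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and "." in key:
            module, _, name = key.partition(".")
            params.setdefault(module, {})[name] = value
    return params


# The parameter string from this report.
REPORTED = "radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1"

if __name__ == "__main__":
    for module, options in parse_module_params(REPORTED).items():
        print(module, options)
```

Running the same function on the command line of each boot and comparing the
resulting dictionaries is one way to rule out a parameter difference.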

Full dmesg (journalctl) output is attached for 4.19.2.arch1-1 (failing),
4.19.arch1-1 (failing), and 4.18.16.arch1-1 (working).  The pertinent part of
the failure is reproduced below so that it is searchable.

This failure occurs when booting to a tty, so no X logs are involved.  (The
4.18.16.arch1-1 log also contains a [drm:generic_reg_wait [amdgpu]] error and
backtrace.  That error has appeared for a long time, but the system works and
it causes no noticeable problem.)

-----

# lspci -v
...
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (rev 80) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Hawaii PRO [Radeon R9 290/390]
        Flags: bus master, fast devsel, latency 0, IRQ 75, NUMA node 0
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=8M]
        I/O ports at 8000 [size=256]
        Memory at dfe00000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] Resizable BAR <?>
        Capabilities: [270] Secondary PCI Express <?>
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Kernel driver in use: amdgpu
        Kernel modules: radeon, amdgpu

-----

[drm] Invalid PCC GPIO: 13!
        ui class: none
        internal class: boot
        caps:
        uvd    vclk: 0 dclk: 0
                power level 0    sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
        status: c r b
        ui class: performance
        internal class: none
        caps:
        uvd    vclk: 0 dclk: 0
                power level 0    sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
                power level 1    sclk: 105000 mclk: 150000 pcie gen: 3 pcie lanes: 16
        status:
[drm] amdgpu: dpm initialized
[drm] Found UVD firmware Version: 1.64 Family ID: 9
[drm] Found VCE firmware Version: 50.10 Binary ID: 2
[drm] PCIE gen 3 link speeds already enabled
[drm:dm_pp_get_static_clocks [amdgpu]] *ERROR* DM_PPLIB: invalid powerlevel state: 0!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!
[drm] Display Core initialized with v3.1.59!
[drm] DM_MST: Differing MST start on aconnector: 00000000d3bd29d7 [id: 55]
[drm] DM_MST: Differing MST start on aconnector: 000000004b0d56b6 [id: 57]
[drm] DM_MST: Differing MST start on aconnector: 0000000058d5a853 [id: 59]
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] UVD initialized successfully.
[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_init.cold.14 [amdgpu]] *ERROR* hw_init of IP block <vce_v2_0> failed -110
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
[drm] amdgpu: finishing device.
------------[ cut here ]------------
Memory manager not clean during takedown.
WARNING: CPU: 0 PID: 670 at drivers/gpu/drm/drm_mm.c:950 drm_mm_takedown+0x1f/0x30 [drm]
Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i>
 x_tables sr_mod cdrom btrfs xor sd_mod dm_thin_pool dm_persistent_data raid6_pq dm_bio_prison dm_bufio libcrc32c crc32c_gener>
CPU: 0 PID: 670 Comm: kworker/0:4 Not tainted 4.19.0-arch1-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C602, BIOS P1.90 04/12/2018
Workqueue: events work_for_cpu_fn
RIP: 0010:drm_mm_takedown+0x1f/0x30 [drm]
Code: 0d d0 cb 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 47 38 48 83 c7 38 48 39 c7 75 01 c3 48 c7 c7 08 b1 1b c1 e8 5b 10 >
RSP: 0018:ffff91764827bd08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8e5a1b613200 RCX: 0000000000000000
RDX: 0000000000000007 RSI: ffffffff8de9d696 RDI: 00000000ffffffff
RBP: ffff8e5a0ca729a0 R08: 0000000000000001 R09: 00000000000005aa
R10: 0000000000000004 R11: 0000000000000000 R12: ffff8e5a1b6132e8
R13: 0000000000000000 R14: 0000000000000170 R15: ffff8e5a0c69e650
FS:  0000000000000000(0000) GS:ffff8e5a1f800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f26530480 CR3: 00000001f0a0a006 CR4: 00000000000606f0
Call Trace:
 amdgpu_vram_mgr_fini+0x27/0x50 [amdgpu]
 ttm_bo_clean_mm+0xa9/0xb0 [ttm]
 amdgpu_ttm_fini+0x71/0x100 [amdgpu]
 amdgpu_bo_fini+0xe/0x30 [amdgpu]
 gmc_v7_0_sw_fini+0x32/0x60 [amdgpu]
 amdgpu_device_fini+0x2cc/0x4aa [amdgpu]
 amdgpu_driver_unload_kms+0x42/0x90 [amdgpu]
 amdgpu_driver_load_kms+0x168/0x2c0 [amdgpu]
 drm_dev_register+0x109/0x140 [drm]
 amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x20/0x40
 local_pci_probe+0x41/0x90
 work_for_cpu_fn+0x16/0x20
 process_one_work+0x1eb/0x410
 worker_thread+0x218/0x3d0
 ? process_one_work+0x410/0x410
 kthread+0x112/0x130
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
---[ end trace 3cf1bcf02bf4fe1a ]---
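
The failure signature in the log above can be extracted mechanically, which is
convenient when checking several kernels for the same problem. The following
is a rough sketch rather than an established tool: the function name
find_failed_ip_blocks and the small errno table are illustrative assumptions.
The sample text is copied from the failing log, with one line deliberately
wrapped to show that the pattern tolerates line breaks. On Linux, errno 110 is
ETIMEDOUT, so "failed -110" indicates that the VCE block's hw_init timed out.

```python
import re

# Excerpt from the failing log; the hw_init line is wrapped on purpose.
SAMPLE_LOG = """\
[drm:amdgpu_vce_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 12 test failed
[drm:amdgpu_device_init.cold.14 [amdgpu]] *ERROR* hw_init of IP block
<vce_v2_0> failed -110
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
"""

# Linux errno names for the codes of interest (illustrative, not exhaustive).
LINUX_ERRNO = {110: "ETIMEDOUT"}

# \s+ also matches newlines, so a wrapped log line is still recognized.
HW_INIT_RE = re.compile(r"hw_init of IP block\s+<(\w+)>\s+failed\s+(-?\d+)")


def find_failed_ip_blocks(log_text):
    """Return (ip_block, return_code, errno_name) for each hw_init failure."""
    results = []
    for block, code in HW_INIT_RE.findall(log_text):
        value = int(code)
        results.append((block, value, LINUX_ERRNO.get(abs(value), "unknown")))
    return results


if __name__ == "__main__":
    for block, code, name in find_failed_ip_blocks(SAMPLE_LOG):
        print(f"{block}: {code} ({name})")
```

To use it on a real log, pass the full text of the attached dmesg or
journalctl output to find_failed_ip_blocks instead of SAMPLE_LOG.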
