Linux Mint 20.04 5.11 issue

Felix Kuehling felix.kuehling at amd.com
Thu Jul 29 15:14:10 UTC 2021


On 2021-07-28 at 12:10 p.m., Tim Cahill wrote:
> Hi Felix,

I'm not sure why you're calling me out by name. I'm not working on
anything obviously related to your crashes.

Anyway, I took a quick look at the backtraces. They all point at libgdk.
Two of them are segfaults, one is an abort. It's not clear how these
would be related to the GPU driver. That said, when you boot with
nomodeset, the GPU driver and all HW acceleration are completely
disabled. If that makes the problem disappear, the GPU driver is clearly
involved in the problem in some way.

The abort points at a problem while freeing memory. This could be caused
by a double-free in some other code, possibly code related to the GPU
driver. That would be a problem in a user mode component (maybe Mesa),
not the kernel mode driver.

I believe the messages you're seeing when you move the mouse are the
result of runtime power management, which puts the GPU to sleep when
it's idle and reinitializes it when it's needed. You have 2 GPUs in your
laptop: an integrated Renoir GPU in the Ryzen CPU, and a discrete
Navi10 GPU for higher gaming performance. The GPU that goes to sleep and
wakes up is the discrete Navi10 GPU.

The OpenGL renderer string specifies "RENOIR". Therefore I'm surprised
that the Navi10 GPU wakes up when you move the mouse. Ideally it
shouldn't be used at all when you're just using the desktop.
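To see how often the Navi10 GPU wakes up and how long it stays awake,
the timestamps in the dmesg log quoted below can be paired up. Here is a
minimal Python sketch of that. It assumes, based only on that log, that
"PSP is resuming..." marks a wake-up and "free PSP TMR buffer" marks the
GPU going back to sleep. Neither line carries a PCI address, so it also
assumes that only one GPU (the Navi10 at 03:00.0) is doing runtime
suspend/resume.

```python
import re
from typing import List, Optional, Tuple

# "[ 13164.399579] message" -> timestamp in seconds and the message text.
_DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Assumed markers, taken from the quoted log: the first shows up when the
# GPU is woken up, the second when it is put back to sleep.
RESUME_MARKER = "PSP is resuming..."
SUSPEND_MARKER = "free PSP TMR buffer"


def wake_intervals(lines: List[str]) -> List[Tuple[float, float]]:
    """Pair each resume marker with the next suspend marker.

    Returns (resume_time, suspend_time) tuples in seconds since boot.
    """
    intervals: List[Tuple[float, float]] = []
    resumed_at: Optional[float] = None
    for line in lines:
        match = _DMESG_LINE.match(line.strip())
        if match is None:
            continue  # not a timestamped dmesg line
        timestamp, message = float(match.group(1)), match.group(2)
        if RESUME_MARKER in message:
            resumed_at = timestamp
        elif SUSPEND_MARKER in message and resumed_at is not None:
            intervals.append((resumed_at, timestamp))
            resumed_at = None  # wait for the next wake-up
    return intervals
```

Feeding it the lines of `dmesg` output (reading the kernel log may need
root, depending on `kernel.dmesg_restrict`) would, for the excerpt quoted
below, give a single interval of roughly 6.3 seconds (13164.40 to
13170.72).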

If you suspect that runtime power management is responsible for your
problems, you could disable it with amdgpu.runpm=0 on the kernel command
line. That means the Navi10 GPU won't go into the low power mode and
will drain your battery more quickly, so this is not a permanent
solution, just an experiment to narrow down the problem.
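On an Ubuntu-based system like Mint, kernel options are typically added
to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by
"sudo update-grub" and a reboot. To confirm afterwards that the option
took effect and that the Navi10 GPU really stays awake, a small Python
sketch like the one below could help. It reads the generic Linux sysfs
attribute power/runtime_status of a PCI device. The address
"0000:03:00.0" used with it is an assumption taken from the inxi bus ID
(03:00.0) and the "amdgpu 0000:03:00.0" prefix in the quoted dmesg log.

```python
from pathlib import Path
from typing import Optional

# Default sysfs location of PCI devices on Linux.
PCI_DEVICES = Path("/sys/bus/pci/devices")


def runpm_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.runpm value set on a kernel command line, or None.

    If the option appears more than once, the last occurrence is returned.
    """
    value: Optional[int] = None
    for token in cmdline.split():
        if token.startswith("amdgpu.runpm="):
            value = int(token.split("=", 1)[1])
    return value


def runtime_status(pci_addr: str, sysfs: Path = PCI_DEVICES) -> str:
    """Read the runtime PM state of a PCI device, e.g. "active" or "suspended"."""
    return (sysfs / pci_addr / "power" / "runtime_status").read_text().strip()
```

For example, runpm_from_cmdline(Path("/proc/cmdline").read_text())
should return 0 once the option is in place, and with runtime PM disabled
runtime_status("0000:03:00.0") would be expected to report "active"
instead of alternating with "suspended".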

Regards,
  Felix


>
> I'm not sure how to do this, as I haven't had to report a bug before.
> I've looked at a variety of bug reporting sites (such as the MATE
> project's) to see if anyone else is running into the same issues that
> I'm having, and haven't seen anything at all similar. Since I've had
> issues with AMD drivers on my distro (info below), and a consistent,
> high volume of dmesg output shows up, I've decided that I should start
> here with the AMD kernel team.
>
> I have a fairly new MSI laptop with the following configuration:
>
> [code]
> System:    Kernel: 5.11.0-25-generic x86_64 bits: 64 compiler: N/A Desktop: MATE 1.24.0 wm: marco
>            dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal
> Machine:   Type: Laptop System: Micro-Star product: Alpha 17 A4DEK v: REV:1.0 serial: <filter>
>            Chassis: type: 10 serial: <filter>
>            Mobo: Micro-Star model: MS-17EK v: REV:1.0 serial: <filter> UEFI: American Megatrends
>            v: E17EKAMS.101 date: 10/26/2020
> Battery:   ID-1: BAT1 charge: 66.2 Wh condition: 67.0/65.7 Wh (102%) volts: 12.4/10.8
>            model: MSI Corp. MS-17EK serial: N/A status: Unknown
> CPU:       Topology: 8-Core model: AMD Ryzen 7 4800H with Radeon Graphics bits: 64 type: MT MCP
>            arch: Zen rev: 1 L2 cache: 4096 KiB
>            flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 92630
>            Speed: 4278 MHz min/max: 1400/2900 MHz Core speeds (MHz): 1: 4280 2: 1865 3: 1397
>            4: 2188 5: 1489 6: 2265 7: 1907 8: 1906 9: 1729 10: 1397 11: 1397 12: 1397 13: 1397
>            14: 1397 15: 1907 16: 1740
> Graphics:  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
>            vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 03:00.0 chip ID: 1002:731f
>            Device-2: AMD Renoir vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 08:00.0
>            chip ID: 1002:1636
>            Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati
>            unloaded: fbdev,modesetting,radeon,vesa compositor: marco resolution: 1920x1080~144Hz
>            OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.11.0-25-generic LLVM 11.0.0)
>            v: 4.6 Mesa 20.2.6 direct render: Yes
> Audio:     Device-1: AMD Navi 10 HDMI Audio vendor: Micro-Star MSI driver: snd_hda_intel v: kernel
>            bus ID: 03:00.1 chip ID: 1002:ab38
>            Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Micro-Star MSI
>            driver: N/A bus ID: 08:00.5 chip ID: 1022:15e2
>            Device-3: AMD Family 17h HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
>            v: kernel bus ID: 08:00.6 chip ID: 1022:15e3
>            Sound Server: ALSA v: k5.11.0-25-generic
> Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0
>            chip ID: 8086:2723
>            IF: wlp4s0 state: up mac: <filter>
>            Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI
>            driver: r8169 v: kernel port: f000 bus ID: 05:00.0 chip ID: 10ec:8168
>            IF: eno1 state: down mac: <filter>
> Drives:    Local Storage: total: 476.94 GiB used: 89.79 GiB (18.8%)
>            ID-1: /dev/nvme0n1 vendor: Kingston model: OM8PCP3512F-AI1 size: 476.94 GiB
>            speed: 31.6 Gb/s lanes: 4 serial: <filter>
> Partition: ID-1: / size: 466.30 GiB used: 89.28 GiB (19.1%) fs: ext4 dev: /dev/dm-1
>            ID-2: /boot size: 704.5 MiB used: 519.7 MiB (73.8%) fs: ext4 dev: /dev/nvme0n1p2
>            ID-3: swap-1 size: 980.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-2
> USB:       Hub: 1-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
>            Device-1: 1-3:2 info: SteelSeries ApS SteelSeries KLC type: HID
>            driver: hid-generic,usbhid rev: 2.0 chip ID: 1038:1122
>            Device-2: 1-4:3 info: Acer HD Webcam type: Video driver: uvcvideo rev: 2.0
>            chip ID: 5986:211c
>            Hub: 2-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
>            Hub: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
>            Device-3: 3-3:2 info: Intel type: Bluetooth driver: btusb rev: 2.0 chip ID: 8087:0029
>            Hub: 4-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
> Sensors:   System Temperatures: cpu: 46.5 C mobo: N/A
>            Fan Speeds (RPM): N/A
>            GPU: device: amdgpu temp: 0 C fan: 65535 device: amdgpu temp: 31 C
> Repos:     No active apt repos in: /etc/apt/sources.list
>            Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
>            1: deb http: //mirrors.seas.harvard.edu/linuxmint-packages uma main upstream import backport
>            2: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal main restricted universe multiverse
>            3: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-updates main restricted universe multiverse
>            4: deb http: //mirror.us-ny2.kamatera.com/ubuntu focal-backports main restricted universe multiverse
>            5: deb http: //security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
>            6: deb http: //archive.canonical.com/ubuntu/ focal partner
> Info:      Processes: 372 Uptime: 2h 44m Memory: 15.10 GiB used: 1.15 GiB (7.6%) Init: systemd
>            v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Client: Unknown python3.8 client
>            inxi: 3.0.38
> [/code]
>
>
> If I am using it interactively, I get random crashes that seem to hit
> elements of MATE (mate-panel, etc.) consistently - just not
> predictably. LibreOffice applications, xed, Firefox, and Evolution seem
> to be more prone to crashing the X session. I can easily switch to
> tty1, log in, and kill services running in tty7, as the crashes don't
> appear to completely kill tty7. Sometimes I can kill MATE and launch a
> new instance to salvage the tty7 session. However, I usually end up
> having to kill the root PID of the X session in order to log in again.
> But I think this is related to the AMD GPU driver because every time I
> simply move the mouse in the tty7 session, I get the following in dmesg:
>
> [13164.399550] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
> [13164.399579] [drm] PSP is resuming...
> [13164.486593] [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
> [13164.678788] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
> [13164.702624] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
> [13164.702639] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
> [13164.702648] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3f00 (42.63.0)
> [13164.702664] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
> [13164.746143] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
> [13164.768978] [drm] kiq ring mec 2 pipe 1 q 0
> [13164.779651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
> [13164.779758] [drm] JPEG decode initialized successfully.
> [13164.779779] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
> [13164.779783] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
> [13164.779784] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
> [13164.779785] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
> [13164.779786] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
> [13164.779787] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
> [13164.779788] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
> [13164.779789] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
> [13164.779790] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
> [13164.779792] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
> [13164.779793] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
> [13164.779803] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
> [13164.779804] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
> [13164.779805] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
> [13164.779806] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
> [13164.779807] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
> [13164.783807] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
> [13170.722306] [drm] free PSP TMR buffer
>
> If I boot with nomodeset, I can operate fine - just without screen
> brightness control, etc. It just seems strange that an event like this
> is generated all the time.
>
> I only get sporadic crashes, though. Humorously, I've been running only
> Firefox, the crash reporter, and MATE Terminal this morning, and it has
> run fine for over 4 hours. There were times when I wasn't running
> anything at all and it would lock up on me. So I just can't find any
> common denominator for this (I'm using vi in a terminal to type this
> and will copy-paste it into my email client [Evolution] once I'm done).
>
> I've attached 3 crash reports that were captured on the system over
> the last couple of days. I apologize in advance - profusely! - if the
> problem turns out to be somewhere else.
>
> Thanks,
> Tim 
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


