A problem with the system freeze during the initialization of the radeon driver after a hot reboot.

Дмитрий Терехин jqt4 at basealt.ru
Mon Sep 23 11:13:44 UTC 2019


Hello colleagues!

I am one of the developers of the Alt OS distribution
https://en.altlinux.org
https://en.wikipedia.org/wiki/ALT_Linux
for the BFK3.1 computer
https://www.baikalelectronics.ru/products/bfk31/
https://medium.com/@malafeev/first-independent-tests-of-baikal-t1-processor-and-bfk-3-1-evaluation-board-bc7c1db12046
The computer is based on the BE-T1000 (Baikal-T1) processor https://en.wikipedia.org/wiki/Baikal_CPU

I use an AMD SAPPHIRE Radeon R5 230 / HD 6450 (Caicos) graphics card with the BFK3.1 and
the Linux kernel from Baikal Electronics https://github.com/baikalelectronics/Linux-kernel.4.4.xx
The video driver is drivers/gpu/drm/radeon.

The system freezes during the initialization of the radeon driver after a hot reboot.
After a reboot with a complete power-off, no freeze occurs.

The hang occurs during the initialization of the radeon driver, while the graphics card
registers are being accessed through MMIO.

I assume that the hang may be due to some kind of improper state of the video card
at the time of driver initialization.

After a hot reboot, I observed the following variants of the GPU soft reset:

1. In this case, no hang occurs.
radeon 0000:01:00.0: GPU softreset: 0x00000008
radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0003828
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010100
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020180
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038042
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57

2. In this case, the system hangs afterwards, usually when manipulating the graphics card
registers in the evergreen_init_golden_registers function.
radeon 0000:01:00.0: GPU softreset: 0x00000004
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83146
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100000
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57

3. The GPU soft reset is not executed at all, and no messages about it are printed.
After this, the system hangs, usually when manipulating the graphics card registers
in the evergreen_init_golden_registers function.
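
To make the three variants above easier to compare, here is a minimal Python sketch
that parses such a dump, decodes the "GPU softreset" mask, and lists the status
registers that changed across the reset. The bit names are an assumption: they follow
the RADEON_RESET_* flags as I remember them from drivers/gpu/drm/radeon/radeon.h and
should be checked against the kernel tree actually in use. Under that assumption,
0x00000008 in variant 1 corresponds to the CP block and 0x00000004 in variant 2 to
the DMA block. The function names are hypothetical helpers, not part of any tool.

```python
import re

# Assumed bit order of the RADEON_RESET_* flags (bit 0 first); verify in radeon.h.
RESET_FLAGS = ["GFX", "COMPUTE", "DMA", "CP", "GRBM", "DMA1",
               "RLC", "SEM", "IH", "VMC", "MC", "DISPLAY"]

MASK_RE = re.compile(r"GPU softreset:\s*(0x[0-9A-Fa-f]+)")
REG_RE = re.compile(r":\s+(\w+)\s*=\s*(0x[0-9A-Fa-f]+)\s*$")


def decode_mask(mask):
    """Return the names of the blocks flagged in a soft reset mask."""
    return [name for bit, name in enumerate(RESET_FLAGS) if mask & (1 << bit)]


def parse_dump(lines):
    """Split one soft reset dump into (mask, registers before, registers after).

    Registers printed before the first *_SOFT_RESET line count as "before",
    everything printed after it counts as "after".
    """
    mask = None
    before, after = {}, {}
    target = before
    for line in lines:
        m = MASK_RE.search(line)
        if m:
            mask = int(m.group(1), 16)
            continue
        m = REG_RE.search(line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2), 16)
        if name.endswith("_SOFT_RESET"):
            target = after  # the reset itself; later lines are the "after" dump
            continue
        target[name] = value
    return mask, before, after


def changed_registers(before, after):
    """Return {register: (old, new)} for registers whose value changed."""
    return {name: (before[name], after[name])
            for name in before
            if name in after and before[name] != after[name]}


# Shortened excerpt of variant 2 from this message.
SAMPLE = """\
radeon 0000:01:00.0: GPU softreset: 0x00000004
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83146
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100000
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
""".splitlines()

mask, before, after = parse_dump(SAMPLE)
print("reset mask:", hex(mask), decode_mask(mask))
for name, (old, new) in changed_registers(before, after).items():
    print(f"{name}: {old:#010x} -> {new:#010x}")
```

Feeding the complete dumps through the same functions shows which registers differ
between the variant that works and the ones that hang.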

I think that working around the problem requires a complete restart of the video card.
I do not know how to do that.

The BFK3.1 does not allow the power to the PCI slot to be switched off from Linux.
The following commands
echo "1" > /sys/bus/pci/devices/0000:01:00.0/remove
echo "1" > /sys/bus/pci/rescan
do not restart the video card.
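
One more thing that could be tried from user space is the per-device "reset" attribute
in sysfs, which the kernel creates only when it has found a reset method for the device.
The Python sketch below is an assumption-laden illustration, not a known fix: I have
not verified that the attribute exists for this card on the BFK3.1, or that such a
reset brings a Caicos GPU into a clean state. The function name and the sysfs_root
parameter are hypothetical (the latter only makes the helper testable outside /sys).
It has to run as root, and probably with the radeon driver not bound to the device.

```python
from pathlib import Path


def pci_function_reset(bdf, sysfs_root="/sys"):
    """Request a PCI function reset through sysfs.

    Returns False if the kernel exposes no "reset" attribute for the device
    (no reset method was found), True after writing "1" to the attribute.
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    if not dev.is_dir():
        raise FileNotFoundError(f"no such PCI device: {dev}")
    reset = dev / "reset"
    if not reset.exists():
        return False
    reset.write_text("1")
    return True
```

A call such as pci_function_reset("0000:01:00.0") that returns False would mean the
kernel offers no reset method for the card, which is useful information in itself.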

I tried unconditionally calling evergreen_gpu_pci_config_reset from evergreen_asic_reset.
This resulted in a permanent hang when initializing the driver.

I ask for your help.
Could you please tell me how to properly restart the video card when initializing the driver?

Dmitry Terekhin
jqt4 at basealt.ru
jqt4 at altlinux.org


