After Vega 56/64 GPU hang I am unable to reboot the system

Mikhail Gavrilov mikhail.v.gavrilov at gmail.com
Thu Dec 20 16:07:43 UTC 2018


On Thu, 20 Dec 2018 at 19:19, StDenis, Tom <Tom.StDenis at amd.com> wrote:
>
> Ya I was right.  With a plain build I can access the files just fine.
>
>
>
> I did manage to get into a weird shell where I couldn't cat
> amdgpu_gca_config from bash though after a reboot (had updates pending)
> it works fine.
>
> If you can't cat those files then neither can umr.
>
> So NOTABUG :-)
>

I am very happy for you. But what about me?
I have no idea how to make these files readable on my system.
And of course I rebooted and tried to cat amdgpu_gca_config again
several times, but never with any success.

I also noticed that not every file under /sys/kernel/debug/dri/0/
is blocked from reading.
I was able to dump the contents of some of them into debugfs.txt (see the attachment).
Files that can be read:
amdgpu_evict_gtt
amdgpu_evict_vram
amdgpu_fence_info
amdgpu_firmware_info
amdgpu_gds_mm
amdgpu_gem_info
amdgpu_gpu_recover
amdgpu_gtt_mm
amdgpu_gws_mm
amdgpu_oa_mm
amdgpu_pm_info
amdgpu_sa_info
amdgpu_test_ib
amdgpu_vbios
amdgpu_vram_mm
clients
framebuffer
gem_names
internal_clients
name
state
ttm_page_pool
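
For reference, a minimal Python sketch of how a readable/unreadable split like the list above could be produced. It is an illustration under assumptions, not something taken from umr or the amdgpu driver: `probe_readable` and its `skip` parameter are hypothetical names, and on a real system the script has to run as root. Reading a debugfs entry can have side effects (in the attachment, amdgpu_gpu_recover answers with "gpu recover"), so such entries can be listed in `skip`.

```python
import errno
import os


def probe_readable(directory, skip=()):
    """Try to read one byte from every regular file in `directory`.

    Returns a dict mapping entry name -> status, where status is
    "ok", "dir", "skipped", or the symbolic errno name of the failure
    (for example "EPERM", which head prints as "Operation not permitted").
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name in skip:
            # Not opened at all, to avoid triggering side effects.
            results[name] = "skipped"
            continue
        if os.path.isdir(path):
            # Entries such as crtc-0 or DP-1 are directories, not files.
            results[name] = "dir"
            continue
        try:
            with open(path, "rb") as fh:
                fh.read(1)
            results[name] = "ok"
        except OSError as exc:
            # Fall back to the numeric value if the errno has no symbolic name.
            results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results
```

Called as root with `probe_readable("/sys/kernel/debug/dri/0", skip={"amdgpu_gpu_recover"})`, this should report "EPERM" for the entries where head printed "Operation not permitted".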

Could some kernel options restrict access to files in debugfs (for
example, to amdgpu_gca_config)?
If so, which options should I pay attention to?
I am out of ideas. I have tried everything.
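
One place to start looking is the build configuration of the running kernel. The Python sketch below only filters a config text for options whose names contain given keywords. It does not know which option, if any, causes the "Operation not permitted" errors. The function name `matching_options` and the keyword "DEBUG_FS" are illustrative choices, and the location of the config file (many distributions install it as /boot/config-<release>) is an assumption about the system.

```python
def matching_options(config_text, keywords):
    """Return the option lines of a kernel config that mention any keyword.

    Both set options ("CONFIG_X=y") and explicitly unset ones
    ("# CONFIG_X is not set") are kept; all other lines are dropped.
    """
    hits = []
    for line in config_text.splitlines():
        line = line.strip()
        is_set = line.startswith("CONFIG_")
        is_unset = line.startswith("# CONFIG_") and line.endswith("is not set")
        if (is_set or is_unset) and any(k in line for k in keywords):
            hits.append(line)
    return hits
```

With the text of the config file loaded into a string, `matching_options(text, ["DEBUG_FS"])` would list the debugfs-related options to compare against a kernel build where the files are readable.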




--
Best Regards,
Mike Gavrilov.
-------------- next part --------------
# head /sys/kernel/debug/dri/0/*
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_dm_dtn_log' for reading: Operation not permitted
==> /sys/kernel/debug/dri/0/amdgpu_evict_gtt <==
(0)

==> /sys/kernel/debug/dri/0/amdgpu_evict_vram <==
(0)

==> /sys/kernel/debug/dri/0/amdgpu_fence_info <==
--- ring 0 (gfx) ---
Last signaled fence 0x00000216
Last emitted        0x00000216
Last preempted      0x00000000
Last reset          0x00000000
Last both           0x00000000
--- ring 1 (comp_1.0.0) ---
Last signaled fence 0x00000009
Last emitted        0x00000009
--- ring 2 (comp_1.1.0) ---

==> /sys/kernel/debug/dri/0/amdgpu_firmware_info <==
VCE feature version: 0, firmware version: 0x37030400
UVD feature version: 0, firmware version: 0x01571100
MC feature version: 0, firmware version: 0x00000000
ME feature version: 40, firmware version: 0x00000099
PFP feature version: 40, firmware version: 0x000000ae
CE feature version: 40, firmware version: 0x0000004d
RLC feature version: 0, firmware version: 0x00000058
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_gca_config' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/amdgpu_gds_mm <==
0x0000000000000000-0x0000000000001000: 4096: used
0x0000000000001000-0x0000000000010000: 61440: free
total: 65536, used 4096 free 61440

==> /sys/kernel/debug/dri/0/amdgpu_gem_info <==
pid     2219 command Xwayland:
	0x00000001:       131072 byte  CPU CPU_ACCESS_REQUIRED CPU_GTT_USWC
	0x00000002:         4096 byte  CPU CPU_ACCESS_REQUIRED
	0x00000003:       131072 byte  CPU CPU_ACCESS_REQUIRED CPU_GTT_USWC
	0x00000004:       131072 byte  CPU CPU_GTT_USWC
	0x00000005:       131072 byte  CPU CPU_ACCESS_REQUIRED CPU_GTT_USWC
pid     2219 command Xwayland:
	0x00000001:       131072 byte  CPU CPU_ACCESS_REQUIRED CPU_GTT_USWC
	0x00000002:         4096 byte  CPU CPU_ACCESS_REQUIRED
	0x00000003:       131072 byte  CPU CPU_ACCESS_REQUIRED CPU_GTT_USWC
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_gpr' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/amdgpu_gpu_recover <==
gpu recover

==> /sys/kernel/debug/dri/0/amdgpu_gtt_mm <==
0x0000000000000400-0x0000000000000401: 1: used
0x0000000000000401-0x0000000000000405: 4: used
0x0000000000000405-0x0000000000000447: 66: used
0x0000000000000447-0x0000000000000449: 2: used
0x0000000000000449-0x000000000000044b: 2: used
0x000000000000044b-0x000000000000044d: 2: used
0x000000000000044d-0x000000000000044f: 2: used
0x000000000000044f-0x0000000000000451: 2: used
0x0000000000000451-0x0000000000000453: 2: used
0x0000000000000453-0x0000000000000455: 2: used

==> /sys/kernel/debug/dri/0/amdgpu_gws_mm <==
0x0000000000000000-0x0000000000000004: 4: used
0x0000000000000004-0x0000000000000040: 60: free
total: 64, used 4 free 60
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_iomem' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/amdgpu_oa_mm <==
0x0000000000000000-0x0000000000000004: 4: used
0x0000000000000004-0x0000000000000010: 12: free
total: 16, used 4 free 12

==> /sys/kernel/debug/dri/0/amdgpu_pm_info <==
Clock Gating Flags Mask: 0x888200
	Graphics Medium Grain Clock Gating: Off
	Graphics Medium Grain memory Light Sleep: Off
	Graphics Coarse Grain Clock Gating: Off
	Graphics Coarse Grain memory Light Sleep: Off
	Graphics Coarse Grain Tree Shader Clock Gating: Off
	Graphics Coarse Grain Tree Shader Light Sleep: Off
	Graphics Command Processor Light Sleep: Off
	Graphics Run List Controller Light Sleep: Off
	Graphics 3D Coarse Grain Clock Gating: Off
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_regs' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_regs_didt' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_regs_pcie' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_regs_smc' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.0.0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.0.1' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.1.0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.1.1' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.2.0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.2.1' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.3.0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_comp_1.3.1' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_gfx' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_kiq_2.1.0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_sdma0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_sdma1' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_uvd<0>' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_uvd_enc0<0>' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_uvd_enc1<0>' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_vce0' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_vce1' for reading: Operation not permitted
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_ring_vce2' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/amdgpu_sa_info <==
 [0x0000bca000 0x0000bcb020] size     4128 protected by 0x00000863 on context 47
 [0x0000bcb100 0x0000bcb120] size       32 protected by 0x00000864 on context 47
 [0x0000bcb200 0x0000bcc220] size     4128 protected by 0x00000865 on context 47
 [0x0000bcc300 0x0000bcc320] size       32 protected by 0x00000866 on context 47
 [0x0000bcc400 0x0000bcd420] size     4128 protected by 0x00000867 on context 47
 [0x0000bcd500 0x0000bcd520] size       32 protected by 0x00000868 on context 47
 [0x0000bcd600 0x0000bce620] size     4128 protected by 0x00000869 on context 47
 [0x0000bce700 0x0000bce720] size       32 protected by 0x0000086a on context 47
 [0x0000bce800 0x0000bcf820] size     4128 protected by 0x0000086b on context 47
 [0x0000bcf900 0x0000bcf920] size       32 protected by 0x0000086c on context 47
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_sensors' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/amdgpu_test_ib <==
run ib test:
ib ring tests passed.

==> /sys/kernel/debug/dri/0/amdgpu_vbios <==
[binary VBIOS image data omitted; readable strings:]
761295520
10/26/17 23:36
xxx-xxx-xxx VEGA10 PCI_EXPRESS HBM2
GV-RXVEGA64GAMING OC-8GD/F1/062E
(C) 1988-2010, Advanced Micro Devices, Inc.
ATOMBIOSBK-AMD VER016.001.001.000.000000 RVG64GO.F1
GBT_VEGA10_D05001_MBA_A1_HBM_8GB_GAOC\config.h
AMD ATOMBIOS
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_vram' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/amdgpu_vram_mm <==
0x0000000000000000-0x0000000000000900: 2304: used
0x0000000000000900-0x0000000000000a00: 256: used
0x0000000000000a00-0x0000000000000a01: 1: used
0x0000000000000a01-0x0000000000000a02: 1: used
0x0000000000000a02-0x0000000000000a03: 1: used
0x0000000000000a03-0x0000000000000a04: 1: used
0x0000000000000a04-0x0000000000000a05: 1: used
0x0000000000000a05-0x0000000000000a06: 1: used
0x0000000000000a06-0x0000000000000a1f: 25: used
0x0000000000000a1f-0x0000000000000a20: 1: used
head: cannot open '/sys/kernel/debug/dri/0/amdgpu_wave' for reading: Operation not permitted

==> /sys/kernel/debug/dri/0/clients <==
             command   pid dev master a   uid      magic
      systemd-logind   998   0   y    y     0          0
            Xwayland  2219   0   n    y  1000          1
            Xwayland  2219   0   n    y  1000          2
            Xwayland  2219   0   n    y  1000          3
            Xwayland  2219   0   n    y  1000          4
            Xwayland  2219   0   n    y  1000          5
            Xwayland  2219   0   n    y  1000          6
            Xwayland  2219   0   n    y  1000          7
            Xwayland  2219   0   n    y  1000          8

==> /sys/kernel/debug/dri/0/crtc-0 <==
head: error reading '/sys/kernel/debug/dri/0/crtc-0': Is a directory

==> /sys/kernel/debug/dri/0/crtc-1 <==
head: error reading '/sys/kernel/debug/dri/0/crtc-1': Is a directory

==> /sys/kernel/debug/dri/0/crtc-2 <==
head: error reading '/sys/kernel/debug/dri/0/crtc-2': Is a directory

==> /sys/kernel/debug/dri/0/crtc-3 <==
head: error reading '/sys/kernel/debug/dri/0/crtc-3': Is a directory

==> /sys/kernel/debug/dri/0/crtc-4 <==
head: error reading '/sys/kernel/debug/dri/0/crtc-4': Is a directory

==> /sys/kernel/debug/dri/0/crtc-5 <==
head: error reading '/sys/kernel/debug/dri/0/crtc-5': Is a directory

==> /sys/kernel/debug/dri/0/DP-1 <==
head: error reading '/sys/kernel/debug/dri/0/DP-1': Is a directory

==> /sys/kernel/debug/dri/0/DP-2 <==
head: error reading '/sys/kernel/debug/dri/0/DP-2': Is a directory

==> /sys/kernel/debug/dri/0/DP-3 <==
head: error reading '/sys/kernel/debug/dri/0/DP-3': Is a directory

==> /sys/kernel/debug/dri/0/framebuffer <==
framebuffer[59]:
	allocated by = gnome-shell
	refcount=2
	format=XR24 little-endian (0x34325258)
	modifier=0x0
	size=3840x2160
	layers:
		size[0]=3840x2160
		pitch[0]=15360
		offset[0]=0

==> /sys/kernel/debug/dri/0/gem_names <==
  name     size handles refcount

==> /sys/kernel/debug/dri/0/HDMI-A-1 <==
head: error reading '/sys/kernel/debug/dri/0/HDMI-A-1': Is a directory

==> /sys/kernel/debug/dri/0/HDMI-A-2 <==
head: error reading '/sys/kernel/debug/dri/0/HDMI-A-2': Is a directory

==> /sys/kernel/debug/dri/0/HDMI-A-3 <==
head: error reading '/sys/kernel/debug/dri/0/HDMI-A-3': Is a directory

==> /sys/kernel/debug/dri/0/internal_clients <==

==> /sys/kernel/debug/dri/0/name <==
amdgpu dev=0000:0b:00.0 unique=0000:0b:00.0

==> /sys/kernel/debug/dri/0/state <==
plane[37]: plane-0
	crtc=(null)
	fb=0
	crtc-pos=0x0+0+0
	src-pos=0.000000x0.000000+0.000000+0.000000
	rotation=1
	normalized-zpos=0
	color-encoding=ITU-R BT.601 YCbCr
	color-range=YCbCr limited range
plane[38]: plane-1

==> /sys/kernel/debug/dri/0/ttm_page_pool <==
   pool      refills   pages freed     size
     wc            8             0      353
     uc            0             0        0
 wc dma            0             0        0
 uc dma            0             0        0
wc huge            0             0        0
uc huge            0             0        0
head: cannot open '2' for reading: No such file or directory

