Looking for pointers on diagnosing ring test failure in amdgpu

Matthew Macy mmacy at nextbsd.org
Mon Jun 13 22:03:18 UTC 2016




 ---- On Mon, 13 Jun 2016 01:35:34 -0700 Christian König <christian.koenig at amd.com> wrote ---- 
 > Hi Matthew,
 > 
 > sounds like the UVD block doesn't want to initialize. No idea off hand 
 > why, could be anything. I would need the hardware here for a closer 
 > inspection.
 > 
 > For a workaround you can try to disable the UVD block using the 
 > ip_block_mask module parameter (it's a bitmask of enabled blocks e.g. 
 > 0xffffffff means all blocks enabled, UVD is bit 7 on Carrizo IIRC).
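
For anyone following along, the mask arithmetic described above can be sketched in C as follows. This is only an illustration: the helper names ip_mask_disable and ip_mask_enabled are made up for this example and are not part of amdgpu, and bit 7 is the UVD position quoted above from memory ("IIRC"), so it should be checked against the driver before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return `mask` with IP block number `bit` disabled. */
static uint32_t ip_mask_disable(uint32_t mask, unsigned bit)
{
    assert(bit < 32);
    return mask & ~(UINT32_C(1) << bit);
}

/* Hypothetical helper: nonzero if IP block number `bit` is still enabled. */
static int ip_mask_enabled(uint32_t mask, unsigned bit)
{
    assert(bit < 32);
    return (int)((mask >> bit) & 1u);
}
```

Starting from 0xffffffff (all blocks enabled), ip_mask_disable(0xffffffff, 7) gives 0xffffff7f, which is the value to pass as ip_block_mask if UVD really is bit 7 on this part.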


When I clear bit 7 I get the following now:

Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR* amdgpu: smc start failed
Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22
Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed

This is hard to correlate without spending a lot more quality time with the driver than I've had so far.
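
One small step that can be taken from the log alone: the -22 in "hw_init 3 failed -22" is a negated errno value, and 22 is EINVAL on both Linux and FreeBSD. A minimal userland sketch for decoding such codes is below; kernel_ret_str is a hypothetical helper name, and reading the "3" as the index of the IP block that failed is my assumption, not something the log states.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a kernel-style return code (zero or a negated
 * errno such as -22) to the C library's message for that errno. */
static const char *kernel_ret_str(int ret)
{
    assert(ret <= 0);   /* kernel-style codes are zero or negative */
    return strerror(-ret);
}
```

Calling kernel_ret_str(-22) returns the same text as strerror(EINVAL), so the failure reported for hw_init is an "invalid argument" style error rather than, say, a timeout.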

One thing that occurs to me is that Linux is usually compiled with gcc6. Has amdgpu ever been tested when compiled with clang?

Below is a list of the warnings I have to disable to get amdgpu to compile without disabling -Werror altogether. The -Wno-format is an artifact of clang or FreeBSD treating long long and uint64_t as distinct types, and the -Wno-pointer-arith is needed to accept the Linux convention of doing pointer arithmetic on void pointers. All the others are arguably oversights in the code (similar silencing has to be done in i915, but I've had better luck with it to date). I haven't fixed the warnings because I try to treat this as vendor code and minimize local changes. Will you accept quasi-cosmetic patches from other operating systems / compilers?
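
To make the two special cases concrete, here is a minimal userland sketch of code that avoids both warnings. It assumes, as I understand it, that uint64_t is unsigned long on FreeBSD/amd64 while the Linux kernel's u64 is unsigned long long, which is why a "%llx" format is clean on one and warns on the other. The function names format_u64_hex and ptr_advance are invented for this example; none of this is taken from amdgpu.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as hex without tripping -Wformat: PRIx64 expands
 * to the correct length modifier whether uint64_t is unsigned long or
 * unsigned long long on the target. */
static int format_u64_hex(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%016" PRIx64, value);
}

/* Advance a pointer by `bytes` without relying on the GNU extension that
 * allows arithmetic on void pointers: do the arithmetic on char *. */
static void *ptr_advance(void *base, size_t bytes)
{
    return (char *)base + bytes;
}
```

An explicit cast to unsigned long long with "%llx" would also work for the format case. Either change is mechanical, which is why I call these patches quasi-cosmetic.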

Thanks.

-M


CWARNFLAGS+=    -Wno-pointer-arith
CWARNFLAGS+=    -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}

CWARNFLAGS.amdgpu_acpi.c=       -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
CWARNFLAGS.amdgpu_amdkfd.c=     -Wno-missing-prototypes
CWARNFLAGS.amdgpu_bo_list.c=    -Wno-missing-prototypes
CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
CWARNFLAGS.amdgpu_device.c=     -Wno-format -Wno-cast-qual
CWARNFLAGS.amdgpu_fence.c=      -Wno-format
CWARNFLAGS.amdgpu_gfx.c=        -Wno-missing-prototypes
CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c=      -Wno-cast-qual
CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c=      -Wno-cast-qual
CWARNFLAGS.amdgpu_atpx_handler.c=       -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
CWARNFLAGS.amdgpu_ioc32.c=      -Wno-missing-prototypes
CWARNFLAGS.amdgpu_object.c=     -Wno-format
CWARNFLAGS.amdgpu_mn.c=         -Wno-unused-variable
CWARNFLAGS.amdgpu_pll.c=        -Wno-missing-prototypes
CWARNFLAGS.amdgpu_pm.c=         -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.amdgpu_ring.c=       -Wno-cast-qual
CWARNFLAGS.amdgpu_ttm.c=        -Wno-missing-prototypes
CWARNFLAGS.amdgpu_ucode.c=      -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
CWARNFLAGS.amdgpu_uvd.c=        -Wno-format
CWARNFLAGS.amdgpu_vce.c=        -Wno-format
CWARNFLAGS.amdgpu_vm.c=         -Wno-format
CWARNFLAGS.amdgpu_test.c=       -Wno-format
CWARNFLAGS.atombios_crtc.c=     -Wno-missing-prototypes
CWARNFLAGS.atombios_dp.c=       -Wno-format
CWARNFLAGS.atombios_i2c.c=      -Wno-missing-prototypes
CWARNFLAGS.ci_dpm.c=    -Wno-unused-const-variable
CWARNFLAGS.cz_smc.c=    -Wno-missing-prototypes
CWARNFLAGS.fiji_smc.c=  -Wno-cast-qual
CWARNFLAGS.gfx_v7_0.c=  -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.gfx_v8_0.c=  -Wno-missing-prototypes
CWARNFLAGS.iceland_smc.c=       -Wno-missing-prototypes
CWARNFLAGS.kv_dpm.c=    -Wno-unused-const-variable
CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
CWARNFLAGS.gpu_scheduler.c=     -Wno-format -Wno-missing-prototypes
CWARNFLAGS.amd_powerplay.c=     -Wno-missing-prototypes
CWARNFLAGS.eventtasks.c=        -Wno-missing-prototypes
CWARNFLAGS.cz_clockpowergating.c=       -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.cz_hwmgr.c=  -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_hwmgr.c=        -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_thermal.c=      -Wno-missing-prototypes
CWARNFLAGS.pp_acpi.c=   -Wno-missing-prototypes 
CWARNFLAGS.ppatomctrl.c=        -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.processpptables.c=   -Wno-missing-prototypes -Wno-sometimes-uninitialized
CWARNFLAGS.tonga_clockpowergating.c=    -Wno-missing-prototypes -Wno-enum-conversion
CWARNFLAGS.tonga_hwmgr.c=       -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_processpptables.c=     -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.tonga_thermal.c=     -Wno-missing-prototypes
CWARNFLAGS.tonga_smumgr.c=      -Wno-missing-prototypes -Wno-cast-qual
CWARNFLAGS.fiji_smumgr.c=       -Wno-missing-prototypes -Wno-cast-qual





 > 
 > Regards,
 > Christian.
 > 
 > Am 13.06.2016 um 03:35 schrieb Matthew Macy:
 > >
 > > I'm trying to bring up amdgpu on a Carrizo A10 (Thinkpad e565 in case it matters) on FreeBSD. The driver is essentially unmodified from what is found in Linux 4.6, relying on an extended version of FreeBSD's linuxkpi shims. The shims work well enough that i915/drm from 4.6 works extremely well on most hardware (I have yet to diagnose / fix the severe artifacts on Cherry Trail and Atom).
 > >
 > > On my A10 ring 11 test is failing:
 > >    https://gist.github.com/mattmacy/8e4a85072648eceb2445ad227dcc447c
 > >
 > > On my friend's A12 based EliteBook ring initialization succeeds:
 > > https://gist.github.com/mattmacy/d1fac64ab5190bb2568d6480dfbd7ee6
 > >
 > > With minor timing perturbations ring tests will fail as early as ring 0.
 > >
 > > I'm hoping that one of the amdgpu developers might give me pointers on how to diagnose further and/or what bugs in the linuxkpi might be causing this. I know that I can selectively disable the rings, but that doesn't help fix the underlying problem.
 > >
 > > Thanks in advance.
 > >
 > > -M
 > >
 > > _______________________________________________
 > > dri-devel mailing list
 > > dri-devel at lists.freedesktop.org
 > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
 > 


