Looking for pointers on diagnosing ring test failure in amdgpu

Alex Deucher alexdeucher at gmail.com
Tue Jun 14 13:02:09 UTC 2016


On Tue, Jun 14, 2016 at 4:10 AM, Christian König
<christian.koenig at amd.com> wrote:
> Hi Matthew,
>
> see inline below.
>
> Am 14.06.2016 um 00:03 schrieb Matthew Macy:
>>
>>   ---- On Mon, 13 Jun 2016 01:35:34 -0700 Christian König
>> <christian.koenig at amd.com> wrote ----
>>   > Hi Matthew,
>>   >
>>   > sounds like the UVD block doesn't want to initialize. No idea off hand
>>   > why, could be anything. I would need the hardware here for a closer
>>   > inspection.
>>   >
>>   > As a workaround, you can try to disable the UVD block using the
>>   > ip_block_mask module parameter (it's a bitmask of enabled blocks, e.g.
>>   > 0xffffffff means all blocks enabled; UVD is bit 7 on Carrizo IIRC).
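The ip_block_mask arithmetic described above can be sketched in C as follows. This is a minimal illustration, not driver code. The macro and function names are hypothetical. The all-enabled default of 0xffffffff comes from the text. The UVD bit index of 7 on Carrizo is taken from the "IIRC" remark and should be checked against the IP block list for the ASIC in question.

```c
#include <assert.h>
#include <stdint.h>

/* Default from the text: every IP block enabled. */
#define EXAMPLE_ALL_IP_BLOCKS UINT32_C(0xffffffff)

/* Hypothetical name. UVD's bit on Carrizo per the "bit 7 ... IIRC" remark;
 * verify against the ASIC's IP block list before relying on it. */
#define EXAMPLE_CARRIZO_UVD_BIT 7u

/* Return `mask` with the enable bit for IP block `bit` cleared. */
static uint32_t example_ip_mask_disable(uint32_t mask, unsigned int bit)
{
	assert(bit < 32); /* shifting a 32-bit value by >= 32 is undefined */
	return mask & ~(UINT32_C(1) << bit);
}

/* Return 1 if IP block `bit` is enabled in `mask`, 0 otherwise. */
static int example_ip_mask_enabled(uint32_t mask, unsigned int bit)
{
	assert(bit < 32);
	return (int)((mask >> bit) & 1u);
}
```

Clearing bit 7 from the all-enabled default gives 0xffffff7f. On Linux that would be passed as amdgpu.ip_block_mask=0xffffff7f on the kernel command line; how the FreeBSD port exposes module parameters is not covered in this thread.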
>>
>>
>> When I clear bit 7 I get the following now:
>>
>> Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
>> Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
>> Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR* amdgpu: smc start failed
>> Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22
>> Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed
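The SMU message compares an expected value with what it read back. Assuming that value is a bitmask with one bit per firmware image (an assumption; the thread does not say which bit maps to which image), a small hypothetical helper like the one below lists the bits that never got set:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bits expected to be set but not reported. Assumes (hypothetically) that
 * the SMU's "firmware loaded" status is a bitmask, one bit per image. */
static uint32_t example_smu_missing_bits(uint32_t expected, uint32_t got)
{
	return expected & ~got;
}

/* Print the index of every missing bit, one per line, and return how
 * many bits were missing. */
static unsigned int example_smu_report_missing(uint32_t expected, uint32_t got)
{
	uint32_t missing = example_smu_missing_bits(expected, got);
	unsigned int count = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (missing & (UINT32_C(1) << bit)) {
			printf("firmware status bit %u not set\n", bit);
			count++;
		}
	}
	return count;
}
```

With the values from the log (expecting 0x17f, getting 0x0), all eight expected bits (0 through 6, and 8) come back missing, so under that assumption none of the images was reported as loaded. That would be consistent with Alex's suggestion in this thread that the SMU cannot reach its firmware through GART, though it does not prove it.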
>
>
> UVD is optional (as long as you don't want to do hardware video decoding)
> but the SMU isn't. Alex, Rex any idea what's going wrong here?
>

Seems like maybe the two issues are related.  Maybe some general MMIO
issue on that particular system, or an issue with the MC or GART setup?
The firmware that the SMU loads is stored in GART, and all of the
engine rings are in GART.  Maybe a problem with the IOMMU setup on the
CPU?

Alex

>> Which is hard to correlate without spending a lot more quality time with
>> the driver than I've had time for yet.
>
>
> Yeah, I don't see why some blocks should fail while others seem to
> initialize just fine. Especially since you reported it seems to work on
> other hardware.
>
>> One thing that occurs to me is that Linux is usually compiled with gcc 6;
>> has amdgpu ever been tested when compiled with clang?
>
>
> Not as far as I know. We had some problems in the past even with some gcc
> versions because of some odd things in the BIOS headers (e.g. zero-sized
> arrays). But those issues should be fixed by now.
>
>> Below is a list of the warnings I have to disable in order to get amdgpu
>> to compile without disabling -Werror altogether. The -Wno-format is an
>> artifact of clang or FreeBSD treating long long and uint64_t as distinct
>> types, and the -Wno-pointer-arith is to accept the Linux convention of doing
>> pointer arithmetic on void pointers. All the others are arguably oversights
>> in the code (similar silencing has to be done in i915, but I've had better
>> luck with it to date). I haven't fixed the warnings because I try to treat
>> the driver as vendor code and minimize local changes. Will you accept
>> quasi-cosmetic patches from other operating systems / compilers?
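The two platform-specific warnings described above usually have portable fixes that also build cleanly on Linux, where the kernel defines u64 (and hence uint64_t) as unsigned long long. A minimal sketch with hypothetical function names; the fence log's gpu addr format is reused only as test data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* -Wformat: on FreeBSD/amd64 uint64_t is `unsigned long`, so passing it for
 * "%llx" draws a warning even though both types are 64 bits wide. An
 * explicit cast matches "%llx" on every platform, Linux included. */
static int example_format_gpu_addr(char *buf, size_t len, uint64_t gpu_addr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "gpu addr 0x%016llx",
			(unsigned long long)gpu_addr);
}

/* -Wpointer-arith: `base + offset` on a void * is a GNU C extension that
 * treats sizeof(void) as 1. Going through char * gives the same byte
 * offset in standard C. */
static void *example_byte_offset(void *base, size_t offset)
{
	return (char *)base + offset;
}
```

The cast leaves the existing "%llx" format strings untouched, which fits the goal of keeping local changes to vendor code small.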
>
>
> Yeah, sure, feel free to provide patches. As long as they are only cleanups
> and not structural changes, it should be trivial to get them merged.
>
> Especially "-Wno-missing-prototypes" and "-Wno-unused-variable" sound like
> something which should be trivial to fix.
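Fixes for -Wmissing-prototypes usually take one of two forms, and -Wunused-variable is normally fixed by deleting the variable. The sketch below is illustrative only; the function names are hypothetical and not taken from amdgpu:

```c
#include <assert.h>

/* Option 1: the helper is only used in this file, so give it internal
 * linkage. -Wmissing-prototypes does not apply to static functions. */
static int example_clamp_percent(int value)
{
	if (value < 0)
		return 0;
	if (value > 100)
		return 100;
	return value;
}

/* Option 2: the function has callers in other files, so its prototype
 * belongs in a shared header. It is written inline here only to keep the
 * example self-contained; in a driver it would come from an #include. */
int example_fan_speed_percent(int requested);

int example_fan_speed_percent(int requested)
{
	/* -Wunused-variable: a local such as `int tmp;` that is never read
	 * would simply be deleted here rather than silenced. */
	return example_clamp_percent(requested);
}
```

Which option applies depends on whether the function has callers outside its own file, which is worth checking per function before sending cleanup patches.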
>
> Regards,
> Christian.
>
>
>>
>> Thanks.
>>
>> -M
>>
>>
>> CWARNFLAGS+=    -Wno-pointer-arith
>> CWARNFLAGS+=    -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}
>>
>> CWARNFLAGS.amdgpu_acpi.c=       -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
>> CWARNFLAGS.amdgpu_amdkfd.c=     -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_bo_list.c=    -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_device.c=     -Wno-format -Wno-cast-qual
>> CWARNFLAGS.amdgpu_fence.c=      -Wno-format
>> CWARNFLAGS.amdgpu_gfx.c=        -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c=      -Wno-cast-qual
>> CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c=      -Wno-cast-qual
>> CWARNFLAGS.amdgpu_atpx_handler.c=       -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
>> CWARNFLAGS.amdgpu_ioc32.c=      -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_object.c=     -Wno-format
>> CWARNFLAGS.amdgpu_mn.c=         -Wno-unused-variable
>> CWARNFLAGS.amdgpu_pll.c=        -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_pm.c=         -Wno-missing-prototypes -Wno-enum-conversion
>> CWARNFLAGS.amdgpu_ring.c=       -Wno-cast-qual
>> CWARNFLAGS.amdgpu_ttm.c=        -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_ucode.c=      -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
>> CWARNFLAGS.amdgpu_uvd.c=        -Wno-format
>> CWARNFLAGS.amdgpu_vce.c=        -Wno-format
>> CWARNFLAGS.amdgpu_vm.c=         -Wno-format
>> CWARNFLAGS.amdgpu_test.c=       -Wno-format
>> CWARNFLAGS.atombios_crtc.c=     -Wno-missing-prototypes
>> CWARNFLAGS.atombios_dp.c=       -Wno-format
>> CWARNFLAGS.atombios_i2c.c=      -Wno-missing-prototypes
>> CWARNFLAGS.ci_dpm.c=    -Wno-unused-const-variable
>> CWARNFLAGS.cz_smc.c=    -Wno-missing-prototypes
>> CWARNFLAGS.fiji_smc.c=  -Wno-cast-qual
>> CWARNFLAGS.gfx_v7_0.c=  -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.gfx_v8_0.c=  -Wno-missing-prototypes
>> CWARNFLAGS.iceland_smc.c=       -Wno-missing-prototypes
>> CWARNFLAGS.kv_dpm.c=    -Wno-unused-const-variable
>> CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
>> CWARNFLAGS.gpu_scheduler.c=     -Wno-format -Wno-missing-prototypes
>> CWARNFLAGS.amd_powerplay.c=     -Wno-missing-prototypes
>> CWARNFLAGS.eventtasks.c=        -Wno-missing-prototypes
>> CWARNFLAGS.cz_clockpowergating.c=       -Wno-missing-prototypes -Wno-enum-conversion
>> CWARNFLAGS.cz_hwmgr.c=  -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.fiji_hwmgr.c=        -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.fiji_thermal.c=      -Wno-missing-prototypes
>> CWARNFLAGS.pp_acpi.c=   -Wno-missing-prototypes
>> CWARNFLAGS.ppatomctrl.c=        -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.processpptables.c=   -Wno-missing-prototypes -Wno-sometimes-uninitialized
>> CWARNFLAGS.tonga_clockpowergating.c=    -Wno-missing-prototypes -Wno-enum-conversion
>> CWARNFLAGS.tonga_hwmgr.c=       -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.tonga_processpptables.c=     -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.tonga_thermal.c=     -Wno-missing-prototypes
>> CWARNFLAGS.tonga_smumgr.c=      -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.fiji_smumgr.c=       -Wno-missing-prototypes -Wno-cast-qual
>>
>>   >
>>   > Regards,
>>   > Christian.
>>   >
>>   > Am 13.06.2016 um 03:35 schrieb Matthew Macy:
>>   > >
>>   > > I'm trying to bring up amdgpu on a Carrizo A10 (ThinkPad E565, in case
>>   > > it matters) on FreeBSD. The driver is essentially unmodified from what
>>   > > is found in Linux 4.6, relying on an extended version of FreeBSD's
>>   > > linuxkpi shims. The shims work well enough that i915/drm from 4.6 works
>>   > > extremely well on most hardware (I have yet to diagnose / fix the severe
>>   > > artifacts on Cherry Trail and Atom).
>>   > >
>>   > > On my A10, the ring 11 test is failing:
>>   > >    https://gist.github.com/mattmacy/8e4a85072648eceb2445ad227dcc447c
>>   > >
>>   > > On my friend's A12-based EliteBook, ring initialization succeeds:
>>   > > https://gist.github.com/mattmacy/d1fac64ab5190bb2568d6480dfbd7ee6
>>   > >
>>   > > With minor timing perturbations, ring tests will fail as early as ring 0.
>>   > >
>>   > > I'm hoping that one of the amdgpu developers might give me pointers on
>>   > > how to diagnose this further and/or what bugs in the linuxkpi might be
>>   > > causing it. I know that I can selectively disable the rings, but that
>>   > > doesn't help fix the underlying problem.
>>   > >
>>   > > Thanks in advance.
>>   > >
>>   > > -M
>>   > >
>>   > > _______________________________________________
>>   > > dri-devel mailing list
>>   > > dri-devel at lists.freedesktop.org
>>   > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>   >
>>   >
>>
>

