Looking for pointers on diagnosing ring test failure in amdgpu
Alex Deucher
alexdeucher at gmail.com
Tue Jun 14 13:02:09 UTC 2016
On Tue, Jun 14, 2016 at 4:10 AM, Christian König
<christian.koenig at amd.com> wrote:
> Hi Matthew,
>
> see inline below.
>
> Am 14.06.2016 um 00:03 schrieb Matthew Macy:
>>
>> ---- On Mon, 13 Jun 2016 01:35:34 -0700 Christian König
>> <christian.koenig at amd.com> wrote ----
>> > Hi Matthew,
>> >
>> > sounds like the UVD block doesn't want to initialize. No idea off hand
>> > why, could be anything. I would need the hardware here for a closer
>> > inspection.
>> >
>> > For a workaround you can try to disable the UVD block using the
>> > ip_block_mask module parameter (it's a bitmask of enabled blocks e.g.
>> > 0xffffffff means all blocks enabled, UVD is bit 7 on Carrizo IIRC).
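The mask arithmetic described above can be sketched as follows. This is only an illustration: the bit position for UVD is taken from the "bit 7 on Carrizo IIRC" remark and is not verified here, and the helper name disable_block is made up for the example. How the resulting value is handed to the driver depends on the platform.

```python
# Sketch: build an ip_block_mask value with one IP block disabled.
ALL_BLOCKS = 0xFFFFFFFF  # every bit set: all blocks enabled (per the thread)
UVD_BIT = 7              # assumption taken from the thread, not verified


def disable_block(mask: int, bit: int) -> int:
    """Return mask with the given bit cleared, kept within 32 bits."""
    return mask & ~(1 << bit) & 0xFFFFFFFF


mask = disable_block(ALL_BLOCKS, UVD_BIT)
print(f"ip_block_mask={mask:#010x}")  # prints ip_block_mask=0xffffff7f
```

Clearing a bit that is already clear leaves the mask unchanged, so the helper can be applied repeatedly to disable several blocks.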
>>
>>
>> When I clear bit 7 I get the following now:
>>
>> Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 10 use gpu
>> addr 0x00000000400000b0, cpu addr 0x0xfffff800bd4320b0
>> Jun 14 07:58:18 trainwreck kernel: drmn0: fence driver on ring 11 use gpu
>> addr 0x00000000400000c0, cpu addr 0x0xfffff800bd4320c0
>> Jun 14 07:58:19 trainwreck kernel: drmn0: SMU check loaded firmware
>> failed, expecting 0x17f, getting 0x0[drm:0xffffffff826d4dc4s] *ERROR*
>> amdgpu: smc start failed
>> Jun 14 07:58:19 trainwreck kernel: [drm:0xffffffff8269fc40s] *ERROR*
>> hw_init 3 failed -22
>> Jun 14 07:58:19 trainwreck kernel: drmn0: amdgpu_init failed
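For reading the "hw_init 3 failed -22" line: the driver reports negative errno values, and 22 is EINVAL on both Linux and FreeBSD. A small sketch for decoding such lines is below. The regular expression and the function name decode_failure are illustrative assumptions, and the first number is presumably the index of the failing IP block in the driver's block list rather than a hardware ID.

```python
import errno
import re

# Log line copied from the report above.
LOG_LINE = "[drm:0xffffffff8269fc40s] *ERROR* hw_init 3 failed -22"


def decode_failure(line):
    """Extract (block index, raw code, errno name) from an hw_init failure line."""
    match = re.search(r"hw_init (\d+) failed (-?\d+)", line)
    if match is None:
        return None
    block_index = int(match.group(1))
    code = int(match.group(2))
    # errno.errorcode maps the host's errno numbers to names; 22 is EINVAL.
    name = errno.errorcode.get(abs(code), "UNKNOWN")
    return block_index, code, name


print(decode_failure(LOG_LINE))  # prints (3, -22, 'EINVAL')
```

The name lookup uses the errno table of the machine running the script, so for less common codes it may differ from the kernel that produced the log.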
>
>
> UVD is optional (as long as you don't want to do hardware video decoding)
> but the SMU isn't. Alex, Rex any idea what's going wrong here?
>
Seems like maybe the two issues are related. Maybe some general MMIO
issue on that particular system or an issue with the MC or GART setup?
The firmware that the SMU loads is stored in GART, and all of the
engine rings are in GART. Maybe a problem with the IOMMU setup on the
CPU?

Alex
>> Which is hard to correlate without spending a lot more quality time with
>> the driver than I've had time for yet.
>
>
> Yeah, I don't see why some blocks should fail while others seem to
> initialize just fine. Especially since you reported it seems to work on
> other hardware.
>
>> One thing that occurs to me is that Linux is usually compiled with gcc6 -
>> has amdgpu ever been tested as compiled with clang?
>
>
> Not as far as I know. We had some problems in the past even with some gcc
> versions because of some odd things in the BIOS headers (e.g. zero sized
> arrays). But those issues should be fixed by now.
>
>> Below is a list of the warnings I have to disable in order to get amdgpu
>> to compile without disabling Werror altogether. The -Wno-format is an
>> artifact of clang or FreeBSD treating long long and uint64_t as distinct
>> types and the -Wno-pointer-arith is to accept the linux convention of doing
>> pointer arithmetic on void pointers. All the others are arguably oversights
>> in the code (similar silencing has to be done in i915, but I've had better
>> luck with it to date). I haven't fixed the warnings because I try to treat
>> it as vendor code and minimize any local changes. Will you accept
>> quasi-cosmetic patches from other operating systems / compilers?
>
>
> Yeah, sure feel free to provide patches. As long as it is only cleanup and
> not structural changes it should be trivial to get them merged.
>
> Especially "-Wno-missing-prototypes" and "-Wno-unused-variable" sound like
> something which should be trivial to fix.
>
> Regards,
> Christian.
>
>
>>
>> Thanks.
>>
>> -M
>>
>>
>> CWARNFLAGS+= -Wno-pointer-arith
>> CWARNFLAGS+= -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}
>>
>> CWARNFLAGS.amdgpu_acpi.c= -Wno-int-conversion
>> -Wno-missing-prototypes -Wno-unused-variable
>> CWARNFLAGS.amdgpu_amdkfd.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_bo_list.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_device.c= -Wno-format -Wno-cast-qual
>> CWARNFLAGS.amdgpu_fence.c= -Wno-format
>> CWARNFLAGS.amdgpu_gfx.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c= -Wno-cast-qual
>> CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c= -Wno-cast-qual
>> CWARNFLAGS.amdgpu_atpx_handler.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
>> CWARNFLAGS.amdgpu_ioc32.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_object.c= -Wno-format
>> CWARNFLAGS.amdgpu_mn.c= -Wno-unused-variable
>> CWARNFLAGS.amdgpu_pll.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_pm.c= -Wno-missing-prototypes
>> -Wno-enum-conversion
>> CWARNFLAGS.amdgpu_ring.c= -Wno-cast-qual
>> CWARNFLAGS.amdgpu_ttm.c= -Wno-missing-prototypes
>> CWARNFLAGS.amdgpu_ucode.c=
>> -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
>> CWARNFLAGS.amdgpu_uvd.c= -Wno-format
>> CWARNFLAGS.amdgpu_vce.c= -Wno-format
>> CWARNFLAGS.amdgpu_vm.c= -Wno-format
>> CWARNFLAGS.amdgpu_test.c= -Wno-format
>> CWARNFLAGS.atombios_crtc.c= -Wno-missing-prototypes
>> CWARNFLAGS.atombios_dp.c= -Wno-format
>> CWARNFLAGS.atombios_i2c.c= -Wno-missing-prototypes
>> CWARNFLAGS.ci_dpm.c= -Wno-unused-const-variable
>> CWARNFLAGS.cz_smc.c= -Wno-missing-prototypes
>> CWARNFLAGS.fiji_smc.c= -Wno-cast-qual
>> CWARNFLAGS.gfx_v7_0.c= -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.gfx_v8_0.c= -Wno-missing-prototypes
>> CWARNFLAGS.iceland_smc.c= -Wno-missing-prototypes
>> CWARNFLAGS.kv_dpm.c= -Wno-unused-const-variable
>> CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
>> CWARNFLAGS.gpu_scheduler.c= -Wno-format -Wno-missing-prototypes
>> CWARNFLAGS.amd_powerplay.c= -Wno-missing-prototypes
>> CWARNFLAGS.eventtasks.c= -Wno-missing-prototypes
>> CWARNFLAGS.cz_clockpowergating.c= -Wno-missing-prototypes
>> -Wno-enum-conversion
>> CWARNFLAGS.cz_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.fiji_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.fiji_thermal.c= -Wno-missing-prototypes
>> CWARNFLAGS.pp_acpi.c= -Wno-missing-prototypes
>> CWARNFLAGS.ppatomctrl.c= -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.processpptables.c= -Wno-missing-prototypes
>> -Wno-sometimes-uninitialized
>> CWARNFLAGS.tonga_clockpowergating.c= -Wno-missing-prototypes
>> -Wno-enum-conversion
>> CWARNFLAGS.tonga_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.tonga_processpptables.c= -Wno-missing-prototypes
>> -Wno-cast-qual
>> CWARNFLAGS.tonga_thermal.c= -Wno-missing-prototypes
>> CWARNFLAGS.tonga_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
>> CWARNFLAGS.fiji_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
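To decide which warnings to clean up first, the per-file list above can be tallied by flag. The sketch below parses a short excerpt of the fragment (not the whole list) and counts how many files need each flag. The function names are made up for the example, and the parser only assumes the "CWARNFLAGS.<file>= <flags>" shape seen here, with mail-wrapped continuation lines starting with "-W".

```python
from collections import Counter

# Short excerpt of the list above, including two mail-wrapped continuation lines.
FRAGMENT = """\
CWARNFLAGS.amdgpu_acpi.c= -Wno-int-conversion
-Wno-missing-prototypes -Wno-unused-variable
CWARNFLAGS.amdgpu_mn.c= -Wno-unused-variable
CWARNFLAGS.amdgpu_pm.c= -Wno-missing-prototypes
-Wno-enum-conversion
CWARNFLAGS.amdgpu_device.c= -Wno-format -Wno-cast-qual
"""


def flags_per_file(text):
    """Map each source file to the set of warning flags silenced for it."""
    per_file = {}
    current = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quote prefixes
        if not line:
            continue
        if line.startswith("CWARNFLAGS.") and "=" in line:
            name, _, rest = line.partition("=")
            current = name[len("CWARNFLAGS."):]
            per_file.setdefault(current, set()).update(rest.split())
        elif current is not None and line.startswith("-W"):
            # Wrapped line: flags belong to the previous file entry.
            per_file[current].update(line.split())
    return per_file


def tally(per_file):
    """Count the number of files that need each flag."""
    return Counter(flag for flags in per_file.values() for flag in flags)


counts = tally(flags_per_file(FRAGMENT))
for flag, n in counts.most_common():
    print(f"{n:3d}  {flag}")
```

Because flags are stored in a set per file, an entry that is listed twice for the same file is counted once.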
>>
>>
>>
>>
>>
>> >
>> > Regards,
>> > Christian.
>> >
>> > Am 13.06.2016 um 03:35 schrieb Matthew Macy:
>> > >
>> > > I'm trying to bring up amdgpu on a Carrizo A10 (Thinkpad e565 in case it
>> > > matters) on FreeBSD. The driver is essentially unmodified from what is
>> > > found in Linux 4.6, relying on an extended version of FreeBSD's linuxkpi
>> > > shims. The shims work well enough that i915/drm from 4.6 works extremely
>> > > well on most hardware (I have yet to diagnose / fix the severe artifacts
>> > > on Cherry Trail and Atom).
>> > >
>> > > On my A10 ring 11 test is failing:
>> > > https://gist.github.com/mattmacy/8e4a85072648eceb2445ad227dcc447c
>> > >
>> > > On my friend's A12 based EliteBook ring initialization succeeds:
>> > > https://gist.github.com/mattmacy/d1fac64ab5190bb2568d6480dfbd7ee6
>> > >
>> > > With minor timing perturbations, ring tests will fail as early as
>> > > ring 0.
>> > >
>> > > I'm hoping that one of the amdgpu developers might give me pointers on
>> > > how to diagnose this further and/or what bugs in the linuxkpi might be
>> > > causing it. I know that I can selectively disable the rings, but that
>> > > doesn't help fix the underlying problem.
>> > >
>> > > Thanks in advance.
>> > >
>> > > -M
>> > >
>> > > _______________________________________________
>> > > dri-devel mailing list
>> > > dri-devel at lists.freedesktop.org
>> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >
>>