Looking for pointers on diagnosing ring test failure in amdgpu
Matthew Macy
mmacy at nextbsd.org
Tue Jun 14 20:08:33 UTC 2016
>
> The two issues are definitely related. They both go through a bounded delay loop waiting for some operation to complete.
>
I realized that sounded really dumb after I sent it. But what makes me think it's all related is that timing perturbations / random, seemingly unrelated code changes can cause it to fail in about 3 or 4 distinct ways: sometimes as early as ring 0, other times ring 1, other times ring 11, and in this case on the SMU firmware check. Whereas on my friend's EliteBook, no matter what I do, it gets to the point of switching from the efifb to the framebuffer set up by amdgpu. So it looks like I just made a really unfortunate choice of bring-up device.
-M
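
For anyone following the thread, the bounded delay loop pattern referred to above can be sketched roughly as follows. This is a minimal, self-contained illustration in C and not the amdgpu code: fake_reg, fake_reg_read and poll_bounded are hypothetical names, and the register values are placeholders. It only shows why a shift in timing can turn a pass into a failure when the loop bound is fixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register polled by a ring test. */
struct fake_reg {
    uint32_t value;       /* current register contents */
    uint32_t ready_after; /* reads that must happen before the "hardware" completes */
    uint32_t reads;       /* reads performed so far */
};

/* Simulated read: the register shows done_value only after ready_after earlier reads. */
static uint32_t fake_reg_read(struct fake_reg *r, uint32_t done_value)
{
    if (r->reads++ >= r->ready_after)
        r->value = done_value;
    return r->value;
}

/*
 * Bounded delay loop: poll at most timeout_iters times.
 * Returns the iteration index at which done_value was seen, or -1 on timeout.
 */
static int poll_bounded(struct fake_reg *r, uint32_t done_value,
                        uint32_t timeout_iters)
{
    assert(r != NULL);
    for (uint32_t i = 0; i < timeout_iters; i++) {
        if (fake_reg_read(r, done_value) == done_value)
            return (int)i;
        /* A real driver would insert a short delay here, e.g. udelay(1). */
    }
    return -1;
}
```

With ready_after set to 5, a call with timeout_iters of 10 returns 5, while the same simulated hardware with timeout_iters of 5 returns -1. Anything that changes how long each iteration effectively takes (or how long the hardware needs) moves a device from the first case to the second.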
> By default FreeBSD doesn't use the IOMMU on x86 so that's not an issue.
>
> One thing that is different between the EliteBook (A12) and the Thinkpad (A10) is that the Thinkpad has both integrated and discrete GPUs. 0x6660 matches Hainan in drm_pciids.h, which I guess is GCN 1.0? Could that possibly be an issue? I know amdgpu doesn't currently support pre-GCN 1.1 parts, so I would assume it would just be ignored. Nonetheless, I thought I should bring it up just in case.
>
>
> vgapci0 at pci0:0:1:0: class=0x030000 card=0x511617aa chip=0x98741002 rev=0xc5 hdr=0x00
> vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
> device = 'Carrizo'
> class = display
> subclass = VGA
> <...>
> vgapci1 at pci0:5:0:0: class=0x038000 card=0x511617aa chip=0x66601002 rev=0x83 hdr=0x00
> vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
> device = 'Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330]'
> class = display
>
> Thanks.
> -M
>
>
> >
> >Alex
> >
> >>> Which is hard to correlate without spending a lot more quality time with
> >>> the driver than I've had time for yet.
> >>
> >>
> >> Yeah, I don't see why some blocks should fail while others seem to
> >> initialize just fine. Especially since you reported it seems to work on
> >> other hardware.
> >>
> >>> One thing that occurs to me is that Linux is usually compiled with gcc6 -
> >>> has amdgpu ever been tested as compiled with clang?
> >>
> >>
> >> Not as far as I know. We had some problems in the past even with some gcc
> >> versions because of some odd things in the BIOS headers (e.g. zero sized
> >> arrays). But those issues should be fixed by now.
> >>
> >>> Below is a list of the warnings I have to disable in order to get amdgpu
> >>> to compile without disabling Werror altogether. The -Wno-format is an
> >>> artifact of clang or FreeBSD treating long long and uint64_t as distinct
> >>> types and the -Wno-pointer-arith is to accept the linux convention of doing
> >>> pointer arithmetic on void pointers. All the others are arguably oversights
> >>> in the code (similar silencing has to be done in i915, but I've had better
> >>> luck with it to date). I haven't fixed the warnings because I try to treat
> >>> it as vendor code and minimize any local changes. Will you accept
> >>> quasi-cosmetic patches from other operating systems / compilers?
> >>
> >>
> >> Yeah, sure feel free to provide patches. As long as it is only cleanup and
> >> not structural changes it should be trivial to get them merged.
> >>
> >> Especially "-Wno-missing-prototypes" and "-Wno-unused-variable" sound like
> >> something which should be trivial to fix.
> >>
> >> Regards,
> >> Christian.
> >>
> >>
> >>>
> >>> Thanks.
> >>>
> >>> -M
> >>>
> >>>
> >>> CWARNFLAGS+= -Wno-pointer-arith
> >>> CWARNFLAGS+= -Wno-pointer-sign ${CWARNFLAGS.${.IMPSRC:T}}
> >>>
> >>> CWARNFLAGS.amdgpu_acpi.c= -Wno-int-conversion -Wno-missing-prototypes -Wno-unused-variable
> >>> CWARNFLAGS.amdgpu_amdkfd.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_bo_list.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_cs.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_device.c= -Wno-format -Wno-cast-qual
> >>> CWARNFLAGS.amdgpu_fence.c= -Wno-format
> >>> CWARNFLAGS.amdgpu_gfx.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_amdkfd_gfx_v7.c= -Wno-cast-qual
> >>> CWARNFLAGS.amdgpu_amdkfd_gfx_v8.c= -Wno-cast-qual
> >>> CWARNFLAGS.amdgpu_atpx_handler.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_ih.c= -Wno-cast-qual
> >>> CWARNFLAGS.amdgpu_ioc32.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_object.c= -Wno-format
> >>> CWARNFLAGS.amdgpu_mn.c= -Wno-unused-variable
> >>> CWARNFLAGS.amdgpu_pll.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_pm.c= -Wno-missing-prototypes -Wno-enum-conversion
> >>> CWARNFLAGS.amdgpu_ring.c= -Wno-cast-qual
> >>> CWARNFLAGS.amdgpu_ttm.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.amdgpu_ucode.c= -Wno-incompatible-pointer-types-discards-qualifiers -Wno-cast-qual
> >>> CWARNFLAGS.amdgpu_uvd.c= -Wno-format
> >>> CWARNFLAGS.amdgpu_vce.c= -Wno-format
> >>> CWARNFLAGS.amdgpu_vm.c= -Wno-format
> >>> CWARNFLAGS.amdgpu_test.c= -Wno-format
> >>> CWARNFLAGS.atombios_crtc.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.atombios_dp.c= -Wno-format
> >>> CWARNFLAGS.atombios_i2c.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.ci_dpm.c= -Wno-unused-const-variable
> >>> CWARNFLAGS.cz_smc.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.fiji_smc.c= -Wno-cast-qual
> >>> CWARNFLAGS.gfx_v7_0.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.gfx_v8_0.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.iceland_smc.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.kv_dpm.c= -Wno-unused-const-variable
> >>> CWARNFLAGS.tonga_smc.c= -Wno-cast-qual
> >>> CWARNFLAGS.gpu_scheduler.c= -Wno-format -Wno-missing-prototypes
> >>> CWARNFLAGS.amd_powerplay.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.eventtasks.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.cz_clockpowergating.c= -Wno-missing-prototypes -Wno-enum-conversion
> >>> CWARNFLAGS.cz_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.fiji_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.fiji_thermal.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.pp_acpi.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.ppatomctrl.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.processpptables.c= -Wno-missing-prototypes -Wno-sometimes-uninitialized
> >>> CWARNFLAGS.tonga_clockpowergating.c= -Wno-missing-prototypes -Wno-enum-conversion
> >>> CWARNFLAGS.tonga_hwmgr.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.tonga_processpptables.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.tonga_thermal.c= -Wno-missing-prototypes
> >>> CWARNFLAGS.tonga_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
> >>> CWARNFLAGS.fiji_smumgr.c= -Wno-missing-prototypes -Wno-cast-qual
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> >
> >>> > Regards,
> >>> > Christian.
> >>> >
> >>> > Am 13.06.2016 um 03:35 schrieb Matthew Macy:
> >>> > >
> >>> > > I'm trying to bring up amdgpu on a Carrizo A10 (Thinkpad e565 in case
> >>> > > it matters) on FreeBSD. The driver is essentially unmodified from what is
> >>> > > found in Linux 4.6 - relying on an extended version of FreeBSD's linuxkpi
> >>> > > shims. The shims work well enough that i915/drm from 4.6 works extremely
> >>> > > well on most hardware (I have yet to diagnose / fix the severe artifacts on
> >>> > > Cherry Trail and Atom).
> >>> > >
> >>> > > On my A10 ring 11 test is failing:
> >>> > > https://gist.github.com/mattmacy/8e4a85072648eceb2445ad227dcc447c
> >>> > >
> >>> > > On my friend's A12 based EliteBook ring initialization succeeds:
> >>> > > https://gist.github.com/mattmacy/d1fac64ab5190bb2568d6480dfbd7ee6
> >>> > >
> >>> > > With minor timing perturbations ring tests will fail as early as ring 0.
> >>> > >
> >>> > > I'm hoping that one of the amdgpu developers might give me pointers
> >>> > > on how to diagnose further and/or what bugs in the linuxkpi might be causing
> >>> > > this. I know that I can selectively disable the rings, but that doesn't help
> >>> > > fix the underlying problem.
> >>> > >
> >>> > > Thanks in advance.
> >>> > >
> >>> > > -M
> >>> > >
> >>> > > _______________________________________________
> >>> > > dri-devel mailing list
> >>> > > dri-devel at lists.freedesktop.org
> >>> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>> >
> >>> >
> >>>
> >>
> >
> >
>
>