Regression on gfx8 with ring init

Christian König ckoenig.leichtzumerken at gmail.com
Tue Sep 18 14:41:22 UTC 2018


CRTC and GFX interrupts seem to be working perfectly fine.

The problem looks like only the EOP interrupts from the compute queue
are not handled correctly.

Most likely a bug somewhere in gfx_v8_0_eop_irq().

Christian.

On 18.09.2018 at 16:36, Deucher, Alexander wrote:
>
> FWIW, a number of consumer Raven boards have bad IVRS tables (Windows
> doesn't use interrupt remapping, so they are sometimes wrong and
> probably not validated).  There are a number of workarounds to manually
> override the IVRS tables to make interrupts work.  I think specifying
> pci=noacpi is also a possible workaround.
>
>
> Alex
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of 
> Christian König <christian.koenig at amd.com>
> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
> *Subject:* Re: Regression on gfx8 with ring init
>
> Well, it looks like interrupt processing is working perfectly fine.
>
> But looking at the error message once more I see that this actually
> affects ring number 9 and not the GFX ring.
>
> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
> number?
>
> That must be one of the compute rings.
>
> Thanks,
> Christian.
>
> On 18.09.2018 at 16:20, Tom St Denis wrote:
> > On 2018-09-18 10:13 a.m., Christian König wrote:
> >> Mhm, there is no failed IB test in there any more, is there?
> >
> > Oh sorry, I thought you wanted to test HEAD~ ... Attached is a log from
> > the tip of drm-next.
> >
> > Tom
> >
> >>
> >> Christian.
> >>
> >> On 18.09.2018 at 16:09, Tom St Denis wrote:
> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
> >>>
> >>> Here's the log.
> >>>
> >>> Tom
> >>>
> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
> >>>> Odd, I couldn't even boot my system with the dGPU as primary after
> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
> >>>> of AMD-Vi IOMMU errors), which I wasn't able to capture because it
> >>>> panicked before loading the network stack.
> >>>>
> >>>> Bizarre.
> >>>>
> >>>> I'll keep trying.
> >>>>
> >>>> Tom
> >>>>
> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
> >>>>> On 18.09.2018 at 15:32, Tom St Denis wrote:
> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
> >>>>>>> Great, not sure if that is good or bad news.
> >>>>>>>
> >>>>>>> Anyway, going to revert the change for now. Does anybody
> >>>>>>> volunteer to figure out why interrupts sometimes don't work
> >>>>>>> correctly on Raven?
> >>>>>>
> >>>>>> What exactly doesn't work correctly?  My workstation is a Raven1
> >>>>>> (Ryzen 2400G) and, other than the TTM bulk move issue, it has been
> >>>>>> perfectly stable (through suspend/resumes too, I might add).
> >>>>>>
> >>>>>> Anything I could test with my devel raven?
> >>>>>
> >>>>> The problem seems to be that on some boards IH handling doesn't
> >>>>> work as it should.
> >>>>>
> >>>>> Can you try to disable the onboard graphics and try again?
> >>>>>
> >>>>> If that still doesn't work, there is a DRM_DEBUG in
> >>>>> amdgpu_ih_process(); make that a DRM_ERROR and send me the
> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
> >>>>>
> >>>>> Thanks,
> >>>>> Christian.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> Tom
> >>>>>>
> >>>>>>>
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>> On 18.09.2018 at 15:27, Tom St Denis wrote:
> >>>>>>>> This commit:
> >>>>>>>>
> >>>>>>>> [root@raven linux]# git bisect good
> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
> >>>>>>>> Author: Christian König <christian.koenig at amd.com>
> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
> >>>>>>>>
> >>>>>>>>     drm/amdgpu: remove fence fallback
> >>>>>>>>
> >>>>>>>>     DC doesn't seem to have a fallback path either.
> >>>>>>>>
> >>>>>>>>     So when interrupts doesn't work any more we are pretty much busted no
> >>>>>>>>     matter what.
> >>>>>>>>
> >>>>>>>>     Signed-off-by: Christian König <christian.koenig at amd.com>
> >>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou at amd.com>
> >>>>>>>>
> >>>>>>>> Results in this:
> >>>>>>>>
> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
> >>>>>>>> [   28.506708] fuse init (API version 7.27)
> >>>>>>>>
> >>>>>>>> On init with my polaris/raven1 system.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Tom
> >>>>>>>> _______________________________________________
> >>>>>>>> amd-gfx mailing list
> >>>>>>>> amd-gfx at lists.freedesktop.org
> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
