Regression on gfx8 with ring init

Deucher, Alexander Alexander.Deucher at amd.com
Tue Sep 18 14:36:49 UTC 2018


FWIW, a number of consumer Raven boards have bad IVRS tables (Windows doesn't use interrupt remapping, so the tables are sometimes wrong and probably not validated).  There are a number of workarounds to manually override the IVRS tables to make interrupts work.  I think specifying pci=noacpi is also a possible workaround.


Alex

________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Christian König <christian.koenig at amd.com>
Sent: Tuesday, September 18, 2018 10:31:16 AM
To: StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
Subject: Re: Regression on gfx8 with ring init

Well, it looks like interrupt processing is working perfectly fine.

But looking at the error message once more I see that this actually
affects ring number 9 and not the GFX ring.

Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
number?

That must be one of the compute rings.
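
For illustration only, and not the actual driver patch: a minimal standalone sketch of the difference between reporting a ring by index and by name. The struct `demo_ring` and both helper functions are hypothetical stand-ins and do not reproduce the real amdgpu definitions; only the message text is taken from the log quoted below.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's ring structure; only the two
 * fields discussed in this thread are modeled. */
struct demo_ring {
    unsigned int idx;   /* position of the ring in the device's ring array */
    const char *name;   /* human-readable ring name */
};

/* Index style, as in the quoted log: the reader has to work out which
 * ring "9" is. */
static int format_by_index(char *buf, size_t len,
                           const struct demo_ring *ring, int err)
{
    return snprintf(buf, len, "failed testing IB on ring %u (%d).",
                    ring->idx, err);
}

/* Name style, as requested: the message identifies the ring directly. */
static int format_by_name(char *buf, size_t len,
                          const struct demo_ring *ring, int err)
{
    return snprintf(buf, len, "failed testing IB on %s (%d).",
                    ring->name, err);
}
```

Calling either helper with a buffer, a ring, and the error code -110 (the timeout seen in the log) fills the buffer with the corresponding message.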

Thanks,
Christian.

Am 18.09.2018 um 16:20 schrieb Tom St Denis:
> On 2018-09-18 10:13 a.m., Christian König wrote:
>> Mhm, there is no failed IB test in there any more, is there?
>
> Oh sorry, I thought you wanted to test HEAD~ ... Attached is a log from
> the tip of drm-next.
>
> Tom
>
>>
>> Christian.
>>
>> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>
>>> Here's the log.
>>>
>>> Tom
>>>
>>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>> Odd, I couldn't even boot my system with the dGPU as primary after
>>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
>>>> of AMD-Vi IOMMU errors), which I wasn't able to capture because it
>>>> panicked before loading the network stack.
>>>>
>>>> Bizarre.
>>>>
>>>> I'll keep trying.
>>>>
>>>> Tom
>>>>
>>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>>> Great, not sure if that is good or bad news.
>>>>>>>
>>>>>>> Anyway, I'm going to revert the change for now. Does anybody
>>>>>>> volunteer to figure out why interrupts sometimes don't work
>>>>>>> correctly on Raven?
>>>>>>
>>>>>> What does "doesn't work correctly" mean?  My workstation is a Raven1
>>>>>> (Ryzen 2400G) and, other than the TTM bulk move issue, has been
>>>>>> perfectly stable (through suspend/resumes too, I might add).
>>>>>>
>>>>>> Anything I could test with my devel raven?
>>>>>
>>>>> The problem seems to be that on some boards IH handling doesn't
>>>>> work as it should.
>>>>>
>>>>> Can you try to disable the onboard graphics and try again?
>>>>>
>>>>> If that still doesn't work, there is a DRM_DEBUG in
>>>>> amdgpu_ih_process(); make that a DRM_ERROR and send me the
>>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>>>> This commit:
>>>>>>>>
>>>>>>>> [root at raven linux]# git bisect good
>>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>>>>>
>>>>>>>>     drm/amdgpu: remove fence fallback
>>>>>>>>
>>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>>>>
>>>>>>>>     So when interrupts doesn't work any more we are pretty much busted no
>>>>>>>>     matter what.
>>>>>>>>
>>>>>>>>     Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>
>>>>>>>> Results in this:
>>>>>>>>
>>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
>>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
>>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
>>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
>>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>>>>>> [   28.506708] fuse init (API version 7.27)
>>>>>>>>
>>>>>>>> On init with my polaris/raven1 system.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Tom
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
