Regression on gfx8 with ring init
Tom St Denis
tom.stdenis at amd.com
Tue Sep 18 14:40:46 UTC 2018
On 2018-09-18 10:31 a.m., Christian König wrote:
> Well looks like interrupt processing is working perfectly fine.
>
> But looking at the error message once more I see that this actually
> affects ring number 9 and not the GFX ring.
>
> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
> number?
>
> That must be some of the compute rings.
That's a bingo.
[ 32.231734] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
[ 32.233803] modprobe (3816) used greatest stack depth: 12464 bytes left
[ 35.266007] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[ 35.266373] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring (kiq_2.1.0) 9 (-110).
[ 35.403034] [drm:process_one_work] *ERROR* ib ring test failed (-110).
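For anyone following along, here is a minimal userspace sketch of the kind of message change requested above: print the ring's name alongside its index. It is not the actual driver patch. The struct ring_info type and the format_ib_failure() helper are hypothetical names made up for illustration; only the resulting message text is taken from the log. The -110 in the log corresponds to -ETIMEDOUT on Linux, which matches the "IB test timed out" line.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's ring structure; only the two
 * fields needed for the message are modelled here. */
struct ring_info {
    const char *name;   /* e.g. "kiq_2.1.0" */
    unsigned int idx;   /* position in the device's ring array */
};

/* Format the failure line with the ring name ahead of the index, the way
 * the log above reads. Returns whatever snprintf() returns. */
static int format_ib_failure(char *buf, size_t len,
                             const struct ring_info *ring, int err)
{
    return snprintf(buf, len, "failed testing IB on ring (%s) %u (%d).",
                    ring->name, ring->idx, err);
}
```

Calling format_ib_failure() with a ring named "kiq_2.1.0" at index 9 and an error of -110 reproduces the tail of the amdgpu_ib_ring_tests line in the log, which is what makes it obvious that the failing ring is not the GFX ring.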
Should point out that kfd still has the old fence logic:
[root@raven amd]# git grep enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * nofity when the BO is free to move. fence_add_callback --> enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * --> amdgpu_amdkfd_fence.enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
amdgpu/amdgpu_amdkfd_fence.c: * amdkfd_fence_enable_signaling - This gets called when TTM wants to evict
amdgpu/amdgpu_amdkfd_fence.c:static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
amdgpu/amdgpu_amdkfd_fence.c: .enable_signaling = amdkfd_fence_enable_signaling,
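The grep output shows an enable_signaling hook registered in a fence ops table and reached through fence_add_callback. As a rough illustration of that callback pattern only, here is a simplified userspace model. Every model_* name is hypothetical, nothing here is the kernel's dma_fence code, and the "first waiter runs the hook once" behaviour is an assumption of the model rather than a statement about what the kfd code does.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_fence;

/* Hypothetical, cut-down ops table: only the hook the grep output names. */
struct model_fence_ops {
    bool (*enable_signaling)(struct model_fence *f);
};

struct model_fence {
    const struct model_fence_ops *ops;
    bool signaling_enabled;  /* the hook has already run */
    bool signaled;
    int hook_calls;          /* bookkeeping for this example only */
};

/* Stand-in for a driver hook; a real one would start whatever work
 * eventually signals the fence. true means "will signal later". */
static bool model_enable_signaling(struct model_fence *f)
{
    f->hook_calls++;
    return true;
}

static const struct model_fence_ops model_ops = {
    .enable_signaling = model_enable_signaling,
};

/* The first waiter triggers the hook; later waiters do not. Returns false
 * when the fence is (or becomes) signaled, so nothing would be queued. */
static bool model_add_callback(struct model_fence *f)
{
    if (f->signaled)
        return false;
    if (!f->signaling_enabled) {
        f->signaling_enabled = true;
        if (f->ops->enable_signaling && !f->ops->enable_signaling(f)) {
            f->signaled = true;
            return false;
        }
    }
    return true;
}
```

In this model, adding two callbacks to a fresh fence runs the hook exactly once, which is the point of routing the setup work through enable_signaling instead of doing it when the fence is created.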
Tom
>
> Thanks,
> Christian.
>
> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>> On 2018-09-18 10:13 a.m., Christian König wrote:
>>> Hmm, there is no failed IB test in there any more, is there?
>>
>> oh sorry I thought you wanted to test HEAD~ ... Attached is a log from
>> the tip of drm-next
>>
>> Tom
>>
>>>
>>> Christian.
>>>
>>> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>
>>>> Here's the log.
>>>>
>>>> Tom
>>>>
>>>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>> Odd I couldn't even boot my system with the dGPU as primary after
>>>>> rebuilding the kernel. It got hung up in the IOMMU driver (loads
>>>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>>>>> panic'ed before loading the network stack.
>>>>>
>>>>> Bizarre.
>>>>>
>>>>> I'll keep trying.
>>>>>
>>>>> Tom
>>>>>
>>>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>>>> Great, not sure if that is good or bad news.
>>>>>>>>
>>>>>>>> Anyway going to revert the change for now. Does anybody
>>>>>>>> volunteer to figure out why interrupts sometimes don't work
>>>>>>>> correctly on Raven?
>>>>>>>
>>>>>>> What does "doesn't work correctly" mean? My workstation is a Raven1
>>>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>>>>> perfectly stable (through suspend/resumes too I might add).
>>>>>>>
>>>>>>> Anything I could test with my devel raven?
>>>>>>
>>>>>> The problem seems to be that on some boards IH handling doesn't
>>>>>> work as it should.
>>>>>>
>>>>>> Can you try to disable the onboard graphics and try again?
>>>>>>
>>>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>>>>> This commit:
>>>>>>>>>
>>>>>>>>> [root@raven linux]# git bisect good
>>>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>>>>> Date: Tue Sep 18 10:38:09 2018 +0200
>>>>>>>>>
>>>>>>>>> drm/amdgpu: remove fence fallback
>>>>>>>>>
>>>>>>>>> DC doesn't seem to have a fallback path either.
>>>>>>>>>
>>>>>>>>> So when interrupts doesn't work any more we are pretty much
>>>>>>>>> busted no
>>>>>>>>> matter what.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>>>> Reviewed-by: Chunming Zhou <david1.zhou at amd.com>
>>>>>>>>>
>>>>>>>>> Results in this:
>>>>>>>>>
>>>>>>>>> [ 24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
>>>>>>>>> [ 24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
>>>>>>>>> [ 26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
>>>>>>>>> [ 26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
>>>>>>>>> [ 26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
>>>>>>>>> [ 28.506708] fuse init (API version 7.27)
>>>>>>>>>
>>>>>>>>> On init with my polaris/raven1 system.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Tom
>>>>>>>>> _______________________________________________
>>>>>>>>> amd-gfx mailing list
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx