Regression with kernel 4.20 on armhf
Christian König
ckoenig.leichtzumerken at gmail.com
Wed Jan 2 12:05:46 UTC 2019
Hi Luis,
Mhm, sounds like a timing issue. We probably made something faster
during boot-up in 4.20, and because of that you now see this issue more often.
If the bisection doesn't show any result, can you try adding some
msleep(10) calls at critical places in the driver code to narrow this down?
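A minimal sketch of that kind of timing probe, written as standalone userspace C so it compiles on its own. The init steps, DEBUG_DELAY_MS and debug_delay() are hypothetical placeholders, not amdgpu code, and msleep() is emulated with nanosleep(); inside the driver you would call the kernel's own msleep() from <linux/delay.h> at the real suspect points instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Userspace stand-in for the kernel's msleep(); in the driver itself,
 * use msleep() from <linux/delay.h> instead. */
static void msleep(unsigned int msecs)
{
	struct timespec ts = {
		.tv_sec = msecs / 1000,
		.tv_nsec = (long)(msecs % 1000) * 1000000L,
	};
	nanosleep(&ts, NULL);
}

/* Hypothetical knob: set to 0 to disable the extra delays. */
#define DEBUG_DELAY_MS 10

/* Hypothetical placeholders for the init steps under suspicion;
 * these are NOT real amdgpu functions. */
static int init_step_a(void) { return 0; }
static int init_step_b(void) { return 0; }

/* Counts how many delay probes actually fired. */
static int probe_count;

/* Sleep for DEBUG_DELAY_MS after a step and log where it happened. */
static void debug_delay(const char *where)
{
	if (DEBUG_DELAY_MS) {
		msleep(DEBUG_DELAY_MS);
		probe_count++;
		fprintf(stderr, "debug delay after %s\n", where);
	}
}

/* Hypothetical init sequence with a probe after each step. */
static int hw_init_sketch(void)
{
	int r;

	r = init_step_a();
	if (r)
		return r;
	debug_delay("init_step_a");

	r = init_step_b();
	if (r)
		return r;
	debug_delay("init_step_b");

	return 0;
}
```

If the ring timeout disappears once a delay follows a particular step, that step is a good candidate for the race; moving the probe around then narrows it down further.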
Officially we don't test/support ARM with the driver code, but in this
particular case we should probably investigate, since it sounds like the
problem only stays hidden on x86 because of different timing.
Thanks,
Christian.
On 28.12.18 at 15:05, Luís Mendes wrote:
> Hi Alex,
>
> First of all... Happy holidays! Happy New Year!!
>
> - Okay, so it looks like the driver is sometimes able to enter
> graphical mode with the Polaris card, but most of the time it fails
> before that with:
> [ 49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
>
> - This also happens sporadically, though less frequently, with the
> 4.17, 4.18 and 4.19 kernels, so it is actually not a regression but
> an existing issue, which the patch "drm/amdgpu/gfx_v8_0: Reorder the
> gfx, kiq and kcq ring tests sequence" may solve. I tried backporting
> it to 4.20, but saw no improvement. I still need to try the git
> version, or rc1.
>
> - This hang happens after the console is displayed on the screen, but
> before switching to graphical mode with X.
>
> - However, once X has started, the driver is stable and can be used
> for long periods.
>
> Regards,
> Luís Mendes
>
> On Tue, Dec 18, 2018 at 11:16 PM Luís Mendes <luis.p.mendes at gmail.com> wrote:
>> Hi Alex,
>>
>> I am already running with drm_arch_can_wc_memory() returning false.
>> I will try to bisect...
>>
>> Regards,
>> Luís
>>
>> On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher <alexdeucher at gmail.com> wrote:
>>> On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes <luis.p.mendes at gmail.com> wrote:
>>>> Hi Christian,
>>>>
>>>> I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
>>>> armhf board that runs well with at least Linux 4.19.9, but starting
>>>> with Linux kernel 4.20 I'm getting a GPU hang right after the console
>>>> is displayed, but before entering graphical mode, when starting an X
>>>> session.
>>>> I'm only reporting this now because a PCI commit for mvebu that also
>>>> went into linux-4.20 caused a kernel oops during the pci_map_rom call
>>>> in the amdgpu initialization code. I've reverted that patch, but now
>>>> amdgpu hangs.
>>> It would be useful if you could bisect. This is the first I've heard
>>> of amdgpu working on an ARM board without write combining (WC)
>>> disabled. You might check whether disabling WC helps: return false
>>> in drm_arch_can_wc_memory().
>>>
>>> Alex
>>>
>>>>
>>>> [ 24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
>>>>
>>>> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
>>>> 	Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
>>>> 	Flags: bus master, fast devsel, latency 0, IRQ 51
>>>> 	Memory at d0000000 (64-bit, prefetchable) [size=256M]
>>>> 	Memory at e0000000 (64-bit, prefetchable) [size=2M]
>>>> 	I/O ports at 10000 [size=256]
>>>> 	Memory at e0200000 (32-bit, non-prefetchable) [size=256K]
>>>> 	Expansion ROM at e0240000 [disabled] [size=128K]
>>>> 	Capabilities: <access denied>
>>>> 	Kernel driver in use: amdgpu
>>>> 	Kernel modules: amdgpu
>>>>
>>>> dmesg follows in attachment.
>>>>
>>>> Regards,
>>>> Luís
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
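Alex's suggestion in this thread to return false in drm_arch_can_wc_memory() amounts to a one-off local change along the lines below. This is a sketch under assumptions: the real helper is a static inline function in include/drm/drm_cache.h whose return value is chosen by per-architecture #if blocks, while the body here simply hard-codes false; it is a debugging hack for a test kernel, not an upstream fix.

```c
#include <stdbool.h>

/*
 * Debugging hack (assumption, not an upstream change): a stand-in for
 * drm_arch_can_wc_memory() from include/drm/drm_cache.h that always
 * reports write-combined memory as unusable, so callers that consult
 * it avoid write-combined mappings.
 */
static inline bool drm_arch_can_wc_memory(void)
{
	/* Forced to false for every architecture in this test build. */
	return false;
}
```

As far as I know, amdgpu consults this helper when creating buffer objects and masks out the USWC (write-combined GTT) flag when it returns false, which is why flipping it is a quick way to rule WC in or out on an ARM board.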