Regression with kernel 4.20 on armhf
Luís Mendes
luis.p.mendes at gmail.com
Fri Jan 4 15:32:20 UTC 2019
Hi Alex, Christian,
I've tested amd-staging-drm-next at commit
9698024e8a191481321574bec1fe886bbce797cf ("drm/amdgpu: Cleanup 2
compiler warnings"). The RX 550 (Polaris 12) still hangs on the gfx ring
with Xorg, but a GPU recovery is now performed and works, except that the
VRAM contents are lost and the screen image becomes corrupted. Regarding
the hang, maybe Xorg needs some patch: it only seems to happen with
lightdm, not with gdm3. I am using Xorg 1.19.6-1ubuntu4.2 with
xserver-xorg-video-amdgpu 18.0.1-1 from Ubuntu 18.04.1 LTS.

Maybe this is not a driver issue, but rather a Mesa or Xorg issue, since
something sent to the compute/gfx unit causes the GPU to hang. If so, the
problem is not only timing sensitive, but is mainly caused by wrong
OpenGL commands that drive the GPU into an invalid state.

Regarding the GPU recovery, it looks quite promising, except for the VRAM issue.
Jan 4 14:01:44 picolo kernel: [ 45.922728] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
Jan 4 14:01:44 picolo kernel: [ 45.932217] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process Xorg pid 571 thread
Xorg:cs0 pid 587
Jan 4 14:01:44 picolo kernel: [ 45.943148] amdgpu 0000:02:00.0: GPU
reset begin!
...
Jan 4 14:01:46 picolo kernel: [ 48.202935] amdgpu 0000:02:00.0: GPU
pci config reset
Jan 4 14:01:46 picolo kernel: [ 48.337369] amdgpu 0000:02:00.0: GPU
reset succeeded, trying to resume
...
Jan 4 14:01:46 picolo kernel: [ 48.339973] [drm] PCIE GART of 256M
enabled (table at 0x000000F400000000).
Jan 4 14:01:46 picolo kernel: [ 48.340131]
[drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
Attached is the kernel log from the hang up to the recovery.
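
For anyone scanning a full log for the same events, here is a minimal sketch in Python. It assumes an unwrapped kernel log with one entry per line (the excerpt above is wrapped by the mail client); the function name `scan` and the marker list are illustrative choices, and the message strings are copied from the logs quoted in this thread.

```python
import re

# Timeout message as it appears in this thread, with the log entry unwrapped.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)

# Recovery-related messages quoted in this thread (not an exhaustive list).
MARKERS = (
    "GPU reset begin!",
    "GPU reset succeeded",
    "VRAM is lost!",
    "GPU recovery disabled.",
)


def scan(lines):
    """Return (timeouts, markers) found in an iterable of log lines."""
    timeouts, markers = [], []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append({
                "ts": float(m["ts"]),
                "ring": m["ring"],
                "signaled": int(m["signaled"]),
                "emitted": int(m["emitted"]),
            })
        # Record the recovery markers in the order they occur.
        for marker in MARKERS:
            if marker in line:
                markers.append(marker)
    return timeouts, markers
```

Passing an open file object works, since it iterates line by line, for example `scan(open("kern.log"))` with a hypothetical file name. A timeout entry whose `emitted` is larger than `signaled` corresponds to a submitted job that never completed.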
Regards,
Luís
The kernel log with the GPU recovery event is attached.
On Thu, Jan 3, 2019 at 2:31 PM Luís Mendes <luis.p.mendes at gmail.com> wrote:
>
> Hi Alex,
>
> I've made a backport of that patch to linux 4.20, but didn't notice
> any improvement. From the logs with drm debug messages, it looks like
> the issue happens after the ring tests, while entering the X
> session.
> Thanks for your suggestion, anyway. I'll try that again once
> linux-4.21-rc1 comes out.
>
> Regards,
> Luís
>
> On Thu, Jan 3, 2019 at 2:12 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> >
> > Does this patch help by any chance?
> >
> > https://cgit.freedesktop.org/~agd5f/linux/commit/?h=amd-staging-drm-next&id=5e01c09ce3b7263d88873105f21a82eda904664b
> >
> > Alex
> >
> > On Thu, Jan 3, 2019 at 7:14 AM Luís Mendes <luis.p.mendes at gmail.com> wrote:
> > >
> > > Hi Christian, Alex,
> > >
> > > > I've booted with drm.debug=0xf on the kernel command line, and I see
> > > > what could be a race condition that triggers the failure. From what I
> > > > see, the critical path is well after the ring tests. This happens on
> > > > ARM, but it may also be what is affecting my TYAN S7002 and S7025, as
> > > > the failure symptom seems similar, except that it fails every time on
> > > > the TYANs, while on an ASRock Rack EP2C602 with a Xeon E5 v2 it works
> > > > fine.
> > >
> > > > Below follow two log excerpts, the first from a working
> > > > initialization attempt and the second from a failed one. Both
> > > > attempts were made with a vanilla 4.20.0 kernel on the same armhf
> > > > system. Full dmesg logs are attached. Please ignore the EDID
> > > > errors, as I'm having a problem with this particular CROWN TV. The
> > > > EDID gets overwritten at every boot when connected to any Radeon RX
> > > > card that I have tried, while with a Radeon R7 240 the EDID is not
> > > > corrupted on boot, but that's another story.
> > >
> > > > Meanwhile I will try to find the concrete race condition. It is
> > > > noticeable that for some reason the kernel thread
> > > > [drm:amdgpu_ih_process [amdgpu]] doesn't receive updates due to the
> > > > GPU hang, and only one EOP IRQ is received on the bad boot attempt,
> > > > while on the good attempt 3 EOP IRQs are triggered.
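
The EOP comparison described above can be repeated on the full logs with a small Python sketch. It assumes unwrapped drm.debug output with one entry per line, and the names `count_handlers` and `compare` are illustrative. It simply tallies the function named in each "[drm:<function> [<module>]]" tag.

```python
import re
from collections import Counter

# Tag format seen in the drm.debug output quoted in this thread,
# for example "[drm:gfx_v8_0_eop_irq [amdgpu]]".
TAG_RE = re.compile(r"\[drm:(?P<func>\w+) \[\w+\]\]")


def count_handlers(lines):
    """Count log entries per tagged function name."""
    counts = Counter()
    for line in lines:
        m = TAG_RE.search(line)
        if m:
            counts[m["func"]] += 1
    return counts


def compare(good_lines, bad_lines, func="gfx_v8_0_eop_irq"):
    """Return (count in good log, count in bad log) for one function."""
    return count_handlers(good_lines)[func], count_handlers(bad_lines)[func]
```

Calling `compare(open("kern_good.log"), open("kern_bad.log"))` gives the two EOP counts side by side. The same helper works for `sdma_v3_0_process_trap_irq` or any other tagged function.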
> > >
> > > Good attempt (critical log excerpt from kern_good.log):
> > > Jan 3 11:28:03 picolo kernel: [ 39.845747] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16032, wptr 16048
> > > Jan 3 11:28:03 picolo kernel: [ 39.845987]
> > > [drm:drm_calc_vbltimestamp_from_scanoutpos [drm]] crtc 0: Noisy
> > > timestamp 26 us > 20 us [3 reps].
> > > Jan 3 11:28:03 picolo kernel: [ 39.850430] [drm:drm_ioctl [drm]]
> > > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:28:03 picolo kernel: [ 39.850489] [drm:drm_ioctl [drm]]
> > > pid=627, dev=0xe200, auth=1, AMDGPU_CS
> > > Jan 3 11:28:03 picolo kernel: [ 39.850697] [drm:drm_ioctl [drm]]
> > > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:28:03 picolo kernel: [ 39.850943] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16048, wptr 16080
> > > Jan 3 11:28:03 picolo kernel: [ 39.850973] [drm:drm_ioctl [drm]]
> > > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:28:03 picolo kernel: [ 39.851133]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.851159] [drm:drm_ioctl [drm]]
> > > pid=627, dev=0xe200, auth=1, AMDGPU_CS
> > > Jan 3 11:28:03 picolo kernel: [ 39.851333]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.851360] [drm:drm_ioctl [drm]]
> > > pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:28:03 picolo kernel: [ 39.851513] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
> > > Jan 3 11:28:03 picolo kernel: [ 39.851657]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.851810] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16096
> > > Jan 3 11:28:03 picolo kernel: [ 39.851950] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16128
> > > Jan 3 11:28:03 picolo kernel: [ 39.852091]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.852239]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.852265] [drm:drm_ioctl [drm]]
> > > pid=605, dev=0xe200, auth=1, AMDGPU_WAIT_CS
> > > Jan 3 11:28:03 picolo kernel: [ 39.852411] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16128
> > > Jan 3 11:28:03 picolo kernel: [ 39.852605] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16144
> > > Jan 3 11:28:03 picolo kernel: [ 39.852754]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.852905] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16144, wptr 16160
> > > Jan 3 11:28:03 picolo kernel: [ 39.853049]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:28:03 picolo kernel: [ 39.853210] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16160
> > > Jan 3 11:28:03 picolo kernel: [ 39.853418] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16176
> > > Jan 3 11:28:03 picolo kernel: [ 39.853582] [drm:gfx_v8_0_eop_irq
> > > [amdgpu]] IH: CP EOP
> > > Jan 3 11:28:03 picolo kernel: [ 39.853752] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16176, wptr 16208
> > > Jan 3 11:28:03 picolo kernel: [ 39.853901] [drm:gfx_v8_0_eop_irq
> > > [amdgpu]] IH: CP EOP
> > > Jan 3 11:28:03 picolo kernel: [ 39.854044] [drm:gfx_v8_0_eop_irq
> > > [amdgpu]] IH: CP EOP
> > > Jan 3 11:28:03 picolo kernel: [ 39.854205] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16208, wptr 16208
> > > Jan 3 11:28:03 picolo kernel: [ 39.857057] [drm:drm_ioctl [drm]]
> > > pid=605, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
> > > Jan 3 11:28:03 picolo kernel: [ 39.857089] [drm:drm_mode_setcrtc
> > > [drm]] [CRTC:45:crtc-1]
> > > Jan 3 11:28:03 picolo kernel: [ 39.857341]
> > > [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
> > > Jan 3 11:28:03 picolo kernel: [ 39.857508]
> > > [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:1
> > > crtc_state_flags: enable:0, active:0, planes_changed:0,
> > > mode_changed:0,active_changed:0,connectors_changed:0
> > > Jan 3 11:28:03 picolo kernel: [ 39.857559] [drm:drm_ioctl [drm]]
> > > pid=605, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
> > > Jan 3 11:28:03 picolo kernel: [ 39.857587] [drm:drm_mode_setcrtc
> > > [drm]] [CRTC:47:crtc-2]
> > > Jan 3 11:28:03 picolo kernel: [ 39.857769]
> > > [drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
> > > Jan 3 11:28:03 picolo kernel: [ 39.857944]
> > > [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:2
> > > crtc_state_flags: enable:0, active:0, planes_changed:0,
> > > mode_changed:0,active_changed:0,connectors_changed:0
> > > Jan 3 11:28:03 picolo kernel: [ 39.857992] [drm:drm_ioctl [drm]]
> > > pid=605, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
> > > Jan 3 11:28:03 picolo kernel: [ 39.858020] [drm:drm_mode_setcrtc
> > > [drm]] [CRTC:49:crtc-3]
> > >
> > > BAD attempt (critical log excerpt from kern_bad.log):
> > > Jan 3 11:39:23 picolo kernel: [ 39.599313] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14240, wptr 14256
> > > Jan 3 11:39:23 picolo kernel: [ 39.599496]
> > > [drm:drm_calc_vbltimestamp_from_scanoutpos [drm]] crtc 0: Noisy
> > > timestamp 26 us > 20 us [3 reps].
> > > Jan 3 11:39:23 picolo kernel: [ 39.599599] [drm:drm_ioctl [drm]]
> > > pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:39:23 picolo kernel: [ 39.599640] [drm:drm_ioctl [drm]]
> > > pid=663, dev=0xe200, auth=1, AMDGPU_CS
> > > Jan 3 11:39:23 picolo kernel: [ 39.599992] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14256, wptr 14272
> > > Jan 3 11:39:23 picolo kernel: [ 39.600142]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.600297] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14272, wptr 14304
> > > Jan 3 11:39:23 picolo kernel: [ 39.600439]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.600580]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.600725] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14304, wptr 14304
> > > Jan 3 11:39:23 picolo kernel: [ 39.600795] [drm:drm_ioctl [drm]]
> > > pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:39:23 picolo kernel: [ 39.600846] [drm:drm_ioctl [drm]]
> > > pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:39:23 picolo kernel: [ 39.600881] [drm:drm_ioctl [drm]]
> > > pid=663, dev=0xe200, auth=1, AMDGPU_CS
> > > Jan 3 11:39:23 picolo kernel: [ 39.601019] [drm:drm_ioctl [drm]]
> > > pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
> > > Jan 3 11:39:23 picolo kernel: [ 39.601074] [drm:drm_ioctl [drm]]
> > > pid=630, dev=0xe200, auth=1, AMDGPU_WAIT_CS
> > > Jan 3 11:39:23 picolo kernel: [ 39.601269] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14304, wptr 14320
> > > Jan 3 11:39:23 picolo kernel: [ 39.601416]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.601569] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14320, wptr 14384
> > > Jan 3 11:39:23 picolo kernel: [ 39.601595] [drm:drm_ioctl [drm]]
> > > pid=630, dev=0xe200, auth=1, AMDGPU_WAIT_CS
> > > Jan 3 11:39:23 picolo kernel: [ 39.601738]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.601880]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.602029]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.602171]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.602313] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14384, wptr 14384
> > > Jan 3 11:39:23 picolo kernel: [ 39.602500] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14384, wptr 14400
> > > Jan 3 11:39:23 picolo kernel: [ 39.602649]
> > > [drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
> > > Jan 3 11:39:23 picolo kernel: [ 39.602887] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14400, wptr 14416
> > > Jan 3 11:39:23 picolo kernel: [ 39.603054] [drm:gfx_v8_0_eop_irq
> > > [amdgpu]] IH: CP EOP
> > > Jan 3 11:39:23 picolo kernel: [ 39.615864] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14416, wptr 14432
> > > Jan 3 11:39:23 picolo kernel: [ 39.632542] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14432, wptr 14448
> > > Jan 3 11:39:23 picolo kernel: [ 39.649264] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14448, wptr 14464
> > > Jan 3 11:39:23 picolo kernel: [ 39.665943] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14464, wptr 14480
> > > Jan 3 11:39:23 picolo kernel: [ 39.682610] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14480, wptr 14496
> > > Jan 3 11:39:23 picolo kernel: [ 39.699285] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14496, wptr 14512
> > > Jan 3 11:39:23 picolo kernel: [ 39.715955] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14512, wptr 14528
> > > Jan 3 11:39:23 picolo kernel: [ 39.732629] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14528, wptr 14544
> > > Jan 3 11:39:23 picolo kernel: [ 39.749313] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14544, wptr 14560
> > > Jan 3 11:39:23 picolo kernel: [ 39.765995] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14560, wptr 14576
> > > Jan 3 11:39:23 picolo kernel: [ 39.782667] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14576, wptr 14592
> > > Jan 3 11:39:23 picolo kernel: [ 39.799363] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14592, wptr 14608
> > > Jan 3 11:39:23 picolo kernel: [ 39.816043] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14608, wptr 14624
> > > Jan 3 11:39:23 picolo kernel: [ 39.832734] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14624, wptr 14640
> > > Jan 3 11:39:23 picolo kernel: [ 39.849426] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14640, wptr 14656
> > > Jan 3 11:39:23 picolo kernel: [ 39.866081] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14656, wptr 14672
> > > Jan 3 11:39:23 picolo kernel: [ 39.882822] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14672, wptr 14688
> > > Jan 3 11:39:23 picolo kernel: [ 39.899455] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14688, wptr 14704
> > > Jan 3 11:39:23 picolo kernel: [ 39.916190] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14704, wptr 14720
> > > Jan 3 11:39:23 picolo kernel: [ 39.932885] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14720, wptr 14736
> > > Jan 3 11:39:23 picolo kernel: [ 39.949589] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14736, wptr 14752
> > > Jan 3 11:39:23 picolo kernel: [ 39.966238] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14752, wptr 14768
> > > Jan 3 11:39:23 picolo kernel: [ 39.982869] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14768, wptr 14784
> > > Jan 3 11:39:23 picolo kernel: [ 39.999609] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14784, wptr 14800
> > > Jan 3 11:39:23 picolo kernel: [ 40.016286] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14800, wptr 14816
> > > Jan 3 11:39:23 picolo kernel: [ 40.033045] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14816, wptr 14832
> > > Jan 3 11:39:23 picolo kernel: [ 40.049716] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14832, wptr 14848
> > > Jan 3 11:39:23 picolo kernel: [ 40.066446] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14848, wptr 14864
> > > Jan 3 11:39:23 picolo kernel: [ 40.083031] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14864, wptr 14880
> > > Jan 3 11:39:23 picolo kernel: [ 40.099765] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14880, wptr 14896
> > > Jan 3 11:39:23 picolo kernel: [ 40.116394] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14896, wptr 14912
> > > Jan 3 11:39:23 picolo kernel: [ 40.133133] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14912, wptr 14928
> > > Jan 3 11:39:23 picolo kernel: [ 40.149743] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14928, wptr 14944
> > > Jan 3 11:39:23 picolo kernel: [ 40.166426] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14944, wptr 14960
> > > Jan 3 11:39:23 picolo kernel: [ 40.183178] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14960, wptr 14976
> > > Jan 3 11:39:23 picolo kernel: [ 40.199788] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14976, wptr 14992
> > > Jan 3 11:39:23 picolo kernel: [ 40.216507] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 14992, wptr 15008
> > > Jan 3 11:39:23 picolo kernel: [ 40.233150] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15008, wptr 15024
> > > Jan 3 11:39:23 picolo kernel: [ 40.249815] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15024, wptr 15040
> > > Jan 3 11:39:23 picolo kernel: [ 40.266454] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15040, wptr 15056
> > > Jan 3 11:39:23 picolo kernel: [ 40.283123] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15056, wptr 15072
> > > Jan 3 11:39:23 picolo kernel: [ 40.299804] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15072, wptr 15088
> > > Jan 3 11:39:23 picolo kernel: [ 40.316483] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15088, wptr 15104
> > > Jan 3 11:39:23 picolo kernel: [ 40.333164] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15104, wptr 15120
> > > Jan 3 11:39:23 picolo kernel: [ 40.349843] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15120, wptr 15136
> > > Jan 3 11:39:23 picolo kernel: [ 40.366523] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15136, wptr 15152
> > > Jan 3 11:39:23 picolo kernel: [ 40.383200] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15152, wptr 15168
> > > Jan 3 11:39:23 picolo kernel: [ 40.399878] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15168, wptr 15184
> > > Jan 3 11:39:23 picolo kernel: [ 40.416561] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15184, wptr 15200
> > > Jan 3 11:39:23 picolo kernel: [ 40.433245] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15200, wptr 15216
> > > Jan 3 11:39:23 picolo kernel: [ 40.449925] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15216, wptr 15232
> > > Jan 3 11:39:23 picolo kernel: [ 40.466613] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15232, wptr 15248
> > > Jan 3 11:39:24 picolo kernel: [ 40.483291] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15248, wptr 15264
> > > Jan 3 11:39:24 picolo kernel: [ 40.499971] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15264, wptr 15280
> > > Jan 3 11:39:24 picolo kernel: [ 40.516652] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15280, wptr 15296
> > > Jan 3 11:39:24 picolo kernel: [ 40.533336] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15296, wptr 15312
> > > Jan 3 11:39:24 picolo kernel: [ 40.550016] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15312, wptr 15328
> > > Jan 3 11:39:24 picolo kernel: [ 40.566715] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15328, wptr 15344
> > > Jan 3 11:39:24 picolo kernel: [ 40.583390] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15344, wptr 15360
> > > Jan 3 11:39:24 picolo kernel: [ 40.600065] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15360, wptr 15376
> > > Jan 3 11:39:24 picolo kernel: [ 40.616745] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15376, wptr 15392
> > > Jan 3 11:39:24 picolo kernel: [ 40.633432] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15392, wptr 15408
> > > Jan 3 11:39:24 picolo kernel: [ 40.650113] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15408, wptr 15424
> > > Jan 3 11:39:24 picolo kernel: [ 40.666790] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15424, wptr 15440
> > > Jan 3 11:39:24 picolo kernel: [ 40.683477] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15440, wptr 15456
> > > Jan 3 11:39:24 picolo kernel: [ 40.700157] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15456, wptr 15472
> > > Jan 3 11:39:24 picolo kernel: [ 40.716836] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15472, wptr 15488
> > > Jan 3 11:39:24 picolo kernel: [ 40.733522] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15488, wptr 15504
> > > Jan 3 11:39:24 picolo kernel: [ 40.750203] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15504, wptr 15520
> > > Jan 3 11:39:24 picolo kernel: [ 40.766882] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15520, wptr 15536
> > > Jan 3 11:39:24 picolo kernel: [ 40.783563] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15536, wptr 15552
> > > Jan 3 11:39:24 picolo kernel: [ 40.800247] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15552, wptr 15568
> > > Jan 3 11:39:24 picolo kernel: [ 40.816929] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15568, wptr 15584
> > > Jan 3 11:39:24 picolo kernel: [ 40.833633] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15584, wptr 15600
> > > Jan 3 11:39:24 picolo kernel: [ 40.850305] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15600, wptr 15616
> > > Jan 3 11:39:24 picolo kernel: [ 40.867011] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15616, wptr 15632
> > > Jan 3 11:39:24 picolo kernel: [ 40.883676] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15632, wptr 15648
> > > Jan 3 11:39:24 picolo kernel: [ 40.900346] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15648, wptr 15664
> > > Jan 3 11:39:24 picolo kernel: [ 40.917026] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15664, wptr 15680
> > > Jan 3 11:39:24 picolo kernel: [ 40.933716] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15680, wptr 15696
> > > Jan 3 11:39:24 picolo kernel: [ 40.950390] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15696, wptr 15712
> > > Jan 3 11:39:24 picolo kernel: [ 40.967070] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15712, wptr 15728
> > > Jan 3 11:39:24 picolo kernel: [ 40.983757] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15728, wptr 15744
> > > Jan 3 11:39:24 picolo kernel: [ 41.000438] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15744, wptr 15760
> > > Jan 3 11:39:24 picolo kernel: [ 41.017115] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15760, wptr 15776
> > > Jan 3 11:39:24 picolo kernel: [ 41.033812] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15776, wptr 15792
> > > Jan 3 11:39:24 picolo kernel: [ 41.050485] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15792, wptr 15808
> > > Jan 3 11:39:24 picolo kernel: [ 41.067162] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15808, wptr 15824
> > > Jan 3 11:39:24 picolo kernel: [ 41.083845] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15824, wptr 15840
> > > Jan 3 11:39:24 picolo kernel: [ 41.100523] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15840, wptr 15856
> > > Jan 3 11:39:24 picolo kernel: [ 41.117205] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15856, wptr 15872
> > > Jan 3 11:39:24 picolo kernel: [ 41.133904] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15872, wptr 15888
> > > Jan 3 11:39:24 picolo kernel: [ 41.150579] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15888, wptr 15904
> > > Jan 3 11:39:24 picolo kernel: [ 41.167255] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15904, wptr 15920
> > > Jan 3 11:39:24 picolo kernel: [ 41.183933] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15920, wptr 15936
> > > Jan 3 11:39:24 picolo kernel: [ 41.200614] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15936, wptr 15952
> > > Jan 3 11:39:24 picolo kernel: [ 41.217295] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15952, wptr 15968
> > > Jan 3 11:39:24 picolo kernel: [ 41.233984] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15968, wptr 15984
> > > Jan 3 11:39:24 picolo kernel: [ 41.250663] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 15984, wptr 16000
> > > Jan 3 11:39:24 picolo kernel: [ 41.267347] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16000, wptr 16016
> > > Jan 3 11:39:24 picolo kernel: [ 41.284027] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16016, wptr 16032
> > > Jan 3 11:39:24 picolo kernel: [ 41.300706] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16032, wptr 16048
> > > Jan 3 11:39:24 picolo kernel: [ 41.317388] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16048, wptr 16064
> > > Jan 3 11:39:24 picolo kernel: [ 41.334071] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16064, wptr 16080
> > > Jan 3 11:39:24 picolo kernel: [ 41.350752] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
> > > Jan 3 11:39:24 picolo kernel: [ 41.367442] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16112
> > > Jan 3 11:39:24 picolo kernel: [ 41.384122] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16112, wptr 16128
> > > Jan 3 11:39:24 picolo kernel: [ 41.400801] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16144
> > > Jan 3 11:39:24 picolo kernel: [ 41.417480] [drm:amdgpu_ih_process
> > > [amdgpu]] amdgpu_ih_process: rptr 16144, wptr 16160
> > > Jan 3 11:39:24 picolo kernel: [ 41.432501] [drm:vblank_disable_fn
> > > [drm]] disabling vblank on crtc 0
> > > Jan 3 11:41:22 picolo kernel: [ 49.762715] [drm:amdgpu_job_timedout
> > > [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
> > > Jan 3 11:41:22 picolo kernel: [ 49.772047] [drm] GPU recovery disabled.
> > >
> > > Regards,
> > > Luís
> > >
> > > On Wed, Jan 2, 2019 at 12:05 PM Christian König
> > > <ckoenig.leichtzumerken at gmail.com> wrote:
> > > >
> > > > Hi Luis,
> > > >
> > > > mhm, sounds like a timing issue. We have probably made something faster
> > > > during bootup in 4.20 and because of this you now see this issue more often.
> > > >
> > > > If the bisection doesn't show any result can you try adding some
> > > > msleep(10) call at critical places in the driver code to narrow this down?
> > > >
> > > > Officially we don't test/support ARM with the driver code, but in this
> > > > particular case we should probably investigate since it sounds like it
> > > > just doesn't happen on x86 because of different timing.
> > > >
> > > > Thanks,
> > > > Christian.
> > > >
> > > > Am 28.12.18 um 15:05 schrieb Luís Mendes:
> > > > > Hi Alex,
> > > > >
> > > > > > First of all... Have nice holidays! Happy new year!!
> > > > >
> > > > > - Okay, so it looks like sometimes the driver is able to enter
> > > > > graphical mode with the Polaris card, but most of the time it fails
> > > > > before with:
> > > > > [ 49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > > > > timeout, signaled seq=2, emitted seq=3
> > > > >
> > > > > > - This has been happening sporadically, but less frequently, in the
> > > > > > 4.17, 4.18 and 4.19 kernels, so it is actually not a regression but
> > > > > > rather an existing issue, which maybe the patch
> > > > > > "drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq ring tests
> > > > > > sequence" solves. I tried to backport it to 4.20, but saw no
> > > > > > improvement. I need to try with the git version, or rc1.
> > > > > >
> > > > > > - This hang happens after the console is displayed on the screen, but
> > > > > > before switching to graphical mode with X.
> > > > >
> > > > > - However if X is entered then the driver is stable and can be used
> > > > > for long periods.
> > > > >
> > > > > Regards,
> > > > > Luís Mendes
> > > > >
> > > > > On Tue, Dec 18, 2018 at 11:16 PM Luís Mendes <luis.p.mendes at gmail.com> wrote:
> > > > >> Hi Alex,
> > > > >>
> > > > >> I am already using drm_arch_can_wc_memory() set to false.
> > > > >> I will try to bisect...
> > > > >>
> > > > >> Regards,
> > > > >> Luís
> > > > >>
> > > > >> On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> > > > >>> On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes <luis.p.mendes at gmail.com> wrote:
> > > > >>>> Hi Christian,
> > > > >>>>
> > > > > >>>> I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> > > > > >>>> armhf board that runs well with Linux 4.19.9 at least, but now,
> > > > > >>>> starting with Linux kernel 4.20, I'm getting a GPU hang right after
> > > > > >>>> the console is displayed, but before entering graphical mode,
> > > > > >>>> when starting the X session.
> > > > > >>>> I'm only reporting this now because a PCI commit for mvebu
> > > > > >>>> that also went into linux-4.20 caused a kernel oops during the
> > > > > >>>> pci_map_rom call in the amdgpu initialization code. I've reverted that
> > > > > >>>> patch, but now amdgpu is hanging.
> > > > >>> It would be useful if you could bisect. This is the first I've heard
> > > > >>> of amdgpu working on an ARM board without write combining (WC)
> > > > >>> disabled. You might check to see if disabling WC helps. Return false
> > > > >>> in drm_arch_can_wc_memory().
> > > > >>>
> > > > >>> Alex
> > > > >>>
> > > > >>>>
> > > > >>>> [ 24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > > > >>>> timeout, signaled seq=2, emitted seq=3
> > > > >>>>
> > > > >>>> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > > > >>>> [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> > > > >>>> Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> > > > >>>> Flags: bus master, fast devsel, latency 0, IRQ 51
> > > > >>>> Memory at d0000000 (64-bit, prefetchable) [size=256M]
> > > > >>>> Memory at e0000000 (64-bit, prefetchable) [size=2M]
> > > > >>>> I/O ports at 10000 [size=256]
> > > > >>>> Memory at e0200000 (32-bit, non-prefetchable) [size=256K]
> > > > >>>> Expansion ROM at e0240000 [disabled] [size=128K]
> > > > >>>> Capabilities: <access denied>
> > > > >>>> Kernel driver in use: amdgpu
> > > > >>>> Kernel modules: amdgpu
> > > > >>>>
> > > > >>>> dmesg follows in attachment.
> > > > >>>>
> > > > >>>> Regards,
> > > > >>>> Luís
> > > > >>>> _______________________________________________
> > > > >>>> amd-gfx mailing list
> > > > >>>> amd-gfx at lists.freedesktop.org
> > > > >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kern_gpu_recover.log
Type: text/x-log
Size: 282909 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190104/00705e3c/attachment-0001.bin>