Regression with kernel 4.20 on armhf

Luís Mendes luis.p.mendes at gmail.com
Thu Jan 3 12:14:25 UTC 2019


Hi Christian, Alex,

I've booted with drm.debug=0xf on the kernel command line, and I see
what could be a race condition that triggers the failure. From what I
see, the critical path comes well after the ring tests. This happens
on ARM, but it may also be what is affecting my TYAN S7002 and S7025,
as the failure symptom seems similar, except that it fails every time
on the TYANs. On an AsRock Rack EP2C602 with a Xeon E5 v2, by contrast,
it works fine.

Below are two log excerpts, the first from a working initialization
attempt and the second from a failed one. Both attempts were made with
a vanilla 4.20.0 kernel on the same armhf system. Full dmesg logs are
attached. Please ignore the EDID errors, as I'm having a problem with
this particular CROWN TV. The EDID gets overwritten at every boot when
the TV is connected to any Radeon RX card that I have tried, while with
a Radeon R7 240 the EDID is not corrupted on boot, but that's another
story.

Meanwhile I will try to find the actual race condition. It is
noticeable that for some reason the kernel thread
[drm:amdgpu_ih_process [amdgpu]] doesn't receive updates due to the
GPU hang, and only one EOP irq is received on the bad boot attempt,
while on the good attempt 3 EOP irqs are triggered.
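
Counting the handler tags in each excerpt is one way to check the EOP
observation. The following is a minimal Python sketch, not part of the
driver or of any existing tool; the function name count_drm_tags and
the sample text are chosen for illustration only. The regular
expression tolerates the line wrapping seen in this mail.

```python
import re
from collections import Counter

# Matches the "[drm:<function> [<module>]]" tag carried by drm debug messages.
# \s+ also spans the line break that mail clients insert when wrapping.
TAG = re.compile(r"\[drm:(\w+)\s+\[(\w+)\]\]")


def count_drm_tags(log_text):
    """Return a Counter of function names found in drm debug tags."""
    return Counter(m.group(1) for m in TAG.finditer(log_text))


# Three entries copied from the bad excerpt, wrapped as in the mail.
sample = """\
Jan  3 11:39:23 picolo kernel: [   39.602887] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14400, wptr 14416
Jan  3 11:39:23 picolo kernel: [   39.603054] [drm:gfx_v8_0_eop_irq
[amdgpu]] IH: CP EOP
Jan  3 11:39:23 picolo kernel: [   39.615864] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14416, wptr 14432
"""

counts = count_drm_tags(sample)
print(counts["gfx_v8_0_eop_irq"], counts["amdgpu_ih_process"])
```

Running the same function over kern_good.log and kern_bad.log (read
with open(...).read()) should give the per-handler totals for each
boot.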

Good attempt (critical log excerpt from kern_good.log):
Jan  3 11:28:03 picolo kernel: [   39.845747] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16032, wptr 16048
Jan  3 11:28:03 picolo kernel: [   39.845987]
[drm:drm_calc_vbltimestamp_from_scanoutpos [drm]] crtc 0: Noisy
timestamp 26 us > 20 us [3 reps].
Jan  3 11:28:03 picolo kernel: [   39.850430] [drm:drm_ioctl [drm]]
pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:28:03 picolo kernel: [   39.850489] [drm:drm_ioctl [drm]]
pid=627, dev=0xe200, auth=1, AMDGPU_CS
Jan  3 11:28:03 picolo kernel: [   39.850697] [drm:drm_ioctl [drm]]
pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:28:03 picolo kernel: [   39.850943] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16048, wptr 16080
Jan  3 11:28:03 picolo kernel: [   39.850973] [drm:drm_ioctl [drm]]
pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:28:03 picolo kernel: [   39.851133]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.851159] [drm:drm_ioctl [drm]]
pid=627, dev=0xe200, auth=1, AMDGPU_CS
Jan  3 11:28:03 picolo kernel: [   39.851333]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.851360] [drm:drm_ioctl [drm]]
pid=627, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:28:03 picolo kernel: [   39.851513] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
Jan  3 11:28:03 picolo kernel: [   39.851657]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.851810] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16096
Jan  3 11:28:03 picolo kernel: [   39.851950] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16128
Jan  3 11:28:03 picolo kernel: [   39.852091]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.852239]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.852265] [drm:drm_ioctl [drm]]
pid=605, dev=0xe200, auth=1, AMDGPU_WAIT_CS
Jan  3 11:28:03 picolo kernel: [   39.852411] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16128
Jan  3 11:28:03 picolo kernel: [   39.852605] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16144
Jan  3 11:28:03 picolo kernel: [   39.852754]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.852905] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16144, wptr 16160
Jan  3 11:28:03 picolo kernel: [   39.853049]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:28:03 picolo kernel: [   39.853210] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16160
Jan  3 11:28:03 picolo kernel: [   39.853418] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16160, wptr 16176
Jan  3 11:28:03 picolo kernel: [   39.853582] [drm:gfx_v8_0_eop_irq
[amdgpu]] IH: CP EOP
Jan  3 11:28:03 picolo kernel: [   39.853752] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16176, wptr 16208
Jan  3 11:28:03 picolo kernel: [   39.853901] [drm:gfx_v8_0_eop_irq
[amdgpu]] IH: CP EOP
Jan  3 11:28:03 picolo kernel: [   39.854044] [drm:gfx_v8_0_eop_irq
[amdgpu]] IH: CP EOP
Jan  3 11:28:03 picolo kernel: [   39.854205] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16208, wptr 16208
Jan  3 11:28:03 picolo kernel: [   39.857057] [drm:drm_ioctl [drm]]
pid=605, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
Jan  3 11:28:03 picolo kernel: [   39.857089] [drm:drm_mode_setcrtc
[drm]] [CRTC:45:crtc-1]
Jan  3 11:28:03 picolo kernel: [   39.857341]
[drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
Jan  3 11:28:03 picolo kernel: [   39.857508]
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:1
crtc_state_flags: enable:0, active:0, planes_changed:0,
mode_changed:0,active_changed:0,connectors_changed:0
Jan  3 11:28:03 picolo kernel: [   39.857559] [drm:drm_ioctl [drm]]
pid=605, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
Jan  3 11:28:03 picolo kernel: [   39.857587] [drm:drm_mode_setcrtc
[drm]] [CRTC:47:crtc-2]
Jan  3 11:28:03 picolo kernel: [   39.857769]
[drm:dm_plane_helper_prepare_fb [amdgpu]] No FB bound
Jan  3 11:28:03 picolo kernel: [   39.857944]
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] amdgpu_crtc id:2
crtc_state_flags: enable:0, active:0, planes_changed:0,
mode_changed:0,active_changed:0,connectors_changed:0
Jan  3 11:28:03 picolo kernel: [   39.857992] [drm:drm_ioctl [drm]]
pid=605, dev=0xe200, auth=1, DRM_IOCTL_MODE_SETCRTC
Jan  3 11:28:03 picolo kernel: [   39.858020] [drm:drm_mode_setcrtc
[drm]] [CRTC:49:crtc-3]

Bad attempt (critical log excerpt from kern_bad.log):
Jan  3 11:39:23 picolo kernel: [   39.599313] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14240, wptr 14256
Jan  3 11:39:23 picolo kernel: [   39.599496]
[drm:drm_calc_vbltimestamp_from_scanoutpos [drm]] crtc 0: Noisy
timestamp 26 us > 20 us [3 reps].
Jan  3 11:39:23 picolo kernel: [   39.599599] [drm:drm_ioctl [drm]]
pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:39:23 picolo kernel: [   39.599640] [drm:drm_ioctl [drm]]
pid=663, dev=0xe200, auth=1, AMDGPU_CS
Jan  3 11:39:23 picolo kernel: [   39.599992] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14256, wptr 14272
Jan  3 11:39:23 picolo kernel: [   39.600142]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.600297] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14272, wptr 14304
Jan  3 11:39:23 picolo kernel: [   39.600439]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.600580]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.600725] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14304, wptr 14304
Jan  3 11:39:23 picolo kernel: [   39.600795] [drm:drm_ioctl [drm]]
pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:39:23 picolo kernel: [   39.600846] [drm:drm_ioctl [drm]]
pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:39:23 picolo kernel: [   39.600881] [drm:drm_ioctl [drm]]
pid=663, dev=0xe200, auth=1, AMDGPU_CS
Jan  3 11:39:23 picolo kernel: [   39.601019] [drm:drm_ioctl [drm]]
pid=663, dev=0xe200, auth=1, AMDGPU_BO_LIST
Jan  3 11:39:23 picolo kernel: [   39.601074] [drm:drm_ioctl [drm]]
pid=630, dev=0xe200, auth=1, AMDGPU_WAIT_CS
Jan  3 11:39:23 picolo kernel: [   39.601269] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14304, wptr 14320
Jan  3 11:39:23 picolo kernel: [   39.601416]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.601569] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14320, wptr 14384
Jan  3 11:39:23 picolo kernel: [   39.601595] [drm:drm_ioctl [drm]]
pid=630, dev=0xe200, auth=1, AMDGPU_WAIT_CS
Jan  3 11:39:23 picolo kernel: [   39.601738]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.601880]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.602029]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.602171]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.602313] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14384, wptr 14384
Jan  3 11:39:23 picolo kernel: [   39.602500] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14384, wptr 14400
Jan  3 11:39:23 picolo kernel: [   39.602649]
[drm:sdma_v3_0_process_trap_irq [amdgpu]] IH: SDMA trap
Jan  3 11:39:23 picolo kernel: [   39.602887] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14400, wptr 14416
Jan  3 11:39:23 picolo kernel: [   39.603054] [drm:gfx_v8_0_eop_irq
[amdgpu]] IH: CP EOP
Jan  3 11:39:23 picolo kernel: [   39.615864] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14416, wptr 14432
Jan  3 11:39:23 picolo kernel: [   39.632542] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14432, wptr 14448
Jan  3 11:39:23 picolo kernel: [   39.649264] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14448, wptr 14464
Jan  3 11:39:23 picolo kernel: [   39.665943] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14464, wptr 14480
Jan  3 11:39:23 picolo kernel: [   39.682610] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14480, wptr 14496
Jan  3 11:39:23 picolo kernel: [   39.699285] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14496, wptr 14512
Jan  3 11:39:23 picolo kernel: [   39.715955] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14512, wptr 14528
Jan  3 11:39:23 picolo kernel: [   39.732629] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14528, wptr 14544
Jan  3 11:39:23 picolo kernel: [   39.749313] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14544, wptr 14560
Jan  3 11:39:23 picolo kernel: [   39.765995] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14560, wptr 14576
Jan  3 11:39:23 picolo kernel: [   39.782667] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14576, wptr 14592
Jan  3 11:39:23 picolo kernel: [   39.799363] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14592, wptr 14608
Jan  3 11:39:23 picolo kernel: [   39.816043] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14608, wptr 14624
Jan  3 11:39:23 picolo kernel: [   39.832734] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14624, wptr 14640
Jan  3 11:39:23 picolo kernel: [   39.849426] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14640, wptr 14656
Jan  3 11:39:23 picolo kernel: [   39.866081] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14656, wptr 14672
Jan  3 11:39:23 picolo kernel: [   39.882822] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14672, wptr 14688
Jan  3 11:39:23 picolo kernel: [   39.899455] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14688, wptr 14704
Jan  3 11:39:23 picolo kernel: [   39.916190] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14704, wptr 14720
Jan  3 11:39:23 picolo kernel: [   39.932885] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14720, wptr 14736
Jan  3 11:39:23 picolo kernel: [   39.949589] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14736, wptr 14752
Jan  3 11:39:23 picolo kernel: [   39.966238] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14752, wptr 14768
Jan  3 11:39:23 picolo kernel: [   39.982869] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14768, wptr 14784
Jan  3 11:39:23 picolo kernel: [   39.999609] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14784, wptr 14800
Jan  3 11:39:23 picolo kernel: [   40.016286] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14800, wptr 14816
Jan  3 11:39:23 picolo kernel: [   40.033045] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14816, wptr 14832
Jan  3 11:39:23 picolo kernel: [   40.049716] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14832, wptr 14848
Jan  3 11:39:23 picolo kernel: [   40.066446] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14848, wptr 14864
Jan  3 11:39:23 picolo kernel: [   40.083031] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14864, wptr 14880
Jan  3 11:39:23 picolo kernel: [   40.099765] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14880, wptr 14896
Jan  3 11:39:23 picolo kernel: [   40.116394] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14896, wptr 14912
Jan  3 11:39:23 picolo kernel: [   40.133133] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14912, wptr 14928
Jan  3 11:39:23 picolo kernel: [   40.149743] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14928, wptr 14944
Jan  3 11:39:23 picolo kernel: [   40.166426] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14944, wptr 14960
Jan  3 11:39:23 picolo kernel: [   40.183178] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14960, wptr 14976
Jan  3 11:39:23 picolo kernel: [   40.199788] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14976, wptr 14992
Jan  3 11:39:23 picolo kernel: [   40.216507] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14992, wptr 15008
Jan  3 11:39:23 picolo kernel: [   40.233150] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15008, wptr 15024
Jan  3 11:39:23 picolo kernel: [   40.249815] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15024, wptr 15040
Jan  3 11:39:23 picolo kernel: [   40.266454] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15040, wptr 15056
Jan  3 11:39:23 picolo kernel: [   40.283123] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15056, wptr 15072
Jan  3 11:39:23 picolo kernel: [   40.299804] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15072, wptr 15088
Jan  3 11:39:23 picolo kernel: [   40.316483] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15088, wptr 15104
Jan  3 11:39:23 picolo kernel: [   40.333164] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15104, wptr 15120
Jan  3 11:39:23 picolo kernel: [   40.349843] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15120, wptr 15136
Jan  3 11:39:23 picolo kernel: [   40.366523] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15136, wptr 15152
Jan  3 11:39:23 picolo kernel: [   40.383200] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15152, wptr 15168
Jan  3 11:39:23 picolo kernel: [   40.399878] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15168, wptr 15184
Jan  3 11:39:23 picolo kernel: [   40.416561] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15184, wptr 15200
Jan  3 11:39:23 picolo kernel: [   40.433245] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15200, wptr 15216
Jan  3 11:39:23 picolo kernel: [   40.449925] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15216, wptr 15232
Jan  3 11:39:23 picolo kernel: [   40.466613] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15232, wptr 15248
Jan  3 11:39:24 picolo kernel: [   40.483291] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15248, wptr 15264
Jan  3 11:39:24 picolo kernel: [   40.499971] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15264, wptr 15280
Jan  3 11:39:24 picolo kernel: [   40.516652] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15280, wptr 15296
Jan  3 11:39:24 picolo kernel: [   40.533336] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15296, wptr 15312
Jan  3 11:39:24 picolo kernel: [   40.550016] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15312, wptr 15328
Jan  3 11:39:24 picolo kernel: [   40.566715] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15328, wptr 15344
Jan  3 11:39:24 picolo kernel: [   40.583390] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15344, wptr 15360
Jan  3 11:39:24 picolo kernel: [   40.600065] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15360, wptr 15376
Jan  3 11:39:24 picolo kernel: [   40.616745] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15376, wptr 15392
Jan  3 11:39:24 picolo kernel: [   40.633432] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15392, wptr 15408
Jan  3 11:39:24 picolo kernel: [   40.650113] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15408, wptr 15424
Jan  3 11:39:24 picolo kernel: [   40.666790] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15424, wptr 15440
Jan  3 11:39:24 picolo kernel: [   40.683477] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15440, wptr 15456
Jan  3 11:39:24 picolo kernel: [   40.700157] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15456, wptr 15472
Jan  3 11:39:24 picolo kernel: [   40.716836] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15472, wptr 15488
Jan  3 11:39:24 picolo kernel: [   40.733522] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15488, wptr 15504
Jan  3 11:39:24 picolo kernel: [   40.750203] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15504, wptr 15520
Jan  3 11:39:24 picolo kernel: [   40.766882] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15520, wptr 15536
Jan  3 11:39:24 picolo kernel: [   40.783563] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15536, wptr 15552
Jan  3 11:39:24 picolo kernel: [   40.800247] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15552, wptr 15568
Jan  3 11:39:24 picolo kernel: [   40.816929] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15568, wptr 15584
Jan  3 11:39:24 picolo kernel: [   40.833633] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15584, wptr 15600
Jan  3 11:39:24 picolo kernel: [   40.850305] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15600, wptr 15616
Jan  3 11:39:24 picolo kernel: [   40.867011] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15616, wptr 15632
Jan  3 11:39:24 picolo kernel: [   40.883676] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15632, wptr 15648
Jan  3 11:39:24 picolo kernel: [   40.900346] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15648, wptr 15664
Jan  3 11:39:24 picolo kernel: [   40.917026] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15664, wptr 15680
Jan  3 11:39:24 picolo kernel: [   40.933716] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15680, wptr 15696
Jan  3 11:39:24 picolo kernel: [   40.950390] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15696, wptr 15712
Jan  3 11:39:24 picolo kernel: [   40.967070] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15712, wptr 15728
Jan  3 11:39:24 picolo kernel: [   40.983757] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15728, wptr 15744
Jan  3 11:39:24 picolo kernel: [   41.000438] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15744, wptr 15760
Jan  3 11:39:24 picolo kernel: [   41.017115] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15760, wptr 15776
Jan  3 11:39:24 picolo kernel: [   41.033812] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15776, wptr 15792
Jan  3 11:39:24 picolo kernel: [   41.050485] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15792, wptr 15808
Jan  3 11:39:24 picolo kernel: [   41.067162] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15808, wptr 15824
Jan  3 11:39:24 picolo kernel: [   41.083845] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15824, wptr 15840
Jan  3 11:39:24 picolo kernel: [   41.100523] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15840, wptr 15856
Jan  3 11:39:24 picolo kernel: [   41.117205] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15856, wptr 15872
Jan  3 11:39:24 picolo kernel: [   41.133904] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15872, wptr 15888
Jan  3 11:39:24 picolo kernel: [   41.150579] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15888, wptr 15904
Jan  3 11:39:24 picolo kernel: [   41.167255] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15904, wptr 15920
Jan  3 11:39:24 picolo kernel: [   41.183933] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15920, wptr 15936
Jan  3 11:39:24 picolo kernel: [   41.200614] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15936, wptr 15952
Jan  3 11:39:24 picolo kernel: [   41.217295] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15952, wptr 15968
Jan  3 11:39:24 picolo kernel: [   41.233984] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15968, wptr 15984
Jan  3 11:39:24 picolo kernel: [   41.250663] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 15984, wptr 16000
Jan  3 11:39:24 picolo kernel: [   41.267347] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16000, wptr 16016
Jan  3 11:39:24 picolo kernel: [   41.284027] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16016, wptr 16032
Jan  3 11:39:24 picolo kernel: [   41.300706] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16032, wptr 16048
Jan  3 11:39:24 picolo kernel: [   41.317388] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16048, wptr 16064
Jan  3 11:39:24 picolo kernel: [   41.334071] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16064, wptr 16080
Jan  3 11:39:24 picolo kernel: [   41.350752] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16080, wptr 16096
Jan  3 11:39:24 picolo kernel: [   41.367442] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16096, wptr 16112
Jan  3 11:39:24 picolo kernel: [   41.384122] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16112, wptr 16128
Jan  3 11:39:24 picolo kernel: [   41.400801] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16128, wptr 16144
Jan  3 11:39:24 picolo kernel: [   41.417480] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 16144, wptr 16160
Jan  3 11:39:24 picolo kernel: [   41.432501] [drm:vblank_disable_fn
[drm]] disabling vblank on crtc 0
Jan  3 11:41:22 picolo kernel: [   49.762715] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2, emitted seq=3
Jan  3 11:41:22 picolo kernel: [   49.772047] [drm] GPU recovery disabled.
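
In the bad excerpt above, the amdgpu_ih_process entries that follow the
single CP EOP arrive at a near-constant spacing. Below is a small
Python sketch to measure that spacing, assuming the dmesg timestamp
format shown in this mail; ih_intervals_ms is a hypothetical helper
name. A spacing of roughly 16.7 ms would be consistent with a 60 Hz
vblank period, but that reading is an assumption and is not confirmed
by anything in the log.

```python
import re

# Captures the dmesg timestamp (seconds) of each amdgpu_ih_process entry.
IH_ENTRY = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm:amdgpu_ih_process\b")


def ih_intervals_ms(log_text):
    """Return the gaps, in milliseconds, between successive IH entries."""
    stamps = [float(m.group(1)) for m in IH_ENTRY.finditer(log_text)]
    return [(b - a) * 1000.0 for a, b in zip(stamps, stamps[1:])]


# Three consecutive entries copied from the bad excerpt.
sample = """\
Jan  3 11:39:23 picolo kernel: [   39.615864] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14416, wptr 14432
Jan  3 11:39:23 picolo kernel: [   39.632542] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14432, wptr 14448
Jan  3 11:39:23 picolo kernel: [   39.649264] [drm:amdgpu_ih_process
[amdgpu]] amdgpu_ih_process: rptr 14448, wptr 14464
"""

gaps = ih_intervals_ms(sample)
print([round(g, 3) for g in gaps])
```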

Regards,
Luís

On Wed, Jan 2, 2019 at 12:05 PM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Hi Luis,
>
> mhm, sounds like a timing issue. We have probably made something faster
> during bootup in 4.20 and because of this you now see this issue more often.
>
> If the bisection doesn't show any result can you try adding some
> msleep(10) call at critical places in the driver code to narrow this down?
>
> Officially we don't test/support ARM with the driver code, but in this
> particular case we should probably investigate since it sounds like it
> just doesn't happen on x86 because of different timing.
>
> Thanks,
> Christian.
>
> Am 28.12.18 um 15:05 schrieb Luís Mendes:
> > Hi Alex,
> >
> > First of all... Have nice holidays! Happy new year!!
> >
> > - Okay, so it looks like sometimes the driver is able to enter
> > graphical mode with the Polaris card, but most of the time it fails
> > before with:
> > [   49.762704] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> > timeout, signaled seq=2, emitted seq=3
> >
> > - This is something that also happens sporadically, but less
> > often, in the 4.17, 4.18 and 4.19 kernels, so this is actually not
> > a regression, but rather an existing issue, which maybe the patch
> > "drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq ring tests
> > sequence" solves. I tried to backport it to 4.20, but saw no
> > improvement. I need to try with the git version, or rc1.
> >
> > - This hang happens after the console is displayed on the screen, but
> > before switching to graphical mode with X.
> >
> > - However if X is entered then the driver is stable and can be used
> > for long periods.
> >
> > Regards,
> > Luís Mendes
> >
> > On Tue, Dec 18, 2018 at 11:16 PM Luís Mendes <luis.p.mendes at gmail.com> wrote:
> >> Hi Alex,
> >>
> >> I am already using drm_arch_can_wc_memory() set to false.
> >> I will try to bisect...
> >>
> >> Regards,
> >> Luís
> >>
> >> On Tue, Dec 18, 2018 at 7:03 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> >>> On Tue, Dec 18, 2018 at 8:58 AM Luís Mendes <luis.p.mendes at gmail.com> wrote:
> >>>> Hi Christian,
> >>>>
> >>>> I've been using a Sapphire RX 550 and a Sapphire RX 460 on a custom
> >>>> armhf board that runs well with Linux 4.19.9 at least, but now,
> >>>> starting with Linux kernel 4.20, I'm getting a GPU hang right after
> >>>> the console is displayed, but before entering graphical mode
> >>>> when starting the X session.
> >>>> I'm only reporting this now because there was a PCI commit for mvebu
> >>>> that also went into linux-4.20 and caused a kernel oops during the
> >>>> pci_map_rom call in the amdgpu initialization code. I've reverted that
> >>>> patch, but now amdgpu is hanging.
> >>> It would be useful if you could bisect.  This is the first I've heard
> >>> of amdgpu working on an ARM board without write combining (WC)
> >>> disabled.  You might check to see if disabling WC helps.  Return false
> >>> in drm_arch_can_wc_memory().
> >>>
> >>> Alex
> >>>
> >>>>
> >>>> [   24.801861] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> >>>> timeout, signaled seq=2, emitted seq=3
> >>>>
> >>>> 02:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >>>> [AMD/ATI] Baffin [Polaris11] (rev ff) (prog-if 00 [VGA controller])
> >>>>      Subsystem: Sapphire Technology Limited Baffin [Radeon RX 560]
> >>>>      Flags: bus master, fast devsel, latency 0, IRQ 51
> >>>>      Memory at d0000000 (64-bit, prefetchable) [size=256M]
> >>>>      Memory at e0000000 (64-bit, prefetchable) [size=2M]
> >>>>      I/O ports at 10000 [size=256]
> >>>>      Memory at e0200000 (32-bit, non-prefetchable) [size=256K]
> >>>>      Expansion ROM at e0240000 [disabled] [size=128K]
> >>>>      Capabilities: <access denied>
> >>>>      Kernel driver in use: amdgpu
> >>>>      Kernel modules: amdgpu
> >>>>
> >>>> dmesg follows in attachment.
> >>>>
> >>>> Regards,
> >>>> Luís
> >>>> _______________________________________________
> >>>> amd-gfx mailing list
> >>>> amd-gfx at lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kern_bad.log
Type: text/x-log
Size: 412979 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190103/127e9eaa/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kern_good.log
Type: text/x-log
Size: 990263 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190103/127e9eaa/attachment-0003.bin>