Bug: UVD initialization / clock gating issue on kabini

Nils Holland nholland at tisys.org
Sun Jan 22 19:26:10 UTC 2017


Hi folks,

While playing around with the amdgpu DRM driver, I stumbled upon an
issue, which I have tracked down to commit
aa4747c00a2dd034c5fdf70ca73b1674ca15beb3 ("drm/amdgpu: refine uvd_4.2
clock gate sequence.").

When I run the latest mainline git kernel, which contains this commit,
on my system, I get the following in dmesg (notice the lines about UVD
not responding at the end):

[    2.276715] Linux agpgart interface v0.103
[    2.277418] [drm] Initialized
[    2.277612] [drm] amdgpu kernel modesetting enabled.
[    2.278311] [drm] initializing kernel modesetting (KABINI 0x1002:0x9834 0x103C:0x21F7 0x00).
[    2.278554] [drm] register mmio base: 0xF0C00000
[    2.278683] [drm] register mmio size: 262144
[    2.278819] [drm] doorbell mmio base: 0xF0000000
[    2.278945] [drm] doorbell mmio size: 8388608
[    2.282482] ATOM BIOS: AMD
[    2.282640] [drm] GPU post is not needed
[    2.282767] [drm] Changing default dispclk from 300Mhz to 600Mhz
[    2.283369] amdgpu 0000:00:01.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
[    2.283604] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000020000000 - 0x000000005FFFFFFF
[    2.283823] [drm] Detected VRAM RAM=512M, BAR=256M
[    2.283951] [drm] RAM width 128bits UNKNOWN
[    2.284199] [TTM] Zone  kernel: Available graphics memory: 425732 kiB
[    2.284329] [TTM] Zone highmem: Available graphics memory: 1788424 kiB
[    2.284458] [TTM] Initializing pool allocator
[    2.284617] [TTM] Initializing DMA pool allocator
[    2.284796] [drm] amdgpu: 512M of VRAM memory ready
[    2.284924] [drm] amdgpu: 1024M of GTT memory ready.
[    2.285069] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    2.346421] [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
[    2.346608] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.346717] [drm] Driver supports precise vblank timestamp query.
[    2.346894] amdgpu 0000:00:01.0: amdgpu: using MSI.
[    2.347044] [drm] amdgpu: irq initialized.
[    2.347160] [drm] Internal thermal controller without fan control
[    2.347269] [drm] amdgpu: dpm initialized
[    2.401730] [drm] amdgpu atom DIG backlight initialized
[    2.401854] [drm] AMDGPU Display Connectors
[    2.401959] [drm] Connector 0:
[    2.402062] [drm]   LVDS-1
[    2.402165] [drm]   HPD1
[    2.402270] [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[    2.402439] [drm]   Encoders:
[    2.402562] [drm]     LCD1: INTERNAL_UNIPHY
[    2.402667] [drm] Connector 1:
[    2.402769] [drm]   HDMI-A-1
[    2.402871] [drm]   HPD2
[    2.402974] [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[    2.403142] [drm]   Encoders:
[    2.403244] [drm]     DFP1: INTERNAL_UNIPHY
[    2.403348] [drm] Connector 2:
[    2.403450] [drm]   VGA-1
[    2.403565] [drm]   DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
[    2.403733] [drm]   Encoders:
[    2.403836] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    2.404484] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000020000010, cpu addr 0xffc01010
[    2.404785] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000020000020, cpu addr 0xffc01020
[    2.405046] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000020000030, cpu addr 0xffc01030
[    2.405343] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000020000040, cpu addr 0xffc01040
[    2.405644] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000020000050, cpu addr 0xffc01050
[    2.405904] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000020000060, cpu addr 0xffc01060
[    2.406176] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000020000070, cpu addr 0xffc01070
[    2.406432] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000020000080, cpu addr 0xffc01080
[    2.406706] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000020000090, cpu addr 0xffc01090
[    2.407073] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x00000000200000a0, cpu addr 0xffc010a0
[    2.407329] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000200000b0, cpu addr 0xffc010b0
[    2.407801] [drm] Found UVD firmware Version: 1.64 Family ID: 9
[    2.409254] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x000000000028cd30, cpu addr 0xf8a38d30
[    2.409646] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    2.409911] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x00000000200000d0, cpu addr 0xffc010d0
[    2.410172] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x00000000200000e0, cpu addr 0xffc010e0
[    2.414503] [drm] ring test on 0 succeeded in 16 usecs
[    2.414875] [drm] ring test on 1 succeeded in 3 usecs
[    2.414992] [drm] ring test on 2 succeeded in 3 usecs
[    2.415107] [drm] ring test on 3 succeeded in 3 usecs
[    2.415223] [drm] ring test on 4 succeeded in 3 usecs
[    2.415338] [drm] ring test on 5 succeeded in 3 usecs
[    2.415452] [drm] ring test on 6 succeeded in 3 usecs
[    2.415583] [drm] ring test on 7 succeeded in 3 usecs
[    2.415698] [drm] ring test on 8 succeeded in 3 usecs
[    2.416076] [drm] ring test on 9 succeeded in 5 usecs
[    2.416189] [drm] ring test on 10 succeeded in 5 usecs
[    2.442350] [drm] ring test on 11 succeeded in 1 usecs
[    2.442459] [drm] UVD initialized successfully.
[    2.580653] [Firmware Bug]: battery: (dis)charge rate invalid.
[    2.580899] ACPI: Battery Slot [BAT1] (battery present)
[    2.673280] [drm] ring test on 12 succeeded in 13 usecs
[    2.673398] [drm] ring test on 13 succeeded in 2 usecs
[    2.673500] [drm] VCE initialized successfully.
[    3.246698] tsc: Refined TSC clocksource calibration: 998.128 MHz
[    3.246828] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1cc65d64e77, max_idle_ns: 881590512558 ns
[    3.698778] [drm] fb mappable at 0xE0428000
[    3.698896] [drm] vram apper at 0xE0000000
[    3.699007] [drm] size 4325376
[    3.699116] [drm] fb depth is 24
[    3.699228] [drm]    pitch is 5632
[    3.699503] fbcon: amdgpudrmfb (fb0) is primary device
[    3.826793] Console: switching to colour frame buffer device 170x48
[    3.837201] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    3.843195] [drm] ib test on ring 0 succeeded
[    3.843355] [drm] ib test on ring 1 succeeded
[    3.843486] [drm] ib test on ring 2 succeeded
[    3.843651] [drm] ib test on ring 3 succeeded
[    3.843779] [drm] ib test on ring 4 succeeded
[    3.843917] [drm] ib test on ring 5 succeeded
[    3.844047] [drm] ib test on ring 6 succeeded
[    3.844175] [drm] ib test on ring 7 succeeded
[    3.844304] [drm] ib test on ring 8 succeeded
[    3.844427] [drm] ib test on ring 9 succeeded
[    3.844576] [drm] ib test on ring 10 succeeded
[    4.870999] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    5.891351] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    6.911713] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    7.932062] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    8.952409] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[    9.974050] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   10.995701] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   12.017354] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   13.039024] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   14.060692] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   14.082168] [drm:uvd_v4_2_start] *ERROR* UVD not responding, giving up!!!
[   14.084797] clocksource: Switched to clocksource tsc
[   14.086404] kwatchdog (97) used greatest stack depth: 7260 bytes left
[   15.086475] [drm:amdgpu_uvd_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[   15.088263] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 11 (-110).
[   15.211331] [drm] ib test on ring 12 succeeded
[   15.213105] [drm:amdgpu_device_init] *ERROR* ib ring test failed (-110).
[   15.223583] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:00:01.0 on minor 0

After the delay (roughly ten seconds) while the ERROR messages are
printed, the system continues to boot just fine and, as far as I am
concerned, works as expected. However, I'm probably not doing anything
that would use the UVD, and thus might not notice that it hasn't been
properly initialized. :-)
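
For anyone who wants to check their own log for the same symptom, here
is a minimal Python sketch. It assumes the dmesg output has been saved
as text in the format shown above. The helper name
summarize_uvd_errors and the SAMPLE excerpt are only illustrative;
they are not part of the driver or of any existing tool.

```python
import re

# Matches a dmesg line such as:
# [    4.870999] [drm:uvd_v4_2_start] *ERROR* UVD not responding, ...
ERROR_LINE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:(\w+)\]\s+\*ERROR\*\s+(.*)$"
)


def summarize_uvd_errors(dmesg_text):
    """Collect drm *ERROR* lines and report how long they span.

    Hypothetical helper, written only for this illustration.
    """
    errors = []
    for line in dmesg_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            timestamp, function, message = match.groups()
            errors.append((float(timestamp), function, message))

    retries = sum(1 for _, _, msg in errors if "trying to reset the VCPU" in msg)
    gave_up = any("giving up" in msg for _, _, msg in errors)
    # Time between the first and the last error line, in seconds.
    span = errors[-1][0] - errors[0][0] if errors else 0.0
    return {
        "count": len(errors),
        "retries": retries,
        "gave_up": gave_up,
        "span_seconds": span,
    }


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[    3.844576] [drm] ib test on ring 10 succeeded
[    4.870999] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   14.060692] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[   14.082168] [drm:uvd_v4_2_start] *ERROR* UVD not responding, giving up!!!
[   15.213105] [drm:amdgpu_device_init] *ERROR* ib ring test failed (-110).
"""

summary = summarize_uvd_errors(SAMPLE)
print(summary)
```

Feeding it the full log instead of the excerpt should count all ten
reset attempts; the span then covers the whole stretch during which
the boot is held up by the UVD errors.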

Now, when I revert just the one commit
aa4747c00a2dd034c5fdf70ca73b1674ca15beb3, or try a kernel that doesn't
contain it in the first place (such as a 4.9.x one), the UVD seems to
get initialized just fine. Here are the relevant dmesg lines from that
case:

[    2.274639] Linux agpgart interface v0.103
[    2.275342] [drm] Initialized
[    2.275534] [drm] amdgpu kernel modesetting enabled.
[    2.276201] [drm] initializing kernel modesetting (KABINI 0x1002:0x9834 0x103C:0x21F7 0x00).
[    2.276442] [drm] register mmio base: 0xF0C00000
[    2.276568] [drm] register mmio size: 262144
[    2.276702] [drm] doorbell mmio base: 0xF0000000
[    2.276825] [drm] doorbell mmio size: 8388608
[    2.280335] ATOM BIOS: AMD
[    2.280493] [drm] GPU post is not needed
[    2.280620] [drm] Changing default dispclk from 300Mhz to 600Mhz
[    2.281220] amdgpu 0000:00:01.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
[    2.281454] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000020000000 - 0x000000005FFFFFFF
[    2.281674] [drm] Detected VRAM RAM=512M, BAR=256M
[    2.281801] [drm] RAM width 128bits UNKNOWN
[    2.282062] [TTM] Zone  kernel: Available graphics memory: 425732 kiB
[    2.282194] [TTM] Zone highmem: Available graphics memory: 1788424 kiB
[    2.282322] [TTM] Initializing pool allocator
[    2.282476] [TTM] Initializing DMA pool allocator
[    2.282657] [drm] amdgpu: 512M of VRAM memory ready
[    2.282785] [drm] amdgpu: 1024M of GTT memory ready.
[    2.282930] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    2.344268] [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
[    2.344462] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.344572] [drm] Driver supports precise vblank timestamp query.
[    2.344754] amdgpu 0000:00:01.0: amdgpu: using MSI.
[    2.344905] [drm] amdgpu: irq initialized.
[    2.345022] [drm] Internal thermal controller without fan control
[    2.345131] [drm] amdgpu: dpm initialized
[    2.398603] [drm] amdgpu atom DIG backlight initialized
[    2.398727] [drm] AMDGPU Display Connectors
[    2.398833] [drm] Connector 0:
[    2.398936] [drm]   LVDS-1
[    2.399039] [drm]   HPD1
[    2.399145] [drm]   DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[    2.399313] [drm]   Encoders:
[    2.399437] [drm]     LCD1: INTERNAL_UNIPHY
[    2.399542] [drm] Connector 1:
[    2.399644] [drm]   HDMI-A-1
[    2.399746] [drm]   HPD2
[    2.399849] [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[    2.400017] [drm]   Encoders:
[    2.400120] [drm]     DFP1: INTERNAL_UNIPHY
[    2.400223] [drm] Connector 2:
[    2.400325] [drm]   VGA-1
[    2.400440] [drm]   DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
[    2.400609] [drm]   Encoders:
[    2.400711] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[    2.401352] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000020000010, cpu addr 0xffc01010
[    2.401651] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000020000020, cpu addr 0xffc01020
[    2.401926] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000020000030, cpu addr 0xffc01030
[    2.402216] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000020000040, cpu addr 0xffc01040
[    2.402516] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000020000050, cpu addr 0xffc01050
[    2.402774] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000020000060, cpu addr 0xffc01060
[    2.403030] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000020000070, cpu addr 0xffc01070
[    2.403293] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000020000080, cpu addr 0xffc01080
[    2.403573] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000020000090, cpu addr 0xffc01090
[    2.403941] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x00000000200000a0, cpu addr 0xffc010a0
[    2.404198] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000200000b0, cpu addr 0xffc010b0
[    2.404679] [drm] Found UVD firmware Version: 1.64 Family ID: 9
[    2.406144] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x000000000028cd30, cpu addr 0xf8a38d30
[    2.406543] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[    2.406800] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x00000000200000d0, cpu addr 0xffc010d0
[    2.407061] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x00000000200000e0, cpu addr 0xffc010e0
[    2.411298] [drm] ring test on 0 succeeded in 16 usecs
[    2.411662] [drm] ring test on 1 succeeded in 3 usecs
[    2.411781] [drm] ring test on 2 succeeded in 3 usecs
[    2.411896] [drm] ring test on 3 succeeded in 3 usecs
[    2.412011] [drm] ring test on 4 succeeded in 3 usecs
[    2.412126] [drm] ring test on 5 succeeded in 3 usecs
[    2.412240] [drm] ring test on 6 succeeded in 3 usecs
[    2.412355] [drm] ring test on 7 succeeded in 3 usecs
[    2.412484] [drm] ring test on 8 succeeded in 3 usecs
[    2.412864] [drm] ring test on 9 succeeded in 6 usecs
[    2.412976] [drm] ring test on 10 succeeded in 4 usecs
[    2.459127] [drm] ring test on 11 succeeded in 1 usecs
[    2.479259] [drm] UVD initialized successfully.
[    2.580509] [Firmware Bug]: battery: (dis)charge rate invalid.
[    2.580741] ACPI: Battery Slot [BAT1] (battery present)
[    2.710161] [drm] ring test on 12 succeeded in 13 usecs
[    2.710286] [drm] ring test on 13 succeeded in 1 usecs
[    2.710387] [drm] VCE initialized successfully.
[    3.243531] tsc: Refined TSC clocksource calibration: 998.127 MHz
[    3.243662] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1cc65b93289, max_idle_ns: 881590487074 ns
[    3.734588] [drm] fb mappable at 0xE0428000
[    3.734701] [drm] vram apper at 0xE0000000
[    3.734802] [drm] size 4325376
[    3.734900] [drm] fb depth is 24
[    3.734999] [drm]    pitch is 5632
[    3.735303] fbcon: amdgpudrmfb (fb0) is primary device
[    3.858601] Console: switching to colour frame buffer device 170x48
[    3.869008] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    3.875082] [drm] ib test on ring 0 succeeded
[    3.875275] [drm] ib test on ring 1 succeeded
[    3.875442] [drm] ib test on ring 2 succeeded
[    3.875577] [drm] ib test on ring 3 succeeded
[    3.875711] [drm] ib test on ring 4 succeeded
[    3.875844] [drm] ib test on ring 5 succeeded
[    3.875977] [drm] ib test on ring 6 succeeded
[    3.876110] [drm] ib test on ring 7 succeeded
[    3.876243] [drm] ib test on ring 8 succeeded
[    3.876363] [drm] ib test on ring 9 succeeded
[    3.876480] [drm] ib test on ring 10 succeeded
[    3.904244] [drm] ib test on ring 11 succeeded
[    4.025142] [drm] ib test on ring 12 succeeded
[    4.033356] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:00:01.0 on minor 0

So, it seems that this specific commit has introduced a regression, at
least on my Kabini card (or rather, APU), and possibly only on 32-bit
(I couldn't yet test a 64-bit kernel).
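
The difference between the two logs can also be found mechanically. The
following Python sketch, under the assumption that both logs are
available as text, lists the rings whose IB test succeeds in the good
log but not in the bad one. The function names and the GOOD/BAD
excerpts are made up for this illustration.

```python
import re

# Matches lines such as: [drm] ib test on ring 11 succeeded
IB_OK = re.compile(r"\[drm\] ib test on ring (\d+) succeeded")


def rings_passing_ib_test(dmesg_text):
    """Return the set of ring numbers whose IB test is logged as succeeded."""
    return {int(number) for number in IB_OK.findall(dmesg_text)}


def missing_rings(good_log, bad_log):
    """Rings that pass in the good log but not in the bad one, sorted."""
    return sorted(rings_passing_ib_test(good_log) - rings_passing_ib_test(bad_log))


# Excerpt from the log with the commit reverted.
GOOD = """\
[    3.876480] [drm] ib test on ring 10 succeeded
[    3.904244] [drm] ib test on ring 11 succeeded
[    4.025142] [drm] ib test on ring 12 succeeded
"""

# Excerpt from the log with the commit applied.
BAD = """\
[    3.844576] [drm] ib test on ring 10 succeeded
[   15.088263] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 11 (-110).
[   15.211331] [drm] ib test on ring 12 succeeded
"""

failed = missing_rings(GOOD, BAD)
print(failed)
```

For these excerpts only ring 11 is reported, which is the ring the
"failed testing IB" error message names.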

I thought I'd report this so someone with a bit more knowledge can
have a look. Actually, just three hours ago, I didn't even know what a
UVD is; now I know that it's a Unified Video Decoder, and ... well, if
I continue at that pace, a year from now I might even know how to fix
this issue I've discovered ;-), but right now, I guess the help of an
expert would be greatly appreciated! :-)

Greetings
Nils

