Bug: UVD initialization / clock gating issue on kabini
Nils Holland
nholland at tisys.org
Sun Jan 22 19:26:10 UTC 2017
Hi folks,
while playing around with the amdgpu drm driver, I stumbled upon an
issue. In fact, I have tracked it down to commit
aa4747c00a2dd034c5fdf70ca73b1674ca15beb3 ("drm/amdgpu: refine uvd_4.2
clock gate sequence.").
When I run the latest mainline git kernel, which contains this commit,
on my system, I get the following in dmesg (notice the lines about UVD
not responding at the end):
[ 2.276715] Linux agpgart interface v0.103
[ 2.277418] [drm] Initialized
[ 2.277612] [drm] amdgpu kernel modesetting enabled.
[ 2.278311] [drm] initializing kernel modesetting (KABINI 0x1002:0x9834 0x103C:0x21F7 0x00).
[ 2.278554] [drm] register mmio base: 0xF0C00000
[ 2.278683] [drm] register mmio size: 262144
[ 2.278819] [drm] doorbell mmio base: 0xF0000000
[ 2.278945] [drm] doorbell mmio size: 8388608
[ 2.282482] ATOM BIOS: AMD
[ 2.282640] [drm] GPU post is not needed
[ 2.282767] [drm] Changing default dispclk from 300Mhz to 600Mhz
[ 2.283369] amdgpu 0000:00:01.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
[ 2.283604] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000020000000 - 0x000000005FFFFFFF
[ 2.283823] [drm] Detected VRAM RAM=512M, BAR=256M
[ 2.283951] [drm] RAM width 128bits UNKNOWN
[ 2.284199] [TTM] Zone kernel: Available graphics memory: 425732 kiB
[ 2.284329] [TTM] Zone highmem: Available graphics memory: 1788424 kiB
[ 2.284458] [TTM] Initializing pool allocator
[ 2.284617] [TTM] Initializing DMA pool allocator
[ 2.284796] [drm] amdgpu: 512M of VRAM memory ready
[ 2.284924] [drm] amdgpu: 1024M of GTT memory ready.
[ 2.285069] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.346421] [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
[ 2.346608] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.346717] [drm] Driver supports precise vblank timestamp query.
[ 2.346894] amdgpu 0000:00:01.0: amdgpu: using MSI.
[ 2.347044] [drm] amdgpu: irq initialized.
[ 2.347160] [drm] Internal thermal controller without fan control
[ 2.347269] [drm] amdgpu: dpm initialized
[ 2.401730] [drm] amdgpu atom DIG backlight initialized
[ 2.401854] [drm] AMDGPU Display Connectors
[ 2.401959] [drm] Connector 0:
[ 2.402062] [drm] LVDS-1
[ 2.402165] [drm] HPD1
[ 2.402270] [drm] DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[ 2.402439] [drm] Encoders:
[ 2.402562] [drm] LCD1: INTERNAL_UNIPHY
[ 2.402667] [drm] Connector 1:
[ 2.402769] [drm] HDMI-A-1
[ 2.402871] [drm] HPD2
[ 2.402974] [drm] DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[ 2.403142] [drm] Encoders:
[ 2.403244] [drm] DFP1: INTERNAL_UNIPHY
[ 2.403348] [drm] Connector 2:
[ 2.403450] [drm] VGA-1
[ 2.403565] [drm] DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
[ 2.403733] [drm] Encoders:
[ 2.403836] [drm] CRT1: INTERNAL_KLDSCP_DAC1
[ 2.404484] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000020000010, cpu addr 0xffc01010
[ 2.404785] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000020000020, cpu addr 0xffc01020
[ 2.405046] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000020000030, cpu addr 0xffc01030
[ 2.405343] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000020000040, cpu addr 0xffc01040
[ 2.405644] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000020000050, cpu addr 0xffc01050
[ 2.405904] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000020000060, cpu addr 0xffc01060
[ 2.406176] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000020000070, cpu addr 0xffc01070
[ 2.406432] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000020000080, cpu addr 0xffc01080
[ 2.406706] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000020000090, cpu addr 0xffc01090
[ 2.407073] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x00000000200000a0, cpu addr 0xffc010a0
[ 2.407329] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000200000b0, cpu addr 0xffc010b0
[ 2.407801] [drm] Found UVD firmware Version: 1.64 Family ID: 9
[ 2.409254] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x000000000028cd30, cpu addr 0xf8a38d30
[ 2.409646] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[ 2.409911] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x00000000200000d0, cpu addr 0xffc010d0
[ 2.410172] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x00000000200000e0, cpu addr 0xffc010e0
[ 2.414503] [drm] ring test on 0 succeeded in 16 usecs
[ 2.414875] [drm] ring test on 1 succeeded in 3 usecs
[ 2.414992] [drm] ring test on 2 succeeded in 3 usecs
[ 2.415107] [drm] ring test on 3 succeeded in 3 usecs
[ 2.415223] [drm] ring test on 4 succeeded in 3 usecs
[ 2.415338] [drm] ring test on 5 succeeded in 3 usecs
[ 2.415452] [drm] ring test on 6 succeeded in 3 usecs
[ 2.415583] [drm] ring test on 7 succeeded in 3 usecs
[ 2.415698] [drm] ring test on 8 succeeded in 3 usecs
[ 2.416076] [drm] ring test on 9 succeeded in 5 usecs
[ 2.416189] [drm] ring test on 10 succeeded in 5 usecs
[ 2.442350] [drm] ring test on 11 succeeded in 1 usecs
[ 2.442459] [drm] UVD initialized successfully.
[ 2.580653] [Firmware Bug]: battery: (dis)charge rate invalid.
[ 2.580899] ACPI: Battery Slot [BAT1] (battery present)
[ 2.673280] [drm] ring test on 12 succeeded in 13 usecs
[ 2.673398] [drm] ring test on 13 succeeded in 2 usecs
[ 2.673500] [drm] VCE initialized successfully.
[ 3.246698] tsc: Refined TSC clocksource calibration: 998.128 MHz
[ 3.246828] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1cc65d64e77, max_idle_ns: 881590512558 ns
[ 3.698778] [drm] fb mappable at 0xE0428000
[ 3.698896] [drm] vram apper at 0xE0000000
[ 3.699007] [drm] size 4325376
[ 3.699116] [drm] fb depth is 24
[ 3.699228] [drm] pitch is 5632
[ 3.699503] fbcon: amdgpudrmfb (fb0) is primary device
[ 3.826793] Console: switching to colour frame buffer device 170x48
[ 3.837201] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[ 3.843195] [drm] ib test on ring 0 succeeded
[ 3.843355] [drm] ib test on ring 1 succeeded
[ 3.843486] [drm] ib test on ring 2 succeeded
[ 3.843651] [drm] ib test on ring 3 succeeded
[ 3.843779] [drm] ib test on ring 4 succeeded
[ 3.843917] [drm] ib test on ring 5 succeeded
[ 3.844047] [drm] ib test on ring 6 succeeded
[ 3.844175] [drm] ib test on ring 7 succeeded
[ 3.844304] [drm] ib test on ring 8 succeeded
[ 3.844427] [drm] ib test on ring 9 succeeded
[ 3.844576] [drm] ib test on ring 10 succeeded
[ 4.870999] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 5.891351] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 6.911713] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 7.932062] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 8.952409] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 9.974050] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 10.995701] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 12.017354] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 13.039024] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 14.060692] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!
[ 14.082168] [drm:uvd_v4_2_start] *ERROR* UVD not responding, giving up!!!
[ 14.084797] clocksource: Switched to clocksource tsc
[ 14.086404] kwatchdog (97) used greatest stack depth: 7260 bytes left
[ 15.086475] [drm:amdgpu_uvd_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[ 15.088263] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 11 (-110).
[ 15.211331] [drm] ib test on ring 12 succeeded
[ 15.213105] [drm:amdgpu_device_init] *ERROR* ib ring test failed (-110).
[ 15.223583] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:00:01.0 on minor 0
After the short delay that happens while the ERROR messages are
printed, the system continues to boot just fine and in general, as
far as I am concerned, works as expected. However, I'm probably not
doing anything that would use the UVD and thus might not notice when
it's not been properly initialized. :-)
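To put a number on that delay, a small Python sketch like the following could pull the timestamps out of a saved dmesg capture. The function name and the sample lines are purely illustrative (the sample is a trimmed excerpt of the log above); it is a rough helper for reading the log, not anything taken from the driver.

```python
import re

# Matches a dmesg line such as "[ 4.870999] [drm:uvd_v4_2_start] *ERROR* ..."
# and captures the timestamp and the message text.
TS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def uvd_error_window(lines):
    """Return (first_ts, last_ts, count) for 'UVD not responding' lines.

    Hypothetical helper: returns None if no such line is present.
    """
    stamps = []
    for line in lines:
        m = TS_LINE.match(line.strip())
        if m and "UVD not responding" in m.group(2):
            stamps.append(float(m.group(1)))
    if not stamps:
        return None
    return stamps[0], stamps[-1], len(stamps)


# Trimmed excerpt of the failing boot log quoted above.
sample = [
    "[ 3.844576] [drm] ib test on ring 10 succeeded",
    "[ 4.870999] [drm:uvd_v4_2_start] *ERROR* UVD not responding, trying to reset the VCPU!!!",
    "[ 14.082168] [drm:uvd_v4_2_start] *ERROR* UVD not responding, giving up!!!",
    "[ 15.211331] [drm] ib test on ring 12 succeeded",
]

first, last, count = uvd_error_window(sample)
print(f"{count} UVD errors between {first:.6f} s and {last:.6f} s")
```

Feeding it the full log instead of the excerpt (for example the lines of a file saved from `dmesg`) would count all eleven error lines.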
Now, when I revert just the one commit
aa4747c00a2dd034c5fdf70ca73b1674ca15beb3, or try a kernel that doesn't
contain it in the first place (like, some 4.9.x one), the UVD seems to
get initialized just fine. Here are the relevant dmesg lines from that
case:
[ 2.274639] Linux agpgart interface v0.103
[ 2.275342] [drm] Initialized
[ 2.275534] [drm] amdgpu kernel modesetting enabled.
[ 2.276201] [drm] initializing kernel modesetting (KABINI 0x1002:0x9834 0x103C:0x21F7 0x00).
[ 2.276442] [drm] register mmio base: 0xF0C00000
[ 2.276568] [drm] register mmio size: 262144
[ 2.276702] [drm] doorbell mmio base: 0xF0000000
[ 2.276825] [drm] doorbell mmio size: 8388608
[ 2.280335] ATOM BIOS: AMD
[ 2.280493] [drm] GPU post is not needed
[ 2.280620] [drm] Changing default dispclk from 300Mhz to 600Mhz
[ 2.281220] amdgpu 0000:00:01.0: VRAM: 512M 0x0000000000000000 - 0x000000001FFFFFFF (512M used)
[ 2.281454] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000020000000 - 0x000000005FFFFFFF
[ 2.281674] [drm] Detected VRAM RAM=512M, BAR=256M
[ 2.281801] [drm] RAM width 128bits UNKNOWN
[ 2.282062] [TTM] Zone kernel: Available graphics memory: 425732 kiB
[ 2.282194] [TTM] Zone highmem: Available graphics memory: 1788424 kiB
[ 2.282322] [TTM] Initializing pool allocator
[ 2.282476] [TTM] Initializing DMA pool allocator
[ 2.282657] [drm] amdgpu: 512M of VRAM memory ready
[ 2.282785] [drm] amdgpu: 1024M of GTT memory ready.
[ 2.282930] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 2.344268] [drm] PCIE GART of 1024M enabled (table at 0x0000000000040000).
[ 2.344462] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 2.344572] [drm] Driver supports precise vblank timestamp query.
[ 2.344754] amdgpu 0000:00:01.0: amdgpu: using MSI.
[ 2.344905] [drm] amdgpu: irq initialized.
[ 2.345022] [drm] Internal thermal controller without fan control
[ 2.345131] [drm] amdgpu: dpm initialized
[ 2.398603] [drm] amdgpu atom DIG backlight initialized
[ 2.398727] [drm] AMDGPU Display Connectors
[ 2.398833] [drm] Connector 0:
[ 2.398936] [drm] LVDS-1
[ 2.399039] [drm] HPD1
[ 2.399145] [drm] DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[ 2.399313] [drm] Encoders:
[ 2.399437] [drm] LCD1: INTERNAL_UNIPHY
[ 2.399542] [drm] Connector 1:
[ 2.399644] [drm] HDMI-A-1
[ 2.399746] [drm] HPD2
[ 2.399849] [drm] DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[ 2.400017] [drm] Encoders:
[ 2.400120] [drm] DFP1: INTERNAL_UNIPHY
[ 2.400223] [drm] Connector 2:
[ 2.400325] [drm] VGA-1
[ 2.400440] [drm] DDC: 0x1970 0x1970 0x1971 0x1971 0x1972 0x1972 0x1973 0x1973
[ 2.400609] [drm] Encoders:
[ 2.400711] [drm] CRT1: INTERNAL_KLDSCP_DAC1
[ 2.401352] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000020000010, cpu addr 0xffc01010
[ 2.401651] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000020000020, cpu addr 0xffc01020
[ 2.401926] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000020000030, cpu addr 0xffc01030
[ 2.402216] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000020000040, cpu addr 0xffc01040
[ 2.402516] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000020000050, cpu addr 0xffc01050
[ 2.402774] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000020000060, cpu addr 0xffc01060
[ 2.403030] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000020000070, cpu addr 0xffc01070
[ 2.403293] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000020000080, cpu addr 0xffc01080
[ 2.403573] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000020000090, cpu addr 0xffc01090
[ 2.403941] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x00000000200000a0, cpu addr 0xffc010a0
[ 2.404198] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000200000b0, cpu addr 0xffc010b0
[ 2.404679] [drm] Found UVD firmware Version: 1.64 Family ID: 9
[ 2.406144] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x000000000028cd30, cpu addr 0xf8a38d30
[ 2.406543] [drm] Found VCE firmware Version: 50.10 Binary ID: 2
[ 2.406800] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x00000000200000d0, cpu addr 0xffc010d0
[ 2.407061] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x00000000200000e0, cpu addr 0xffc010e0
[ 2.411298] [drm] ring test on 0 succeeded in 16 usecs
[ 2.411662] [drm] ring test on 1 succeeded in 3 usecs
[ 2.411781] [drm] ring test on 2 succeeded in 3 usecs
[ 2.411896] [drm] ring test on 3 succeeded in 3 usecs
[ 2.412011] [drm] ring test on 4 succeeded in 3 usecs
[ 2.412126] [drm] ring test on 5 succeeded in 3 usecs
[ 2.412240] [drm] ring test on 6 succeeded in 3 usecs
[ 2.412355] [drm] ring test on 7 succeeded in 3 usecs
[ 2.412484] [drm] ring test on 8 succeeded in 3 usecs
[ 2.412864] [drm] ring test on 9 succeeded in 6 usecs
[ 2.412976] [drm] ring test on 10 succeeded in 4 usecs
[ 2.459127] [drm] ring test on 11 succeeded in 1 usecs
[ 2.479259] [drm] UVD initialized successfully.
[ 2.580509] [Firmware Bug]: battery: (dis)charge rate invalid.
[ 2.580741] ACPI: Battery Slot [BAT1] (battery present)
[ 2.710161] [drm] ring test on 12 succeeded in 13 usecs
[ 2.710286] [drm] ring test on 13 succeeded in 1 usecs
[ 2.710387] [drm] VCE initialized successfully.
[ 3.243531] tsc: Refined TSC clocksource calibration: 998.127 MHz
[ 3.243662] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1cc65b93289, max_idle_ns: 881590487074 ns
[ 3.734588] [drm] fb mappable at 0xE0428000
[ 3.734701] [drm] vram apper at 0xE0000000
[ 3.734802] [drm] size 4325376
[ 3.734900] [drm] fb depth is 24
[ 3.734999] [drm] pitch is 5632
[ 3.735303] fbcon: amdgpudrmfb (fb0) is primary device
[ 3.858601] Console: switching to colour frame buffer device 170x48
[ 3.869008] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[ 3.875082] [drm] ib test on ring 0 succeeded
[ 3.875275] [drm] ib test on ring 1 succeeded
[ 3.875442] [drm] ib test on ring 2 succeeded
[ 3.875577] [drm] ib test on ring 3 succeeded
[ 3.875711] [drm] ib test on ring 4 succeeded
[ 3.875844] [drm] ib test on ring 5 succeeded
[ 3.875977] [drm] ib test on ring 6 succeeded
[ 3.876110] [drm] ib test on ring 7 succeeded
[ 3.876243] [drm] ib test on ring 8 succeeded
[ 3.876363] [drm] ib test on ring 9 succeeded
[ 3.876480] [drm] ib test on ring 10 succeeded
[ 3.904244] [drm] ib test on ring 11 succeeded
[ 4.025142] [drm] ib test on ring 12 succeeded
[ 4.033356] [drm] Initialized amdgpu 3.9.0 20150101 for 0000:00:01.0 on minor 0
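For anyone who wants to compare the two logs mechanically rather than by eye, here is a minimal Python sketch. It assumes both boots were saved as lists of dmesg lines; the helper names `ib_results` and `regressed_rings` are made up for illustration, and the two short lists are trimmed excerpts of the logs above.

```python
import re

# Patterns for the two kinds of IB test messages seen in the logs above.
IB_OK = re.compile(r"ib test on ring (\d+) succeeded")
IB_FAIL = re.compile(r"failed testing IB on ring (\d+)")


def ib_results(lines):
    """Return (rings_that_passed, rings_that_failed) as two sets of ints."""
    ok, failed = set(), set()
    for line in lines:
        m = IB_OK.search(line)
        if m:
            ok.add(int(m.group(1)))
        m = IB_FAIL.search(line)
        if m:
            failed.add(int(m.group(1)))
    return ok, failed


def regressed_rings(good_lines, bad_lines):
    """Rings whose IB test passed in the good boot but not in the bad boot."""
    good_ok, _ = ib_results(good_lines)
    bad_ok, bad_failed = ib_results(bad_lines)
    return sorted((good_ok - bad_ok) | (good_ok & bad_failed))


# Trimmed excerpts: boot with the commit reverted (good) and mainline (bad).
good = [
    "[ 3.876480] [drm] ib test on ring 10 succeeded",
    "[ 3.904244] [drm] ib test on ring 11 succeeded",
    "[ 4.025142] [drm] ib test on ring 12 succeeded",
]
bad = [
    "[ 3.844576] [drm] ib test on ring 10 succeeded",
    "[ 15.088263] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 11 (-110).",
    "[ 15.211331] [drm] ib test on ring 12 succeeded",
]

print("rings that regressed:", regressed_rings(good, bad))
```

On these excerpts the only ring that differs is ring 11, which is the one whose fence driver is set up right after the UVD firmware is found.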
So, it seems that this specific commit has introduced a regression, at
least on my Kabini card (or rather, APU). It is possibly limited to
32-bit kernels; I couldn't test a 64-bit kernel yet.
I thought I'd report this so someone with a bit more knowledge can
have a look. Actually, just three hours ago, I didn't even know what a
UVD is; now I know that it's a Unified Video Decoder, and ... well, if
I continue at that pace, a year from now I might even know how to
fix this issue I've discovered ;-), but right now, I guess the help of an
expert would be greatly appreciated! :-)
Greetings
Nils