[PATCH] drm/amd/display: Fix pageflip event race condition for DCN. (v2)
Michel Dänzer
michel at daenzer.net
Fri Mar 13 14:42:39 UTC 2020
On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote:
> On 2020-03-12 10:32 a.m., Alex Deucher wrote:
>> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner
>> <mario.kleiner.de at gmail.com> wrote:
>>>
>>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user
>>> events at vsartup for DCN")' introduces a new way of pageflip
>>> completion handling for DCN, and some trouble.
>>>
>>> The current implementation introduces a race condition, which
>>> can cause pageflip completion events to be sent out one vblank
>>> too early, thereby confusing userspace and causing flicker:
>>>
>>> prepare_flip_isr():
>>>
>>> 1. Pageflip programming takes the ddev->event_lock.
>>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED
>>> 3. Releases ddev->event_lock.
>>>
>>> --> Deadline for surface address regs double-buffering passes on
>>> target pipe.
>>>
>>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip
>>> into hw, but too late for current vblank.
>>>
>>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete
>>> in current vblank due to missing the double-buffering deadline
>>> by a tiny bit.
>>>
>>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires,
>>> dm_dcn_crtc_high_irq() gets called.
>>>
>>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the
>>> pageflip has been completed/will complete in this vblank and
>>> sends out pageflip completion event to userspace and resets
>>> pflip_status = AMDGPU_FLIP_NONE.
>>>
>>> => Flip completion event sent out one vblank too early.
>>>
>>> This behaviour has been observed during my testing with measurement
>>> hardware a couple of time.
>>>
>>> The commit message says that the extra flip event code was added to
>>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events
>>> in case the pflip irq doesn't fire, because the "DCH HUBP" component
>>> is clock gated and doesn't fire pflip irqs in that state. Also that
>>> this clock gating may happen if no planes are active. According to
>>> Nicholas, the clock gating can also happen if psr is active, and the
>>> gating is controlled independently by the hardware, so difficult to
>>> detect if and when the completion code in above commit is needed.
>>>
>>> This patch tries the following solution: It only executes the extra
>>> pflip
>>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports
>>> that there aren't any surface updated pending in the double-buffered
>>> surface scanout address registers. Otherwise it leaves pflip completion
>>> to the pflip irq handler, for a more race-free experience.
>>>
>>> This would only guard against the order of events mentioned above.
>>> If Step 5 (VSTARTUP trigger) happens before step 4 then this won't help
>>> at all, because 1-3 + 5 might happen even without the hw being
>>> programmed
>>> at all, ie. no surface update pending because none yet programmed
>>> into hw.
>>>
>>> Therefore this patch also changes locking in amdgpu_dm_commit_planes(),
>>> so that prepare_flip_isr() and dc_commit_updates_for_stream() are done
>>> under event_lock protection within the same critical section.
>>>
>>> v2: Take Nicholas comments into account, try a different solution.
>>>
>>> Lightly tested on Polaris (locking) and Raven (the whole DCN stuff).
>>> Seems to work without causing obvious new trouble.
>>
>> Nick, any comments on this? Can we get this committed or do you think
>> it needs additional rework?
>>
>> Thanks,
>>
>> Alex
>
> Hi Alex, Mario,
>
> This might be a little strange, but if we want to get this in as a fix
> for regressions caused by the original vblank and user events at
> vstartup patch then I'm actually going to give my reviewed by on the
> *v1* of this patch (but not this v2):
>
> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
>
> You can feel free to apply that one.
>
> Reason 1: After having thought about it some more I don't think we
> enable anything today that has hubp powered down at the same time we
> expect to be waiting for a flip - eg. DMCU powering down HUBP during PSR
> entry. Static screen interrupt should happen after that flip finishes I
> think.
>
> The CRTC can still be powered on with zero planes, and I don't think any
> userspace explicitly asks for vblank events in this case but it doesn't
> hurt to have the check.
>
> Reason 2: This new patch will need much more thorough testing from side
> to fully understand the consequences of locking the entire DC commit
> sequence. For just a page flip that sounds fine, but for anything more
> than (eg. full updates, modesets, etc) I don't think we want to be
> disabling interrupts for potentially many milliseconds.
Ah! I was wondering where the attached splat comes from, but I think
this explains it: With this patch amdgpu_dm_commit_planes keeps the
pcrtc->dev->event_lock spinlock locked while calling
dc_commit_updates_for_stream, which ends up calling
smu_set_display_count, which tries to lock a mutex.
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
-------------- next part --------------
Mar 13 12:44:49 kaveri kernel: [16728.466999][T347601] BUG: scheduling while atomic: Xorg/347601/0x00000002
Mar 13 12:44:49 kaveri kernel: [16728.467004][T347601] Modules linked in: fuse(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) rfkill(E) cpufreq_powersave(E) cpufreq_userspace(E) cpufreq_conservative(E) edac_mce_amd(E) kvm_amd(E) binfmt_misc(E) snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) irqbypass(E) ppdev(E) amdgpu(E) wmi_bmof(E) ledtrig_audio(E) nls_ascii(E) nls_cp437(E) snd_hda_codec_hdmi(E) vfat(E) gpu_sched(E) fat(E) crct10dif_pclmul(E) snd_hda_intel(E) ttm(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) drm_kms_helper(E) snd_hda_core(E) cec(E) snd_hwdep(E) ghash_clmulni_intel(E) snd_pcm(E) rc_core(E) r8169(E) snd_timer(E) aesni_intel(E) drm(E) efi_pstore(E) sp5100_tco(E) ccp(E) crypto_simd(E) realtek(E) snd(E) i2c_algo_bit(E) watchdog(E)
Mar 13 12:44:49 kaveri kernel: [16728.467043][T347601] cryptd(E) mfd_core(E) glue_helper(E) pcspkr(E) efivars(E) sg(E) parport_pc(E) libphy(E) k10temp(E) soundcore(E) rng_core(E) i2c_piix4(E) parport(E) wmi(E) button(E) acpi_cpufreq(E) tcp_bbr(E) sch_fq(E) nct6775(E) hwmon_vid(E) sunrpc(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) sd_mod(E) evdev(E) hid_generic(E) usbhid(E) hid(E) xhci_pci(E) ahci(E) libahci(E) xhci_hcd(E) crc32c_intel(E) libata(E) usbcore(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E)
Mar 13 12:44:49 kaveri kernel: [16728.467077][T347601] Preemption disabled at:
Mar 13 12:44:49 kaveri kernel: [16728.467080][T347601] [<0000000000000000>] 0x0
Mar 13 12:44:49 kaveri kernel: [16728.467084][T347601] CPU: 7 PID: 347601 Comm: Xorg Tainted: G E 5.5.8+ #28
Mar 13 12:44:49 kaveri kernel: [16728.467086][T347601] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.OR 11/29/2019
Mar 13 12:44:49 kaveri kernel: [16728.467088][T347601] Call Trace:
Mar 13 12:44:49 kaveri kernel: [16728.467096][T347601] dump_stack+0x66/0x90
Mar 13 12:44:49 kaveri kernel: [16728.467101][T347601] __schedule_bug.cold+0x8e/0x9b
Mar 13 12:44:49 kaveri kernel: [16728.467106][T347601] __schedule+0x63c/0x790
Mar 13 12:44:49 kaveri kernel: [16728.467110][T347601] schedule+0x46/0xf0
Mar 13 12:44:49 kaveri kernel: [16728.467112][T347601] schedule_preempt_disabled+0x14/0x20
Mar 13 12:44:49 kaveri kernel: [16728.467115][T347601] __mutex_lock.isra.0+0x178/0x520
Mar 13 12:44:49 kaveri kernel: [16728.467207][T347601] smu_set_display_count+0x21/0x60 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467291][T347601] pp_nv_set_display_count+0x23/0x40 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467375][T347601] dcn2_update_clocks+0x597/0x650 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467460][T347601] dcn20_prepare_bandwidth+0x32/0x70 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467541][T347601] dc_commit_updates_for_stream+0xee4/0x15d0 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467623][T347601] amdgpu_dm_atomic_commit_tail+0x1210/0x2130 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467632][T347601] ? kfree+0x196/0x2b0
Mar 13 12:44:49 kaveri kernel: [16728.467649][T347601] commit_tail+0x94/0x130 [drm_kms_helper]
Mar 13 12:44:49 kaveri kernel: [16728.467658][T347601] drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Mar 13 12:44:49 kaveri kernel: [16728.467682][T347601] drm_mode_obj_set_property_ioctl+0x115/0x2d0 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.467699][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.467711][T347601] drm_ioctl_kernel+0xaa/0xf0 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.467724][T347601] drm_ioctl+0x208/0x390 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.467737][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.467742][T347601] ? preempt_count_sub+0x9b/0xd0
Mar 13 12:44:49 kaveri kernel: [16728.467793][T347601] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.467799][T347601] do_vfs_ioctl+0x45f/0x6d0
Mar 13 12:44:49 kaveri kernel: [16728.467803][T347601] ksys_ioctl+0x5e/0x90
Mar 13 12:44:49 kaveri kernel: [16728.467806][T347601] __x64_sys_ioctl+0x16/0x20
Mar 13 12:44:49 kaveri kernel: [16728.467811][T347601] do_syscall_64+0x5f/0x200
Mar 13 12:44:49 kaveri kernel: [16728.467816][T347601] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 13 12:44:49 kaveri kernel: [16728.467819][T347601] RIP: 0033:0x7f023d792497
Mar 13 12:44:49 kaveri kernel: [16728.467822][T347601] Code: 00 00 90 48 8b 05 f9 79 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 79 0c 00 f7 d8 64 89 01 48
Mar 13 12:44:49 kaveri kernel: [16728.467824][T347601] RSP: 002b:00007ffd773e0348 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 13 12:44:49 kaveri kernel: [16728.467827][T347601] RAX: ffffffffffffffda RBX: 00007ffd773e0380 RCX: 00007f023d792497
Mar 13 12:44:49 kaveri kernel: [16728.467829][T347601] RDX: 00007ffd773e0380 RSI: 00000000c01864ba RDI: 000000000000000d
Mar 13 12:44:49 kaveri kernel: [16728.467830][T347601] RBP: 00000000c01864ba R08: 000000000000005c R09: 00000000cccccccc
Mar 13 12:44:49 kaveri kernel: [16728.467832][T347601] R10: 000055fbcc8a6bd4 R11: 0000000000000246 R12: 000055fbcb7199b0
Mar 13 12:44:49 kaveri kernel: [16728.467834][T347601] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000fff
Mar 13 12:44:49 kaveri kernel: [16728.477723][T347601] ------------[ cut here ]------------
Mar 13 12:44:49 kaveri kernel: [16728.477728][T347601] DEBUG_LOCKS_WARN_ON(val > preempt_count())
Mar 13 12:44:49 kaveri kernel: [16728.477739][T347601] WARNING: CPU: 7 PID: 347601 at kernel/sched/core.c:3816 preempt_count_sub+0x70/0xd0
Mar 13 12:44:49 kaveri kernel: [16728.477743][T347601] Modules linked in: fuse(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) rfkill(E) cpufreq_powersave(E) cpufreq_userspace(E) cpufreq_conservative(E) edac_mce_amd(E) kvm_amd(E) binfmt_misc(E) snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) irqbypass(E) ppdev(E) amdgpu(E) wmi_bmof(E) ledtrig_audio(E) nls_ascii(E) nls_cp437(E) snd_hda_codec_hdmi(E) vfat(E) gpu_sched(E) fat(E) crct10dif_pclmul(E) snd_hda_intel(E) ttm(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) drm_kms_helper(E) snd_hda_core(E) cec(E) snd_hwdep(E) ghash_clmulni_intel(E) snd_pcm(E) rc_core(E) r8169(E) snd_timer(E) aesni_intel(E) drm(E) efi_pstore(E) sp5100_tco(E) ccp(E) crypto_simd(E) realtek(E) snd(E) i2c_algo_bit(E) watchdog(E)
Mar 13 12:44:49 kaveri kernel: [16728.477780][T347601] cryptd(E) mfd_core(E) glue_helper(E) pcspkr(E) efivars(E) sg(E) parport_pc(E) libphy(E) k10temp(E) soundcore(E) rng_core(E) i2c_piix4(E) parport(E) wmi(E) button(E) acpi_cpufreq(E) tcp_bbr(E) sch_fq(E) nct6775(E) hwmon_vid(E) sunrpc(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) sd_mod(E) evdev(E) hid_generic(E) usbhid(E) hid(E) xhci_pci(E) ahci(E) libahci(E) xhci_hcd(E) crc32c_intel(E) libata(E) usbcore(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E)
Mar 13 12:44:49 kaveri kernel: [16728.477809][T347601] CPU: 7 PID: 347601 Comm: Xorg Tainted: G W E 5.5.8+ #28
Mar 13 12:44:49 kaveri kernel: [16728.477811][T347601] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.OR 11/29/2019
Mar 13 12:44:49 kaveri kernel: [16728.477814][T347601] RIP: 0010:preempt_count_sub+0x70/0xd0
Mar 13 12:44:49 kaveri kernel: [16728.477817][T347601] Code: 6c 5b 5d c3 e8 11 c4 37 00 85 c0 74 f4 8b 15 f7 d9 27 01 85 d2 75 ea 48 c7 c6 a5 f9 cb 94 48 c7 c7 e5 b7 ca 94 e8 22 fc fc ff <0f> 0b eb d3 48 8b 6c 24 10 48 89 ef e8 df e4 02 00 85 c0 74 09 31
Mar 13 12:44:49 kaveri kernel: [16728.477819][T347601] RSP: 0018:ffffa393a61d7928 EFLAGS: 00010282
Mar 13 12:44:49 kaveri kernel: [16728.477821][T347601] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Mar 13 12:44:49 kaveri kernel: [16728.477823][T347601] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000246
Mar 13 12:44:49 kaveri kernel: [16728.477824][T347601] RBP: ffffa393a61d7c38 R08: 0000000000000001 R09: 000000000000002a
Mar 13 12:44:49 kaveri kernel: [16728.477826][T347601] R10: ffffffff95465690 R11: 00000000954652d3 R12: ffff8ab13baf2918
Mar 13 12:44:49 kaveri kernel: [16728.477828][T347601] R13: ffff8aae13ea6400 R14: ffff8ab13bae0000 R15: ffff8ab13e14c800
Mar 13 12:44:49 kaveri kernel: [16728.477830][T347601] FS: 00007f023d1cb940(0000) GS:ffff8ab14e9c0000(0000) knlGS:0000000000000000
Mar 13 12:44:49 kaveri kernel: [16728.477832][T347601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 13 12:44:49 kaveri kernel: [16728.477833][T347601] CR2: 00007ffa2900b54c CR3: 00000003a4340000 CR4: 00000000003406e0
Mar 13 12:44:49 kaveri kernel: [16728.477835][T347601] Call Trace:
Mar 13 12:44:49 kaveri kernel: [16728.477842][T347601] _raw_spin_unlock_irqrestore+0x20/0x40
Mar 13 12:44:49 kaveri kernel: [16728.477935][T347601] amdgpu_dm_atomic_commit_tail+0x1ee3/0x2130 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.477944][T347601] ? kfree+0x196/0x2b0
Mar 13 12:44:49 kaveri kernel: [16728.477961][T347601] commit_tail+0x94/0x130 [drm_kms_helper]
Mar 13 12:44:49 kaveri kernel: [16728.477970][T347601] drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Mar 13 12:44:49 kaveri kernel: [16728.477990][T347601] drm_mode_obj_set_property_ioctl+0x115/0x2d0 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.478008][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.478020][T347601] drm_ioctl_kernel+0xaa/0xf0 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.478033][T347601] drm_ioctl+0x208/0x390 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.478047][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
Mar 13 12:44:49 kaveri kernel: [16728.478051][T347601] ? preempt_count_sub+0x9b/0xd0
Mar 13 12:44:49 kaveri kernel: [16728.478102][T347601] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Mar 13 12:44:49 kaveri kernel: [16728.478108][T347601] do_vfs_ioctl+0x45f/0x6d0
Mar 13 12:44:49 kaveri kernel: [16728.478112][T347601] ksys_ioctl+0x5e/0x90
Mar 13 12:44:49 kaveri kernel: [16728.478115][T347601] __x64_sys_ioctl+0x16/0x20
Mar 13 12:44:49 kaveri kernel: [16728.478119][T347601] do_syscall_64+0x5f/0x200
Mar 13 12:44:49 kaveri kernel: [16728.478122][T347601] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar 13 12:44:49 kaveri kernel: [16728.478125][T347601] RIP: 0033:0x7f023d792497
Mar 13 12:44:49 kaveri kernel: [16728.478128][T347601] Code: 00 00 90 48 8b 05 f9 79 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 79 0c 00 f7 d8 64 89 01 48
Mar 13 12:44:49 kaveri kernel: [16728.478130][T347601] RSP: 002b:00007ffd773e0348 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 13 12:44:49 kaveri kernel: [16728.478132][T347601] RAX: ffffffffffffffda RBX: 00007ffd773e0380 RCX: 00007f023d792497
Mar 13 12:44:49 kaveri kernel: [16728.478134][T347601] RDX: 00007ffd773e0380 RSI: 00000000c01864ba RDI: 000000000000000d
Mar 13 12:44:49 kaveri kernel: [16728.478136][T347601] RBP: 00000000c01864ba R08: 000000000000005c R09: 00000000cccccccc
Mar 13 12:44:49 kaveri kernel: [16728.478137][T347601] R10: 000055fbcc8a6bd4 R11: 0000000000000246 R12: 000055fbcb7199b0
Mar 13 12:44:49 kaveri kernel: [16728.478139][T347601] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000fff
Mar 13 12:44:49 kaveri kernel: [16728.478143][T347601] ---[ end trace e1909b5aa94da59e ]---
More information about the amd-gfx
mailing list