[PATCH] drm/amd/display: Fix pageflip event race condition for DCN. (v2)

Matt Coffin mcoffin13 at gmail.com
Tue May 5 18:03:15 UTC 2020


Sure, I'll file one after work today.

To clarify though, FreeSync still "works" as in the monitor refresh rate
is updating, but it constantly bounces between the maximum and minimum
freesync refresh rates, causing it to look VERY stuttery.

Thanks for the attention, I'll file the but tonight. If you want another
reference person, it got a little discussion in this bug report...

https://gitlab.freedesktop.org/drm/amd/-/issues/1002#note_486494

Cheers,
Matt


On 5/5/20 11:59 AM, Kazlauskas, Nicholas wrote:
> Can you file a full bug report on the gitlab tracker?
> 
> FreeSync is still working on my Navi setups with this patch applied, and
> this patch is essentially just a revert of another patch already (where
> FreeSync worked before).
> 
> I can understand the v2 of this series causing issues, but the v1
> shouldn't be - so I'd like to understand more about the setup where this
> is causing issues - ASIC, OS, compositor, displays, dmesg log, X log, etc.
> 
> Regards,
> Nicholas Kazlauskas
> 
> On 2020-05-05 1:03 p.m., Alex Deucher wrote:
>> Mario or Nick any thoughts?
>>
>> Alex
>>
>> On Mon, May 4, 2020 at 1:35 PM Matt Coffin <mcoffin13 at gmail.com> wrote:
>>>
>>> Hey guys,
>>>
>>> This is still an issue for me, and I'm still having to run a patch to
>>> revert this as of 5.7-rc4. To avoid breaking a lot of people's Navi
>>> setups in 5.7, is there any news on this? Has anyone else at the very
>>> least been able to reproduce the problem?
>>>
>>> It happens for me in every single program that mesa allows to utilize
>>> variable refresh rates, and reverting it "fixes" the issue.
>>>
>>> Cheers, and sorry for the extra email, just making sure this is still on
>>> someone's radar,
>>> Matt
>>>
>>> On 4/14/20 5:32 PM, Matt Coffin wrote:
>>>> Hey everyone,
>>>>
>>>> This patch broke variable refresh rate in games (all that I've tried so
>>>> far... Project CARS 2, DiRT Rally 2.0, Assetto Corsa Competizione) as
>>>> well as a simple freesync tester application.
>>>>
>>>> FreeSync tester I've been using: https://github.com/Nixola/VRRTest
>>>>
>>>> I'm not at all familiar with the page flipping code, so it would
>>>> take me
>>>> a long time to find the *right* way to fix it, but does someone else
>>>> see
>>>> why it would do that?
>>>>
>>>> The symptom is that the refresh rate of the display constantly bounces
>>>> between the two ends of the FreeSync range (for me 40 -> 144), and the
>>>> game stutters like a madman.
>>>>
>>>> Any help on where to start, ideas on how to fix it (other than just
>>>> revert this commit, which I've done in the interim), or alternative
>>>> patches would be appreciated.
>>>>
>>>> Thanks in advance for the work/help,
>>>> Matt
>>>>
>>>> On 3/13/20 8:42 AM, Michel Dänzer wrote:
>>>>> On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote:
>>>>>> On 2020-03-12 10:32 a.m., Alex Deucher wrote:
>>>>>>> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner
>>>>>>> <mario.kleiner.de at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user
>>>>>>>> events at vsartup for DCN")' introduces a new way of pageflip
>>>>>>>> completion handling for DCN, and some trouble.
>>>>>>>>
>>>>>>>> The current implementation introduces a race condition, which
>>>>>>>> can cause pageflip completion events to be sent out one vblank
>>>>>>>> too early, thereby confusing userspace and causing flicker:
>>>>>>>>
>>>>>>>> prepare_flip_isr():
>>>>>>>>
>>>>>>>> 1. Pageflip programming takes the ddev->event_lock.
>>>>>>>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED
>>>>>>>> 3. Releases ddev->event_lock.
>>>>>>>>
>>>>>>>> --> Deadline for surface address regs double-buffering passes on
>>>>>>>>       target pipe.
>>>>>>>>
>>>>>>>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip
>>>>>>>>      into hw, but too late for current vblank.
>>>>>>>>
>>>>>>>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete
>>>>>>>>      in current vblank due to missing the double-buffering deadline
>>>>>>>>      by a tiny bit.
>>>>>>>>
>>>>>>>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires,
>>>>>>>>      dm_dcn_crtc_high_irq() gets called.
>>>>>>>>
>>>>>>>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the
>>>>>>>>      pageflip has been completed/will complete in this vblank and
>>>>>>>>      sends out pageflip completion event to userspace and resets
>>>>>>>>      pflip_status = AMDGPU_FLIP_NONE.
>>>>>>>>
>>>>>>>> => Flip completion event sent out one vblank too early.
>>>>>>>>
>>>>>>>> This behaviour has been observed during my testing with measurement
>>>>>>>> hardware a couple of time.
>>>>>>>>
>>>>>>>> The commit message says that the extra flip event code was added to
>>>>>>>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip
>>>>>>>> events
>>>>>>>> in case the pflip irq doesn't fire, because the "DCH HUBP"
>>>>>>>> component
>>>>>>>> is clock gated and doesn't fire pflip irqs in that state. Also that
>>>>>>>> this clock gating may happen if no planes are active. According to
>>>>>>>> Nicholas, the clock gating can also happen if psr is active, and
>>>>>>>> the
>>>>>>>> gating is controlled independently by the hardware, so difficult to
>>>>>>>> detect if and when the completion code in above commit is needed.
>>>>>>>>
>>>>>>>> This patch tries the following solution: It only executes the extra
>>>>>>>> pflip
>>>>>>>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports
>>>>>>>> that there aren't any surface updated pending in the
>>>>>>>> double-buffered
>>>>>>>> surface scanout address registers. Otherwise it leaves pflip
>>>>>>>> completion
>>>>>>>> to the pflip irq handler, for a more race-free experience.
>>>>>>>>
>>>>>>>> This would only guard against the order of events mentioned above.
>>>>>>>> If Step 5 (VSTARTUP trigger) happens before step 4 then this
>>>>>>>> won't help
>>>>>>>> at all, because 1-3 + 5 might happen even without the hw being
>>>>>>>> programmed
>>>>>>>> at all, ie. no surface update pending because none yet programmed
>>>>>>>> into hw.
>>>>>>>>
>>>>>>>> Therefore this patch also changes locking in
>>>>>>>> amdgpu_dm_commit_planes(),
>>>>>>>> so that prepare_flip_isr() and dc_commit_updates_for_stream()
>>>>>>>> are done
>>>>>>>> under event_lock protection within the same critical section.
>>>>>>>>
>>>>>>>> v2: Take Nicholas comments into account, try a different solution.
>>>>>>>>
>>>>>>>> Lightly tested on Polaris (locking) and Raven (the whole DCN
>>>>>>>> stuff).
>>>>>>>> Seems to work without causing obvious new trouble.
>>>>>>>
>>>>>>> Nick, any comments on this?  Can we get this committed or do you
>>>>>>> think
>>>>>>> it needs additional rework?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Alex
>>>>>>
>>>>>> Hi Alex, Mario,
>>>>>>
>>>>>> This might be a little strange, but if we want to get this in as a
>>>>>> fix
>>>>>> for regressions caused by the original vblank and user events at
>>>>>> vstartup patch then I'm actually going to give my reviewed by on the
>>>>>> *v1* of this patch (but not this v2):
>>>>>>
>>>>>> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
>>>>>>
>>>>>> You can feel free to apply that one.
>>>>>>
>>>>>> Reason 1: After having thought about it some more I don't think we
>>>>>> enable anything today that has hubp powered down at the same time we
>>>>>> expect to be waiting for a flip - eg. DMCU powering down HUBP
>>>>>> during PSR
>>>>>> entry. Static screen interrupt should happen after that flip
>>>>>> finishes I
>>>>>> think.
>>>>>>
>>>>>> The CRTC can still be powered on with zero planes, and I don't
>>>>>> think any
>>>>>> userspace explicitly asks for vblank events in this case but it
>>>>>> doesn't
>>>>>> hurt to have the check.
>>>>>>
>>>>>> Reason 2: This new patch will need much more thorough testing from
>>>>>> side
>>>>>> to fully understand the consequences of locking the entire DC commit
>>>>>> sequence. For just a page flip that sounds fine, but for anything
>>>>>> more
>>>>>> than (eg. full updates, modesets, etc) I don't think we want to be
>>>>>> disabling interrupts for potentially many milliseconds.
>>>>>
>>>>> Ah! I was wondering where the attached splat comes from, but I think
>>>>> this explains it: With this patch amdgpu_dm_commit_planes keeps the
>>>>> pcrtc->dev->event_lock spinlock locked while calling
>>>>> dc_commit_updates_for_stream, which ends up calling
>>>>> smu_set_display_count, which tries to lock a mutex.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
> 


More information about the amd-gfx mailing list