[PATCH] drm/amd/display: Fix pageflip event race condition for DCN. (v2)
Matt Coffin
mcoffin13 at gmail.com
Fri May 8 15:54:45 UTC 2020
Just to follow up on this, I decided not to file the report, since
Nick's latest patch that "Fixes:" this has fixed my issue. Thanks for
the good work on that one Nick.
Might want to make sure that makes it in to the 5.7 fixes :)
Thanks again guys.
~Matt
On 5/5/20 11:59 AM, Kazlauskas, Nicholas wrote:
> Can you file a full bug report on the gitlab tracker?
>
> FreeSync is still working on my Navi setups with this patch applied, and
> this patch is essentially just a revert of another patch already (where
> FreeSync worked before).
>
> I can understand the v2 of this series causing issues, but the v1
> shouldn't be - so I'd like to understand more about the setup where this
> is causing issues - ASIC, OS, compositor, displays, dmesg log, X log, etc.
>
> Regards,
> Nicholas Kazlauskas
>
> On 2020-05-05 1:03 p.m., Alex Deucher wrote:
>> Mario or Nick any thoughts?
>>
>> Alex
>>
>> On Mon, May 4, 2020 at 1:35 PM Matt Coffin <mcoffin13 at gmail.com> wrote:
>>>
>>> Hey guys,
>>>
>>> This is still an issue for me, and I'm still having to run a patch to
>>> revert this as of 5.7-rc4. To avoid breaking a lot of people's Navi
>>> setups in 5.7, is there any news on this? Has anyone else at the very
>>> least been able to reproduce the problem?
>>>
>>> It happens for me in every single program that mesa allows to utilize
>>> variable refresh rates, and reverting it "fixes" the issue.
>>>
>>> Cheers, and sorry for the extra email, just making sure this is still on
>>> someone's radar,
>>> Matt
>>>
>>> On 4/14/20 5:32 PM, Matt Coffin wrote:
>>>> Hey everyone,
>>>>
>>>> This patch broke variable refresh rate in games (all that I've tried so
>>>> far... Project CARS 2, DiRT Rally 2.0, Assetto Corsa Competizione) as
>>>> well as a simple freesync tester application.
>>>>
>>>> FreeSync tester I've been using: https://github.com/Nixola/VRRTest
>>>>
>>>> I'm not at all familiar with the page flipping code, so it would
>>>> take me
>>>> a long time to find the *right* way to fix it, but does someone else
>>>> see
>>>> why it would do that?
>>>>
>>>> The symptom is that the refresh rate of the display constantly bounces
>>>> between the two ends of the FreeSync range (for me 40 -> 144), and the
>>>> game stutters like a madman.
>>>>
>>>> Any help on where to start, ideas on how to fix it (other than just
>>>> revert this commit, which I've done in the interim), or alternative
>>>> patches would be appreciated.
>>>>
>>>> Thanks in advance for the work/help,
>>>> Matt
>>>>
>>>> On 3/13/20 8:42 AM, Michel Dänzer wrote:
>>>>> On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote:
>>>>>> On 2020-03-12 10:32 a.m., Alex Deucher wrote:
>>>>>>> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner
>>>>>>> <mario.kleiner.de at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user
>>>>>>>> events at vsartup for DCN")' introduces a new way of pageflip
>>>>>>>> completion handling for DCN, and some trouble.
>>>>>>>>
>>>>>>>> The current implementation introduces a race condition, which
>>>>>>>> can cause pageflip completion events to be sent out one vblank
>>>>>>>> too early, thereby confusing userspace and causing flicker:
>>>>>>>>
>>>>>>>> prepare_flip_isr():
>>>>>>>>
>>>>>>>> 1. Pageflip programming takes the ddev->event_lock.
>>>>>>>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED
>>>>>>>> 3. Releases ddev->event_lock.
>>>>>>>>
>>>>>>>> --> Deadline for surface address regs double-buffering passes on
>>>>>>>> target pipe.
>>>>>>>>
>>>>>>>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip
>>>>>>>> into hw, but too late for current vblank.
>>>>>>>>
>>>>>>>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete
>>>>>>>> in current vblank due to missing the double-buffering deadline
>>>>>>>> by a tiny bit.
>>>>>>>>
>>>>>>>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires,
>>>>>>>> dm_dcn_crtc_high_irq() gets called.
>>>>>>>>
>>>>>>>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the
>>>>>>>> pageflip has been completed/will complete in this vblank and
>>>>>>>> sends out pageflip completion event to userspace and resets
>>>>>>>> pflip_status = AMDGPU_FLIP_NONE.
>>>>>>>>
>>>>>>>> => Flip completion event sent out one vblank too early.
>>>>>>>>
>>>>>>>> This behaviour has been observed during my testing with measurement
>>>>>>>> hardware a couple of time.
>>>>>>>>
>>>>>>>> The commit message says that the extra flip event code was added to
>>>>>>>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip
>>>>>>>> events
>>>>>>>> in case the pflip irq doesn't fire, because the "DCH HUBP"
>>>>>>>> component
>>>>>>>> is clock gated and doesn't fire pflip irqs in that state. Also that
>>>>>>>> this clock gating may happen if no planes are active. According to
>>>>>>>> Nicholas, the clock gating can also happen if psr is active, and
>>>>>>>> the
>>>>>>>> gating is controlled independently by the hardware, so difficult to
>>>>>>>> detect if and when the completion code in above commit is needed.
>>>>>>>>
>>>>>>>> This patch tries the following solution: It only executes the extra
>>>>>>>> pflip
>>>>>>>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports
>>>>>>>> that there aren't any surface updated pending in the
>>>>>>>> double-buffered
>>>>>>>> surface scanout address registers. Otherwise it leaves pflip
>>>>>>>> completion
>>>>>>>> to the pflip irq handler, for a more race-free experience.
>>>>>>>>
>>>>>>>> This would only guard against the order of events mentioned above.
>>>>>>>> If Step 5 (VSTARTUP trigger) happens before step 4 then this
>>>>>>>> won't help
>>>>>>>> at all, because 1-3 + 5 might happen even without the hw being
>>>>>>>> programmed
>>>>>>>> at all, ie. no surface update pending because none yet programmed
>>>>>>>> into hw.
>>>>>>>>
>>>>>>>> Therefore this patch also changes locking in
>>>>>>>> amdgpu_dm_commit_planes(),
>>>>>>>> so that prepare_flip_isr() and dc_commit_updates_for_stream()
>>>>>>>> are done
>>>>>>>> under event_lock protection within the same critical section.
>>>>>>>>
>>>>>>>> v2: Take Nicholas comments into account, try a different solution.
>>>>>>>>
>>>>>>>> Lightly tested on Polaris (locking) and Raven (the whole DCN
>>>>>>>> stuff).
>>>>>>>> Seems to work without causing obvious new trouble.
>>>>>>>
>>>>>>> Nick, any comments on this? Can we get this committed or do you
>>>>>>> think
>>>>>>> it needs additional rework?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Alex
>>>>>>
>>>>>> Hi Alex, Mario,
>>>>>>
>>>>>> This might be a little strange, but if we want to get this in as a
>>>>>> fix
>>>>>> for regressions caused by the original vblank and user events at
>>>>>> vstartup patch then I'm actually going to give my reviewed by on the
>>>>>> *v1* of this patch (but not this v2):
>>>>>>
>>>>>> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
>>>>>>
>>>>>> You can feel free to apply that one.
>>>>>>
>>>>>> Reason 1: After having thought about it some more I don't think we
>>>>>> enable anything today that has hubp powered down at the same time we
>>>>>> expect to be waiting for a flip - eg. DMCU powering down HUBP
>>>>>> during PSR
>>>>>> entry. Static screen interrupt should happen after that flip
>>>>>> finishes I
>>>>>> think.
>>>>>>
>>>>>> The CRTC can still be powered on with zero planes, and I don't
>>>>>> think any
>>>>>> userspace explicitly asks for vblank events in this case but it
>>>>>> doesn't
>>>>>> hurt to have the check.
>>>>>>
>>>>>> Reason 2: This new patch will need much more thorough testing from
>>>>>> side
>>>>>> to fully understand the consequences of locking the entire DC commit
>>>>>> sequence. For just a page flip that sounds fine, but for anything
>>>>>> more
>>>>>> than (eg. full updates, modesets, etc) I don't think we want to be
>>>>>> disabling interrupts for potentially many milliseconds.
>>>>>
>>>>> Ah! I was wondering where the attached splat comes from, but I think
>>>>> this explains it: With this patch amdgpu_dm_commit_planes keeps the
>>>>> pcrtc->dev->event_lock spinlock locked while calling
>>>>> dc_commit_updates_for_stream, which ends up calling
>>>>> smu_set_display_count, which tries to lock a mutex.
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
>
More information about the amd-gfx
mailing list