[PATCH] drm/amd/display: Fix pageflip event race condition for DCN. (v2)

Kazlauskas, Nicholas nicholas.kazlauskas at amd.com
Fri Mar 13 12:35:30 UTC 2020


On 2020-03-12 10:32 a.m., Alex Deucher wrote:
> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
>>
>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user
>> events at vsartup for DCN")' introduces a new way of pageflip
>> completion handling for DCN, and some trouble.
>>
>> The current implementation introduces a race condition, which
>> can cause pageflip completion events to be sent out one vblank
>> too early, thereby confusing userspace and causing flicker:
>>
>> prepare_flip_isr():
>>
>> 1. Pageflip programming takes the ddev->event_lock.
>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED
>> 3. Releases ddev->event_lock.
>>
>> --> Deadline for surface address regs double-buffering passes on
>>      target pipe.
>>
>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip
>>     into hw, but too late for current vblank.
>>
>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete
>>     in current vblank due to missing the double-buffering deadline
>>     by a tiny bit.
>>
>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires,
>>     dm_dcn_crtc_high_irq() gets called.
>>
>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the
>>     pageflip has been completed/will complete in this vblank and
>>     sends out pageflip completion event to userspace and resets
>>     pflip_status = AMDGPU_FLIP_NONE.
>>
>> => Flip completion event sent out one vblank too early.
>>
>> This behaviour has been observed during my testing with measurement
>> hardware a couple of time.
>>
>> The commit message says that the extra flip event code was added to
>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events
>> in case the pflip irq doesn't fire, because the "DCH HUBP" component
>> is clock gated and doesn't fire pflip irqs in that state. Also that
>> this clock gating may happen if no planes are active. According to
>> Nicholas, the clock gating can also happen if psr is active, and the
>> gating is controlled independently by the hardware, so difficult to
>> detect if and when the completion code in above commit is needed.
>>
>> This patch tries the following solution: It only executes the extra pflip
>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports
>> that there aren't any surface updated pending in the double-buffered
>> surface scanout address registers. Otherwise it leaves pflip completion
>> to the pflip irq handler, for a more race-free experience.
>>
>> This would only guard against the order of events mentioned above.
>> If Step 5 (VSTARTUP trigger) happens before step 4 then this won't help
>> at all, because 1-3 + 5 might happen even without the hw being programmed
>> at all, ie. no surface update pending because none yet programmed into hw.
>>
>> Therefore this patch also changes locking in amdgpu_dm_commit_planes(),
>> so that prepare_flip_isr() and dc_commit_updates_for_stream() are done
>> under event_lock protection within the same critical section.
>>
>> v2: Take Nicholas comments into account, try a different solution.
>>
>> Lightly tested on Polaris (locking) and Raven (the whole DCN stuff).
>> Seems to work without causing obvious new trouble.
> 
> Nick, any comments on this?  Can we get this committed or do you think
> it needs additional rework?
> 
> Thanks,
> 
> Alex

Hi Alex, Mario,

This might be a little strange, but if we want to get this in as a fix 
for regressions caused by the original vblank and user events at 
vstartup patch then I'm actually going to give my reviewed by on the 
*v1* of this patch (but not this v2):

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>

You can feel free to apply that one.

Reason 1: After having thought about it some more I don't think we 
enable anything today that has hubp powered down at the same time we 
expect to be waiting for a flip - eg. DMCU powering down HUBP during PSR 
entry. Static screen interrupt should happen after that flip finishes I 
think.

The CRTC can still be powered on with zero planes, and I don't think any 
userspace explicitly asks for vblank events in this case but it doesn't 
hurt to have the check.

Reason 2: This new patch will need much more thorough testing from side 
to fully understand the consequences of locking the entire DC commit 
sequence. For just a page flip that sounds fine, but for anything more 
than (eg. full updates, modesets, etc) I don't think we want to be 
disabling interrupts for potentially many milliseconds.

Maybe we could just use a lock-free queue here for the events. It's 
single producer/single consumer per CRTC.

Regards,
Nicholas Kazlauskas

> 
>>
>> Fixes: 16f17eda8bad ("drm/amd/display: Send vblank and user events at vsartup for DCN")
>> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
>> ---
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 80 ++++++++++++++++---
>>   1 file changed, 67 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index d7df1a85e72f..aa4e941b276f 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -287,6 +287,28 @@ static inline bool amdgpu_dm_vrr_active(struct dm_crtc_state *dm_state)
>>                 dm_state->freesync_config.state == VRR_STATE_ACTIVE_FIXED;
>>   }
>>
>> +/**
>> + * dm_crtc_is_flip_pending() - Is a pageflip pending on this crtc?
>> + *
>> + * Returns true if any plane on the crtc has a flip pending, false otherwise.
>> + */
>> +static bool dm_crtc_is_flip_pending(struct dm_crtc_state *acrtc_state)
>> +{
>> +       struct dc_stream_status *status = dc_stream_get_status(acrtc_state->stream);
>> +       const struct dc_plane_status *plane_status;
>> +       int i;
>> +       bool pending = false;
>> +
>> +       for (i = 0; i < status->plane_count; i++) {
>> +               plane_status = dc_plane_get_status(status->plane_states[i]);
>> +               pending |= plane_status->is_flip_pending;
>> +               DRM_DEBUG_DRIVER("plane:%d, flip_pending=%d\n",
>> +                                i, plane_status->is_flip_pending);
>> +       }
>> +
>> +       return pending;
>> +}
>> +
>>   /**
>>    * dm_pflip_high_irq() - Handle pageflip interrupt
>>    * @interrupt_params: ignored
>> @@ -435,6 +457,11 @@ static void dm_vupdate_high_irq(void *interrupt_params)
>>                                  spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>>                          }
>>                  }
>> +
>> +               if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {
>> +                       DRM_DEBUG_DRIVER("%s:crtc:%d, flip_pending=%d\n", __func__,
>> +                                           acrtc->crtc_id, dm_crtc_is_flip_pending(acrtc_state));
>> +               }
>>          }
>>   }
>>
>> @@ -489,6 +516,11 @@ static void dm_crtc_high_irq(void *interrupt_params)
>>                                  &acrtc_state->vrr_params.adjust);
>>                          spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>>                  }
>> +
>> +               if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {
>> +                       DRM_DEBUG_DRIVER("%s:crtc:%d, flip_pending=%d\n", __func__,
>> +                                        acrtc->crtc_id, dm_crtc_is_flip_pending(acrtc_state));
>> +               }
>>          }
>>   }
>>
>> @@ -543,13 +575,29 @@ static void dm_dcn_crtc_high_irq(void *interrupt_params)
>>                          &acrtc_state->vrr_params.adjust);
>>          }
>>
>> -       if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {
>> +       /*
>> +        * If there aren't any active_planes, or PSR is active, then DCH HUBP
>> +        * may be clock-gated. In that state, pageflip completion interrupts
>> +        * won't fire and pageflip completion events won't get delivered.
>> +        *
>> +        * Prevent this by sending pending pageflip events from here if a flip
>> +        * has been submitted, but is no longer pending in hw, ie. has already
>> +        * completed.
>> +        *
>> +        * If the flip is still pending in hw, then use dm_pflip_high_irq()
>> +        * instead, handling completion as usual by pflip irq.
>> +        */
>> +       if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED &&
>> +           !dm_crtc_is_flip_pending(acrtc_state)) {
>>                  if (acrtc->event) {
>>                          drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
>>                          acrtc->event = NULL;
>>                          drm_crtc_vblank_put(&acrtc->base);
>>                  }
>>                  acrtc->pflip_status = AMDGPU_FLIP_NONE;
>> +
>> +               DRM_DEBUG_DRIVER("crtc:%d, pflip_stat:AMDGPU_FLIP_NONE\n",
>> +                                acrtc->crtc_id);
>>          }
>>
>>          spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>> @@ -6325,7 +6373,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>>                          to_dm_crtc_state(drm_atomic_get_old_crtc_state(state, pcrtc));
>>          int planes_count = 0, vpos, hpos;
>>          long r;
>> -       unsigned long flags;
>> +       unsigned long flags = 0;
>>          struct amdgpu_bo *abo;
>>          uint64_t tiling_flags;
>>          uint32_t target_vblank, last_flip_vblank;
>> @@ -6513,17 +6561,6 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>>                          usleep_range(1000, 1100);
>>                  }
>>
>> -               if (acrtc_attach->base.state->event) {
>> -                       drm_crtc_vblank_get(pcrtc);
>> -
>> -                       spin_lock_irqsave(&pcrtc->dev->event_lock, flags);
>> -
>> -                       WARN_ON(acrtc_attach->pflip_status != AMDGPU_FLIP_NONE);
>> -                       prepare_flip_isr(acrtc_attach);
>> -
>> -                       spin_unlock_irqrestore(&pcrtc->dev->event_lock, flags);
>> -               }
>> -
>>                  if (acrtc_state->stream) {
>>                          if (acrtc_state->freesync_vrr_info_changed)
>>                                  bundle->stream_update.vrr_infopacket =
>> @@ -6575,6 +6612,15 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>>                                  acrtc_state->stream->link->psr_allow_active)
>>                          amdgpu_dm_psr_disable(acrtc_state->stream);
>>
>> +               if (pflip_present && acrtc_attach->base.state->event) {
>> +                       drm_crtc_vblank_get(pcrtc);
>> +
>> +                       spin_lock_irqsave(&pcrtc->dev->event_lock, flags);
>> +
>> +                       WARN_ON(acrtc_attach->pflip_status != AMDGPU_FLIP_NONE);
>> +                       prepare_flip_isr(acrtc_attach);
>> +               }
>> +
>>                  dc_commit_updates_for_stream(dm->dc,
>>                                                       bundle->surface_updates,
>>                                                       planes_count,
>> @@ -6582,6 +6628,14 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>>                                                       &bundle->stream_update,
>>                                                       dc_state);
>>
>> +               /*
>> +                * Must event_lock protect prepare_flip_isr() above and
>> +                * dc_commit_updates_for_stream within same critical section,
>> +                * or pageflip completion will suffer bad races on DCN.
>> +                */
>> +               if (pflip_present && acrtc_attach->pflip_status == AMDGPU_FLIP_SUBMITTED)
>> +                       spin_unlock_irqrestore(&pcrtc->dev->event_lock, flags);
>> +
>>                  if ((acrtc_state->update_type > UPDATE_TYPE_FAST) &&
>>                                                  acrtc_state->stream->psr_version &&
>>                                                  !acrtc_state->stream->link->psr_feature_enabled)
>> --
>> 2.20.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the dri-devel mailing list