[PATCH 1/2] drm/amd/display: Send vblank and user events at vstartup for DCN

Fri Nov 29 19:20:18 UTC 2019

Hi Leo and others,

Sorry for the late reply. I just spent some time looking at your patches
and testing them on a Raven DCN-1.

I looked at how the vstartup line is computed in the dc_bandwidth_calcs
etc., and added some DRM_DEBUG statements to the dm_dcn_crtc_high_irq and
dm_pflip_high_irq handlers to print the scanline at which the handlers get
invoked.

From my reading of the code and the test results, my understanding is that
VSTARTUP always fires after the end of front-porch in VRR mode, so the
dm_dcn_crtc_high_irq handler will only get invoked in the vsync/back-porch
area? This is good and very important, as otherwise all the vblank and
timestamp calculations would often be wrong (if it ever happened inside
front-porch).
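To make the front-porch / vsync / back-porch distinction concrete, here is a small sketch that maps a scanline (as printed by such a debug statement) to the region of the frame it falls in. The struct `hypo_vtiming` and all names are hypothetical, not DC's timing structs, and it assumes the convention that line 0 is the first active line, with front porch, vsync and back porch following the active area in that order. Under VRR, `v_front_porch` would be the stretched value for that particular frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vertical timing (not DC's struct dc_crtc_timing).
 * Assumed convention: line 0 is the first active line and the vertical
 * blank follows the active area as front porch, then vsync, then back
 * porch, ending at line v_total - 1. */
struct hypo_vtiming {
	uint32_t v_active;      /* active lines */
	uint32_t v_front_porch; /* front porch lines (the part VRR stretches) */
	uint32_t v_sync_width;  /* vsync lines */
	uint32_t v_back_porch;  /* back porch lines */
};

enum scan_region {
	REGION_ACTIVE,
	REGION_FRONT_PORCH,
	REGION_VSYNC,
	REGION_BACK_PORCH,
	REGION_INVALID,
};

static uint32_t hypo_v_total(const struct hypo_vtiming *t)
{
	return t->v_active + t->v_front_porch + t->v_sync_width +
	       t->v_back_porch;
}

/* Map a scanline, e.g. one printed from an irq handler, to its region. */
static enum scan_region classify_scanline(const struct hypo_vtiming *t,
					  uint32_t line)
{
	uint32_t fp_end, sync_end;

	assert(t != NULL);
	fp_end = t->v_active + t->v_front_porch;
	sync_end = fp_end + t->v_sync_width;

	if (line < t->v_active)
		return REGION_ACTIVE;
	if (line < fp_end)
		return REGION_FRONT_PORCH;
	if (line < sync_end)
		return REGION_VSYNC;
	if (line < hypo_v_total(t))
		return REGION_BACK_PORCH;
	return REGION_INVALID;
}
```

With CEA 1080p60 timing (1080 active lines, 4 lines front porch, 5 sync, 36 back porch), a handler reporting line 1100 classifies as back porch, which is where the observation above places the VSTARTUP handler.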

Could you give me an overview of which interrupts / hw events happen
when on DCN vs. DCE? I intend to spend quite a bit of quality time in
December playing with the freesync code in DC to see if I can hack up a
proof-of-concept for precisely timed pageflips, the approach Harry
suggested in his XDC2019 talk, which I finally found time to watch. I think
with the highly precise vblank and pageflip timestamps we should be able to
implement this precisely without the need for (jittery) software timers,
just some extensions to DRR hw programming and some trickery similar to
what below-the-range (BTR) support does. That would be so cool, especially
for neuroscience/vision-science/medical research applications.
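The DRR-based approach rests on one relation: a refresh cycle lasts v_total * h_total / pixel_clock, so stretching the front porch (v_total) moves the start of the next frame in steps of one scanline (about 14.8 us at 1080p60, i.e. 2200 pixels / 148.5 MHz). A minimal sketch of that arithmetic, assuming a fixed h_total and pixel clock and a hardware-accepted window [v_total_min, v_total_max]; `vtotal_for_duration()` is a hypothetical helper, not a freesync module function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the vertical total (in lines) that makes one
 * refresh cycle last as close as possible to target_ns, clamped to the
 * range the DRR hardware is assumed to accept.
 *
 * Frame duration (ns) = v_total * h_total * 1e9 / pixel_clock_hz, so
 * v_total = target_ns * pixel_clock_hz / (h_total * 1e9), rounded.
 * For frame times of a few seconds and pixel clocks below ~1 GHz the
 * 64-bit product below does not overflow.
 */
static uint32_t vtotal_for_duration(uint64_t target_ns, uint64_t pixel_clock_hz,
				    uint32_t h_total, uint32_t v_total_min,
				    uint32_t v_total_max)
{
	uint64_t num, den, lines;

	assert(h_total > 0 && v_total_min <= v_total_max);

	num = target_ns * pixel_clock_hz;
	den = (uint64_t)h_total * 1000000000ULL;
	lines = (num + den / 2) / den; /* round to the nearest whole line */

	if (lines < v_total_min)
		return v_total_min;
	if (lines > v_total_max)
		return v_total_max;
	return (uint32_t)lines;
}
```

For example, with 1080p60 timing (148.5 MHz, h_total 2200) a 20 ms target yields v_total 1350, i.e. a 50 Hz cycle; targets outside the window are clamped, which is where BTR-like frame repetition would have to take over.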

My rough understanding so far for DCN is:

1. Pageflips can execute in front-porch, i.e., the register double-buffering
can switch there. Can they also still execute after front-porch? How far
into vsync/back-porch? I assume at some point close to the end of
back-porch they can't anymore, because after a flip the line buffer needs
time to prefetch the new pixel data from the new scanout buffer [a]?

2. The VSTARTUP interrupt/event in VRR mode happens somewhere programmable
after end of front-porch (suggested by the bandwidth calc code), but before
VUPDATE? Is VSTARTUP the last point at which double-buffering for a
pageflip can happen, i.e., does the line-buffer refill for the next
frame start after that, as in [a]?

3. The VUPDATE interrupt/event marks the end of vblank? And that's where
double-buffering / switch of new values for the DRR registers happens? So
DRR values programmed before VUPDATE will take effect after VUPDATE and
thereby affect the vblank after the current one, i.e., after the following
video frame?
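Point 3 describes ordinary register double-buffering. As a toy model, assuming (as the question above does) that the latch point is VUPDATE, each DRR register has a driver-visible pending copy and a hardware-active copy. The names are hypothetical and say nothing about the actual DCN register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one double-buffered register; hypothetical, not DCN's. */
struct dbuf_reg {
	uint32_t pending; /* last value written by the driver */
	uint32_t active;  /* value the timing generator is using */
	bool dirty;       /* a write is waiting for the next latch */
};

/* Driver write: only the pending copy changes. */
static void dbuf_write(struct dbuf_reg *r, uint32_t val)
{
	assert(r != NULL);
	r->pending = val;
	r->dirty = true;
}

/* Assumed latch point (VUPDATE in point 3): pending becomes active. */
static void dbuf_latch(struct dbuf_reg *r)
{
	assert(r != NULL);
	if (r->dirty) {
		r->active = r->pending;
		r->dirty = false;
	}
}
```

A value written just after one latch stays in `pending` for a whole frame before the timing generator uses it, which in this model is the one-frame delay that point 3 asks about.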

Is this correct? And how does it differ from Vega/DCE-12 and older
(<= Polaris / <= DCE-11)? I remember from earlier this year that BTR works
much better on DCN (tested) and DCE-12 (presumably, don't have hw to test)
than it does on DCE-11 and earlier. This was due to different behaviour of
when the DRR programming takes effect, allowing for much quicker switching on
DCN. I'd like to understand in detail how the DRR
switching/latching/double-buffering differs, if one of you can enlighten me.

There's one thing about this patch, though, that I think is not right.
Sending pageflip completion events from within dm_dcn_crtc_high_irq()
seems to be both unnecessary and prone to races that can cause wrong
pageflip events/timestamps and visual glitches.

Two cases:

a) If a pageflip completes in front porch and the pageflip handler
dm_pflip_high_irq() executes while in front-porch, it will queue the proper
pageflip event for later delivery to user space by drm_crtc_handle_vblank()
which is called by dm_dcn_crtc_high_irq() already.

b) If dm_pflip_high_irq() executes after front-porch (pageflip completes in
back-porch if this is possible), it will deliver the pageflip event itself
after updating the vblank count and timestamps correctly via
drm_crtc_accurate_vblank_count().
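Cases a) and b) together already cover every flip that really completes. A toy sketch of the two delivery paths, with hypothetical names and none of the locking or timestamp handling of the real driver, shows that each completed flip yields exactly one event without any extra code in the vstartup handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-CRTC state; not the amdgpu structs. */
struct toy_crtc {
	bool event_armed;     /* completion event queued for the vblank handler */
	int events_delivered; /* events handed to user space so far */
};

/* Flip-completion irq (the dm_pflip_high_irq() role), cases a) and b). */
static void toy_pflip_irq(struct toy_crtc *c, bool in_front_porch)
{
	assert(c != NULL);
	if (in_front_porch)
		c->event_armed = true; /* a) leave it to the vblank handler */
	else
		c->events_delivered++; /* b) send it right away */
}

/* Vblank handling at VSTARTUP (the drm_crtc_handle_vblank() role). */
static void toy_vblank_irq(struct toy_crtc *c)
{
	assert(c != NULL);
	if (c->event_armed) {
		c->event_armed = false;
		c->events_delivered++;
	}
}
```

In both paths the event count only advances after the flip irq has actually fired, so user space is never told about a flip the hardware did not report.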

There isn't a need for the extra code in your patch
(if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {...}), and indeed I can
just comment it out and everything works fine.

I think the code could even be harmful if a pageflip gets queued into the
hardware before invocation of dm_dcn_crtc_high_irq() (i.e., a bit before
VSTARTUP + irq handling delay), but after missing the deadline for
double-buffering of the hardware's primary surface base address registers.
You could end up setting acrtc->pflip_status = AMDGPU_FLIP_SUBMITTED,
missing the hw double-buffering deadline, and then dm_dcn_crtc_high_irq()
would decide to send out a pageflip completion event to userspace for a
flip that hasn't actually taken place in the hw in this vblank. Userspace
would then mis-schedule its presentation due to the wrong pageflip event /
timestamp, and you'd end up with the previous rendered image presented one
scanout cycle too long, and the current image silently dropped and never
displayed!
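The race reduces to comparing three scanline positions. The sketch below assumes a single latch deadline and a fixed irq latency and ignores everything else (locking, multiple planes, the pflip irq itself); names and values are hypothetical. It only illustrates that any submission landing between the latch deadline and the moment dm_dcn_crtc_high_irq() runs produces a completion event for a flip the hardware has not performed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy timeline for one vblank, in scanlines (hypothetical):
 *   latch_deadline : first line at which a newly programmed surface
 *                    address no longer gets latched for this vblank
 *   vstartup_irq   : line at which dm_dcn_crtc_high_irq() actually runs
 *                    (VSTARTUP plus irq latency), assumed > latch_deadline
 *   submit_line    : line at which the flip is programmed and
 *                    pflip_status becomes AMDGPU_FLIP_SUBMITTED
 */
struct flip_outcome {
	bool hw_flipped; /* new buffer really latched for the next frame */
	bool event_sent; /* completion event sent by the vstartup handler */
};

static struct flip_outcome simulate_patch(int latch_deadline, int vstartup_irq,
					  int submit_line)
{
	struct flip_outcome o;

	assert(latch_deadline < vstartup_irq);
	o.hw_flipped = submit_line < latch_deadline;
	/* The patch sends the event whenever the flip is merely submitted. */
	o.event_sent = submit_line < vstartup_irq;
	return o;
}

/* True if user space is told about a flip that did not happen. */
static bool is_bogus_event(struct flip_outcome o)
{
	return o.event_sent && !o.hw_flipped;
}
```

In this model the bogus window is exactly `vstartup_irq - latch_deadline` lines wide, so lower irq latency narrows it but does not close it.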

Indeed, the debug output I added shows that the dm_pflip_high_irq() handler
essentially does nothing with your patch applied, so pageflip completion
events sent to user space no longer correspond to true hw flips.

I have some hw measuring equipment to verify flip timing independently of
the driver, and during a few short test runs I think I observed this glitch
at least once, suggesting the problem is real.

thanks,
-mario

On Tue, Nov 5, 2019 at 7:32 PM Li, Sun peng (Leo) <Sunpeng.Li at amd.com> wrote:

>
>
> On 2019-11-05 11:15 a.m., Kazlauskas, Nicholas wrote:
> > On 2019-11-05 10:34 a.m., sunpeng.li at amd.com wrote:
> >> From: Leo Li <sunpeng.li at amd.com>
> >>
> >> [Why]
> >>
> >> For DCN hardware, the crtc_high_irq handler is assigned to the vstartup
> >> interrupt. This is different from DCE, which has it assigned to vblank
> >> start.
> >>
> >> We'd like to send vblank and user events at vstartup because:
> >>
> >> * It happens close enough to vupdate - the point of no return for HW.
> >>
> >> * It is programmed as lines relative to vblank end - i.e. it is not in
> >>    the variable portion when VRR is enabled. We should signal user
> >>    events here.
> >>
> >> * The pflip interrupt responsible for sending user events today only
> >>    fires if the DCH HUBP component is not clock gated. In situations
> >>    where planes are disabled - but the CRTC is enabled - user events won't
> >>    be sent out, leading to flip done timeouts.
> >>
> >> Consequently, this makes vupdate on DCN hardware redundant. It will be
> >> removed in the next change.
> >>
> >> [How]
> >>
> >> Add a DCN-specific crtc_high_irq handler, and hook it to the VStartup
> >> signal. Inside the DCN handler, we send off user events if the pflip
> >> handler hasn't already done so.
> >>
> >> Signed-off-by: Leo Li <sunpeng.li at amd.com>
> >> ---
> >>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 65 ++++++++++++++++++-
> >>   1 file changed, 64 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> index 00017b91c91a..256a23a0ec28 100644
> >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >> @@ -485,6 +485,69 @@ static void dm_crtc_high_irq(void *interrupt_params)
> >>      }
> >>   }
> >>
> >> +
> >> +/**
> >> + * dm_dcn_crtc_high_irq() - Handles VStartup interrupt for DCN generation ASICs
> >> + * @interrupt_params: interrupt parameters
> >> + *
> >> + * Notify DRM's vblank event handler at VSTARTUP
> >> + *
> >> + * Unlike DCE hardware, we trigger the handler at VSTARTUP, at which:
> >> + * * We are close enough to VUPDATE - the point of no return for hw
> >> + * * We are in the fixed portion of variable front porch when vrr is enabled
> >> + * * We are before VUPDATE, where double-buffered vrr registers are swapped
> >> + *
> >> + * It is therefore the correct place to signal vblank, send user flip events,
> >> + * and update VRR.
> >> + */
> >> +static void dm_dcn_crtc_high_irq(void *interrupt_params)
> >> +{
> >> +    struct common_irq_params *irq_params = interrupt_params;
> >> +    struct amdgpu_device *adev = irq_params->adev;
> >> +    struct amdgpu_crtc *acrtc;
> >> +    struct dm_crtc_state *acrtc_state;
> >> +    unsigned long flags;
> >> +
> >> +    acrtc = get_crtc_by_otg_inst(adev, irq_params->irq_src - IRQ_TYPE_VBLANK);
> >> +
> >> +    if (!acrtc)
> >> +            return;
> >> +
> >> +    acrtc_state = to_dm_crtc_state(acrtc->base.state);
> >> +
> >> +    DRM_DEBUG_DRIVER("crtc:%d, vupdate-vrr:%d\n", acrtc->crtc_id,
> >> +                            amdgpu_dm_vrr_active(acrtc_state));
> >> +
> >> +    amdgpu_dm_crtc_handle_crc_irq(&acrtc->base);
> >> +    drm_crtc_handle_vblank(&acrtc->base);
> >
> > Shouldn't this be the other way around? Don't we want the CRC sent back
> > to userspace to have the updated vblank counter?
> >
> > This is how it worked before at least.
> >
> > Other than that, this patch looks fine to me.
> >
> > Nicholas Kazlauskas
>
>
> Looks like we're doing a crtc_accurate_vblank_count() inside the crc
> handler, so I don't think order matters here.
>
> Leo
>
> >
> >> +
> >> +    spin_lock_irqsave(&adev->ddev->event_lock, flags);
> >> +
> >> +    if (acrtc_state->vrr_params.supported &&
> >> +        acrtc_state->freesync_config.state == VRR_STATE_ACTIVE_VARIABLE) {
> >> +            mod_freesync_handle_v_update(
> >> +                    adev->dm.freesync_module,
> >> +                    acrtc_state->stream,
> >> +                    &acrtc_state->vrr_params);
> >> +
> >> +            dc_stream_adjust_vmin_vmax(
> >> +                    adev->dm.dc,
> >> +                    acrtc_state->stream,
> >> +                    &acrtc_state->vrr_params.adjust);
> >> +    }
> >> +
> >> +    if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {
> >> +            if (acrtc->event) {
> >> +                    drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
> >> +                    acrtc->event = NULL;
> >> +                    drm_crtc_vblank_put(&acrtc->base);
> >> +            }
> >> +            acrtc->pflip_status = AMDGPU_FLIP_NONE;
> >> +    }
> >> +
> >> +    spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
> >> +}
> >> +
> >>   static int dm_set_clockgating_state(void *handle,
> >>                enum amd_clockgating_state state)
> >>   {
> >> @@ -2175,7 +2238,7 @@ static int dcn10_register_irq_handlers(struct amdgpu_device *adev)
> >>              c_irq_params->irq_src = int_params.irq_source;
> >>
> >>              amdgpu_dm_irq_register_interrupt(adev, &int_params,
> >> -                            dm_crtc_high_irq, c_irq_params);
> >> +                            dm_dcn_crtc_high_irq, c_irq_params);
> >>      }
> >>
> >>      /* Use VUPDATE_NO_LOCK interrupt on DCN, which seems to correspond to
> >> --
> >> 2.23.0
> >>
> >
>