[Intel-gfx] [PATCH v7 1/6] drm/i915: Fallback to lower link rate and lane count during link training

Thu Sep 29 16:05:01 UTC 2016

On Thu, Sep 29, 2016 at 06:48:43PM +0300, Jani Nikula wrote:
> On Thu, 29 Sep 2016, Ville Syrjälä <ville.syrjala at linux.intel.com> wrote:
> > On Thu, Sep 29, 2016 at 12:44:19PM +0100, Chris Wilson wrote:
> >> On Thu, Sep 29, 2016 at 02:26:16PM +0300, Jani Nikula wrote:
> >> > On Thu, 29 Sep 2016, Manasi Navare <manasi.d.navare at intel.com> wrote:
> >> > > On Tue, Sep 27, 2016 at 08:07:01PM +0300, Jani Nikula wrote:
> >> > >> On Tue, 27 Sep 2016, Manasi Navare <manasi.d.navare at intel.com> wrote:
> >> > >> > On Mon, Sep 26, 2016 at 04:39:34PM +0300, Jani Nikula wrote:
> >> > >> >> On Fri, 16 Sep 2016, Manasi Navare <manasi.d.navare at intel.com> wrote:
> >> > >> >> > According to the DisplayPort Spec, in case of Clock Recovery failure
> >> > >> >> > the link training sequence should fall back to the lower link rate
> >> > >> >> > followed by lower lane count until CR succeeds.
> >> > >> >> > On CR success, the sequence proceeds with Channel EQ.
> >> > >> >> > In case of Channel EQ failures, it should fallback to
> >> > >> >> > lower link rate and lane count and start the CR phase again.
> >> > >> >> 
> >> > >> >> This change makes the link training start at the max lane count and max
> >> > >> >> link rate. This is not ideal, as it wastes the link. And it is not a
> >> > >> >> spec requirement. "The Link Policy Maker of the upstream device may
> >> > >> >> choose any link count and link rate as long as they do not exceed the
> >> > >> >> capabilities of the DP receiver."
> >> > >> >> 
> >> > >> >> Our current code starts at the minimum required bandwidth for the mode,
> >> > >> >> therefore we can't fall back to lower link rate and lane count without
> >> > >> >> reducing the mode.
> >> > >> >> 
> >> > >> >> AFAICT this patch here makes it possible for the link bandwidth to drop
> >> > >> >> below what is required for the mode. This is unacceptable.
> >> > >> >> 
> >> > >> >> BR,
> >> > >> >> Jani.
> >> > >> >> 
> >> > >> >>
> >> > >> >
> >> > >> > Thanks Jani for your review comments.
> >> > >> > Yes in this change we start at the max link rate and lane count. This
> >> > >> > change was made according to the design document discussions we had
> >> > >> > before strating this DP Redesign project. The main reason for starting
> >> > >> > at the maxlink rate and max lane count was for ensuring proper
> >> > >> > behavior of DP MST. In case of DP MST, we want to train the link at
> >> > >> > the maximum supported link rate/lane count based on an early/ upfront
> >> > >> > link training result so that we dont fail when we try to connect a
> >> > >> > higher resolution monitor as a second monitor. This a trade off
> >> > >> > between wsting the link or higher power vs. needing to retrain for
> >> > >> > every monitor that requests a higher BW in case of DP MST.
> >> > >> 
> >> > >> We already train at max bandwidth for DP MST, which seems to be the
> >> > >> sensible thing to do.
> >> > >> 
> >> > >> > Actually this is also the reason for enabling upfront link training in
> >> > >> > the following patch where we train the link much ahead in the modeset
> >> > >> > sequence to understand the link rate and lane count values at which
> >> > >> > the link can be successfully trained and then the link training
> >> > >> > through modeset will always start at the upfront values (maximum
> >> > >> > supported values of lane count and link rate based on upfront link
> >> > >> > training).
> >> > >> 
> >> > >> I don't see a need to do this for DP SST.
> >> > >> 
> >> > >> > As per the CTS, all the test 4.3.1.4 requires that you fall back to
> >> > >> > the lower link rate after trying to train at the maximum link rate
> >> > >> > advertised through the DPCD registers.
> >> > >> 
> >> > >> That test does not require the source DUT to default to maximum lane
> >> > >> count or link rate of the sink. The source may freely choose the lane
> >> > >> count and link rate as long as they don't exceed sink capabilities.
> >> > >> 
> >> > >> For the purposes of the test, the test setup can request specific
> >> > >> parameters to be used, but that does not mean using maximum by
> >> > >> *default*.
> >> > >> 
> >> > >> We currently lack the feature to reduce lane count and link rate. The
> >> > >> key to understand here is that starting at max and reducing down to the
> >> > >> sufficient parameters for the mode (which is where we start now) offers
> >> > >> no real benefit for any use case. What we're lacking is a feature to
> >> > >> reduce the link parameters *below* what's required by the mode the
> >> > >> userspace wants. This can only be achieved through cooperation with
> >> > >> userspace.
> >> > >> 
> >> > >
> >> > > We can train at the optimal link rate required for the requested mode as
> >> > > done in the existing implementation and retrain whenever the link training
> >> > > test request is sent. 
> >> > > For the test 4.3.1.4 in CTS, it does force a failure in CR and expects the
> >> > > driver to fall back to even lower link rate. We do not implement this in the
> >> > > current driver and so this test fails. Could you elaborate on how this can
> >> > > be achieved with the the cooperation with userspace?
> >> > > Should we send a uevent to the userspace asking to retry at a lower resolution
> >> > > after retraining at the lower link rate?
> >> > > This is pertty much the place where majority of the compliance tests are failing.
> >> > > How can we pass compliance with regards to this feature?
> >> > 
> >> > So here's an idea Ville and I came up with. It's not completely thought
> >> > out yet, probably has some wrinkles still, but then there are wrinkles
> >> > with the upfront link training too (I'll get back to those separately).
> >> > 
> >> > If link training fails during modeset (either for real or because it's a
> >> > test sink that wants to test failures), we 1) store the link parameters
> >> > as failing, 2) send a uevent to userspace, hopefully getting the
> >> > userspace to do another get modes and try again, 3) propage errors from
> >> > modeset.
> >> 
> >> userspace already tries to do a reprobe after a setcrtc fails, to try
> >> and gracefully handle the race between hotplug being in its event queue
> >> and performing setcrtc, i.e. I think the error is enough.
> >
> > I presume we want the modeset to be async, so by the time we notice the
> > problem we're no longer in the ioctl.
> 
> IOW, we'll just need to send the hotplug uevent anyway.
> 
> BR,
> Jani.
>

I am going to try to implement a the code where if wefail link training at a 
particular link rate then i send the uevent to the userspace saving off the
values at which thelink training failed so that these values can be used in the next
attempt of the modeset to prune the modes accordingly and link training should be
tried in that attempt with the lower link rate. The hope is that this will make the
compliance test 4.3.1.4 happy.

Regards
Manasi
> -- 
> Jani Nikula, Intel Open Source Technology Center