[Intel-gfx] [PATCH v7 1/6] drm/i915: Fallback to lower link rate and lane count during link training

Thu Sep 29 15:48:43 UTC 2016

On Thu, 29 Sep 2016, Ville Syrjälä <ville.syrjala at linux.intel.com> wrote:
> On Thu, Sep 29, 2016 at 12:44:19PM +0100, Chris Wilson wrote:
>> On Thu, Sep 29, 2016 at 02:26:16PM +0300, Jani Nikula wrote:
>> > On Thu, 29 Sep 2016, Manasi Navare <manasi.d.navare at intel.com> wrote:
>> > > On Tue, Sep 27, 2016 at 08:07:01PM +0300, Jani Nikula wrote:
>> > >> On Tue, 27 Sep 2016, Manasi Navare <manasi.d.navare at intel.com> wrote:
>> > >> > On Mon, Sep 26, 2016 at 04:39:34PM +0300, Jani Nikula wrote:
>> > >> >> On Fri, 16 Sep 2016, Manasi Navare <manasi.d.navare at intel.com> wrote:
>> > >> >> > According to the DisplayPort Spec, in case of Clock Recovery failure
>> > >> >> > the link training sequence should fall back to the lower link rate
>> > >> >> > followed by lower lane count until CR succeeds.
>> > >> >> > On CR success, the sequence proceeds with Channel EQ.
>> > >> >> > In case of Channel EQ failures, it should fallback to
>> > >> >> > lower link rate and lane count and start the CR phase again.
>> > >> >> 
>> > >> >> This change makes the link training start at the max lane count and max
>> > >> >> link rate. This is not ideal, as it wastes the link. And it is not a
>> > >> >> spec requirement. "The Link Policy Maker of the upstream device may
>> > >> >> choose any link count and link rate as long as they do not exceed the
>> > >> >> capabilities of the DP receiver."
>> > >> >> 
>> > >> >> Our current code starts at the minimum required bandwidth for the mode,
>> > >> >> therefore we can't fall back to lower link rate and lane count without
>> > >> >> reducing the mode.
>> > >> >> 
>> > >> >> AFAICT this patch here makes it possible for the link bandwidth to drop
>> > >> >> below what is required for the mode. This is unacceptable.
>> > >> >> 
>> > >> >> BR,
>> > >> >> Jani.
>> > >> >> 
>> > >> >>
>> > >> >
>> > >> > Thanks Jani for your review comments.
>> > >> > Yes in this change we start at the max link rate and lane count. This
>> > >> > change was made according to the design document discussions we had
>> > >> > before strating this DP Redesign project. The main reason for starting
>> > >> > at the maxlink rate and max lane count was for ensuring proper
>> > >> > behavior of DP MST. In case of DP MST, we want to train the link at
>> > >> > the maximum supported link rate/lane count based on an early/ upfront
>> > >> > link training result so that we dont fail when we try to connect a
>> > >> > higher resolution monitor as a second monitor. This a trade off
>> > >> > between wsting the link or higher power vs. needing to retrain for
>> > >> > every monitor that requests a higher BW in case of DP MST.
>> > >> 
>> > >> We already train at max bandwidth for DP MST, which seems to be the
>> > >> sensible thing to do.
>> > >> 
>> > >> > Actually this is also the reason for enabling upfront link training in
>> > >> > the following patch where we train the link much ahead in the modeset
>> > >> > sequence to understand the link rate and lane count values at which
>> > >> > the link can be successfully trained and then the link training
>> > >> > through modeset will always start at the upfront values (maximum
>> > >> > supported values of lane count and link rate based on upfront link
>> > >> > training).
>> > >> 
>> > >> I don't see a need to do this for DP SST.
>> > >> 
>> > >> > As per the CTS, all the test 4.3.1.4 requires that you fall back to
>> > >> > the lower link rate after trying to train at the maximum link rate
>> > >> > advertised through the DPCD registers.
>> > >> 
>> > >> That test does not require the source DUT to default to maximum lane
>> > >> count or link rate of the sink. The source may freely choose the lane
>> > >> count and link rate as long as they don't exceed sink capabilities.
>> > >> 
>> > >> For the purposes of the test, the test setup can request specific
>> > >> parameters to be used, but that does not mean using maximum by
>> > >> *default*.
>> > >> 
>> > >> We currently lack the feature to reduce lane count and link rate. The
>> > >> key to understand here is that starting at max and reducing down to the
>> > >> sufficient parameters for the mode (which is where we start now) offers
>> > >> no real benefit for any use case. What we're lacking is a feature to
>> > >> reduce the link parameters *below* what's required by the mode the
>> > >> userspace wants. This can only be achieved through cooperation with
>> > >> userspace.
>> > >> 
>> > >
>> > > We can train at the optimal link rate required for the requested mode as
>> > > done in the existing implementation and retrain whenever the link training
>> > > test request is sent. 
>> > > For the test 4.3.1.4 in CTS, it does force a failure in CR and expects the
>> > > driver to fall back to even lower link rate. We do not implement this in the
>> > > current driver and so this test fails. Could you elaborate on how this can
>> > > be achieved with the the cooperation with userspace?
>> > > Should we send a uevent to the userspace asking to retry at a lower resolution
>> > > after retraining at the lower link rate?
>> > > This is pertty much the place where majority of the compliance tests are failing.
>> > > How can we pass compliance with regards to this feature?
>> > 
>> > So here's an idea Ville and I came up with. It's not completely thought
>> > out yet, probably has some wrinkles still, but then there are wrinkles
>> > with the upfront link training too (I'll get back to those separately).
>> > 
>> > If link training fails during modeset (either for real or because it's a
>> > test sink that wants to test failures), we 1) store the link parameters
>> > as failing, 2) send a uevent to userspace, hopefully getting the
>> > userspace to do another get modes and try again, 3) propage errors from
>> > modeset.
>> 
>> userspace already tries to do a reprobe after a setcrtc fails, to try
>> and gracefully handle the race between hotplug being in its event queue
>> and performing setcrtc, i.e. I think the error is enough.
>
> I presume we want the modeset to be async, so by the time we notice the
> problem we're no longer in the ioctl.

IOW, we'll just need to send the hotplug uevent anyway.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center