[PATCH] drm: dp_mst_topology: Ensure the proposed pbn does not exceed available bandwidth

Wed Aug 28 14:29:32 UTC 2019

On Tue, Aug 27, 2019 at 08:31:29PM -0400, Lyude Paul wrote:
> Hi! So, I don't have access to the DP 1.3 spec or anything later than 1.3.

1.3 was just what I was looking at, I checked 1.2 and it looks the same (aside
from a typo fix).

> However, I'm fairly sure this is very much againt spec since there's no way
> for us to determine the available PBN otherwise...
> Honestly though, being against spec might not surprise me here.

Yeah, I was pretty surprised by this behavior too. Everything in the spec states
that Available_Payload_Bandwidth_Number is what we should be using to determine
maximum PBN.

> 
> I kinda want to look into this more before giving an r-b on this, although
> I've got some comments down below on the patch itself. Feel free to poke me
> tommorrow so we can take a closer look and maybe figure out more about what's
> going on here, I'll try to remember to poke you as well.

Help would be very much appreciated, thanks!

> 
> On Tue, 2019-08-27 at 16:35 -0400, Sean Paul wrote:
> > From: Sean Paul <seanpaul at chromium.org>
> > 
> > The PBN is calculated by the DP helpers during atomic check using the
> > adjusted mode. In some cases, the EDID may contain modes which are not
> > possible given the available PBN. One such example of this is if you
> > downgrade the DP version of a 4k display from 1.2 to 1.1. The EDID would
> > still contain the 4k mode, but it is not possible without HBR2. Another
> > case might be if the branch device and sink have to downgrade their link
> > speed during training.
> > 
> > This patch checks that the proposed pbn does not exceed a port's
> > available pbn before allocating vcpi slots.
> > 
> > Signed-off-by: Sean Paul <seanpaul at chromium.org>
> > ---
> > 
> > This is my first dip into MST, so it's possible (probable?) that I'm
> > doing something wrong. However this seems to fix the issue I'm
> > experiencing, so at least we have a base to work with.
> > 
> > I'm more than a bit confused when available_pbn is 0, and I've tried
> > retrying enum_path_resources and even doing a phy powerup before epr,
> > but it still insists available_pbn=0. I've been looking at the DP 1.3
> > spec and it isn't too clear on why this might be. If there are better
> > resources, I'm interested :)
> > 
> >  drivers/gpu/drm/drm_dp_mst_topology.c | 31 ++++++++++++++++++++++++++-
> >  1 file changed, 30 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c
> > b/drivers/gpu/drm/drm_dp_mst_topology.c
> > index 82add736e17d..16a88230091a 100644
> > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > @@ -2182,7 +2182,26 @@ static int drm_dp_send_enum_path_resources(struct
> > drm_dp_mst_topology_mgr *mgr,
> >  				DRM_ERROR("got incorrect port in response\n");
> >  			DRM_DEBUG_KMS("enum path resources %d: %d %d\n",
> > txmsg->reply.u.path_resources.port_number, txmsg-
> > >reply.u.path_resources.full_payload_bw_number,
> >  			       txmsg-
> > >reply.u.path_resources.avail_payload_bw_number);
> > -			port->available_pbn = txmsg-
> > >reply.u.path_resources.avail_payload_bw_number;
> > +
> > +			/*
> > +			 * Some monitors (Samsung U28D590 at least) return 0
> > for
> > +			 * available pbn while in low power mode.
> > +			 *
> > +			 * If we were to trust this value, we'd end up failing
> > +			 * all vcpi allocations, since the requested bandwidth
> > +			 * will be compared to 0. So use the full pbn as
> > +			 * available. Doing this will cap the vcpi allocations
> > +			 * at the trained link rate and will at least prevent
> > +			 * us from trying to allocate payloads larger than our
> > +			 * full pbn.
> > +			 *
> > +			 * If there is actually no bandwidth left, we'll fail
> > +			 * on ALLOCATE_PAYLOAD anyways.
> 
> I would hope this would be the case, but I've seen a lot of situations where
> MST hubs will just stop responding in situations like this instead of
> providing an actual error. So it's probably safer to validate this as much as
> possible beforehand without relying on the sink to tell us when we've done
> something wrong.
> 

thismakesmesad.gif

> > +			 */
> > +			if (txmsg-
> > >reply.u.path_resources.avail_payload_bw_number)
> > +				port->available_pbn = txmsg-
> > >reply.u.path_resources.avail_payload_bw_number;
> > +			else
> > +				port->available_pbn = txmsg-
> > >reply.u.path_resources.full_payload_bw_number;
> 
> I think we should use a DP quirk for this so that we only follow this behavior
> for this monitor, and not any others. It's possible that other things can
> cause bandwidth restrictions, and while I haven't had a chance to look further
> into it I've noticed that sometimes sinks even end up allocating more
> handwidth then we actually asked for.
> 

After reading your reply, I tested a few other monitors I had laying around and
all are behaving the same way -- even without the "compatibility" mode enabled.
The common theme seems to be that when I reboot my source the available_pbn on
boot is 0. If I hotplug after that, available_pbn is correct.

I'm wondering whether the branch device (CableMatters USB-C 2x DP hub) is
holding onto that allocation across reboot. That said, the payload allocation
I'm making doesn't use the full available PBN, so I would kind of expect the
available pbn to be non-zero if this were the case, but ¯\_(ツ)_/¯

> >  		}
> >  	}
> >  
> > @@ -3239,6 +3258,16 @@ int drm_dp_atomic_find_vcpi_slots(struct
> > drm_atomic_state *state,
> >  	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
> >  	int prev_slots, req_slots, ret;
> >  
> > +	/*
> > +	 * Sanity check that the pbn proposed doesn't exceed the maximum
> > +	 * available pbn for the port. This could happen if the EDID is
> > +	 * advertising a mode which needs a faster link rate than has been
> > +	 * trained between the sink and the branch device (easy to repro by
> > +	 * using "compatibility" mode on a 4k display and downgrading to DP
> > 1.1)
> > +	 */
> > +	if (pbn > port->available_pbn)
> > +		return -EINVAL;
> > +
> 
> port->available_pbn isn't really protected by any actual locking yet
> unfortunately :(. See
> 
> https://patchwork.freedesktop.org/patch/318683/?series=63847&rev=1
> 
> so I'm not sure we should be validating the PBN during the atomic check until
> that's landed (we already don't do this, so dropping this wouldn't make things
> any worse then they are right now). Additionally, we also don't have a handler
> for DP_RESOURCE_STATUS_NOTIFY UP messages yet either.

Yep, that's fine with me.

Sean

> 
> >  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
> >  	if (IS_ERR(topology_state))
> >  		return PTR_ERR(topology_state);
> -- 
> Cheers,
> 	Lyude Paul
> 

-- 
Sean Paul, Software Engineer, Google / Chromium OS