[PATCH] drm: dp_mst_topology: Ensure the proposed pbn does not exceed available bandwidth

Lyude Paul lyude at redhat.com
Wed Aug 28 00:31:29 UTC 2019


Hi! So, I don't have access to the DP 1.3 spec or anything later than 1.3.
However, I'm fairly sure this is very much against spec, since otherwise there
would be no way for us to determine the available PBN...
Honestly though, hardware being out of spec wouldn't exactly surprise me here.

I kinda want to look into this more before giving an r-b on this, although
I've got some comments down below on the patch itself. Feel free to poke me
tomorrow so we can take a closer look and maybe figure out more about what's
going on here; I'll try to remember to poke you as well.

On Tue, 2019-08-27 at 16:35 -0400, Sean Paul wrote:
> From: Sean Paul <seanpaul at chromium.org>
> 
> The PBN is calculated by the DP helpers during atomic check using the
> adjusted mode. In some cases, the EDID may contain modes which are not
> possible given the available PBN. One such example of this is if you
> downgrade the DP version of a 4k display from 1.2 to 1.1. The EDID would
> still contain the 4k mode, but it is not possible without HBR2. Another
> case might be if the branch device and sink have to downgrade their link
> speed during training.
> 
> This patch checks that the proposed PBN does not exceed a port's
> available PBN before allocating VCPI slots.
> 
> Signed-off-by: Sean Paul <seanpaul at chromium.org>
> ---
> 
> This is my first dip into MST, so it's possible (probable?) that I'm
> doing something wrong. However this seems to fix the issue I'm
> experiencing, so at least we have a base to work with.
> 
> I'm more than a bit confused as to why available_pbn is 0. I've tried
> retrying enum_path_resources, and even doing a phy powerup before the epr,
> but it still insists available_pbn=0. I've been looking at the DP 1.3
> spec and it isn't too clear on why this might be. If there are better
> resources, I'm interested :)
> 
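On the PBN calculation mentioned in the commit message, some rough math for
anyone following along (my own numbers, approximate): drm_dp_calc_pbn_mode()
converts the adjusted mode's pixel clock (in kHz) and bpp into PBN, roughly
clock * bpp / 8 * 1.006 * 64 / 54 / 1000:

	/* Rough sketch: why a 4k mode can't fit on a DP 1.1 link */
	int pbn = drm_dp_calc_pbn_mode(533250, 24); /* 4k60 @ 24bpp -> ~1907 PBN */

	/*
	 * 4 lanes @ HBR  (DP 1.1): full_payload_bw_number = 1280 PBN
	 * 4 lanes @ HBR2 (DP 1.2): full_payload_bw_number = 2560 PBN
	 * So once the link trains at DP 1.1 rates, the 4k mode's ~1907 PBN
	 * exceeds the port's capacity, and a check like the one below
	 * should reject it.
	 */
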
>  drivers/gpu/drm/drm_dp_mst_topology.c | 31 ++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 82add736e17d..16a88230091a 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2182,7 +2182,26 @@ static int drm_dp_send_enum_path_resources(struct drm_dp_mst_topology_mgr *mgr,
>  				DRM_ERROR("got incorrect port in response\n");
>  			DRM_DEBUG_KMS("enum path resources %d: %d %d\n", txmsg->reply.u.path_resources.port_number, txmsg->reply.u.path_resources.full_payload_bw_number,
>  			       txmsg->reply.u.path_resources.avail_payload_bw_number);
> -			port->available_pbn = txmsg->reply.u.path_resources.avail_payload_bw_number;
> +
> +			/*
> +			 * Some monitors (Samsung U28D590 at least) return 0 for
> +			 * available pbn while in low power mode.
> +			 *
> +			 * If we were to trust this value, we'd end up failing
> +			 * all vcpi allocations, since the requested bandwidth
> +			 * will be compared to 0. So use the full pbn as
> +			 * available. Doing this will cap the vcpi allocations
> +			 * at the trained link rate and will at least prevent
> +			 * us from trying to allocate payloads larger than our
> +			 * full pbn.
> +			 *
> +			 * If there is actually no bandwidth left, we'll fail
> +			 * on ALLOCATE_PAYLOAD anyways.

I would hope this would be the case, but I've seen plenty of situations where
MST hubs will just stop responding in situations like this instead of
providing an actual error. So it's probably safer to validate this as much as
possible beforehand, without relying on the sink to tell us when we've done
something wrong.

> +			 */
> +			if (txmsg->reply.u.path_resources.avail_payload_bw_number)
> +				port->available_pbn = txmsg->reply.u.path_resources.avail_payload_bw_number;
> +			else
> +				port->available_pbn = txmsg->reply.u.path_resources.full_payload_bw_number;

I think we should use a DP quirk for this so that we only follow this behavior
for this monitor, and not any others. It's possible that other things can
cause bandwidth restrictions, and while I haven't had a chance to look further
into it, I've noticed that sometimes sinks even end up allocating more
bandwidth than we actually asked for.
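
Something very roughly like this is what I have in mind. To be clear, none of
these names exist in the kernel and the EDID IDs are placeholders; this is
just a sketch of the shape of it:

	/*
	 * Hypothetical sketch, not existing API: key the full-pbn fallback
	 * on the sink's EDID manufacturer/product IDs so it only applies to
	 * monitors known to report avail_payload_bw_number == 0.
	 */
	struct zero_avail_pbn_quirk {
		u16 mfg_id;	/* EDID bytes 8-9 */
		u16 prod_id;	/* EDID bytes 10-11 */
	};

	static const struct zero_avail_pbn_quirk zero_avail_pbn_quirks[] = {
		{ 0x0000, 0x0000 },	/* Samsung U28D590 (placeholder IDs) */
	};

	static bool sink_needs_zero_avail_pbn_quirk(u16 mfg_id, u16 prod_id)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(zero_avail_pbn_quirks); i++) {
			if (zero_avail_pbn_quirks[i].mfg_id == mfg_id &&
			    zero_avail_pbn_quirks[i].prod_id == prod_id)
				return true;
		}
		return false;
	}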

>  		}
>  	}
>  
> @@ -3239,6 +3258,16 @@ int drm_dp_atomic_find_vcpi_slots(struct drm_atomic_state *state,
>  	struct drm_dp_vcpi_allocation *pos, *vcpi = NULL;
>  	int prev_slots, req_slots, ret;
>  
> +	/*
> +	 * Sanity check that the pbn proposed doesn't exceed the maximum
> +	 * available pbn for the port. This could happen if the EDID is
> +	 * advertising a mode which needs a faster link rate than has been
> +	 * trained between the sink and the branch device (easy to repro by
> +	 * using "compatibility" mode on a 4k display and downgrading to DP 1.1)
> +	 */
> +	if (pbn > port->available_pbn)
> +		return -EINVAL;
> +

port->available_pbn isn't really protected by any actual locking yet,
unfortunately :(. See

https://patchwork.freedesktop.org/patch/318683/?series=63847&rev=1

so I'm not sure we should be validating the PBN during the atomic check until
that's landed (we already don't do this, so dropping it wouldn't make things
any worse than they are right now). Additionally, we don't have a handler for
DP_RESOURCE_STATUS_NOTIFY UP messages yet either.
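
If we do want to keep a check like this in the meantime, I'd at least snapshot
the value under a lock instead of doing a bare read. A sketch only, and I'm
assuming mgr->lock is the right guard here, which the series above may well
change:

	int avail;

	/* sketch: take a consistent snapshot rather than a bare dereference */
	mutex_lock(&mgr->lock);
	avail = port->available_pbn;
	mutex_unlock(&mgr->lock);

	if (pbn > avail)
		return -EINVAL;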

>  	topology_state = drm_atomic_get_mst_topology_state(state, mgr);
>  	if (IS_ERR(topology_state))
>  		return PTR_ERR(topology_state);
-- 
Cheers,
	Lyude Paul
