[PATCH v5 07/20] drm/dp_mst: Restart last_connected_port_and_mstb() if topology ref fails

Daniel Vetter daniel at ffwll.ch
Thu Jan 10 09:16:30 UTC 2019


On Tue, Jan 08, 2019 at 07:35:04PM -0500, Lyude Paul wrote:
> While this isn't a complete fix, this will improve the reliability of
> drm_dp_get_last_connected_port_and_mstb() pretty significantly during
> hotplug events, since there's a chance that the in-memory topology tree
> may not be fully updated when drm_dp_get_last_connected_port_and_mstb()
> is called and thus might end up causing our search to fail on an mstb
> whose topology refcount has reached 0, but has not yet been removed from
> it's parent.
> 
> Ideally, we should further fix this problem by ensuring that we deal
> with the potential for racing with a hotplug event, which would look
> like this:
> 
> * drm_dp_payload_send_msg() retrieves the last living relative of mstb
>   with drm_dp_get_last_connected_port_and_mstb()
> * drm_dp_payload_send_msg() starts building payload message
>   At the same time, mstb gets unplugged from the topology and is no
>   longer the actual last living relative of the original mstb
> * drm_dp_payload_send_msg() tries sending the payload message, hub times
>   out
> * Hub timed out, we give up and run away-resulting in the payload being
>   leaked
> 
> This could be fixed by restarting the
> drm_dp_get_last_connected_port_and_mstb() search whenever we get a
> timeout, sending the payload to the new mstb, then repeating until
> either the entire topology is removed from the system or
> drm_dp_get_last_connected_port_and_mstb() fails. But since the above
> race condition is not terribly likely, we'll address that in a later
> patch series once we've improved the recovery handling for VCPI
> allocations in the rest of the DP MST helpers.
> 
> Signed-off-by: Lyude Paul <lyude at redhat.com>
> Reviewed-by: Harry Wentland <harry.wentland at amd.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: David Airlie <airlied at redhat.com>
> Cc: Jerry Zuo <Jerry.Zuo at amd.com>
> Cc: Juston Li <juston.li at intel.com>
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 54 ++++++++++++++++++++++-----
>  1 file changed, 44 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index c53cf7eb1dbc..bafc85f08606 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2047,24 +2047,50 @@ static struct drm_dp_mst_port *drm_dp_get_last_connected_port_to_mstb(struct drm
>  	return drm_dp_get_last_connected_port_to_mstb(mstb->port_parent->parent);
>  }
>  
> -static struct drm_dp_mst_branch *drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
> -									 struct drm_dp_mst_branch *mstb,
> -									 int *port_num)
> +/**
> + * drm_dp_get_last_connected_port_and_mstb() - Find the last living relatives
> + * in a topology of a given branch device
> + * @mgr: The topology manager to use
> + * @mstb: The disconnected branch device
> + * @port_num: Where to store the number of the last connected port
> + *
> + * Searches upwards in the topology starting from @mstb to try to find the
> + * closest available parent of @mstb that's still connected to the rest of the
> + * topology. This can be used in order to perform operations like releasing
> + * payloads, where the branch device which owned the payload may no longer be
> + * around and thus would require that the payload on the last living relative
> + * be freed instead.
> + *
> + * Returns:
> + * The last connected &drm_dp_mst_branch in the topology that was a parent of
> + * @mstb, if there is one.
> + */

Since this is an internal function and we don't pull it into the kerneldoc
there's a good chance the details will bitrot (since 0day won't spot when
e.g. the arguments change). It'd just keep a small explaining why we need
to do this and drop the other bits. Either way:

Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>


> +static struct drm_dp_mst_branch *
> +drm_dp_get_last_connected_port_and_mstb(struct drm_dp_mst_topology_mgr *mgr,
> +					struct drm_dp_mst_branch *mstb,
> +					int *port_num)
>  {
>  	struct drm_dp_mst_branch *rmstb = NULL;
>  	struct drm_dp_mst_port *found_port;
> +
>  	mutex_lock(&mgr->lock);
> -	if (mgr->mst_primary) {
> +	if (!mgr->mst_primary)
> +		goto out;
> +
> +	do {
>  		found_port = drm_dp_get_last_connected_port_to_mstb(mstb);
> +		if (!found_port)
> +			break;
>  
> -		if (found_port) {
> +		if (drm_dp_mst_topology_try_get_mstb(found_port->parent)) {
>  			rmstb = found_port->parent;
> -			if (drm_dp_mst_topology_try_get_mstb(rmstb))
> -				*port_num = found_port->port_num;
> -			else
> -				rmstb = NULL;
> +			*port_num = found_port->port_num;
> +		} else {
> +			/* Search again, starting from this parent */
> +			mstb = found_port->parent;
>  		}
> -	}
> +	} while (!rmstb);
> +out:
>  	mutex_unlock(&mgr->lock);
>  	return rmstb;
>  }
> @@ -2113,6 +2139,14 @@ static int drm_dp_payload_send_msg(struct drm_dp_mst_topology_mgr *mgr,
>  
>  	drm_dp_queue_down_tx(mgr, txmsg);
>  
> +	/*
> +	 * FIXME: there is a small chance that between getting the last
> +	 * connected mstb and sending the payload message, the last connected
> +	 * mstb could also be removed from the topology. In the future, this
> +	 * needs to be fixed by restarting the
> +	 * drm_dp_get_last_connected_port_and_mstb() search in the event of a
> +	 * timeout if the topology is still connected to the system.
> +	 */
>  	ret = drm_dp_mst_wait_tx_reply(mstb, txmsg);
>  	if (ret > 0) {
>  		if (txmsg->reply.reply_type == 1)
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the amd-gfx mailing list