[Intel-gfx] [PATCH v5 06/14] drm/dp_mst: Protect drm_dp_mst_port members with locking

Tue Oct 22 20:08:15 UTC 2019

On Mon, Oct 21, 2019 at 10:36:01PM -0400, Lyude Paul wrote:
> This is a complicated one. Essentially, there's currently a problem in the MST
> core that hasn't really caused any issues that we're aware of (emphasis on "that
> we're aware of"): locking.
> 
> When we go through and probe the link addresses and path resources in a
> topology, we hold no locks when updating ports with said information. The
> members I'm referring to in particular are:
> 
> - ldps
> - ddps
> - mcs
> - pdt
> - dpcd_rev
> - num_sdp_streams
> - num_sdp_stream_sinks
> - available_pbn
> - input
> - connector
> 
> Now that we're handling UP requests asynchronously and will be using some of
> the struct members mentioned above in atomic modesetting in the future for
> features such as PBN validation, this is going to become a lot more important.
> As well, the next few commits that prepare us for and introduce suspend/resume
> reprobing will also need clear locking in order to prevent additional
> racing hilarities that we never could have hit in the past.
> 
> So, let's solve this issue by using &mgr->base.lock, the modesetting
> lock which currently only protects &mgr->base.state. This works
> perfectly because it allows us to avoid blocking connection_mutex
> unnecessarily, and we can grab this in connector detection paths since
> it's a ww mutex. We start by having drm_dp_mst_handle_up_req() hold this
> when updating ports. For drm_dp_mst_handle_link_address_port() things
> are a bit more complicated. As I've learned the hard way, we can grab
> &mgr->base.lock for everything except for port->connector. See, our
> normal driver probing paths end up generating this rather obvious
> lockdep chain:
> 
> &drm->mode_config.mutex
>   -> crtc_ww_class_mutex/crtc_ww_class_acquire
>     -> &connector->mutex
> 
> However, sysfs grabs &drm->mode_config.mutex in order to protect itself
> from connector state changing under it. Because this entails grabbing
> kn->count, i.e. the lock that the kernel provides for protecting sysfs
> contexts, we end up grabbing kn->count followed by
> &drm->mode_config.mutex. This ends up creating an extremely rude chain:
> 
> &kn->count
>   -> &drm->mode_config.mutex
>     -> crtc_ww_class_mutex/crtc_ww_class_acquire
>       -> &connector->mutex
> 
> I mean, look at that thing! It's just evil!!! This gross thing ends up
> making any calls to drm_connector_register()/drm_connector_unregister()
> impossible when holding any kind of modesetting lock. This is annoying
> because ideally, we always want to ensure that
> drm_dp_mst_port->connector never changes when doing an atomic commit or
> check that would affect the atomic topology state so that it can
> reliably and easily be used from future DRM DP MST helpers to assist
> with tasks such as scanning through the current VCPI allocations and
> adding connectors which need to have their allocations updated in
> response to a bandwidth change or the like.
> 
> Being able to hold &mgr->base.lock throughout the entire link probe
> process would have been _great_, since we could prevent userspace from
> ever seeing any states in-between individual port changes and as a
> result likely end up with a much faster probe and more consistent
> results from said probes. But without some rework of how we handle
> connector probing in sysfs it's not at all currently possible. In the
> future, maybe we can try using the sysfs locks to protect updates to
> connector probing state and fix this mess.
> 
> So for now, to protect everything other than port->connector under
> &mgr->base.lock and ensure that we still have the guarantee that atomic
> check/commit contexts will never see port->connector change we use a
> silly trick. See: port->connector only needs to change in order to
> ensure that input ports (see the MST spec) never have a ghost connector
> associated with them. But, there's nothing stopping us from simply
> throwing the entire port out and creating a new one in order to maintain
> that requirement while still keeping port->connector consistent across
> the lifetime of the port in atomic check/commit contexts. For all
> intents and purposes this works fine, as we validate ports in any contexts
> we care about before using them and as such will end up reporting the
> connector as disconnected until its port's destruction finalizes. So,
> we just do that in cases where we detect port->input has transitioned
> from false->true. We don't need to worry about the other direction,
> since a port without a connector isn't visible to userspace and as such
> doesn't need to be protected by &mgr->base.lock until we finish
> registering a connector for it.
> 
> For updating members of drm_dp_mst_port other than port->connector, we
> simply grab &mgr->base.lock in drm_dp_mst_link_probe_work() for already
> registered ports, update said members and drop the lock before
> potentially registering a connector and probing the link address of its
> children.
> 
> Finally, we modify drm_dp_mst_detect_port() to take a modesetting lock
> acquisition context in order to acquire &mgr->base.lock under
> &connection_mutex and convert all of its users over to using the
> .detect_ctx probe hooks.
> 
> With that, we finally have well defined locking.
> 
> Changes since v4:
> * Get rid of port->mutex, stop using connection_mutex and just use our own
>   modesetting lock - mgr->base.lock. Also, add a probe_lock that comes
>   before this patch.
> * Just throw out ports that get changed from an output to an input, and
>   replace them with new ports. This lets us ensure that modesetting
>   contexts never see port->connector go from having a connector to being
>   NULL.
> * Write an extremely detailed explanation of what problems this is
>   trying to fix, since there's a _lot_ of context here and I honestly
>   forgot some of it myself a couple times.
> * Don't grab mgr->lock when reading port->mstb in
>   drm_dp_mst_handle_link_address_port(). It's not needed.
> 
> Cc: Juston Li <juston.li at intel.com>
> Cc: Imre Deak <imre.deak at intel.com>
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> Cc: Harry Wentland <hwentlan at amd.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Sean Paul <sean at poorly.run>
> Signed-off-by: Lyude Paul <lyude at redhat.com>

Overall makes sense to me. Thanks for the comprehensive commit message and
comments, they definitely help :)

Just one nit below,

Reviewed-by: Sean Paul <sean at poorly.run>

> ---
>  .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  28 +--
>  drivers/gpu/drm/drm_dp_mst_topology.c         | 230 ++++++++++++------
>  drivers/gpu/drm/i915/display/intel_dp_mst.c   |  28 ++-
>  drivers/gpu/drm/nouveau/dispnv50/disp.c       |  32 +--
>  drivers/gpu/drm/radeon/radeon_dp_mst.c        |  24 +-
>  include/drm/drm_dp_mst_helper.h               |  38 ++-
>  6 files changed, 240 insertions(+), 140 deletions(-)
> 

/snip

> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 11d842f0bff5..7bf4db91ff90 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c

/snip

> @@ -1912,35 +1984,40 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
>  {
>  	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
>  	struct drm_dp_mst_port *port;
> -	bool created = false;
> -	int old_ddps = 0;
> +	int old_ddps = 0, ret;
> +	u8 new_pdt = DP_PEER_DEVICE_NONE;
> +	bool created = false, send_link_addr = false;
>  
>  	port = drm_dp_get_port(mstb, port_msg->port_number);
>  	if (!port) {
> -		port = kzalloc(sizeof(*port), GFP_KERNEL);
> +		port = drm_dp_mst_add_port(dev, mgr, mstb,
> +					   port_msg->port_number);
>  		if (!port)
>  			return;
> -		kref_init(&port->topology_kref);
> -		kref_init(&port->malloc_kref);
> -		port->parent = mstb;
> -		port->port_num = port_msg->port_number;
> -		port->mgr = mgr;
> -		port->aux.name = "DPMST";
> -		port->aux.dev = dev->dev;
> -		port->aux.is_remote = true;
> -
> -		/*
> -		 * Make sure the memory allocation for our parent branch stays
> -		 * around until our own memory allocation is released
> +		created = true;
> +	} else if (port_msg->input_port && !port->input && port->connector) {
> +		/* Destroying the connector is impossible in this context, so
> +		 * replace the port with a new one
>  		 */
> -		drm_dp_mst_get_mstb_malloc(mstb);
> +		drm_dp_mst_topology_unlink_port(mgr, port);
> +		drm_dp_mst_topology_put_port(port);
>  
> +		port = drm_dp_mst_add_port(dev, mgr, mstb,
> +					   port_msg->port_number);
> +		if (!port)
> +			return;
>  		created = true;
>  	} else {
> +		/* Locking is only needed when the port has a connector
> +		 * exposed to userspace
> +		 */
> +		drm_modeset_lock(&mgr->base.lock, NULL);

Random musing: It's kind of unfortunate that we don't have a void variant of
drm_modeset_lock for when there's no acquire_ctx since we end up with a mix of
drm_modeset_lock calls with and without return checking.
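
To make the musing concrete, here is a rough userspace sketch of what such a
void wrapper could look like. The name drm_modeset_lock_noctx() is
hypothetical and does not exist in the tree; the structs and the two lock
functions are mocks that only mirror the call shape, and assert() stands in
for something like WARN_ON(). It assumes that a NULL acquire_ctx means the
call just blocks and returns 0, which is how I read the current
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the call shape matters. */
struct drm_modeset_acquire_ctx { int unused; };
struct drm_modeset_lock { bool held; };

/* Mock with the same signature as the real function. With a NULL ctx the
 * real one is assumed to block until the lock is taken and return 0.
 */
static int drm_modeset_lock(struct drm_modeset_lock *lock,
			    struct drm_modeset_acquire_ctx *ctx)
{
	(void)ctx;
	lock->held = true;
	return 0;
}

/* Mock unlock, mirroring the real function's signature. */
static void drm_modeset_unlock(struct drm_modeset_lock *lock)
{
	lock->held = false;
}

/* Hypothetical void variant for callers without an acquire_ctx: there is
 * nothing useful to return, so swallow the int and complain if it is ever
 * non-zero.
 */
static void drm_modeset_lock_noctx(struct drm_modeset_lock *lock)
{
	int ret = drm_modeset_lock(lock, NULL);

	assert(ret == 0);
	(void)ret;
}
```

Callers like the one above would then read drm_modeset_lock_noctx(&mgr->base.lock)
and the remaining drm_modeset_lock() calls would all be ones where the return
value has to be checked.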

/snip

> @@ -3441,22 +3516,31 @@ EXPORT_SYMBOL(drm_dp_mst_hpd_irq);
>  /**
>   * drm_dp_mst_detect_port() - get connection status for an MST port
>   * @connector: DRM connector for this port
> + * @ctx: The acquisition context to use for grabbing locks
>   * @mgr: manager for this port
> - * @port: unverified pointer to a port
> + * @port: pointer to a port
>   *
> - * This returns the current connection state for a port. It validates the
> - * port pointer still exists so the caller doesn't require a reference
> + * This returns the current connection state for a port.

"On error, this returns -errno"
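
The reason this is worth documenting: with a ctx in play the helper can now
fail (e.g. with -EDEADLK), and a .detect_ctx hook is expected to hand that
back rather than turn it into a connector status. A minimal userspace sketch
of the pattern follows. Everything in it is a mock: mock_mst_detect_port()
and example_detect_ctx() are hypothetical names, the signatures are
simplified, and the "contended" flag just simulates a lock that cannot be
taken without backing off.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in; values mirror enum drm_connector_status. */
enum drm_connector_status {
	connector_status_connected = 1,
	connector_status_disconnected = 2,
	connector_status_unknown = 3,
};

/* Mock acquire context: "contended" simulates a lock we cannot take. */
struct drm_modeset_acquire_ctx { bool contended; };

/* Mock of the helper: a negative errno on locking failure, otherwise a
 * connector status.
 */
static int mock_mst_detect_port(struct drm_modeset_acquire_ctx *ctx,
				bool plugged)
{
	if (ctx->contended)
		return -EDEADLK;

	return plugged ? connector_status_connected :
			 connector_status_disconnected;
}

/* Hypothetical driver .detect_ctx hook: pass the result through unchanged
 * so that a negative errno reaches the caller, which can back off and retry.
 */
static int example_detect_ctx(struct drm_modeset_acquire_ctx *ctx,
			      bool plugged)
{
	return mock_mst_detect_port(ctx, plugged);
}
```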

/snip

> -- 
> 2.21.0
> 

-- 
Sean Paul, Software Engineer, Google / Chromium OS