[PATCH] drm/amd/display: take dc_lock in short pulse handler only
Kazlauskas, Nicholas
nicholas.kazlauskas at amd.com
Wed May 19 20:59:05 UTC 2021
On 2021-05-19 4:55 p.m., Aurabindo Pillai wrote:
> [Why]
> Code paths that end up modifying the global dc state must hold the dc
> lock. However, the lock is already held during the MST payload
> allocation sequence. With a StarTech 1.2 DP hub, we get an HPD RX
> interrupt for a reason other than to indicate down reply availability
> right after sending the payload allocation. The handler then takes the
> dc lock again before calling dc's HPD RX handler, and due to this
> contention the DRM thread that is waiting for the MST down reply never
> gets a chance to finish waiting and times out. Once the lock is
> released, the HPD RX handler fires and proceeds to read from the MST
> hub, but by then it's too late: the hub doesn't light up all displays,
> since DRM has no error handling for a failed payload allocation.
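To make the contention concrete, here is a toy user-space model of the
sequence described above (hypothetical pthread code, not the driver's).
One thread stands in for the DRM thread that waits for the down reply
while holding the lock; the other stands in for the short pulse
handler, which needs the same lock before the reply can be delivered:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t dc_lock = PTHREAD_MUTEX_INITIALIZER;
    static atomic_bool down_reply_ready;

    /* Stands in for the short pulse handler: it must take dc_lock
     * before it can do the work that delivers the down reply. */
    static void *short_pulse_handler(void *arg)
    {
            pthread_mutex_lock(&dc_lock); /* blocks while DRM holds it */
            down_reply_ready = true;
            pthread_mutex_unlock(&dc_lock);
            return NULL;
    }

    int main(void)
    {
            pthread_t hpd;

            /* The payload allocation path already holds the lock... */
            pthread_mutex_lock(&dc_lock);
            pthread_create(&hpd, NULL, short_pulse_handler, NULL);

            /* ...and waits for the down reply without releasing it, so
             * the handler can never deliver it and the wait times out. */
            for (int i = 0; i < 10 && !down_reply_ready; i++)
                    usleep(100 * 1000);
            printf("down reply %s\n",
                   down_reply_ready ? "received" : "timed out");

            pthread_mutex_unlock(&dc_lock); /* handler only runs now */
            pthread_join(hpd, NULL);
            return 0;
    }

Built with -pthread, this always prints "timed out", which is the shape
of the failure described above.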
>
> [How]
> Take the lock only if there is a change in link status or if the
> automated test bit is set. The latter also fixes a NULL pointer
> dereference seen when running certain DP Link Layer Compliance tests.
>
> Signed-off-by: Aurabindo Pillai <aurabindo.pillai at amd.com>
Discussed this a bit offline and I'd *really* like to see the proper
interface go in sooner rather than later.

Conditional locking is almost always a sign of a bug. In this case we
know it's OK, but someone could later change the function underneath
without realizing that we're duplicating some of the checking logic in
the upper layer.

I don't think the code changes often enough in this area for that to
happen (as it's spec based), but please be mindful, and consider
splitting the checking logic (which is thread safe) out from the link
loss logic (the functional bit, which isn't thread safe). A rough
sketch of what that split could look like is below.
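For illustration only, a minimal sketch of such a split; the names
dc_link_hpd_rx_irq_needs_lock and dc_link_handle_link_loss are invented
here, not existing DC interfaces:

    /* Thread safe: only inspects irq data already read from the sink
     * and touches no global DC state, so DM may call it without
     * holding dc_lock. */
    bool dc_link_hpd_rx_irq_needs_lock(struct dc_link *link,
            union hpd_irq_data *irq_data)
    {
            return hpd_rx_irq_check_link_loss_status(link, irq_data) ||
                   irq_data->bytes.device_service_irq.bits.AUTOMATED_TEST;
    }

    /* Not thread safe: retrains the link and mutates global DC state.
     * Callers (e.g. handle_hpd_rx_irq in DM) must hold dc_lock. */
    void dc_link_handle_link_loss(struct dc_link *link,
            union hpd_irq_data *irq_data);

That way the outer layer asks DC whether locking is needed instead of
re-implementing the DPCD checks itself, and the locking requirement is
attached to the functional entry point rather than to knowledge of its
internals.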
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>

Regards,
Nicholas Kazlauskas
> ---
> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +++++++++++++++++--
> .../gpu/drm/amd/display/dc/core/dc_link_dp.c | 2 +-
> .../gpu/drm/amd/display/dc/inc/dc_link_dp.h | 4 ++++
> 3 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index e79910cc179c..2c9d099adfc2 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -28,6 +28,7 @@
>  
>  #include "dm_services_types.h"
>  #include "dc.h"
> +#include "dc_link_dp.h"
>  #include "dc/inc/core_types.h"
>  #include "dal_asic_id.h"
>  #include "dmub/dmub_srv.h"
> @@ -2740,6 +2741,7 @@ static void handle_hpd_rx_irq(void *param)
>  	enum dc_connection_type new_connection_type = dc_connection_none;
>  	struct amdgpu_device *adev = drm_to_adev(dev);
>  	union hpd_irq_data hpd_irq_data;
> +	bool lock_flag = false;
>  
>  	memset(&hpd_irq_data, 0, sizeof(hpd_irq_data));
>  
> @@ -2769,15 +2771,28 @@
>  		}
>  	}
>  
> -	if (!amdgpu_in_reset(adev)) {
> +	/*
> +	 * TODO: We need the lock to avoid touching DC state while it's being
> +	 * modified during automated compliance testing, or when link loss
> +	 * happens. While this should be split into subhandlers and proper
> +	 * interfaces to avoid having to conditionally lock like this in the
> +	 * outer layer, we need this workaround temporarily to allow MST
> +	 * lightup in some scenarios to avoid timeout.
> +	 */
> +	if (!amdgpu_in_reset(adev) &&
> +	    (hpd_rx_irq_check_link_loss_status(dc_link, &hpd_irq_data) ||
> +	     hpd_irq_data.bytes.device_service_irq.bits.AUTOMATED_TEST)) {
>  		mutex_lock(&adev->dm.dc_lock);
> +		lock_flag = true;
> +	}
> +
>  #ifdef CONFIG_DRM_AMD_DC_HDCP
>  		result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
>  #else
>  		result = dc_link_handle_hpd_rx_irq(dc_link, NULL, NULL);
>  #endif
> +	if (lock_flag)
>  		mutex_unlock(&adev->dm.dc_lock);
> -	}
>  
>  out:
>  	if (result && !is_mst_root_connector) {
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> index 9e08410bfdfd..32fb9cdbd980 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
> @@ -2070,7 +2070,7 @@ enum dc_status read_hpd_rx_irq_data(
>  	return retval;
>  }
>  
> -static bool hpd_rx_irq_check_link_loss_status(
> +bool hpd_rx_irq_check_link_loss_status(
>  	struct dc_link *link,
>  	union hpd_irq_data *hpd_irq_dpcd_data)
>  {
> diff --git a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
> index ffc3f2c63db8..7dd8bca542b9 100644
> --- a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
> +++ b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
> @@ -68,6 +68,10 @@ bool perform_link_training_with_retries(
>  	enum signal_type signal,
>  	bool do_fallback);
>  
> +bool hpd_rx_irq_check_link_loss_status(
> +	struct dc_link *link,
> +	union hpd_irq_data *hpd_irq_dpcd_data);
> +
>  bool is_mst_supported(struct dc_link *link);
>  
>  bool detect_dp_sink_caps(struct dc_link *link);
>