[PATCH] drm/amd/display: take dc_lock in short pulse handler only

Wed May 19 21:01:51 UTC 2021


On 2021-05-19 4:59 p.m., Kazlauskas, Nicholas wrote:
> On 2021-05-19 4:55 p.m., Aurabindo Pillai wrote:
>> [Why]
>> Conditions that end up modifying the global dc state must be locked.
>> However, during mst allocate payload sequence, lock is already taken.
>> With StarTech 1.2 DP hub, we get an HPD RX interrupt for a reason other
>> than to indicate down reply availability right after sending payload
>> allocation. The handler again takes dc lock before calling the
>> dc's HPD RX handler. Due to this contention, the DRM thread which waits
>> for MST down reply never gets a chance to finish its waiting
>> successfully and ends up timing out. Once the lock is released, the hpd
>> rx handler fires and goes ahead to read from the MST HUB, but now its
>> too late and the HUB doesnt lightup all displays since DRM lacks error
>> handling when payload allocation fails.
>>
>> [How]
>> Take lock only if there is a change in link status or if automated test
>> pattern bit is set. The latter fixes the null pointer dereference when
>> running certain DP Link Layer Compliance test.
>>
>> Signed-off-by: Aurabindo Pillai <aurabindo.pillai at amd.com>
> 
> Discussed this a bit offline and I'd *really* like the proper interface in sooner rather than later.
> 
> Conditional locking is almost always a sign of a bug, in this case we know it's OK but someone can 
> change the function underneath later without understanding that we're duplicating some of the 
> checking logic in the upper layer.
> 
> I don't think the code changes enough in this area for this to happen (as it's spec based), but 
> please be mindful and consider splitting the checking logic (which is thread safe) out with the link 
> loss logic (the functional bit, that isn't thread safe).
> 
> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>

Thanks for the review! I'll get back to fixing this properly in a few weeks.
> 
> Regards,
> Nicholas Kazlauskas
> 
>> ---
>>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +++++++++++++++++--
>>   .../gpu/drm/amd/display/dc/core/dc_link_dp.c  |  2 +-
>>   .../gpu/drm/amd/display/dc/inc/dc_link_dp.h   |  4 ++++
>>   3 files changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index e79910cc179c..2c9d099adfc2 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -28,6 +28,7 @@
>>   #include "dm_services_types.h"
>>   #include "dc.h"
>> +#include "dc_link_dp.h"
>>   #include "dc/inc/core_types.h"
>>   #include "dal_asic_id.h"
>>   #include "dmub/dmub_srv.h"
>> @@ -2740,6 +2741,7 @@ static void handle_hpd_rx_irq(void *param)
>>       enum dc_connection_type new_connection_type = dc_connection_none;
>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>       union hpd_irq_data hpd_irq_data;
>> +    bool lock_flag = 0;
>>       memset(&hpd_irq_data, 0, sizeof(hpd_irq_data));
>> @@ -2769,15 +2771,28 @@ static void handle_hpd_rx_irq(void *param)
>>           }
>>       }
>> -    if (!amdgpu_in_reset(adev)) {
>> +    /*
>> +     * TODO: We need the lock to avoid touching DC state while it's being
>> +     * modified during automated compliance testing, or when link loss
>> +     * happens. While this should be split into subhandlers and proper
>> +     * interfaces to avoid having to conditionally lock like this in the
>> +     * outer layer, we need this workaround temporarily to allow MST
>> +     * lightup in some scenarios to avoid timeout.
>> +     */
>> +    if (!amdgpu_in_reset(adev) &&
>> +        (hpd_rx_irq_check_link_loss_status(dc_link, &hpd_irq_data) ||
>> +         hpd_irq_data.bytes.device_service_irq.bits.AUTOMATED_TEST)) {
>>           mutex_lock(&adev->dm.dc_lock);
>> +        lock_flag = 1;
>> +    }
>> +
>>   #ifdef CONFIG_DRM_AMD_DC_HDCP
>>       result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
>>   #else
>>       result = dc_link_handle_hpd_rx_irq(dc_link, NULL, NULL);
>>   #endif
>> +    if (!amdgpu_in_reset(adev) && lock_flag)
>>           mutex_unlock(&adev->dm.dc_lock);
>> -    }
>>   out:
>>       if (result && !is_mst_root_connector) {
>> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c 
>> b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>> index 9e08410bfdfd..32fb9cdbd980 100644
>> --- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>> +++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
>> @@ -2070,7 +2070,7 @@ enum dc_status read_hpd_rx_irq_data(
>>       return retval;
>>   }
>> -static bool hpd_rx_irq_check_link_loss_status(
>> +bool hpd_rx_irq_check_link_loss_status(
>>       struct dc_link *link,
>>       union hpd_irq_data *hpd_irq_dpcd_data)
>>   {
>> diff --git a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h 
>> b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
>> index ffc3f2c63db8..7dd8bca542b9 100644
>> --- a/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
>> +++ b/drivers/gpu/drm/amd/display/dc/inc/dc_link_dp.h
>> @@ -68,6 +68,10 @@ bool perform_link_training_with_retries(
>>       enum signal_type signal,
>>       bool do_fallback);
>> +bool hpd_rx_irq_check_link_loss_status(
>> +    struct dc_link *link,
>> +    union hpd_irq_data *hpd_irq_dpcd_data);
>> +
>>   bool is_mst_supported(struct dc_link *link);
>>   bool detect_dp_sink_caps(struct dc_link *link);
>>
> 

-- 
Regards,
Aurabindo Pillai