[Intel-xe] [PATCH v9 4/8] drm/xe: tweak lock ordering for freq_lock

Matthew Auld matthew.auld at intel.com
Tue May 23 13:13:29 UTC 2023


On 23/05/2023 14:07, Rodrigo Vivi wrote:
> On Tue, May 23, 2023 at 10:47:23AM +0100, Matthew Auld wrote:
>> Lockdep spits out:
>>
>> Possible unsafe locking scenario:
>>     CPU0				CPU1
>>     ----				----
>>     lock(&xe->mem_access.lock);
>> 				lock(&pc->freq_lock);
>> 				lock(&xe->mem_access.lock);
>>     lock(&pc->freq_lock);
>>
>> It looks like we already mostly take care to ensure that the mem_access
>> ref is taken outside of taking freq_lock, since that is also grabbed
>> from the runtime_pm callbacks, except for a couple of spots in
>> xe_guc_pc and guc_ct_send_recv(). But it looks like all the callers are
>> already holding the mem_access.ref, so just switch over to use
>> guc_ct_send_locked().
>>
>> v2:
>>   - There are a few more cases, it seems.
> 
> should we do this to all the ct_send? and kill this function in favor
> of the locked one?

Yeah, my new thinking is that everything calling ct_send or similar 
needs to already be holding the mem_access ref, i.e. we drop the 
mem_access_get() and just keep the assert. But we still need to check 
all the callers...


> 
> or how we can be sure that we are done with all the cases?
> 
>>
>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_guc_ct.c | 4 +++-
>>   drivers/gpu/drm/xe/xe_guc_pc.c | 8 ++++++--
>>   2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index 9dc906f2651a..7a8c9d6a03f3 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -689,7 +689,9 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>>   retry:
>>   	g2h_fence_init(&g2h_fence, response_buffer);
>>   retry_same_fence:
>> -	ret = guc_ct_send(ct, action, len, 0, 0, &g2h_fence);
>> +	mutex_lock(&ct->lock);
>> +	ret = guc_ct_send_locked(ct, action, len, 0, 0, &g2h_fence);
>> +	mutex_unlock(&ct->lock);
>>   	if (unlikely(ret == -ENOMEM)) {
>>   		void *ptr;
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
>> index e799faa1c6b8..3b56e57d29af 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_pc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
>> @@ -225,7 +225,9 @@ static int pc_action_set_param(struct xe_guc_pc *pc, u8 id, u32 value)
>>   	if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_RUNNING))
>>   		return -EAGAIN;
>>   
>> -	ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0, 0);
>> +	mutex_lock(&ct->lock);
>> +	ret = xe_guc_ct_send_locked(ct, action, ARRAY_SIZE(action), 0, 0);
>> +	mutex_unlock(&ct->lock);
>>   	if (ret)
>>   		drm_err(&pc_to_xe(pc)->drm, "GuC PC set param failed: %pe",
>>   			ERR_PTR(ret));
>> @@ -242,7 +244,9 @@ static int pc_action_setup_gucrc(struct xe_guc_pc *pc, u32 mode)
>>   	};
>>   	int ret;
>>   
>> -	ret = xe_guc_ct_send(ct, action, ARRAY_SIZE(action), 0, 0);
>> +	mutex_lock(&ct->lock);
>> +	ret = xe_guc_ct_send_locked(ct, action, ARRAY_SIZE(action), 0, 0);
>> +	mutex_unlock(&ct->lock);
>>   	if (ret)
>>   		drm_err(&pc_to_xe(pc)->drm, "GuC RC enable failed: %pe",
>>   			ERR_PTR(ret));
>> -- 
>> 2.40.1
>>
