[PATCH] drm/amdgpu/vcn: Fix video_profile switch race condition

David Wu davidwu2 at amd.com
Tue Aug 12 21:39:24 UTC 2025


Hi Sathis,

I think the issue is that /workload_profile_active/ is not protected by 
the /workload_profile_mutex/ lock in both the idle handler and begin_use.
In the multi-instance case, all instances could be idle, so the last one 
tries to power off VCN once its fence count is 0.
Since it does not hold a global lock while counting the fences, when 
another instance gets a new job its view of /workload_profile_active/
is unknown - it could still read ON because the last instance has not 
yet set it to OFF in its idle work handler. A context switch from the 
idle handler to begin_use
can then leave VCN powered OFF. Also, to make sure no fence is 
missed, the /workload_profile_mutex/ should be held across the
entire begin_use function and the idle work handler.
I think Alex's patch just tightens this up to make the race condition 
less likely to happen.

David

On 2025-08-12 16:36, Sundararaju, Sathishkumar wrote:
>
> Hi David,
>
> On 8/12/2025 10:21 PM, David Wu wrote:
>>
>>
>> On 2025-08-12 10:56, Sathishkumar S wrote:
>>> There is a race condition which leads to dpm video power
>>> profile switch (disable and enable) during active video
>>> decode on multi-instance VCN hardware.
>>>
>>> This patch aims to fix/skip step 3 in the below sequence:
>>>
>>>   - inst_1       power_on
>>>   - inst_0(idle) power_off
>>>   - inst_0(idle) video_power_profile OFF (step 3)
>>>   - inst_1       video_power_profile ON during next begin_use
>>>
>>> Add flags to track ON/OFF vcn instances and check if all
>>> instances are off before disabling video power profile.
>>>
>>> Protect workload_profile_active also within pg_lock and ON it
>>> during first use and OFF it when last VCN instance is powered
>>> OFF. VCN workload_profile_mutex can be removed after similar
>>> clean up is done for vcn2_5.
>>>
>>> Signed-off-by: Sathishkumar S<sathishkumar.sundararaju at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 24 +++++++++---------------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  4 ++++
>>>   2 files changed, 13 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>> index 9a76e11d1c18..da372dd7b761 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>> @@ -445,16 +445,16 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>   	if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>   		mutex_lock(&vcn_inst->vcn_pg_lock);
>>>   		vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
>>> -		mutex_unlock(&vcn_inst->vcn_pg_lock);
>>> -		mutex_lock(&adev->vcn.workload_profile_mutex);
>>> -		if (adev->vcn.workload_profile_active) {
>>> +		adev->vcn.flags &= AMDGPU_VCN_FLAG_VINST_OFF(vcn_inst->inst);
>>> +		if (adev->vcn.workload_profile_active &&
>>> +		    !(adev->vcn.flags & AMDGPU_VCN_FLAG_VINST_MASK(adev->vcn.num_vcn_inst))) {
>>>   			r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO,
>>>   							    false);
>>>   			if (r)
>>>   				dev_warn(adev->dev, "(%d) failed to disable video power profile mode\n", r);
>> What if there is a context switch here? Since vcn_pg_lock is per 
>> instance, if another instance starts to
>> call amdgpu_vcn_ring_begin_use(), 
>> amdgpu_dpm_switch_power_profile() will not be called, because 
>> workload_profile_active is per device.
>> I think you still have a race condition.
>
> The situation you are describing can happen even with the current 
> form of the locks, without this patch. In both cases the processes 
> run mutually exclusively at different times,
>
> with the one holding the lock finishing first and the other 
> continuing after, with no defined ordering between them. 
> workload_profile_active is common to all VCN instances; it is set ON 
> before the first instance is powered on
>
> and OFF after all the instances are powered off.
>
> Regards,
> Sathish
>>
>> David
>>
>>>   			adev->vcn.workload_profile_active = false;
>>>   		}
>>> -		mutex_unlock(&adev->vcn.workload_profile_mutex);
>>> +		mutex_unlock(&vcn_inst->vcn_pg_lock);
>>>   	} else {
>>>   		schedule_delayed_work(&vcn_inst->idle_work, VCN_IDLE_TIMEOUT);
>>>   	}
>>> @@ -470,14 +470,8 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>   
>>>   	cancel_delayed_work_sync(&vcn_inst->idle_work);
>>>   
>>> -	/* We can safely return early here because we've cancelled the
>>> -	 * the delayed work so there is no one else to set it to false
>>> -	 * and we don't care if someone else sets it to true.
>>> -	 */
>>> -	if (adev->vcn.workload_profile_active)
>>> -		goto pg_lock;
>>> +	mutex_lock(&vcn_inst->vcn_pg_lock);
>>>   
>>> -	mutex_lock(&adev->vcn.workload_profile_mutex);
>>>   	if (!adev->vcn.workload_profile_active) {
>>>   		r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO,
>>>   						    true);
>>> @@ -485,11 +479,11 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>   			dev_warn(adev->dev, "(%d) failed to switch to video power profile mode\n", r);
>>>   		adev->vcn.workload_profile_active = true;
>>>   	}
>>> -	mutex_unlock(&adev->vcn.workload_profile_mutex);
>>>   
>>> -pg_lock:
>>> -	mutex_lock(&vcn_inst->vcn_pg_lock);
>>> -	vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>>> +	if (!(adev->vcn.flags & AMDGPU_VCN_FLAG_VINST_ON(vcn_inst->inst))) {
>>> +		vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>>> +		adev->vcn.flags |= AMDGPU_VCN_FLAG_VINST_ON(vcn_inst->inst);
>>> +	}
>>>   
>>>   	/* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
>>>   	if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>> index b3fb1d0e43fc..a876a182ff88 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>> @@ -366,6 +366,10 @@ struct amdgpu_vcn {
>>>   	struct mutex            workload_profile_mutex;
>>>   	u32 reg_count;
>>>   	const struct amdgpu_hwip_reg_entry *reg_list;
>>> +#define AMDGPU_VCN_FLAG_VINST_MASK(n)  (BIT(n+1) - 1)
>>> +#define AMDGPU_VCN_FLAG_VINST_ON(n)    (BIT(n))
>>> +#define AMDGPU_VCN_FLAG_VINST_OFF(n)   (~BIT(n))
>>> +	u32 flags;
>>>   };
>>>   
>>>   struct amdgpu_fw_shared_rb_ptrs_struct {