[PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
Sundararaju, Sathishkumar
sathishkumar.sundararaju at amd.com
Wed Aug 13 21:16:00 UTC 2025
On 8/14/2025 2:33 AM, Alex Deucher wrote:
> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
> <sathishkumar.sundararaju at amd.com> wrote:
>>
>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
>>> <sathishkumar.sundararaju at amd.com> wrote:
>>>> Hi Alex, Hi David,
>>>>
>>>> I see David's concern, but his suggestion still won't solve the
>>>> problem, and neither does the current form. Reason:
>>>>
>>>> The emitted fence count and total submission count are fast transients
>>>> that frequently drop to 0 between video decodes (between jobs). Even
>>>> with the atomics and locks there can be a switch of the video power
>>>> profile; the current form of the patch minimizes that window, but it
>>>> can still happen under stress testing. But the power state of any instance becoming zero
>>> Can you explain how this can happen? I'm not seeing it.
>> Consider this situation: inst0 and inst1 are actively decoding, inst0's
>> decode completes, and the delayed idle work starts.
>> inst0's idle handler can read 0 total fences and 0 total submission
>> count even while inst1 is actively decoding;
>> that happens between jobs:
>> - begin_use increments vcn.total_submission_cnt and end_use
>> decrements vcn.total_submission_cnt, so it can be 0.
>> - if outstanding fences are cleared and no new fence has been emitted
>> between jobs, the fence count can be 0.
>> - neither of the above conditions means video decode is complete on
>> inst1; it is actively decoding.
> How can there be active decoding without an outstanding fence? In
> that case, total_fences (fences from both instances) would be non-0.
I mean the job scheduled on inst1 is already complete, so there are 0
outstanding fences, and the next job is yet to be scheduled and
committed to the ring (inst1). This doesn't mean decode has stopped on
inst1, right? (I am saying the timing of inst0's idle work could coincide with this.)
Or am I wrong in assuming this? Can't this ever happen? Please correct
my understanding here.
Regards,
Sathish
>
> Alex
>
>> Whereas if the instances are powered off, we are sure the idle time has
>> passed and the instance is powered off, with no possible way of
>> active video decode. When all instances are off we can safely assume no
>> active decode, and a global lock protects
>> against a new begin_use on any instance. The only distant concern with
>> global common locks is perf, but we
>> already have a global workload-profile mutex, so there shouldn't be
>> any drop in perf with just one single
>> global lock for all instances.
>>
>> Just sending out a patch with this fix; I will leave it to you to decide
>> the right method. If you think the outstanding total fences
>> can never be 0 during decode, then your previous version (v3) itself is
>> good, and there is no real benefit in splitting the handlers.
>>
>> Regards,
>> Sathish
>>> If it is possible, maybe it would be easier to just split the profile
>>> and powergating into separate handlers. The profile one would be
>>> global and the powergating one would be per instance. See the
>>> attached patches.
>>>
>>> Alex
>>>
>>>> can be a sure-shot indication of a break in video decode. The mistake in
>>>> my patch was using a per-instance mutex; I should have used a common
>>>> global mutex, which would cover the situation David is trying to bring out.
>>>>
>>>> Using one global vcn.pg_lock for idle and begin_use, and using flags to
>>>> track power state, could help us avoid this situation entirely.
>>>>
>>>> Regards,
>>>>
>>>> Sathish
>>>>
>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David <davidwu2 at amd.com> wrote:
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> The addition of total_submission_cnt should work, in that
>>>>>>> it is unlikely to have a context switch right after begin_use().
>>>>>>> As for the suggestion of moving it inside the lock (which I prefer,
>>>>>>> in case someone
>>>>>>> adds more code before the lock that isn't reviewed thoroughly):
>>>>>>> up to you to decide.
>>>>>>>
>>>>>>> Reviewed-by: David (Ming Qiang) Wu <David.Wu3 at amd.com>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
>>>>>>>> If there are multiple instances of the VCN running,
>>>>>>>> we may end up switching the video profile while another
>>>>>>>> instance is active because we only take into account
>>>>>>>> the current instance's submissions. Look at all
>>>>>>>> outstanding fences for the video profile.
>>>>>>>>
>>>>>>>> v2: drop early exit in begin_use()
>>>>>>>> v3: handle possible race between begin_use() and the work handler
>>>>>>>>
>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")
>>>>>>>> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju at amd.com> (v1)
>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>> ---
>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40 ++++++++++++-------------
>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
>>>>>>>>  2 files changed, 21 insertions(+), 20 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>>>>>>  	struct amdgpu_vcn_inst *vcn_inst =
>>>>>>>>  		container_of(work, struct amdgpu_vcn_inst, idle_work.work);
>>>>>>>>  	struct amdgpu_device *adev = vcn_inst->adev;
>>>>>>>> -	unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>> -	unsigned int i = vcn_inst->inst, j;
>>>>>>>> +	unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>> +	unsigned int i, j;
>>>>>>>>  	int r = 0;
>>>>>>>>
>>>>>>>> -	if (adev->vcn.harvest_config & (1 << i))
>>>>>>>> +	if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
>>>>>>>>  		return;
>>>>>>>>
>>>>>>>> -	for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
>>>>>>>> -		fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
>>>>>>>> +	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>> +		struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
>>>>>>>> +
>>>>>>>> +		for (j = 0; j < v->num_enc_rings; ++j)
>>>>>>>> +			fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);
>>>>>>>> +		fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
>>>>>>>> +		total_fences += fence[i];
>>>>>>>> +	}
>>>>>>>>
>>>>>>>>  	/* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
>>>>>>>>  	if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
>>>>>>>> -	    !adev->vcn.inst[i].using_unified_queue) {
>>>>>>>> +	    !vcn_inst->using_unified_queue) {
>>>>>>>>  		struct dpg_pause_state new_state;
>>>>>>>>
>>>>>>>>  		if (fence[i] ||
>>>>>>>> @@ -436,18 +442,18 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>>>>>>  		else
>>>>>>>>  			new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
>>>>>>>>
>>>>>>>> -		adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
>>>>>>>> +		vcn_inst->pause_dpg_mode(vcn_inst, &new_state);
>>>>>>>>  	}
>>>>>>>>
>>>>>>>> -	fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
>>>>>>>> -	fences += fence[i];
>>>>>>>> -
>>>>>>>> -	if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>> +	if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>> +		/* This is specific to this instance */
>>>>>>>>  		mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>  		vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
>>>>>>>>  		mutex_unlock(&vcn_inst->vcn_pg_lock);
>>>>>>>>  		mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>> -		if (adev->vcn.workload_profile_active) {
>>>>>>>> +		/* This is global and depends on all VCN instances */
>>>>>>>> +		if (adev->vcn.workload_profile_active && !total_fences &&
>>>>>>>> +		    !atomic_read(&adev->vcn.total_submission_cnt)) {
>>>>>>>>  			r = amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>  							    PP_SMC_POWER_PROFILE_VIDEO,
>>>>>>>>  							    false);
>>>>>>>>  			if (r)
>>>>>>>> @@ -467,16 +473,10 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>  	int r = 0;
>>>>>>>>
>>>>>>>>  	atomic_inc(&vcn_inst->total_submission_cnt);
>>>>>>>> +	atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>> move this addition down inside the mutex lock
>>>>>>>> cancel_delayed_work_sync(&vcn_inst->idle_work);
>>>>>>>>
>>>>>>>> - /* We can safely return early here because we've cancelled the
>>>>>>>> - * the delayed work so there is no one else to set it to false
>>>>>>>> - * and we don't care if someone else sets it to true.
>>>>>>>> - */
>>>>>>>> - if (adev->vcn.workload_profile_active)
>>>>>>>> - goto pg_lock;
>>>>>>>> -
>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>> move to here:
>>>>>>> atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>> I think this should work for multiple instances.
>>>>>> Why does this need to be protected by the mutex?
Hmm, OK - no need, and it is actually better before the mutex.
>>>>> David
>>>>>> Alex
>>>>>>
>>>>>>> David
>>>>>>>> if (!adev->vcn.workload_profile_active) {
>>>>>>>> r = amdgpu_dpm_switch_power_profile(adev,
>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO,
>>>>>>>> @@ -487,7 +487,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>> }
>>>>>>>> mutex_unlock(&adev->vcn.workload_profile_mutex);
>>>>>>>>
>>>>>>>> -pg_lock:
>>>>>>>> mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>> vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>>>>>>>>
>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
>>>>>>>>
>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
>>>>>>>> + atomic_dec(&ring->adev->vcn.total_submission_cnt);
>>>>>>>>
>>>>>>>> schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
>>>>>>>> VCN_IDLE_TIMEOUT);
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
>>>>>>>>
>>>>>>>> uint16_t inst_mask;
>>>>>>>> uint8_t num_inst_per_aid;
>>>>>>>> + atomic_t total_submission_cnt;
>>>>>>>>
>>>>>>>> /* IP reg dump */
>>>>>>>> uint32_t *ip_dump;
More information about the amd-gfx mailing list