[PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)

Sundararaju, Sathishkumar sathishkumar.sundararaju at amd.com
Thu Aug 14 17:06:14 UTC 2025


Hi Lijo,

On 8/14/2025 6:18 PM, Sundararaju, Sathishkumar wrote:
>
> On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
>> [Public]
>>
>> There is no need for a nested lock. It only needs to follow the order:
>>          set instance power_state
>>          set profile (this takes a global lock, and hence the instance
>> power state will be visible to whichever thread gets the workload
>> profile lock).
>>
>> You are seeing a nested lock only because I added the code right after
>> the power state setting.
>
> Pasting your code from the file for reference:
>
> @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>
> -pg_lock:
>      mutex_lock(&vcn_inst->vcn_pg_lock);
>      vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> +    amdgpu_vcn_get_profile(adev);
>
> vcn_pg_lock isn't released here yet, right? And in case you intend only
> to order the locks, there is still an unnecessary OFF followed by ON,
> but yes, that is acceptable.
>
> Maybe you want to move that vcn_pg_lock to after amdgpu_vcn_get_profile
> to protect concurrent dpg_state access in begin_use.
>
> The concern is that this patch accesses power_state, which is protected
> by a different mutex, hoping it also reflects the right values while
> holding the power profile lock.

Thanks for taking the time to explain this to me personally; I get it now, 
and your patch should work fine. The concern above was a hypothetical 
issue I imagined and does not actually apply.
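
Just to confirm my understanding of the ordering, below is a rough sketch 
of how I read begin_use with your change. The helper name 
amdgpu_vcn_get_profile() is from your patch; the simplified body, the 
error handling, and the exact point where vcn_pg_lock is dropped are only 
my assumptions, not the actual implementation (I have also omitted the 
DPG pause handling and the submission counting that stay in the real 
begin_use):

/* Sketch only, not the actual patch: per-instance power state first,
 * then the global profile lock, with no nesting between the two.
 */
static void amdgpu_vcn_get_profile(struct amdgpu_device *adev)
{
        int r;

        mutex_lock(&adev->vcn.workload_profile_mutex);
        /* Any thread reaching this point sees the instance power states
         * that were already set before this global lock was taken.
         */
        if (!adev->vcn.workload_profile_active) {
                r = amdgpu_dpm_switch_power_profile(adev,
                                                    PP_SMC_POWER_PROFILE_VIDEO,
                                                    true);
                if (r)
                        dev_warn(adev->dev,
                                 "(%d) failed to enable video power profile\n", r);
                adev->vcn.workload_profile_active = true;
        }
        mutex_unlock(&adev->vcn.workload_profile_mutex);
}

void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
{
        struct amdgpu_vcn_inst *vcn_inst = &ring->adev->vcn.inst[ring->me];

        /* 1) set the instance power state under its own lock */
        mutex_lock(&vcn_inst->vcn_pg_lock);
        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
        mutex_unlock(&vcn_inst->vcn_pg_lock);

        /* 2) only then take the global profile lock */
        amdgpu_vcn_get_profile(ring->adev);
}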

amdgpu_vcn_get_profile() and amdgpu_vcn_put_profile() can have a void 
return type.

And please fix the typo so that the nested locking is avoided.

With the above changes, your patch is:

Reviewed-by: Sathishkumar S <sathishkumar.sundararaju at amd.com>

There is another thread which also has Alex's (v3) fix for this issue; I 
will leave it to Alex, yourself and the other review participants to 
decide on the best approach. I am inclined towards your approach, as it 
is centered around pg_state; a rough sketch of how I picture the release 
side is below.
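
For reference, this is roughly how I picture the release side working 
with your approach. Again, this is only a sketch based on my reading of 
the thread: the helper name amdgpu_vcn_put_profile() is from your patch, 
and the exact checks (per-instance cur_state examined under the global 
lock) are my assumption of what the final code will look like:

/* Sketch only: switch the video profile off only when every instance is
 * gated, checked under the same global lock that begin_use takes after
 * setting the instance power state.
 */
static void amdgpu_vcn_put_profile(struct amdgpu_device *adev)
{
        bool all_gated = true;
        unsigned int i;
        int r;

        mutex_lock(&adev->vcn.workload_profile_mutex);
        for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
                if (adev->vcn.inst[i].cur_state != AMD_PG_STATE_GATE) {
                        all_gated = false;
                        break;
                }
        }
        if (all_gated && adev->vcn.workload_profile_active) {
                r = amdgpu_dpm_switch_power_profile(adev,
                                                    PP_SMC_POWER_PROFILE_VIDEO,
                                                    false);
                if (r)
                        dev_warn(adev->dev,
                                 "(%d) failed to disable video power profile\n", r);
                adev->vcn.workload_profile_active = false;
        }
        mutex_unlock(&adev->vcn.workload_profile_mutex);
}

Since begin_use always sets the instance power state before taking 
workload_profile_mutex, an instance that has just been ungated can never 
be seen as gated here, which is what closes the race.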


Regards,

Sathish

>
> Or
>
> I have shared a patch with a global workload_profile_mutex that
> simplifies this handling, renamed pg_lock -> dpg_lock, and used that
> only for per-instance dpg_state changes.
>
> Regards,
>
> Sathish
>
>>
>> Thanks,
>> Lijo
>>
>> -----Original Message-----
>> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
>> Sent: Thursday, August 14, 2025 5:23 PM
>> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher 
>> <alexdeucher at gmail.com>
>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander 
>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition 
>> (v3)
>>
>>
>> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
>>> [Public]
>>>
>>> I see your point now. Attached should work, I guess. Is the concern 
>>> more about having to take the lock for every begin?
>> This is closer, but IMO we shouldn't have to use two locks and go into
>> nested locking; we can do with just one global lock.
>>
>> The power_state of each instance and the global workload_profile_active
>> are inter-related, so they need to be guarded together.
>>
>> Nested locking could work, but why go nested if a single lock is
>> enough? Nesting complicates it.
>>
>> Regards,
>>
>> Sathish
>>
>>> Thanks,
>>> Lijo
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>> Lazar, Lijo
>>> Sent: Thursday, August 14, 2025 2:55 PM
>>> To: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>; Alex
>>> Deucher <alexdeucher at gmail.com>
>>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>>> (v3)
>>>
>>> [Public]
>>>
>>> That is not required, I think. The power profile is set by an
>>> instance *after* setting itself to power on. Also, it is switched
>>> back after changing its power state to off. If the idle worker is run
>>> by another instance, it won't see inst0 as power gated and won't
>>> change the power profile.
>>>
>>> Thanks,
>>> Lijo
>>> -----Original Message-----
>>> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
>>> Sent: Thursday, August 14, 2025 2:41 PM
>>> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
>>> <alexdeucher at gmail.com>
>>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>>> (v3)
>>>
>>> Hi Lijo,
>>>
>>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
>>>> [Public]
>>>>
>>>> We already have a per-instance power state that can be tracked.
>>>> What about something like the attached?
>>> This also has concurrent access to the power state:
>>> vcn.inst[i].cur_state is not protected by workload_profile_mutex
>>> everywhere, so it can still change while you are holding
>>> workload_profile_mutex and checking it.
>>>
>>> Regards,
>>>
>>> Sathish
>>>
>>>> Thanks,
>>>> Lijo
>>>> -----Original Message-----
>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>>>> Sundararaju, Sathishkumar
>>>> Sent: Thursday, August 14, 2025 4:43 AM
>>>> To: Alex Deucher <alexdeucher at gmail.com>
>>>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>>>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>>>> (v3)
>>>>
>>>>
>>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
>>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
>>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
>>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
>>>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>>>>>>>>>> Hi Alex, Hi David,
>>>>>>>>>>
>>>>>>>>>> I see David's concern, but his suggestion still won't solve the
>>>>>>>>>> problem, nor does the current form. Reason:
>>>>>>>>>>
>>>>>>>>>> The emitted fence count and total submission count are fast
>>>>>>>>>> transients which frequently become 0 in between video decodes
>>>>>>>>>> (between jobs). Even with the atomics and locks there can be a
>>>>>>>>>> switch of the video power profile; in the current form of the
>>>>>>>>>> patch that window is minimized, but it can still happen under
>>>>>>>>>> stress testing. But the power state of any instance becoming zero
>>>>>>>>> Can you explain how this can happen?  I'm not seeing it.
>>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
>>>>>>>> inst0's decode completes, and the delayed idle work starts.
>>>>>>>> The inst0 idle handler can read 0 total fences and 0 total
>>>>>>>> submission count even if inst1 is actively decoding, when it
>>>>>>>> lands between jobs:
>>>>>>>>        - begin_use increments vcn.total_submission_cnt and
>>>>>>>> end_use decrements vcn.total_submission_cnt, so it can be 0.
>>>>>>>>        - if the outstanding fences are cleared and no new fence
>>>>>>>> has been emitted, between jobs, it can be 0.
>>>>>>>>        - both of the above conditions do not mean the video decode
>>>>>>>> is complete on inst1; it is still actively decoding.
>>>>>>> How can there be active decoding without an outstanding fence?  In
>>>>>>> that case, total_fences (fences from both instances) would be 
>>>>>>> non-0.
>>>>>> I mean that on inst1 the scheduled job is already complete, so
>>>>>> there are 0 outstanding fences, and a newer job is yet to be
>>>>>> scheduled and committed to the ring (inst1); this doesn't mean
>>>>>> decode has stopped on inst1, right (I am saying the timing of the
>>>>>> inst0 idle work coincides with this)?
>>>>>>
>>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
>>>>>> correct my understanding here.
>>>>> The flow looks like:
>>>>>
>>>>> begin_use(inst)
>>>>> emit_fence(inst)
>>>>> end_use(inst)
>>>>>
>>>>> ...later
>>>>> fence signals
>>>>> ...later
>>>>> work handler
>>>>>
>>>>> In begin_use we increment the global and per-instance submission
>>>>> counts.  This protects the power gating and profile until end_use.
>>>>> In end_use we decrement the submissions because we don't need to
>>>>> protect anything any more, as we have the fence that was submitted
>>>>> via the ring.  That fence won't signal until the job is complete.
>>>> Is the next begin_use always guaranteed to run before the current
>>>> job's fence signals?
>>>>
>>>> If not, then both the total submission count and the total fence
>>>> count are zero; for example, with delayed job/packet submissions
>>>> from user space, or when the next job is scheduled only after the
>>>> current job's fence signals.
>>>>
>>>> If this is never possible, then (v3) is perfect.
>>>>
>>>> Regards,
>>>>
>>>> Sathish
>>>>
>>>>> For power gating, we
>>>>> only care about the submission count and fences for that instance;
>>>>> for the profile, we care about the submission count and fences for
>>>>> all instances.
>>>>> Once the fences have signalled, the outstanding fences will be 0 and
>>>>> there won't be any active work.
>>>>>
>>>>> Alex
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sathish
>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>>> Whereas if an instance is powered off, we are sure its idle time
>>>>>>>> has passed and it is powered off; there is no possible way of an
>>>>>>>> active video decode. When all instances are off we can safely
>>>>>>>> assume there is no active decode, and the global lock protects
>>>>>>>> against a new begin_use on any instance.
>>>>>>>> The only distant concern is a global common lock w.r.t. perf, but
>>>>>>>> we already have a global workload profile mutex, so there
>>>>>>>> shouldn't be any drop in perf with just one single global lock
>>>>>>>> for all instances.
>>>>>>>>
>>>>>>>> I am just sending out a patch with this fix and will leave it to
>>>>>>>> you to decide the right method. If you think the outstanding
>>>>>>>> total fences can never be 0 during decode, then your previous
>>>>>>>> version (v3) itself is good; there is no real benefit to
>>>>>>>> splitting the handlers as such.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sathish
>>>>>>>>> If it is possible, maybe it would be easier to just split the
>>>>>>>>> profile and powergating into separate handlers. The profile one
>>>>>>>>> would be global and the powergating one would be per instance.
>>>>>>>>> See the attached patches.
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>> can be a sure-shot indication of a break in a video decode. The
>>>>>>>>>> mistake in my patch was using a per-instance mutex; I should
>>>>>>>>>> have used a common global mutex, and that would cover the
>>>>>>>>>> situation David is trying to bring out.
>>>>>>>>>>
>>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use, and using
>>>>>>>>>> flags to track the power state, could help us totally avoid
>>>>>>>>>> this situation.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Sathish
>>>>>>>>>>
>>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
>>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
>>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David 
>>>>>>>>>>>> <davidwu2 at amd.com> wrote:
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The addition of total_submission_cnt should work, in that
>>>>>>>>>>>>> it is unlikely to have a context switch right after
>>>>>>>>>>>>> begin_use().
>>>>>>>>>>>>> The suggestion of moving it inside the lock (which I prefer,
>>>>>>>>>>>>> in case someone adds more code before the lock and it is not
>>>>>>>>>>>>> reviewed thoroughly)
>>>>>>>>>>>>>          - up to you to decide.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <David.Wu3 at amd.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
>>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
>>>>>>>>>>>>>> end up switching the video profile while another instance
>>>>>>>>>>>>>> is active because we only take into account the current
>>>>>>>>>>>>>> instance's submissions.  Look at all outstanding fences for
>>>>>>>>>>>>>> the video profile.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> v2: drop early exit in begin_use()
>>>>>>>>>>>>>> v3: handle possible race between begin_use() work handler
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload
>>>>>>>>>>>>>> profile
>>>>>>>>>>>>>> handling")
>>>>>>>>>>>>>> Reviewed-by: Sathishkumar S
>>>>>>>>>>>>>> <sathishkumar.sundararaju at amd.com> (v1)
>>>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40
>>>>>>>>>>>>>> ++++++++++++-------------
>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
>>>>>>>>>>>>>>          2 files changed, 21 insertions(+), 20 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void
>>>>>>>>>>>>>> amdgpu_vcn_idle_work_handler(struct work_struct *work)
>>>>>>>>>>>>>>              struct amdgpu_vcn_inst *vcn_inst =
>>>>>>>>>>>>>>                      container_of(work, struct
>>>>>>>>>>>>>> amdgpu_vcn_inst, idle_work.work);
>>>>>>>>>>>>>>              struct amdgpu_device *adev = vcn_inst->adev;
>>>>>>>>>>>>>> -     unsigned int fences = 0, 
>>>>>>>>>>>>>> fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>>>>>>>> -     unsigned int i = vcn_inst->inst, j;
>>>>>>>>>>>>>> +     unsigned int total_fences = 0,
>>>>>>>>>>>>>> fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>>>>>>>>>>>>>> +     unsigned int i, j;
>>>>>>>>>>>>>>              int r = 0;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     if (adev->vcn.harvest_config & (1 << i))
>>>>>>>>>>>>>> +     if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
>>>>>>>>>>>>>>                      return;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
>>>>>>>>>>>>>> -             fence[i] +=
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
>>>>>>>>>>>>>> +     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>>>>>>>>>>>>>> +             struct amdgpu_vcn_inst *v =
>>>>>>>>>>>>>> + &adev->vcn.inst[i];
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +             for (j = 0; j < v->num_enc_rings; ++j)
>>>>>>>>>>>>>> +                     fence[i] +=
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&v->ring_enc[j]);
>>>>>>>>>>>>>> +             fence[i] += 
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&v->ring_dec);
>>>>>>>>>>>>>> +             total_fences += fence[i];
>>>>>>>>>>>>>> +     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              /* Only set DPG pause for VCN3 or below, VCN4
>>>>>>>>>>>>>> and above will be handled by FW */
>>>>>>>>>>>>>>              if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
>>>>>>>>>>>>>> - !adev->vcn.inst[i].using_unified_queue) {
>>>>>>>>>>>>>> + !vcn_inst->using_unified_queue) {
>>>>>>>>>>>>>>                      struct dpg_pause_state new_state;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                      if (fence[i] || @@ -436,18 +442,18 @@
>>>>>>>>>>>>>> static void amdgpu_vcn_idle_work_handler(struct work_struct
>>>>>>>>>>>>>> *work)
>>>>>>>>>>>>>>                      else
>>>>>>>>>>>>>> new_state.fw_based =
>>>>>>>>>>>>>> VCN_DPG_STATE__UNPAUSE;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
>>>>>>>>>>>>>> + vcn_inst->pause_dpg_mode(vcn_inst,
>>>>>>>>>>>>>> + &new_state);
>>>>>>>>>>>>>>              }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     fence[i] += 
>>>>>>>>>>>>>> amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
>>>>>>>>>>>>>> -     fences += fence[i];
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -     if (!fences && 
>>>>>>>>>>>>>> !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>>>>>>>> +     if (!fence[vcn_inst->inst] &&
>>>>>>>>>>>>>> !atomic_read(&vcn_inst->total_submission_cnt)) {
>>>>>>>>>>>>>> +             /* This is specific to this instance */
>>>>>>>>>>>>>> mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>> vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
>>>>>>>>>>>>>> mutex_unlock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>>> -             if (adev->vcn.workload_profile_active) {
>>>>>>>>>>>>>> +             /* This is global and depends on all VCN 
>>>>>>>>>>>>>> instances */
>>>>>>>>>>>>>> +             if (adev->vcn.workload_profile_active &&
>>>>>>>>>>>>>> !total_fences &&
>>>>>>>>>>>>>> + !atomic_read(&adev->vcn.total_submission_cnt)) {
>>>>>>>>>>>>>>                              r =
>>>>>>>>>>>>>> amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO, false);
>>>>>>>>>>>>>>                              if (r) @@ -467,16 +473,10 @@
>>>>>>>>>>>>>> void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>              int r = 0;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> atomic_inc(&vcn_inst->total_submission_cnt);
>>>>>>>>>>>>>> + atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>>>>>>>> move this addition down inside the mutex lock
>>>>>>>>>>>>>> cancel_delayed_work_sync(&vcn_inst->idle_work);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -     /* We can safely return early here because we've 
>>>>>>>>>>>>>> cancelled the
>>>>>>>>>>>>>> -      * the delayed work so there is no one else to set 
>>>>>>>>>>>>>> it to false
>>>>>>>>>>>>>> -      * and we don't care if someone else sets it to true.
>>>>>>>>>>>>>> -      */
>>>>>>>>>>>>>> -     if (adev->vcn.workload_profile_active)
>>>>>>>>>>>>>> -             goto pg_lock;
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>> move to here:
>>>>>>>>>>>>> atomic_inc(&adev->vcn.total_submission_cnt);
>>>>>>>>>>>>> I think this should work for multiple instances.
>>>>>>>>>>>> Why does this need to be protected by the mutex?
>>>>>>>>>>> hmm.. OK - no need and it is actually better before the mutex.
>>>>>>>>>>> David
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>>              if (!adev->vcn.workload_profile_active) {
>>>>>>>>>>>>>>                      r =
>>>>>>>>>>>>>> amdgpu_dpm_switch_power_profile(adev,
>>>>>>>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO, @@ -487,7 +487,6 @@ void
>>>>>>>>>>>>>> amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>>>>>>>>>>>>>>              }
>>>>>>>>>>>>>> mutex_unlock(&adev->vcn.workload_profile_mutex);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -pg_lock:
>>>>>>>>>>>>>> mutex_lock(&vcn_inst->vcn_pg_lock);
>>>>>>>>>>>>>> vcn_inst->set_pg_state(vcn_inst,
>>>>>>>>>>>>>> AMD_PG_STATE_UNGATE);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct
>>>>>>>>>>>>>> amdgpu_ring
>>>>>>>>>>>>>> *ring)
>>>>>>>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submissi
>>>>>>>>>>>>>> o
>>>>>>>>>>>>>> n
>>>>>>>>>>>>>> _cnt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission
>>>>>>>>>>>>>> _
>>>>>>>>>>>>>> c
>>>>>>>>>>>>>> nt);
>>>>>>>>>>>>>> + atomic_dec(&ring->adev->vcn.total_submission_cnt);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work, 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> VCN_IDLE_TIMEOUT); diff
>>>>>>>>>>>>>> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              uint16_t inst_mask;
>>>>>>>>>>>>>>              uint8_t num_inst_per_aid;
>>>>>>>>>>>>>> +     atomic_t total_submission_cnt;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>              /* IP reg dump */
>>>>>>>>>>>>>>              uint32_t *ip_dump;

