[PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)

David Wu davidwu2 at amd.com
Thu Aug 14 18:42:20 UTC 2025


Hmm.. it was my concern for the same instance, but I got it now. Your
patch is good.
Thanks,
David
On 2025-08-14 14:06, Lazar, Lijo wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>
> I didn't fully understand the question.
>
> For the same instance, begin_thread will set the power state only 
> after cancelling any idle worker for that instance. If an idle worker is 
> already in progress, it has to complete first, since that is a 
> cancel_sync (the existing logic).
>
> Basically, by the time begin_thread sets the PG state, no idle worker 
> for the same vcn instance would be active. If it's about a context 
> switch to another vcn instance's begin_thread, I think that won't be a 
> problem.
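
A minimal sketch of that per-instance ordering, pieced together from the
begin_use() flow in the v3 patch quoted further down (illustrative only;
amdgpu_vcn_get_profile() is the helper from the proposed patch):

    void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
    {
            struct amdgpu_vcn_inst *vcn_inst = &ring->adev->vcn.inst[ring->me];

            atomic_inc(&vcn_inst->total_submission_cnt);
            /* waits for any in-flight idle worker of this instance */
            cancel_delayed_work_sync(&vcn_inst->idle_work);

            mutex_lock(&vcn_inst->vcn_pg_lock);
            vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
            mutex_unlock(&vcn_inst->vcn_pg_lock);

            /* profile request follows, under the global workload_profile_mutex */
            amdgpu_vcn_get_profile(ring->adev);
    }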
>
> Thanks,
> Lijo
> ------------------------------------------------------------------------
> *From:* Wu, David <David.Wu3 at amd.com>
> *Sent:* Thursday, August 14, 2025 11:14:26 PM
> *To:* Lazar, Lijo <Lijo.Lazar at amd.com>; Sundararaju, Sathishkumar 
> <Sathishkumar.Sundararaju at amd.com>; Alex Deucher <alexdeucher at gmail.com>
> *Cc:* Wu, David <David.Wu3 at amd.com>; Deucher, Alexander 
> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org 
> <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH] drm/amdgpu/vcn: fix video profile race 
> condition (v3)
> amdgpu_vcn_idle_work_handler():
>      if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> ----------- could a context switch to amdgpu_vcn_ring_begin_use() happen
> here?
>   If it could, then AMD_PG_STATE_GATE would be set by mistake.
>
> David
>
> On 2025-08-14 08:54, Lazar, Lijo wrote:
> > [Public]
> >
> > The profile request can be moved outside the pg_lock in begin_use, as 
> > in the attached patch. It needs the set power state -> set profile order.
> >
> > This is the premise -
> >
> > Let's say there are two threads, a begin_use thread and an idle_work 
> > thread. Both begin_use and idle_work need the workprofile mutex to 
> > request a profile.
> >
> > Case 1) Idle thread gets the lock first.
> >          a) Idle thread sees the vinst power state as PG_UNGATE; no harm 
> > done. It exits without requesting a power profile change. The begin_use 
> > thread gets the lock next, sees the profile as active and continues.
> >          b) Idle thread sees the vinst power state as PG_GATE, so it sets 
> > workload_profile_active to false and exits. When the begin_use thread 
> > gets the mutex next, it is guaranteed to see workload_profile_active as 
> > false, hence it will request the profile.
> >
> > Case 2) begin_use thread gets the lock first.
> >          a) The workload profile is active, so it doesn't do anything 
> > and exits. The change made by the begin_use thread to the vinst power 
> > state (state = on) will now be visible to the idle thread that gets the 
> > lock next. It will do nothing and exit.
> >          b) The workload profile is inactive, so it requests a profile 
> > change. Again, the change made by the begin_use thread to the vinst 
> > power state will now be visible to the idle thread that gets the lock 
> > next. It will do nothing and exit.
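
Roughly, the idle-side counterpart being described would look like this;
the helper name and exact fields are illustrative, since the attached
patch is not reproduced here:

    static void amdgpu_vcn_put_profile(struct amdgpu_device *adev)
    {
            bool all_gated = true;
            unsigned int i;

            mutex_lock(&adev->vcn.workload_profile_mutex);
            /* instance power states were set before this lock was taken,
             * so they are visible to whoever holds it now */
            for (i = 0; i < adev->vcn.num_vcn_inst; i++) {
                    if (adev->vcn.inst[i].cur_state != AMD_PG_STATE_GATE) {
                            all_gated = false;
                            break;
                    }
            }
            if (all_gated && adev->vcn.workload_profile_active) {
                    amdgpu_dpm_switch_power_profile(adev,
                                                    PP_SMC_POWER_PROFILE_VIDEO, false);
                    adev->vcn.workload_profile_active = false;
            }
            mutex_unlock(&adev->vcn.workload_profile_mutex);
    }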
> >
> > Thanks,
> > Lijo
> >
> > -----Original Message-----
> > From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
> > Sent: Thursday, August 14, 2025 6:18 PM
> > To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher 
> <alexdeucher at gmail.com>
> > Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander 
> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race 
> condition (v3)
> >
> >
> > On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
> >> [Public]
> >>
> >> There is no need for a nested lock. It only needs to follow the order:
> >>           set instance power_state
> >>           set profile (this takes a global lock, and hence the instance 
> >> power state will be visible to whichever thread gets the work profile 
> >> lock).
> >>
> >> You are seeing a nested lock just because I added the code right after 
> >> setting the power state.
> > Pasting your code from the file for reference:
> >
> > @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >
> > -pg_lock:
> >
> >        mutex_lock(&vcn_inst->vcn_pg_lock);
> >        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> >
> > +   amdgpu_vcn_get_profile(adev);
> >
> > vcn_pg_lock isn't released here yet, right? And in case you intend only 
> > to order the locks, there is still an unnecessary OFF followed by ON, 
> > but yes, that is acceptable.
> >
> > Maybe you want to move that vcn_pg_lock to after 
> > amdgpu_vcn_get_profile to protect concurrent dpg_state access in 
> > begin_use.
> >
> > The concern is that this patch accesses a power_state protected by 
> > some other mutex, hoping it also reflects the right values while 
> > holding the power profile lock.
> >
> > Or
> >
> > I have shared a patch with a global workload_profile_mutex that 
> > simplifies this handling, renames pg_lock -> dpg_lock, and uses that 
> > only for per-instance dpg_state changes.
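
The single-global-lock shape being argued for would look roughly like the
following in begin_use; this only illustrates the idea and is not the
patch that was actually shared:

    mutex_lock(&adev->vcn.workload_profile_mutex);
    /* per-instance power state and the global profile flag guarded together */
    vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
    if (!adev->vcn.workload_profile_active) {
            r = amdgpu_dpm_switch_power_profile(adev,
                                                PP_SMC_POWER_PROFILE_VIDEO, true);
            if (!r)
                    adev->vcn.workload_profile_active = true;
    }
    mutex_unlock(&adev->vcn.workload_profile_mutex);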
> >
> > Regards,
> >
> > Sathish
> >
> >> Thanks,
> >> Lijo
> >>
> >> -----Original Message-----
> >> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
> >> Sent: Thursday, August 14, 2025 5:23 PM
> >> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
> >> <alexdeucher at gmail.com>
> >> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
> >> (v3)
> >>
> >>
> >> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
> >>> [Public]
> >>>
> >>> I see your point now. Attached should work, I guess. Is the 
> concern more about having to take the lock for every begin?
> >> This is closer, but IMO we shouldn't have to use two locks and go into 
> >> nested locking; we can do with just one global lock.
> >>
> >> The power_state of each instance and the global workload_profile_active 
> >> are inter-related; they need to be guarded together.
> >>
> >> Nested locking could work, but why nest if a single lock is enough? 
> >> Nesting complicates it.
> >>
> >> Regards,
> >>
> >> Sathish
> >>
> >>> Thanks,
> >>> Lijo
> >>>
> >>> -----Original Message-----
> >>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> >>> Lazar, Lijo
> >>> Sent: Thursday, August 14, 2025 2:55 PM
> >>> To: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>;
> >>> Alex Deucher <alexdeucher at gmail.com>
> >>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition
> >>> (v3)
> >>>
> >>> [Public]
> >>>
> >>> That is not required, I think. The power profile is set by an 
> >>> instance *after* setting itself to power on. Also, it's switched back 
> >>> after changing its power state to off. If the idle worker is run by 
> >>> another instance, it won't see inst0 as power gated and won't change 
> >>> the power profile.
> >>>
> >>> Thanks,
> >>> Lijo
> >>> -----Original Message-----
> >>> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
> >>> Sent: Thursday, August 14, 2025 2:41 PM
> >>> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
> >>> <alexdeucher at gmail.com>
> >>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
> >>> (v3)
> >>>
> >>> Hi Lijo,
> >>>
> >>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
> >>>> [Public]
> >>>>
> >>>> We already have a per-instance power state that can be tracked. 
> >>>> What about something like the attached?
> >>> This also has concurrent access of the power state:
> >>> vcn.inst[i].cur_state is not protected by workload_profile_mutex 
> >>> everywhere, so it can still change while you are holding 
> >>> workload_profile_mutex and checking it.
> >>>
> >>> Regards,
> >>>
> >>> Sathish
> >>>
> >>>> Thanks,
> >>>> Lijo
> >>>> -----Original Message-----
> >>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> >>>> Sundararaju, Sathishkumar
> >>>> Sent: Thursday, August 14, 2025 4:43 AM
> >>>> To: Alex Deucher <alexdeucher at gmail.com>
> >>>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race
> >>>> condition
> >>>> (v3)
> >>>>
> >>>>
> >>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
> >>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
> >>>>> <sathishkumar.sundararaju at amd.com> wrote:
> >>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
> >>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
> >>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
> >>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
> >>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
> >>>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
> >>>>>>>>>> Hi Alex, Hi David,
> >>>>>>>>>>
> >>>>>>>>>> I see David's concern, but his suggestion won't solve the
> >>>>>>>>>> problem either, nor does the current form. Reason:
> >>>>>>>>>>
> >>>>>>>>>> The emitted fence count and total submission count are fast
> >>>>>>>>>> transients which frequently become 0 in between video decodes
> >>>>>>>>>> (between jobs). Even with the atomics and locks there can be a
> >>>>>>>>>> switch of the video power profile; in the current form of the
> >>>>>>>>>> patch that window is minimized, but it can still happen under
> >>>>>>>>>> stress testing. But the power state of any instance becoming zero
> >>>>>>>>> Can you explain how this can happen?  I'm not seeing it.
> >>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
> >>>>>>>> inst0's decode completes, delayed idle work starts.
> >>>>>>>> The inst0 idle handler can read 0 total fences and 0 total
> >>>>>>>> submission count even if inst1 is actively decoding, when it
> >>>>>>>> lands between jobs:
> >>>>>>>>         - begin_use increments vcn.total_submission_cnt and
> >>>>>>>> end_use decrements vcn.total_submission_cnt, so it can be 0.
> >>>>>>>>         - if outstanding fences have cleared and no new fence
> >>>>>>>> has been emitted between jobs, it can be 0.
> >>>>>>>>         - neither of the above conditions means video decode
> >>>>>>>> is complete on inst1; it is actively decoding.
> >>>>>>> How can there be active decoding without an outstanding fence?
> >>>>>>> In that case, total_fences (fences from both instances) would 
> be non-0.
> >>>>>> I mean on inst1 the scheduled job is already complete, so there are
> >>>>>> 0 outstanding fences, and the next job is yet to be scheduled and
> >>>>>> committed to the ring (inst1). This doesn't mean decode has stopped
> >>>>>> on inst1, right? (I am saying the timing of inst0's idle work could
> >>>>>> coincide with this.)
> >>>>>>
> >>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
> >>>>>> correct my understanding here.
> >>>>> The flow looks like:
> >>>>>
> >>>>> begin_use(inst)
> >>>>> emit_fence(inst)
> >>>>> end_use(inst)
> >>>>>
> >>>>> ...later
> >>>>> fence signals
> >>>>> ...later
> >>>>> work handler
> >>>>>
> >>>>> In begin_use we increment the global and per-instance submission
> >>>>> counts. This protects the power gating and profile until end_use. In
> >>>>> end_use we decrement the submissions because we don't need to protect
> >>>>> anything any more, as we have the fence that was submitted via the
> >>>>> ring. That fence won't signal until the job is complete.
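
As a timeline (paraphrasing the flow above):

    begin_use(inst)     -> submission counts > 0   (protects PG state and profile)
    emit_fence(inst)    -> outstanding fence > 0
    end_use(inst)       -> submission counts drop  (the fence now protects)
    ...job executes...
    fence signals       -> outstanding fences reach 0
    ...VCN_IDLE_TIMEOUT later...
    idle work handler   -> only now can it see 0 fences and 0 submissions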
> >>>> Is the next begin_use always guaranteed to run before the current
> >>>> job's fence signals?
> >>>>
> >>>> If not, then both the total submission count and the total fence
> >>>> count are zero; for example with delayed job/packet submissions from
> >>>> user space, or when the next job is scheduled only after the current
> >>>> job's fence signals.
> >>>>
> >>>> If this is never possible, then (v3) is perfect.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Sathish
> >>>>
> >>>>> For power gating, we
> >>>>> only care about the submission count and fences for that instance;
> >>>>> for the profile, we care about the submission count and fences for
> >>>>> all instances.
> >>>>> Once the fences have signalled, the outstanding fences will be 0
> >>>>> and there won't be any active work.
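
Condensed from the v3 diff quoted below, the two decisions are:

    /* per-instance power gating */
    if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt))
            vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);

    /* global video profile, considering all instances */
    if (adev->vcn.workload_profile_active && !total_fences &&
        !atomic_read(&adev->vcn.total_submission_cnt))
            amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);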
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Sathish
> >>>>>>
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>> Whereas if the instances are powered off, we are sure the idle
> >>>>>>>> time has passed and they are powered off, so there is no possible
> >>>>>>>> active video decode; when all instances are off we can safely
> >>>>>>>> assume no active decode, and the global lock protects against a
> >>>>>>>> new begin_use on any instance.
> >>>>>>>> The only distant concern is global common locks w.r.t. perf, but
> >>>>>>>> we already have a global workprofile mutex, so there shouldn't be
> >>>>>>>> any drop in perf with just one single global lock for all
> >>>>>>>> instances.
> >>>>>>>>
> >>>>>>>> Just sending out a patch with this fix; I will leave it to you to
> >>>>>>>> decide the right method. If you think the outstanding total fences
> >>>>>>>> can never be 0 during decode, then your previous version (v3)
> >>>>>>>> itself is good, and there is no real benefit to splitting the
> >>>>>>>> handlers as such.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Sathish
> >>>>>>>>> If it is possible, maybe it would be easier to just split the
> >>>>>>>>> profile and powergating into separate handlers.  The profile
> >>>>>>>>> one would be global and the powergating one would be per 
> instance.
> >>>>>>>>> See the attached patches.
> >>>>>>>>>
> >>>>>>>>> Alex
> >>>>>>>>>
> >>>>>>>>>> can be a sure-shot indication of a break in video decode. The
> >>>>>>>>>> mistake in my patch was using a per-instance mutex; I should
> >>>>>>>>>> have used a common global mutex, as that covers the situation
> >>>>>>>>>> David is trying to bring out.
> >>>>>>>>>>
> >>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use, and using
> >>>>>>>>>> flags to track the power state, could help us totally avoid
> >>>>>>>>>> this situation.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>>
> >>>>>>>>>> Sathish
> >>>>>>>>>>
> >>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
> >>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
> >>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David 
> <davidwu2 at amd.com> wrote:
> >>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The addition of total_submission_cnt should work, in that
> >>>>>>>>>>>>> it is unlikely to have a context switch right after the
> >>>>>>>>>>>>> begin_use(). The suggestion of moving it inside the lock
> >>>>>>>>>>>>> (which I prefer, in case someone later adds more code before
> >>>>>>>>>>>>> the lock that isn't reviewed thoroughly)
> >>>>>>>>>>>>>           - up to you to decide.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <David.Wu3 at amd.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> David
> >>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
> >>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
> >>>>>>>>>>>>>> end up switching the video profile while another instance
> >>>>>>>>>>>>>> is active because we only take into account the current
> >>>>>>>>>>>>>> instance's submissions.  Look at all outstanding fences
> >>>>>>>>>>>>>> for the video profile.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v2: drop early exit in begin_use()
> >>>>>>>>>>>>>> v3: handle possible race between begin_use() and work handler
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload
> >>>>>>>>>>>>>> profile
> >>>>>>>>>>>>>> handling")
> >>>>>>>>>>>>>> Reviewed-by: Sathishkumar S
> >>>>>>>>>>>>>> <sathishkumar.sundararaju at amd.com> (v1)
> >>>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>          drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40 ++++++++++++-------------
> >>>>>>>>>>>>>>          drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
> >>>>>>>>>>>>>>          2 files changed, 21 insertions(+), 20 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>>>>>>>>>>>>              struct amdgpu_vcn_inst *vcn_inst =
> >>>>>>>>>>>>>>                      container_of(work, struct amdgpu_vcn_inst, idle_work.work);
> >>>>>>>>>>>>>>              struct amdgpu_device *adev = vcn_inst->adev;
> >>>>>>>>>>>>>> -     unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>>>>>>>>>>>>> -     unsigned int i = vcn_inst->inst, j;
> >>>>>>>>>>>>>> +     unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>>>>>>>>>>>>> +     unsigned int i, j;
> >>>>>>>>>>>>>>              int r = 0;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -     if (adev->vcn.harvest_config & (1 << i))
> >>>>>>>>>>>>>> +     if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
> >>>>>>>>>>>>>>                      return;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -     for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
> >>>>>>>>>>>>>> -             fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
> >>>>>>>>>>>>>> +     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> >>>>>>>>>>>>>> +             struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +             for (j = 0; j < v->num_enc_rings; ++j)
> >>>>>>>>>>>>>> +                     fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);
> >>>>>>>>>>>>>> +             fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
> >>>>>>>>>>>>>> +             total_fences += fence[i];
> >>>>>>>>>>>>>> +     }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
> >>>>>>>>>>>>>>              if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
> >>>>>>>>>>>>>> -         !adev->vcn.inst[i].using_unified_queue) {
> >>>>>>>>>>>>>> +         !vcn_inst->using_unified_queue) {
> >>>>>>>>>>>>>>                      struct dpg_pause_state new_state;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                      if (fence[i] ||
> >>>>>>>>>>>>>> @@ -436,18 +442,18 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>>>>>>>>>>>>                      else
> >>>>>>>>>>>>>>                              new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -             adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
> >>>>>>>>>>>>>> +             vcn_inst->pause_dpg_mode(vcn_inst, &new_state);
> >>>>>>>>>>>>>>               }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -     fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
> >>>>>>>>>>>>>> -     fences += fence[i];
> >>>>>>>>>>>>>> -
> >>>>>>>>>>>>>> -     if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>>>>>>>>>>>>> +     if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>>>>>>>>>>>>> +             /* This is specific to this instance */
> >>>>>>>>>>>>>>                      mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>                      vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
> >>>>>>>>>>>>>>                      mutex_unlock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>                      mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>>> -             if (adev->vcn.workload_profile_active) {
> >>>>>>>>>>>>>> +             /* This is global and depends on all VCN instances */
> >>>>>>>>>>>>>> +             if (adev->vcn.workload_profile_active && !total_fences &&
> >>>>>>>>>>>>>> +                 !atomic_read(&adev->vcn.total_submission_cnt)) {
> >>>>>>>>>>>>>>                              r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
> >>>>>>>>>>>>>>                              if (r)
> >>>>>>>>>>>>>> @@ -467,16 +473,10 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>              int r = 0;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              atomic_inc(&vcn_inst->total_submission_cnt);
> >>>>>>>>>>>>>> +     atomic_inc(&adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>> move this addition down inside the mutex lock
> >>>>>>>>>>>>>> cancel_delayed_work_sync(&vcn_inst->idle_work);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -     /* We can safely return early here because we've cancelled the
> >>>>>>>>>>>>>> -      * the delayed work so there is no one else to set it to false
> >>>>>>>>>>>>>> -      * and we don't care if someone else sets it to true.
> >>>>>>>>>>>>>> -      */
> >>>>>>>>>>>>>> -     if (adev->vcn.workload_profile_active)
> >>>>>>>>>>>>>> -             goto pg_lock;
> >>>>>>>>>>>>>> -
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>> move to here:
> >>>>>>>>>>>>> atomic_inc(&adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>> I think this should work for multiple instances.
> >>>>>>>>>>>> Why does this need to be protected by the mutex?
> >>>>>>>>>>> hmm.. OK - no need and it is actually better before the mutex.
> >>>>>>>>>>> David
> >>>>>>>>>>>> Alex
> >>>>>>>>>>>>
> >>>>>>>>>>>>> David
> >>>>>>>>>>>>>>              if (!adev->vcn.workload_profile_active) {
> >>>>>>>>>>>>>>                      r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO,
> >>>>>>>>>>>>>> @@ -487,7 +487,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>               }
> >>>>>>>>>>>>>>              mutex_unlock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -pg_lock:
> >>>>>>>>>>>>>>              mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>              vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>              atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
> >>>>>>>>>>>>>> +     atomic_dec(&ring->adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
> >>>>>>>>>>>>>>                                    VCN_IDLE_TIMEOUT);
> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              uint16_t inst_mask;
> >>>>>>>>>>>>>>              uint8_t num_inst_per_aid;
> >>>>>>>>>>>>>> +     atomic_t                total_submission_cnt;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>              /* IP reg dump */
> >>>>>>>>>>>>>>              uint32_t                *ip_dump;