[PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)

David Wu davidwu2 at amd.com
Mon Aug 18 19:22:58 UTC 2025


Hi Lijo and Alex,

I prefer Lijo's method as it should not cause an unexpected power profile 
change for the active session.
Lijo, could you make some adjustments to your patch, such as removing the 
extra blank line and fixing the function declarations, e.g. using "void" 
instead of "int" for the new functions (should they be static?)

Thanks,

David

On 2025-08-14 14:42, David Wu wrote:
> hmm.. my concern was for the same instance, but I got it now. Your 
> patch is good.
> Thanks,
> David
> On 2025-08-14 14:06, Lazar, Lijo wrote:
>>
>> [AMD Official Use Only - AMD Internal Distribution Only]
>>
>>
>> I didn't fully understand the question.
>>
>> For the same instance, the begin_use thread will set the power state only 
>> after cancelling any idle worker for that instance. If the idle worker is 
>> already in progress, then it needs to complete, as that's a 
>> cancel_sync (it's the existing logic).
>>
>> Basically, by the time the begin_use thread sets the PG state, no idle 
>> worker for the same vcn instance would be active. If it's about a context 
>> switch to another vcn instance's begin_use thread, I think that won't be 
>> a problem.
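>>
>> A minimal sketch of that per-instance ordering (simplified from the
>> existing amdgpu_vcn_ring_begin_use(); the wrapper name below is only
>> for illustration, and the profile/submission handling is left out):
>>
>>     static void vcn_begin_use_pg_sketch(struct amdgpu_vcn_inst *vcn_inst)
>>     {
>>             /* waits for (or cancels) any pending idle worker of this instance */
>>             cancel_delayed_work_sync(&vcn_inst->idle_work);
>>
>>             mutex_lock(&vcn_inst->vcn_pg_lock);
>>             /* no idle worker for this instance can be running at this point */
>>             vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>>             mutex_unlock(&vcn_inst->vcn_pg_lock);
>>     }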
>>
>> Thanks,
>> Lijo
>> ------------------------------------------------------------------------
>> *From:* Wu, David <David.Wu3 at amd.com>
>> *Sent:* Thursday, August 14, 2025 11:14:26 PM
>> *To:* Lazar, Lijo <Lijo.Lazar at amd.com>; Sundararaju, Sathishkumar 
>> <Sathishkumar.Sundararaju at amd.com>; Alex Deucher <alexdeucher at gmail.com>
>> *Cc:* Wu, David <David.Wu3 at amd.com>; Deucher, Alexander 
>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org 
>> <amd-gfx at lists.freedesktop.org>
>> *Subject:* Re: [PATCH] drm/amdgpu/vcn: fix video profile race 
>> condition (v3)
>> amdgpu_vcn_idle_work_handler():
>>      if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
>> ----------- could a context switch to
>> amdgpu_vcn_ring_begin_use() happen here?
>>   If it could, then AMD_PG_STATE_GATE would be set by mistake.
>>
>> David
>>
>> On 2025-08-14 08:54, Lazar, Lijo wrote:
>> > [Public]
>> >
>> > The profile request can be moved outside the pg_lock in begin_use, 
>> > as in the attached patch. It needs the set power state -> set profile order.
>> >
>> > This is the premise -
>> >
>> > Let's say there are two threads, a begin_use thread and an idle_work 
>> > thread. begin_use and idle_work will need the workprofile mutex to 
>> > request a profile.
>> >
>> > Case 1) Idle thread gets the lock first.
>> >          a) The idle thread sees the vinst power state as PG_UNGATE; no 
>> > harm done. It exits without requesting a power profile change. The 
>> > begin_use thread gets the lock next, sees the profile as active and 
>> > continues.
>> >          b) The idle thread sees the vinst power state as PG_GATE; it will 
>> > set workload_profile_active to false and exit. Now when the begin_use thread 
>> > gets the mutex next, it's guaranteed to see workload_profile_active as 
>> > false, hence it will request the profile.
>> >
>> > Case 2) begin_use thread gets the lock first.
>> >          a) The workload profile is active, hence it doesn't do 
>> > anything and exits. The change made by the begin_use thread to the vinst 
>> > power state (state = on) will now be visible to the idle thread which 
>> > gets the lock next. It will do nothing and exit.
>> >          b) The workload profile is inactive, hence it requests a 
>> > profile change. Again, the change made by the begin_use thread to the vinst 
>> > power state will now be visible to the idle thread which gets the lock 
>> > next. It will do nothing and exit.
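>> >
>> > Roughly, the ordering relied on above looks like this sketch (field and
>> > function names are from the attached patch where they exist;
>> > vcn_any_instance_powered_on() is only an illustrative placeholder and
>> > error handling is dropped):
>> >
>> >     /* begin_use side: set the instance power state, then the profile */
>> >     vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>> >
>> >     mutex_lock(&adev->vcn.workload_profile_mutex);
>> >     if (!adev->vcn.workload_profile_active) {
>> >             amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, true);
>> >             adev->vcn.workload_profile_active = true;
>> >     }
>> >     mutex_unlock(&adev->vcn.workload_profile_mutex);
>> >
>> >     /* idle side: the UNGATE above is guaranteed visible under the same mutex */
>> >     mutex_lock(&adev->vcn.workload_profile_mutex);
>> >     if (adev->vcn.workload_profile_active && !vcn_any_instance_powered_on(adev)) {
>> >             amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
>> >             adev->vcn.workload_profile_active = false;
>> >     }
>> >     mutex_unlock(&adev->vcn.workload_profile_mutex);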
>> >
>> > Thanks,
>> > Lijo
>> >
>> > -----Original Message-----
>> > From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
>> > Sent: Thursday, August 14, 2025 6:18 PM
>> > To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher 
>> <alexdeucher at gmail.com>
>> > Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander 
>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>> > Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race 
>> condition (v3)
>> >
>> >
>> > On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
>> >> [Public]
>> >>
>> >> There is no need for a nested lock. It only needs to follow the order:
>> >>           set instance power_state
>> >>           set profile (this takes a global lock, and hence the instance
>> >> power state will be visible to whichever thread gets the work
>> >> profile lock).
>> >>
>> >> You are seeing a nested lock just because I added the code right
>> >> after the power state setting.
>> > Pasting your code from the file for ref :
>> >
>> > @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring
>> > *ring)
>> >
>> > -pg_lock:
>> >
>> >        mutex_lock(&vcn_inst->vcn_pg_lock);
>> >        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
>> >
>> > +   amdgpu_vcn_get_profile(adev);
>> >
>> > vcn_pg_lock isn't released here yet, right? And in case you intend 
>> > only to order the locks, there is still an unnecessary OFF 
>> > followed by ON, but yes, that is acceptable.
>> >
>> > Maybe you want to move that vcn_pg_lock after 
>> > amdgpu_vcn_get_profile to protect concurrent dpg_state access in 
>> > begin_use.
>> >
>> > The concern is, this patch accesses power_state that is protected by 
>> > some other mutex, hoping it reflects the right values also when 
>> > holding powerprofile_lock.
>> >
>> > Or
>> >
>> > I have shared a patch with a global workload_profile_mutex that 
>> > simplifies this handling, renamed pg_lock -> dpg_lock, and used 
>> > that only for dpg_state changes per instance.
>> >
>> > Regards,
>> >
>> > Sathish
>> >
>> >> Thanks,
>> >> Lijo
>> >>
>> >> -----Original Message-----
>> >> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
>> >> Sent: Thursday, August 14, 2025 5:23 PM
>> >> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
>> >> <alexdeucher at gmail.com>
>> >> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>> >> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>> >> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>> >> (v3)
>> >>
>> >>
>> >> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
>> >>> [Public]
>> >>>
>> >>> I see your point now. Attached should work, I guess. Is the 
>> concern more about having to take the lock for every begin?
>> >> This is closer, but the thing is, IMO we shouldn't have to use two 
>> >> locks and go into nested locking; we can do with just one global lock.
>> >>
>> >> The power_state of each instance and the global workload_profile_active are
>> >> inter-related; they need to be guarded together.
>> >>
>> >> Nested could work, but why nested if a single lock is enough? 
>> >> Nesting complicates it.
>> >>
>> >> Regards,
>> >>
>> >> Sathish
>> >>
>> >>> Thanks,
>> >>> Lijo
>> >>>
>> >>> -----Original Message-----
>> >>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> >>> Lazar, Lijo
>> >>> Sent: Thursday, August 14, 2025 2:55 PM
>> >>> To: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>;
>> >>> Alex Deucher <alexdeucher at gmail.com>
>> >>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>> >>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>> >>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>> >>> (v3)
>> >>>
>> >>> [Public]
>> >>>
>> >>> That is not required, I think. The power profile is set by an 
>> >>> instance *after* setting itself to power on. Also, it's switched back 
>> >>> after changing its power state to off. If the idle worker is run by 
>> >>> another instance, it won't see inst0 as power gated and 
>> >>> won't change the power profile.
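>> >>>
>> >>> As an illustration only (this loop is a sketch of that idea, not code
>> >>> from the attached patch; cur_state and harvest_config are existing
>> >>> fields):
>> >>>
>> >>>     static bool vcn_any_instance_powered_on(struct amdgpu_device *adev)
>> >>>     {
>> >>>             unsigned int i;
>> >>>
>> >>>             for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>> >>>                     if (adev->vcn.harvest_config & (1 << i))
>> >>>                             continue;
>> >>>                     /* any instance not yet gated keeps the video profile */
>> >>>                     if (adev->vcn.inst[i].cur_state != AMD_PG_STATE_GATE)
>> >>>                             return true;
>> >>>             }
>> >>>             return false;
>> >>>     }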
>> >>>
>> >>> Thanks,
>> >>> Lijo
>> >>> -----Original Message-----
>> >>> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
>> >>> Sent: Thursday, August 14, 2025 2:41 PM
>> >>> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
>> >>> <alexdeucher at gmail.com>
>> >>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>> >>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>> >>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
>> >>> (v3)
>> >>>
>> >>> Hi Lijo,
>> >>>
>> >>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
>> >>>> [Public]
>> >>>>
>> >>>> We already have a per instance power state that can be tracked. 
>> What about something like attached?
>> >>> This also has concurrent access to the power state;
>> >>> vcn.inst[i].cur_state is not protected by workload_profile_mutex
>> >>> everywhere, so it can still change while you are holding 
>> >>> workload_profile_mutex and checking it.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Sathish
>> >>>
>> >>>> Thanks,
>> >>>> Lijo
>> >>>> -----Original Message-----
>> >>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> >>>> Sundararaju, Sathishkumar
>> >>>> Sent: Thursday, August 14, 2025 4:43 AM
>> >>>> To: Alex Deucher <alexdeucher at gmail.com>
>> >>>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
>> >>>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
>> >>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race
>> >>>> condition
>> >>>> (v3)
>> >>>>
>> >>>>
>> >>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
>> >>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
>> >>>>> <sathishkumar.sundararaju at amd.com> wrote:
>> >>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
>> >>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
>> >>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>> >>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
>> >>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
>> >>>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
>> >>>>>>>>>> Hi Alex, Hi David,
>> >>>>>>>>>>
>> >>>>>>>>>> I see David's concern, but his suggestion still won't solve the
>> >>>>>>>>>> problem, nor does the current form. Reason:
>> >>>>>>>>>>
>> >>>>>>>>>> The emitted fence count and total submission count are fast
>> >>>>>>>>>> transients which frequently become 0 in between video decodes
>> >>>>>>>>>> (between jobs). Even with the atomics and locks there can be a
>> >>>>>>>>>> switch of the video power profile; in the current form of the patch
>> >>>>>>>>>> that window is minimized, but it can still happen under stress
>> >>>>>>>>>> testing. But the power state of any instance becoming zero
>> >>>>>>>>> Can you explain how this can happen?  I'm not seeing it.
>> >>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
>> >>>>>>>> inst0's decode completes, and the delayed idle work starts.
>> >>>>>>>> The inst0 idle handler can read 0 total fences and 0 total
>> >>>>>>>> submission count even if inst1 is actively decoding, in between
>> >>>>>>>> its jobs:
>> >>>>>>>>         - as begin_use increments vcn.total_submission_cnt and
>> >>>>>>>> end_use decrements vcn.total_submission_cnt, it can be 0.
>> >>>>>>>>         - if outstanding fences are cleared and there is no newly
>> >>>>>>>> emitted fence between jobs, it can be 0.
>> >>>>>>>>         - both of the above conditions do not mean video decode
>> >>>>>>>> is complete on inst1; it is actively decoding.
>> >>>>>>> How can there be active decoding without an outstanding fence?
>> >>>>>>> In that case, total_fences (fences from both instances) would 
>> be non-0.
>> >>>>>> I mean on inst1 the scheduled job is already complete, so there are 0
>> >>>>>> outstanding fences, and a newer job is yet to be scheduled
>> >>>>>> and committed to the ring (inst1). This doesn't mean decode has
>> >>>>>> stopped on inst1, right? (I am saying the timing of inst0's idle
>> >>>>>> work coincides with this.)
>> >>>>>>
>> >>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
>> >>>>>> correct my understanding here.
>> >>>>> The flow looks like:
>> >>>>>
>> >>>>> begin_use(inst)
>> >>>>> emit_fence(inst)
>> >>>>> end_use(inst)
>> >>>>>
>> >>>>> ...later
>> >>>>> fence signals
>> >>>>> ...later
>> >>>>> work handler
>> >>>>>
>> >>>>> In begin_use we increment the global and per-instance submission counts.
>> >>>>> This protects the power gating and profile until end_use.  In end_use
>> >>>>> we decrement the submissions because we don't need to protect
>> >>>>> anything anymore, as we have the fence that was submitted via the
>> >>>>> ring.  That fence won't signal until the job is complete.
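>> >>>>>
>> >>>>> Condensed sketch of the end_use side of that flow (based on the patch
>> >>>>> below; the enc submission counter and the begin_use half are left out):
>> >>>>>
>> >>>>>     void vcn_end_use_sketch(struct amdgpu_ring *ring)
>> >>>>>     {
>> >>>>>             struct amdgpu_device *adev = ring->adev;
>> >>>>>
>> >>>>>             /* the fence emitted for this job now keeps the idle worker
>> >>>>>              * from gating the instance or dropping the profile until it
>> >>>>>              * signals, so the submission counts can be released here */
>> >>>>>             atomic_dec(&adev->vcn.inst[ring->me].total_submission_cnt);
>> >>>>>             atomic_dec(&adev->vcn.total_submission_cnt);
>> >>>>>             schedule_delayed_work(&adev->vcn.inst[ring->me].idle_work,
>> >>>>>                                   VCN_IDLE_TIMEOUT);
>> >>>>>     }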
>> >>>> Is the next begin_use always guaranteed to run before the current 
>> >>>> job's fence signals?
>> >>>>
>> >>>> If not, then both the total submission and total fence counts are zero;
>> >>>> for example, with delayed job/packet submissions from user space, or
>> >>>> when the next job is scheduled after the current job's fence signals.
>> >>>>
>> >>>> If this is never possible, then (v3) is perfect.
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> Sathish
>> >>>>
>> >>>>> For power gating, we
>> >>>>> only care about the submission count and fences for that instance;
>> >>>>> for the profile, we care about the submission count and fences for all 
>> >>>>> instances.
>> >>>>> Once the fences have signalled, the outstanding fences will be 0
>> >>>>> and there won't be any active work.
>> >>>>>
>> >>>>> Alex
>> >>>>>
>> >>>>>> Regards,
>> >>>>>>
>> >>>>>> Sathish
>> >>>>>>
>> >>>>>>> Alex
>> >>>>>>>
>> >>>>>>>> Whereas if the instances are powered off, we are sure the idle time
>> >>>>>>>> has passed and they are powered off; there is no possible way of
>> >>>>>>>> active video decode. When all instances are off, we can safely
>> >>>>>>>> assume no active decode, and the global lock protects against a new 
>> >>>>>>>> begin_use on any instance.
>> >>>>>>>> The only distant concern is global common locks w.r.t. perf,
>> >>>>>>>> but we already have a global workprofile mutex, so there
>> >>>>>>>> shouldn't be any drop in perf with just one single global lock
>> >>>>>>>> for all instances.
>> >>>>>>>>
>> >>>>>>>> Just sending out a patch with this fix; I will leave it to you to
>> >>>>>>>> decide the right method. If you think the outstanding total fences
>> >>>>>>>> can never be 0 during decode, then your previous version (v3)
>> >>>>>>>> itself is good; there is no real benefit to splitting the 
>> >>>>>>>> handlers as such.
>> >>>>>>>>
>> >>>>>>>> Regards,
>> >>>>>>>> Sathish
>> >>>>>>>>> If it is possible, maybe it would be easier to just split the
>> >>>>>>>>> profile and powergating into separate handlers.  The profile
>> >>>>>>>>> one would be global and the powergating one would be per 
>> instance.
>> >>>>>>>>> See the attached patches.
>> >>>>>>>>>
>> >>>>>>>>> Alex
>> >>>>>>>>>
>> >>>>>>>>>> can be a sure-shot indication of a break in a video decode. The
>> >>>>>>>>>> mistake in my patch was using a per instance mutex; I should
>> >>>>>>>>>> have used a common global mutex, which then covers the 
>> >>>>>>>>>> situation David is trying to bring out.
>> >>>>>>>>>>
>> >>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use and using
>> >>>>>>>>>> flags to track power state could help us totally avoid 
>> this situation.
>> >>>>>>>>>>
>> >>>>>>>>>> Regards,
>> >>>>>>>>>>
>> >>>>>>>>>> Sathish
>> >>>>>>>>>>
>> >>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
>> >>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
>> >>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David 
>> <davidwu2 at amd.com> wrote:
>> >>>>>>>>>>>>> Hi Alex,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The addition of total_submission_cnt should work, in that
>> >>>>>>>>>>>>> it is unlikely to have a context switch right after 
>> >>>>>>>>>>>>> begin_use().
>> >>>>>>>>>>>>> The suggestion of moving it inside the lock (which I prefer,
>> >>>>>>>>>>>>> in case someone adds more before the lock and it is not
>> >>>>>>>>>>>>> reviewed thoroughly)
>> >>>>>>>>>>>>>           - up to you to decide.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <David.Wu3 at amd.com>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> David
>> >>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
>> >>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
>> >>>>>>>>>>>>>> end up switching the video profile while another instance
>> >>>>>>>>>>>>>> is active because we only take into account the current
>> >>>>>>>>>>>>>> instance's submissions.  Look at all outstanding fences
>> >>>>>>>>>>>>>> for the video profile.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> v2: drop early exit in begin_use()
>> >>>>>>>>>>>>>> v3: handle possible race between begin_use() and work handler
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload
>> >>>>>>>>>>>>>> profile
>> >>>>>>>>>>>>>> handling")
>> >>>>>>>>>>>>>> Reviewed-by: Sathishkumar S
>> >>>>>>>>>>>>>> <sathishkumar.sundararaju at amd.com> (v1)
>> >>>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>> >>>>>>>>>>>>>> ---
>> >>>>>>>>>>>>>>          drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40
>> >>>>>>>>>>>>>> ++++++++++++-------------
>> >>>>>>>>>>>>>>          drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
>> >>>>>>>>>>>>>>           2 files changed, 21 insertions(+), 20
>> >>>>>>>>>>>>>> deletions(-)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> >>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> >>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
>> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
>> >>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void
>> >>>>>>>>>>>>>> amdgpu_vcn_idle_work_handler(struct work_struct *work)
>> >>>>>>>>>>>>>>              struct amdgpu_vcn_inst *vcn_inst =
>> >>>>>>>>>>>>>>                      container_of(work, struct
>> >>>>>>>>>>>>>> amdgpu_vcn_inst, idle_work.work);
>> >>>>>>>>>>>>>>              struct amdgpu_device *adev = vcn_inst->adev;
>> >>>>>>>>>>>>>> -     unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>> >>>>>>>>>>>>>> -     unsigned int i = vcn_inst->inst, j;
>> >>>>>>>>>>>>>> +     unsigned int total_fences = 0,
>> >>>>>>>>>>>>>> fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
>> >>>>>>>>>>>>>> +     unsigned int i, j;
>> >>>>>>>>>>>>>>              int r = 0;
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -     if (adev->vcn.harvest_config & (1 << i))
>> >>>>>>>>>>>>>> +     if (adev->vcn.harvest_config & (1 <<
>> >>>>>>>>>>>>>> + vcn_inst->inst))
>> >>>>>>>>>>>>>>                      return;
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -     for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
>> >>>>>>>>>>>>>> -            fence[i] +=
>> >>>>>>>>>>>>>> amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
>> >>>>>>>>>>>>>> +     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
>> >>>>>>>>>>>>>> +            struct amdgpu_vcn_inst *v =
>> >>>>>>>>>>>>>> + &adev->vcn.inst[i];
>> >>>>>>>>>>>>>> +
>> >>>>>>>>>>>>>> +            for (j = 0; j < v->num_enc_rings; ++j)
>> >>>>>>>>>>>>>> +                     fence[i] +=
>> >>>>>>>>>>>>>> amdgpu_fence_count_emitted(&v->ring_enc[j]);
>> >>>>>>>>>>>>>> +            fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
>> >>>>>>>>>>>>>> +            total_fences += fence[i];
>> >>>>>>>>>>>>>> +     }
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>              /* Only set DPG pause for VCN3 or below, VCN4
>> >>>>>>>>>>>>>> and above will be handled by FW */
>> >>>>>>>>>>>>>>              if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
>> >>>>>>>>>>>>>> -        !adev->vcn.inst[i].using_unified_queue) {
>> >>>>>>>>>>>>>> +        !vcn_inst->using_unified_queue) {
>> >>>>>>>>>>>>>>                      struct dpg_pause_state new_state;
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>                      if (fence[i] || @@ -436,18 +442,18 @@
>> >>>>>>>>>>>>>> static void amdgpu_vcn_idle_work_handler(struct
>> >>>>>>>>>>>>>> work_struct
>> >>>>>>>>>>>>>> *work)
>> >>>>>>>>>>>>>>                      else
>> >>>>>>>>>>>>>>                              new_state.fw_based =
>> >>>>>>>>>>>>>> VCN_DPG_STATE__UNPAUSE;
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -            adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
>> >>>>>>>>>>>>>> +            vcn_inst->pause_dpg_mode(vcn_inst,
>> >>>>>>>>>>>>>> + &new_state);
>> >>>>>>>>>>>>>>               }
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -     fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
>> >>>>>>>>>>>>>> -     fences += fence[i];
>> >>>>>>>>>>>>>> -
>> >>>>>>>>>>>>>> -     if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
>> >>>>>>>>>>>>>> +     if (!fence[vcn_inst->inst] &&
>> >>>>>>>>>>>>>> !atomic_read(&vcn_inst->total_submission_cnt)) {
>> >>>>>>>>>>>>>> +            /* This is specific to this instance */
>> >>>>>>>>>>>>>>                      mutex_lock(&vcn_inst->vcn_pg_lock);
>> >>>>>>>>>>>>>>                      vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
>> >>>>>>>>>>>>>>                      mutex_unlock(&vcn_inst->vcn_pg_lock);
>> >>>>>>>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>> >>>>>>>>>>>>>> -            if (adev->vcn.workload_profile_active) {
>> >>>>>>>>>>>>>> +            /* This is global and depends on all VCN instances */
>> >>>>>>>>>>>>>> +            if (adev->vcn.workload_profile_active &&
>> >>>>>>>>>>>>>> !total_fences &&
>> >>>>>>>>>>>>>> + !atomic_read(&adev->vcn.total_submission_cnt)) {
>> >>>>>>>>>>>>>>                              r =
>> >>>>>>>>>>>>>> amdgpu_dpm_switch_power_profile(adev,
>> >>>>>>>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO, false);
>> >>>>>>>>>>>>>>                              if (r) @@ -467,16 +473,10 @@
>> >>>>>>>>>>>>>> void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>> >>>>>>>>>>>>>>              int r = 0;
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>              atomic_inc(&vcn_inst->total_submission_cnt);
>> >>>>>>>>>>>>>> +    atomic_inc(&adev->vcn.total_submission_cnt);
>> >>>>>>>>>>>>> move this addition down inside the mutex lock
>> >>>>>>>>>>>>>> cancel_delayed_work_sync(&vcn_inst->idle_work);
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -     /* We can safely return early here because we've cancelled the
>> >>>>>>>>>>>>>> -      * the delayed work so there is no one else to set it to false
>> >>>>>>>>>>>>>> -      * and we don't care if someone else sets it to true.
>> >>>>>>>>>>>>>> -      */
>> >>>>>>>>>>>>>> -     if (adev->vcn.workload_profile_active)
>> >>>>>>>>>>>>>> -            goto pg_lock;
>> >>>>>>>>>>>>>> -
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> mutex_lock(&adev->vcn.workload_profile_mutex);
>> >>>>>>>>>>>>> move to here:
>> >>>>>>>>>>>>> atomic_inc(&adev->vcn.total_submission_cnt);
>> >>>>>>>>>>>>> I think this should work for multiple instances.
>> >>>>>>>>>>>> Why does this need to be protected by the mutex?
>> >>>>>>>>>>> hmm.. OK - no need and it is actually better before the 
>> mutex.
>> >>>>>>>>>>> David
>> >>>>>>>>>>>> Alex
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> David
>> >>>>>>>>>>>>>>              if (!adev->vcn.workload_profile_active) {
>> >>>>>>>>>>>>>>                      r =
>> >>>>>>>>>>>>>> amdgpu_dpm_switch_power_profile(adev,
>> >>>>>>>>>>>>>> PP_SMC_POWER_PROFILE_VIDEO, @@ -487,7 +487,6 @@ void
>> >>>>>>>>>>>>>> amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
>> >>>>>>>>>>>>>>               }
>> >>>>>>>>>>>>>> mutex_unlock(&adev->vcn.workload_profile_mutex);
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> -pg_lock:
>> >>>>>>>>>>>>>>              mutex_lock(&vcn_inst->vcn_pg_lock);
>> >>>>>>>>>>>>>>              vcn_inst->set_pg_state(vcn_inst,
>> >>>>>>>>>>>>>> AMD_PG_STATE_UNGATE);
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct
>> >>>>>>>>>>>>>> amdgpu_ring
>> >>>>>>>>>>>>>> *ring)
>> >>>>>>>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
>> >>>>>>>>>>>>>> + atomic_dec(&ring->adev->vcn.total_submission_cnt);
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
>> >>>>>>>>>>>>>>                                   VCN_IDLE_TIMEOUT);
>> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>> >>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>> >>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
>> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
>> >>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>              uint16_t inst_mask;
>> >>>>>>>>>>>>>>              uint8_t num_inst_per_aid;
>> >>>>>>>>>>>>>> +    atomic_t                total_submission_cnt;
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>              /* IP reg dump */
>> >>>>>>>>>>>>>>              uint32_t                *ip_dump;