[PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
David Wu
davidwu2 at amd.com
Thu Aug 14 18:42:20 UTC 2025
Hmm.. it was my concern for the same instance, but I got it now. Your
patch is good.
Thanks,
David
On 2025-08-14 14:06, Lazar, Lijo wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>
> I didn't fully understand the question.
>
> For the same instance, the begin_use thread will set the power state only
> after cancelling any idle worker for that instance. If the idle worker is
> already in progress, it has to complete first because the cancel is a
> cancel_sync (this is the existing logic).
>
> Basically, by the time the begin_use thread sets the PG state, no idle
> worker for the same VCN instance can still be active. If the question is
> about a context switch to another VCN instance's begin_use thread, I think
> that won't be a problem.
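
A minimal sketch of the per-instance ordering being described (abbreviated from
the driver code quoted later in this thread, not the full begin_use path):

    /* amdgpu_vcn_ring_begin_use(), per VCN instance */
    cancel_delayed_work_sync(&vcn_inst->idle_work);  /* waits out a running idle worker */
    mutex_lock(&vcn_inst->vcn_pg_lock);
    vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
    mutex_unlock(&vcn_inst->vcn_pg_lock);

Because the cancel is synchronous, any in-flight idle worker for this instance
finishes before begin_use ungates it again.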
>
> Thanks,
> Lijo
> ------------------------------------------------------------------------
> *From:* Wu, David <David.Wu3 at amd.com>
> *Sent:* Thursday, August 14, 2025 11:14:26 PM
> *To:* Lazar, Lijo <Lijo.Lazar at amd.com>; Sundararaju, Sathishkumar
> <Sathishkumar.Sundararaju at amd.com>; Alex Deucher <alexdeucher at gmail.com>
> *Cc:* Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> <amd-gfx at lists.freedesktop.org>
> *Subject:* Re: [PATCH] drm/amdgpu/vcn: fix video profile race
> condition (v3)
> amdgpu_vcn_idle_work_handler():
> if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> ----------- could a context switch to amdgpu_vcn_ring_begin_use() happen
> here? If it could, then AMD_PG_STATE_GATE would be set by mistake.
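
The window being asked about, as it looks in the v3 idle worker (excerpt with
comments added; the reply above relies on begin_use's cancel_delayed_work_sync):

    if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
            /* If a begin_use() on this instance races with this worker, its
             * cancel_delayed_work_sync(&vcn_inst->idle_work) waits for the
             * worker to return, so its AMD_PG_STATE_UNGATE is always applied
             * after the GATE below, never before it.
             */
            mutex_lock(&vcn_inst->vcn_pg_lock);
            vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
            mutex_unlock(&vcn_inst->vcn_pg_lock);
    }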
>
> David
>
> On 2025-08-14 08:54, Lazar, Lijo wrote:
> > [Public]
> >
> > The profile request can be moved outside the pg_lock in begin_use, as in
> > the attached patch. It needs the set power state -> set profile order.
> >
> > This is the premise -
> >
> > Let's say there are two threads, a begin_use thread and an idle_work
> > thread. Both begin_use and idle_work need the workprofile mutex to
> > request a profile.
> >
> > Case 1) Idle thread gets the lock first.
> >     a) Idle thread sees the vinst power state as PG_UNGATE; no harm done.
> >        It exits without requesting a power profile change. The begin_use
> >        thread gets the lock next, sees the profile as active and continues.
> >     b) Idle thread sees the vinst power state as PG_GATE, so it sets
> >        workload_profile_active to false and exits. When the begin_use
> >        thread gets the mutex next, it is guaranteed to see
> >        workload_profile_active as false, hence it will request the profile.
> >
> > Case 2) begin_use thread gets the lock first.
> >     a) The workload profile is active, hence it does nothing and exits.
> >        The change made by the begin_use thread to the vinst power state
> >        (state = on) will now be visible to the idle thread which gets the
> >        lock next. It will do nothing and exit.
> >     b) The workload profile is inactive, hence it requests a profile
> >        change. Again, the change made by the begin_use thread to the vinst
> >        power state will now be visible to the idle thread which gets the
> >        lock next. It will do nothing and exit.
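
A rough sketch of the ordering these cases rely on (abbreviated; the profile
request/release is folded inline here rather than using the helpers from the
attached patch, and no_vcn_instance_is_ungated() is a placeholder, not a real
helper):

    /* begin_use side: power the instance on, then take the global lock */
    vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
    mutex_lock(&adev->vcn.workload_profile_mutex);
    if (!adev->vcn.workload_profile_active) {
            amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, true);
            adev->vcn.workload_profile_active = true;
    }
    mutex_unlock(&adev->vcn.workload_profile_mutex);

    /* idle-work side: power the instance off, then take the same global lock */
    vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
    mutex_lock(&adev->vcn.workload_profile_mutex);
    if (adev->vcn.workload_profile_active && no_vcn_instance_is_ungated(adev)) {
            amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
            adev->vcn.workload_profile_active = false;
    }
    mutex_unlock(&adev->vcn.workload_profile_mutex);

Since the power state is written before workload_profile_mutex is taken,
whichever thread gets the mutex second is guaranteed to see the other thread's
power-state update, which is the visibility the cases above depend on.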
> >
> > Thanks,
> > Lijo
> >
> > -----Original Message-----
> > From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
> > Sent: Thursday, August 14, 2025 6:18 PM
> > To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
> <alexdeucher at gmail.com>
> > Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race
> condition (v3)
> >
> >
> > On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
> >> [Public]
> >>
> >> There is no need for a nested lock. It only needs to follow the order
> >>     set instance power_state
> >>     set profile (this takes a global lock, and hence the instance power
> >>     state will be visible to whichever thread gets the work profile lock).
> >>
> >> You are seeing a nested lock just because I added the code right after the
> >> power state setting.
> > Pasting your code from the file for ref :
> >
> > @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >
> > -pg_lock:
> >
> >          mutex_lock(&vcn_inst->vcn_pg_lock);
> >          vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> >
> > +        amdgpu_vcn_get_profile(adev);
> >
> > vcn_pg_lock isn't released here yet, right? And in case you intend only to
> > order the locks, there is still an unnecessary OFF followed by ON, but yes,
> > that is acceptable.
> >
> > Maybe you want to move that vcn_pg_lock after amdgpu_vcn_get_profile to
> > protect concurrent dpg_state access in begin_use.
> >
> > The concern is that this patch accesses power_state, which is protected by
> > some other mutex, and hopes it reflects the right values while holding
> > powerprofile_lock.
> >
> > Or
> >
> > I have shared a patch with a global workload_profile_mutex that simplifies
> > this handling, renamed pg_lock -> dpg_lock, and used that only for
> > dpg_state changes per instance.
> >
> > Regards,
> >
> > Sathish
> >
> >> Thanks,
> >> Lijo
> >>
> >> -----Original Message-----
> >> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
> >> Sent: Thursday, August 14, 2025 5:23 PM
> >> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
> >> <alexdeucher at gmail.com>
> >> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
> >> (v3)
> >>
> >>
> >> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
> >>> [Public]
> >>>
> >>> I see your point now. The attached should work, I guess. Is the concern
> >>> more about having to take the lock for every begin_use?
> >> This is closer, but the thing is, IMO we shouldn't have to use two locks
> >> and go into nested locking; we can do with just one global lock.
> >>
> >> The power_state of each instance and the global workload_profile_active
> >> are inter-related; they need to be guarded together.
> >>
> >> Nested could work, but why go nested if a single lock is enough? Nesting
> >> complicates it.
> >>
> >> Regards,
> >>
> >> Sathish
> >>
> >>> Thanks,
> >>> Lijo
> >>>
> >>> -----Original Message-----
> >>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> >>> Lazar, Lijo
> >>> Sent: Thursday, August 14, 2025 2:55 PM
> >>> To: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>;
> >>> Alex Deucher <alexdeucher at gmail.com>
> >>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition
> >>> (v3)
> >>>
> >>> [Public]
> >>>
> >>> That is not required, I think. The power profile is set by an instance
> >>> *after* setting itself to power on. Also, it's switched back only after
> >>> the instance changes its power state to off. If the idle worker is run by
> >>> another instance, it won't see inst0 as power gated and won't change the
> >>> power profile.
> >>>
> >>> Thanks,
> >>> Lijo
> >>> -----Original Message-----
> >>> From: Sundararaju, Sathishkumar <Sathishkumar.Sundararaju at amd.com>
> >>> Sent: Thursday, August 14, 2025 2:41 PM
> >>> To: Lazar, Lijo <Lijo.Lazar at amd.com>; Alex Deucher
> >>> <alexdeucher at gmail.com>
> >>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition
> >>> (v3)
> >>>
> >>> Hi Lijo,
> >>>
> >>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
> >>>> [Public]
> >>>>
> >>>> We already have a per-instance power state that can be tracked. What
> >>>> about something like the attached?
> >>> This also has concurrent access of the power state:
> >>> vcn.inst[i].cur_state is not protected by workload_profile_mutex
> >>> everywhere, so it can still change while you are holding
> >>> workload_profile_mutex and checking it.
> >>>
> >>> Regards,
> >>>
> >>> Sathish
> >>>
> >>>> Thanks,
> >>>> Lijo
> >>>> -----Original Message-----
> >>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> >>>> Sundararaju, Sathishkumar
> >>>> Sent: Thursday, August 14, 2025 4:43 AM
> >>>> To: Alex Deucher <alexdeucher at gmail.com>
> >>>> Cc: Wu, David <David.Wu3 at amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
> >>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race
> >>>> condition
> >>>> (v3)
> >>>>
> >>>>
> >>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
> >>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
> >>>>> <sathishkumar.sundararaju at amd.com> wrote:
> >>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
> >>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
> >>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
> >>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
> >>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
> >>>>>>>>> <sathishkumar.sundararaju at amd.com> wrote:
> >>>>>>>>>> Hi Alex, Hi David,
> >>>>>>>>>>
> >>>>>>>>>> I see David's concern, but his suggestion still won't solve the
> >>>>>>>>>> problem, nor does the current form. Reason:
> >>>>>>>>>>
> >>>>>>>>>> The emitted fence count and total submission count are fast
> >>>>>>>>>> transients which frequently become 0 in between video decodes
> >>>>>>>>>> (between jobs); even with the atomics and locks there can be a
> >>>>>>>>>> switch of the video power profile. In the current form of the
> >>>>>>>>>> patch that window is minimized, but it can still happen under
> >>>>>>>>>> stress testing. But the power state of any instance becoming zero
> >>>>>>>>> Can you explain how this can happen? I'm not seeing it.
> >>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
> >>>>>>>> inst0's decode completes, and the delayed idle work starts.
> >>>>>>>> The inst0 idle handler can read 0 total fences and 0 total
> >>>>>>>> submission count even if inst1 is actively decoding, if that falls
> >>>>>>>> between jobs:
> >>>>>>>>      - begin_use increments vcn.total_submission_cnt and end_use
> >>>>>>>>        decrements vcn.total_submission_cnt, so it can be 0.
> >>>>>>>>      - if outstanding fences are cleared and there is no newly
> >>>>>>>>        emitted fence between jobs, it can be 0.
> >>>>>>>>      - both of the above conditions do not mean video decode is
> >>>>>>>>        complete on inst1; it is actively decoding.
> >>>>>>> How can there be active decoding without an outstanding fence?
> >>>>>>> In that case, total_fences (fences from both instances) would
> be non-0.
> >>>>>> I mean on inst1 the scheduled job is already complete, so there are 0
> >>>>>> outstanding fences, and the newer job is yet to be scheduled and
> >>>>>> committed to the ring (inst1). This doesn't mean decode has stopped on
> >>>>>> inst1, right? (I am saying the timing of the inst0 idle work could
> >>>>>> coincide with this.)
> >>>>>>
> >>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
> >>>>>> correct my understanding here.
> >>>>> The flow looks like:
> >>>>>
> >>>>> begin_use(inst)
> >>>>> emit_fence(inst)
> >>>>> end_use(inst)
> >>>>>
> >>>>> ...later
> >>>>> fence signals
> >>>>> ...later
> >>>>> work handler
> >>>>>
> >>>>> In begin_use we increment the global and per instance submission.
> >>>>> This protects the power gating and profile until end_use. In end
> >>>>> use we decrement the submissions because we don't need to protect
> >>>>> anything any more as we have the fence that was submitted via the
> >>>>> ring. That fence won't signal until the job is complete.
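
Roughly, the window each mechanism covers in that flow (abbreviated, comments
added; begin_use/end_use and amdgpu_fence_count_emitted() are the functions
already discussed in this thread):

    amdgpu_vcn_ring_begin_use(ring);  /* submission counts go non-zero here...     */
    /* ... the job's commands and its fence are emitted on the ring ... */
    amdgpu_vcn_ring_end_use(ring);    /* ...and drop here; from now on the emitted
                                       * fence keeps amdgpu_fence_count_emitted()
                                       * non-zero until the job completes, so the
                                       * idle worker still sees the instance busy. */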
> >>>> Is the next begin_use always guaranteed to run before the current
> >>>> job's fence signals?
> >>>>
> >>>> If not, then both the total submission count and the total fence count
> >>>> are zero - for example, with delayed job/packet submissions from user
> >>>> space, or when the next job is scheduled only after the current job's
> >>>> fence signals.
> >>>>
> >>>> If this is never possible, then (v3) is perfect.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Sathish
> >>>>
> >>>>> For power gating, we only care about the submission count and fences
> >>>>> for that instance; for the profile, we care about the submission counts
> >>>>> and fences of all instances. Once the fences have signalled, the
> >>>>> outstanding fences will be 0 and there won't be any active work.
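
In the v3 idle worker this becomes two checks, one nested in the other
(excerpt, comments added):

    /* per-instance: gate only this instance's power */
    if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
            ...
            /* global: drop the video profile only if no instance has work */
            if (adev->vcn.workload_profile_active && !total_fences &&
                !atomic_read(&adev->vcn.total_submission_cnt)) {
                    ...
            }
    }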
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Sathish
> >>>>>>
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>> Whereas if the instances are powered off, we are sure the idle time
> >>>>>>>> has passed and they are powered off, with no possible way of active
> >>>>>>>> video decode. When all instances are off we can safely assume no
> >>>>>>>> active decode, and the global lock protects against a new begin_use
> >>>>>>>> on any instance.
> >>>>>>>> The only distant concern is global common locks w.r.t. perf, but we
> >>>>>>>> already have a global workprofile mutex, so there shouldn't be any
> >>>>>>>> drop in perf with just one single global lock for all instances.
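
A rough sketch of that "all instances off" check under one global lock (an
illustration of the idea only, using the cur_state naming seen elsewhere in
this thread; it is not code from either attached patch):

    bool any_on = false;
    unsigned int i;

    mutex_lock(&adev->vcn.workload_profile_mutex);
    for (i = 0; i < adev->vcn.num_vcn_inst; ++i)
            any_on |= (adev->vcn.inst[i].cur_state != AMD_PG_STATE_GATE);
    if (!any_on && adev->vcn.workload_profile_active) {
            amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
            adev->vcn.workload_profile_active = false;
    }
    mutex_unlock(&adev->vcn.workload_profile_mutex);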
> >>>>>>>>
> >>>>>>>> Just sending out a patch with this fix; I will leave it to you to
> >>>>>>>> decide the right method. If you think the outstanding total fences
> >>>>>>>> can never be 0 during decode, then your previous version (v3) itself
> >>>>>>>> is good and there is no real benefit in splitting the handlers.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Sathish
> >>>>>>>>> If it is possible, maybe it would be easier to just split the
> >>>>>>>>> profile and powergating into separate handlers. The profile
> >>>>>>>>> one would be global and the powergating one would be per
> instance.
> >>>>>>>>> See the attached patches.
> >>>>>>>>>
> >>>>>>>>> Alex
> >>>>>>>>>
> >>>>>>>>>> can be a sure-shot indication of a break in video decode. The
> >>>>>>>>>> mistake in my patch was using a per-instance mutex; I should have
> >>>>>>>>>> used a common global mutex, and that covers the situation David is
> >>>>>>>>>> trying to bring out.
> >>>>>>>>>>
> >>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use, and using
> >>>>>>>>>> flags to track the power state, could help us totally avoid this
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>>
> >>>>>>>>>> Sathish
> >>>>>>>>>>
> >>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
> >>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
> >>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David
> <davidwu2 at amd.com> wrote:
> >>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The addition of total_submission_cnt should work - in that it
> >>>>>>>>>>>>> is unlikely to have a context switch right after begin_use().
> >>>>>>>>>>>>> The suggestion of moving it inside the lock (which I prefer, in
> >>>>>>>>>>>>> case someone later adds more code before the lock that is not
> >>>>>>>>>>>>> reviewed thoroughly) - up to you to decide.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <David.Wu3 at amd.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> David
> >>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
> >>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
> >>>>>>>>>>>>>> end up switching the video profile while another instance
> >>>>>>>>>>>>>> is active because we only take into account the current
> >>>>>>>>>>>>>> instance's submissions. Look at all outstanding fences
> >>>>>>>>>>>>>> for the video profile.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v2: drop early exit in begin_use()
> >>>>>>>>>>>>>> v3: handle possible race between begin_use() and the work handler
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")
> >>>>>>>>>>>>>> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju at amd.com> (v1)
> >>>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40 ++++++++++++-------------
> >>>>>>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
> >>>>>>>>>>>>>>   2 files changed, 21 insertions(+), 20 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>>>>>>>>>>>>          struct amdgpu_vcn_inst *vcn_inst =
> >>>>>>>>>>>>>>                  container_of(work, struct amdgpu_vcn_inst, idle_work.work);
> >>>>>>>>>>>>>>          struct amdgpu_device *adev = vcn_inst->adev;
> >>>>>>>>>>>>>> -        unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>>>>>>>>>>>>> -        unsigned int i = vcn_inst->inst, j;
> >>>>>>>>>>>>>> +        unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>>>>>>>>>>>>> +        unsigned int i, j;
> >>>>>>>>>>>>>>          int r = 0;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -        if (adev->vcn.harvest_config & (1 << i))
> >>>>>>>>>>>>>> +        if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
> >>>>>>>>>>>>>>                  return;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -        for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
> >>>>>>>>>>>>>> -                fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
> >>>>>>>>>>>>>> +        for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> >>>>>>>>>>>>>> +                struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +                for (j = 0; j < v->num_enc_rings; ++j)
> >>>>>>>>>>>>>> +                        fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);
> >>>>>>>>>>>>>> +                fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
> >>>>>>>>>>>>>> +                total_fences += fence[i];
> >>>>>>>>>>>>>> +        }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
> >>>>>>>>>>>>>>          if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
> >>>>>>>>>>>>>> -            !adev->vcn.inst[i].using_unified_queue) {
> >>>>>>>>>>>>>> +            !vcn_inst->using_unified_queue) {
> >>>>>>>>>>>>>>                  struct dpg_pause_state new_state;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                  if (fence[i] ||
> >>>>>>>>>>>>>> @@ -436,18 +442,18 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>>>>>>>>>>>>                  else
> >>>>>>>>>>>>>>                          new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -                adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
> >>>>>>>>>>>>>> +                vcn_inst->pause_dpg_mode(vcn_inst, &new_state);
> >>>>>>>>>>>>>>          }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -        fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
> >>>>>>>>>>>>>> -        fences += fence[i];
> >>>>>>>>>>>>>> -
> >>>>>>>>>>>>>> -        if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>>>>>>>>>>>>> +        if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>>>>>>>>>>>>> +                /* This is specific to this instance */
> >>>>>>>>>>>>>>                  mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>                  vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
> >>>>>>>>>>>>>>                  mutex_unlock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>                  mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>>> -                if (adev->vcn.workload_profile_active) {
> >>>>>>>>>>>>>> +                /* This is global and depends on all VCN instances */
> >>>>>>>>>>>>>> +                if (adev->vcn.workload_profile_active && !total_fences &&
> >>>>>>>>>>>>>> +                    !atomic_read(&adev->vcn.total_submission_cnt)) {
> >>>>>>>>>>>>>>                          r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, false);
> >>>>>>>>>>>>>>                          if (r)
> >>>>>>>>>>>>>> @@ -467,16 +473,10 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>          int r = 0;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          atomic_inc(&vcn_inst->total_submission_cnt);
> >>>>>>>>>>>>>> +        atomic_inc(&adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>> move this addition down inside the mutex lock
> >>>>>>>>>>>>>>          cancel_delayed_work_sync(&vcn_inst->idle_work);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -        /* We can safely return early here because we've cancelled the
> >>>>>>>>>>>>>> -         * the delayed work so there is no one else to set it to false
> >>>>>>>>>>>>>> -         * and we don't care if someone else sets it to true.
> >>>>>>>>>>>>>> -         */
> >>>>>>>>>>>>>> -        if (adev->vcn.workload_profile_active)
> >>>>>>>>>>>>>> -                goto pg_lock;
> >>>>>>>>>>>>>> -
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>> move to here:
> >>>>>>>>>>>>> atomic_inc(&adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>> I think this should work for multiple instances.
> >>>>>>>>>>>> Why does this need to be protected by the mutex?
> >>>>>>>>>>> hmm.. OK - no need and it is actually better before the mutex.
> >>>>>>>>>>> David
> >>>>>>>>>>>> Alex
> >>>>>>>>>>>>
> >>>>>>>>>>>>> David
> >>>>>>>>>>>>>>          if (!adev->vcn.workload_profile_active) {
> >>>>>>>>>>>>>>                  r = amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO,
> >>>>>>>>>>>>>> @@ -487,7 +487,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>          }
> >>>>>>>>>>>>>>          mutex_unlock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -pg_lock:
> >>>>>>>>>>>>>>          mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>          vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>          atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
> >>>>>>>>>>>>>> +        atomic_dec(&ring->adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
> >>>>>>>>>>>>>>                                VCN_IDLE_TIMEOUT);
> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          uint16_t inst_mask;
> >>>>>>>>>>>>>>          uint8_t num_inst_per_aid;
> >>>>>>>>>>>>>> +        atomic_t total_submission_cnt;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          /* IP reg dump */
> >>>>>>>>>>>>>>          uint32_t *ip_dump;