<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Hi Lijo and Alex,</p>
    I prefer Lijo's method, as it should not cause an unexpected power
    profile change for the active session.<br>
    Lijo, could you make a few adjustments to your patch, such as
    removing the extra blank line and cleaning up the function<br>
    declarations, e.g. using "void" instead of "int" as the return type for the new
    functions (and should they be static?)<br>
    <p>Thanks,</p>
    <p>David<br>
    </p>
    <div class="moz-cite-prefix">On 2025-08-14 14:42, David Wu wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:2d89d9f9-4dff-4de7-96c6-6176df98312f@amd.com">
      
      <div class="moz-cite-prefix">hmm.. it is my concern for the same
        instance. but I got it now. Your patch is good. <br>
      </div>
      <div class="moz-cite-prefix">Thanks,</div>
      <div class="moz-cite-prefix">David<br>
      </div>
      <div class="moz-cite-prefix">On 2025-08-14 14:06, Lazar, Lijo
        wrote:<br>
      </div>
      <blockquote type="cite" cite="mid:DS0PR12MB7804E1DB7C7882D3F0EA7EED9735A@DS0PR12MB7804.namprd12.prod.outlook.com">
        <p style="font-family:Calibri;font-size:10pt;color:#0000FF;margin:5pt;font-style:normal;font-weight:normal;text-decoration:none;" align="Left"> [AMD Official Use Only - AMD Internal
          Distribution Only]<br>
        </p>
        <br>
        <div>
          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            I didn't fully understand the question. </div>
          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            <br>
          </div>
          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            For the same instance, begin_thread will set the power state
            only after cancelling any idle worker for the instance. If
            the idle worker is already in progress, it needs to
            complete first, as that's a cancel_sync (it's the existing logic).</div>
          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            <br>
          </div>
          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            Basically, by the time begin_thread sets the PG state, no
            idle worker for the same vcn instance would be active. If
            it's about context switch to another vcn instance's
            begin_thread, I think that won't be a problem.</div>
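          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            <br>
          </div>
          <div dir="auto" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;">
            A minimal sketch of that ordering (illustrative only, using the
            existing amdgpu_vcn names from this thread; not the actual patch):</div>
          <pre>
/* begin_use path: the idle worker for this instance is cancelled
 * synchronously before the PG state is touched, so no idle worker
 * for the same instance can run concurrently with the ungate.
 */
atomic_inc(&vcn_inst->total_submission_cnt);
/* waits for an in-progress idle worker to finish */
cancel_delayed_work_sync(&vcn_inst->idle_work);

mutex_lock(&vcn_inst->vcn_pg_lock);
vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
mutex_unlock(&vcn_inst->vcn_pg_lock);
          </pre>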
          <div id="ms-outlook-mobile-body-separator-line" dir="auto"><br>
          </div>
          <div id="ms-outlook-mobile-signature" style="font-family: Aptos, Aptos_MSFontService, -apple-system, Roboto, Arial, Helvetica, sans-serif; font-size: 12pt;" dir="auto">
            <div dir="auto">Thanks,</div>
            <div dir="auto">Lijo</div>
          </div>
          <hr style="display:inline-block;width:98%" tabindex="-1">
          <div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a><br>
              <b>Sent:</b> Thursday, August 14, 2025 11:14:26 PM<br>
              <b>To:</b> Lazar, Lijo <a class="moz-txt-link-rfc2396E" href="mailto:Lijo.Lazar@amd.com" moz-do-not-send="true"><Lijo.Lazar@amd.com></a>;
              Sundararaju, Sathishkumar <a class="moz-txt-link-rfc2396E" href="mailto:Sathishkumar.Sundararaju@amd.com" moz-do-not-send="true"><Sathishkumar.Sundararaju@amd.com></a>;
              Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexdeucher@gmail.com" moz-do-not-send="true"><alexdeucher@gmail.com></a><br>
              <b>Cc:</b> Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a>;
              Deucher, Alexander <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
              <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>
              <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true"><amd-gfx@lists.freedesktop.org></a><br>
              <b>Subject:</b> Re: [PATCH] drm/amdgpu/vcn: fix video
              profile race condition (v3)</font>
            <div> </div>
          </div>
          <div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
                <div class="PlainText">amdgpu_vcn_idle_work_handler():<br>
                       if (!fences &&
                  !atomic_read(&vcn_inst->total_submission_cnt))
                  {<br>
                  ----------- could it be possible a context switch here
                  to <br>
                  amdgpu_vcn_ring_begin_use()?<br>
                    if it could then AMD_PG_STATE_GATE will be set by
                  mistake.<br>
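                  <br>
                  (Sketching the window in question, with illustrative
                  comments; this mirrors the handler code quoted below:)<br>
<pre>
if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
	/* if amdgpu_vcn_ring_begin_use() ran on another CPU right here,
	 * incremented total_submission_cnt and ungated the instance,
	 * the gating below would then power the instance off by mistake.
	 */
	mutex_lock(&vcn_inst->vcn_pg_lock);
	vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
	mutex_unlock(&vcn_inst->vcn_pg_lock);
}
</pre>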
                  <br>
                  David<br>
                  <br>
                  On 2025-08-14 08:54, Lazar, Lijo wrote:<br>
                  > The request profile can be moved outside the
                  pg_lock in begin_use, as in the attached patch. It
                  needs the set power state -> set profile order.<br>
                  ><br>
                  > This is the premise -<br>
                  ><br>
                  > Let's say there are two threads, a begin_use thread
                  and an idle_work thread. Both begin_use and idle_work will
                  need the work profile mutex to request a profile.<br>
                  ><br>
                  > Case 1) Idle thread gets the lock first.<br>
                  >          a) Idle thread sees the vinst power state as
                  PG_UNGATE; no harm done. It exits without requesting a
                  power profile change. The begin_use thread gets the lock
                  next, sees the profile as active, and continues.<br>
                  >          b) Idle thread sees the vinst power state as
                  PG_GATE; it will set workload_profile_active to false and
                  exit. Now when the begin_use thread gets the mutex next,
                  it's guaranteed to see workload_profile_active as
                  false, hence it will request the profile.<br>
                  ><br>
                  > Case 2) begin_use thread gets the lock first.<br>
                  >          a) The workload profile is active, hence it
                  doesn't do anything and exits. The change made by the
                  begin_use thread to the vinst power state (state = on)
                  will now be visible to the idle thread, which gets the lock
                  next. It will do nothing and exit.<br>
                  >          b) The workload profile is inactive, hence
                  it requests a profile change. Again, the change made
                  by the begin_use thread to the vinst power state will now be
                  visible to the idle thread, which gets the lock next. It
                  will do nothing and exit.<br>
                  ><br>
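                  <br>
                  (A minimal sketch of this ordering; error handling is
                  simplified and the shape is hypothetical, but the field and
                  function names are from the code under discussion:)<br>
<pre>
int r;

/* 1) publish the instance power state first */
vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);

/* 2) then decide the profile under the global work profile lock;
 * whichever thread takes this lock next sees the latest power state.
 */
mutex_lock(&adev->vcn.workload_profile_mutex);
if (!adev->vcn.workload_profile_active) {
	r = amdgpu_dpm_switch_power_profile(adev,
			PP_SMC_POWER_PROFILE_VIDEO, true);
	if (!r)
		adev->vcn.workload_profile_active = true;
}
mutex_unlock(&adev->vcn.workload_profile_mutex);
</pre>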
                  > Thanks,<br>
                  > Lijo<br>
                  ><br>
                  > -----Original Message-----<br>
                  > From: Sundararaju, Sathishkumar <a class="moz-txt-link-rfc2396E" href="mailto:Sathishkumar.Sundararaju@amd.com" moz-do-not-send="true"><Sathishkumar.Sundararaju@amd.com></a><br>
                  > Sent: Thursday, August 14, 2025 6:18 PM<br>
                  > To: Lazar, Lijo <a class="moz-txt-link-rfc2396E" href="mailto:Lijo.Lazar@amd.com" moz-do-not-send="true"><Lijo.Lazar@amd.com></a>;
                  Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexdeucher@gmail.com" moz-do-not-send="true"><alexdeucher@gmail.com></a><br>
                  > Cc: Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a>;
                  Deucher, Alexander <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                  <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  > Subject: Re: [PATCH] drm/amdgpu/vcn: fix video
                  profile race condition (v3)<br>
                  ><br>
                  ><br>
                  > On 8/14/2025 5:33 PM, Lazar, Lijo wrote:<br>
                  >> There is no need for a nested lock. It only
                  needs to follow the order:<br>
                  >>           set instance power_state<br>
                  >>           set profile (this takes a global
                  lock, and hence the instance power state will be visible to
                  whichever thread gets the work profile lock).<br>
                  >><br>
                  >> You are seeing a nested lock just because I
                  added the code right after the power state setting.<br>
                  > Pasting your code from the file for reference:<br>
                  ><br>
                  > @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)<br>
                  ><br>
                  > -pg_lock:<br>
                  ><br>
                  >        mutex_lock(&vcn_inst->vcn_pg_lock);<br>
                  >        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);<br>
                  ><br>
                  > +   amdgpu_vcn_get_profile(adev);<br>
                  ><br>
                  > vcn_pg_lock isn't released here yet, right? And
                  in case you intend only to order the locks, there is still
                  an unnecessary OFF followed by ON, but yes,
                  that is acceptable.<br>
                  ><br>
                  > Maybe you want to move that vcn_pg_lock after
                  amdgpu_vcn_get_profile to protect concurrent dpg_state
                  access in begin_use.<br>
                  ><br>
                  > The concern is that this patch accesses power_state,
                  which is protected by some other mutex lock, hoping it
                  reflects the right values also while holding
                  powerprofile_lock.<br>
                  ><br>
                  > Or<br>
                  ><br>
                  > I have shared a patch with a global
                  workload_profile_mutex that simplifies this handling,
                  renamed pg_lock -> dpg_lock, and used that only for
                  per-instance dpg_state changes.<br>
                  ><br>
                  > Regards,<br>
                  ><br>
                  > Sathish<br>
                  ><br>
                  >> Thanks,<br>
                  >> Lijo<br>
                  >><br>
                  >> -----Original Message-----<br>
                  >> From: Sundararaju, Sathishkumar <a class="moz-txt-link-rfc2396E" href="mailto:Sathishkumar.Sundararaju@amd.com" moz-do-not-send="true"><Sathishkumar.Sundararaju@amd.com></a><br>
                  >> Sent: Thursday, August 14, 2025 5:23 PM<br>
                  >> To: Lazar, Lijo <a class="moz-txt-link-rfc2396E" href="mailto:Lijo.Lazar@amd.com" moz-do-not-send="true"><Lijo.Lazar@amd.com></a>;
                  Alex Deucher<br>
                  >> <a class="moz-txt-link-rfc2396E" href="mailto:alexdeucher@gmail.com" moz-do-not-send="true"><alexdeucher@gmail.com></a><br>
                  >> Cc: Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a>;
                  Deucher, Alexander<br>
                  >> <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                  <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  >> Subject: Re: [PATCH] drm/amdgpu/vcn: fix
                  video profile race condition<br>
                  >> (v3)<br>
                  >><br>
                  >><br>
                  >> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:<br>
                  >>> I see your point now. Attached should
                  work, I guess. Is the concern more about having to
                  take the lock for every begin?<br>
                  >> This is closer, but the thing is, IMO we
                  shouldn't have to use 2 locks and go into nested
                  locking; we can do with just one global lock.<br>
                  >><br>
                  >> The power_state of each instance and the global
                  workload_profile_active are<br>
                  >> inter-related; they need to be guarded
                  together.<br>
                  >><br>
                  >> Nested could work, but why nested if a single
                  lock is enough? Nesting complicates it.<br>
                  >><br>
                  >> Regards,<br>
                  >><br>
                  >> Sathish<br>
                  >><br>
                  >>> Thanks,<br>
                  >>> Lijo<br>
                  >>><br>
                  >>> -----Original Message-----<br>
                  >>> From: amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true"><amd-gfx-bounces@lists.freedesktop.org></a>
                  On Behalf Of<br>
                  >>> Lazar, Lijo<br>
                  >>> Sent: Thursday, August 14, 2025 2:55 PM<br>
                  >>> To: Sundararaju, Sathishkumar <a class="moz-txt-link-rfc2396E" href="mailto:Sathishkumar.Sundararaju@amd.com" moz-do-not-send="true"><Sathishkumar.Sundararaju@amd.com></a>;<br>
                  >>> Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexdeucher@gmail.com" moz-do-not-send="true"><alexdeucher@gmail.com></a><br>
                  >>> Cc: Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a>;
                  Deucher, Alexander<br>
                  >>> <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                  <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  >>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix
                  video profile race condition<br>
                  >>> (v3)<br>
                  >>><br>
                  >>> That is not required, I think. The power
                  profile is set by an instance *after* setting itself
                  to power on. Also, it's switched back after changing
                  its power state to off.  If the idle worker is run by
                  another instance, it won't see inst0 as
                  power gated and won't change the power profile.<br>
                  >>><br>
                  >>> Thanks,<br>
                  >>> Lijo<br>
                  >>> -----Original Message-----<br>
                  >>> From: Sundararaju, Sathishkumar <a class="moz-txt-link-rfc2396E" href="mailto:Sathishkumar.Sundararaju@amd.com" moz-do-not-send="true"><Sathishkumar.Sundararaju@amd.com></a><br>
                  >>> Sent: Thursday, August 14, 2025 2:41 PM<br>
                  >>> To: Lazar, Lijo <a class="moz-txt-link-rfc2396E" href="mailto:Lijo.Lazar@amd.com" moz-do-not-send="true"><Lijo.Lazar@amd.com></a>;
                  Alex Deucher<br>
                  >>> <a class="moz-txt-link-rfc2396E" href="mailto:alexdeucher@gmail.com" moz-do-not-send="true"><alexdeucher@gmail.com></a><br>
                  >>> Cc: Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a>;
                  Deucher, Alexander<br>
                  >>> <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                  <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  >>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix
                  video profile race condition<br>
                  >>> (v3)<br>
                  >>><br>
                  >>> Hi Lijo,<br>
                  >>><br>
                  >>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:<br>
                  >>>> We already have a per-instance power
                  state that can be tracked. What about something like
                  the attached?<br>
                  >>> This also has concurrent access to the
                  power state;<br>
                  >>> vcn.inst[i].cur_state is not protected by
                  workload_profile_mutex<br>
                  >>><br>
                  >>> everywhere; it can still change while
                  you are holding workload_profile_mutex and checking
                  it.<br>
                  >>><br>
                  >>> Regards,<br>
                  >>><br>
                  >>> Sathish<br>
                  >>><br>
                  >>>> Thanks,<br>
                  >>>> Lijo<br>
                  >>>> -----Original Message-----<br>
                  >>>> From: amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org" moz-do-not-send="true"><amd-gfx-bounces@lists.freedesktop.org></a>
                  On Behalf Of<br>
                  >>>> Sundararaju, Sathishkumar<br>
                  >>>> Sent: Thursday, August 14, 2025 4:43
                  AM<br>
                  >>>> To: Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexdeucher@gmail.com" moz-do-not-send="true"><alexdeucher@gmail.com></a><br>
                  >>>> Cc: Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a>;
                  Deucher, Alexander<br>
                  >>>> <a class="moz-txt-link-rfc2396E" href="mailto:Alexander.Deucher@amd.com" moz-do-not-send="true"><Alexander.Deucher@amd.com></a>;
                  <a class="moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:amd-gfx@lists.freedesktop.org" moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                  >>>> Subject: Re: [PATCH] drm/amdgpu/vcn:
                  fix video profile race<br>
                  >>>> condition<br>
                  >>>> (v3)<br>
                  >>>><br>
                  >>>><br>
                  >>>> On 8/14/2025 3:38 AM, Alex Deucher
                  wrote:<br>
                  >>>>> On Wed, Aug 13, 2025 at 5:1 PM
                  Sundararaju, Sathishkumar<br>
                  >>>>> <a class="moz-txt-link-rfc2396E" href="mailto:sathishkumar.sundararaju@amd.com" moz-do-not-send="true"><sathishkumar.sundararaju@amd.com></a>
                  wrote:<br>
                  >>>>>> On 8/14/2025 2:33 AM, Alex
                  Deucher wrote:<br>
                  >>>>>>> On Wed, Aug 13, 2025 at
                  4:58 PM Sundararaju, Sathishkumar<br>
                  >>>>>>> <a class="moz-txt-link-rfc2396E" href="mailto:sathishkumar.sundararaju@amd.com" moz-do-not-send="true"><sathishkumar.sundararaju@amd.com></a>
                  wrote:<br>
                  >>>>>>>> On 8/14/2025 1:35 AM,
                  Alex Deucher wrote:<br>
                  >>>>>>>>> On Wed, Aug 13,
                  2025 at 2:23 PM Sundararaju, Sathishkumar<br>
                  >>>>>>>>> <a class="moz-txt-link-rfc2396E" href="mailto:sathishkumar.sundararaju@amd.com" moz-do-not-send="true"><sathishkumar.sundararaju@amd.com></a>
                  wrote:<br>
                  >>>>>>>>>> Hi Alex, Hi
                  David,<br>
                  >>>>>>>>>><br>
                  >>>>>>>>>> I see David's
                  concern, but his suggestion won't solve the<br>
                  >>>>>>>>>> problem
                  either, nor does the current form. Reason:<br>
                  >>>>>>>>>><br>
                  >>>>>>>>>> The emitted
                  fence count and total submission count are fast<br>
                  >>>>>>>>>> transients
                  which frequently become 0 in between video decodes<br>
                  >>>>>>>>>> (between
                  jobs); even with the atomics and locks there can be a<br>
                  >>>>>>>>>> switch of the
                  video power profile. In the current form of the patch<br>
                  >>>>>>>>>> that window
                  is minimized, but it can still happen if stress<br>
                  >>>>>>>>>> tested. But the
                  power state of any instance becoming zero<br>
                  >>>>>>>>> Can you explain
                  how this can happen?  I'm not seeing it.<br>
                  >>>>>>>> Consider this
                  situation: inst0 and inst1 are actively decoding;<br>
                  >>>>>>>> inst0's decode
                  completes and its delayed idle work starts.<br>
                  >>>>>>>> The inst0 idle handler
                  can read 0 total fences and 0 total<br>
                  >>>>>>>> submission count
                  even if inst1 is actively decoding, if that happens between
                  jobs:<br>
                  >>>>>>>>         - as
                  begin_use increments vcn.total_submission_cnt and<br>
                  >>>>>>>> end_use decrements
                  vcn.total_submission_cnt, it can be 0.<br>
                  >>>>>>>>         - if
                  outstanding fences are cleared and there is no newly emitted<br>
                  >>>>>>>> fence between jobs, it
                  can be 0.<br>
                  >>>>>>>>         - both of the
                  above conditions do not mean video decode<br>
                  >>>>>>>> is complete on inst1;
                  it is actively decoding.<br>
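                  <br>
                  (An illustrative timeline of that window, assuming
                  inst1 submits jobs back to back with a small gap:)<br>
<pre>
/* inst1: begin_use -> emit fence -> end_use -> fence signals
 *                                                 |  gap  |
 * inst0: idle worker runs in the gap: it reads 0 total fences and
 *        0 total submission count, although inst1 is still
 *        mid-stream (its next job just hasn't been submitted yet).
 */
</pre>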
                  >>>>>>> How can there be active
                  decoding without an outstanding fence?<br>
                  >>>>>>> In that case,
                  total_fences (fences from both instances) would be
                  non-0.<br>
                  >>>>>> I mean on inst1 the job
                  scheduled is already complete, so there are 0<br>
                  >>>>>> outstanding fences, and a newer job
                  is yet to be scheduled<br>
                  >>>>>><br>
                  >>>>>> and committed to the ring (inst1);
                  this doesn't mean decode has<br>
                  >>>>>> stopped on<br>
                  >>>>>> inst1, right? (I am saying, if the
                  timing of the inst0 idle work coincides<br>
                  >>>>>> with this.)<br>
                  >>>>>><br>
                  >>>>>> Or am I wrong in assuming
                  this? Can't this ever happen? Please<br>
                  >>>>>> correct my understanding
                  here.<br>
                  >>>>> The flow looks like:<br>
                  >>>>><br>
                  >>>>> begin_use(inst)<br>
                  >>>>> emit_fence(inst)<br>
                  >>>>> end_use(inst)<br>
                  >>>>><br>
                  >>>>> ...later<br>
                  >>>>> fence signals<br>
                  >>>>> ...later<br>
                  >>>>> work handler<br>
                  >>>>><br>
                  >>>>> In begin_use we increment the
                  global and per-instance submission counts.<br>
                  >>>>> This protects the power gating
                  and profile until end_use.  In<br>
                  >>>>> end_use we decrement the submissions
                  because we don't need to protect<br>
                  >>>>> anything any more, as we have the
                  fence that was submitted via the<br>
                  >>>>> ring.  That fence won't signal
                  until the job is complete.<br>
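                  <br>
                  (A sketch of that protection hand-off, as comments;
                  illustrative only:)<br>
<pre>
/* begin_use(inst): atomic_inc(total_submission_cnt)
 *                    -- protects PG state and profile from here on
 * emit_fence(inst): amdgpu_fence_count_emitted() becomes non-zero
 *                    -- the job is now tracked by its fence
 * end_use(inst):    atomic_dec(total_submission_cnt)
 *                    -- safe: the fence keeps the counts non-idle
 *
 * The fence signals when the job completes; only then can the idle
 * worker see 0 fences and 0 submissions and power gate.
 */
</pre>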
                  >>>> Is the next begin_use always guaranteed
                  to run before the current job's fence signals?<br>
                  >>>><br>
                  >>>> If not, then both the total submission and
                  total fence counts are zero; for example, with<br>
                  >>>> delayed job/packet submissions<br>
                  >>>><br>
                  >>>> from user space, or when the next job is scheduled
                  after the current job's fence signals.<br>
                  >>>><br>
                  >>>> If this is never possible, then (v3)
                  is perfect.<br>
                  >>>><br>
                  >>>> Regards,<br>
                  >>>><br>
                  >>>> Sathish<br>
                  >>>><br>
                  >>>>> For power gating, we<br>
                  >>>>> only care about the submission
                  count and fences for that instance;<br>
                  >>>>> for the profile, we care about
                  the submission count and fences for all instances.<br>
                  >>>>> Once the fences have signalled,
                  the outstanding fences will be 0<br>
                  >>>>> and there won't be any active
                  work.<br>
                  >>>>><br>
                  >>>>> Alex<br>
                  >>>>><br>
                  >>>>>> Regards,<br>
                  >>>>>><br>
                  >>>>>> Sathish<br>
                  >>>>>><br>
                  >>>>>>> Alex<br>
                  >>>>>>><br>
                  >>>>>>>> Whereas if the instances
                  are powered off, we are sure the idle time has<br>
                  >>>>>>>> passed and the instance is
                  powered off, with no possible active video<br>
                  >>>>>>>> decode. When all
                  instances are off we can safely assume no<br>
                  >>>>>>>> active decode, and the
                  global lock protects against a new begin_use on any
                  instance.<br>
                  >>>>>>>> The only distant
                  concern is global common locks w.r.t. perf,<br>
                  >>>>>>>> but we already
                  have a global work profile mutex, so there<br>
                  >>>>>>>> shouldn't be any drop
                  in perf with just one single global lock<br>
                  >>>>>>>> for all instances.<br>
                  >>>>>>>><br>
                  >>>>>>>> I am just sending out a
                  patch with this fix and will leave it to you to<br>
                  >>>>>>>> decide the right
                  method. If you think the outstanding total fences<br>
                  >>>>>>>> can never be 0 during
                  decode, then your previous version (v3)<br>
                  >>>>>>>> itself is good; there
                  is no real benefit in splitting the handlers as such.<br>
                  >>>>>>>><br>
                  >>>>>>>> Regards,<br>
                  >>>>>>>> Sathish<br>
                  >>>>>>>>> If it is
                  possible, maybe it would be easier to just split the<br>
                  >>>>>>>>> profile and
                  powergating into separate handlers.  The profile<br>
                  >>>>>>>>> one would be
                  global and the powergating one would be per instance.<br>
                  >>>>>>>>> See the attached
                  patches.<br>
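                  <br>
                  (A rough sketch of that split; the handler names are
                  hypothetical, not from the attached patches:)<br>
<pre>
/* Per-instance handler: only decides power gating for its instance. */
static void vcn_inst_idle_work_handler(struct work_struct *work)
{
	/* if this instance has 0 fences and 0 submissions:
	 *     vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
	 */
}

/* Global handler: only decides the video workload profile. */
static void vcn_profile_idle_work_handler(struct work_struct *work)
{
	/* if all instances together have 0 fences and 0 submissions:
	 *     amdgpu_dpm_switch_power_profile(adev,
	 *                     PP_SMC_POWER_PROFILE_VIDEO, false);
	 */
}
</pre>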
                  >>>>>>>>><br>
                  >>>>>>>>> Alex<br>
                  >>>>>>>>><br>
                  >>>>>>>>>> can be a sure-shot
                  indication of a break in a video decode; the<br>
                  >>>>>>>>>> mistake in my
                  patch was using a per-instance mutex, I should<br>
                  >>>>>>>>>> have used a
                  common global mutex, as that covers the situation
                  David is trying to bring out.<br>
                  >>>>>>>>>><br>
                  >>>>>>>>>> Using one
                  global vcn.pg_lock for idle and begin_use, and using<br>
                  >>>>>>>>>> flags to
                  track the power state, could help us totally avoid this
                  situation.<br>
                  >>>>>>>>>><br>
                  >>>>>>>>>> Regards,<br>
                  >>>>>>>>>><br>
                  >>>>>>>>>> Sathish<br>
                  >>>>>>>>>><br>
                  >>>>>>>>>> On 8/13/2025
                  11:46 PM, Wu, David wrote:<br>
                  >>>>>>>>>>> On
                  8/13/2025 12:51 PM, Alex Deucher wrote:<br>
                  >>>>>>>>>>>> On
                  Wed, Aug 13, 2025 at 12:39 PM Wu, David <a class="moz-txt-link-rfc2396E" href="mailto:davidwu2@amd.com" moz-do-not-send="true"><davidwu2@amd.com></a>
                  wrote:<br>
                  >>>>>>>>>>>>> Hi Alex,<br>
                  >>>>>>>>>>>>><br>
                  >>>>>>>>>>>>> The addition of total_submission_cnt should work, in that<br>
                  >>>>>>>>>>>>> it is unlikely to have a context switch right after begin_use().<br>
                  >>>>>>>>>>>>> The suggestion is to move it inside the lock (which I prefer,<br>
                  >>>>>>>>>>>>> in case someone later adds more code before the lock that is not<br>
                  >>>>>>>>>>>>> reviewed thoroughly)<br>
                  >>>>>>>>>>>>>           - up to you to decide.<br>
                  >>>>>>>>>>>>><br>
                  >>>>>>>>>>>>>
                  Reviewed-by: David (Ming Qiang) Wu <a class="moz-txt-link-rfc2396E" href="mailto:David.Wu3@amd.com" moz-do-not-send="true"><David.Wu3@amd.com></a><br>
                  >>>>>>>>>>>>><br>
                  >>>>>>>>>>>>>
                  Thanks,<br>
                  >>>>>>>>>>>>>
                  David<br>
                  >>>>>>>>>>>>>
                  On 8/13/2025 9:45 AM, Alex Deucher wrote:<br>
>>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may<br>
>>>>>>>>>>>>>> end up switching the video profile while another instance<br>
>>>>>>>>>>>>>> is active because we only take into account the current<br>
>>>>>>>>>>>>>> instance's submissions.  Look at all outstanding fences<br>
>>>>>>>>>>>>>> for the video profile.<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> v2: drop early exit in begin_use()<br>
>>>>>>>>>>>>>> v3: handle possible race between begin_use() and the work handler<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")<br>
>>>>>>>>>>>>>> Reviewed-by: Sathishkumar S <a class="moz-txt-link-rfc2396E" href="mailto:sathishkumar.sundararaju@amd.com" moz-do-not-send="true"><sathishkumar.sundararaju@amd.com></a> (v1)<br>
>>>>>>>>>>>>>> Signed-off-by: Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com" moz-do-not-send="true"><alexander.deucher@amd.com></a><br>
>>>>>>>>>>>>>> ---<br>
>>>>>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40 ++++++++++++-------------<br>
>>>>>>>>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +<br>
>>>>>>>>>>>>>>  2 files changed, 21 insertions(+), 20 deletions(-)<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c<br>
>>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644<br>
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c<br>
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c<br>
>>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)<br>
>>>>>>>>>>>>>>        struct amdgpu_vcn_inst *vcn_inst =<br>
>>>>>>>>>>>>>>                container_of(work, struct amdgpu_vcn_inst, idle_work.work);<br>
>>>>>>>>>>>>>>        struct amdgpu_device *adev = vcn_inst->adev;<br>
>>>>>>>>>>>>>> -     unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};<br>
>>>>>>>>>>>>>> -     unsigned int i = vcn_inst->inst, j;<br>
>>>>>>>>>>>>>> +     unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};<br>
>>>>>>>>>>>>>> +     unsigned int i, j;<br>
>>>>>>>>>>>>>>        int r = 0;<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> -     if (adev->vcn.harvest_config & (1 << i))<br>
>>>>>>>>>>>>>> +     if (adev->vcn.harvest_config & (1 << vcn_inst->inst))<br>
>>>>>>>>>>>>>>                return;<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> -     for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)<br>
>>>>>>>>>>>>>> -             fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);<br>
>>>>>>>>>>>>>> +     for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {<br>
>>>>>>>>>>>>>> +             struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];<br>
>>>>>>>>>>>>>> +<br>
>>>>>>>>>>>>>> +             for (j = 0; j < v->num_enc_rings; ++j)<br>
>>>>>>>>>>>>>> +                     fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);<br>
>>>>>>>>>>>>>> +             fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);<br>
>>>>>>>>>>>>>> +             total_fences += fence[i];<br>
>>>>>>>>>>>>>> +     }<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */<br>
>>>>>>>>>>>>>>        if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&<br>
>>>>>>>>>>>>>> -         !adev->vcn.inst[i].using_unified_queue) {<br>
>>>>>>>>>>>>>> +         !vcn_inst->using_unified_queue) {<br>
>>>>>>>>>>>>>>                struct dpg_pause_state new_state;<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>                if (fence[i] ||<br>
>>>>>>>>>>>>>> @@ -436,18 +442,18 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)<br>
>>>>>>>>>>>>>>                else<br>
>>>>>>>>>>>>>>                        new_state.fw_based = VCN_DPG_STATE__UNPAUSE;<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> -             adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);<br>
>>>>>>>>>>>>>> +             vcn_inst->pause_dpg_mode(vcn_inst, &new_state);<br>
>>>>>>>>>>>>>>        }<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> -     fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);<br>
>>>>>>>>>>>>>> -     fences += fence[i];<br>
>>>>>>>>>>>>>> -<br>
>>>>>>>>>>>>>> -     if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {<br>
>>>>>>>>>>>>>> +     if (!fence[vcn_inst->inst] &&<br>
>>>>>>>>>>>>>> +         !atomic_read(&vcn_inst->total_submission_cnt)) {<br>
>>>>>>>>>>>>>> +             /* This is specific to this instance */<br>
>>>>>>>>>>>>>>                mutex_lock(&vcn_inst->vcn_pg_lock);<br>
>>>>>>>>>>>>>>                vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);<br>
>>>>>>>>>>>>>>                mutex_unlock(&vcn_inst->vcn_pg_lock);<br>
>>>>>>>>>>>>>>                mutex_lock(&adev->vcn.workload_profile_mutex);<br>
>>>>>>>>>>>>>> -             if (adev->vcn.workload_profile_active) {<br>
>>>>>>>>>>>>>> +             /* This is global and depends on all VCN instances */<br>
>>>>>>>>>>>>>> +             if (adev->vcn.workload_profile_active && !total_fences &&<br>
>>>>>>>>>>>>>> +                 !atomic_read(&adev->vcn.total_submission_cnt)) {<br>
>>>>>>>>>>>>>>                        r = amdgpu_dpm_switch_power_profile(adev,<br>
>>>>>>>>>>>>>>                                        PP_SMC_POWER_PROFILE_VIDEO, false);<br>
>>>>>>>>>>>>>>                        if (r)<br>
>>>>>>>>>>>>>> @@ -467,16 +473,10 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)<br>
>>>>>>>>>>>>>>        int r = 0;<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        atomic_inc(&vcn_inst->total_submission_cnt);<br>
>>>>>>>>>>>>>> +     atomic_inc(&adev->vcn.total_submission_cnt);<br>
                  >>>>>>>>>>>>>
                  move this addition down inside the mutex lock<br>
>>>>>>>>>>>>>>        cancel_delayed_work_sync(&vcn_inst->idle_work);<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> -     /* We can safely return early here because we've cancelled the<br>
>>>>>>>>>>>>>> -      * the delayed work so there is no one else to set it to false<br>
>>>>>>>>>>>>>> -      * and we don't care if someone else sets it to true.<br>
>>>>>>>>>>>>>> -      */<br>
>>>>>>>>>>>>>> -     if (adev->vcn.workload_profile_active)<br>
>>>>>>>>>>>>>> -             goto pg_lock;<br>
>>>>>>>>>>>>>> -<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        mutex_lock(&adev->vcn.workload_profile_mutex);<br>
                  >>>>>>>>>>>>>
                  move to here:<br>
                  >>>>>>>>>>>>>
                  atomic_inc(&adev->vcn.total_submission_cnt);<br>
                  >>>>>>>>>>>>> I
                  think this should work for multiple instances.<br>
                  >>>>>>>>>>>> Why
                  does this need to be protected by the mutex?<br>
                  >>>>>>>>>>> hmm.. OK
                  - no need and it is actually better before the mutex.<br>
                  >>>>>>>>>>> David<br>
                  >>>>>>>>>>>> Alex<br>
                  >>>>>>>>>>>><br>
                  >>>>>>>>>>>>>
                  David<br>
>>>>>>>>>>>>>>        if (!adev->vcn.workload_profile_active) {<br>
>>>>>>>>>>>>>>                r = amdgpu_dpm_switch_power_profile(adev,<br>
>>>>>>>>>>>>>>                                PP_SMC_POWER_PROFILE_VIDEO,<br>
>>>>>>>>>>>>>> @@ -487,7 +487,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)<br>
>>>>>>>>>>>>>>        }<br>
>>>>>>>>>>>>>>        mutex_unlock(&adev->vcn.workload_profile_mutex);<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> -pg_lock:<br>
>>>>>>>>>>>>>>        mutex_lock(&vcn_inst->vcn_pg_lock);<br>
>>>>>>>>>>>>>>        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)<br>
>>>>>>>>>>>>>>        atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);<br>
>>>>>>>>>>>>>> +     atomic_dec(&ring->adev->vcn.total_submission_cnt);<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,<br>
>>>>>>>>>>>>>>                              VCN_IDLE_TIMEOUT);<br>
>>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h<br>
>>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644<br>
>>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h<br>
>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h<br>
>>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        uint16_t inst_mask;<br>
>>>>>>>>>>>>>>        uint8_t num_inst_per_aid;<br>
>>>>>>>>>>>>>> +     atomic_t                total_submission_cnt;<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>>        /* IP reg dump */<br>
>>>>>>>>>>>>>>        uint32_t                *ip_dump;<br>
                </div>
              </span></font></div>
        </div>
      </blockquote>
    </blockquote>
  </body>
</html>