<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Definitely, we need to move cancel_delayed_work_sync moved to
      before power gate.<br>
    </p>
    <p>Should "save_bo" be step 4 before power gate ?<br>
    </p>
    <p>Regards,<br>
    </p>
    <p>Leo</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2021-05-17 1:59 p.m., James Zhu
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:8659d46e-58b6-d090-f95f-2ad9d4533f0e@amd.com">
      
      <p>Then we forgot the proposal I provided before.</p>
      <p>I think the below seq may fixed the race condition issue that
        we are facing.<br>
      </p>
      <p>1. stop scheduling new jobs <br>
      </p>
      <p>    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {<br>
                if (adev->vcn.harvest_config & (1 << i))<br>
                    continue;<br>
        <br>
                ring = &adev->vcn.inst[i].ring_dec;<br>
                ring->sched.ready = false;<br>
        <br>
                for (j = 0; j < adev->vcn.num_enc_rings; ++j) {<br>
                    ring = &adev->vcn.inst[i].ring_enc[j];<br>
                    ring->sched.ready = false;<br>
                }<br>
            }</p>
      <p>2.    cancel_delayed_work_sync(&adev->vcn.idle_work);</p>
      <p>3.    <font size="2"><span style="font-size:11pt;">SOC15_WAIT_ON_RREG(VCN,
            inst_idx, mmUVD_POWER_STATUS, 1,<br>
                     UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);</span></font></p>
      <p><font size="2"><span style="font-size:11pt;">4.    </span></font>amdgpu_device_ip_set_powergating_state(adev,
        AMD_IP_BLOCK_TYPE_VCN,   AMD_PG_STATE_GATE);</p>
      <p>5.  saved_bo</p>
      <p>Best Regards!</p>
      <p>James<br>
      </p>
      <div class="moz-cite-prefix">On 2021-05-17 1:43 p.m., Leo Liu
        wrote:<br>
      </div>
      <blockquote type="cite" cite="mid:3c6a0bf3-b4b4-0a93-573e-fd9ae02f16a8@amd.com"> <br>
        On 2021-05-17 12:54 p.m., James Zhu wrote: <br>
        <blockquote type="cite">I am wondering if there are still some
          jobs kept in the queue, it is lucky to check <br>
        </blockquote>
        <br>
        Yes it's possible, in this case delayed handler is set, so
        cancelling once is enough. <br>
        <br>
        <br>
        <blockquote type="cite"> <br>
          UVD_POWER_STATUS done, but after, fw start a new job that list
          in the queue. <br>
          <br>
          To handle this situation perfectly, we need add mechanism to
          suspend fw first. <br>
        </blockquote>
        <br>
        I think that should be handled by the sequence from
        vcn_v3_0_stop_dpg_mode(). <br>
        <br>
        <br>
        <blockquote type="cite"> <br>
          Another case, if it is unlucky, that  vcn fw hung at that
          time, UVD_POWER_STATUS <br>
          <br>
          always keeps busy.   then it needs force powering gate the vcn
          hw after certain time waiting. <br>
        </blockquote>
        <br>
        Yep, we still need to gate VCN power after certain timeout. <br>
        <br>
        <br>
        Regards, <br>
        <br>
        Leo <br>
        <br>
        <br>
        <br>
        <blockquote type="cite"> <br>
          Best Regards! <br>
          <br>
          James <br>
          <br>
          On 2021-05-17 12:34 p.m., Leo Liu wrote: <br>
          <blockquote type="cite"> <br>
            On 2021-05-17 11:52 a.m., James Zhu wrote: <br>
            <blockquote type="cite">During vcn suspends, stop ring
              continue to receive new requests, <br>
              and try to wait for all vcn jobs to finish gracefully. <br>
              <br>
              v2: Forced powering gate vcn hardware after few wainting
              retry. <br>
              <br>
              Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com" moz-do-not-send="true"><James.Zhu@amd.com></a>
              <br>
              --- <br>
                drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22
              +++++++++++++++++++++- <br>
                1 file changed, 21 insertions(+), 1 deletion(-) <br>
              <br>
              diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
              b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c <br>
              index 2016459..9f3a6e7 100644 <br>
              --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c <br>
              +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c <br>
              @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct
              amdgpu_device *adev) <br>
                { <br>
                    unsigned size; <br>
                    void *ptr; <br>
              +    int retry_max = 6; <br>
                    int i; <br>
                -   
              cancel_delayed_work_sync(&adev->vcn.idle_work); <br>
              +    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
              <br>
              +        if (adev->vcn.harvest_config & (1 <<
              i)) <br>
              +            continue; <br>
              +        ring = &adev->vcn.inst[i].ring_dec; <br>
              +        ring->sched.ready = false; <br>
              + <br>
              +        for (j = 0; j < adev->vcn.num_enc_rings;
              ++j) { <br>
              +            ring = &adev->vcn.inst[i].ring_enc[j];
              <br>
              +            ring->sched.ready = false; <br>
              +        } <br>
              +    } <br>
              + <br>
              +    while (retry_max-- &&
              cancel_delayed_work_sync(&adev->vcn.idle_work)) <br>
              +        mdelay(5); <br>
            </blockquote>
            <br>
            I think it's possible to have one pending job unprocessed
            with VCN when suspend sequence getting here, but it
            shouldn't be more than one, cancel_delayed_work_sync
            probably return false after the first time, so calling
            cancel_delayed_work_sync once should be enough here. we
            probably need to wait longer from: <br>
            <br>
            SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1, <br>
                    UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); <br>
            <br>
            to make sure the unprocessed job get done. <br>
            <br>
            <br>
            Regards, <br>
            <br>
            Leo <br>
            <br>
            <br>
            <blockquote type="cite">+    if (!retry_max &&
              !amdgpu_sriov_vf(adev)) { <br>
              +        if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) { <br>
              +            dev_warn(adev->dev, "Forced powering gate
              vcn hardware!"); <br>
              +            vcn_v3_0_set_powergating_state(adev,
              AMD_PG_STATE_GATE); <br>
              +        } <br>
              +    } <br>
                      for (i = 0; i < adev->vcn.num_vcn_inst; ++i)
              { <br>
                        if (adev->vcn.harvest_config & (1
              << i)) <br>
            </blockquote>
          </blockquote>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>