<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Then we forgot the proposal I provided before.</p>
    <p>I think the below seq may fixed the race condition issue that we
      are facing.<br>
    </p>
    <p>1. stop scheduling new jobs <br>
    </p>
    <p>    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {<br>
              if (adev->vcn.harvest_config & (1 << i))<br>
                  continue;<br>
      <br>
              ring = &adev->vcn.inst[i].ring_dec;<br>
              ring->sched.ready = false;<br>
      <br>
              for (j = 0; j < adev->vcn.num_enc_rings; ++j) {<br>
                  ring = &adev->vcn.inst[i].ring_enc[j];<br>
                  ring->sched.ready = false;<br>
              }<br>
          }</p>
    <p>2.    cancel_delayed_work_sync(&adev->vcn.idle_work);</p>
    <p>3.    <font size="2"><span style="font-size:11pt;">SOC15_WAIT_ON_RREG(VCN,
          inst_idx, mmUVD_POWER_STATUS, 1,<br>
                   UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);</span></font></p>
    <p><font size="2"><span style="font-size:11pt;">4.    </span></font>amdgpu_device_ip_set_powergating_state(adev,
      AMD_IP_BLOCK_TYPE_VCN,   AMD_PG_STATE_GATE);</p>
    <p>5.  saved_bo</p>
    <p>Best Regards!</p>
    <p>James<br>
    </p>
    <div class="moz-cite-prefix">On 2021-05-17 1:43 p.m., Leo Liu wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:3c6a0bf3-b4b4-0a93-573e-fd9ae02f16a8@amd.com">
      <br>
      On 2021-05-17 12:54 p.m., James Zhu wrote:
      <br>
      <blockquote type="cite">I am wondering if there are still some
        jobs kept in the queue, it is lucky to check
        <br>
      </blockquote>
      <br>
      Yes it's possible, in this case delayed handler is set, so
      cancelling once is enough.
      <br>
      <br>
      <br>
      <blockquote type="cite">
        <br>
        UVD_POWER_STATUS done, but after, fw start a new job that list
        in the queue.
        <br>
        <br>
        To handle this situation perfectly, we need add mechanism to
        suspend fw first.
        <br>
      </blockquote>
      <br>
      I think that should be handled by the sequence from
      vcn_v3_0_stop_dpg_mode().
      <br>
      <br>
      <br>
      <blockquote type="cite">
        <br>
        Another case, if it is unlucky, that  vcn fw hung at that time,
        UVD_POWER_STATUS
        <br>
        <br>
        always keeps busy.   then it needs force powering gate the vcn
        hw after certain time waiting.
        <br>
      </blockquote>
      <br>
      Yep, we still need to gate VCN power after certain timeout.
      <br>
      <br>
      <br>
      Regards,
      <br>
      <br>
      Leo
      <br>
      <br>
      <br>
      <br>
      <blockquote type="cite">
        <br>
        Best Regards!
        <br>
        <br>
        James
        <br>
        <br>
        On 2021-05-17 12:34 p.m., Leo Liu wrote:
        <br>
        <blockquote type="cite">
          <br>
          On 2021-05-17 11:52 a.m., James Zhu wrote:
          <br>
          <blockquote type="cite">During vcn suspends, stop ring
            continue to receive new requests,
            <br>
            and try to wait for all vcn jobs to finish gracefully.
            <br>
            <br>
            v2: Forced powering gate vcn hardware after few wainting
            retry.
            <br>
            <br>
            Signed-off-by: James Zhu <a class="moz-txt-link-rfc2396E" href="mailto:James.Zhu@amd.com"><James.Zhu@amd.com></a>
            <br>
            ---
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 22
            +++++++++++++++++++++-
            <br>
              1 file changed, 21 insertions(+), 1 deletion(-)
            <br>
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
            <br>
            index 2016459..9f3a6e7 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
            <br>
            @@ -275,9 +275,29 @@ int amdgpu_vcn_suspend(struct
            amdgpu_device *adev)
            <br>
              {
            <br>
                  unsigned size;
            <br>
                  void *ptr;
            <br>
            +    int retry_max = 6;
            <br>
                  int i;
            <br>
              -   
            cancel_delayed_work_sync(&adev->vcn.idle_work);
            <br>
            +    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
            <br>
            +        if (adev->vcn.harvest_config & (1 <<
            i))
            <br>
            +            continue;
            <br>
            +        ring = &adev->vcn.inst[i].ring_dec;
            <br>
            +        ring->sched.ready = false;
            <br>
            +
            <br>
            +        for (j = 0; j < adev->vcn.num_enc_rings; ++j)
            {
            <br>
            +            ring = &adev->vcn.inst[i].ring_enc[j];
            <br>
            +            ring->sched.ready = false;
            <br>
            +        }
            <br>
            +    }
            <br>
            +
            <br>
            +    while (retry_max-- &&
            cancel_delayed_work_sync(&adev->vcn.idle_work))
            <br>
            +        mdelay(5);
            <br>
          </blockquote>
          <br>
          I think it's possible to have one pending job unprocessed with
          VCN when suspend sequence getting here, but it shouldn't be
          more than one, cancel_delayed_work_sync probably return false
          after the first time, so calling cancel_delayed_work_sync once
          should be enough here. we probably need to wait longer from:
          <br>
          <br>
          SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
          <br>
                  UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
          <br>
          <br>
          to make sure the unprocessed job get done.
          <br>
          <br>
          <br>
          Regards,
          <br>
          <br>
          Leo
          <br>
          <br>
          <br>
          <blockquote type="cite">+    if (!retry_max &&
            !amdgpu_sriov_vf(adev)) {
            <br>
            +        if (RREG32_SOC15(VCN, i, mmUVD_STATUS)) {
            <br>
            +            dev_warn(adev->dev, "Forced powering gate
            vcn hardware!");
            <br>
            +            vcn_v3_0_set_powergating_state(adev,
            AMD_PG_STATE_GATE);
            <br>
            +        }
            <br>
            +    }
            <br>
                    for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
            <br>
                      if (adev->vcn.harvest_config & (1 <<
            i))
            <br>
          </blockquote>
        </blockquote>
      </blockquote>
    </blockquote>
  </body>
</html>