<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2021-11-22 6:34 a.m., Christian
      König wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:64e46791-cbe5-1bcb-93e2-be52d2dcce0c@amd.com">Am
      21.11.21 um 03:13 schrieb philip yang:
      <br>
      <blockquote type="cite">
        <br>
        On 2021-11-19 5:59 p.m., Felix Kuehling wrote:
        <br>
        <br>
        <blockquote type="cite">On 2021-11-19 3:22 p.m., Philip Yang
          wrote:
          <br>
          <blockquote type="cite">IH ring1 is used to process GPU retry
            fault, overflow is enabled to
            <br>
            drain retry fault because we want receive other interrupts
            while
            <br>
            handling retry fault to recover range. There is no overflow
            flag set
            <br>
            when wptr pass rptr. Use timestamp of rptr and wptr to
            handle overflow
            <br>
            and drain retry fault.
            <br>
            <br>
            Add helper function amdgpu_ih_decode_iv_ts to get 48bit
            timestamp from
            <br>
            IV entry. drain retry fault check timestamp of rptr is
            larger than
            <br>
            timestamp of (checkpoint_wptr - 32).
            <br>
            <br>
            Add function amdgpu_ih_process1 to process IH ring1 until
            timestamp of
            <br>
            rptr is larger then timestamp of (rptr + 32).
            <br>
            <br>
            Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
            <br>
            ---
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c  | 98
            +++++++++++++++++++------
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h  |  6 +-
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c |  2 +-
            <br>
              drivers/gpu/drm/amd/amdkfd/kfd_svm.c    |  2 +-
            <br>
              4 files changed, 80 insertions(+), 28 deletions(-)
            <br>
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
            <br>
            index f3d62e196901..ad12f9d5d86a 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
            <br>
            @@ -165,51 +165,41 @@ void amdgpu_ih_ring_write(struct
            amdgpu_ih_ring *ih, const uint32_t *iv,
            <br>
              }
            <br>
                /* Waiter helper that checks current rptr matches or
            passes checkpoint wptr */
            <br>
            -static bool amdgpu_ih_has_checkpoint_processed(struct
            amdgpu_device *adev,
            <br>
            +static bool amdgpu_ih_has_checkpoint_processed_ts(struct
            amdgpu_device *adev,
            <br>
                                  struct amdgpu_ih_ring *ih,
            <br>
            -                    uint32_t checkpoint_wptr,
            <br>
            -                    uint32_t *prev_rptr)
            <br>
            +                    uint64_t checkpoint_ts)
            <br>
              {
            <br>
            -    uint32_t cur_rptr = ih->rptr | (*prev_rptr &
            ~ih->ptr_mask);
            <br>
            -
            <br>
            -    /* rptr has wrapped. */
            <br>
            -    if (cur_rptr < *prev_rptr)
            <br>
            -        cur_rptr += ih->ptr_mask + 1;
            <br>
            -    *prev_rptr = cur_rptr;
            <br>
            -
            <br>
            -    /* check ring is empty to workaround missing wptr
            overflow flag */
            <br>
            -    return cur_rptr >= checkpoint_wptr ||
            <br>
            -           (cur_rptr & ih->ptr_mask) ==
            amdgpu_ih_get_wptr(adev, ih);
            <br>
            +    /* After wakeup, ih->rptr is the entry which is
            being processed, check
            <br>
            +     * the timestamp of previous entry which is processed.
            <br>
            +     */
            <br>
            +    return checkpoint_ts <= amdgpu_ih_decode_iv_ts(ih,
            ih->rptr - 32);
            <br>
          </blockquote>
          <br>
          This assumes a IV size of 32 bytes, which is not true for all
          ASICs. On GFXv8 and older GPUs it's on 16. OTOH, those chips
          don't have a ring1 and may not have a timestamp in the IV at
          all.
          <br>
          <br>
        </blockquote>
        ring1 process is enabled for vega10/20, navi10, not for older
        GPUs, as it is scheduled from self-irq.
        <br>
      </blockquote>
      <br>
      I think we should try to avoid accessing the IH ring buffer
      outside of the handler anyway.
      <br>
      <br>
      So instead of that here we should probably just always update the
      last decoded timestamp after a call to
      amdgpu_ih_decode_iv_helper().
      <br>
    </blockquote>
    Good point, add ih->processed_timstamp in v4 as the last decoded
    timestamp.<br>
    <blockquote type="cite" cite="mid:64e46791-cbe5-1bcb-93e2-be52d2dcce0c@amd.com">
      <br>
      <blockquote type="cite">
        <blockquote type="cite">And I think you need to be better at
          handling when the time stamps wrap. Keep in mind that the
          number of valid bits may vary between ASICs.
          <br>
          <br>
        </blockquote>
        yes, 48bit time stamp will wrap around after 1 year, add
        function amdgpu_ih_ts_after to compare time stamp with wrap
        around case.
        <br>
      </blockquote>
      <br>
      Yeah, I think we need to handle that gracefully.
      <br>
    </blockquote>
    <p>v4 also uses le32_to_cpu to read timestamp to support big endian
      platform.</p>
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:64e46791-cbe5-1bcb-93e2-be52d2dcce0c@amd.com">
      <br>
      Regards,
      <br>
      Christian.
      <br>
      <br>
      <blockquote type="cite">
        <blockquote type="cite">
          <br>
          <blockquote type="cite">  }
            <br>
                /**
            <br>
            - * amdgpu_ih_wait_on_checkpoint_process - wait to process
            IVs up to checkpoint
            <br>
            + * amdgpu_ih_wait_on_checkpoint_process_ts - wait to
            process IVs up to checkpoint
            <br>
               *
            <br>
               * @adev: amdgpu_device pointer
            <br>
               * @ih: ih ring to process
            <br>
               *
            <br>
               * Used to ensure ring has processed IVs up to the
            checkpoint write pointer.
            <br>
               */
            <br>
            -int amdgpu_ih_wait_on_checkpoint_process(struct
            amdgpu_device *adev,
            <br>
            +int amdgpu_ih_wait_on_checkpoint_process_ts(struct
            amdgpu_device *adev,
            <br>
                                  struct amdgpu_ih_ring *ih)
            <br>
          </blockquote>
          <br>
          If this function is only meant to work on ring1 now, we should
          probably ensure that by checking that ih is really ring1.
          <br>
          <br>
        </blockquote>
        Add dev_WARN_ONCE(adev->dev, ih != &adev->irq.ih1,
        "not ring1")) to ensure this is only for ring1.
        <br>
        <br>
        <blockquote type="cite">Do we need to keep the old solution for
          Vega20, which doesn't reroute interrupts to ring1?
          <br>
          <br>
        </blockquote>
        Vega20 is not changed, because Vega20 retry fault delegate to
        ih_soft, and wptr overflow is not enabled on ih_soft, no stale
        retry fault issue, keep old interrupt handler for ih_soft.
        <br>
        <blockquote type="cite">
          <br>
          <blockquote type="cite">  {
            <br>
            -    uint32_t checkpoint_wptr, rptr;
            <br>
            +    uint32_t checkpoint_wptr;
            <br>
            +    uint64_t checkpoint_ts;
            <br>
                    if (!ih->enabled || adev->shutdown)
            <br>
                      return -ENODEV;
            <br>
                    checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
            <br>
            -    /* Order wptr with rptr. */
            <br>
            +    /* Order wptr with ring data. */
            <br>
                  rmb();
            <br>
            -    rptr = READ_ONCE(ih->rptr);
            <br>
            -
            <br>
            -    /* wptr has wrapped. */
            <br>
            -    if (rptr > checkpoint_wptr)
            <br>
            -        checkpoint_wptr += ih->ptr_mask + 1;
            <br>
            +    checkpoint_ts = amdgpu_ih_decode_iv_ts(ih,
            checkpoint_wptr - 32);
            <br>
          </blockquote>
          <br>
          Same as above.
          <br>
        </blockquote>
        done in v3
        <br>
        <blockquote type="cite">
          <br>
          <br>
          <blockquote type="cite">        return
            wait_event_interruptible(ih->wait_process,
            <br>
            -                amdgpu_ih_has_checkpoint_processed(adev,
            ih,
            <br>
            -                        checkpoint_wptr, &rptr));
            <br>
            +                amdgpu_ih_has_checkpoint_processed_ts(adev,
            ih,
            <br>
            +                        checkpoint_ts));
            <br>
              }
            <br>
                /**
            <br>
            @@ -253,6 +243,56 @@ int amdgpu_ih_process(struct
            amdgpu_device *adev, struct amdgpu_ih_ring *ih)
            <br>
                  return IRQ_HANDLED;
            <br>
              }
            <br>
              +/**
            <br>
            + * amdgpu_ih_process1 - interrupt handler work for IH ring1
            <br>
            + *
            <br>
            + * @adev: amdgpu_device pointer
            <br>
            + * @ih: ih ring to process
            <br>
            + *
            <br>
            + * Interrupt handler of IH ring1, walk the IH ring1.
            <br>
            + * Returns irq process return code.
            <br>
            + */
            <br>
            +int amdgpu_ih_process1(struct amdgpu_device *adev, struct
            amdgpu_ih_ring *ih)
            <br>
            +{
            <br>
            +    uint64_t ts, ts_next;
            <br>
            +    unsigned int count;
            <br>
            +    u32 wptr;
            <br>
            +
            <br>
            +    if (!ih->enabled || adev->shutdown)
            <br>
            +        return IRQ_NONE;
            <br>
            +
            <br>
            +    wptr = amdgpu_ih_get_wptr(adev, ih);
            <br>
            +    if (ih->rptr == wptr)
            <br>
            +        return 0;
            <br>
            +
            <br>
            +restart_ih:
            <br>
            +    count = AMDGPU_IH_MAX_NUM_IVS;
            <br>
            +
            <br>
            +    ts = amdgpu_ih_decode_iv_ts(ih, ih->rptr);
            <br>
            +    ts_next = amdgpu_ih_decode_iv_ts(ih, ih->rptr + 32);
            <br>
          </blockquote>
          <br>
          Same as above.
          <br>
          <br>
        </blockquote>
        Done in v3
        <br>
        <blockquote type="cite">
          <br>
          <blockquote type="cite">+    while (ts < ts_next &&
            --count) {
            <br>
            +        amdgpu_irq_dispatch(adev, ih);
            <br>
            +        ih->rptr &= ih->ptr_mask;
            <br>
            +        ts = ts_next;
            <br>
            +        ts_next = amdgpu_ih_decode_iv_ts(ih, ih->rptr +
            32);
            <br>
            +    }
            <br>
            +    /*
            <br>
            +     * Process the last timestamp updated entry or one more
            entry
            <br>
            +     * if count = 0, ts is timestamp of the entry.
            <br>
            +     */
            <br>
            +    amdgpu_irq_dispatch(adev, ih);
            <br>
            +    amdgpu_ih_set_rptr(adev, ih);
            <br>
            +    wake_up_all(&ih->wait_process);
            <br>
            +
            <br>
            +    wptr = amdgpu_ih_get_wptr(adev, ih);
            <br>
            +    /* Order reading of wptr vs. reading of IH ring data */
            <br>
            +    rmb();
            <br>
            +    if (ts < amdgpu_ih_decode_iv_ts(ih, wptr - 32))
            <br>
            +        goto restart_ih;
            <br>
            +
            <br>
            +    return IRQ_HANDLED;
            <br>
            +}
            <br>
            +
            <br>
              /**
            <br>
               * amdgpu_ih_decode_iv_helper - decode an interrupt vector
            <br>
               *
            <br>
            @@ -298,3 +338,13 @@ void amdgpu_ih_decode_iv_helper(struct
            amdgpu_device *adev,
            <br>
                  /* wptr/rptr are in bytes! */
            <br>
                  ih->rptr += 32;
            <br>
              }
            <br>
            +
            <br>
            +uint64_t amdgpu_ih_decode_iv_ts(struct amdgpu_ih_ring *ih,
            u32 rptr)
            <br>
          </blockquote>
          <br>
          This function needs to be in IP-version-specific code. Maybe
          add an offset parameter, that way you can handle different IV
          sizes in different ASIC generations.
          <br>
          <br>
        </blockquote>
        Add decode_iv_ts function interface to amdgpu_ih_function, this
        will be used for different ASICs, to handle different IV size
        and time stamp offset. vega*, navi* ASICs use this as helper
        function.
        <br>
        <br>
        Regards,
        <br>
        <br>
        Philip
        <br>
        <br>
        <blockquote type="cite">
          <br>
          <blockquote type="cite">+{
            <br>
            +    uint32_t index = (rptr & ih->ptr_mask) >>
            2;
            <br>
            +    uint32_t dw1, dw2;
            <br>
            +
            <br>
            +    dw1 = ih->ring[index + 1];
            <br>
            +    dw2 = ih->ring[index + 2];
            <br>
            +    return dw1 | ((u64)(dw2 & 0xffff) << 32);
            <br>
            +}
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
            <br>
            index 0649b59830a5..15e8fe0e5e40 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
            <br>
            @@ -89,10 +89,12 @@ int amdgpu_ih_ring_init(struct
            amdgpu_device *adev, struct amdgpu_ih_ring *ih,
            <br>
              void amdgpu_ih_ring_fini(struct amdgpu_device *adev,
            struct amdgpu_ih_ring *ih);
            <br>
              void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const
            uint32_t *iv,
            <br>
                            unsigned int num_dw);
            <br>
            -int amdgpu_ih_wait_on_checkpoint_process(struct
            amdgpu_device *adev,
            <br>
            -                    struct amdgpu_ih_ring *ih);
            <br>
            +int amdgpu_ih_wait_on_checkpoint_process_ts(struct
            amdgpu_device *adev,
            <br>
            +                        struct amdgpu_ih_ring *ih);
            <br>
              int amdgpu_ih_process(struct amdgpu_device *adev, struct
            amdgpu_ih_ring *ih);
            <br>
            +int amdgpu_ih_process1(struct amdgpu_device *adev, struct
            amdgpu_ih_ring *ih);
            <br>
              void amdgpu_ih_decode_iv_helper(struct amdgpu_device
            *adev,
            <br>
                              struct amdgpu_ih_ring *ih,
            <br>
                              struct amdgpu_iv_entry *entry);
            <br>
            +uint64_t amdgpu_ih_decode_iv_ts(struct amdgpu_ih_ring *ih,
            u32 rptr);
            <br>
              #endif
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
            <br>
            index e9023687dc9a..891486cca94b 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
            <br>
            @@ -224,7 +224,7 @@ static void amdgpu_irq_handle_ih1(struct
            work_struct *work)
            <br>
                  struct amdgpu_device *adev = container_of(work, struct
            amdgpu_device,
            <br>
                                        irq.ih1_work);
            <br>
              -    amdgpu_ih_process(adev, &adev->irq.ih1);
            <br>
            +    amdgpu_ih_process1(adev, &adev->irq.ih1);
            <br>
              }
            <br>
                /**
            <br>
            diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
            b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
            <br>
            index 88360f23eb61..9e566ec54cf5 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
            <br>
            @@ -1968,7 +1968,7 @@ static void
            svm_range_drain_retry_fault(struct svm_range_list *svms)
            <br>
                        pr_debug("drain retry fault gpu %d svms %p\n",
            i, svms);
            <br>
              -
            amdgpu_ih_wait_on_checkpoint_process(pdd->dev->adev,
            <br>
            +
            amdgpu_ih_wait_on_checkpoint_process_ts(pdd->dev->adev,
            <br>
            &pdd->dev->adev->irq.ih1);
            <br>
                      pr_debug("drain retry fault gpu %d svms 0x%p
            done\n", i, svms);
            <br>
                  }
            <br>
          </blockquote>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>