<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2021-11-25 2:11 a.m., Christian
      König wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:845473a5-7c90-06ec-fb24-5e8c640aab0f@amd.com">Am
      25.11.21 um 02:56 schrieb Felix Kuehling:
      <br>
      <blockquote type="cite">Am 2021-11-24 um 5:58 p.m. schrieb Philip
        Yang:
        <br>
        [SNIP]
        <br>
        <blockquote type="cite">  #define amdgpu_ih_get_wptr(adev, ih)
          (adev)->irq.ih_funcs->get_wptr((adev), (ih))
          <br>
            #define amdgpu_ih_decode_iv(adev, iv) \
          <br>
                (adev)->irq.ih_funcs->decode_iv((adev), (ih),
          (iv))
          <br>
          +#define amdgpu_ih_decode_iv_ts(adev, ih, rptr, offset) \
          <br>
          +    (WARN_ON_ONCE(!(adev)->irq.ih_funcs->decode_iv_ts)
          ? 0 : \
          <br>
          +    (adev)->irq.ih_funcs->decode_iv_ts((ih), (rptr),
          (offset)))
          <br>
            #define amdgpu_ih_set_rptr(adev, ih)
          (adev)->irq.ih_funcs->set_rptr((adev), (ih))
          <br>
              int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct
          amdgpu_ih_ring *ih,
          <br>
          @@ -89,10 +99,12 @@ int amdgpu_ih_ring_init(struct
          amdgpu_device *adev, struct amdgpu_ih_ring *ih,
          <br>
            void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct
          amdgpu_ih_ring *ih);
          <br>
            void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const
          uint32_t *iv,
          <br>
                          unsigned int num_dw);
          <br>
          -int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
          *adev,
          <br>
          -                    struct amdgpu_ih_ring *ih);
          <br>
          +int amdgpu_ih_wait_on_checkpoint_process_ts(struct
          amdgpu_device *adev,
          <br>
          +                        struct amdgpu_ih_ring *ih);
          <br>
            int amdgpu_ih_process(struct amdgpu_device *adev, struct
          amdgpu_ih_ring *ih);
          <br>
            void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev,
          <br>
                            struct amdgpu_ih_ring *ih,
          <br>
                            struct amdgpu_iv_entry *entry);
          <br>
          +uint64_t amdgpu_ih_decode_iv_ts_helper(struct amdgpu_ih_ring
          *ih, u32 rptr,
          <br>
          +                       signed int offset);
          <br>
            #endif
          <br>
          diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
          b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
          <br>
          index 3ec5ff5a6dbe..b129898db433 100644
          <br>
          --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
          <br>
          +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
          <br>
          @@ -119,6 +119,11 @@ static int
          gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
          <br>
                        return 1;
          <br>
                    }
          <br>
            +        /* Stale retry fault if timestamp goes backward */
          <br>
          +        if (entry->ih == &adev->irq.ih1 &&
          <br>
          +            amdgpu_ih_ts_after(entry->timestamp,
          entry->ih->processed_timestamp))
          <br>
          +            return 1;
          <br>
          +
          <br>
        </blockquote>
        This check should go before amdgpu_gmc_filter_faults. Otherwise
        <br>
        amdgpu_gmc_filter_faults may later drop a real fault because it
        added a
        <br>
        stale fault in its hash table.
        <br>
      </blockquote>
      <br>
      I was already wondering if we shouldn't move all of this
      completely into amdgpu_gmc_filter_faults().
      <br>
      <br>
      I mean essentially we are filtering faults here once more, just
      based on a different criteria.
      <br>
    </blockquote>
    <p>It is good idea, it also removes duplicate code in different
      interrupt handler. And retry fault timestamp check for both ring0
      and ring1.<br>
    </p>
    <p>Thanks,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:845473a5-7c90-06ec-fb24-5e8c640aab0f@amd.com">
      <br>
      Regards,
      <br>
      Christian.
      <br>
      <br>
    </blockquote>
  </body>
</html>