<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2021-04-28 5:00 a.m., Christian
      König wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:47a28337-1abe-4f94-b96d-3ebba32a82aa@gmail.com">
      <br>
      <br>
      Am 28.04.21 um 09:53 schrieb Felix Kuehling:
      <br>
      <blockquote type="cite">Am 2021-04-28 um 2:54 a.m. schrieb
        Christian König:
        <br>
        <blockquote type="cite">Am 27.04.21 um 20:21 schrieb Felix
          Kuehling:
          <br>
          <blockquote type="cite">On 2021-04-27 10:51 a.m., Philip Yang
            wrote:
            <br>
            <blockquote type="cite">Add interface to remove address from
              fault filter ring by resetting
              <br>
              fault ring entry key, then future vm fault on the address
              will be
              <br>
              processed to recover.
              <br>
              <br>
              Define fault key as atomic64_t type to use atomic
              read/set/cmpxchg key
              <br>
              to protect fault ring access by interrupt handler and
              interrupt
              <br>
              deferred
              <br>
              work for vg20. Change fault->timestamp to 48-bit to
              share same uint64_t
              <br>
              with 8-bit fault->next, it is enough for 48bit IH
              timestamp.
              <br>
              <br>
              Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
              <br>
              ---
              <br>
                 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48
              <br>
              ++++++++++++++++++++++---
              <br>
                 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  6 ++--
              <br>
                 2 files changed, 48 insertions(+), 6 deletions(-)
              <br>
              <br>
              diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
              <br>
              b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
              <br>
              index c39ed9eb0987..a2e81e913abb 100644
              <br>
              --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
              <br>
              +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
              <br>
              @@ -332,6 +332,17 @@ void amdgpu_gmc_agp_location(struct
              <br>
              amdgpu_device *adev, struct amdgpu_gmc *mc)
              <br>
                             mc->agp_size >> 20,
              mc->agp_start, mc->agp_end);
              <br>
                 }
              <br>
                 +/**
              <br>
              + * amdgpu_gmc_fault_key - get hask key from vm fault
              address and pasid
              <br>
              + *
              <br>
              + * @addr: 48bit physical address
              <br>
              + * @pasid: 4 bit
              <br>
            </blockquote>
            This comment is misleading. The PASID is not 4-bit. It's
            16-bit. But
            <br>
            shifting the address by 4 bit is sufficient because the
            address is
            <br>
            page-aligned, which means the low 12 bits are 0. So this
            would be
            <br>
            more accurate:
            <br>
            <br>
            @addr: 48 bit physical address, page aligned (36 significant
            bits)
            <br>
            @pasid: 16 bit process address space identifier
            <br>
            <br>
            With that fixed, the patch is
            <br>
            <br>
            Reviewed-by: Felix Kuehling <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>
            <br>
            <br>
            <br>
            <blockquote type="cite">+ */
              <br>
              +static inline uint64_t amdgpu_gmc_fault_key(uint64_t
              addr, uint16_t
              <br>
              pasid)
              <br>
              +{
              <br>
              +    return addr << 4 | pasid;
              <br>
              +}
              <br>
            </blockquote>
          </blockquote>
          Maybe changing this so that we have "addr * ((1 << 16) /
          <br>
          AMDGPU_PAGE_SIZE) | pasid" would help to better document that?
          <br>
        </blockquote>
        I find this a mix of multiplication, division and bit-shift more
        <br>
        confusing. How about "addr << (16 - AMDGPU_GPU_PAGE_SHIFT)
        | pasid"?
        <br>
      </blockquote>
      <br>
      Yeah, that is even better.
      <br>
    </blockquote>
    <p>I have pushed the patch yesterday, didn't pick this, should wait
      longer before push in future.</p>
    <p>Thanks,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:47a28337-1abe-4f94-b96d-3ebba32a82aa@gmail.com">
      <br>
      Regards,
      <br>
      Christian.
      <br>
      <br>
      <blockquote type="cite">
        <br>
        Regards,
        <br>
           Felix
        <br>
        <br>
        <br>
        <blockquote type="cite">Christian.
          <br>
          <br>
          <blockquote type="cite">
            <blockquote type="cite">+
              <br>
                 /**
              <br>
                  * amdgpu_gmc_filter_faults - filter VM faults
              <br>
                  *
              <br>
              @@ -348,8 +359,7 @@ bool amdgpu_gmc_filter_faults(struct
              <br>
              amdgpu_device *adev, uint64_t addr,
              <br>
                                   uint16_t pasid, uint64_t timestamp)
              <br>
                 {
              <br>
                     struct amdgpu_gmc *gmc = &adev->gmc;
              <br>
              -
              <br>
              -    uint64_t stamp, key = addr << 4 | pasid;
              <br>
              +    uint64_t stamp, key = amdgpu_gmc_fault_key(addr,
              pasid);
              <br>
                     struct amdgpu_gmc_fault *fault;
              <br>
                     uint32_t hash;
              <br>
                 @@ -365,7 +375,7 @@ bool
              amdgpu_gmc_filter_faults(struct
              <br>
              amdgpu_device *adev, uint64_t addr,
              <br>
                     while (fault->timestamp >= stamp) {
              <br>
                         uint64_t tmp;
              <br>
                 -        if (fault->key == key)
              <br>
              +        if (atomic64_read(&fault->key) == key)
              <br>
                             return true;
              <br>
                           tmp = fault->timestamp;
              <br>
              @@ -378,7 +388,7 @@ bool amdgpu_gmc_filter_faults(struct
              <br>
              amdgpu_device *adev, uint64_t addr,
              <br>
                       /* Add the fault to the ring */
              <br>
                     fault =
              &gmc->fault_ring[gmc->last_fault];
              <br>
              -    fault->key = key;
              <br>
              +    atomic64_set(&fault->key, key);
              <br>
                     fault->timestamp = timestamp;
              <br>
                       /* And update the hash */
              <br>
              @@ -387,6 +397,36 @@ bool amdgpu_gmc_filter_faults(struct
              <br>
              amdgpu_device *adev, uint64_t addr,
              <br>
                     return false;
              <br>
                 }
              <br>
                 +/**
              <br>
              + * amdgpu_gmc_filter_faults_remove - remove address from
              VM faults
              <br>
              filter
              <br>
              + *
              <br>
              + * @adev: amdgpu device structure
              <br>
              + * @addr: address of the VM fault
              <br>
              + * @pasid: PASID of the process causing the fault
              <br>
              + *
              <br>
              + * Remove the address from fault filter, then future vm
              fault on
              <br>
              this address
              <br>
              + * will pass to retry fault handler to recover.
              <br>
              + */
              <br>
              +void amdgpu_gmc_filter_faults_remove(struct amdgpu_device
              *adev,
              <br>
              uint64_t addr,
              <br>
              +                     uint16_t pasid)
              <br>
              +{
              <br>
              +    struct amdgpu_gmc *gmc = &adev->gmc;
              <br>
              +    uint64_t key = amdgpu_gmc_fault_key(addr, pasid);
              <br>
              +    struct amdgpu_gmc_fault *fault;
              <br>
              +    uint32_t hash;
              <br>
              +    uint64_t tmp;
              <br>
              +
              <br>
              +    hash = hash_64(key, AMDGPU_GMC_FAULT_HASH_ORDER);
              <br>
              +    fault =
              &gmc->fault_ring[gmc->fault_hash[hash].idx];
              <br>
              +    do {
              <br>
              +        if (atomic64_cmpxchg(&fault->key, key, 0)
              == key)
              <br>
              +            break;
              <br>
              +
              <br>
              +        tmp = fault->timestamp;
              <br>
              +        fault = &gmc->fault_ring[fault->next];
              <br>
              +    } while (fault->timestamp < tmp);
              <br>
              +}
              <br>
              +
              <br>
                 int amdgpu_gmc_ras_late_init(struct amdgpu_device
              *adev)
              <br>
                 {
              <br>
                     int r;
              <br>
              diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
              <br>
              b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
              <br>
              index 9d11c02a3938..6aa1d52d3aee 100644
              <br>
              --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
              <br>
              +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
              <br>
              @@ -66,9 +66,9 @@ struct firmware;
              <br>
                  * GMC page fault information
              <br>
                  */
              <br>
                 struct amdgpu_gmc_fault {
              <br>
              -    uint64_t    timestamp;
              <br>
              +    uint64_t    timestamp:48;
              <br>
                     uint64_t    next:AMDGPU_GMC_FAULT_RING_ORDER;
              <br>
              -    uint64_t    key:52;
              <br>
              +    atomic64_t    key;
              <br>
                 };
              <br>
                   /*
              <br>
              @@ -318,6 +318,8 @@ void amdgpu_gmc_agp_location(struct
              <br>
              amdgpu_device *adev,
              <br>
                                  struct amdgpu_gmc *mc);
              <br>
                 bool amdgpu_gmc_filter_faults(struct amdgpu_device
              *adev, uint64_t
              <br>
              addr,
              <br>
                                   uint16_t pasid, uint64_t timestamp);
              <br>
              +void amdgpu_gmc_filter_faults_remove(struct amdgpu_device
              *adev,
              <br>
              uint64_t addr,
              <br>
              +                     uint16_t pasid);
              <br>
                 int amdgpu_gmc_ras_late_init(struct amdgpu_device
              *adev);
              <br>
                 void amdgpu_gmc_ras_fini(struct amdgpu_device *adev);
              <br>
                 int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device
              *adev);
              <br>
            </blockquote>
            _______________________________________________
            <br>
            amd-gfx mailing list
            <br>
            <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
            <br>
<a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CPhilip.Yang%40amd.com%7C2299a55733334713a8cd08d90a241de7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637551972517063081%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=s20dz4IfFjBDvMJeq1ozZdr3MvE7v7HrxOoqhRInLfg%3D&amp;reserved=0">https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CPhilip.Yang%40amd.com%7C2299a55733334713a8cd08d90a241de7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637551972517063081%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=s20dz4IfFjBDvMJeq1ozZdr3MvE7v7HrxOoqhRInLfg%3D&amp;reserved=0</a>
            <br>
          </blockquote>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>