<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2021-04-21 11:29 a.m., Felix
Kuehling wrote:<br>
</div>
<blockquote type="cite" cite="mid:9ea89549-afb4-ed48-4ff2-0ae740251d59@amd.com">On
2021-04-21 3:55 a.m., Christian König wrote:
<br>
<blockquote type="cite">Am 21.04.21 um 03:20 schrieb Felix
Kuehling:
<br>
<blockquote type="cite">Am 2021-04-20 um 4:21 p.m. schrieb
Philip Yang:
<br>
<blockquote type="cite">Add interface to remove address from
fault filter ring by resetting
<br>
fault ring entry of the fault address timestamp to 0, then
future vm
<br>
fault on the address will be processed to recover.
<br>
<br>
Check fault address from fault ring, add address into fault
ring and
<br>
remove address from fault ring are serialized in same
interrupt deferred
<br>
work, don't have race condition.
<br>
<br>
Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 24
++++++++++++++++++++++++
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 ++
<br>
2 files changed, 26 insertions(+)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
index c39ed9eb0987..338e45fa66cb 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
@@ -387,6 +387,30 @@ bool amdgpu_gmc_filter_faults(struct
amdgpu_device *adev, uint64_t addr,
<br>
return false;
<br>
}
<br>
+/**
<br>
+ * amdgpu_gmc_filter_faults_remove - remove address from VM
faults filter
<br>
+ *
<br>
+ * @adev: amdgpu device structure
<br>
+ * @addr: address of the VM fault
<br>
+ * @pasid: PASID of the process causing the fault
<br>
+ *
<br>
+ * Remove the address from fault filter, then future vm
fault on this address
<br>
+ * will pass to retry fault handler to recover.
<br>
+ */
<br>
+void amdgpu_gmc_filter_faults_remove(struct amdgpu_device
*adev, uint64_t addr,
<br>
+ uint16_t pasid)
<br>
+{
<br>
+ struct amdgpu_gmc *gmc = &adev->gmc;
<br>
+
<br>
+ uint64_t key = addr << 4 | pasid;
<br>
+ struct amdgpu_gmc_fault *fault;
<br>
+ uint32_t hash;
<br>
+
<br>
+ hash = hash_64(key, AMDGPU_GMC_FAULT_HASH_ORDER);
<br>
+ fault =
&gmc->fault_ring[gmc->fault_hash[hash].idx];
<br>
</blockquote>
You need to loop over the fault ring to find a fault with the
matching
<br>
key since there may be hash collisions.
<br>
<br>
You also need to make sure you don't break the single link
list of keys
<br>
with the same hash when you remove an entry. I think the
easier way to
<br>
remove an entry without breaking this ring+closed hashing
structure is
<br>
to reset the fault->key rather than the
fault->timestamp.
<br>
<br>
Finally, you need to add locking to the fault ring structure.
Currently
<br>
it's not protected by any locks because only one thread (the
interrupt
<br>
handler) accesses it. Now you have another thread that can
remove
<br>
entries, so you need to protect it with a lock. If you are
handling
<br>
retry faults, you know that the interrupt handler is really a
worker
<br>
thread, so you can use a mutex or a spin-lock, but it doesn't
need to be
<br>
interrupt-safe.
<br>
</blockquote>
<br>
I don't think you need a lock at all.
<br>
<br>
Just using cmpxchg() to update the key should do it.
<br>
<br>
Something like this:
<br>
<br>
hash = hash_64(key, AMDGPU_GMC_FAULT_HASH_ORDER);
<br>
fault =
&gmc->fault_ring[gmc->fault_hash[hash].idx];
<br>
while (fault->timestamp >= stamp) {
<br>
uint64_t tmp;
<br>
<br>
cmpxchg(&fault->key, key, 0);
<br>
</blockquote>
<br>
Good idea. Then we should probably use READ_ONCE and WRITE_ONCE to
access fault->key in other places.
<br>
</blockquote>
<p>I will use spinlock to protect fault ring and fault hash table
access, as atomic_cmpxchg cannot work for 52bit key and VG20 need
access fault ring in interrupt handler and deferred work.</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:9ea89549-afb4-ed48-4ff2-0ae740251d59@amd.com">
<br>
Regards,
<br>
Felix
<br>
<br>
<br>
<blockquote type="cite">
<br>
tmp = fault->timestamp;
<br>
fault = &gmc->fault_ring[fault->next];
<br>
<br>
/* Check if the entry was reused */
<br>
if (fault->timestamp >= tmp)
<br>
break;
<br>
}
<br>
<br>
Regards,
<br>
Christian.
<br>
<br>
<blockquote type="cite">
<br>
Regards,
<br>
Felix
<br>
<br>
<br>
<blockquote type="cite">+ fault->timestamp = 0;
<br>
+}
<br>
+
<br>
int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev)
<br>
{
<br>
int r;
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
index 9d11c02a3938..498a7a0d5a9e 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
@@ -318,6 +318,8 @@ void amdgpu_gmc_agp_location(struct
amdgpu_device *adev,
<br>
struct amdgpu_gmc *mc);
<br>
bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev,
uint64_t addr,
<br>
uint16_t pasid, uint64_t timestamp);
<br>
+void amdgpu_gmc_filter_faults_remove(struct amdgpu_device
*adev, uint64_t addr,
<br>
+ uint16_t pasid);
<br>
int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev);
<br>
void amdgpu_gmc_ras_fini(struct amdgpu_device *adev);
<br>
int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device
*adev);
<br>
</blockquote>
_______________________________________________
<br>
amd-gfx mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7Cfelix.kuehling%40amd.com%7C7d19870014ff40b6d80b08d9049ae8a0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637545885642357593%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=b0LnO9SQPgYskIRH0vCjKebvh%2FYzFfidRne15q2WIXw%3D&reserved=0">https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7Cfelix.kuehling%40amd.com%7C7d19870014ff40b6d80b08d9049ae8a0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637545885642357593%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=b0LnO9SQPgYskIRH0vCjKebvh%2FYzFfidRne15q2WIXw%3D&reserved=0</a>
<br>
</blockquote>
<br>
</blockquote>
</blockquote>
</body>
</html>