<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2021-04-28 5:00 a.m., Christian
König wrote:<br>
</div>
<blockquote type="cite" cite="mid:47a28337-1abe-4f94-b96d-3ebba32a82aa@gmail.com">
<br>
<br>
Am 28.04.21 um 09:53 schrieb Felix Kuehling:
<br>
<blockquote type="cite">Am 2021-04-28 um 2:54 a.m. schrieb
Christian König:
<br>
<blockquote type="cite">Am 27.04.21 um 20:21 schrieb Felix
Kuehling:
<br>
<blockquote type="cite">On 2021-04-27 10:51 a.m., Philip Yang
wrote:
<br>
<blockquote type="cite">Add interface to remove address from
fault filter ring by resetting
<br>
fault ring entry key, then future vm fault on the address
will be
<br>
processed to recover.
<br>
<br>
Define fault key as atomic64_t type to use atomic
read/set/cmpxchg key
<br>
to protect fault ring access by interrupt handler and
interrupt
<br>
deferred
<br>
work for vg20. Change fault->timestamp to 48-bit to
share same uint64_t
<br>
with 8-bit fault->next, it is enough for 48bit IH
timestamp.
<br>
<br>
Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48
<br>
++++++++++++++++++++++---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 ++--
<br>
2 files changed, 48 insertions(+), 6 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
index c39ed9eb0987..a2e81e913abb 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
<br>
@@ -332,6 +332,17 @@ void amdgpu_gmc_agp_location(struct
<br>
amdgpu_device *adev, struct amdgpu_gmc *mc)
<br>
mc->agp_size >> 20,
mc->agp_start, mc->agp_end);
<br>
}
<br>
+/**
<br>
+ * amdgpu_gmc_fault_key - get hask key from vm fault
address and pasid
<br>
+ *
<br>
+ * @addr: 48bit physical address
<br>
+ * @pasid: 4 bit
<br>
</blockquote>
This comment is misleading. The PASID is not 4-bit. It's
16-bit. But
<br>
shifting the address by 4 bit is sufficient because the
address is
<br>
page-aligned, which means the low 12 bits are 0. So this
would be
<br>
more accurate:
<br>
<br>
@addr: 48 bit physical address, page aligned (36 significant
bits)
<br>
@pasid: 16 bit process address space identifier
<br>
<br>
With that fixed, the patch is
<br>
<br>
Reviewed-by: Felix Kuehling <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>
<br>
<br>
<br>
<blockquote type="cite">+ */
<br>
+static inline uint64_t amdgpu_gmc_fault_key(uint64_t
addr, uint16_t
<br>
pasid)
<br>
+{
<br>
+ return addr << 4 | pasid;
<br>
+}
<br>
</blockquote>
</blockquote>
Maybe changing this so that we have "addr * ((1 << 16) /
<br>
AMDGPU_PAGE_SIZE) | pasid" would help to better document that?
<br>
</blockquote>
I find this a mix of multiplication, division and bit-shift more
<br>
confusing. How about "addr << (16 - AMDGPU_GPU_PAGE_SHIFT)
| pasid"?
<br>
</blockquote>
<br>
Yeah, that is even better.
<br>
</blockquote>
<p>I have pushed the patch yesterday, didn't pick this, should wait
longer before push in future.</p>
<p>Thanks,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:47a28337-1abe-4f94-b96d-3ebba32a82aa@gmail.com">
<br>
Regards,
<br>
Christian.
<br>
<br>
<blockquote type="cite">
<br>
Regards,
<br>
Felix
<br>
<br>
<br>
<blockquote type="cite">Christian.
<br>
<br>
<blockquote type="cite">
<blockquote type="cite">+
<br>
/**
<br>
* amdgpu_gmc_filter_faults - filter VM faults
<br>
*
<br>
@@ -348,8 +359,7 @@ bool amdgpu_gmc_filter_faults(struct
<br>
amdgpu_device *adev, uint64_t addr,
<br>
uint16_t pasid, uint64_t timestamp)
<br>
{
<br>
struct amdgpu_gmc *gmc = &adev->gmc;
<br>
-
<br>
- uint64_t stamp, key = addr << 4 | pasid;
<br>
+ uint64_t stamp, key = amdgpu_gmc_fault_key(addr,
pasid);
<br>
struct amdgpu_gmc_fault *fault;
<br>
uint32_t hash;
<br>
@@ -365,7 +375,7 @@ bool
amdgpu_gmc_filter_faults(struct
<br>
amdgpu_device *adev, uint64_t addr,
<br>
while (fault->timestamp >= stamp) {
<br>
uint64_t tmp;
<br>
- if (fault->key == key)
<br>
+ if (atomic64_read(&fault->key) == key)
<br>
return true;
<br>
tmp = fault->timestamp;
<br>
@@ -378,7 +388,7 @@ bool amdgpu_gmc_filter_faults(struct
<br>
amdgpu_device *adev, uint64_t addr,
<br>
/* Add the fault to the ring */
<br>
fault =
&gmc->fault_ring[gmc->last_fault];
<br>
- fault->key = key;
<br>
+ atomic64_set(&fault->key, key);
<br>
fault->timestamp = timestamp;
<br>
/* And update the hash */
<br>
@@ -387,6 +397,36 @@ bool amdgpu_gmc_filter_faults(struct
<br>
amdgpu_device *adev, uint64_t addr,
<br>
return false;
<br>
}
<br>
+/**
<br>
+ * amdgpu_gmc_filter_faults_remove - remove address from
VM faults
<br>
filter
<br>
+ *
<br>
+ * @adev: amdgpu device structure
<br>
+ * @addr: address of the VM fault
<br>
+ * @pasid: PASID of the process causing the fault
<br>
+ *
<br>
+ * Remove the address from fault filter, then future vm
fault on
<br>
this address
<br>
+ * will pass to retry fault handler to recover.
<br>
+ */
<br>
+void amdgpu_gmc_filter_faults_remove(struct amdgpu_device
*adev,
<br>
uint64_t addr,
<br>
+ uint16_t pasid)
<br>
+{
<br>
+ struct amdgpu_gmc *gmc = &adev->gmc;
<br>
+ uint64_t key = amdgpu_gmc_fault_key(addr, pasid);
<br>
+ struct amdgpu_gmc_fault *fault;
<br>
+ uint32_t hash;
<br>
+ uint64_t tmp;
<br>
+
<br>
+ hash = hash_64(key, AMDGPU_GMC_FAULT_HASH_ORDER);
<br>
+ fault =
&gmc->fault_ring[gmc->fault_hash[hash].idx];
<br>
+ do {
<br>
+ if (atomic64_cmpxchg(&fault->key, key, 0)
== key)
<br>
+ break;
<br>
+
<br>
+ tmp = fault->timestamp;
<br>
+ fault = &gmc->fault_ring[fault->next];
<br>
+ } while (fault->timestamp < tmp);
<br>
+}
<br>
+
<br>
int amdgpu_gmc_ras_late_init(struct amdgpu_device
*adev)
<br>
{
<br>
int r;
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
index 9d11c02a3938..6aa1d52d3aee 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
<br>
@@ -66,9 +66,9 @@ struct firmware;
<br>
* GMC page fault information
<br>
*/
<br>
struct amdgpu_gmc_fault {
<br>
- uint64_t timestamp;
<br>
+ uint64_t timestamp:48;
<br>
uint64_t next:AMDGPU_GMC_FAULT_RING_ORDER;
<br>
- uint64_t key:52;
<br>
+ atomic64_t key;
<br>
};
<br>
/*
<br>
@@ -318,6 +318,8 @@ void amdgpu_gmc_agp_location(struct
<br>
amdgpu_device *adev,
<br>
struct amdgpu_gmc *mc);
<br>
bool amdgpu_gmc_filter_faults(struct amdgpu_device
*adev, uint64_t
<br>
addr,
<br>
uint16_t pasid, uint64_t timestamp);
<br>
+void amdgpu_gmc_filter_faults_remove(struct amdgpu_device
*adev,
<br>
uint64_t addr,
<br>
+ uint16_t pasid);
<br>
int amdgpu_gmc_ras_late_init(struct amdgpu_device
*adev);
<br>
void amdgpu_gmc_ras_fini(struct amdgpu_device *adev);
<br>
int amdgpu_gmc_allocate_vm_inv_eng(struct amdgpu_device
*adev);
<br>
</blockquote>
_______________________________________________
<br>
amd-gfx mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CPhilip.Yang%40amd.com%7C2299a55733334713a8cd08d90a241de7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637551972517063081%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=s20dz4IfFjBDvMJeq1ozZdr3MvE7v7HrxOoqhRInLfg%3D&reserved=0">https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CPhilip.Yang%40amd.com%7C2299a55733334713a8cd08d90a241de7%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637551972517063081%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=s20dz4IfFjBDvMJeq1ozZdr3MvE7v7HrxOoqhRInLfg%3D&reserved=0</a>
<br>
</blockquote>
</blockquote>
</blockquote>
<br>
</blockquote>
</body>
</html>