<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2021-11-25 2:11 a.m., Christian
König wrote:<br>
</div>
<blockquote type="cite" cite="mid:845473a5-7c90-06ec-fb24-5e8c640aab0f@amd.com">Am
25.11.21 um 02:56 schrieb Felix Kuehling:
<br>
<blockquote type="cite">Am 2021-11-24 um 5:58 p.m. schrieb Philip
Yang:
<br>
[SNIP]
<br>
<blockquote type="cite"> #define amdgpu_ih_get_wptr(adev, ih)
(adev)->irq.ih_funcs->get_wptr((adev), (ih))
<br>
#define amdgpu_ih_decode_iv(adev, iv) \
<br>
(adev)->irq.ih_funcs->decode_iv((adev), (ih),
(iv))
<br>
+#define amdgpu_ih_decode_iv_ts(adev, ih, rptr, offset) \
<br>
+ (WARN_ON_ONCE(!(adev)->irq.ih_funcs->decode_iv_ts)
? 0 : \
<br>
+ (adev)->irq.ih_funcs->decode_iv_ts((ih), (rptr),
(offset)))
<br>
#define amdgpu_ih_set_rptr(adev, ih)
(adev)->irq.ih_funcs->set_rptr((adev), (ih))
<br>
int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih,
<br>
@@ -89,10 +99,12 @@ int amdgpu_ih_ring_init(struct
amdgpu_device *adev, struct amdgpu_ih_ring *ih,
<br>
void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih);
<br>
void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const
uint32_t *iv,
<br>
unsigned int num_dw);
<br>
-int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
*adev,
<br>
- struct amdgpu_ih_ring *ih);
<br>
+int amdgpu_ih_wait_on_checkpoint_process_ts(struct
amdgpu_device *adev,
<br>
+ struct amdgpu_ih_ring *ih);
<br>
int amdgpu_ih_process(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih);
<br>
void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev,
<br>
struct amdgpu_ih_ring *ih,
<br>
struct amdgpu_iv_entry *entry);
<br>
+uint64_t amdgpu_ih_decode_iv_ts_helper(struct amdgpu_ih_ring
*ih, u32 rptr,
<br>
+ signed int offset);
<br>
#endif
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
<br>
index 3ec5ff5a6dbe..b129898db433 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
<br>
@@ -119,6 +119,11 @@ static int
gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
<br>
return 1;
<br>
}
<br>
+ /* Stale retry fault if timestamp goes backward */
<br>
+ if (entry->ih == &adev->irq.ih1 &&
<br>
+ amdgpu_ih_ts_after(entry->timestamp,
entry->ih->processed_timestamp))
<br>
+ return 1;
<br>
+
<br>
</blockquote>
This check should go before amdgpu_gmc_filter_faults. Otherwise
<br>
amdgpu_gmc_filter_faults may later drop a real fault because it
added a
<br>
stale fault in its hash table.
<br>
</blockquote>
<br>
I was already wondering if we shouldn't move all of this
completely into amdgpu_gmc_filter_faults().
<br>
<br>
I mean essentially we are filtering faults here once more, just
based on a different criteria.
<br>
</blockquote>
<p>It is good idea, it also removes duplicate code in different
interrupt handler. And retry fault timestamp check for both ring0
and ring1.<br>
</p>
<p>Thanks,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:845473a5-7c90-06ec-fb24-5e8c640aab0f@amd.com">
<br>
Regards,
<br>
Christian.
<br>
<br>
</blockquote>
</body>
</html>