<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2021-11-19 5:59 p.m., Felix Kuehling
wrote:<br>
</div>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">On
2021-11-19 3:22 p.m., Philip Yang wrote:
<br>
<blockquote type="cite">IH ring1 is used to process GPU retry
fault, overflow is enabled to
<br>
drain retry fault because we want receive other interrupts while
<br>
handling retry fault to recover range. There is no overflow flag
set
<br>
when wptr pass rptr. Use timestamp of rptr and wptr to handle
overflow
<br>
and drain retry fault.
<br>
<br>
Add helper function amdgpu_ih_decode_iv_ts to get 48bit
timestamp from
<br>
IV entry. drain retry fault check timestamp of rptr is larger
than
<br>
timestamp of (checkpoint_wptr - 32).
<br>
<br>
Add function amdgpu_ih_process1 to process IH ring1 until
timestamp of
<br>
rptr is larger then timestamp of (rptr + 32).
<br>
<br>
Signed-off-by: Philip Yang <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>
<br>
---
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 98
+++++++++++++++++++------
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 6 +-
<br>
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
<br>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
<br>
4 files changed, 80 insertions(+), 28 deletions(-)
<br>
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
<br>
index f3d62e196901..ad12f9d5d86a 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
<br>
@@ -165,51 +165,41 @@ void amdgpu_ih_ring_write(struct
amdgpu_ih_ring *ih, const uint32_t *iv,
<br>
}
<br>
/* Waiter helper that checks current rptr matches or passes
checkpoint wptr */
<br>
-static bool amdgpu_ih_has_checkpoint_processed(struct
amdgpu_device *adev,
<br>
+static bool amdgpu_ih_has_checkpoint_processed_ts(struct
amdgpu_device *adev,
<br>
struct amdgpu_ih_ring *ih,
<br>
- uint32_t checkpoint_wptr,
<br>
- uint32_t *prev_rptr)
<br>
+ uint64_t checkpoint_ts)
<br>
{
<br>
- uint32_t cur_rptr = ih->rptr | (*prev_rptr &
~ih->ptr_mask);
<br>
-
<br>
- /* rptr has wrapped. */
<br>
- if (cur_rptr < *prev_rptr)
<br>
- cur_rptr += ih->ptr_mask + 1;
<br>
- *prev_rptr = cur_rptr;
<br>
-
<br>
- /* check ring is empty to workaround missing wptr overflow
flag */
<br>
- return cur_rptr >= checkpoint_wptr ||
<br>
- (cur_rptr & ih->ptr_mask) ==
amdgpu_ih_get_wptr(adev, ih);
<br>
+ /* After wakeup, ih->rptr is the entry which is being
processed, check
<br>
+ * the timestamp of previous entry which is processed.
<br>
+ */
<br>
+ return checkpoint_ts <= amdgpu_ih_decode_iv_ts(ih,
ih->rptr - 32);
<br>
</blockquote>
<br>
This assumes a IV size of 32 bytes, which is not true for all
ASICs. On GFXv8 and older GPUs it's on 16. OTOH, those chips don't
have a ring1 and may not have a timestamp in the IV at all.
<br>
<br>
</blockquote>
ring1 process is enabled for vega10/20, navi10, not for older GPUs,
as it is scheduled from self-irq.<br>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">And I
think you need to be better at handling when the time stamps wrap.
Keep in mind that the number of valid bits may vary between ASICs.
<br>
<br>
</blockquote>
yes, 48bit time stamp will wrap around after 1 year, add function
amdgpu_ih_ts_after to compare time stamp with wrap around case.<br>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">
<br>
<blockquote type="cite"> }
<br>
/**
<br>
- * amdgpu_ih_wait_on_checkpoint_process - wait to process IVs
up to checkpoint
<br>
+ * amdgpu_ih_wait_on_checkpoint_process_ts - wait to process
IVs up to checkpoint
<br>
*
<br>
* @adev: amdgpu_device pointer
<br>
* @ih: ih ring to process
<br>
*
<br>
* Used to ensure ring has processed IVs up to the checkpoint
write pointer.
<br>
*/
<br>
-int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
*adev,
<br>
+int amdgpu_ih_wait_on_checkpoint_process_ts(struct
amdgpu_device *adev,
<br>
struct amdgpu_ih_ring *ih)
<br>
</blockquote>
<br>
If this function is only meant to work on ring1 now, we should
probably ensure that by checking that ih is really ring1.
<br>
<br>
</blockquote>
Add dev_WARN_ONCE(adev->dev, ih != &adev->irq.ih1, "not
ring1")) to ensure this is only for ring1.<br>
<br>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">Do we need
to keep the old solution for Vega20, which doesn't reroute
interrupts to ring1?
<br>
<br>
</blockquote>
Vega20 is not changed, because Vega20 retry fault delegate to
ih_soft, and wptr overflow is not enabled on ih_soft, no stale retry
fault issue, keep old interrupt handler for ih_soft.<br>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">
<br>
<blockquote type="cite"> {
<br>
- uint32_t checkpoint_wptr, rptr;
<br>
+ uint32_t checkpoint_wptr;
<br>
+ uint64_t checkpoint_ts;
<br>
if (!ih->enabled || adev->shutdown)
<br>
return -ENODEV;
<br>
checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
<br>
- /* Order wptr with rptr. */
<br>
+ /* Order wptr with ring data. */
<br>
rmb();
<br>
- rptr = READ_ONCE(ih->rptr);
<br>
-
<br>
- /* wptr has wrapped. */
<br>
- if (rptr > checkpoint_wptr)
<br>
- checkpoint_wptr += ih->ptr_mask + 1;
<br>
+ checkpoint_ts = amdgpu_ih_decode_iv_ts(ih, checkpoint_wptr
- 32);
<br>
</blockquote>
<br>
Same as above.
<br>
</blockquote>
done in v3<br>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">
<br>
<br>
<blockquote type="cite"> return
wait_event_interruptible(ih->wait_process,
<br>
- amdgpu_ih_has_checkpoint_processed(adev, ih,
<br>
- checkpoint_wptr, &rptr));
<br>
+ amdgpu_ih_has_checkpoint_processed_ts(adev, ih,
<br>
+ checkpoint_ts));
<br>
}
<br>
/**
<br>
@@ -253,6 +243,56 @@ int amdgpu_ih_process(struct amdgpu_device
*adev, struct amdgpu_ih_ring *ih)
<br>
return IRQ_HANDLED;
<br>
}
<br>
+/**
<br>
+ * amdgpu_ih_process1 - interrupt handler work for IH ring1
<br>
+ *
<br>
+ * @adev: amdgpu_device pointer
<br>
+ * @ih: ih ring to process
<br>
+ *
<br>
+ * Interrupt handler of IH ring1, walk the IH ring1.
<br>
+ * Returns irq process return code.
<br>
+ */
<br>
+int amdgpu_ih_process1(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih)
<br>
+{
<br>
+ uint64_t ts, ts_next;
<br>
+ unsigned int count;
<br>
+ u32 wptr;
<br>
+
<br>
+ if (!ih->enabled || adev->shutdown)
<br>
+ return IRQ_NONE;
<br>
+
<br>
+ wptr = amdgpu_ih_get_wptr(adev, ih);
<br>
+ if (ih->rptr == wptr)
<br>
+ return 0;
<br>
+
<br>
+restart_ih:
<br>
+ count = AMDGPU_IH_MAX_NUM_IVS;
<br>
+
<br>
+ ts = amdgpu_ih_decode_iv_ts(ih, ih->rptr);
<br>
+ ts_next = amdgpu_ih_decode_iv_ts(ih, ih->rptr + 32);
<br>
</blockquote>
<br>
Same as above.
<br>
<br>
</blockquote>
Done in v3<br>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">
<br>
<blockquote type="cite">+ while (ts < ts_next &&
--count) {
<br>
+ amdgpu_irq_dispatch(adev, ih);
<br>
+ ih->rptr &= ih->ptr_mask;
<br>
+ ts = ts_next;
<br>
+ ts_next = amdgpu_ih_decode_iv_ts(ih, ih->rptr + 32);
<br>
+ }
<br>
+ /*
<br>
+ * Process the last timestamp updated entry or one more
entry
<br>
+ * if count = 0, ts is timestamp of the entry.
<br>
+ */
<br>
+ amdgpu_irq_dispatch(adev, ih);
<br>
+ amdgpu_ih_set_rptr(adev, ih);
<br>
+ wake_up_all(&ih->wait_process);
<br>
+
<br>
+ wptr = amdgpu_ih_get_wptr(adev, ih);
<br>
+ /* Order reading of wptr vs. reading of IH ring data */
<br>
+ rmb();
<br>
+ if (ts < amdgpu_ih_decode_iv_ts(ih, wptr - 32))
<br>
+ goto restart_ih;
<br>
+
<br>
+ return IRQ_HANDLED;
<br>
+}
<br>
+
<br>
/**
<br>
* amdgpu_ih_decode_iv_helper - decode an interrupt vector
<br>
*
<br>
@@ -298,3 +338,13 @@ void amdgpu_ih_decode_iv_helper(struct
amdgpu_device *adev,
<br>
/* wptr/rptr are in bytes! */
<br>
ih->rptr += 32;
<br>
}
<br>
+
<br>
+uint64_t amdgpu_ih_decode_iv_ts(struct amdgpu_ih_ring *ih, u32
rptr)
<br>
</blockquote>
<br>
This function needs to be in IP-version-specific code. Maybe add
an offset parameter, that way you can handle different IV sizes in
different ASIC generations.
<br>
<br>
</blockquote>
<p>Add decode_iv_ts function interface to amdgpu_ih_function, this
will be used for different ASICs, to handle different IV size and
time stamp offset. vega*, navi* ASICs use this as helper function.</p>
<p>Regards,</p>
<p>Philip<br>
</p>
<blockquote type="cite" cite="mid:2a610519-689f-c3d0-3675-59f7c64eaf4f@amd.com">
<br>
<blockquote type="cite">+{
<br>
+ uint32_t index = (rptr & ih->ptr_mask) >> 2;
<br>
+ uint32_t dw1, dw2;
<br>
+
<br>
+ dw1 = ih->ring[index + 1];
<br>
+ dw2 = ih->ring[index + 2];
<br>
+ return dw1 | ((u64)(dw2 & 0xffff) << 32);
<br>
+}
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
<br>
index 0649b59830a5..15e8fe0e5e40 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
<br>
@@ -89,10 +89,12 @@ int amdgpu_ih_ring_init(struct amdgpu_device
*adev, struct amdgpu_ih_ring *ih,
<br>
void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih);
<br>
void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const
uint32_t *iv,
<br>
unsigned int num_dw);
<br>
-int amdgpu_ih_wait_on_checkpoint_process(struct amdgpu_device
*adev,
<br>
- struct amdgpu_ih_ring *ih);
<br>
+int amdgpu_ih_wait_on_checkpoint_process_ts(struct
amdgpu_device *adev,
<br>
+ struct amdgpu_ih_ring *ih);
<br>
int amdgpu_ih_process(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih);
<br>
+int amdgpu_ih_process1(struct amdgpu_device *adev, struct
amdgpu_ih_ring *ih);
<br>
void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev,
<br>
struct amdgpu_ih_ring *ih,
<br>
struct amdgpu_iv_entry *entry);
<br>
+uint64_t amdgpu_ih_decode_iv_ts(struct amdgpu_ih_ring *ih, u32
rptr);
<br>
#endif
<br>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
<br>
index e9023687dc9a..891486cca94b 100644
<br>
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
<br>
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
<br>
@@ -224,7 +224,7 @@ static void amdgpu_irq_handle_ih1(struct
work_struct *work)
<br>
struct amdgpu_device *adev = container_of(work, struct
amdgpu_device,
<br>
irq.ih1_work);
<br>
- amdgpu_ih_process(adev, &adev->irq.ih1);
<br>
+ amdgpu_ih_process1(adev, &adev->irq.ih1);
<br>
}
<br>
/**
<br>
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
index 88360f23eb61..9e566ec54cf5 100644
<br>
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
<br>
@@ -1968,7 +1968,7 @@ static void
svm_range_drain_retry_fault(struct svm_range_list *svms)
<br>
pr_debug("drain retry fault gpu %d svms %p\n", i,
svms);
<br>
-
amdgpu_ih_wait_on_checkpoint_process(pdd->dev->adev,
<br>
+
amdgpu_ih_wait_on_checkpoint_process_ts(pdd->dev->adev,
<br>
&pdd->dev->adev->irq.ih1);
<br>
pr_debug("drain retry fault gpu %d svms 0x%p done\n",
i, svms);
<br>
}
<br>
</blockquote>
</blockquote>
</body>
</html>