[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts
Lucas De Marchi
lucas.demarchi at intel.com
Wed Nov 15 16:51:41 UTC 2023
On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
>With the current implementation, a preemption or other kind of interrupt
>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
>interruption (specially in the case of preemption) might be long enough
>to cause a timeout without giving a chance of a new check on the
>register value on a next iteration, which would have happened otherwise.
>
>This issue causes some sporadic timeouts in some code paths. As an
>example, we were experiencing some rare timeouts when waiting for PLL
>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
>we found out that the PLL unlock was happening within the expected time
>period (20us), which suggested a bug in xe_mmio_wait32().
>
>To fix the issue, ensure that we call ktime_get_raw() to get the current
>time before we check the register value. This allows for a last check
>after a timeout and covers the case where the it is caused by some
>preemption/interrupt.
>
>This change was tested with the aforementioned PLL unlocking code path.
>Experiments showed that, before this change, we observed reported
>timeouts in 54 of 5000 runs; and, after this change, no timeouts were
>reported in 5000 runs.
good find :)
>
>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>---
> drivers/gpu/drm/xe/xe_mmio.h | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>index ae09f777d711..4d0e816a81ac 100644
>--- a/drivers/gpu/drm/xe/xe_mmio.h
>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>@@ -98,13 +98,22 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
> u32 read;
>
> for (;;) {
>+ cur = ktime_get_raw();
>+
>+ /*
>+ * Keep the compiler from re-ordering calls to ktime_get_raw()
>+ * and xe_mmio_read32(): reading the current time after reading
>+ * register has the potential for "fake timeouts" due to
>+ * preemption/interrupts in between the two.
>+ */
>+ barrier();
we are only protecting here against the compiler reordering this. I
don't think it will due to the external call to ktime_get_raw(). Do you
get any failures if you remove the barrier?
Anyway, we probably don't need a full barrier and just using
WRITE_ONCE() for both would be more appropriate.
I wonder if we could do this logic better and have a header / wait /
tail approach. In both header and tail we read the mmio and don't
care if timeout occurred.
+Rodrigo, shouldn't we move this to a .c? IMO it's very big for a
inline.
Lucas De Marchi
>+
> read = xe_mmio_read32(gt, reg);
> if ((read & mask) == val) {
> ret = 0;
> break;
> }
>
>- cur = ktime_get_raw();
> if (!ktime_before(cur, end))
> break;
>
>--
>2.42.0
>
More information about the Intel-xe
mailing list