[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts

Lucas De Marchi lucas.demarchi at intel.com
Wed Nov 15 16:51:41 UTC 2023


On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
>With the current implementation, a preemption or other kind of interrupt
>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
>interruption (specially in the case of preemption) might be long enough
>to cause a timeout without giving a chance of a new check on the
>register value on a next iteration, which would have happened otherwise.
>
>This issue causes some sporadic timeouts in some code paths. As an
>example, we were experiencing some rare timeouts when waiting for PLL
>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
>we found out that the PLL unlock was happening within the expected time
>period (20us), which suggested a bug in xe_mmio_wait32().
>
>To fix the issue, ensure that we call ktime_get_raw() to get the current
>time before we check the register value. This allows for a last check
>after a timeout and covers the case where the it is caused by some
>preemption/interrupt.
>
>This change was tested with the aforementioned PLL unlocking code path.
>Experiments showed that, before this change, we observed reported
>timeouts in 54 of 5000 runs; and, after this change, no timeouts were
>reported in 5000 runs.

good find :)

>
>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>---
> drivers/gpu/drm/xe/xe_mmio.h | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>index ae09f777d711..4d0e816a81ac 100644
>--- a/drivers/gpu/drm/xe/xe_mmio.h
>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>@@ -98,13 +98,22 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
> 	u32 read;
>
> 	for (;;) {
>+		cur = ktime_get_raw();
>+
>+		/*
>+		 * Keep the compiler from re-ordering calls to ktime_get_raw()
>+		 * and xe_mmio_read32(): reading the current time after reading
>+		 * register has the potential for "fake timeouts" due to
>+		 * preemption/interrupts in between the two.
>+		 */
>+		barrier();

we are only protecting here against the compiler reordering this. I
don't think it will due to the external call to ktime_get_raw(). Do you
get any failures if you remove the barrier?

Anyway, we probably don't need a full barrier and just using
WRITE_ONCE() for both would be more appropriate.

I wonder if we could do this logic better and have a header / wait /
tail approach. In both header and tail we read the mmio and don't
care if timeout occurred.

+Rodrigo, shouldn't we move this to a .c? IMO it's very big for a
inline.

Lucas De Marchi

>+
> 		read = xe_mmio_read32(gt, reg);
> 		if ((read & mask) == val) {
> 			ret = 0;
> 			break;
> 		}
>
>-		cur = ktime_get_raw();
> 		if (!ktime_before(cur, end))
> 			break;
>
>-- 
>2.42.0
>


More information about the Intel-xe mailing list