[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts

Gustavo Sousa gustavo.sousa at intel.com
Wed Nov 15 17:58:37 UTC 2023


Quoting Lucas De Marchi (2023-11-15 13:51:41-03:00)
>On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
>>With the current implementation, a preemption or other kind of interrupt
>>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
>>interruption (specially in the case of preemption) might be long enough
>>to cause a timeout without giving a chance of a new check on the
>>register value on a next iteration, which would have happened otherwise.
>>
>>This issue causes some sporadic timeouts in some code paths. As an
>>example, we were experiencing some rare timeouts when waiting for PLL
>>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
>>we found out that the PLL unlock was happening within the expected time
>>period (20us), which suggested a bug in xe_mmio_wait32().
>>
>>To fix the issue, ensure that we call ktime_get_raw() to get the current
>>time before we check the register value. This allows for a last check
>>after a timeout and covers the case where the it is caused by some
>>preemption/interrupt.
>>
>>This change was tested with the aforementioned PLL unlocking code path.
>>Experiments showed that, before this change, we observed reported
>>timeouts in 54 of 5000 runs; and, after this change, no timeouts were
>>reported in 5000 runs.
>
>good find :)
>
>>
>>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>>---
>> drivers/gpu/drm/xe/xe_mmio.h | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>index ae09f777d711..4d0e816a81ac 100644
>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>@@ -98,13 +98,22 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>>         u32 read;
>>
>>         for (;;) {
>>+                cur = ktime_get_raw();
>>+
>>+                /*
>>+                 * Keep the compiler from re-ordering calls to ktime_get_raw()
>>+                 * and xe_mmio_read32(): reading the current time after reading
>>+                 * register has the potential for "fake timeouts" due to
>>+                 * preemption/interrupts in between the two.
>>+                 */
>>+                barrier();
>
>we are only protecting here against the compiler reordering this. I
>don't think it will due to the external call to ktime_get_raw(). Do you
>get any failures if you remove the barrier?

Makes sense.

I put the barrier here to be sure that the compiler doesn't surprise us, but I
haven't tested without it. Maybe it is not really necessary indeed...

>
>Anyway, we probably don't need a full barrier and just using
>WRITE_ONCE() for both would be more appropriate.
>
>I wonder if we could do this logic better and have a header / wait /
>tail approach. In both header and tail we read the mmio and don't
>care if timeout occurred.

Yeah. I was thinking about a something similar, but ended up sending this way to
have a single place doing the register read.

The difference from your suggestion is that is that I was thinking about
adding an extra read once out of the loop in case ret is still non-zero.
Basically something like below (the patch below would be a substitute for this
one):

diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
index ae09f777d711..2b65d8387cd1 100644
--- a/drivers/gpu/drm/xe/xe_mmio.h
+++ b/drivers/gpu/drm/xe/xe_mmio.h
@@ -118,6 +118,12 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
 		wait <<= 1;
 	}
 
+	if (ret != 0) {
+		read = xe_mmio_read32(gt, reg);
+		if ((read & mask) == val)
+			ret = 0;
+	}
+
 	if (out_val)
 		*out_val = read;
 
--
Gustavo Sousa

>
>+Rodrigo, shouldn't we move this to a .c? IMO it's very big for a
>inline.
>
>Lucas De Marchi
>
>>+
>>                 read = xe_mmio_read32(gt, reg);
>>                 if ((read & mask) == val) {
>>                         ret = 0;
>>                         break;
>>                 }
>>
>>-                cur = ktime_get_raw();
>>                 if (!ktime_before(cur, end))
>>                         break;
>>
>>-- 
>>2.42.0
>>


More information about the Intel-xe mailing list