[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts

Wed Nov 15 20:04:37 UTC 2023

On Wed, Nov 15, 2023 at 02:58:37PM -0300, Gustavo Sousa wrote:
>Quoting Lucas De Marchi (2023-11-15 13:51:41-03:00)
>>On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
>>>With the current implementation, a preemption or other kind of interrupt
>>>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
>>>interruption (specially in the case of preemption) might be long enough
>>>to cause a timeout without giving a chance of a new check on the
>>>register value on a next iteration, which would have happened otherwise.
>>>
>>>This issue causes some sporadic timeouts in some code paths. As an
>>>example, we were experiencing some rare timeouts when waiting for PLL
>>>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
>>>we found out that the PLL unlock was happening within the expected time
>>>period (20us), which suggested a bug in xe_mmio_wait32().
>>>
>>>To fix the issue, ensure that we call ktime_get_raw() to get the current
>>>time before we check the register value. This allows for a last check
>>>after a timeout and covers the case where the it is caused by some
>>>preemption/interrupt.
>>>
>>>This change was tested with the aforementioned PLL unlocking code path.
>>>Experiments showed that, before this change, we observed reported
>>>timeouts in 54 of 5000 runs; and, after this change, no timeouts were
>>>reported in 5000 runs.
>>
>>good find :)
>>
>>>
>>>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>>>---
>>> drivers/gpu/drm/xe/xe_mmio.h | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>>index ae09f777d711..4d0e816a81ac 100644
>>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>>@@ -98,13 +98,22 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>>>         u32 read;
>>>
>>>         for (;;) {
>>>+                cur = ktime_get_raw();
>>>+
>>>+                /*
>>>+                 * Keep the compiler from re-ordering calls to ktime_get_raw()
>>>+                 * and xe_mmio_read32(): reading the current time after reading
>>>+                 * register has the potential for "fake timeouts" due to
>>>+                 * preemption/interrupts in between the two.
>>>+                 */
>>>+                barrier();
>>
>>we are only protecting here against the compiler reordering this. I
>>don't think it will due to the external call to ktime_get_raw(). Do you
>>get any failures if you remove the barrier?
>
>Makes sense.
>
>I put the barrier here to be sure that the compiler doesn't surprise us, but I
>haven't tested without it. Maybe it is not really necessary indeed...
>
>>
>>Anyway, we probably don't need a full barrier and just using
>>WRITE_ONCE() for both would be more appropriate.
>>
>>I wonder if we could do this logic better and have a header / wait /
>>tail approach. In both header and tail we read the mmio and don't
>>care if timeout occurred.
>
>Yeah. I was thinking about a something similar, but ended up sending this way to
>have a single place doing the register read.
>
>The difference from your suggestion is that is that I was thinking about
>adding an extra read once out of the loop in case ret is still non-zero.
>Basically something like below (the patch below would be a substitute for this
>one):
>
>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>index ae09f777d711..2b65d8387cd1 100644
>--- a/drivers/gpu/drm/xe/xe_mmio.h
>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>@@ -118,6 +118,12 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
> 		wait <<= 1;
> 	}
>
>+	if (ret != 0) {
>+		read = xe_mmio_read32(gt, reg);
>+		if ((read & mask) == val)
>+			ret = 0;
>+	}


yeah, I prefer this no barrier version. Small tweak on that maybe

diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
index ae09f777d711..bb8f079e3cb6 100644
--- a/drivers/gpu/drm/xe/xe_mmio.h
+++ b/drivers/gpu/drm/xe/xe_mmio.h
@@ -93,16 +93,13 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
  {
  	ktime_t cur = ktime_get_raw();
  	const ktime_t end = ktime_add_us(cur, timeout_us);
-	int ret = -ETIMEDOUT;
  	s64 wait = 10;
  	u32 read;
  
  	for (;;) {
  		read = xe_mmio_read32(gt, reg);
-		if ((read & mask) == val) {
-			ret = 0;
-			break;
-		}
+		if ((read & mask) == val)
+			goto done;
  
  		cur = ktime_get_raw();
  		if (!ktime_before(cur, end))
@@ -118,10 +115,17 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
  		wait <<= 1;
  	}
  
+	read = xe_mmio_read32(gt, reg);
+	if ((read & mask) == val)
+		goto done;
+
+	return -ETIMEDOUT;
+
+done:
  	if (out_val)
  		*out_val = read;
  
-	return ret;
+	return 0;
  }
  
  int xe_mmio_ioctl(struct drm_device *dev, void *data,
  int xe_mmio_ioctl(struct drm_device *dev, void *data,


.... it doesn't remove the double call to ktime_get_raw() at the
beginning on the fast path, but makes this shorter.

Lucas De Marchi