[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts

Gustavo Sousa gustavo.sousa at intel.com
Thu Nov 16 13:42:32 UTC 2023


Quoting Lucas De Marchi (2023-11-15 17:04:37-03:00)
>On Wed, Nov 15, 2023 at 02:58:37PM -0300, Gustavo Sousa wrote:
>>Quoting Lucas De Marchi (2023-11-15 13:51:41-03:00)
>>>On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
>>>>With the current implementation, a preemption or other kind of interrupt
>>>>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
>>>>interruption (especially in the case of preemption) might be long enough
>>>>to cause a timeout without giving the loop a chance to check the register
>>>>value again on the next iteration, as it otherwise would.
>>>>
>>>>This issue causes some sporadic timeouts in some code paths. As an
>>>>example, we were experiencing some rare timeouts when waiting for PLL
>>>>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
>>>>we found out that the PLL unlock was happening within the expected time
>>>>period (20us), which suggested a bug in xe_mmio_wait32().
>>>>
>>>>To fix the issue, ensure that we call ktime_get_raw() to get the current
>>>>time before we check the register value. This allows for a last check
>>>>after a timeout and covers the case where it is caused by some
>>>>preemption/interrupt.
>>>>
>>>>This change was tested with the aforementioned PLL unlocking code path.
>>>>Experiments showed that, before this change, timeouts were reported in
>>>>54 of 5000 runs; after this change, no timeouts were reported in 5000
>>>>runs.
>>>
>>>good find :)
>>>
>>>>
>>>>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>>>>---
>>>> drivers/gpu/drm/xe/xe_mmio.h | 11 ++++++++++-
>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>>>index ae09f777d711..4d0e816a81ac 100644
>>>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>>>@@ -98,13 +98,22 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>>>>         u32 read;
>>>>
>>>>         for (;;) {
>>>>+                cur = ktime_get_raw();
>>>>+
>>>>+                /*
>>>>+                 * Keep the compiler from re-ordering calls to ktime_get_raw()
>>>>+                 * and xe_mmio_read32(): reading the current time after reading the
>>>>+                 * register has the potential for "fake timeouts" due to
>>>>+                 * preemption/interrupts in between the two.
>>>>+                 */
>>>>+                barrier();
>>>
>>>we are only protecting here against the compiler reordering this. I
>>>don't think it will due to the external call to ktime_get_raw(). Do you
>>>get any failures if you remove the barrier?
>>
>>Makes sense.
>>
>>I put the barrier here to be sure that the compiler doesn't surprise us, but I
>>haven't tested without it. Maybe it is not really necessary indeed...
>>
>>>
>>>Anyway, we probably don't need a full barrier and just using
>>>WRITE_ONCE() for both would be more appropriate.
>>>
>>>I wonder if we could do this logic better and have a header / wait /
>>>tail approach. In both header and tail we read the mmio and don't
>>>care if timeout occurred.
>>
>>Yeah. I was thinking about something similar, but ended up sending it this
>>way to have a single place doing the register read.
>>
>>The difference from your suggestion is that I was thinking about adding one
>>extra read outside the loop in case ret is still non-zero.
>>Basically something like below (the patch below would be a substitute for this
>>one):
>>
>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>index ae09f777d711..2b65d8387cd1 100644
>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>@@ -118,6 +118,12 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>>                 wait <<= 1;
>>         }
>>
>>+        if (ret != 0) {
>>+                read = xe_mmio_read32(gt, reg);
>>+                if ((read & mask) == val)
>>+                        ret = 0;
>>+        }
>
>
>yeah, I prefer this no-barrier version. Small tweak on that, maybe:
>
>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>index ae09f777d711..bb8f079e3cb6 100644
>--- a/drivers/gpu/drm/xe/xe_mmio.h
>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>@@ -93,16 +93,13 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>  {
>          ktime_t cur = ktime_get_raw();
>          const ktime_t end = ktime_add_us(cur, timeout_us);
>-        int ret = -ETIMEDOUT;
>          s64 wait = 10;
>          u32 read;
>  
>          for (;;) {
>                  read = xe_mmio_read32(gt, reg);
>-                if ((read & mask) == val) {
>-                        ret = 0;
>-                        break;
>-                }
>+                if ((read & mask) == val)
>+                        goto done;
>  
>                  cur = ktime_get_raw();
>                  if (!ktime_before(cur, end))
>@@ -118,10 +115,17 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>                  wait <<= 1;
>          }
>  
>+        read = xe_mmio_read32(gt, reg);
>+        if ((read & mask) == val)
>+                goto done;
>+
>+        return -ETIMEDOUT;

I believe we still want *out_val to be filled even when timing out.
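
Something along these lines, perhaps (untested sketch, combining the extra
read after the loop with filling *out_val on the timeout path as well; it
keeps ret around, unlike the tweak above):

        ret = -ETIMEDOUT;

        /* One last check after the deadline has passed. */
        read = xe_mmio_read32(gt, reg);
        if ((read & mask) == val)
                ret = 0;

        /* Report the final value whether we timed out or not. */
        if (out_val)
                *out_val = read;

        return ret;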

--
Gustavo Sousa

>+
>+done:
>          if (out_val)
>                  *out_val = read;
>  
>-        return ret;
>+        return 0;
>  }
>  
>  int xe_mmio_ioctl(struct drm_device *dev, void *data,
>
>
>.... it doesn't remove the double call to ktime_get_raw() at the
>beginning on the fast path, but makes this shorter.
>
>Lucas De Marchi

