[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts
Lucas De Marchi
lucas.demarchi at intel.com
Thu Nov 16 17:56:42 UTC 2023
On Thu, Nov 16, 2023 at 10:42:32AM -0300, Gustavo Sousa wrote:
>Quoting Lucas De Marchi (2023-11-15 17:04:37-03:00)
>>On Wed, Nov 15, 2023 at 02:58:37PM -0300, Gustavo Sousa wrote:
>>>Quoting Lucas De Marchi (2023-11-15 13:51:41-03:00)
>>>>On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
>>>>>With the current implementation, a preemption or other kind of interrupt
>>>>>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
>>>>>interruption (especially in the case of preemption) might be long enough
>>>>>to cause a timeout without allowing a new check of the register value on
>>>>>the next iteration, which would otherwise have happened.
>>>>>
>>>>>This issue causes some sporadic timeouts in some code paths. As an
>>>>>example, we were experiencing some rare timeouts when waiting for PLL
>>>>>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
>>>>>we found out that the PLL unlock was happening within the expected time
>>>>>period (20us), which suggested a bug in xe_mmio_wait32().
>>>>>
>>>>>To fix the issue, ensure that we call ktime_get_raw() to get the current
>>>>>time before we check the register value. This allows for a last check
>>>>>after a timeout and covers the case where it is caused by some
>>>>>preemption/interrupt.
>>>>>
>>>>>This change was tested with the aforementioned PLL unlocking code path.
>>>>>Experiments showed that, before this change, we observed reported
>>>>>timeouts in 54 of 5000 runs; and, after this change, no timeouts were
>>>>>reported in 5000 runs.
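To make the race concrete, here is a simplified sketch of the pre-patch loop
shape (reconstructed from the context lines of the diffs quoted below, so
illustrative rather than the exact xe_mmio.h body); the patch flips the order
so that ktime_get_raw() runs before the register read:

    for (;;) {
            read = xe_mmio_read32(gt, reg);
            if ((read & mask) == val) {
                    ret = 0;
                    break;
            }

            /*
             * A preemption or interrupt landing here can consume the whole
             * timeout budget, so the check below sees an expired deadline
             * and we bail out without ever re-reading the register, even
             * though it may already hold the expected value.
             */
            cur = ktime_get_raw();
            if (!ktime_before(cur, end))
                    break;

            /* back off before the next iteration (details elided) */
    }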
>>>>
>>>>good find :)
>>>>
>>>>>
>>>>>Signed-off-by: Gustavo Sousa <gustavo.sousa at intel.com>
>>>>>---
>>>>> drivers/gpu/drm/xe/xe_mmio.h | 11 ++++++++++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>>>>index ae09f777d711..4d0e816a81ac 100644
>>>>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>>>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>>>>@@ -98,13 +98,22 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>>>>> u32 read;
>>>>>
>>>>> for (;;) {
>>>>>+ cur = ktime_get_raw();
>>>>>+
>>>>>+ /*
>>>>>+ * Keep the compiler from re-ordering calls to ktime_get_raw()
>>>>>+ * and xe_mmio_read32(): reading the current time after reading
>>>>>+ * the register has the potential for "fake timeouts" due to
>>>>>+ * preemption/interrupts in between the two.
>>>>>+ */
>>>>>+ barrier();
>>>>
>>>>we are only protecting here against the compiler reordering this. I
>>>>don't think it will due to the external call to ktime_get_raw(). Do you
>>>>get any failures if you remove the barrier?
>>>
>>>Makes sense.
>>>
>>>I put the barrier here to be sure that the compiler doesn't surprise us, but I
>>>haven't tested without it. Maybe it is indeed not really necessary...
>>>
>>>>
>>>>Anyway, we probably don't need a full barrier and just using
>>>>WRITE_ONCE() for both would be more appropriate.
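As a rough sketch of what that WRITE_ONCE() variant could look like inside the
loop (assuming the local variables cur and read from the quoted code; whether
it is needed at all is the open question, since the external calls already
limit what the compiler may reorder):

    /*
     * Volatile accesses are not reordered against each other by the
     * compiler, so this pins the relative order of the two assignments
     * without a full barrier().
     */
    WRITE_ONCE(cur, ktime_get_raw());
    WRITE_ONCE(read, xe_mmio_read32(gt, reg));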
>>>>
>>>>I wonder if we could do this logic better and have a header / wait /
>>>>tail approach. In both header and tail we read the mmio and don't
>>>>care whether a timeout occurred.
>>>
>>>Yeah. I was thinking about something similar, but ended up sending it this way
>>>to have a single place doing the register read.
>>>
>>>The difference from your suggestion is that I was thinking about
>>>adding one extra read outside the loop in case ret is still non-zero.
>>>Basically something like below (the patch below would be a substitute for this
>>>one):
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>>index ae09f777d711..2b65d8387cd1 100644
>>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>>@@ -118,6 +118,12 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>>> wait <<= 1;
>>> }
>>>
>>>+ if (ret != 0) {
>>>+ read = xe_mmio_read32(gt, reg);
>>>+ if ((read & mask) == val)
>>>+ ret = 0;
>>>+ }
>>
>>
>>yeah, I prefer this no-barrier version. Small tweak on that, maybe:
>>
>>diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h
>>index ae09f777d711..bb8f079e3cb6 100644
>>--- a/drivers/gpu/drm/xe/xe_mmio.h
>>+++ b/drivers/gpu/drm/xe/xe_mmio.h
>>@@ -93,16 +93,13 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>> {
>> ktime_t cur = ktime_get_raw();
>> const ktime_t end = ktime_add_us(cur, timeout_us);
>>- int ret = -ETIMEDOUT;
>> s64 wait = 10;
>> u32 read;
>>
>> for (;;) {
>> read = xe_mmio_read32(gt, reg);
>>- if ((read & mask) == val) {
>>- ret = 0;
>>- break;
>>- }
>>+ if ((read & mask) == val)
>>+ goto done;
>>
>> cur = ktime_get_raw();
>> if (!ktime_before(cur, end))
>>@@ -118,10 +115,17 @@ static inline int xe_mmio_wait32(struct xe_gt *gt, struct xe_reg reg, u32 mask,
>> wait <<= 1;
>> }
>>
>>+ read = xe_mmio_read32(gt, reg);
>>+ if ((read & mask) == val)
>>+ goto done;
>>+
>>+ return -ETIMEDOUT;
>
>I believe we still want *out_val to be filled even when timing out.
right... it looks like xe_guc.c is the biggest user of it and will
use the returned value in error messages if there's a timeout.
Lucas De Marchi
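Taken together, the direction the thread converges on could look roughly like
the tail below (an illustrative sketch, not the final patch: it assumes the
helper keeps the ret = -ETIMEDOUT default from the original code, and an
out_val pointer parameter with a NULL check, as mentioned above):

    /*
     * One last read after the deadline expires, so a success delayed by
     * preemption is still reported, and *out_val is filled either way.
     */
    read = xe_mmio_read32(gt, reg);
    if ((read & mask) == val)
            ret = 0;

    if (out_val)
            *out_val = read;

    return ret;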