[Intel-xe] [PATCH] drm/xe/mmio: Make xe_mmio_wait32() aware of interrupts
Rodrigo Vivi
rodrigo.vivi at intel.com
Wed Nov 15 18:31:25 UTC 2023
On Wed, Nov 15, 2023 at 02:58:37PM -0300, Gustavo Sousa wrote:
> Quoting Lucas De Marchi (2023-11-15 13:51:41-03:00)
> >On Tue, Nov 14, 2023 at 07:09:22PM -0300, Gustavo Sousa wrote:
> >>With the current implementation, a preemption or other kind of interrupt
> >>might happen between xe_mmio_read32() and ktime_get_raw(). Such an
> >>interruption (specially in the case of preemption) might be long enough
> >>to cause a timeout without giving a chance of a new check on the
> >>register value on a next iteration, which would have happened otherwise.
> >>
> >>This issue causes some sporadic timeouts in some code paths. As an
> >>example, we were experiencing some rare timeouts when waiting for PLL
> >>unlock for C10/C20 PHYs (see intel_cx0pll_disable()). After debugging,
> >>we found out that the PLL unlock was happening within the expected time
> >>period (20us), which suggested a bug in xe_mmio_wait32().
> >>
> >>To fix the issue, ensure that we call ktime_get_raw() to get the current
> >>time before we check the register value. This allows for a last check
> >>after a timeout and covers the case where the it is caused by some
> >>preemption/interrupt.
> >>
> >>This change was tested with the aforementioned PLL unlocking code path.
> >>Experiments showed that, before this change, we observed reported
> >>timeouts in 54 of 5000 runs; and, after this change, no timeouts were
> >>reported in 5000 runs.
> >
> >good find :)
indeed!
> >> for (;;) {
> >>+ cur = ktime_get_raw();
> >>+
> >>+ /*
> >>+ * Keep the compiler from re-ordering calls to ktime_get_raw()
> >>+ * and xe_mmio_read32(): reading the current time after reading
> >>+ * register has the potential for "fake timeouts" due to
> >>+ * preemption/interrupts in between the two.
> >>+ */
> >>+ barrier();
> >
> >we are only protecting here against the compiler reordering this. I
> >don't think it will due to the external call to ktime_get_raw(). Do you
> >get any failures if you remove the barrier?
>
> Makes sense.
>
> I put the barrier here to be sure that the compiler doesn't surprise us, but I
> haven't tested without it. Maybe it is not really necessary indeed...
yeap, probably better without the barrier if it works.
>
> >
> >Anyway, we probably don't need a full barrier and just using
> >WRITE_ONCE() for both would be more appropriate.
> >
> >I wonder if we could do this logic better and have a header / wait /
> >tail approach. In both header and tail we read the mmio and don't
> >care if timeout occurred.
>
> Yeah. I was thinking about a something similar, but ended up sending this way to
> have a single place doing the register read.
yeap, same reason why I had kept like this in the past. but I wouldn't mind
if you believe there are other better ways. as long as we respect the timeout
requests.
> >
> >+Rodrigo, shouldn't we move this to a .c? IMO it's very big for a
> >inline.
indeed... although I believe the compiler would really ignore our inline
request if this gets called from multiple places anyway.
Iirc I had seen other inlines in the kernel with the same size so I thought
it would be okay. But also fine by me if this gets moved to the .c
Thanks for taking care of this,
Rodrigo.
More information about the Intel-xe
mailing list