[PATCH v5 2/2] drm/xe/guc: Port over the slow GuC loading support from i915
Lucas De Marchi
lucas.demarchi at intel.com
Fri May 17 21:57:51 UTC 2024
On Tue, Apr 09, 2024 at 12:05:14PM GMT, John.C.Harrison at Intel.com wrote:
>From: John Harrison <John.C.Harrison at Intel.com>
>
>GuC loading can take longer than it is supposed to for various
>reasons. So add in the code to cope with that and to report it when it
>happens. There are also many different reasons why GuC loading can
>fail, so add in the code for checking for those and for reporting
>issues in a meaningful manner rather than just hitting a timeout and
>saying 'fail: status = %x'.
>
>Also, remove the 'FIXME' comment about an i915 bug that has never been
>applicable to Xe!
>
>v2: Actually report the requested and granted frequencies rather than
>showing granted twice (review feedback from Badal).
>v3: Locally code all the timeout and end condition handling because a
>helper function is not allowed (review feedback from Lucas/Rodrigo).
>v4: Add more documentation comments and rename a define to add units
>(review feedback from Lucas).
>
>Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>---
...
> static int guc_wait_ucode(struct xe_guc *guc)
> {
> struct xe_gt *gt = guc_to_gt(guc);
>- u32 status;
>- int ret;
>-
>+ struct xe_guc_pc *guc_pc = &gt->uc.guc.pc;
>+ ktime_t before, after, delta;
>+ int load_done;
>+ u32 status = 0;
>+ int ret, count;
>+ u64 delta_ms;
>+ u32 before_freq;
>+
>+ before_freq = xe_guc_pc_get_act_freq(guc_pc);
>+ before = ktime_get();
> /*
>- * Wait for the GuC to start up.
>- * NB: Docs recommend not using the interrupt for completion.
>- * Measurements indicate this should take no more than 20ms
>- * (assuming the GT clock is at maximum frequency). So, a
>- * timeout here indicates that the GuC has failed and is unusable.
>- * (Higher levels of the driver may decide to reset the GuC and
>- * attempt the ucode load again if this happens.)
>- *
>- * FIXME: There is a known (but exceedingly unlikely) race condition
>- * where the asynchronous frequency management code could reduce
>- * the GT clock while a GuC reload is in progress (during a full
>- * GT reset). A fix is in progress but there are complex locking
>- * issues to be resolved. In the meantime bump the timeout to
>- * 200ms. Even at slowest clock, this should be sufficient. And
>- * in the working case, a larger timeout makes no difference.
>+ * Note, can't use any kind of timing information from the call to xe_mmio_wait.
>+ * It could return a thousand intermediate stages at random times. Instead, must
>+ * manually track the total time taken and locally implement the timeout.
I went back to v4 because, after reading this, I had the impression we
had already talked about how to improve it. Honestly, reading the
changelog in the cover letter ("helper functions are not allowed") and
the replies to v4, it doesn't seem there's a willingness to improve our
abstraction. Instead it seems we are shoving in more code and letting
"maintainers rewrite the damn thing later". This is not a recipe for
getting code accepted.

Sometimes horrible code that works is better than code that doesn't. Sometimes.
But it's also good when we try our best to improve the abstractions.
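
To make the abstraction point concrete, here is one purely illustrative
shape such a helper could take. This is a sketch under assumptions, not
the patch's code: the helper name and the single wait call are made up,
and the real patch additionally steps through intermediate load stages.
Only xe_mmio_wait32_not(), xe_guc_pc_get_act_freq(), GUC_STATUS /
GS_UKERNEL_MASK and the ktime helpers come from the patch or existing
driver code:

    /*
     * Hypothetical helper, for illustration only: wrap the wait with the
     * timing and frequency bookkeeping so a caller such as guc_wait_ucode()
     * only has to interpret the final status and report the numbers.
     */
    static int guc_wait_status_change(struct xe_guc *guc, u32 *status,
                                      u64 *elapsed_ms, u32 *freq_before,
                                      u32 *freq_after)
    {
            struct xe_gt *gt = guc_to_gt(guc);
            struct xe_guc_pc *guc_pc = &gt->uc.guc.pc;
            ktime_t before, after;
            int ret;

            /* Snapshot frequency and time before starting to poll */
            *freq_before = xe_guc_pc_get_act_freq(guc_pc);
            before = ktime_get();

            /* Poll until the ukernel status field leaves its initial value */
            ret = xe_mmio_wait32_not(gt, GUC_STATUS, GS_UKERNEL_MASK, 0,
                                     200 * USEC_PER_MSEC, status, false);

            /* Snapshot again so the caller can report how long it really took */
            after = ktime_get();
            *freq_after = xe_guc_pc_get_act_freq(guc_pc);
            *elapsed_ms = ktime_ms_delta(after, before);

            return ret;
    }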
...
>+/**
>+ * xe_mmio_wait32_not() - Wait for a register to return anything other than the given masked value
>+ * @gt: MMIO target GT
>+ * @reg: register to read value from
>+ * @mask: mask to be applied to the value read from the register
>+ * @val: value to match after applying the mask
>+ * @timeout_us: time out after this period of time. Wait logic tries to be
>+ * smart, applying an exponential backoff until @timeout_us is reached.
>+ * @out_val: if not NULL, points where to store the last unmasked value
>+ * @atomic: needs to be true if calling from an atomic context
>+ *
>+ * This function polls for a masked value to change from a given value and
>+ * returns zero on success or -ETIMEDOUT if timed out.
>+ *
>+ * Note that @timeout_us represents the minimum amount of time to wait before
>+ * giving up. The actual time taken by this function can be a little more than
>+ * @timeout_us for different reasons, especially in non-atomic contexts. Thus,
>+ * it is possible that this function succeeds even after @timeout_us has passed.
>+ */
>+int xe_mmio_wait32_not(struct xe_gt *gt, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us,
>+ u32 *out_val, bool atomic)
>+{
>+ ktime_t cur = ktime_get_raw();
>+ const ktime_t end = ktime_add_us(cur, timeout_us);
>+ int ret = -ETIMEDOUT;
>+ s64 wait = 10;
>+ u32 read;
>+
>+ for (;;) {
>+ read = xe_mmio_read32(gt, reg);
>+ if ((read & mask) != val) {
>+ ret = 0;
>+ break;
>+ }
>+
>+ cur = ktime_get_raw();
>+ if (!ktime_before(cur, end))
>+ break;
>+
>+ if (ktime_after(ktime_add_us(cur, wait), end))
>+ wait = ktime_us_delta(end, cur);
>+
>+ if (atomic)
>+ udelay(wait);
>+ else
>+ usleep_range(wait, wait << 1);
>+ wait <<= 1;
>+ }
>+
>+ if (ret != 0) {
>+ read = xe_mmio_read32(gt, reg);
>+ if ((read & mask) == val)
did you mean != here? ^
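Presumably the intent is for this final re-check after the deadline to
mirror the loop's "not equal" success condition; a minimal sketch of the
fixed tail, assuming the rest of the function stays as posted:

    if (ret != 0) {
            /* One last read after the deadline has passed */
            read = xe_mmio_read32(gt, reg);
            if ((read & mask) != val)
                    ret = 0;
    }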
with that fixed,
Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
Lucas De Marchi