[Intel-gfx] [PATCH] drm/i915: Fallback to reserve forcewake if primary ack missing

Thu Nov 2 11:27:53 UTC 2017

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2017-10-30 12:27:07)
>> There is a possibility on gen9 hardware to miss the forcewake ack
>> message. The recommended workaround is to use another free
>> bit and toggle it until original bit is successfully acknowledged.
>> 
>> Some future gen9 revs might or might not fix the underlying issue but
>> the fallback to reserve bit dance can be considered as harmless:
>> without the ack timeout we never reach the reserve bit forcewake.
>> Thus as of now we adopt a blanket approach for all gen9 and leave
>> the bypassing the reserve bit approach for future patches if
>> corresponding hw revisions do appear.
>> 
>> Commit 83e3337204b2 ("drm/i915: Increase maximum polling time to 50ms
>> for forcewake request/clear ack") did increase the forcewake timeout.
>> If the issue was a delayed ack, future work could include finding
>> a suitable timeout value both for primary ack and reserve toggle
>> to reduce the worst case latency.
>> 
>> v2: use bit 15, naming, comment (Chris), only wait fallback ack
>> v3: fix return on fallback, backoff after fallback write (Chris)
>> 
>> References: HSDES #1604254524
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=102051
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> Cc: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> ---
>> +static int
>> +fw_domain_wait_ack_with_fallback(const struct drm_i915_private *i915,
>> +                                const struct intel_uncore_forcewake_domain *d,
>> +                                const enum ack_type type)
>> +{
>> +       const u32 ack_bit = FORCEWAKE_KERNEL;
>> +       const u32 value = type == ACK_SET ? ack_bit : 0;
>> +       unsigned int pass = 0;
>> +       bool ack_detected;
>> +
>> +       /*
>> +        * There is a possibility of driver's wake request colliding
>> +        * with hardware's own wake requests and that can cause
>> +        * hardware to not deliver the driver's ack message.
>> +        *
>> +        * Use a fallback bit toggle to kick the gpu state machine
>> +        * in hopes that the original ack will be delivered along with
>> +        * the fallback ack.
>
> s/in hopes/in the hope/
>
>> +        *
>> +        * This workaround is described in HSDES #1604254524
>> +        */
>> +
>> +       do {
>> +               wait_ack_clear(i915, d, FORCEWAKE_KERNEL_FALLBACK);
>> +
>> +               __raw_i915_write32(i915, d->reg_set,
>> +                                  _MASKED_BIT_ENABLE(FORCEWAKE_KERNEL_FALLBACK));
>> +               /* Give gt some time to relax before the polling frenzy */
>> +               udelay(10 * pass);
>> +               wait_ack_set(i915, d, FORCEWAKE_KERNEL_FALLBACK);
>
> I would have started from pass=1 (i.e. udelay(10)) as we already have a
> 0-delay for the primary wait_ack before we hit the fallback.
>
>> +
>> +               ack_detected = (__raw_i915_read32(i915, d->reg_ack) & ack_bit) == value;
>> +
>> +               __raw_i915_write32(i915, d->reg_set,
>> +                                  _MASKED_BIT_DISABLE(FORCEWAKE_KERNEL_FALLBACK));
>> +               pass++;
>> +       } while (!ack_detected && pass < 10);
>
> 	unsigned int pass = 1;
> 	do {
> 		...
> 	} while (!ack_detected && pass++ < 10);
>
>> +
>> +       DRM_DEBUG_DRIVER("%s had to use fallback to %s ack, 0x%x (passes %u)\n",
>> +                        intel_uncore_forcewake_domain_to_str(d->id),
>> +                        type == ACK_SET ? "set" : "clear",
>> +                        __raw_i915_read32(i915, d->reg_ack),
>> +                        pass);
>> +
>> +       return ack_detected ? 0 : -ETIMEDOUT;
>> +}
>
> I was going to say a-b, but given the state machine we've deduced that
> explains why this w/a has any chance of succeeding, I feel a
>

And it should be rather accurately maching the recommendation
of the hw engineers how to handle the issue.

There is a mention that this should be on top of delayed
ack reading of the primary fw request, which we don't have.

How I understand the issue is that without delaying the
primary ack we are more likely to hit the fallback
as the poll is interwined with the primary ack message
getting lost.

But as we now have the big hammer in place, we
can see how it goes and then tweak the delays and
timeouts if need arises.

Our timeouts are quite long and I have a suspicion
that the commit I am referring to in bxt case was
papering over of this issue.

> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

Thanks. Pushed.

> is justified. (If it's wrong, I'm equally culpable ;)

Wouldn't it be nice if we had hardware engineers to
stamp these kind of patches.

-Mika