[Intel-gfx] [PATCH 07/19] drm/i915/icl: Mind the SFC units when resetting VD or VEBox engines

Thu Dec 13 09:24:15 UTC 2018

On 13/12/2018 08:47, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-12-13 08:20:58)
>>
>> On 12/12/2018 14:36, Tvrtko Ursulin wrote:
>>>
>>> On 12/12/2018 13:41, Chris Wilson wrote:
>>>> From: Oscar Mateo <oscar.mateo at intel.com>
>>>>
>>>> SFC (Scaler & Format Converter) units are shared between VD and VEBoxes.
>>>> They also happen to have separate reset bits. So, whenever we want to
>>>> reset
>>>> one or more of the media engines, we have to make sure the SFCs do not
>>>> change owner in the process and, if this owner happens to be one of the
>>>> engines being reset, we need to reset the SFC as well.
>>>>
>>>> This happens in 4 steps:
>>>>
>>>> 1) Tell the engine that a software reset is going to happen. The engine
>>>> will then try to force lock the SFC (if currently locked, it will
>>>> remain so; if currently unlocked, it will ignore this and all new lock
>>>> requests).
>>>>
>>>> 2) Poll the ack bit to make sure the hardware has received the forced
>>>> lock from the driver. Once this bit is set, it indicates SFC status
>>>> (lock or unlock) will not change anymore (until we tell the engine it
>>>> is safe to unlock again).
>>>>
>>>> 3) Check the usage bit to see if the SFC has ended up being locked to
>>>> the engine we want to reset. If this is the case, we have to reset
>>>> the SFC as well.
>>>>
>>>> 4) Unlock all the SFCs once the reset sequence is completed.
>>>>
>>>> Obviously, if we are resetting the whole GPU, we don't have to worry
>>>> about all of this.
>>>>
>>>> BSpec: 10989
>>>> BSpec: 10990
>>>> BSpec: 10954
>>>> BSpec: 10955
>>>> BSpec: 10956
>>>> BSpec: 19212
>>>>
>>>> Signed-off-by: Tomasz Lis <tomasz.lis at intel.com>
>>>> Signed-off-by: Oscar Mateo <oscar.mateo at intel.com>
>>>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_reg.h     |  18 +++++
>>>>    drivers/gpu/drm/i915/intel_uncore.c | 115 ++++++++++++++++++++++++++--
>>>>    2 files changed, 128 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>>> b/drivers/gpu/drm/i915/i915_reg.h
>>>> index 0a7d60509ca7..0796526dc10f 100644
>>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>>> @@ -347,6 +347,24 @@ static inline bool i915_mmio_reg_valid(i915_reg_t
>>>> reg)
>>>>    #define  GEN11_GRDOM_MEDIA4        (1 << 8)
>>>>    #define  GEN11_GRDOM_VECS        (1 << 13)
>>>>    #define  GEN11_GRDOM_VECS2        (1 << 14)
>>>> +#define  GEN11_GRDOM_SFC0        (1 << 17)
>>>> +#define  GEN11_GRDOM_SFC1        (1 << 18)
>>>> +
>>>> +#define  GEN11_VCS_SFC_RESET_BIT(instance)    (GEN11_GRDOM_SFC0 <<
>>>> ((instance) >> 1))
>>>> +#define  GEN11_VECS_SFC_RESET_BIT(instance)    (GEN11_GRDOM_SFC0 <<
>>>> (instance))
>>
>> These two lines could be a bit of a hidden mine for the future. They
>> silently describe which video engines are connected to which SFC
>> instance. I guess we'll just have to remember to update this if/when it
>> changes.
> 
> Hmm, that sounds like the type of info we put into the tables for
> intel_engine_cs.

That is a good idea. I can take on myself to convert it to such scheme 
in scope of engine discovery, where I plan to fiddle with engine to SFC 
mapping anyway.

> Will we ever notice a bug here? What test do we actually need to perform
> to hit an error. Presumably more than just a spinner on a media engine.
> 
> That strays back into the topic of a lack of real world test samples for
> GPU reset handling.

I agree in thinking that we have no coverage for it. We'd probably need 
something actually using the SFC unit so we can detect the wrong one has 
been reset, or that the right one hasn't been, etc. Or at least from 
userspace that engine resets on other engines do not affect the SFC 
workload.

Regards,

Tvrtko

>>
>>>> +
>>>> +#define GEN11_VCS_SFC_FORCED_LOCK(engine)
>>>> _MMIO((engine)->mmio_base + 0x88C)
>>>> +#define   GEN11_VCS_SFC_FORCED_LOCK_BIT        (1 << 0)
>>>> +#define GEN11_VCS_SFC_LOCK_STATUS(engine)
>>>> _MMIO((engine)->mmio_base + 0x890)
>>>> +#define   GEN11_VCS_SFC_USAGE_BIT        (1 << 0)
>>>> +#define   GEN11_VCS_SFC_LOCK_ACK_BIT        (1 << 1)
>>>> +
>>>> +#define GEN11_VECS_SFC_FORCED_LOCK(engine)
>>>> _MMIO((engine)->mmio_base + 0x201C)
>>>> +#define   GEN11_VECS_SFC_FORCED_LOCK_BIT    (1 << 0)
>>>> +#define GEN11_VECS_SFC_LOCK_ACK(engine)
>>>> _MMIO((engine)->mmio_base + 0x2018)
>>>> +#define   GEN11_VECS_SFC_LOCK_ACK_BIT        (1 << 0)
>>>> +#define GEN11_VECS_SFC_USAGE(engine)        _MMIO((engine)->mmio_base
>>>> + 0x2014)
>>>> +#define   GEN11_VECS_SFC_USAGE_BIT        (1 << 0)
>>>>    #define RING_PP_DIR_BASE(engine)    _MMIO((engine)->mmio_base + 0x228)
>>>>    #define RING_PP_DIR_BASE_READ(engine)    _MMIO((engine)->mmio_base +
>>>> 0x518)
>>>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c
>>>> b/drivers/gpu/drm/i915/intel_uncore.c
>>>> index 9289515108c3..408692b88c98 100644
>>>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>>>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>>>> @@ -1931,6 +1931,103 @@ static int gen6_reset_engines(struct
>>>> drm_i915_private *dev_priv,
>>>>        return gen6_hw_domain_reset(dev_priv, hw_mask);
>>>>    }
>>>> +static u32 gen11_lock_sfc(struct drm_i915_private *dev_priv,
>>>> +              struct intel_engine_cs *engine)
>>>> +{
>>>> +    u8 vdbox_sfc_access = INTEL_INFO(dev_priv)->vdbox_sfc_access;
>>>
>>> Only a single use site so could get away with a local copy.
>>>
>>>> +    i915_reg_t sfc_forced_lock, sfc_forced_lock_ack;
>>>> +    u32 sfc_forced_lock_bit, sfc_forced_lock_ack_bit;
>>>> +    i915_reg_t sfc_usage;
>>>> +    u32 sfc_usage_bit;
>>>> +    u32 sfc_reset_bit;
>>>> +
>>>> +    switch (engine->class) {
>>>> +    case VIDEO_DECODE_CLASS:
>>>> +        if ((BIT(engine->instance) & vdbox_sfc_access) == 0)
>>>> +            return 0;
>>>> +
>>>> +        sfc_forced_lock = GEN11_VCS_SFC_FORCED_LOCK(engine);
>>>> +        sfc_forced_lock_bit = GEN11_VCS_SFC_FORCED_LOCK_BIT;
>>>> +
>>>> +        sfc_forced_lock_ack = GEN11_VCS_SFC_LOCK_STATUS(engine);
>>>> +        sfc_forced_lock_ack_bit  = GEN11_VCS_SFC_LOCK_ACK_BIT;
>>>> +
>>>> +        sfc_usage = GEN11_VCS_SFC_LOCK_STATUS(engine);
>>>> +        sfc_usage_bit = GEN11_VCS_SFC_USAGE_BIT;
>>>> +        sfc_reset_bit = GEN11_VCS_SFC_RESET_BIT(engine->instance);
>>>> +        break;
>>>> +
>>>> +    case VIDEO_ENHANCEMENT_CLASS:
>>>> +        sfc_forced_lock = GEN11_VECS_SFC_FORCED_LOCK(engine);
>>>> +        sfc_forced_lock_bit = GEN11_VECS_SFC_FORCED_LOCK_BIT;
>>>> +
>>>> +        sfc_forced_lock_ack = GEN11_VECS_SFC_LOCK_ACK(engine);
>>>> +        sfc_forced_lock_ack_bit  = GEN11_VECS_SFC_LOCK_ACK_BIT;
>>>> +
>>>> +        sfc_usage = GEN11_VECS_SFC_USAGE(engine);
>>>> +        sfc_usage_bit = GEN11_VECS_SFC_USAGE_BIT;
>>>> +        sfc_reset_bit = GEN11_VECS_SFC_RESET_BIT(engine->instance);
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Tell the engine that a software reset is going to happen. The
>>>> engine
>>>> +     * will then try to force lock the SFC (if currently locked, it will
>>>> +     * remain so until we tell the engine it is safe to unlock; if
>>>> currently
>>>> +     * unlocked, it will ignore this and all new lock requests). If SFC
>>>> +     * ends up being locked to the engine we want to reset, we have
>>>> to reset
>>>> +     * it as well (we will unlock it once the reset sequence is
>>>> completed).
>>>> +     */
>>>> +    I915_WRITE_FW(sfc_forced_lock,
>>>> +              I915_READ_FW(sfc_forced_lock) | sfc_forced_lock_bit);
>>>> +
>>>> +    if (__intel_wait_for_register_fw(dev_priv,
>>>> +                     sfc_forced_lock_ack,
>>>> +                     sfc_forced_lock_ack_bit,
>>>> +                     sfc_forced_lock_ack_bit,
>>>> +                     1000, 0, NULL)) {
>>>> +        DRM_DEBUG_DRIVER("Wait for SFC forced lock ack failed\n");
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    if ((I915_READ_FW(sfc_usage) & sfc_usage_bit) == sfc_usage_bit)
>>>
>>> "== sfc_usage_bit" parts is redundant.
>>>
>>>> +        return sfc_reset_bit;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static void gen11_unlock_sfc(struct drm_i915_private *dev_priv,
>>>> +                 struct intel_engine_cs *engine)
>>>> +{
>>>> +    u8 vdbox_sfc_access = INTEL_INFO(dev_priv)->vdbox_sfc_access;
>>>> +    i915_reg_t sfc_forced_lock;
>>>> +    u32 sfc_forced_lock_bit;
>>>> +
>>>> +    switch (engine->class) {
>>>> +    case VIDEO_DECODE_CLASS:
>>>> +        if ((BIT(engine->instance) & vdbox_sfc_access) == 0)
>>>> +            return;
>>>> +
>>>> +        sfc_forced_lock = GEN11_VCS_SFC_FORCED_LOCK(engine);
>>>> +        sfc_forced_lock_bit = GEN11_VCS_SFC_FORCED_LOCK_BIT;
>>>> +        break;
>>>> +
>>>> +    case VIDEO_ENHANCEMENT_CLASS:
>>>> +        sfc_forced_lock = GEN11_VECS_SFC_FORCED_LOCK(engine);
>>>> +        sfc_forced_lock_bit = GEN11_VECS_SFC_FORCED_LOCK_BIT;
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    I915_WRITE_FW(sfc_forced_lock,
>>>> +              I915_READ_FW(sfc_forced_lock) & ~sfc_forced_lock_bit);
>>>> +}
>>>> +
>>>>    /**
>>>>     * gen11_reset_engines - reset individual engines
>>>>     * @dev_priv: i915 device
>>>> @@ -1947,7 +2044,6 @@ static int gen6_reset_engines(struct
>>>> drm_i915_private *dev_priv,
>>>>    static int gen11_reset_engines(struct drm_i915_private *dev_priv,
>>>>                       unsigned int engine_mask)
>>>>    {
>>>> -    struct intel_engine_cs *engine;
>>>>        const u32 hw_engine_mask[I915_NUM_ENGINES] = {
>>>>            [RCS] = GEN11_GRDOM_RENDER,
>>>>            [BCS] = GEN11_GRDOM_BLT,
>>>> @@ -1958,21 +2054,30 @@ static int gen11_reset_engines(struct
>>>> drm_i915_private *dev_priv,
>>>>            [VECS] = GEN11_GRDOM_VECS,
>>>>            [VECS2] = GEN11_GRDOM_VECS2,
>>>>        };
>>>> +    struct intel_engine_cs *engine;
>>>> +    unsigned int tmp;
>>>>        u32 hw_mask;
>>>> +    int ret;
>>>>        BUILD_BUG_ON(VECS2 + 1 != I915_NUM_ENGINES);
>>>>        if (engine_mask == ALL_ENGINES) {
>>>>            hw_mask = GEN11_GRDOM_FULL;
>>>>        } else {
>>>> -        unsigned int tmp;
>>>> -
>>>>            hw_mask = 0;
>>>> -        for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
>>>> +        for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
>>>>                hw_mask |= hw_engine_mask[engine->id];
>>>> +            hw_mask |= gen11_lock_sfc(dev_priv, engine);
>>>
>>> Presumably point No. 3 from the commit message does not mean there is a
>>> separate way to reset the SFC, just that the act of resetting the engine
>>> to which the SFC is locked to, will reset SFC as well?
>>
>> My bad, gen11_lock_sfc is returning the bit which resets the SFC.
>>
>>>
>>>> +        }
>>>>        }
>>>> -    return gen6_hw_domain_reset(dev_priv, hw_mask);
>>>> +    ret = gen6_hw_domain_reset(dev_priv, hw_mask);
>>>> +
>>>> +    if (engine_mask != ALL_ENGINES)
>>>> +        for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
>>>> +            gen11_unlock_sfc(dev_priv, engine);
>>>
>>> And this presumably means it is okay to unlock engines which we haven't
>>> lock in the previous loop?
>>
>> For this I found no clear information. Presumably it is okay to clear
>> what hasn't been set in the first place...
> 
> Yeah, that was the reasoning I came to as well. So long as we do not
> broaden engine_mask so that we can concurrently operate on disjoint sets
> the RMW should be safe.
> -Chris
>