[Intel-gfx] [PATCH v2 1/2] drm/i915: Sanitycheck MMIO access early in driver load
Andrzej Hajda
andrzej.hajda at intel.com
Wed Mar 22 08:40:07 UTC 2023
On 21.03.2023 23:43, Andi Shyti wrote:
> Hi Matt,
>
>>> We occasionally see the PCI device in a non-accessible state at the
>>> point the driver is loaded. When this happens, all BAR accesses will
>>> read back as 0xFFFFFFFF. Rather than reading registers and
>>> misinterpreting their (invalid) values, let's specifically check for
>>> 0xFFFFFFFF in a register that cannot have that value to see if the
>>> device is accessible.
>>>
>>> Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>> Signed-off-by: Andi Shyti <andi.shyti at linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++
>>> 1 file changed, 35 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
>>> index e1e1f34490c8e..0b69081d6d285 100644
>>> --- a/drivers/gpu/drm/i915/intel_uncore.c
>>> +++ b/drivers/gpu/drm/i915/intel_uncore.c
>>> @@ -2602,11 +2602,46 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
>>> return 0;
>>> }
>>>
>>> +static int sanity_check_mmio_access(struct intel_uncore *uncore)
>>> +{
>>> + struct drm_i915_private *i915 = uncore->i915;
>>> + int ret;
>>> +
>>> + if (GRAPHICS_VER(i915) < 8)
>>> + return 0;
>>> +
>>> + /*
>>> + * Sanitycheck that MMIO access to the device is working properly. If
>>> + * the CPU is unable to communcate with a PCI device, BAR reads will
>>> + * return 0xFFFFFFFF. Let's make sure the device isn't in this state
>>> + * before we start trying to access registers.
>>> + *
>>> + * We use the primary GT's forcewake register as our guinea pig since
>>> + * it's been around since HSW and it's a masked register so the upper
>>> + * 16 bits can never read back as 1's if device access is operating
>>> + * properly.
Maybe we can just check upper 16bits then. Sth like:
ret = intel_wait_for_register_fw(uncore, FORCEWAKE_MT, GENMASK(31, 16),
0, 2000000);
Regards
Andrzej
>>> + *
>>> + * If MMIO isn't working, we'll wait up to 2 seconds to see if it
>>> + * recovers, then give up.
>>> + */
>>> + ret = intel_wait_for_register_fw(uncore, FORCEWAKE_MT, 0, 0, 2000000);
>>
>> It looks like you lost the check for 0xFFFFFFFF specifically. In fact
>> with a mask/value of 0, isn't this always going to just always pass
>> immediately?
>
> uh... yes... sorry, I just got confused and lost track of the
> goal of the patch.
>
> Sorry, then please ignore... I don't see then how
> intel_wait_for_register_fw() can be used with a '!='.
>
> Please, ignore this v2.
>
> Thanks and sorry, again,
> Andi
>
>> We don't know what the value of this register will be (there may or may
>> not be some bits set), but we need to make sure that it isn't 0xFFFFFFFF
>> because that means we're not even truly accessing the register, just
>> hitting a PCI BAR read failure.
>>
>>
>> Matt
>>
>>> + if (ret == -ETIMEDOUT) {
>>> + drm_err(&i915->drm, "Device is non-operational; MMIO access returns 0xFFFFFFFF!\n");
>>> + return -EIO;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> int intel_uncore_init_mmio(struct intel_uncore *uncore)
>>> {
>>> struct drm_i915_private *i915 = uncore->i915;
>>> int ret;
>>>
>>> + ret = sanity_check_mmio_access(uncore);
>>> + if (ret)
>>> + return ret;
>>> +
>>> /*
>>> * The boot firmware initializes local memory and assesses its health.
>>> * If memory training fails, the punit will have been instructed to
>>> --
>>> 2.39.2
>>>
>>
>> --
>> Matt Roper
>> Graphics Software Engineer
>> Linux GPU Platform Enablement
>> Intel Corporation
More information about the Intel-gfx
mailing list