[PATCH] pci/quirks: Add quirk to reset nvgpu at boot for the Lenovo ThinkPad P50

Bjorn Helgaas helgaas at kernel.org
Thu Apr 25 13:01:24 UTC 2019


On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> On a very specific subset of ThinkPad P50 SKUs, particularly ones that
> come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
> seems to have a very nasty habit of not always resetting the secondary
> Nvidia GPU between full reboots if the laptop is configured in Hybrid
> Graphics mode. The reason for this happening is unknown, but the
> following steps and possibly a good bit of patience will reproduce the
> issue:
> 
> 1. Boot up the laptop normally in Hybrid graphics mode
> 2. Make sure nouveau is loaded and that the GPU is awake
> 3. Allow the nvidia GPU to runtime suspend itself after being idle
> 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
> 5. If nouveau loads up properly, reboot the machine again and go back to
> step 2 until you reproduce the issue
> 
> This results in some very strange behavior: the GPU will
> quite literally be left in exactly the same state it was in when the
> previously booted kernel started the reboot. This has all sorts of bad
> side effects: for starters, this completely breaks nouveau starting with a
> mysterious EVO channel failure that happens well before we've actually
> used the EVO channel for anything:
> 
> nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
> ...

> So to do this, we add a new pci quirk using
> DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
> at boot finishes. From there, we check to make sure that this is indeed
> the specific P50 variant of this GPU. We also make sure that the GPU PCI
> device is advertising NoReset- in order to prevent us from trying to
> reset the GPU when the machine is in Dedicated graphics mode (where the
> GPU being initialized by the BIOS is normal and expected). Finally, we
> try mapping the MMIO space for the GPU which should only work if the GPU
> is actually active in D0 mode. We can then read the magic 0x2240c
> register on the GPU, which will have bit 1 set if the GPU's firmware has
> already been posted during a previous boot. Once we've confirmed all of
> this, we reset the PCI device and re-disable it - bringing the GPU back
> into a healthy state.
> 
> Signed-off-by: Lyude Paul <lyude at redhat.com>
> Cc: nouveau at lists.freedesktop.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: Karol Herbst <kherbst at redhat.com>
> Cc: Ben Skeggs <skeggsb at gmail.com>
> Cc: stable at vger.kernel.org

Applied to pci/misc for v5.2, thanks!

> ---
>  drivers/pci/quirks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b0a413f3f7ca..948492fda8bf 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5117,3 +5117,68 @@ SWITCHTEC_QUIRK(0x8573);  /* PFXI 48XG3 */
>  SWITCHTEC_QUIRK(0x8574);  /* PFXI 64XG3 */
>  SWITCHTEC_QUIRK(0x8575);  /* PFXI 80XG3 */
>  SWITCHTEC_QUIRK(0x8576);  /* PFXI 96XG3 */
> +
> +/*
> + * On certain Lenovo Thinkpad P50 SKUs, specifically those with a Nvidia
> + * Quadro M1000M, the BIOS will occasionally make the mistake of not resetting
> + * the nvidia GPU between reboots if the system is configured to use hybrid
> + * graphics mode. This results in the GPU being left in whatever state it was
> + * in during the previous boot which causes spurious interrupts from the GPU,
> + * which in turn cause us to disable the wrong IRQs and end up breaking the
> + * touchpad. Unsurprisingly, this also completely breaks nouveau.
> + *
> + * Luckily, it seems a simple reset of the PCI device for the nvidia GPU
> + * manages to bring the GPU back into a clean state and fix all of these
> + * issues. Additionally since the GPU will report NoReset+ when the machine is
> + * configured in Dedicated display mode, we don't need to worry about
> + * accidentally resetting the GPU when it's supposed to already be
> + * initialized.
> + */
> +static void
> +quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot(struct pci_dev *pdev)
> +{
> +	void __iomem *map;
> +	int ret;
> +
> +	if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
> +	    pdev->subsystem_device != 0x222e ||
> +	    !pdev->reset_fn)
> +		return;
> +
> +	/*
> +	 * If we can't enable the device's mmio space, it's probably not even
> +	 * initialized. This is fine, and means we can just skip the quirk
> +	 * entirely.
> +	 */
> +	if (pci_enable_device_mem(pdev)) {
> +		pci_dbg(pdev, "Can't enable device mem, no reset needed\n");
> +		return;
> +	}
> +
> +	/* Taken from drivers/gpu/drm/nouveau/engine/device/base.c */
> +	map = ioremap(pci_resource_start(pdev, 0), 0x102000);
> +	if (!map) {
> +		pci_err(pdev, "Can't map MMIO space, this is probably very bad\n");
> +		goto out_disable;
> +	}
> +
> +	/*
> +	 * Be extra careful, and make sure that the GPU firmware is posted
> +	 * before trying a reset
> +	 */
> +	if (ioread32(map + 0x2240c) & 0x2) {
> +		pci_info(pdev,
> +			 FW_BUG "GPU left initialized by EFI, resetting\n");
> +		ret = pci_reset_function(pdev);
> +		if (ret < 0)
> +			pci_err(pdev, "Failed to reset GPU: %d\n", ret);
> +	}
> +
> +	iounmap(map);
> +out_disable:
> +	pci_disable_device(pdev);
> +}
> +
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
> +			      PCI_CLASS_DISPLAY_VGA, 8,
> +			      quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot);
> -- 
> 2.20.1
> 
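For readers who want to follow the decision sequence without a kernel tree at
hand, here is a minimal user-space sketch of the checks the quirk performs
before resetting. It is only a model, not the applied code: struct fake_gpu,
fake_read32() and p50_gpu_needs_reset() are hypothetical names, the MMIO
window is an ordinary byte buffer, and a NULL pointer stands in for a failed
pci_enable_device_mem() or ioremap(). The IDs, register offset and mask are
the ones from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the quoted patch. */
#define SUBSYS_VENDOR_LENOVO 0x17aau   /* PCI_VENDOR_ID_LENOVO */
#define SUBSYS_DEVICE_P50    0x222eu
#define GPU_POST_REG         0x2240cu
#define GPU_POSTED_MASK      0x2u      /* bit 1 */
#define GPU_MMIO_LEN         0x102000u

/* Hypothetical stand-in for the struct pci_dev fields the quirk reads. */
struct fake_gpu {
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
	bool has_reset_fn;       /* models pdev->reset_fn */
	const uint8_t *mmio;     /* NULL models a failed enable or ioremap */
};

/* Host-endian 32-bit read from the fake register window. */
static uint32_t fake_read32(const uint8_t *base, uint32_t off)
{
	uint32_t v;

	assert(base != NULL && off + sizeof(v) <= GPU_MMIO_LEN);
	memcpy(&v, base + off, sizeof(v));
	return v;
}

/* Returns true only when every condition from the commit message holds. */
static bool p50_gpu_needs_reset(const struct fake_gpu *gpu)
{
	/* Wrong machine, or no usable reset method: leave the GPU alone. */
	if (gpu->subsystem_vendor != SUBSYS_VENDOR_LENOVO ||
	    gpu->subsystem_device != SUBSYS_DEVICE_P50 ||
	    !gpu->has_reset_fn)
		return false;

	/* MMIO not reachable: the GPU is probably not initialized at all. */
	if (!gpu->mmio)
		return false;

	/* Bit 1 set means the firmware was already posted by an earlier boot. */
	return (fake_read32(gpu->mmio, GPU_POST_REG) & GPU_POSTED_MASK) != 0;
}
```

The ordering mirrors the patch: the cheap identity checks come first, and the
register is only read once the window is known to be accessible.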

