[Nouveau] [PATCH] pci: do a msi rearm on init

Thierry Reding thierry.reding at gmail.com
Fri Nov 24 14:23:28 UTC 2017


On Fri, Nov 24, 2017 at 03:08:25PM +0100, Karol Herbst wrote:
> On Fri, Nov 24, 2017 at 3:02 PM, Thierry Reding
> <thierry.reding at gmail.com> wrote:
> > On Fri, Nov 24, 2017 at 03:56:26AM +0100, Karol Herbst wrote:
> >> On my GP107, when I load nouveau after unloading it, for some reason
> >> the GPU stops sending or the CPU stops receiving interrupts if MSI is
> >> enabled.
> >
> > I suppose this could happen if the GPU raises an interrupt after the
> > driver's already called free_irq() on it, and hence the driver can't
> > rearm itself in the interrupt handler.
> >
> > This possibly points to a bug somewhere (the GPU should be completely
> > idle by the time free_irq() is called), but this seems like a valid
> > thing to do at initialization in any case to avoid relying on the prior
> > owner of the device to always behave properly.
> >
> 
> Yeah, this makes sense. What I am wondering, though, is why this isn't
> a bigger problem. Or maybe it is just due to those changes in the
> Pascal interrupt handler, and this is a Pascal-only problem?

Yeah, this could be some kind of race that's only triggering on Pascal.

Comparing with the nvgpu driver, it seems the MSI should be rearmed
only after all interrupts have been processed, whereas Nouveau currently
rearms before processing interrupts (though after masking them). I'm
not very familiar with all of this, but perhaps Pascal has some
interrupts that Nouveau doesn't mask and that might therefore race.

Perhaps something like this would help:

--- >8 ---
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
index b1b1f3626b96..0b3b802c26df 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c
@@ -72,10 +72,10 @@ nvkm_pci_intr(int irq, void *arg)
        struct nvkm_device *device = pci->subdev.device;
        bool handled = false;
        nvkm_mc_intr_unarm(device);
-       if (pci->msi)
-               pci->func->msi_rearm(pci);
        nvkm_mc_intr(device, &handled);
        nvkm_mc_intr_rearm(device);
+       if (pci->msi)
+               pci->func->msi_rearm(pci);
        return handled ? IRQ_HANDLED : IRQ_NONE;
 }

--- >8 ---

> Anyway, the Nvidia driver seems to do it once at load time as well,
> so I was quite sure we could simply do it this way and be sure that we
> are able to use the GPU from any state.

I think it's totally fine to apply as-is and leave for further
investigation the question of what Nouveau needs to do to properly
uninitialize the device. As you said, somebody else can always leave
the GPU in some undefined state, in which case it's good to always
do this at initialization.
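
To illustrate why rearming at init helps, here is a toy user-space model
(not Nouveau code). It assumes the MSI is one-shot: once the device has
sent one, it stays silent until something rearms it. All toy_* names are
made up for this illustration, and the real hardware may well behave
differently.

```c
#include <stdbool.h>

/* Hypothetical model of a device with a one-shot MSI (an assumption). */
struct toy_gpu {
	bool msi_armed;     /* device may send one MSI while this is set */
	bool handler_bound; /* a driver currently owns the IRQ */
	int delivered;      /* MSIs that reached a handler */
};

/* The device wants to signal an interrupt. */
static void toy_raise_irq(struct toy_gpu *gpu)
{
	if (!gpu->msi_armed)
		return;               /* silent until somebody rearms */

	gpu->msi_armed = false;       /* sending the MSI disarms it */

	if (gpu->handler_bound) {
		gpu->delivered++;
		gpu->msi_armed = true; /* the handler rearms */
	}
	/* with no handler bound, nobody rearms and the MSI stays disarmed */
}

static void toy_load(struct toy_gpu *gpu, bool rearm_on_init)
{
	gpu->handler_bound = true;
	if (rearm_on_init)
		gpu->msi_armed = true; /* don't trust the previous owner */
}

static void toy_unload(struct toy_gpu *gpu)
{
	gpu->handler_bound = false;   /* stands in for free_irq() */
}

/*
 * Load, unload, let a stray interrupt fire, then load again.
 * Returns how many interrupts the second driver instance receives.
 */
static int toy_reload_scenario(bool rearm_on_init)
{
	struct toy_gpu gpu = { .msi_armed = true };

	toy_load(&gpu, rearm_on_init);
	toy_raise_irq(&gpu);          /* handled and rearmed normally */
	toy_unload(&gpu);

	toy_raise_irq(&gpu);          /* stray interrupt after free_irq() */

	gpu.delivered = 0;
	toy_load(&gpu, rearm_on_init);
	toy_raise_irq(&gpu);
	return gpu.delivered;
}
```

Under these assumptions, toy_reload_scenario(false) returns 0 (the second
instance never sees an interrupt), while toy_reload_scenario(true)
returns 1.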

Thierry