Device loses its IRQ number on driver unload?

Alex Deucher alexdeucher at gmail.com
Tue Mar 10 07:01:26 PDT 2015


On Tue, Mar 10, 2015 at 8:55 AM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
> On 03/09/2015 09:25 PM, Dave Airlie wrote:
>> On 10 March 2015 at 02:02, Thomas Hellstrom <thellstrom at vmware.com> wrote:
>>> On 03/09/2015 04:22 PM, Daniel Vetter wrote:
>>>> On Mon, Mar 09, 2015 at 11:04:01AM +0100, Thomas Hellstrom wrote:
>>>>> Hi,
>>>>>
>>>>> I'm not sure this started with 4.0 but when I rmmod the device driver
>>>>> like so
>>>>> rmmod vmwgfx
>>>>>
>>>>> The device loses its IRQ line as shown in lspci:
>>>>>    Flags: bus master, medium devsel, latency 64 <irq missing here>
>>>>>
>>>>> and a subsequent modprobe will fail since pdev->irq is 0.
>>>>>
>>>>> Is anyone else seeing this with other drivers?
>>>> I've occasionally seen (over the past couple of kernels) random zeros in pdev
>>>> but dismissed it as broken machines or bugs in i915 (we have them ...).
>>>> Usually the box died chasing a NULL pointer from pdev. Otherwise no.
>>>> -Daniel
>>> OK. Thanks for the info. Since in my case this is 100% reproducible I
>>> guess I have an excellent opportunity to bisect the problem :-/
>>>
>> does lspci -H1, or some option like that to directly access the hw, show it?
>>
>> just whether this is the kernel copy or the hw register getting messed up.
>>
>> Dave.
> Hi, Dave,
>
> lspci -H1 indeed shows the IRQ number. It turns out that the commit
> introduced in 4.0 breaking this is
>
> b4b55cda587442477a3a9f0669e26bba4b7800c0 is the first bad commit
> commit b4b55cda587442477a3a9f0669e26bba4b7800c0
> Author: Jiang Liu <jiang.liu at linux.intel.com>
> Date:   Thu Feb 5 13:44:47 2015 +0800
>
>     x86/PCI: Refine the way to release PCI IRQ resources
>
>
> It's obvious from the commit message that unloading the driver *should*
> drop the irq resource, but it's not obvious what's reallocating that
> resource on driver load...
>
> Anyway, it turns out that adding a
> pci_disable_device(pdev) in the pci driver's remove() method
> (vmw_remove() in my case) appears to fix the problem:
> The device irq is removed on driver unload and enabled again on driver
> load. There appears to be no pci_disable_device() on driver exit in core drm.
>
> However it still beats me why other drm drivers aren't seeing this, and
> IMHO that commit should probably add a warning message if the pci device
> isn't disabled on pci driver unload......

They are probably broken as well.  I don't think module unload and
reload is commonly done with most drivers.  FWIW, the drm core also
does not register a pci shutdown callback so when you use kexec,
nothing in the driver gets torn down properly.
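
One plausible explanation for why the pci_disable_device() call matters (this is an assumption, not something established in this thread) is that the PCI core keeps a per-device enable count, and only the 0 -> 1 transition in pci_enable_device() redoes the arch-level setup that assigns the IRQ. The toy userspace model below sketches that idea. All `model_*` names and the IRQ value 16 are made up for illustration; none of this is kernel API or actual kernel code.

```c
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel API. */
struct model_pci_dev {
    int enable_cnt;      /* models the PCI core's per-device enable count */
    unsigned int irq;    /* models pdev->irq; 0 means "no IRQ assigned" */
    unsigned int hw_irq; /* value still held by the hardware (cf. lspci -H1) */
};

/* Models pci_enable_device(): only the 0 -> 1 transition does real work. */
static void model_enable(struct model_pci_dev *dev)
{
    if (++dev->enable_cnt > 1)
        return;             /* already counted as enabled: IRQ setup skipped */
    dev->irq = dev->hw_irq; /* arch code (re)assigns the IRQ */
}

/* Models pci_disable_device(): drops the enable count taken in probe(). */
static void model_disable(struct model_pci_dev *dev)
{
    if (dev->enable_cnt > 0)
        dev->enable_cnt--;
}

/* Models the unbind-time IRQ release attributed to the bisected commit. */
static void model_unbind_release_irq(struct model_pci_dev *dev)
{
    dev->irq = 0;
}

/* One load/unload/load cycle; returns the IRQ seen by the second probe(). */
static unsigned int model_reload(bool remove_calls_disable)
{
    struct model_pci_dev dev = { .enable_cnt = 0, .irq = 0, .hw_irq = 16 };

    model_enable(&dev);             /* first probe() */
    if (remove_calls_disable)
        model_disable(&dev);        /* remove() with the proposed fix */
    model_unbind_release_irq(&dev); /* IRQ released when the driver unbinds */
    model_enable(&dev);             /* second probe() after modprobe */
    return dev.irq;
}
```

In this model, model_reload(false) leaves the second probe with irq 0, matching the failing modprobe, while model_reload(true) gets the IRQ back. In a real driver the corresponding change is the one Thomas describes: calling pci_disable_device(pdev) from the pci driver's remove() method.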

Alex

