[PATCH] gpu: nova-core: avoid probing non-display/compute PCI functions

John Hubbard jhubbard at nvidia.com
Thu Aug 14 00:02:33 UTC 2025


On 8/13/25 4:50 PM, Danilo Krummrich wrote:
> On Thu Aug 14, 2025 at 1:28 AM CEST, John Hubbard wrote:
>> NovaCore 0000:c1:00.0: GPU instance built
>> NovaCore 0000:c1:00.1: Probe Nova Core GPU driver.
>> NovaCore 0000:c1:00.1: enabling device (0000 -> 0002)
>> NovaCore 0000:c1:00.1: probe with driver NovaCore failed with error -22
>> ...
>> Bad IO access at port 0x0 ()
>> WARNING: CPU: 26 PID: 748 at lib/iomap.c:45 pci_iounmap+0x3f/0x50
>> ...
>> <kernel::devres::Devres<kernel::pci::Bar<16777216>>>::devres_callback+0x2c/0x70 [nova_core]
>> devres_release_all+0xa8/0xf0
>> really_probe+0x30f/0x420
>> __driver_probe_device+0x77/0xf0
>> driver_probe_device+0x22/0x1b0
>> __driver_attach+0x118/0x250
>> bus_for_each_dev+0x105/0x130
>> bus_add_driver+0x163/0x2a0
>> driver_register+0x5d/0xf0
>> init_module+0x6d/0x1000 [nova_core]
>> do_one_initcall+0xde/0x380
>> do_init_module+0x60/0x250
>>
>> ...and then:
>> BUG: kernel NULL pointer dereference, address: 0000000000000538
>> RIP: 0010:pci_release_region+0x10/0x60
>> ...
>> <kernel::devres::Devres<kernel::pci::Bar<16777216>>>::devres_callback+0x36/0x70 [nova_core]
>> devres_release_all+0xa8/0xf0
>> really_probe+0x30f/0x420
>> __driver_probe_device+0x77/0xf0
>> driver_probe_device+0x22/0x1b0
>> __driver_attach+0x118/0x250
>> bus_for_each_dev+0x105/0x130
>> bus_add_driver+0x163/0x2a0
>> driver_register+0x5d/0xf0
>> init_module+0x6d/0x1000 [nova_core]
>> do_one_initcall+0xde/0x380
>> do_init_module+0x60/0x250
> 
> This is caused by a bug in Devres, which I already fixed in [1].

Nice timing! btw, I scanned through rust-for-linux emails, and wasn't
immediately able to connect it ("fix leak...") with this crash.

You might want to add a little note to that commit description, to
mention that it also prevents a kernel crash in some cases? 

> 
> With the patch in [1] nova-core should gracefully fail probing for the
> non-supported device classes as expected.

Beautiful. 

> 
> However, I think we still want to filter by PCI class, so the patch is fine in
> general. :)
> 
> Few comments below.
> 
> [1] https://lore.kernel.org/lkml/20250812130928.11075-1-dakr@kernel.org/
>>
>> Signed-off-by: John Hubbard <jhubbard at nvidia.com>
>> ---
>>  drivers/gpu/nova-core/driver.rs | 13 +++++++++++++
>>  rust/kernel/pci.rs              |  6 ++++++
>>  2 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>> index 274989ea1fb4..4e0e6f5338e9 100644
>> --- a/drivers/gpu/nova-core/driver.rs
>> +++ b/drivers/gpu/nova-core/driver.rs
>> @@ -31,6 +31,19 @@ impl pci::Driver for NovaCore {
>>      fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> {
>>          dev_dbg!(pdev.as_ref(), "Probe Nova Core GPU driver.\n");
>>  
>> +        let class_code = pdev.class();
>> +
>> +        if class_code != bindings::PCI_CLASS_DISPLAY_VGA
>> +            && class_code != bindings::PCI_CLASS_DISPLAY_3D
> 
> I think it would be nice if we could provide a Rust enum for PCI classes, such
> that this could be pci::Class::DISPLAY_VGA instead.
> 
> Of course the same is true for PCI (sub)vendor, (sub)device IDs.
> 

OK, will do.

>> +        {
>> +            dev_dbg!(
>> +                pdev.as_ref(),
>> +                "Skipping non-display NVIDIA device with class 0x{:04x}\n",
>> +                class_code
>> +            );
>> +            return Err(kernel::error::code::ENODEV);
> 
> With the prelude included you should be able to use ENODEV directly.

Yes. I should have noticed that, sorry.

> 
>> +        }
>> +
>>          pdev.enable_device_mem()?;
>>          pdev.set_master();
>>  
>> diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
> 
> Please split the PCI part up into a separate patch.

OK, will do.

Thanks for the review!


thanks,
-- 
John Hubbard

> 
>> index 887ee611b553..b6416fe7bdfd 100644
>> --- a/rust/kernel/pci.rs
>> +++ b/rust/kernel/pci.rs
>> @@ -399,6 +399,12 @@ pub fn device_id(&self) -> u16 {
>>          unsafe { (*self.as_raw()).device }
>>      }
>>  
>> +    /// Returns the PCI class code (class and subclass).
>> +    pub fn class(&self) -> u32 {
>> +        // SAFETY: `self.as_raw` is a valid pointer to a `struct pci_dev`.
>> +        unsafe { (*self.as_raw()).class >> 8 }
>> +    }
>> +
>>      /// Returns the size of the given PCI bar resource.
>>      pub fn resource_len(&self, bar: u32) -> Result<bindings::resource_size_t> {
>>          if !Bar::index_is_valid(bar) {
>>
>> base-commit: dfc0f6373094dd88e1eaf76c44f2ff01b65db851
>> -- 
>> 2.50.1
> 




More information about the Nouveau mailing list