[PATCH] drm/i915/gvt: fix vgpu debugfs clean in remove

He, Yu yu.he at intel.com
Fri Dec 9 08:00:47 UTC 2022


On 2022/12/9 16:00, He, Yu wrote: 
> When destroying a vgpu, check that the debugfs roots are still available. In
> the remove case, the drm minor's debugfs root might already have been
> destroyed, which leads to a kernel oops like the one below.
> 
> Console: switching to colour dummy device 80x25
> i915 0000:00:02.0: MDEV: Unregistering
> intel_vgpu_mdev b1338b2d-a709-4c23-b766-cc436c36cdf0: Removing from
> iommu group 14
> BUG: kernel NULL pointer dereference, address: 0000000000000150
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 3 PID: 1046 Comm: driverctl Not tainted 6.1.0-rc2+ #6
> Hardware name: HP HP ProDesk 600 G3 MT/829D, BIOS P02 Ver. 02.44 09/13/2022
> RIP: 0010:__lock_acquire+0x5e2/0x1f90
> Code: 87 ad 09 00 00 39 05 e1 1e cc 02 0f 82 f1 09 00 00 ba 01 00 00 00 48 83
> c4 48 89 d0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 45 31 ff <48> 81 3f 60 9e c2 b6
> 45 0f 45 f8 83 fe 01 0f 87 55 fa ff ff 89 f0
> RSP: 0018:ffff9f770274f948 EFLAGS: 00010046
> RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000150
> RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> R10: ffff8895d1173300 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000150 R14: 0000000000000000 R15: 0000000000000000
> FS:  00007fc9b2ba0740(0000) GS:ffff889cdfcc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000150 CR3: 000000010fd93005 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  lock_acquire+0xbf/0x2b0
>  ? simple_recursive_removal+0xa5/0x2b0
>  ? lock_release+0x13d/0x2d0
>  down_write+0x2a/0xd0
>  ? simple_recursive_removal+0xa5/0x2b0
>  simple_recursive_removal+0xa5/0x2b0
>  ? start_creating.part.0+0x110/0x110
>  ? _raw_spin_unlock+0x29/0x40
>  debugfs_remove+0x40/0x60
>  intel_gvt_debugfs_remove_vgpu+0x15/0x30 [kvmgt]
>  intel_gvt_destroy_vgpu+0x60/0x100 [kvmgt]
>  intel_vgpu_release_dev+0xe/0x20 [kvmgt]
>  device_release+0x30/0x80
>  kobject_put+0x79/0x1b0
>  device_release_driver_internal+0x1b8/0x230
>  bus_remove_device+0xec/0x160
>  device_del+0x189/0x400
>  ? up_write+0x9c/0x1b0
>  ? mdev_device_remove_common+0x60/0x60 [mdev]
>  mdev_device_remove_common+0x22/0x60 [mdev]
>  mdev_device_remove_cb+0x17/0x20 [mdev]
>  device_for_each_child+0x56/0x80
>  mdev_unregister_parent+0x5a/0x81 [mdev]
>  intel_gvt_clean_device+0x2d/0xe0 [kvmgt]
>  intel_gvt_driver_remove+0x2e/0xb0 [i915]
>  i915_driver_remove+0xac/0x100 [i915]
>  i915_pci_remove+0x1a/0x30 [i915]
>  pci_device_remove+0x31/0xa0
>  device_release_driver_internal+0x1b8/0x230
>  unbind_store+0xd8/0x100
>  kernfs_fop_write_iter+0x156/0x210
>  vfs_write+0x236/0x4a0
>  ksys_write+0x61/0xd0
>  do_syscall_64+0x55/0x80
>  ? find_held_lock+0x2b/0x80
>  ? lock_release+0x13d/0x2d0
>  ? up_read+0x17/0x20
>  ? lock_is_held_type+0xe3/0x140
>  ? asm_exc_page_fault+0x22/0x30
>  ? lockdep_hardirqs_on+0x7d/0x100
>  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7fc9b2c9e0c4
> Code: 15 71 7d 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa
> 80 3d 3d 05 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f
> 1f 00 48 83 ec 28 48 89 54 24 18 48
> RSP: 002b:00007ffec29c81c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc9b2c9e0c4
> RDX: 000000000000000d RSI: 0000559f8b5f48a0 RDI: 0000000000000001
> RBP: 0000559f8b5f48a0 R08: 0000559f8b5f3540 R09: 00007fc9b2d76d30
> R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000d
> R13: 00007fc9b2d77780 R14: 000000000000000d R15: 00007fc9b2d72a00
>  </TASK>
> Modules linked in: sunrpc intel_rapl_msr intel_rapl_common
> intel_pmc_core_pltdrv intel_pmc_core intel_tcc_cooling
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ee1004 igbvf
> rapl vfat fat intel_cstate intel_uncore pktcdvd i2c_i801 pcspkr wmi_bmof
> i2c_smbus acpi_pad vfio_pci vfio_pci_core vfio_virqfd zram fuse dm_multipath
> kvmgt mdev vfio_iommu_type1 vfio kvm irqbypass i915 nvme e1000e igb
> nvme_core crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
> polyval_generic serio_raw ghash_clmulni_intel sha512_ssse3 dca drm_buddy
> intel_gtt video wmi drm_display_helper ttm
> CR2: 0000000000000150
> ---[ end trace 0000000000000000 ]---
> 
> Cc: Wang Zhi <zhi.a.wang at intel.com>
> Cc: He Yu <yu.he at intel.com>
> Cc: Alex Williamson <alex.williamson at redhat.com>
> Fixes: bc7b0be316ae ("drm/i915/gvt: Add basic debugfs infrastructure")
> Signed-off-by: Zhenyu Wang <zhenyuw at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/debugfs.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/debugfs.c b/drivers/gpu/drm/i915/gvt/debugfs.c
> index e64d97e44716..0616b73175f3 100644
> --- a/drivers/gpu/drm/i915/gvt/debugfs.c
> +++ b/drivers/gpu/drm/i915/gvt/debugfs.c
> @@ -192,8 +192,13 @@ void intel_gvt_debugfs_add_vgpu(struct intel_vgpu *vgpu)
>   */
>  void intel_gvt_debugfs_remove_vgpu(struct intel_vgpu *vgpu)
>  {
> -	debugfs_remove_recursive(vgpu->debugfs);
> -	vgpu->debugfs = NULL;
> +	struct intel_gvt *gvt = vgpu->gvt;
> +	struct drm_minor *minor = gvt->gt->i915->drm.primary;
> +
> +	if (minor->debugfs_root && gvt->debugfs_root) {
> +		debugfs_remove_recursive(vgpu->debugfs);
> +		vgpu->debugfs = NULL;
> +	}
>  }
> 
>  /**
> --
> 2.38.1

With this patch applied, the "kernel NULL pointer" error no longer occurs when the GPU is removed by unbinding the driver, and the display goes inactive after removal, as expected.
Tested-by: Yu He <yu.he at intel.com>
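
For readers of the archive, here is a minimal userspace sketch of the guard the patch adds. It is not kernel code: the structs are cut-down, hypothetical stand-ins for the real ones (the sketch's `gvt->minor` replaces the patch's `gvt->gt->i915->drm.primary`), `mock_debugfs_remove_recursive()` is an assumed replacement for `debugfs_remove_recursive()` that only counts calls, and `demo_remove()` is a helper invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel objects in the patch. */
struct dentry { int unused; };
struct drm_minor { struct dentry *debugfs_root; };
struct intel_gvt {
	struct dentry *debugfs_root;
	struct drm_minor *minor;	/* stands in for gvt->gt->i915->drm.primary */
};
struct intel_vgpu {
	struct intel_gvt *gvt;
	struct dentry *debugfs;
};

static int remove_calls;

/* Assumed mock of debugfs_remove_recursive(): only counts invocations. */
static void mock_debugfs_remove_recursive(struct dentry *dentry)
{
	(void)dentry;
	remove_calls++;
}

/* Mirrors the patched intel_gvt_debugfs_remove_vgpu() logic. */
void sketch_debugfs_remove_vgpu(struct intel_vgpu *vgpu)
{
	assert(vgpu && vgpu->gvt && vgpu->gvt->minor);

	struct intel_gvt *gvt = vgpu->gvt;
	struct drm_minor *minor = gvt->minor;

	/* Touch the vgpu entry only while both parent roots still exist. */
	if (minor->debugfs_root && gvt->debugfs_root) {
		mock_debugfs_remove_recursive(vgpu->debugfs);
		vgpu->debugfs = NULL;
	}
}

/* Hypothetical helper: returns how many removals ran for one teardown. */
int demo_remove(bool roots_alive)
{
	struct dentry root = { 0 }, vgpu_dir = { 0 };
	struct drm_minor minor = { .debugfs_root = roots_alive ? &root : NULL };
	struct intel_gvt gvt = {
		.debugfs_root = roots_alive ? &root : NULL,
		.minor = &minor,
	};
	struct intel_vgpu vgpu = { .gvt = &gvt, .debugfs = &vgpu_dir };

	remove_calls = 0;
	sketch_debugfs_remove_vgpu(&vgpu);
	return remove_calls;
}
```

In this model, `demo_remove(true)` performs the removal once, while `demo_remove(false)` skips it, which corresponds to the unbind path in the trace above where the roots are already gone. Note that in the skipped case `vgpu->debugfs` is left as it was rather than reset to NULL, which matches the patch as posted.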
--
Best regards,
He,Yu

