[PATCH] drm/xe: Move device sysfs init to before GT init

Summers, Stuart stuart.summers at intel.com
Fri Aug 22 18:56:53 UTC 2025


I apologize for the noise here... please ignore this review. I think
this was an issue of ordering in another series I'm debugging. I'll
report back if it seems like something we want to pursue.

Thanks,
Stuart

On Fri, 2025-08-22 at 18:28 +0000, Stuart Summers wrote:
> I'm seeing the following splat if running one of the fault
> injection tests in a loop for long enough:
> [  591.853234] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/lb_fan_control_version'
> [  591.853241] CPU: 6 UID: 0 PID: 10800 Comm: xe_fault_inject Kdump:
> loaded Not tainted 6.17.0-rc2+ #74 PREEMPT(voluntary)
> [  591.853245] Hardware name: Intel Corporation Raptor Lake Client
> Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS
> RPLSFWI1.R00.4064.A02.2302091143 02/09/2023
> [  591.853246] Call Trace:
> [  591.853247]  <TASK>
> [  591.853249]  dump_stack_lvl+0xc2/0xf0
> [  591.853256]  dump_stack+0x10/0x20
> [  591.853258]  sysfs_warn_dup+0xd9/0x120
> [  591.853264]  sysfs_add_file_mode_ns+0x296/0x3e0
> [  591.853269]  sysfs_create_file_ns+0x12d/0x1e0
> [  591.853273]  ? __pfx_sysfs_create_file_ns+0x10/0x10
> [  591.853276]  ? pcode_mailbox_rw+0xce/0x180 [xe]
> [  591.853414]  ? mutex_unlock+0x12/0x20
> [  591.853417]  ? xe_pcode_read+0x59/0x80 [xe]
> [  591.853542]  xe_device_sysfs_init+0x2cd/0x350 [xe]
> [  591.853646]  ? __pfx_xe_device_sysfs_init+0x10/0x10 [xe]
> [  591.853750]  ? __devm_add_action+0xa6/0xe0
> [  591.853757]  xe_device_probe+0xb09/0x1bf0 [xe]
> [  591.853871]  ? add_dr+0x180/0x230
> [  591.853879]  ? __pfx_xe_device_probe+0x10/0x10 [xe]
> [  591.853994]  ? xe_pm_init_early+0x345/0x420 [xe]
> [  591.854124]  xe_pci_probe+0x8f8/0x11f0 [xe]
> [  591.854257]  ? __pfx_xe_pci_probe+0x10/0x10 [xe]
> [  591.854383]  local_pci_probe+0xe4/0x1b0
> [  591.854389]  pci_device_probe+0x5b4/0x870
> [  591.854393]  ? __pfx_pci_device_probe+0x10/0x10
> [  591.854395]  ? kernfs_put+0x1d/0x60
> [  591.854398]  ? sysfs_do_create_link_sd+0x91/0x120
> [  591.854402]  ? sysfs_create_link+0x44/0xc0
> [  591.854408]  really_probe+0x1fa/0x950
> [  591.854414]  __driver_probe_device+0x307/0x410
> [  591.854418]  device_driver_attach+0xc9/0x200
> [  591.854423]  bind_store+0xd4/0x150
> [  591.854425]  ? __pfx_bind_store+0x10/0x10
> [  591.854429]  drv_attr_store+0x6a/0xc0
> [  591.854431]  ? __pfx_sysfs_kf_write+0x10/0x10
> [  591.854435]  ? __pfx_drv_attr_store+0x10/0x10
> [  591.854437]  sysfs_kf_write+0xdc/0x130
> [  591.854441]  ? __pfx_sysfs_kf_write+0x10/0x10
> [  591.854444]  kernfs_fop_write_iter+0x373/0x550
> [  591.854449]  vfs_write+0xa5f/0x1380
> [  591.854456]  ? __pfx_vfs_write+0x10/0x10
> [  591.854464]  ? lock_acquire+0x172/0x300
> [  591.854468]  ? __kasan_check_read+0x11/0x20
> [  591.854474]  ksys_write+0x115/0x220
> [  591.854478]  ? __pfx_ksys_write+0x10/0x10
> [  591.854482]  ? __rseq_handle_notify_resume+0x56e/0xda0
> [  591.854488]  __x64_sys_write+0x72/0xc0
> [  591.854492]  x64_sys_call+0x18ec/0x2740
> [  591.854496]  do_syscall_64+0x8f/0xf70
> [  591.854501]  ? trace_irq_disable+0xd9/0x120
> [  591.854506]  ? trace_irq_enable+0xd9/0x120
> [  591.854510]  ? do_syscall_64+0x1c0/0xf70
> [  591.854513]  ? do_syscall_64+0x1c0/0xf70
> [  591.854516]  ? irqentry_exit+0x77/0xb0
> [  591.854519]  ? exc_page_fault+0x95/0x130
> [  591.854523]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  591.854525] RIP: 0033:0x79d8b8b1c574
> [  591.854529] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f
> 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec
> 20 48 89
> [  591.854531] RSP: 002b:00007ffeb4a9bb48 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [  591.854534] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 000079d8b8b1c574
> [  591.854536] RDX: 000000000000000c RSI: 00007ffeb4a9cfd0 RDI:
> 0000000000000005
> [  591.854537] RBP: 000000000000000c R08: 0000000000000073 R09:
> 0000000000000000
> [  591.854538] R10: 0000000000000000 R11: 0000000000000202 R12:
> 00007ffeb4a9cfd0
> [  591.854540] R13: 0000000000000005 R14: 00005d144cd2fc00 R15:
> 000079d8b8ff7000
> [  591.854549]  </TASK>
> 
> It looks like this can race with the GT teardown, depending on when
> their respective drmm fini handlers are called. If we start the new
> driver quickly enough (just a while loop with no delay), we appear
> to try to create a sysfs entry before the one from the prior driver
> load has been fully removed.
> 
> Add stricter initialization ordering between the sysfs files and the
> GT subsystem by moving the sysfs initialization earlier in the probe
> sequence.
> 
> Signed-off-by: Stuart Summers <stuart.summers at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 3e0402dff423..f57007faa024 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -876,6 +876,10 @@ int xe_device_probe(struct xe_device *xe)
>         if (err)
>                 return err;
>  
> +       err = xe_device_sysfs_init(xe);
> +       if (err)
> +               goto err_unregister_display;
> +
>         for_each_gt(gt, xe, id) {
>                 err = xe_gt_init(gt);
>                 if (err)
> @@ -922,10 +926,6 @@ int xe_device_probe(struct xe_device *xe)
>         if (err)
>                 goto err_unregister_display;
>  
> -       err = xe_device_sysfs_init(xe);
> -       if (err)
> -               goto err_unregister_display;
> -
>         xe_debugfs_register(xe);
>  
>         err = xe_hwmon_register(xe);


