✓ CI.checkpatch: success for drm/xe/userptr: fix EFAULT handling

Thu Feb 13 14:19:38 UTC 2025

== Series Details ==

Series: drm/xe/userptr: fix EFAULT handling
URL   : https://patchwork.freedesktop.org/series/144799/
State : success

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
22f9cda3436b4fe965b5c5f31d2f2c1bcb483189
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit e21c4aae253ceb67a91f2e72d82eb112b6391a7a
Author: Matthew Auld <matthew.auld at intel.com>
Date:   Thu Feb 13 13:54:35 2025 +0000

    drm/xe/userptr: fix EFAULT handling
    
    Currently we treat EFAULT from hmm_range_fault() as a non-fatal error
    when called from xe_vm_userptr_pin() with the idea that we want to avoid
    killing the entire vm and chucking an error, under the assumption that
    the user just did an unmap or something, and has no intention of
    actually touching that memory from the GPU.  At this point we have
    already zapped the PTEs, so any access should generate a page fault,
    and if the pin also fails there it then becomes fatal.
    
    However, it looks like the userptr vma can still be on the rebind list
    in preempt_rebind_work_func(): if something in the caller forces us to
    retry the pin before we reach the rebind step, we need to re-validate
    the userptr, and this time we can hit the EFAULT.
    
    This might explain an internal user report of hitting:
    
    [  191.738349] WARNING: CPU: 1 PID: 157 at drivers/gpu/drm/xe/xe_res_cursor.h:158 xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
    [  191.738551] Workqueue: xe-ordered-wq preempt_rebind_work_func [xe]
    [  191.738616] RIP: 0010:xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
    [  191.738690] Call Trace:
    [  191.738692]  <TASK>
    [  191.738694]  ? show_regs+0x69/0x80
    [  191.738698]  ? __warn+0x93/0x1a0
    [  191.738703]  ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
    [  191.738759]  ? report_bug+0x18f/0x1a0
    [  191.738764]  ? handle_bug+0x63/0xa0
    [  191.738767]  ? exc_invalid_op+0x19/0x70
    [  191.738770]  ? asm_exc_invalid_op+0x1b/0x20
    [  191.738777]  ? xe_pt_stage_bind.constprop.0+0x60a/0x6b0 [xe]
    [  191.738834]  ? ret_from_fork_asm+0x1a/0x30
    [  191.738849]  bind_op_prepare+0x105/0x7b0 [xe]
    [  191.738906]  ? dma_resv_reserve_fences+0x301/0x380
    [  191.738912]  xe_pt_update_ops_prepare+0x28c/0x4b0 [xe]
    [  191.738966]  ? kmemleak_alloc+0x4b/0x80
    [  191.738973]  ops_execute+0x188/0x9d0 [xe]
    [  191.739036]  xe_vm_rebind+0x4ce/0x5a0 [xe]
    [  191.739098]  ? trace_hardirqs_on+0x4d/0x60
    [  191.739112]  preempt_rebind_work_func+0x76f/0xd00 [xe]
    
    This is followed by a NULL pointer dereference (NPD) when running some
    workload, since the sg was never actually populated but the vma is still
    marked for rebind when it should be skipped for this special EFAULT
    case. And from the logs it does seem like we hit this special EFAULT
    case before the explosions.
    
    Fixes: 521db22a1d70 ("drm/xe: Invalidate userptr VMA on page pin fault")
    Signed-off-by: Matthew Auld <matthew.auld at intel.com>
    Cc: Matthew Brost <matthew.brost at intel.com>
    Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
    Cc: <stable at vger.kernel.org> # v6.10+
+ /mt/dim checkpatch dbf65862427b99b51ac5279560a1f7995779480b drm-intel
e21c4aae253c drm/xe/userptr: fix EFAULT handling
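
The commit message above says a userptr vma that hits the special EFAULT case should be skipped for rebind rather than left on the rebind list with an unpopulated sg. The following is a minimal user-space sketch of that idea only, not the xe driver code and not the contents of the patch. Every name in it (struct toy_vma, toy_pin, toy_pin_all, toy_rebind) is hypothetical, and the boolean fields merely stand in for the real sg table and rebind list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a userptr vma; not the xe driver's structure. */
struct toy_vma {
	bool backing_valid;   /* false models the user having unmapped the range */
	bool pages_populated; /* models the sg table having been filled in */
	bool on_rebind_list;  /* models membership of the vm's rebind list */
};

/* Toy pin step: fails with -EFAULT when the backing range is gone. */
static int toy_pin(struct toy_vma *vma)
{
	if (!vma->backing_valid) {
		vma->pages_populated = false;
		return -EFAULT;
	}
	vma->pages_populated = true;
	return 0;
}

/*
 * Pin every vma. -EFAULT is treated as non-fatal, but the vma is also
 * taken off the rebind list so that a later rebind never sees it.
 * Any other error is returned to the caller as fatal.
 */
static int toy_pin_all(struct toy_vma *vmas, size_t n)
{
	assert(vmas != NULL);

	for (size_t i = 0; i < n; i++) {
		int err = toy_pin(&vmas[i]);

		if (err == -EFAULT) {
			vmas[i].on_rebind_list = false; /* skip for rebind */
			continue;
		}
		if (err)
			return err;
		vmas[i].on_rebind_list = true;
	}
	return 0;
}

/*
 * Toy rebind step: returns the number of vmas bound, or -1 if it meets a
 * vma that is queued for rebind without populated pages (the failure mode
 * the commit message describes).
 */
static int toy_rebind(const struct toy_vma *vmas, size_t n)
{
	int bound = 0;

	for (size_t i = 0; i < n; i++) {
		if (!vmas[i].on_rebind_list)
			continue;
		if (!vmas[i].pages_populated)
			return -1;
		bound++;
	}
	return bound;
}
```

In this model, a vma that was already queued for rebind and then fails the pin with -EFAULT is dropped from the queue, so toy_rebind() only ever walks vmas whose pages were populated. Without the on_rebind_list = false line, the same scenario would make toy_rebind() return -1.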