[Intel-gfx] [PATCH] drm/i915: Clarify vma lifetime

Matthew Auld matthew.auld at intel.com
Thu Feb 17 12:01:20 UTC 2022


On 10/02/2022 07:19, Thomas Hellström wrote:
> It's unclear what reference the initial vma kref refernce refers to.
> A vma can have multiple weak references, the object vma list,
> the vm's bound list and the GT's closed_list, and the initial vma
> reference can be put from lookups of all these lists.
> 
> With the current implementation this means
> that any holder of yet another vma refcount (currently only
> i915_gem_object_unbind()) needs to be holding two of either
> *) An object refcount,
> *) A vm open count
> *) A vma open count
> 
> in order for us to not risk leaking a reference by having the
> initial vma reference being put twice.
> 
> Address this by re-introducing i915_vma_destroy() which removes all
> weak references of the vma and *then* puts the initial vma refcount.
> This makes a strong vma reference hold on to the vma unconditionally.
> 
> Perhaps a better name would be i915_vma_revoke() or i915_vma_zombify(),
> since other callers may still hold a refcount, but with the prospect of
> being able to replace the vma refcount with the object lock in the near
> future, let's stick with i915_vma_destroy().
> 
> Finally this commit fixes a race in that previously i915_vma_release() and
> now i915_vma_destroy() could destroy a vma without taking the vm->mutex
> after an advisory check that the vma mm_node was not allocated.
> This would race with the ungrab_vma() function creating a trace similar
> to the below one. This was fixed in one __i915_vma_put() callsites in
> Fixes: bc1922e5d349 ("drm/i915: Fix a race between vma / object destruction and unbinding")
> but although not seemingly triggered by CI, that
> is not sufficient. This patch is needed to fix that properly.
> 
> [823.012188] Console: switching to colour dummy device 80x25
> [823.012422] [IGT] gem_ppgtt: executing
> [823.016667] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
> [852.436465] stack segment: 0000 [#1] PREEMPT SMP NOPTI
> [852.436480] CPU: 0 PID: 3200 Comm: gem_ppgtt Not tainted 5.16.0-CI-CI_DRM_11115+ #1
> [852.436489] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021
> [852.436499] RIP: 0010:ungrab_vma+0x9/0x80 [i915]
> [852.436711] Code: ef e8 4b 85 cf e0 e8 36 a3 d6 e0 8b 83 f8 9c 00 00 85 c0 75 e1 5b 5d 41 5c 41 5d c3 e9 d6 fd 14 00 55 53 48 8b af c0 00 00 00 <8b> 45 00 85 c0 75 03 5b 5d c3 48 8b 85 a0 02 00 00 48 89 fb 48 8b
> [852.436727] RSP: 0018:ffffc90006db7880 EFLAGS: 00010246
> [852.436734] RAX: 0000000000000000 RBX: ffffc90006db7598 RCX: 0000000000000000
> [852.436742] RDX: ffff88815349e898 RSI: ffff88815349e858 RDI: ffff88810a284140
> [852.436748] RBP: 6b6b6b6b6b6b6b6b R08: ffff88815349e898 R09: ffff88815349e8e8
> [852.436754] R10: 0000000000000001 R11: 0000000051ef1141 R12: ffff88810a284140
> [852.436762] R13: 0000000000000000 R14: ffff88815349e868 R15: ffff88810a284458
> [852.436770] FS:  00007f5c04b04e40(0000) GS:ffff88849f000000(0000) knlGS:0000000000000000
> [852.436781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [852.436788] CR2: 00007f5c04b38fe0 CR3: 000000010a6e8001 CR4: 0000000000770ef0
> [852.436797] PKRU: 55555554
> [852.436801] Call Trace:
> [852.436806]  <TASK>
> [852.436811]  i915_gem_evict_for_node+0x33c/0x3c0 [i915]
> [852.437014]  i915_gem_gtt_reserve+0x106/0x130 [i915]
> [852.437211]  i915_vma_pin_ww+0x8f4/0xb60 [i915]
> [852.437412]  eb_validate_vmas+0x688/0x860 [i915]
> [852.437596]  i915_gem_do_execbuffer+0xc0e/0x25b0 [i915]
> [852.437770]  ? deactivate_slab+0x5f2/0x7d0
> [852.437778]  ? _raw_spin_unlock_irqrestore+0x50/0x60
> [852.437789]  ? i915_gem_execbuffer2_ioctl+0xc6/0x2c0 [i915]
> [852.437944]  ? init_object+0x49/0x80
> [852.437950]  ? __lock_acquire+0x5e6/0x2580
> [852.437963]  i915_gem_execbuffer2_ioctl+0x116/0x2c0 [i915]
> [852.438129]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
> [852.438300]  drm_ioctl_kernel+0xac/0x140
> [852.438310]  drm_ioctl+0x201/0x3d0
> [852.438316]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
> [852.438490]  __x64_sys_ioctl+0x6a/0xa0
> [852.438498]  do_syscall_64+0x37/0xb0
> [852.438507]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [852.438515] RIP: 0033:0x7f5c0415b317
> [852.438523] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
> [852.438542] RSP: 002b:00007ffd765039a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [852.438553] RAX: ffffffffffffffda RBX: 000055e4d7829dd0 RCX: 00007f5c0415b317
> [852.438562] RDX: 00007ffd76503a00 RSI: 00000000c0406469 RDI: 0000000000000017
> [852.438571] RBP: 00007ffd76503a00 R08: 0000000000000000 R09: 0000000000000081
> [852.438579] R10: 00000000ffffff7f R11: 0000000000000246 R12: 00000000c0406469
> [852.438587] R13: 0000000000000017 R14: 00007ffd76503a00 R15: 0000000000000000
> [852.438598]  </TASK>
> [852.438602] Modules linked in: snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg drm_buddy coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ttm ghash_clmulni_intel snd_hwdep snd_hda_core e1000e drm_dp_helper ptp snd_pcm mei_me drm_kms_helper pps_core mei syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii
> [852.440310] ---[ end trace e52cdd2fe4fd911c ]---
> 
> Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
> Fixes: bc1922e5d349 ("drm/i915: Fix a race between vma / object destruction and unbinding")
> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld at intel.com>


More information about the Intel-gfx mailing list