[Intel-gfx] 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
Hillf Danton
hdanton at sina.com
Fri Aug 9 14:31:27 UTC 2019
[off topic: plain text mail please]
On Fri, 9 Aug 2019 12:41:42 +0000 Martin Wilck wrote:
>
> This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> machine. The X screen was frozen. Remote login via ssh was still
> possible, thus I was able to retrieve basic logs.
Thanks for the report.
>
> sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> minute, the same two processes were still blocked. KVM seems to try to
> acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> to complete on pages owned by the i915 driver.
>
> kcompactd stack:
>
> Aug 09 12:12:48 apollon.suse.de kernel: sysrq: Show Blocked State
> Aug 09 12:12:48 apollon.suse.de kernel: task PC stack pid father
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd0 D 0 43 2 0x80004000
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel: io_schedule+0x12/0x40
> Aug 09 12:12:48 apollon.suse.de kernel: __lock_page+0x123/0x200
> Aug 09 12:12:48 apollon.suse.de kernel: ? gen8_ppgtt_clear_pdp+0xc0/0x140 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: ? file_fdatawait_range+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: set_page_dirty_lock+0x49/0x50
> Aug 09 12:12:48 apollon.suse.de kernel: i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]
The two lines above show that commit aa56a292ce62 ("drm/i915/userptr:
Acquire the page lock around set_page_dirty()") is the culprit.
> Aug 09 12:12:48 apollon.suse.de kernel: __i915_gem_object_put_pages+0x5e/0xa0 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel: ? __mod_lruvec_state+0x3f/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
The page is already locked before try_to_unmap() is called, and the dirty
page table entry is handled in try_to_unmap_one(), so taking the page lock
again, as aa56a292ce62 does, is overkill in this call trace: the task ends
up waiting for a lock it holds itself. A bigger pain is that the commit
cannot simply be reverted, because it carries a Fixes tag.
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel: ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd_do_work+0x120/0x290
>
> KVM stack:
>
> Aug 09 12:12:48 apollon.suse.de kernel: CPU 0/KVM D 0 25189 1 0x00000320
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel: schedule_preempt_disabled+0xa/0x10
> Aug 09 12:12:48 apollon.suse.de kernel: __mutex_lock.isra.0+0x172/0x4d0
> Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone_order+0xc6/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_compact_pages+0xcc/0x2a0
> Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_direct_compact+0x7c/0x150
> Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_slowpath+0x1ee/0xd00
> Aug 09 12:12:48 apollon.suse.de kernel: ? vmx_vcpu_load+0x100/0x120 [kvm_intel]
>
> Full logs can be found under https://pastebin.com/KJ6tccj4
> I haven't yet tried if this is reproducible.
Set the page dirty only if the page lock can be taken without blocking;
otherwise someone else is taking care of the page.
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -663,7 +663,7 @@ i915_gem_userptr_put_pages(struct drm_i9
 	i915_gem_gtt_finish_pages(obj, pages);
 
 	for_each_sgt_page(page, sgt_iter, pages) {
-		if (obj->mm.dirty)
+		if (obj->mm.dirty) {
 			/*
 			 * As this may not be anonymous memory (e.g. shmem)
 			 * but exist on a real mapping, we have to lock
@@ -672,8 +672,15 @@ i915_gem_userptr_put_pages(struct drm_i9
 			 * prevent the inode from being truncated.
 			 * Play safe and take the lock.
 			 */
-			set_page_dirty_lock(page);
-
+			if (trylock_page(page)) {
+				set_page_dirty(page);
+				unlock_page(page);
+			}
+			/*
+			 * else someone else is taking care of page and
+			 * we can do nothing about it to avoid deadlock
+			 */
+		}
 		mark_page_accessed(page);
 		put_page(page);
 	}