[drm/ttm] Memory corruption problem when ttm_tt_init() fails.

Tetsuo Handa penguin-kernel at I-love.SAKURA.ne.jp
Wed Jan 21 03:56:53 PST 2015


I'm running memory allocation failure injection tests on 3.19-rc5, and
it seems to me that there is a memory corruption bug in the ttm or vmwgfx code.

---------- Crash pattern 1 start ----------
[   80.751971] [TTM] Failed allocating page table
[   83.000393] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   83.004392] IP: [<ffffffff811b65a9>] __fput+0x39/0x1e0
[   83.006944] PGD 7acd2067 PUD 7b0c7067 PMD 0
[   83.009240] Oops: 0000 [#1] SMP
[   83.010940] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel dm_mirror ghash_clmulni_intel dm_region_hash aesni_intel dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev
vmw_balloon microcode serio_raw pcspkr parport_pc shpchp parport vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput sd_mod ata_generic pata_acpi mptspi scsi_transport_spi mptscsih ata_piix e1000 mptbase libata floppy
[   83.038033] CPU: 2 PID: 8795 Comm: sh Tainted: G        W  OE  3.19.0-rc5+ #28
[   83.039666] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   83.042110] task: ffff88007a220000 ti: ffff880052048000 task.ti: ffff880052048000
[   83.043865] RIP: 0010:[<ffffffff811b65a9>]  [<ffffffff811b65a9>] __fput+0x39/0x1e0
[   83.045665] RSP: 0018:ffff88005204bea8  EFLAGS: 00010297
[   83.046895] RAX: 0000000000000000 RBX: ffff88007aff3500 RCX: 0000000000000a0a
[   83.048595] RDX: 000000000002801d RSI: 000000000000000a RDI: ffff88007aff3500
[   83.050254] RBP: ffff88005204bee8 R08: ffff88007cbfd000 R09: 0000000180080006
[   83.051848] R10: 0000000000000000 R11: ffffea0001f2fe00 R12: ffffffff81e6c040
[   83.053515] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   83.055156] FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[   83.057000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.058328] CR2: 0000000000000000 CR3: 000000007b0bc000 CR4: 00000000000407e0
[   83.060004] Stack:
[   83.060482]  ffff88007af0de48 ffff88007af0dc00 ffff88007af0de48 0000000000000000
[   83.062285]  ffffffff81e6c040 ffff88007a220610 ffff88007a220000 0000000000000000
[   83.064115]  ffff88005204bef8 ffffffff811b679e ffff88005204bf28 ffffffff81088f6f
[   83.065956] Call Trace:
[   83.066544]  [<ffffffff811b679e>] ____fput+0xe/0x10
[   83.067738]  [<ffffffff81088f6f>] task_work_run+0xaf/0xf0
[   83.068971]  [<ffffffff81013c5a>] do_notify_resume+0x7a/0x90
[   83.070307]  [<ffffffff816a6d87>] int_signal+0x12/0x17
[   83.071464] Code: 55 41 54 53 48 89 fb 48 83 ec 18 4c 8b 7f 18 4c 8b 77 10 4c 8b 6f 20 e8 06 c7 4e 00 8b 53 44 4c 8b 53 20 89 d0 83 e0 02 83 f8 01 <41> 0f b7 02 45 19 e4 41 83 e4 08 41 83 c4 08 44 89 e1 66 25 00
[   83.077450] RIP  [<ffffffff811b65a9>] __fput+0x39/0x1e0
[   83.078729]  RSP <ffff88005204bea8>
[   83.079522] CR2: 0000000000000000

crash> bt -l
PID: 8795   TASK: ffff88007a220000  CPU: 2   COMMAND: "sh"
 #0 [ffff88005204ba70] machine_kexec at ffffffff8104ef62
    /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320
 #1 [ffff88005204bac0] crash_kexec at ffffffff810ed983
    /usr/src/linux/kernel/kexec.c: 1482
 #2 [ffff88005204bb90] oops_end at ffffffff810176e8
    /usr/src/linux/arch/x86/kernel/dumpstack.c: 231
 #3 [ffff88005204bbc0] no_context at ffffffff8169af1f
    /usr/src/linux/arch/x86/mm/fault.c: 724
 #4 [ffff88005204bc20] __bad_area_nosemaphore at ffffffff8169aff6
    /usr/src/linux/arch/x86/mm/fault.c: 804
 #5 [ffff88005204bc70] bad_area at ffffffff8169b31f
    /usr/src/linux/arch/x86/mm/fault.c: 833
 #6 [ffff88005204bca0] __do_page_fault at ffffffff81059b37
    /usr/src/linux/arch/x86/mm/fault.c: 1213
 #7 [ffff88005204bdc0] do_page_fault at ffffffff81059c11
    /usr/src/linux/arch/x86/mm/fault.c: 1295
 #8 [ffff88005204bdf0] page_fault at ffffffff816a8a28
    /usr/src/linux/arch/x86/kernel/entry_64.S: 1283
    [exception RIP: __fput+57]
    RIP: ffffffff811b65a9  RSP: ffff88005204bea8  RFLAGS: 00010297
    RAX: 0000000000000000  RBX: ffff88007aff3500  RCX: 0000000000000a0a
    RDX: 000000000002801d  RSI: 000000000000000a  RDI: ffff88007aff3500
    RBP: ffff88005204bee8   R8: ffff88007cbfd000   R9: 0000000180080006
    R10: 0000000000000000  R11: ffffea0001f2fe00  R12: ffffffff81e6c040
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88005204bef0] ____fput at ffffffff811b679e
    /usr/src/linux/fs/file_table.c: 245
#10 [ffff88005204bf00] task_work_run at ffffffff81088f6f
    /usr/src/linux/kernel/task_work.c: 125
#11 [ffff88005204bf30] do_notify_resume at ffffffff81013c5a
    /usr/src/linux/include/linux/tracehook.h: 190
#12 [ffff88005204bf50] int_signal at ffffffff816a6d87
    /usr/src/linux/arch/x86/kernel/entry_64.S: 587
    RIP: 00007f1361d5f420  RSP: 00007fff77be5740  RFLAGS: 00000200
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
WARNING: possibly bogus exception frame
---------- Crash pattern 1 end ----------

---------- Crash pattern 2 start ----------
[  227.647021] [TTM] Failed allocating page table
[  227.875795] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  227.877714] IP: [<ffffffff81594c57>] skb_queue_tail+0x37/0x60
[  227.879107] PGD 78adc067 PUD 78ada067 PMD 0
[  227.880186] Oops: 0002 [#1] SMP
[  227.881017] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mirror aesni_intel dm_region_hash dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev
vmw_balloon microcode parport_pc serio_raw pcspkr parport vmw_vmci shpchp i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput ata_generic pata_acpi sd_mod ata_piix libata mptspi scsi_transport_spi e1000 mptscsih mptbase floppy
[  227.898988] CPU: 2 PID: 610 Comm: Xorg Tainted: G        W  OE  3.19.0-rc5+ #28
[  227.900691] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[  227.903162] task: ffff8800788c6040 ti: ffff8800792d8000 task.ti: ffff8800792d8000
[  227.904884] RIP: 0010:[<ffffffff81594c57>]  [<ffffffff81594c57>] skb_queue_tail+0x37/0x60
[  227.906816] RSP: 0018:ffff8800792dbbc8  EFLAGS: 00010046
[  227.908056] RAX: 0000000000000292 RBX: ffff88007cbc6d10 RCX: 0000000000000000
[  227.909718] RDX: 0000000000000000 RSI: 0000000000000292 RDI: ffff88007cbc6d24
[  227.911376] RBP: ffff8800792dbbe8 R08: 0000000000000292 R09: 0180000002800000
[  227.913027] R10: 0000000700020008 R11: 0000000000000000 R12: ffff88007b65aa00
[  227.914690] R13: ffff88007cbc6d24 R14: 0000000000000000 R15: ffff88007cbc6c80
[  227.916356] FS:  00007f3d07740980(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[  227.918232] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  227.919559] CR2: 0000000000000000 CR3: 0000000078add000 CR4: 00000000000407e0
[  227.921261] Stack:
[  227.921744]  0000000000000078 ffff88007b65aa00 0000000000000078 0000000000000000
[  227.923618]  ffff8800792dbca8 ffffffff816491bd ffff88007cbc6d10 ffff8800792dbd10
[  227.925427]  0000007800000000 ffff8800792dbcc8 0000000000000078 ffff88007cbc6f78
[  227.927271] Call Trace:
[  227.927872]  [<ffffffff816491bd>] unix_stream_sendmsg+0x1dd/0x430
[  227.929301]  [<ffffffff8158c0c3>] sock_aio_write+0x103/0x140
[  227.930638]  [<ffffffff811b42ec>] do_sync_readv_writev+0x4c/0x80
[  227.932047]  [<ffffffff811b5c95>] do_readv_writev+0x1e5/0x280
[  227.933406]  [<ffffffff8101fe4b>] ? __restore_xstate_sig+0x8b/0x680
[  227.934865]  [<ffffffff81104424>] ? __audit_syscall_entry+0xb4/0x110
[  227.936371]  [<ffffffff811b5db9>] vfs_writev+0x39/0x50
[  227.937565]  [<ffffffff811b5eea>] SyS_writev+0x4a/0xd0
[  227.938777]  [<ffffffff816a6d6c>] ? int_check_syscall_exit_work+0x34/0x3d
[  227.940364]  [<ffffffff816a6ae9>] system_call_fastpath+0x12/0x17
[  227.941775] Code: 8d 6f 14 41 54 49 89 f4 53 48 89 fb 4c 89 ef 48 83 ec 08 e8 dc 1a 11 00 48 8b 53 08 49 89 1c 24 4c 89 ef 48 89 c6 49 89 54 24 08 <4c> 89 22 83 43 10 01 4c 89 63 08 e8 09 17 11 00 48 83 c4 08 5b
[  227.947880] RIP  [<ffffffff81594c57>] skb_queue_tail+0x37/0x60
[  227.949297]  RSP <ffff8800792dbbc8>
[  227.950112] CR2: 0000000000000000

crash> bt -l
PID: 610    TASK: ffff8800788c6040  CPU: 2   COMMAND: "Xorg"
 #0 [ffff8800792db790] machine_kexec at ffffffff8104ef62
    /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320
 #1 [ffff8800792db7e0] crash_kexec at ffffffff810ed983
    /usr/src/linux/kernel/kexec.c: 1482
 #2 [ffff8800792db8b0] oops_end at ffffffff810176e8
    /usr/src/linux/arch/x86/kernel/dumpstack.c: 231
 #3 [ffff8800792db8e0] no_context at ffffffff8169af1f
    /usr/src/linux/arch/x86/mm/fault.c: 724
 #4 [ffff8800792db940] __bad_area_nosemaphore at ffffffff8169aff6
    /usr/src/linux/arch/x86/mm/fault.c: 804
 #5 [ffff8800792db990] bad_area at ffffffff8169b31f
    /usr/src/linux/arch/x86/mm/fault.c: 833
 #6 [ffff8800792db9c0] __do_page_fault at ffffffff81059b37
    /usr/src/linux/arch/x86/mm/fault.c: 1213
 #7 [ffff8800792dbae0] do_page_fault at ffffffff81059c11
    /usr/src/linux/arch/x86/mm/fault.c: 1295
 #8 [ffff8800792dbb10] page_fault at ffffffff816a8a28
    /usr/src/linux/arch/x86/kernel/entry_64.S: 1283
    [exception RIP: skb_queue_tail+55]
    RIP: ffffffff81594c57  RSP: ffff8800792dbbc8  RFLAGS: 00010046
    RAX: 0000000000000292  RBX: ffff88007cbc6d10  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000292  RDI: ffff88007cbc6d24
    RBP: ffff8800792dbbe8   R8: 0000000000000292   R9: 0180000002800000
    R10: 0000000700020008  R11: 0000000000000000  R12: ffff88007b65aa00
    R13: ffff88007cbc6d24  R14: 0000000000000000  R15: ffff88007cbc6c80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff8800792dbbf0] unix_stream_sendmsg at ffffffff816491bd
    /usr/src/linux/net/unix/af_unix.c: 1711
#10 [ffff8800792dbcb0] sock_aio_write at ffffffff8158c0c3
    /usr/src/linux/net/socket.c: 955
#11 [ffff8800792dbd90] do_sync_readv_writev at ffffffff811b42ec
    /usr/src/linux/fs/read_write.c: 697
#12 [ffff8800792dbe20] do_readv_writev at ffffffff811b5c95
    /usr/src/linux/fs/read_write.c: 851
#13 [ffff8800792dbf20] vfs_writev at ffffffff811b5db9
    /usr/src/linux/fs/read_write.c: 893
#14 [ffff8800792dbf30] sys_writev at ffffffff811b5eea
    /usr/src/linux/fs/read_write.c: 926
#15 [ffff8800792dbf80] system_call_fastpath at ffffffff816a6ae9
    /usr/src/linux/arch/x86/kernel/entry_64.S: 423
    RIP: 00007f3d056223c0  RSP: 00007ffff316be40  RFLAGS: 00003293
    RAX: ffffffffffffffda  RBX: ffffffff816a6ae9  RCX: ffffffffffffffff
    RDX: 0000000000000001  RSI: 00007ffff316af90  RDI: 0000000000000014
    RBP: 0000000001d59be0   R8: 0000000000000000   R9: 0000000000000004
    R10: 00000000ffffffff  R11: 0000000000003293  R12: 00007f3d077406a0
    R13: 0000000000000001  R14: 00007ffff316af90  R15: 0000000000000000
    ORIG_RAX: 0000000000000014  CS: 0033  SS: 002b
---------- Crash pattern 2 end ----------

---------- Crash pattern 3 start ----------
[   88.675004] [TTM] Failed allocating page table
[   88.678152] BUG: unable to handle kernel paging request at ffff8801531d77c0
[   88.679845] IP: [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0
[   88.681221] PGD 1f2b067 PUD 0
[   88.682000] Oops: 0002 [#1] SMP
[   88.682838] Modules linked in: stap_fault_injection(OE) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter ip_tables coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mirror dm_region_hash aesni_intel dm_log glue_helper dm_mod lrw gf128mul ablk_helper cryptd ppdev
vmw_balloon microcode serio_raw pcspkr parport_pc shpchp vmw_vmci parport i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc uinput sd_mod ata_generic pata_acpi e1000 ata_piix libata mptspi scsi_transport_spi mptscsih mptbase floppy
[   88.701377] CPU: 0 PID: 3904 Comm: gnome-shell Tainted: G        W  OE  3.19.0-rc5+ #31
[   88.703292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[   88.705840] task: ffff880079e05780 ti: ffff88007918c000 task.ti: ffff88007918c000
[   88.707575] RIP: 0010:[<ffffffff815964b5>]  [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0
[   88.709601] RSP: 0018:ffff88007918faa8  EFLAGS: 00010246
[   88.710884] RAX: 00000000ffffffff RBX: ffff8800531d7700 RCX: 00000000ffffffff
[   88.712584] RDX: ffff8801531d77c0 RSI: 0000000000000000 RDI: ffff8800531d77c8
[   88.714260] RBP: ffff88007918faf8 R08: 00000000ffffffc0 R09: 0000000000000200
[   88.715927] R10: ffffffff8159639e R11: ffff88007f803700 R12: ffff8800531d7800
[   88.717648] R13: 00000000ffffffff R14: ffff88007f803700 R15: 0000000000000100
[   88.719327] FS:  00007fcafd8aaa00(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[   88.721216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.722548] CR2: ffff8801531d77c0 CR3: 00000000790ae000 CR4: 00000000000407f0
[   88.724257] Stack:
[   88.724761]  ffff88007a2cdf00 000000007a01b800 0000000000000246 00ff88007918fcd8
[   88.726741]  ffff88007918fae8 0000000000000003 0000000000000000 ffff88007918fba8
[   88.728578]  ffff88007a01b800 0000000000000000 ffff88007918fb58 ffffffff81596d5c
[   88.730582] Call Trace:
[   88.731243]  [<ffffffff81596d5c>] alloc_skb_with_frags+0x5c/0x1e0
[   88.732725]  [<ffffffff811c99bc>] ? do_sys_poll+0x12c/0x5b0
[   88.734208]  [<ffffffff815910b6>] sock_alloc_send_pskb+0x196/0x250
[   88.735710]  [<ffffffff8159b887>] ? skb_copy_datagram_from_iter+0xe7/0x200
[   88.737361]  [<ffffffff8164ba07>] ? wait_for_unix_gc+0x27/0xa0
[   88.738784]  [<ffffffff8164928a>] unix_stream_sendmsg+0x2aa/0x430
[   88.740213]  [<ffffffff8158c0c3>] sock_aio_write+0x103/0x140
[   88.741610]  [<ffffffff811c8860>] ? poll_select_copy_remaining+0x130/0x130
[   88.743278]  [<ffffffff811b42ec>] do_sync_readv_writev+0x4c/0x80
[   88.744721]  [<ffffffff811b5c95>] do_readv_writev+0x1e5/0x280
[   88.746109]  [<ffffffff8158bf9d>] ? SYSC_recvfrom+0x13d/0x160
[   88.747452]  [<ffffffff81104424>] ? __audit_syscall_entry+0xb4/0x110
[   88.748992]  [<ffffffff811b5db9>] vfs_writev+0x39/0x50
[   88.750192]  [<ffffffff811b5eea>] SyS_writev+0x4a/0xd0
[   88.751423]  [<ffffffff811046b6>] ? __audit_syscall_exit+0x236/0x2e0
[   88.753121]  [<ffffffff816a6ae9>] system_call_fastpath+0x12/0x17
[   88.754650] Code: b6 83 90 00 00 00 83 e0 f7 09 c8 b9 ff ff ff ff 85 f6 88 83 90 00 00 00 b8 ff ff ff ff 66 89 8b c2 00 00 00 66 89 83 c6 00 00 00 <48> c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 c7 42 10 00 00
[   88.761554] RIP  [<ffffffff815964b5>] __alloc_skb+0x165/0x2b0
[   88.763077]  RSP <ffff88007918faa8>
[   88.763978] CR2: ffff8801531d77c0

crash> bt -l
PID: 3904   TASK: ffff880079e05780  CPU: 0   COMMAND: "gnome-shell"
 #0 [ffff88007918f690] machine_kexec at ffffffff8104ef62
    /usr/src/linux/arch/x86/kernel/machine_kexec_64.c: 320
 #1 [ffff88007918f6e0] crash_kexec at ffffffff810ed983
    /usr/src/linux/kernel/kexec.c: 1482
 #2 [ffff88007918f7b0] oops_end at ffffffff810176e8
    /usr/src/linux/arch/x86/kernel/dumpstack.c: 231
 #3 [ffff88007918f7e0] no_context at ffffffff8169af1f
    /usr/src/linux/arch/x86/mm/fault.c: 724
 #4 [ffff88007918f840] __bad_area_nosemaphore at ffffffff8169aff6
    /usr/src/linux/arch/x86/mm/fault.c: 804
 #5 [ffff88007918f890] bad_area_nosemaphore at ffffffff8169b162
    /usr/src/linux/arch/x86/mm/fault.c: 812
 #6 [ffff88007918f8a0] __do_page_fault at ffffffff810596f8
    /usr/src/linux/arch/x86/mm/fault.c: 1277
 #7 [ffff88007918f9c0] do_page_fault at ffffffff81059c11
    /usr/src/linux/arch/x86/mm/fault.c: 1295
 #8 [ffff88007918f9f0] page_fault at ffffffff816a8a28
    /usr/src/linux/arch/x86/kernel/entry_64.S: 1283
    [exception RIP: __alloc_skb+357]
    RIP: ffffffff815964b5  RSP: ffff88007918faa8  RFLAGS: 00010246
    RAX: 00000000ffffffff  RBX: ffff8800531d7700  RCX: 00000000ffffffff
    RDX: ffff8801531d77c0  RSI: 0000000000000000  RDI: ffff8800531d77c8
    RBP: ffff88007918faf8   R8: 00000000ffffffc0   R9: 0000000000000200
    R10: ffffffff8159639e  R11: ffff88007f803700  R12: ffff8800531d7800
    R13: 00000000ffffffff  R14: ffff88007f803700  R15: 0000000000000100
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88007918fb00] alloc_skb_with_frags at ffffffff81596d5c
    /usr/src/linux/net/core/skbuff.c: 4386
#10 [ffff88007918fb60] sock_alloc_send_pskb at ffffffff815910b6
    /usr/src/linux/net/core/sock.c: 1826
#11 [ffff88007918fbf0] unix_stream_sendmsg at ffffffff8164928a
    /usr/src/linux/net/unix/af_unix.c: 1682
#12 [ffff88007918fcb0] sock_aio_write at ffffffff8158c0c3
    /usr/src/linux/net/socket.c: 955
#13 [ffff88007918fd90] do_sync_readv_writev at ffffffff811b42ec
    /usr/src/linux/fs/read_write.c: 697
#14 [ffff88007918fe20] do_readv_writev at ffffffff811b5c95
    /usr/src/linux/fs/read_write.c: 851
#15 [ffff88007918ff20] vfs_writev at ffffffff811b5db9
    /usr/src/linux/fs/read_write.c: 893
#16 [ffff88007918ff30] sys_writev at ffffffff811b5eea
    /usr/src/linux/fs/read_write.c: 926
#17 [ffff88007918ff80] system_call_fastpath at ffffffff816a6ae9
    /usr/src/linux/arch/x86/kernel/entry_64.S: 423
    RIP: 00007fcaf3c273c0  RSP: 00007fffadd91330  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff816a6ae9  RCX: 00007fffadd91360
    RDX: 0000000000000002  RSI: 00007fffadd914b0  RDI: 0000000000000006
    RBP: 0000000000b5c230   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000293  R12: 00007fffadd91428
    R13: 00007fffadd91424  R14: 0000000000b5c248  R15: 0000000000000001
    ORIG_RAX: 0000000000000014  CS: 0033  SS: 002b
---------- Crash pattern 3 end ----------

---------- Failed memory allocation start ----------
0xffffffff81199850 : __kmalloc+0x0/0x280 [kernel]
    /usr/src/linux/mm/slub.c:3247
0xffffffff814676fa : ttm_tt_init+0x8a/0xb0 [kernel]
    /usr/src/linux/include/linux/slab.h:524
    /usr/src/linux/include/linux/slab.h:535
    /usr/src/linux/include/drm/drm_mem_util.h:38
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_tt.c:53
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_tt.c:200
0xffffffff8147caa6 : vmw_ttm_tt_create+0x76/0xb0 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c:700
0xffffffff81467b8d : ttm_bo_add_ttm+0x9d/0xe0 [kernel]
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:238
0xffffffff8146a2ff : ttm_bo_validate+0x14f/0x1f0 [kernel]
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:1067
0xffffffff8146a5d4 : ttm_bo_init+0x234/0x470 [kernel]
    /usr/src/linux/drivers/gpu/drm/ttm/ttm_bo.c:1167
0xffffffff8147ae9e : vmw_dmabuf_init+0x13e/0x240 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:435
0xffffffff8147b0cb : vmw_user_dmabuf_alloc+0x8b/0x120 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:503
0xffffffff8147b202 : vmw_dmabuf_alloc_ioctl+0x52/0xb0 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c:698
0xffffffff814497a4 : drm_ioctl+0x1a4/0x630 [kernel]
    /usr/src/linux/drivers/gpu/drm/drm_ioctl.c:727
0xffffffff814773c9 : vmw_generic_ioctl+0x169/0x260 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1073
0xffffffff814774f5 : vmw_unlocked_ioctl+0x15/0x20 [kernel]
    /usr/src/linux/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:1084
0xffffffff811c7c18 : do_vfs_ioctl+0x2f8/0x510 [kernel]
    /usr/src/linux/fs/ioctl.c:44
    /usr/src/linux/fs/ioctl.c:602
0xffffffff811c7e71 : sys_ioctl+0x41/0x80 [kernel]
    /usr/src/linux/include/linux/file.h:38
    /usr/src/linux/fs/ioctl.c:618
    /usr/src/linux/fs/ioctl.c:608
0xffffffff816a6ae9 : system_call_fastpath+0x12/0x17 [kernel]
    /usr/src/linux/arch/x86/kernel/entry_64.S:423
---------- Failed memory allocation end ----------

If I skip the ttm_tt_destroy() call, this bug no longer occurs. Therefore,
I guess that this memory corruption is caused by the destroy function
being called with a partially initialized ttm object.

--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -199,8 +199,8 @@ int ttm_tt_init(struct ttm_tt *ttm, struct ttm_bo_device *bdev,
 
 	ttm_tt_alloc_page_directory(ttm);
 	if (!ttm->pages) {
-		ttm_tt_destroy(ttm);
-		pr_err("Failed allocating page table\n");
+		//ttm_tt_destroy(ttm);
+		pr_err("Failed allocating page table, but skip ttm_tt_destroy()\n");
 		return -ENOMEM;
 	}
 	return 0;
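
To make the guess above concrete, here is a minimal userspace sketch of one
way such a failure path could go wrong. It assumes that the destroy callback
releases the containing object and that the caller releases it again on its
own error path. Whether ttm/vmwgfx actually behaves this way is not confirmed
here. All names (struct tt, struct backend, tt_init and so on) are
hypothetical stand-ins, not the real TTM API, and releases are only counted
rather than passed to free() so that the double release is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not the real TTM types. */
struct tt {
	void **pages;                    /* page table, NULL if allocation failed */
	void (*destroy)(struct tt *tt);  /* destructor supplied by the backend */
};

struct backend {
	struct tt tt;                    /* embedded as the first member */
};

/* Releases are counted instead of freed, so a double release stays harmless. */
static int backend_releases;

static void backend_release(struct backend *be)
{
	(void)be;                        /* stands in for freeing the backend */
	backend_releases++;
}

/* Destructor: drops the page table, then releases the containing backend. */
static void backend_destroy(struct tt *tt)
{
	assert(tt != NULL);
	free(tt->pages);                 /* free(NULL) is a no-op */
	tt->pages = NULL;
	backend_release((struct backend *)tt);
}

/* Init with an injectable allocation failure (fail_alloc != 0). */
static int tt_init(struct tt *tt, size_t num_pages,
		   int fail_alloc, int destroy_on_failure)
{
	tt->pages = fail_alloc ? NULL : calloc(num_pages, sizeof(*tt->pages));
	if (!tt->pages) {
		if (destroy_on_failure)
			tt->destroy(tt); /* already releases the backend */
		return -1;
	}
	return 0;
}

/* Caller whose own error path also releases the backend. */
static int releases_after_failed_create(int destroy_on_failure)
{
	static struct backend be;

	backend_releases = 0;
	be.tt.pages = NULL;
	be.tt.destroy = backend_destroy;
	if (tt_init(&be.tt, 16, 1, destroy_on_failure) != 0)
		backend_release(&be);    /* second release if init destroyed it */
	return backend_releases;
}
```

In this model, releases_after_failed_create(1) counts two releases of the
same object, while releases_after_failed_create(0) counts one, which matches
the observation that skipping the destroy call makes the problem go away.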
 

I can reproduce this problem on kernels at least as far back as 3.13.0.
I don't know whether it is specific to the vmwgfx code, because I have
tested only CentOS 7 with a GUI environment on VMware Player 6.

I think you can reproduce this problem by starting the SystemTap script shown
below and then switching consoles with Ctrl-Alt-F1 through Ctrl-Alt-F7.

---------- Reproducer start ----------
# stap -g -e 'global is_target%;
probe begin { printf("Probe start!\n"); }
probe module("ttm").function("ttm_tt_init") { is_target[tid()] = 1; }
probe module("ttm").function("ttm_tt_init").return { is_target[tid()] = 0; }
probe kernel.function("__kmalloc") {
  if (($flags & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} && is_target[tid()]) {
    print_backtrace();
    $size = 1 << 30;
    exit();
  }
}
probe end { delete is_target; }'
---------- Reproducer end ----------
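
The flag test in the __kmalloc probe can be restated in plain C as a sketch.
It selects requests made inside ttm_tt_init() that have __GFP_WAIT set and
__GFP_NOFAIL clear. The script then overwrites the requested size with
1 << 30, presumably so that the allocation cannot be satisfied. The bit
values and the function name below are illustrative assumptions only; the
real flag definitions come from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones are defined by the kernel. */
#define SKETCH_GFP_WAIT   0x10u
#define SKETCH_GFP_NOFAIL 0x800u

/* The mask comparison below only makes sense if the two bits are distinct. */
static_assert((SKETCH_GFP_WAIT & SKETCH_GFP_NOFAIL) == 0,
	      "flag bits must not overlap");

/*
 * Mirrors the probe condition: inject a failure only when the current
 * thread is inside the target function, the allocation has the WAIT bit
 * set, and it is not marked NOFAIL. Other flag bits are ignored.
 */
static bool should_inject_failure(unsigned int flags, bool inside_target)
{
	unsigned int mask = SKETCH_GFP_NOFAIL | SKETCH_GFP_WAIT;

	return inside_target && (flags & mask) == SKETCH_GFP_WAIT;
}
```

Comparing the masked value against the WAIT bit alone, rather than testing
for non-zero, is what excludes NOFAIL allocations from the injection.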

I can also reproduce the problem below using 3.10.0-123.9.3.el7.x86_64,
though it might be a different problem from the one above.

---------- Crash pattern 4 start ----------
[TTM] Failed allocating page table
------------[ cut here ]------------
WARNING: at lib/list_debug.c:33 __list_add+0xac/0xc0()
list_add corruption. prev->next should be next (ffff88007af4cd98), but was           (null). (prev=ffff88007ac881f0).
Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper vmw_balloon ablk_helper cryptd serio_raw parport_pc i2c_piix4 parport
vmw_vmci pcspkr dm_mirror shpchp dm_region_hash dm_log mperf dm_mod nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi crc32c_intel vmwgfx mptspi ttm scsi_transport_spi mptscsih ahci ata_piix libahci drm mptbase libata e1000 i2c_core floppy [last unloaded: stap_bad36894e80d53e8ee72ce3ee48a27ac_3394]
CPU: 0 PID: 849 Comm: Xorg Tainted: GF       W  O--------------   3.10.0-123.9.3.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
 ffff88007984da10 00000000da42f7a4 ffff88007984d9c8 ffffffff815e239b
 ffff88007984da00 ffffffff8105dee1 ffff88007ac881f0 ffff88007af4cd98
 ffff88007ac881f0 0000000000000282 ffff88007984db98 ffff88007984da68
Call Trace:
 [<ffffffff815e239b>] dump_stack+0x19/0x1b
 [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
 [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff812cfeec>] __list_add+0xac/0xc0
 [<ffffffffa01a56e9>] vmw_fence_create+0xd9/0x130 [vmwgfx]
 [<ffffffffa0197ef8>] vmw_execbuf_fence_commands+0xc8/0x120 [vmwgfx]
 [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx]
 [<ffffffff81194585>] ? __kmalloc+0x55/0x230
 [<ffffffffa0199af8>] do_dmabuf_dirty_sou.isra.9+0x328/0x3c0 [vmwgfx]
 [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm]
 [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm]
 [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx]
 [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230
 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm]
 [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm]
 [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540
 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70
 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0
 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx]
 [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0
 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0
 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b
---[ end trace a993c155f4775b96 ]---
------------[ cut here ]------------
WARNING: at lib/list_debug.c:36 __list_add+0x8a/0xc0()
list_add double add: new=ffff88007ac881f0, prev=ffff88007ac881f0, next=ffff88007af4cd98.
Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper vmw_balloon ablk_helper cryptd serio_raw parport_pc i2c_piix4 parport
vmw_vmci pcspkr dm_mirror shpchp dm_region_hash dm_log mperf dm_mod nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi crc32c_intel vmwgfx mptspi ttm scsi_transport_spi mptscsih ahci ata_piix libahci drm mptbase libata e1000 i2c_core floppy [last unloaded: stap_bad36894e80d53e8ee72ce3ee48a27ac_3394]
CPU: 0 PID: 849 Comm: Xorg Tainted: GF       W  O--------------   3.10.0-123.9.3.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
 ffff88007984da10 00000000da42f7a4 ffff88007984d9c8 ffffffff815e239b
 ffff88007984da00 ffffffff8105dee1 ffff88007ac881f0 ffff88007af4cd98
 ffff88007ac881f0 0000000000000282 ffff88007984db98 ffff88007984da68
Call Trace:
 [<ffffffff815e239b>] dump_stack+0x19/0x1b
 [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
 [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffff812cfeca>] __list_add+0x8a/0xc0
 [<ffffffffa01a56e9>] vmw_fence_create+0xd9/0x130 [vmwgfx]
 [<ffffffffa0197ef8>] vmw_execbuf_fence_commands+0xc8/0x120 [vmwgfx]
 [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx]
 [<ffffffff81194585>] ? __kmalloc+0x55/0x230
 [<ffffffffa0199af8>] do_dmabuf_dirty_sou.isra.9+0x328/0x3c0 [vmwgfx]
 [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm]
 [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm]
 [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx]
 [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230
 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm]
 [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm]
 [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540
 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70
 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0
 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx]
 [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0
 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0
 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b
---[ end trace a993c155f4775b97 ]---
INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=60019 jiffies, g=6722, c=6721, q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU: 0 PID: 849 Comm: Xorg Tainted: GF       W  O--------------   3.10.0-123.9.3.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff880077655b00 ti: ffff88007984c000 task.ti: ffff88007984c000
RIP: 0010:[<ffffffff8108ece5>]  [<ffffffff8108ece5>] __wake_up_common+0x5/0x90
RSP: 0018:ffff88007984d9d0  EFLAGS: 00000046
RAX: 0000000000000046 RBX: ffff88007ac88220 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007ac88220
RBP: ffff88007984da00 R08: 0000000000000000 R09: ffff88007f617320
R10: ffffea000173f700 R11: ffffffffa01a462d R12: 0000000000000046
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
FS:  00007faaaca78980(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007faaa4b3c000 CR3: 000000007baaf000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffff81090af9 ffff88007af4cd80 ffff88007ac881e0 ffff88007ac881f0
 ffff88007984da48 ffff88007ac881e0 ffff88007984da88 ffffffffa01a524b
 ffffc90008680018 ffff88007af4cda8 ffff88007af4cdd8 0000000000000292
Call Trace:
 [<ffffffff81090af9>] ? __wake_up+0x39/0x50
 [<ffffffffa01a524b>] vmw_fences_update+0x11b/0x220 [vmwgfx]
 [<ffffffffa01a2568>] vmw_update_seqno+0x48/0x50 [vmwgfx]
 [<ffffffffa01a2073>] vmw_fifo_send_fence+0x93/0xe0 [vmwgfx]
 [<ffffffffa0197e85>] vmw_execbuf_fence_commands+0x55/0x120 [vmwgfx]
 [<ffffffffa01987b8>] vmw_execbuf_process+0x4f8/0xbe0 [vmwgfx]
 [<ffffffffa01998d0>] do_dmabuf_dirty_sou.isra.9+0x100/0x3c0 [vmwgfx]
 [<ffffffffa00da00c>] ? ttm_read_lock+0x2c/0xd0 [ttm]
 [<ffffffffa00d50a1>] ? ttm_bo_add_to_lru+0x51/0xc0 [ttm]
 [<ffffffffa0199d50>] vmw_framebuffer_dmabuf_dirty+0x1c0/0x1f0 [vmwgfx]
 [<ffffffff81194723>] ? __kmalloc+0x1f3/0x230
 [<ffffffffa012d3f0>] drm_mode_dirtyfb_ioctl+0xe0/0x190 [drm]
 [<ffffffffa011cdb2>] drm_ioctl+0x502/0x630 [drm]
 [<ffffffff815edbb4>] ? __do_page_fault+0x204/0x540
 [<ffffffff812c0e64>] ? timerqueue_del+0x24/0x70
 [<ffffffff81089486>] ? __remove_hrtimer+0x46/0xa0
 [<ffffffffa019ca71>] vmw_unlocked_ioctl+0x51/0x80 [vmwgfx]
 [<ffffffff811c2b25>] do_vfs_ioctl+0x2e5/0x4c0
 [<ffffffff810650d6>] ? do_setitimer+0xe6/0x2a0
 [<ffffffff811c2da1>] SyS_ioctl+0xa1/0xc0
 [<ffffffff815f2a99>] system_call_fastpath+0x16/0x1b
Code: 49 0f af c0 e9 64 ff ff ff 0f 1f 44 00 00 44 8d 4a ff 31 c0 45 31 c0 4d 63 c9 e9 4e ff ff ff 0f 1f 80 00 00 00 00 66 66 66 66 90 <55> 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67
---------- Crash pattern 4 end ----------

