[systemd-devel] what does --ephemeral really do on btrfs?
Elias Probst
mail at eliasprobst.eu
Fri Jul 3 06:35:51 PDT 2015
Looks like I shouldn't have announced success that early.
Even on a fresh raw-backed machines btrfs volume, I'm running into problems.
What I did:
# re-create btrfs volume
machinectl set-limit 6G
# built a mininmal CentOS7.1 image (yum.conf below):
yum -c /tmp/yum.conf groupinstall Base
# yum.conf
[main]
reposdir=/dev/null
gpgcheck=0
logfile=/var/log/yum.log
installroot=/var/lib/machines/centos7.1-base
assumeyes=1
[base]
name=CentOS 7.1.1503 - x86_64
baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
enabled=1
# tried to create a new machine using the centos7.1-base as template
systemd-nspawn --template=/var/lib/machines/centos7.1-base -bD
/var/lib/machines/storage-test
The result is the same again… hanging 'btrfs-transacti':
kernel: INFO: task btrfs-transacti:3603 blocked for more than 120 seconds.
kernel: Not tainted 4.1.0-gentoo #4
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kernel: btrfs-transacti D ffff88007ac6f8a8 0 3603 2 0x00000080
kernel: ffff88007ac6f8a8 ffff8802361e02d0 ffff88007db44390 0000000000000246
kernel: ffff88007ac70000 ffff88020f11f1e0 ffff88007db44390 ffff88007ac6f8f8
kernel: ffff88007db44390 ffff88007ac6f8c8 ffffffff95963329 0000000000000000
kernel: Call Trace:
kernel: [<ffffffff95963329>] schedule+0x74/0x83
kernel: [<ffffffff953e44eb>] btrfs_tree_read_lock+0xc0/0xea
kernel: [<ffffffff951346bd>] ? wait_woken+0x74/0x74
kernel: [<ffffffff95395c6d>] btrfs_search_old_slot+0x509/0x7fc
kernel: [<ffffffff9538fc41>] ? __tree_mod_log_oldest_root.isra.20+0x46/0x67
kernel: [<ffffffff9540587b>] __resolve_indirect_refs+0x192/0x5d9
kernel: [<ffffffff953d0c46>] ? release_extent_buffer+0x34/0xb1
kernel: [<ffffffff953d105f>] ? free_extent_buffer+0x6e/0x7a
kernel: [<ffffffff95406905>] find_parent_nodes+0xb28/0xfaa
kernel: [<ffffffff95406e19>] __btrfs_find_all_roots+0x92/0xf0
kernel: [<ffffffff95406ed3>] btrfs_find_all_roots+0x45/0x65
kernel: [<ffffffff95391387>] ? btrfs_get_tree_mod_seq+0x72/0x86
kernel: [<ffffffff9540ad70>] btrfs_delayed_qgroup_accounting+0x32b/0x962
kernel: [<ffffffff953a58e4>] btrfs_run_delayed_refs+0x185/0x210
kernel: [<ffffffff953b3e0a>] btrfs_commit_transaction+0x47/0xaf0
kernel: [<ffffffff953b4c5a>] ? start_transaction+0x3a7/0x52a
kernel: [<ffffffff953b0550>] transaction_kthread+0xfa/0x1d0
kernel: [<ffffffff953b0456>] ? btrfs_cleanup_transaction+0x498/0x498
kernel: [<ffffffff9511b44d>] kthread+0xcd/0xd5
kernel: [<ffffffff95120000>] ?
ftrace_raw_output_sched_migrate_task+0x7f/0x7f
kernel: [<ffffffff9511b380>] ? kthread_create_on_node+0x17a/0x17a
kernel: [<ffffffff95967d62>] ret_from_fork+0x42/0x70
kernel: [<ffffffff9511b380>] ? kthread_create_on_node+0x17a/0x17a
kernel: INFO: task kworker/u8:7:28547 blocked for more than 120 seconds.
kernel: Not tainted 4.1.0-gentoo #4
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kernel: kworker/u8:7 D ffff88010051f9b8 0 28547 2 0x00000080
kernel: Workqueue: btrfs-delayed-meta btrfs_delayed_meta_helper
kernel: ffff88010051f9b8 ffffffff95e04460 ffff880178028090 0000000000000246
kernel: ffff880100520000 ffff8801086ac308 ffff88020f11f178 ffff880108789010
kernel: 000000000000015b ffff88010051f9d8 ffffffff95963329 0000000000000000
kernel: Call Trace:
kernel: [<ffffffff95963329>] schedule+0x74/0x83
kernel: [<ffffffff953e478f>] btrfs_tree_lock+0xa7/0x1b7
kernel: [<ffffffff951346bd>] ? wait_woken+0x74/0x74
On 07/03/2015 01:31 PM, Elias Probst wrote:
> On 07/02/2015 07:22 AM, Johannes Ernst wrote:
>> Mine takes an awful long time — blocking IO on the device in the awful long meantime — and I’m puzzled why. Does it perhaps copy (deep? references only?) the entire drive?
>>
>
> I've seen a similar issue here which was caused by a obviously corrupted
> btrfs volume.
> 'systemd-nspawn' and 'btrfs-transacti' consumed a full CPU core and
> never stopped doing so - they were just stuck there.
>
> After enabling CONFIG_HANGCHECK_TIMER=y in my Kernel I was able to get
> some more information out of it:
>
> kernel: INFO: task kworker/u8:2:213 blocked for more than 120 seconds.
> kernel: Not tainted 4.1.0-gentoo #4
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> kernel: kworker/u8:2 D ffff8802344d3978 0 213 2 0x00000000
> kernel: Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
> kernel: ffff8802344d3978 ffff8802361ee350 ffff880234ad8390 0000000000000246
> kernel: ffff8802344d4000 ffff8801f5296ae0 ffff8802344d3a80 ffff8802304ac000
> kernel: ffff880001447300 ffff8802344d3998 ffffffff94963329 0000000000000000
> kernel: Call Trace:
> kernel: [<ffffffff94963329>] schedule+0x74/0x83
> kernel: [<ffffffff943e478f>] btrfs_tree_lock+0xa7/0x1b7
> kernel: [<ffffffff941346bd>] ? wait_woken+0x74/0x74
> kernel: [<ffffffff943d03eb>] lock_extent_buffer_for_io+0x39/0x199
> kernel: [<ffffffff943d1371>] btree_write_cache_pages+0x306/0x35d
> kernel: [<ffffffff943ad2db>] btree_writepages+0x1e/0x5d
> kernel: [<ffffffff941ad9b9>] do_writepages+0x19/0x27
> kernel: [<ffffffff9420d42d>] __writeback_single_inode+0x4e/0x258
> kernel: [<ffffffff9420d804>] writeback_sb_inodes+0x1cd/0x35a
> kernel: [<ffffffff9420da0b>] __writeback_inodes_wb+0x7a/0xb3
> kernel: [<ffffffff9420dbeb>] wb_writeback+0x10d/0x256
> kernel: [<ffffffff941acb92>] ? bdi_dirty_limit+0x2c/0x8c
> kernel: [<ffffffff9420e27f>] bdi_writeback_workfn+0x1a1/0x352
> kernel: [<ffffffff94116b4b>] process_one_work+0x1d5/0x364
> kernel: [<ffffffff9411756e>] worker_thread+0x2d2/0x3f4
> kernel: [<ffffffff9411729c>] ? cancel_delayed_work_sync+0x10/0x10
> kernel: [<ffffffff9411b44d>] kthread+0xcd/0xd5
> kernel: [<ffffffff94120000>] ?
> ftrace_raw_output_sched_migrate_task+0x7f/0x7f
> kernel: [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> kernel: [<ffffffff94967d62>] ret_from_fork+0x42/0x70
> kernel: [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> kernel: INFO: task btrfs-transacti:2020 blocked for more than 120 seconds.
> kernel: Not tainted 4.1.0-gentoo #4
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> kernel: btrfs-transacti D ffff8802347838a8 0 2020 2 0x00000080
> kernel: ffff8802347838a8 ffff8802361e02d0 ffff88023477a350 0000000000000246
> kernel: ffff880234784000 ffff8801f5296ae0 ffff88023477a350 ffff8802347838f8
> kernel: ffff88023477a350 ffff8802347838c8 ffffffff94963329 0000000000000000
> kernel: Call Trace:
> kernel: [<ffffffff94963329>] schedule+0x74/0x83
> kernel: [<ffffffff943e44eb>] btrfs_tree_read_lock+0xc0/0xea
> kernel: [<ffffffff941346bd>] ? wait_woken+0x74/0x74
> kernel: [<ffffffff94395c6d>] btrfs_search_old_slot+0x509/0x7fc
> kernel: [<ffffffff9440587b>] __resolve_indirect_refs+0x192/0x5d9
> kernel: [<ffffffff943e43e5>] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
> kernel: [<ffffffff943d0c46>] ? release_extent_buffer+0x34/0xb1
> kernel: [<ffffffff943d105f>] ? free_extent_buffer+0x6e/0x7a
> kernel: [<ffffffff94406905>] find_parent_nodes+0xb28/0xfaa
> kernel: [<ffffffff94406e19>] __btrfs_find_all_roots+0x92/0xf0
> kernel: [<ffffffff94406ed3>] btrfs_find_all_roots+0x45/0x65
> kernel: [<ffffffff94391387>] ? btrfs_get_tree_mod_seq+0x72/0x86
> kernel: [<ffffffff9440ad70>] btrfs_delayed_qgroup_accounting+0x32b/0x962
> kernel: [<ffffffff943a58e4>] btrfs_run_delayed_refs+0x185/0x210
> kernel: [<ffffffff943b3e0a>] btrfs_commit_transaction+0x47/0xaf0
> kernel: [<ffffffff943b4c5a>] ? start_transaction+0x3a7/0x52a
> kernel: [<ffffffff943b0550>] transaction_kthread+0xfa/0x1d0
> kernel: [<ffffffff943b0456>] ? btrfs_cleanup_transaction+0x498/0x498
> kernel: [<ffffffff9411b44d>] kthread+0xcd/0xd5
> kernel: [<ffffffff94120000>] ?
> ftrace_raw_output_sched_migrate_task+0x7f/0x7f
> kernel: [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> kernel: [<ffffffff94967d62>] ret_from_fork+0x42/0x70
> kernel: [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
>
>
> In this state, I was even unable to fully copy away my machines into a
> tarball which just ended up in
>
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
>
> and
>
> gzip: stdin: unexpected end of file
>
>
> Didn't get a reply in #btrfs regarding this yet and didn't get around
> yet to raise this on the btrfs ML.
>
> I unmounted the machine volume (which was just backed by a raw image on
> my ext4 root volume), moved the raw volume file away for possible later
> analysis and recreated the volume from scratch (machinectl set-limit 6G).
>
>
> - Elias
>
>
>
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20150703/87b57d9d/attachment-0001.sig>
More information about the systemd-devel
mailing list