[systemd-devel] what does --ephemeral really do on btrfs?

Elias Probst mail at eliasprobst.eu
Fri Jul 3 06:35:51 PDT 2015


Looks like I shouldn't have announced success that early.

Even on a fresh raw-backed machines btrfs volume, I'm running into problems.

What I did:
# re-create btrfs volume
machinectl set-limit 6G

# built a mininmal CentOS7.1 image (yum.conf below):
yum -c /tmp/yum.conf groupinstall Base

# yum.conf
[main]
reposdir=/dev/null
gpgcheck=0
logfile=/var/log/yum.log
installroot=/var/lib/machines/centos7.1-base
assumeyes=1

[base]
name=CentOS 7.1.1503 - x86_64
baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
enabled=1

# tried to create a new machine using the centos7.1-base as template
systemd-nspawn --template=/var/lib/machines/centos7.1-base -bD
/var/lib/machines/storage-test

The result is the same again… hanging 'btrfs-transacti':
kernel: INFO: task btrfs-transacti:3603 blocked for more than 120 seconds.
kernel:       Not tainted 4.1.0-gentoo #4
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kernel: btrfs-transacti D ffff88007ac6f8a8     0  3603      2 0x00000080
kernel:  ffff88007ac6f8a8 ffff8802361e02d0 ffff88007db44390 0000000000000246
kernel:  ffff88007ac70000 ffff88020f11f1e0 ffff88007db44390 ffff88007ac6f8f8
kernel:  ffff88007db44390 ffff88007ac6f8c8 ffffffff95963329 0000000000000000
kernel: Call Trace:
kernel:  [<ffffffff95963329>] schedule+0x74/0x83
kernel:  [<ffffffff953e44eb>] btrfs_tree_read_lock+0xc0/0xea
kernel:  [<ffffffff951346bd>] ? wait_woken+0x74/0x74
kernel:  [<ffffffff95395c6d>] btrfs_search_old_slot+0x509/0x7fc
kernel:  [<ffffffff9538fc41>] ? __tree_mod_log_oldest_root.isra.20+0x46/0x67
kernel:  [<ffffffff9540587b>] __resolve_indirect_refs+0x192/0x5d9
kernel:  [<ffffffff953d0c46>] ? release_extent_buffer+0x34/0xb1
kernel:  [<ffffffff953d105f>] ? free_extent_buffer+0x6e/0x7a
kernel:  [<ffffffff95406905>] find_parent_nodes+0xb28/0xfaa
kernel:  [<ffffffff95406e19>] __btrfs_find_all_roots+0x92/0xf0
kernel:  [<ffffffff95406ed3>] btrfs_find_all_roots+0x45/0x65
kernel:  [<ffffffff95391387>] ? btrfs_get_tree_mod_seq+0x72/0x86
kernel:  [<ffffffff9540ad70>] btrfs_delayed_qgroup_accounting+0x32b/0x962
kernel:  [<ffffffff953a58e4>] btrfs_run_delayed_refs+0x185/0x210
kernel:  [<ffffffff953b3e0a>] btrfs_commit_transaction+0x47/0xaf0
kernel:  [<ffffffff953b4c5a>] ? start_transaction+0x3a7/0x52a
kernel:  [<ffffffff953b0550>] transaction_kthread+0xfa/0x1d0
kernel:  [<ffffffff953b0456>] ? btrfs_cleanup_transaction+0x498/0x498
kernel:  [<ffffffff9511b44d>] kthread+0xcd/0xd5
kernel:  [<ffffffff95120000>] ?
ftrace_raw_output_sched_migrate_task+0x7f/0x7f
kernel:  [<ffffffff9511b380>] ? kthread_create_on_node+0x17a/0x17a
kernel:  [<ffffffff95967d62>] ret_from_fork+0x42/0x70
kernel:  [<ffffffff9511b380>] ? kthread_create_on_node+0x17a/0x17a
kernel: INFO: task kworker/u8:7:28547 blocked for more than 120 seconds.
kernel:       Not tainted 4.1.0-gentoo #4
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kernel: kworker/u8:7    D ffff88010051f9b8     0 28547      2 0x00000080
kernel: Workqueue: btrfs-delayed-meta btrfs_delayed_meta_helper
kernel:  ffff88010051f9b8 ffffffff95e04460 ffff880178028090 0000000000000246
kernel:  ffff880100520000 ffff8801086ac308 ffff88020f11f178 ffff880108789010
kernel:  000000000000015b ffff88010051f9d8 ffffffff95963329 0000000000000000
kernel: Call Trace:
kernel:  [<ffffffff95963329>] schedule+0x74/0x83
kernel:  [<ffffffff953e478f>] btrfs_tree_lock+0xa7/0x1b7
kernel:  [<ffffffff951346bd>] ? wait_woken+0x74/0x74


On 07/03/2015 01:31 PM, Elias Probst wrote:
> On 07/02/2015 07:22 AM, Johannes Ernst wrote:
>> Mine takes an awful long time — blocking IO on the device in the awful long meantime — and I’m puzzled why. Does it perhaps copy (deep? references only?) the entire drive? 
>>
> 
> I've seen a similar issue here which was caused by a obviously corrupted
> btrfs volume.
> 'systemd-nspawn' and 'btrfs-transacti' consumed a full CPU core and
> never stopped doing so - they were just stuck there.
> 
> After enabling CONFIG_HANGCHECK_TIMER=y in my Kernel I was able to get
> some more information out of it:
> 
> kernel: INFO: task kworker/u8:2:213 blocked for more than 120 seconds.
> kernel:       Not tainted 4.1.0-gentoo #4
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> kernel: kworker/u8:2    D ffff8802344d3978     0   213      2 0x00000000
> kernel: Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
> kernel:  ffff8802344d3978 ffff8802361ee350 ffff880234ad8390 0000000000000246
> kernel:  ffff8802344d4000 ffff8801f5296ae0 ffff8802344d3a80 ffff8802304ac000
> kernel:  ffff880001447300 ffff8802344d3998 ffffffff94963329 0000000000000000
> kernel: Call Trace:
> kernel:  [<ffffffff94963329>] schedule+0x74/0x83
> kernel:  [<ffffffff943e478f>] btrfs_tree_lock+0xa7/0x1b7
> kernel:  [<ffffffff941346bd>] ? wait_woken+0x74/0x74
> kernel:  [<ffffffff943d03eb>] lock_extent_buffer_for_io+0x39/0x199
> kernel:  [<ffffffff943d1371>] btree_write_cache_pages+0x306/0x35d
> kernel:  [<ffffffff943ad2db>] btree_writepages+0x1e/0x5d
> kernel:  [<ffffffff941ad9b9>] do_writepages+0x19/0x27
> kernel:  [<ffffffff9420d42d>] __writeback_single_inode+0x4e/0x258
> kernel:  [<ffffffff9420d804>] writeback_sb_inodes+0x1cd/0x35a
> kernel:  [<ffffffff9420da0b>] __writeback_inodes_wb+0x7a/0xb3
> kernel:  [<ffffffff9420dbeb>] wb_writeback+0x10d/0x256
> kernel:  [<ffffffff941acb92>] ? bdi_dirty_limit+0x2c/0x8c
> kernel:  [<ffffffff9420e27f>] bdi_writeback_workfn+0x1a1/0x352
> kernel:  [<ffffffff94116b4b>] process_one_work+0x1d5/0x364
> kernel:  [<ffffffff9411756e>] worker_thread+0x2d2/0x3f4
> kernel:  [<ffffffff9411729c>] ? cancel_delayed_work_sync+0x10/0x10
> kernel:  [<ffffffff9411b44d>] kthread+0xcd/0xd5
> kernel:  [<ffffffff94120000>] ?
> ftrace_raw_output_sched_migrate_task+0x7f/0x7f
> kernel:  [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> kernel:  [<ffffffff94967d62>] ret_from_fork+0x42/0x70
> kernel:  [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> kernel: INFO: task btrfs-transacti:2020 blocked for more than 120 seconds.
> kernel:       Not tainted 4.1.0-gentoo #4
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> kernel: btrfs-transacti D ffff8802347838a8     0  2020      2 0x00000080
> kernel:  ffff8802347838a8 ffff8802361e02d0 ffff88023477a350 0000000000000246
> kernel:  ffff880234784000 ffff8801f5296ae0 ffff88023477a350 ffff8802347838f8
> kernel:  ffff88023477a350 ffff8802347838c8 ffffffff94963329 0000000000000000
> kernel: Call Trace:
> kernel:  [<ffffffff94963329>] schedule+0x74/0x83
> kernel:  [<ffffffff943e44eb>] btrfs_tree_read_lock+0xc0/0xea
> kernel:  [<ffffffff941346bd>] ? wait_woken+0x74/0x74
> kernel:  [<ffffffff94395c6d>] btrfs_search_old_slot+0x509/0x7fc
> kernel:  [<ffffffff9440587b>] __resolve_indirect_refs+0x192/0x5d9
> kernel:  [<ffffffff943e43e5>] ? btrfs_clear_lock_blocking_rw+0x78/0xbe
> kernel:  [<ffffffff943d0c46>] ? release_extent_buffer+0x34/0xb1
> kernel:  [<ffffffff943d105f>] ? free_extent_buffer+0x6e/0x7a
> kernel:  [<ffffffff94406905>] find_parent_nodes+0xb28/0xfaa
> kernel:  [<ffffffff94406e19>] __btrfs_find_all_roots+0x92/0xf0
> kernel:  [<ffffffff94406ed3>] btrfs_find_all_roots+0x45/0x65
> kernel:  [<ffffffff94391387>] ? btrfs_get_tree_mod_seq+0x72/0x86
> kernel:  [<ffffffff9440ad70>] btrfs_delayed_qgroup_accounting+0x32b/0x962
> kernel:  [<ffffffff943a58e4>] btrfs_run_delayed_refs+0x185/0x210
> kernel:  [<ffffffff943b3e0a>] btrfs_commit_transaction+0x47/0xaf0
> kernel:  [<ffffffff943b4c5a>] ? start_transaction+0x3a7/0x52a
> kernel:  [<ffffffff943b0550>] transaction_kthread+0xfa/0x1d0
> kernel:  [<ffffffff943b0456>] ? btrfs_cleanup_transaction+0x498/0x498
> kernel:  [<ffffffff9411b44d>] kthread+0xcd/0xd5
> kernel:  [<ffffffff94120000>] ?
> ftrace_raw_output_sched_migrate_task+0x7f/0x7f
> kernel:  [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> kernel:  [<ffffffff94967d62>] ret_from_fork+0x42/0x70
> kernel:  [<ffffffff9411b380>] ? kthread_create_on_node+0x17a/0x17a
> 
> 
> In this state, I was even unable to fully copy away my machines into a
> tarball which just ended up in
> 
> tar: Unexpected EOF in archive
> tar: Unexpected EOF in archive
> tar: Error is not recoverable: exiting now
> 
> and
> 
> gzip: stdin: unexpected end of file
> 
> 
> Didn't get a reply in #btrfs regarding this yet and didn't get around
> yet to raise this on the btrfs ML.
> 
> I unmounted the machine volume (which was just backed by a raw image on
> my ext4 root volume), moved the raw volume file away for possible later
> analysis and recreated the volume from scratch (machinectl set-limit 6G).
> 
> 
> - Elias
> 
> 
> 
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20150703/87b57d9d/attachment-0001.sig>


More information about the systemd-devel mailing list