[systemd-devel] [systemd-219] Journal spamming by umount / high CPU usage
Kai Krakow
hurikhan77 at gmail.com
Sun Mar 29 08:12:35 PDT 2015
Hello!
I've got a automount point for a daily USB backup job. Due to some
instabilities of early USB3 chipsets and early USB3 devices, the mounted
device sometimes wents offline and eventually comes back after a while but
my backup job is stuck (rsync). I configured rsync to shutdown upon block io
timeouts. This in turn makes the systemd automounter think (correctly) that
the mount point is no longer in use - but it cannot be unmounted. The
problem here is, that it spams the journal with hundreds of messages per
minute all day long (or as long as I don't reboot):
Mär 29 16:49:00 jupiter umount[19547]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19575]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19577]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19645]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19671]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19719]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19795]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19843]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19853]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19907]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
Mär 29 16:49:00 jupiter umount[19947]: umount: /mnt/private/usb-backup:
umount fehlgeschlagen: Die Operation ist nicht erlaubt
The German phrase means "umount failed: operation not permitted". This in
turn leads to a constant CPU usage of systemd and journald between 30% and
70% added together and the journal grows to gigabytes, meanwhile "old"
journals become rotated away so that I currently cannot even look back far
enough to see the logs right before that repeating incident. The system
reboots just fine, there's no hanging systemd job during shutdown.
The thing is, the device is back - it switched to another device node, tho
(sda at boot, not sdf). The systemd automounter happily mounts it when
changing to the directory, and I can list all my files fine. The backup
scratch area is dirty/incomplete as expected - but there are no file system
errors or inconsistencies.
Related dmesg log purely for informational purpose to see the incident in
chronological order, this in not about discussing a btrfs issue (and it is
not):
$ dmesg|egrep "sd[af]"
[ 2.241559] sd 6:0:0:0: [sda] 3907029164 512-byte logical blocks: (2.00
TB/1.81 TiB)
[ 2.241878] sd 6:0:0:0: [sda] Write Protect is off
[ 2.241883] sd 6:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 2.242380] sd 6:0:0:0: [sda] No Caching mode page found
[ 2.242419] sd 6:0:0:0: [sda] Assuming drive cache: write through
[ 2.267163] sda: sda1
[ 2.268617] sd 6:0:0:0: [sda] Attached SCSI disk
[ 4.349156] BTRFS: device label usb-backup devid 1 transid 28610
/dev/sda1
# backup started here:
[13059.571377] BTRFS info (device sda1): force zlib compression
[13059.571382] BTRFS info (device sda1): disk space caching is enabled
# USB hickup here:
[58885.081027] sd 7:0:0:0: [sdf] 3907029164 512-byte logical blocks: (2.00
TB/1.81 TiB)
[58885.081433] sd 7:0:0:0: [sdf] Write Protect is off
[58885.081437] sd 7:0:0:0: [sdf] Mode Sense: 03 00 00 00
[58885.081979] sd 7:0:0:0: [sdf] No Caching mode page found
[58885.083306] sd 7:0:0:0: [sdf] Assuming drive cache: write through
# backup failed here:
[58885.250710] BTRFS: error (device sda1) in btrfs_commit_transaction:2010:
errno=-5 IO failure (Error while writing out transaction)
[58885.252190] BTRFS info (device sda1): forced readonly
[58885.252195] BTRFS warning (device sda1): Skipping commit of aborted
transaction.
[58885.252359] BTRFS: error (device sda1) in cleanup_transaction:1670:
errno=-5 IO failure
[58885.253900] BTRFS info (device sda1): delayed_refs has NO entry
[58885.253913] BTRFS error (device sda1): commit super ret -5
[58893.553877] sdf: sdf1
[58893.555061] sd 7:0:0:0: [sdf] Attached SCSI disk
# I've cd'ed into the directory here:
[99457.380783] BTRFS info (device sdf1): force zlib compression
[99457.380788] BTRFS info (device sdf1): disk space caching is enabled
Here's the configuration:
$ fgrep usb-backup /etc/fstab
LABEL=usb-backup /mnt/private/usb-backup btrfs \
noauto,noatime,compress-force=zlib,subvolid=0,x-systemd.automount \
0 0
$ systemctl cat mnt-private-usb\\x2dbackup.{automount,mount}
# /run/systemd/generator/mnt-private-usb\x2dbackup.automount
# Automatically generated by systemd-fstab-generator
[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
Before=local-fs.target
[Automount]
Where=/mnt/private/usb-backup
# /run/systemd/generator/mnt-private-usb\x2dbackup.mount
# Automatically generated by systemd-fstab-generator
[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
[Mount]
What=/dev/disk/by-label/usb-backup
Where=/mnt/private/usb-backup
Type=btrfs
Options=noauto,noatime,compress-force=zlib,subvolid=0,x-systemd.automount
I wonder if systemd automount should become smarter about handling such
hickups because they can happen out in the real world. It will probably have
a similar or same problem with users just disconnecting a device that was
automounted by systemd. This has not been an issue in earlier versions of
systemd so this looks like a regression, but I cannot say which version.
Probably it was still okay with v216 or v217.
I see at least to problems which should be fixed:
* automounter umount can loop until infinity in case of errors
* automounter can become very spammy in the journal
The latter one could be fixed with different burst settings maybe tho I
think that should not be the purpose of that setting for this particular
problem. The first looks like a regression. OTOH, in previous versions maybe
it wasn't just so spammy to the journal so I didn't discover the problem.
--
Replies to list only preferred.
More information about the systemd-devel
mailing list