[systemd-devel] Delaying VM startup until block devices are available

Andrei Borzenkov arvidjaar at gmail.com
Sat Jan 27 06:34:22 UTC 2024


On 27.01.2024 00:40, Orion Poplawski wrote:
> On 1/26/24 01:21, Lennart Poettering wrote:
>> On Thu, 25.01.24 16:28, Orion Poplawski (orion at nwra.com) wrote:
>>
>>> We have various VMs that are backed by LUKS-encrypted LVs.  At boot the volumes
>>> are decrypted by clevis.  The problem we are seeing at the moment is that the
>>> VMs are started before the block devices are decrypted.  Our current
>>> solution is:
>>
>> We generally wait for all devices listed in /etc/crypttab, unless you
>> set noauto or nofail.
> 
> We are setting 'nofail', because I don't think I want to fail the boot in
> general.  They are not required for the system itself to function, just
> certain VMs. e.g:
> 
> luks-backup /dev/vg_root/backup-raw none discard,_netdev,nofail
> 
> See below for more though.
> 
>>> # cat /etc/systemd/system/virtqemud.service.d/override.conf
>>> [Unit]
>>> After=blockdev@dev-mapper-luks\x2dbackup.target blockdev@dev-mapper-luks\x2dvm\x2d01\x2ddisk0.target
>>>
>>> where we list each of the volumes to be decrypted as blocking the virtqemud
>>> service.
>>>
>>> Does anyone have any better alternatives?  My main issue is that it feels
>>> somewhere in between fine-grained and coarse-grained control.
>>>
>>> Ideally I think one would be able to have each individual VM startup
>>> automatically delayed until the devices each used became available, but I
>>> don't see how to do this.
>>
>> I am not sure how libvirt works, but if it runs every VM in a systemd
>> unit, then you could just order the device before that unit, or the
>> unit after the device.
>>
>> Really depends on how libvirt splits things up.
> 
> I'm honestly not sure how libvirt works here either.  But there seems to be this:
> 
> # rpm -qf /usr/lib/systemd/system/virtqemud.service
> libvirt-daemon-driver-qemu-9.5.0-7.el9_3.alma.2.x86_64
> 
> which gets started:
> 
> Jan 25 14:42:58 systemd[1]: Starting Virtualization qemu daemon...
> Jan 25 14:42:58 systemd[1]: Started Virtualization qemu daemon.
> 
> Then the qemu-kvm processes end up in their own scope:
> 
> ● machine-qemu\x2d1\x2dsrv\x2dmry01.scope - Virtual Machine qemu-1-srv-mry01
>       Loaded: loaded (/run/systemd/transient/machine-qemu\x2d1\x2dsrv\x2dmry01.scope; transient)
>    Transient: yes
>       Active: active (running) since Thu 2024-01-25 14:42:58 PST; 22h ago
>        Tasks: 6 (limit: 16384)
>       Memory: 15.6G
>          CPU: 1h 15min 44.863s
>       CGroup: /machine.slice/machine-qemu\x2d1\x2dsrv\x2dmry01.scope
>               └─libvirt
>                 └─9086 /usr/libexec/qemu-kvm -name guest=...
> 
>>
>>> Alternatively it seems like one should be able to delay all VM startup until
>>> all volumes in /etc/crypttab were unlocked, rather than having to specify each
>>> one.  But I don't see a target for that.
>>
>> This is default behaviour. Anything listed in /etc/crypttab is ordered
>> before cryptsetup.target, which is ordered before sysinit.target,
>> which is ordered before basic.target, which is ordered before regular services.
> 
> We are specifying _netdev because they require the network to unlock.  This I
> think puts them under remote-cryptsetup.target, and I used to depend on that.
> But with EL9 I'm seeing:
> 
> # j -b -u remote-cryptsetup.target -u 'blockdev@dev-mapper-luks\x2dbackup.target' -u clevis-luks-askpass.service --no-hostname
> 
> Jan 25 14:42:12 systemd[1]: Reached target Remote Encrypted Volumes.
> Jan 25 14:42:12 systemd[1]: Started Forward Password Requests to Clevis.
> Jan 25 14:42:48 clevis-luks-askpass[1706]: Unlocked /dev/vg_root/backup-raw (UUID=d6d25a85-2d43-4780-a312-e0e9b2383807) successfully
> Jan 25 14:42:54 systemd[1]: Reached target Block Device Preparation for /dev/mapper/luks-backup.
> Jan 25 14:42:59 systemd[1]: clevis-luks-askpass.service: Deactivated successfully.
> 
> # systemctl list-dependencies remote-cryptsetup.target
> remote-cryptsetup.target
> ● ├─systemd-cryptsetup@luks\x2dbackup.service
> 
> # j --no-hostname -b -u 'systemd-cryptsetup@luks\x2dbackup.service'
> Jan 25 14:42:12 systemd[1]: Starting Cryptography Setup for luks-backup...
> Jan 25 14:42:42 systemd-cryptsetup[1697]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/vg_root/backup-raw.
> Jan 25 14:42:47 systemd-cryptsetup[1697]: Failed to activate with specified passphrase. (Passphrase incorrect?)
> Jan 25 14:42:48 systemd-cryptsetup[1697]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/vg_root/backup-raw.
> Jan 25 14:42:54 systemd[1]: Finished Cryptography Setup for luks-backup.
> 
> # systemctl show 'systemd-cryptsetup@luks\x2dbackup.service' | grep Type
> Type=oneshot
> 
> So, if I'm following things correctly, this doesn't seem right.
> remote-cryptsetup.target depends on systemd-cryptsetup@luks\x2dbackup.service.
> This is a oneshot that is considered started after the main process exits,
> which is shown above as 14:42:54.  But we are seeing 'Reached target Remote
> Encrypted Volumes' at 14:42:12.
> 
> What am I missing?
> 
> systemd-252-18.el9.x86_64
> 
> 

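"nofail" encrypted devices are not ordered before
(remote-)cryptsetup.target, so as not to delay startup. They are still
pulled in by the target, which is why list-dependencies shows the unit;
only the ordering is dropped, so the target can be reached before the
device is unlocked. The reasoning is that if you do not care whether
this device exists or not, there is no reason to globally wait for it
anyway. I believe this was changed (even several times) in the past.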
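To check which crypttab volumes on a given machine are affected, one can
look at what the generator produced. The following is a minimal sketch,
not an official tool. The function name list_unordered_cryptsetup is
hypothetical. It assumes that the generated units live in
/run/systemd/generator, the usual directory for normal-priority generator
output. It also assumes that the generator writes the target ordering as
a Before= line, which it appears to do only for entries without nofail.

```shell
# Hypothetical helper (name is illustrative): print generated
# systemd-cryptsetup@ units that have no Before= ordering against
# cryptsetup.target or remote-cryptsetup.target.
# Assumes generator output in /run/systemd/generator (override with $1)
# and that the ordering, when present, is written as a Before= line.
list_unordered_cryptsetup() {
    gen_dir="${1:-/run/systemd/generator}"
    for f in "$gen_dir"/systemd-cryptsetup@*.service; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # Matches cryptsetup.target and remote-cryptsetup.target,
        # but not cryptsetup-pre.target.
        if ! grep -Eq '^Before=.*cryptsetup\.target' "$f"; then
            echo "not ordered: ${f##*/}"
        fi
    done
}
```

On the EL9 system above, running list_unordered_cryptsetup with no
argument would be expected to list systemd-cryptsetup@luks\x2dbackup.service.
`systemctl cat` on that unit shows the same generated file directly.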

If the device list is static, just add configuration snippets that
explicitly order their blockdev@ units before remote-cryptsetup.target.
The /etc/fstab generator supports x-systemd.before= (and other such
options); maybe this could be generalized to the /etc/crypttab generator.
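For a static list, such snippets can be written by a small shell function.
This is a minimal sketch under stated assumptions. The function name
order_before_remote_cryptsetup and the drop-in file name are hypothetical.
It is also a slight variation on the suggestion above: it orders each
volume's systemd-cryptsetup@ service rather than its blockdev@ target.
The list-dependencies output above shows that the target already pulls in
these services, and since the blockdev@ target is reached after the
service, either ordering should make the target wait for the unlock. Unit
names are escaped by rewriting '-' only, which assumes that crypttab names
contain nothing but letters, digits, '_' and '-'.

```shell
# Hypothetical helper (name is illustrative): for each crypttab volume
# NAME, write a drop-in that orders systemd-cryptsetup@NAME.service
# before remote-cryptsetup.target.
# Usage: order_before_remote_cryptsetup UNIT_DIR NAME...
order_before_remote_cryptsetup() {
    unit_dir="$1"
    shift
    for name in "$@"; do
        # Simplified unit-name escaping: only '-' becomes '\x2d'.
        # Assumes names need no other escaping; systemd-escape
        # handles the general case.
        escaped=$(printf '%s' "$name" | sed 's/-/\\x2d/g')
        dropin_dir="$unit_dir/systemd-cryptsetup@$escaped.service.d"
        mkdir -p "$dropin_dir"
        # Ordering only: the target already wants the unit.
        printf '[Unit]\nBefore=remote-cryptsetup.target\n' \
            > "$dropin_dir/50-before-remote-cryptsetup.conf"
    done
}
```

On the machine above, this would be invoked as
`order_before_remote_cryptsetup /etc/systemd/system luks-backup luks-vm-01-disk0`,
followed by `systemctl daemon-reload`. virtqemud.service still needs
After=remote-cryptsetup.target (for example in its override.conf) so that
VM startup waits for the target. Because the crypttab entries keep
nofail, the target should only want these units. A failed unlock would
then delay the target but not fail it.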

