[systemd-devel] timed out waiting for device dev-disk-by\x2duuid

Chris Murphy lists at colorremedies.com
Sat May 17 16:30:45 PDT 2014


On May 16, 2014, at 11:31 AM, Goffredo Baroncelli <kreijack at inwind.it> wrote:

> On 05/15/2014 11:54 PM, Chris Murphy wrote:
>> 
>> On May 15, 2014, at 2:57 PM, Goffredo Baroncelli <kreijack at libero.it>
>> wrote:
> [....]
>> 
>> The udev rule right now is asking if all Btrfs member devices are
>> present and it sounds like that answer is no with a missing device;
>> so a mount isn't even attempted by systemd rather than attempting a
>> degraded mount specifically for the root=UUID device(s).
> 
> Who is in charge of mounting the filesystem?

Ultimately systemd mounts the defined root file system at /sysroot. It knows which volume to mount from the root=UUID= boot parameter, but it doesn't even try until the device carrying that UUID has appeared. Until then, the mount command isn't even issued.
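As an aside, the dev-disk-by\x2duuid-….device unit name that shows up in the journal is just the /dev/disk/by-uuid path with systemd's unit-name escaping applied. A rough shell illustration (the sed pipeline is mine, not systemd's actual escaper, and only handles the `-` and `/` characters that occur here):

```shell
# Rough illustration (my own sed, not systemd's escaper): map a by-uuid
# path to the .device unit name systemd waits on before mounting.
path=/dev/disk/by-uuid/9ff63135-ce42-4447-a6de-d7c9b4fb6d66
unit="$(printf '%s' "${path#/}" | sed -e 's/-/\\x2d/g' -e 's,/,-,g').device"
echo "$unit"
# -> dev-disk-by\x2duuid-9ff63135\x2dce42\x2d4447\x2da6de\x2dd7c9b4fb6d66.device
```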


> 
> What I found is that dracut waits until all the btrfs devices are present:
> 
> cat /usr/lib/dracut/modules.d/90btrfs/80-btrfs.rules
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION!="add|change", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
> RUN+="/sbin/btrfs device scan $env{DEVNAME}"
> 
> RUN+="/sbin/initqueue --finished --unique --name btrfs_finished /sbin/btrfs_finished"
> 
> LABEL="btrfs_end"
> 
> 
> and 
> 
> 
> cat btrfs_finished.sh 
> #!/bin/sh
> # -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
> # ex: ts=8 sw=4 sts=4 et filetype=sh
> 
> type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
> 
> btrfs_check_complete() {
>    local _rootinfo _dev
>    _dev="${1:-/dev/root}"
>    [ -e "$_dev" ] || return 0
>    _rootinfo=$(udevadm info --query=env "--name=$_dev" 2>/dev/null)
>    if strstr "$_rootinfo" "ID_FS_TYPE=btrfs"; then
>        info "Checking, if btrfs device complete"
>        unset __btrfs_mount
>        mount -o ro "$_dev" /tmp >/dev/null 2>&1
>        __btrfs_mount=$?
>        [ $__btrfs_mount -eq 0 ] && umount "$_dev" >/dev/null 2>&1
>        return $__btrfs_mount
>    fi
>    return 0
> }
> 
> btrfs_check_complete $1
> exit $?
> 

> 
> It seems that when a new btrfs device appears, the system attempts to mount it. If that succeeds, then it is assumed that all devices are present.

No, the system definitely does not attempt to mount it if there's a missing device. Systemd never executes /bin/mount at all in that case. A prerequisite for the mount attempt is this line:

[    1.621517] localhost.localdomain systemd[1]: dev-disk-by\x2duuid-9ff63135\x2dce42\x2d4447\x2da6de\x2dd7c9b4fb6d66.device changed dead -> plugged

That line only appears once all devices are present. With a device missing, it never appears, so the mount attempt never happens and the system just hangs.

However, if I do an rd.break=pre-mount, and get to a dracut shell this command works:

mount -t btrfs -o subvol=root,ro,degraded -U <uuid> /sysroot

The volume UUID is definitely present even though not all member devices are, so it's confusing why this device unit hasn't gone from dead to plugged. Until it's plugged, the mount command won't be issued.
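The gating here is done by udev, not by mount: systemd ships a 64-btrfs.rules that imports the result of `btrfs device ready` and sets SYSTEMD_READY=0 while members are missing, which is exactly what keeps the .device unit dead. From memory it looks roughly like this (check /usr/lib/udev/rules.d/64-btrfs.rules on an installed system for the exact text):

```
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# let the kernel know about this btrfs filesystem, and check if it is complete
IMPORT{builtin}="btrfs ready $devnode"

# mark the device as not ready to be used by the system
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"
```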



> To allow a degraded boot, it should be sufficient replace
> 
> 
> 	mount -o ro "$_dev" /tmp >/dev/null 2>&1
> 
> with
> 
> 	OPTS="ro"
> 	grep -q degraded /proc/cmdline && OPTS=",degraded"
> 	mount -o $OPTS "$_dev" /tmp >/dev/null 2>&1

The problem isn't that the degraded mount option isn't being used by systemd. The problem is that systemd isn't changing the device from dead to plugged.

And the problem there is that there are actually four possible states for an array, yet btrfs device ready apparently only distinguishes state 1 from everything else (states 2, 3, and 4):

1. All devices ready.
2. Minimum number of data/metadata devices ready, allow degraded rw mount.
3. Minimum number of data devices not ready, but enough metadata devices are ready, allow degraded ro mount.
4. Minimum number of data/metadata devices not ready, degraded mount not possible.
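If btrfs device ready were ever taught to report those states separately (purely hypothetical; today it only answers ready/not-ready), an initramfs helper could branch on the result. A sketch with made-up exit codes, just to show the shape of it:

```shell
# Hypothetical: suppose `btrfs device ready` returned 0-3 for the four
# states above. A wrapper could export a hint for the initramfs to act on.
# Neither the exit codes nor this function exist today.
state_hint() {
    case "$1" in
        0) echo "ready" ;;          # 1. all devices present
        1) echo "degraded-rw" ;;    # 2. enough data+metadata devices
        2) echo "degraded-ro" ;;    # 3. enough metadata devices only
        *) echo "unmountable" ;;    # 4. not enough devices at all
    esac
}
```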

So I think it's a question for the btrfs list to see what the long-term strategy is, given that rootflags=degraded alone does not work on systemd systems. Once I'm on 208-16 on Fedora 20, I get the same hang as on Rawhide. So I actually have to force power off, reboot with rd.break=pre-mount on the kernel command line, mount the volume manually, and exit twice. That's fine for me, but it's non-obvious for most users.

The thing to put to the Btrfs list is how are they expecting this to work down the road.

Right now, md itself does nothing at all here. It's actually dracut scripts that check for the existence of the rootfs volume UUID up to 240 times, with a 0.5 second sleep between attempts. After 240 failed attempts, dracut runs mdadm -R, which forcibly runs the array with the available devices (i.e. degraded assembly). At that moment the volume UUID becomes available, the device goes from dead to plugged, systemd mounts it, and boot continues normally.
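The dracut behavior just described can be sketched in shell like this (the function name and argument handling are mine, not dracut's actual code):

```shell
# Sketch of the dracut-style wait loop: poll for a device node up to
# 240 times, 0.5s apart; return non-zero if it never appears.
wait_for_dev() {
    _path="$1"; _tries="${2:-240}"; _delay="${3:-0.5}"
    _n=0
    while [ "$_n" -lt "$_tries" ]; do
        [ -e "$_path" ] && return 0
        _n=$((_n + 1))
        sleep "$_delay"
    done
    return 1
}

# After the loop gives up, dracut falls back to forcing the array to run
# degraded, which makes the volume UUID appear, e.g.:
#     wait_for_dev "/dev/disk/by-uuid/$UUID" || mdadm -R /dev/md0
```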

So maybe Btrfs can leverage that same loop used for md degraded booting. But after the loop completes, then what? I don't see how systemd gets informed to conditionally use the additional mount option "degraded". I think the equivalent of dracut's mdadm -R for btrfs would be something like 'btrfs device allowdegraded -U <uuid>', which would set a state on the volume permitting normal mounts to work. Then the device goes from dead to plugged, and systemd just issues the usual mount command.

*shrug*



Chris Murphy
