[systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
Goffredo Baroncelli
kreijack at libero.it
Sat Jun 13 08:09:19 PDT 2015
On 2015-06-13 11:35, Anand Jain wrote:
>
> Thanks for your reply Andrei and Goffredo. more below...
>
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
>> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>>> On Fri, 12 Jun 2015 21:16:30 +0800 Anand Jain
>>> <anand.jain at oracle.com> wrote:
>>>
>>>>
>>>>
>>>> BTRFS_IOC_DEVICES_READY is to check if all the required
>>>> devices are known by the btrfs kernel, so that
>>>> admin/system-application could mount the FS. It is checked
>>>> against a device in the argument.
>>>>
>>>> However, the actual implementation does a bit more than just that:
>>>> it also scans and registers the device provided in the
>>>> argument (same as the btrfs device scan subcommand
>>>> or the BTRFS_IOC_SCAN_DEV ioctl).
>>>>
>>>> So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl;
>>>> it's a write command as well.
>>>>
>>>> Next, the kernel only checks whether total_devices (read
>>>> from the SB) is equal to num_devices (counted in the list) to
>>>> report the status as 0 (ready) or 1 (not ready). This does
>>>> not work for the other device pool states such as missing,
>>>> seeding, or replacing: total_devices is not equal to
>>>> num_devices in these states, yet the device pool is ready for
>>>> mount. That is a bug, but it is not part of this discussion.
>>>>
>>>>
>>>> Questions:
>>>>
>>>> - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
>>>> register the device provided (same as the btrfs device scan command
>>>> or the BTRFS_IOC_SCAN_DEV ioctl), or can BTRFS_IOC_DEVICES_READY
>>>> be a read-only ioctl interface to check the state of the device
>>>> pool?
>>>>
>>>
>>> udev is using it to incrementally assemble multi-device btrfs, so
>>> in this case I think it should.
>
> Nice. Thanks for letting me know this.
>
>> I agree, the ioctl name is confusing, but unfortunately this is an
>> API and it has to stay here forever. Udev uses it, so we know
>> for sure that it is widely used.
>
> OK. What goes in stays there forever. It's time to update the man page
> instead.
>
>>> Are there any other users?
>>>
>>>> - If the device in the argument is already mounted, can it
>>>> straightaway return 0 (ready)? (As of now it would again
>>>> independently read the SB, determine total_devices, and check
>>>> against num_devices.)
>>>>
>>>
>>> I think yes; obvious use case is btrfs mounted in initrd and
>>> later coldplug. There is no point to wait for anything as
>>> filesystem is obviously there.
>>>
>
> There is a little difference. Suppose the device is already mounted
> and there are two device paths, PA and PB, for the same device. The path
> last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)' or 'btrfs
> device ready (BTRFS_IOC_DEVICES_READY)' will be shown in the 'btrfs
> filesystem show' or '/proc/self/mounts' output. It does not mean that
> the btrfs kernel will close the first device path and reopen the second
> given device path; it just updates the device path in the kernel.
>
> Further, the problem is worse in this example: if you use dd
> to copy device A to device B, then after you mount device A, just
> providing device B to the above two commands lets the kernel
> update the device path. All the IO (since the device is mounted)
> still goes to device A (not B), but /proc/self/mounts and
> 'btrfs fi show' show it as device B (not A).
>
> It's a bug, and very tricky to fix.
In the past [*] I proposed a mount.btrfs helper; I tried to move the logic outside the kernel.
I think the problem is that we try to manage all these cases from a device point of view: when a device appears, we register the device and we try to mount the filesystem... This works very well for a single-volume filesystem. In the other cases there is a mess between the different layers:
- kernel
- udev/systemd
- initrd logic
My attempt followed a different idea: the mount helper waits for the devices if needed, or, if appropriate, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX) and there is no device registration: this avoids all these problems.
[*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767
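To make the idea concrete, here is a minimal sketch of the "wait, then possibly mount degraded" logic. It is not the code of the proposed helper: the function names, the timeout values and the polling approach are assumptions made for illustration. Only the btrfs mount options `device=` and `degraded` are existing options.

```python
import os
import time


def wait_for_devices(paths, timeout=30.0, poll=0.5):
    """Wait until every path exists or the timeout expires.

    Returns the list of paths that are still missing (empty if all appeared).
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.exists(p)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll)


def build_mount_options(paths, missing, allow_degraded=True):
    """Build a btrfs mount option string from the devices that are present.

    Policy (hypothetical): if some devices are missing, add 'degraded'
    when the local policy allows it, otherwise refuse to mount.
    """
    present = [p for p in paths if p not in missing]
    opts = ["device=" + p for p in present]
    if missing:
        if not allow_degraded:
            raise RuntimeError("missing devices: " + ", ".join(missing))
        opts.append("degraded")
    return ",".join(opts)
```

A caller would run `wait_for_devices()` on the device list and pass the resulting string to mount with `-o`. Whether a degraded mount is acceptable stays a userspace policy decision.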
Back to your questions:
> - we can't return -EBUSY for subsequent (after mount) calls for the
> above two ioctls (if a mounted device is used as an argument). Since
> admin/system-application might actually call again to mount subvols.
I am not sure that the two things are related: the mount doesn't use BTRFS_IOC_DEVICES_READY. After BTRFS_IOC_DEVICES_READY returns OK, all the filesystems belonging to this FSID should be mounted, but that is the job of systemd/initramfs/sysv... A further failed BTRFS_IOC_DEVICES_READY shouldn't cause any problem...
>
> - we can return success (without updating the device path), but we
> would be wrong when device A is copied onto device B using dd, since
> we would check against the on-device SB's fsid/uuid/devid. Comparing
> the device paths with strcmp is not practical since there can be
> different paths to the same device (let's say mapper).
>
> (any suggestion on how to check if it's the same device in the
> kernel?).
Check the major/minor numbers?
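As a userspace illustration of that comparison (the kernel would compare the dev_t of the opened block devices instead), here is a small sketch. The function names are hypothetical. Two paths to the same device, for example a /dev/mapper alias, give the same pair, while a dd copy on another disk gives a different one.

```python
import os
import stat


def device_id(path):
    """Return (major, minor) if path is a block device node, else None."""
    st = os.stat(path)  # follows symlinks, so aliases resolve to the node
    if not stat.S_ISBLK(st.st_mode):
        return None
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def same_device(path_a, path_b):
    """True only if both paths are block devices with equal major/minor."""
    id_a = device_id(path_a)
    id_b = device_id(path_b)
    return id_a is not None and id_a == id_b
```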
>
> - Also, if we don't allow the device path to be updated after the
> device is mounted, is there a chance that we would be stuck with the
> device path from the initrd, which does not make any sense to the user?
>
>
>
>>>> - What should be the expected return when the FS is mounted and
>>>> there is a missing device.
>>
>> I suggest not investing further energy in an ioctl API. If you want
>> this kind of information, you (we) should export it in sysfs.
>> In an ideal world:
>>
>> - a new btrfs device appears
>> - udev registers it with BTRFS_IOC_SCAN_DEV
>> - udev (or mount?) checks the status of the filesystem by reading
>>   the sysfs entries (total devices, present devices, seed devices,
>>   raid level...); on the basis of the local policy (allow degraded
>>   mount, device timeout, how many devices are missing, filesystem
>>   redundancy level...) udev (mount) may mount the filesystem with
>>   the appropriate parameters (ro, degraded, or even insert a spare
>>   device to replace a missing device...)
>
> Yes, the sysfs interface is coming. A few framework patches were sent
> some time back; any comments will help. On the ioctl part I am trying
> to fix the bug(s).
>
>>>>
>>>
>>> This is similar to problem mdadm had to solve. mdadm starts timer
>>> as soon as enough raid devices are present; if timer expires
>>> before raid is complete, raid is started in degraded mode. This
>>> avoids spurious rebuilds. So it would be good if btrfs could
>>> distinguish between enough devices to mount and all devices.
>
>> These are two different things: how to export the filesystem
>> information (I am still convinced that it has to be exported
>> via sysfs), and what the system has to do in case of ... (a missing
>> device?). The latter is a policy, and I think that it should
>> not live in the kernel.
>>
>>
>>>
>>
>>
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5