[lvm-team] Soliciting feature requests for development of an LVM library / API

David Zeuthen davidz at redhat.com
Mon Dec 15 17:45:54 PST 2008


On Mon, 2008-12-15 at 20:46 +0000, Alasdair G Kergon wrote:
> On Mon, Dec 15, 2008 at 03:00:41PM -0500, David Zeuthen wrote:
> > I'm not sure we want to extend the udev database; in my view it's
> > supposed to be a small and efficient mechanism that allows one to
> > annotate directories in /sys with additional information that we collect
> > in user space.
>  
> So we need another database to 'wrap' around the udev one.
> 
> Could the udev database at least store 'claiming subsystem' information?
> e.g. that device X is 'claimed' by LVM2; device 'Y' is 'claimed' by md etc.

Not sure what 'claimed' means, can you clarify?

However, for identification it's usually sufficient to look at the name
the kernel hands out: md block nodes are prefixed with "md",
device-mapper device nodes with "dm-", and so on.
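As a minimal sketch of that name-based identification (the prefix table
below is invented for illustration, not an exhaustive list):

```python
# Guess the owning subsystem of a block device purely from the
# kernel-assigned device name, as described above.  The prefix
# table is a hypothetical example, not a complete mapping.

PREFIXES = {
    "md": "md (software RAID)",
    "dm-": "device-mapper",
    "loop": "loop",
}

def identify(kernel_name: str) -> str:
    """Return the subsystem guessed from the kernel device name."""
    # Try longer prefixes first so "dm-" wins over a hypothetical "d".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if kernel_name.startswith(prefix):
            return PREFIXES[prefix]
    return "unclaimed / raw disk"

print(identify("md0"))   # -> md (software RAID)
print(identify("dm-3"))  # -> device-mapper
print(identify("sda1"))  # -> unclaimed / raw disk
```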

> > FWIW, in many ways, one can think of the udev database as an extension
> > of sysfs insofar that attributes in sysfs represents information / state
> > exported by the kernel driver while attributes in the udev database
> > represents information / state exported by a user space programs /
> > daemons.
> 
> Just state, not new classes of entities that have no correspondance with
> sysfs (such as 'Volume Groups')?
>  
> >  3. Finally we can discuss how this information can be used to implement
> >     policy by writing very simple udev rules that leverages the info in
> >     the udev database defined in 1.
> >     For example, one thing many people (desktop developers like me
> >     but also people working on initramfs/booting (jeremy, davej)
> >     and also anaconda) probably want to request is that device-mapper
> >     and LVM ship with udev rules that uses the information defined in 1.
> >     above to implement a policy that automatically assembles LV's from
> >     PV's. If we solve 1. correctly, this *shouldn't* be more complicated
> >     than a simple one-liner udev rule.
> 
> Those rules - triggers - depend on these additional entities (Volume Groups).
> 
> The way we store and index and trigger using this Volume Group information is
> critical to this whole exercise and has to be resolved before we can really
> make much more progress IMHO.

I'm not sure why we need to extend the udev database like this - with a
little care we should be able to use the udev database as-is for this
task (in the same sense that you can normalize/denormalize SQL
database tables from one extreme to the other).

Here's a concrete proposal for how we'd store data for LVM PVs.

First, we'd install a udev rule that runs whenever vol_id(8) (which is
currently supplied by udev) detects the block device signature of an
LVM PV:

  /lib/udev/65-lvm.rules:
  # This file is supplied by LVM.
  #
  # This rule extracts metadata about LVM physical volumes and inserts
  # it into the udev database. See the pvdisplay(8) man page for details
  # about what key/value pairs are inserted and their format and
  # meaning.
  #
  # Do not edit this file, it is overwritten on updates.

  SUBSYSTEM=="block", ACTION=="add|change",           \
                      ENV{ID_FS_TYPE}=="LVM2_member", \
                      IMPORT{program}="pvdisplay --udev-export $root/%k"

where "--udev-export" is a new option for pvdisplay(8) that prints the
metadata as KEY=value pairs for udev to pick up. Note that the rule
should use a filename prefix in the 60-69 range to fit in with the
other rules that insert data into the udev database.
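To make this concrete, the exported data could look something like the
fragment below - one KEY=value pair per line, which udev folds into its
database entry for the device. Every key name and value here is invented
for illustration; the real set of pairs would come from the PV metadata
that pvdisplay(8) already knows about.

```
  LVM2_PV_UUID=Wt0lDY-HM6n-Ib9J-Vf82-PBtQ-qRqE-3fMCZQ
  LVM2_PV_SIZE=500107862016
  LVM2_VG_NAME=vg_data
  LVM2_VG_PV_COUNT=2
  LVM2_LV_NAMES=home,swap
```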

Now, on to what key/value pairs we'd insert. I'd expect this to be
pretty much what pvdisplay(8) gives us. Let's just assume we insert all
the raw data for each PV into the udev database.

Now, all the data we just inserted into the udev database may not be
entirely useful at a glance (it's just the raw PV metadata formatted in
a semi-useful way for human beings), but at least with this scheme we'd
have all information about all PVs that are available on the system.
Without doing any IO at all. This means it's possible to write a program
that uses the libudev library to go through all this information and
answer questions like

 - What VGs are available?
 - What LVs are available and what is their relationship to VGs?
 - What LVs can be started that haven't been started yet?

again without doing any IO. So the proposal would now be to teach
lvm(8) or some other tool about these operations and export them as
command-line options.
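As a rough sketch of how such a tool could answer those questions from
the udev database alone: the key names and the in-memory "database"
below are invented for illustration, and a real implementation would
enumerate block devices and read these properties through libudev
rather than a dict.

```python
# Answer VG/LV availability questions from udev-database-style
# key/value pairs alone, with no block-device IO.  All key names
# and sample data are hypothetical.

UDEV_DB = {
    "/dev/sda2": {
        "ID_FS_TYPE": "LVM2_member",
        "LVM2_VG_NAME": "vg_data",
        "LVM2_VG_PV_COUNT": "2",       # PVs the VG metadata says exist
        "LVM2_LV_NAMES": "home,swap",  # LVs recorded in this VG
    },
    "/dev/sdb1": {
        "ID_FS_TYPE": "LVM2_member",
        "LVM2_VG_NAME": "vg_data",
        "LVM2_VG_PV_COUNT": "2",
        "LVM2_LV_NAMES": "home,swap",
    },
    "/dev/sdc1": {
        "ID_FS_TYPE": "LVM2_member",
        "LVM2_VG_NAME": "vg_backup",
        "LVM2_VG_PV_COUNT": "2",       # second PV not plugged in yet
        "LVM2_LV_NAMES": "archive",
    },
}

def available_vgs(db):
    """Map VG name -> (PVs present, PVs expected)."""
    vgs = {}
    for props in db.values():
        if props.get("ID_FS_TYPE") != "LVM2_member":
            continue
        name = props["LVM2_VG_NAME"]
        present, expected = vgs.get(name, (0, int(props["LVM2_VG_PV_COUNT"])))
        vgs[name] = (present + 1, expected)
    return vgs

def startable_lvs(db):
    """LVs whose VG has all of its PVs present, so they could be started."""
    vgs = available_vgs(db)
    lvs = set()
    for props in db.values():
        name = props.get("LVM2_VG_NAME")
        if name and vgs[name][0] >= vgs[name][1]:
            for lv in props["LVM2_LV_NAMES"].split(","):
                lvs.add(f"{name}/{lv}")
    return lvs

print(available_vgs(UDEV_DB))  # vg_data complete, vg_backup degraded
print(sorted(startable_lvs(UDEV_DB)))
```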

So for the policy bits we'd have

  /lib/udev/75-lvm-activate.rules:
  # This file is supplied by LVM.
  #
  # This rule autostarts Logical Volumes as the Physical Volumes
  # backing them become available.
  #
  # Do not edit this file, it is overwritten on updates.

  SUBSYSTEM=="block", ACTION=="add",                              \
                      ENV{ID_FS_TYPE}=="LVM2_member",             \
                      RUN+="lvm --start-lv-from-added-pv $root/%k"

where "--start-lv-from-added-pv" is a new option. Also note that this
rule lives in the 70-79 priority range to adhere to udev conventions (at
least I think this is the scheme. Kay?).

So, --start-lv-from-added-pv will do just what I described above. It
will extract all PV information from the udev database and autostart the
set of LVs that are now available with the addition of the passed-in PV
(we need to be careful to compute the right delta so we don't
inadvertently start LVs that the user has manually stopped).
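That delta could be little more than a set difference. A sketch (all
names invented, and the bookkeeping of the "manually stopped" set left
out):

```python
# Compute which LVs to autostart when a new PV appears: only LVs that
# became startable *because of* this PV, minus any the administrator
# stopped by hand.  Function and variable names are hypothetical.

def lvs_to_autostart(startable_before, startable_after, manually_stopped):
    """LVs the new PV made startable, excluding manual stops."""
    return (startable_after - startable_before) - manually_stopped

newly = lvs_to_autostart(
    startable_before={"vg_data/home"},
    startable_after={"vg_data/home", "vg_data/swap", "vg_backup/archive"},
    manually_stopped={"vg_backup/archive"},
)
print(sorted(newly))  # ['vg_data/swap']
```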

In a way this proposal is the extreme denormalization of your nice
proposal with logical database tables. But I think it's fine to just
store raw data; the point here isn't so much that the data is stored in
a nice format, the point is that it's straightforward to write a program
that does no IO (e.g. doesn't open all block devices on the system) and
can answer any question we might have.

Of course, I'm not suggesting we put in a hex-encoded string with the PV
metadata - it's useful that it's namespaced and human-readable, so I
think we still want that.

Specifically for things like DeviceKit-disks, I'd just use the same PV
data stored in the udev database to compute the set of available VGs and
LVs and then I can present that in my pretty little GNOME disk utility
program.

Note that since I'm not a domain expert, it's possible this proposal
needs refinement. And the names of the options etc. are currently made
up; they probably need lots of refinement too.

     David




More information about the devkit-devel mailing list