[systemd-devel] Proposal: Add biosdevname naming scheme to systemd
Jordan Hargrave
jharg93 at gmail.com
Tue Oct 20 07:45:36 PDT 2015
On Tue, Oct 20, 2015 at 3:14 AM, Kay Sievers <kay at vrfy.org> wrote:
> On Tue, Oct 20, 2015 at 6:46 AM, Jordan Hargrave <jharg93 at gmail.com> wrote:
>> On Mon, Mar 2, 2015 at 1:17 PM, Tom Gundersen <teg at jklm.no> wrote:
>>> Hi Jordan,
>>>
>>> On Mon, Mar 2, 2015 at 4:45 PM, Jordan Hargrave <jharg93 at gmail.com> wrote:
>>>> There are currently two competing naming mechanisms for network cards,
>>>> biosdevname and systemd. Systemd currently has some limitations on naming
>>>> cards that use network partitioning or support SR-IOV.
>>>
>>> Could you point to an example so we can fix it? I thought all bug
>>> reports had been handled, but maybe I lost track of something.
>>>
>>
>> I have a quad-port NIC:
>> 0000:40:00.0 = PCIE bridge (SMBIOS Slot 2)
>> 0000:41:00.0 = Ethernet Device (port1)
>> 0000:41:00.1 = Ethernet Device (port2)
>> 0000:42:00.0 = Ethernet Device (port3)
>> 0000:42:00.1 = Ethernet Device (port4)
>>
>> biosdevname would name these p2p1, p2p2, p2p3, p2p4 respectively.
>>
>> With systemd, it's ugly. I added the patch to get SMBIOS slot numbers
>> and I see systemd get RANDOM names depending on boot.
>>
>> Either:
>> s2f0 (p1)
>> s2f1 (p2)
>> p66s0f0 (p3)
>> p66s0f1 (p4)
>>
>> I also saw the opposite:
>> p65s0f0 (p1)
>> p65s0f1 (p2)
>> s2f0 (p3)
>> s2f1 (p4)
>
> That looks like an issue with the PCI hotplug drivers. You either need
> to load them early enough, or not at all. Or just disable the slot
> naming policy in a networkd link file.
>
>> Since systemd doesn't have a concept of a 'port', whichever devices
>> get named first (they are named in parallel, race conditions), the
>> other devices have name collision (function 0,1 are duplicate, but on
>> different bus).
>
> Systemd cannot have a concept of a port across otherwise independent
> devices. It would mean maintaining a counter across devices, which
> would again depend on, and introduce names based on, enumeration order.
>
Dell systems export a string as part of the PCI VPD data that maps
each PCI B:D:F to a port. This is mainly used for mapping
virtual/partition devices to the parent partition. These devices show
up as physical PCI devices on the PCI scan. There are also cards that
support virtual SR-IOV devices.

The quad-port example above was a special case, but here is another.
We have a Mellanox card that implements two network devices under a
single B:D:F. It also supports SR-IOV. So a single PCI B:D:F maps to 16
network devices. Systemd uses the sysfs dev_port/dev_id to identify
which actual device it is.
Systemd names these as:
p66s0f0
p66s0f0d1
p66s0f1
p66s0f1d1
p66s0f2
p66s0f2d1
p66s0f3
p66s0f3d1
p66s0f4
p66s0f4d1
etc.
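
To make the pattern above concrete, here is a rough Python sketch of how
names of that shape fall out of the PCI address plus dev_port. It only
mirrors the names listed in this mail; it is not systemd's code, the
function name path_style_name is made up for illustration, and it
ignores the PCI domain.

```python
def path_style_name(bdf: str, dev_port: int = 0) -> str:
    """Build a p<bus>s<device>f<function>[d<dev_port>] name from a PCI address.

    Illustrative only: follows the names quoted in this thread, not
    systemd's actual implementation. The PCI domain is ignored.
    """
    _domain, bus, devfn = bdf.split(":")
    device, function = devfn.split(".")
    # sysfs prints bus/device/function in hex; the names use decimal.
    name = f"p{int(bus, 16)}s{int(device, 16)}f{int(function, 16)}"
    if dev_port > 0:
        # Second and later netdevs behind the same B:D:F get a d<N> suffix.
        name += f"d{dev_port}"
    return name
```

Bus 0x42 is 66 decimal, which is where the p66 in the names above comes
from. The name encodes the bus number, not the slot the card sits in.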
Again, p66 doesn't tell the user anything about where the device is in
the system or which port the network cable is plugged into.
biosdevname looks up the 'physical' SR-IOV device and SMBIOS slot
number and names them:
p2p1 (original device)
p2p1_0 (virtual)
p2p1_1
p2p1_2
p2p1_3
...
p2p2
p2p2_0 (virtual device)
p2p2_1
...
etc.
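
As a sketch of the scheme just listed (slot number, then port number,
then an optional virtual index), something like the following would
produce those names. The function name and arguments are hypothetical;
the slot and port values are assumed to come from SMBIOS and the VPD
data described above.

```python
from typing import Optional


def biosdevname_style_name(slot: int, port: int,
                           vf_index: Optional[int] = None) -> str:
    """Compose p<slot>p<port>[_<vf_index>] as in the list above.

    Illustrative only: slot is assumed to be the SMBIOS slot number and
    port the physical port number; neither is looked up here.
    """
    name = f"p{slot}p{port}"
    if vf_index is not None:
        # Virtual functions hang off the physical port's name.
        name += f"_{vf_index}"
    return name
```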
This feature is really what we would like to see implemented in
systemd. First, name devices properly based on the SMBIOS slot number.
Second, have the physical name of the NIC be the base name, followed by
the virtual index. We use this when enabling bonding to warn if the
members of a bond share the same physical cable. The name can be stored
in a separate environment variable (ID_NET_NAME_BIOSDEVNAME or similar).
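
The bonding check mentioned above could look roughly like this, assuming
names of the p<slot>p<port>[_<index>] form. This is a sketch, not the
code we actually ship; both function names are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def physical_base(name: str) -> str:
    """Strip the virtual index: 'p2p1_3' -> 'p2p1', 'p2p1' -> 'p2p1'."""
    return name.split("_", 1)[0]


def shared_physical_ports(members: Iterable[str]) -> Dict[str, List[str]]:
    """Return the bond members that sit on the same physical port.

    Keys are physical port names; values list the members that share it.
    An empty result means no two members share a cable.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in members:
        groups[physical_base(name)].append(name)
    return {base: names for base, names in groups.items() if len(names) > 1}
```

A bond of p2p1_0 and p2p1_1 would be flagged, while a bond of p2p1 and
p2p2_0 would not. With the p66s0f0 style names there is nothing in the
name to group on.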
>>>> Proposal is to add
>>>> support for biosdevname-like names as part of systemd. The names would be
>>>> created as a new environment variable ID_NET_NAME_BIOSDEVNAME. This could
>>>> then be used in the udev rules scripts to replace the external biosdevname
>>>> handler.
>
> This is unlikely to happen. Biosdevname "invents" counters which
> are unreliable and introduce inter-device probe-order dependencies.
> It causes the same problem as the kernel's ethX, just less likely.
> Systemd cannot do that.
>
It doesn't invent them if they are part of the DCM string in the PCI VPD.
>>> I don't think this makes much sense. If biosdevname had been
>>> acceptable, the udev naming scheme would not have been introduced in
>>> the first place.
>
> Right, the udev naming would not have been there or used the same
> names if biosdevname was reliable, which it unfortunately isn't for
> the above mentioned reasons.
>
>> biosdevname is going away in the new version of RHEL, so we will lose
>> the capability to detect if two 'virtual' NICs are actually the same
>> physical NIC. The naming in systemd doesn't have the capability of
>> showing the relationship between physical/virtual (SR-IOV) NIC
>> location names.
>>
>>>> At least on Dell systems, systemd generates unusable names (PCI B:D:F vs
>>>> Slot#) for add-in cards as our PCIe slots do not have the ACPI _SUN method,
>>>> but they do have a SMBIOS slot number.
>>>
>>> Wouldn't the better approach be to simply add SMBIOS support to udev
>>> then? I must admit I don't know what challenges that entails, but
>>> seems like a natural first step.
>>
>> That could be possible. I've tried submitting a patch upstream for
>> the kernel, but it hasn't been accepted yet. So SMBIOS parsing would
>> have to be part of systemd.
>
> The kernel would need to export the parsed result of SMBIOS at the PCI
> device, which systemd can use. Systemd itself is unlikely to parse
> SMBIOS directly. In any case, there can be no introduction of
> any cross-device counters though.
>
Is there a reason systemd can't parse SMBIOS data? The code is quite
simple. The kernel maintainer doesn't want to add it to the kernel
and asked if systemd can parse it. So I don't want to go in circles
here.
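
To give an idea of how little is involved, here is a Python sketch (not
the patch I submitted) that walks a raw SMBIOS table and pulls the slot
ID and bus address out of the Type 9 (System Slots) records. The field
offsets are taken from the SMBIOS 2.6+ layout as I understand it, and
both function names are made up for illustration.

```python
import struct
from typing import Dict, Iterator, List, Tuple


def iter_smbios_structures(table: bytes) -> Iterator[Tuple[int, bytes, List[str]]]:
    """Yield (type, formatted_area, strings) for each structure in a raw table."""
    offset = 0
    while offset + 4 <= len(table):
        stype, length = table[offset], table[offset + 1]
        if length < 4:
            break  # malformed header, stop rather than loop forever
        formatted = table[offset:offset + length]
        # The string-set after the formatted area ends with a double NUL.
        end = table.find(b"\x00\x00", offset + length)
        if end < 0:
            break
        raw = table[offset + length:end]
        strings = [s.decode("ascii", "replace") for s in raw.split(b"\x00") if s]
        yield stype, formatted, strings
        offset = end + 2
        if stype == 127:  # end-of-table marker
            break


def slot_ids_by_address(table: bytes) -> Dict[Tuple[int, int, int, int], int]:
    """Map (segment, bus, device, function) -> SMBIOS slot ID from Type 9 records."""
    slots = {}
    for stype, data, _strings in iter_smbios_structures(table):
        if stype != 9 or len(data) < 0x11:
            continue  # not a slot record, or too old to carry a bus address
        (slot_id,) = struct.unpack_from("<H", data, 0x09)
        segment, bus, devfn = struct.unpack_from("<HBB", data, 0x0D)
        if bus == 0xFF and devfn == 0xFF:
            continue  # slot without bus/device/function information
        slots[(segment, bus, devfn >> 3, devfn & 0x07)] = slot_id
    return slots
```

On recent kernels the raw table should be readable by root from
/sys/firmware/dmi/tables/DMI, so the bytes can be passed in straight
from that file. Matching a NIC to its slot then means walking up from
the NIC's B:D:F to the address recorded for the slot.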
> Kay