[systemd-devel] persisting sriov_numvfs

"Jóhann B. Guðmundsson" johannbg at gmail.com
Tue Jan 27 06:01:37 PST 2015


On 01/27/2015 12:40 PM, Tom Gundersen wrote:
> Hi Dan,
>
> On Mon, Jan 19, 2015 at 3:18 PM, Dan Kenigsberg <danken at redhat.com> wrote:
>> I'm an http://oVirt.org developer, and we plan to (finally) support
>> SR-IOV cards natively. Working on this feature, we've noticed that
>> something is missing in the platform OS.
>>
>> If I maintain a host with sr-iov cards, I'd like to use the new kernel
>> method of defining how many virtual functions (VFs) are to be exposed by
>> each physical function:
>>
>>      # echo 3 > /sys/class/net/enp2s0f0/device/sriov_numvfs
>>
>> This spawns 3 new devices, for which udev allocated (on my host) the names
>> enp2s16, enp2s16f2 and enp2s16f4.
>>
>> I can attach these VFs to virtual machines, but I can also use them as
>> yet another host NIC. Let's assume that I did the latter, and persisted
>> its IP address using initscripts in
>> /etc/sysconfig/network-scripts/ifcfg-enp2s16f4.
>>
>> However, on the next boot, sriov_numvfs is reset to 0, there's no
>> device named enp2s16f4, and certainly no IP address assigned to it.
>>
>> The admin can solve his own private issue by writing a service to start
>> after udev allocates device names but before network services kick in,
>> and re-apply his "echo" there. But it feels like something that should
>> be solved in a more generic fashion. It is also not limited to network
>> devices. A similar issue would affect anything that attempts to refer to
>> a VF by its name, and survive reboot.
>>
>> How should this be implemented in the realm of systemd?
> Sorry for the delay in getting back to you.
>
> My understanding is that the number of vfs must be basically set once
> and not changed after that? It seems that it is possible to change it,
> but only at the cost of removing all of them first, which I guess is
> not really an option in case they are in use.

Enabling this stuff via a module parameter, whether set manually or in a
.conf file, has been deprecated, and users are encouraged to use the PCI
sysfs interface instead.


> If that is the case, and what you essentially want is to just override
> the kernel default (0 VFs), then I think we can add a feature to
> udev's .link files to handle this.
>
> This means the VFs will be allocated very early during boot, as soon
> as the PF appears.
>
> On the downside, there is no mechanism to nicely update this setting
> during run-time (which may not be a problem if that is not really
> supported anyway), you would have to reinsert the PF or reboot the
> machine for the .link file to be applied.

You can create any number of VFs per PF, up to the card's maximum, via

# echo <number> > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
# echo <number> > /sys/bus/pci/devices/0000\:01\:00.1/sriov_numvfs
...
etc.
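
A minimal sketch of how those writes could be wrapped in a script. The function name set_numvfs and its arguments are made up for this example. It assumes the device directory exposes sriov_totalvfs next to sriov_numvfs, and it writes 0 first when VFs already exist, since the kernel refuses to go from one non-zero count straight to another.

```shell
#!/bin/sh
# Sketch only: set_numvfs is a hypothetical helper, not an existing tool.
# $1 = PCI device directory, e.g. /sys/bus/pci/devices/0000:01:00.0
# $2 = requested number of VFs
set_numvfs() {
    dev=$1
    want=$2
    # sriov_totalvfs holds the maximum number of VFs the PF supports.
    total=$(cat "$dev/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi
    current=$(cat "$dev/sriov_numvfs") || return 1
    # Nothing to do if the requested count is already in place.
    [ "$current" -eq "$want" ] && return 0
    # Changing a non-zero count requires going through 0 first.
    if [ "$current" -ne 0 ]; then
        echo 0 > "$dev/sriov_numvfs" || return 1
    fi
    echo "$want" > "$dev/sriov_numvfs"
}
```

Run as root it would be called as: set_numvfs /sys/bus/pci/devices/0000:01:00.0 3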

(These should be matchable in .link files via Path=, as in
Path=pci-0000:01:00.0-* for the example above, right?)
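
One way to check a pattern before putting it in a .link file: Path= is a shell-style glob compared against udev's ID_PATH property, so the match can be tried in plain shell. This is a rough sketch; path_matches is a made-up helper, and the ID_PATH value used in the comments is an assumption (the real one can be read with "udevadm info -q property -p /sys/class/net/<pf>").

```shell
#!/bin/sh
# Sketch only: path_matches is a hypothetical helper.
# $1 = ID_PATH value as reported by udev
# $2 = glob pattern as it would appear after Path= in a .link file
path_matches() {
    case "$1" in
        $2) return 0 ;;   # $2 is left unquoted so it acts as a glob
        *)  return 1 ;;
    esac
}

# Assuming the PF reports the bare ID_PATH "pci-0000:01:00.0":
#   path_matches "pci-0000:01:00.0" "pci-0000:01:00.0*"    succeeds
#   path_matches "pci-0000:01:00.0" "pci-0000:01:00.0-*"   fails
```

If that assumption holds, the trailing "-" in the pattern above would keep it from matching the PF itself, and "pci-0000:01:00.0*" would be the safer spelling.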

Then you can tweak the VF settings.

To set the vNIC MAC address on the Virtual Function:

# ip link set <pf> vf <vf_index> mac <vnic_mac>

# ip link set em1 vf 0 mac 00:52:44:11:22:33

It's common to set fixed MAC addresses, instead of randomly generated
ones, via a bash script at startup.
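
A rough sketch of what such a startup script might boil down to. The function vf_mac_cmds and its numbering scheme (VF index as the last octet) are assumptions made for illustration, not a convention of any tool. It only prints the commands, so the result can be reviewed before anything is applied.

```shell
#!/bin/sh
# Sketch only: vf_mac_cmds is a hypothetical helper.
# $1 = PF interface name, e.g. em1
# $2 = first five octets of the MAC, e.g. 00:52:44:11:22
# $3 = number of VFs to configure
vf_mac_cmds() {
    pf=$1
    prefix=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # The VF index becomes the last octet, printed as two hex digits.
        printf 'ip link set %s vf %d mac %s:%02x\n' "$pf" "$i" "$prefix" "$i"
        i=$((i + 1))
    done
}
```

Piping the output to sh as root (vf_mac_cmds em1 00:52:44:11:22 3 | sh) would apply it.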

To turn the hardware packet source MAC spoof check on or off for the specified VF

# ip link set <pf> vf <vf_index> spoofchk on|off

# ip link set em1 vf 0 spoofchk on

Change the link state as seen by the VF

# ip link set <pf> vf <vf_index> state auto|enable|disable

# ip link set em1 vf 0 state disable

To set a VLAN and priority on Virtual Function

# ip link set <dev> down
# ip link set <pf> vf <vf_index> vlan <vlan id> qos <priority>
# ip link set <dev> up

Here, for example, em1 is the PF (physical function) and em2 is the
interface assigned to VF 0.

# ip link set em2 down
# ip link set em1 vf 0 vlan 2 qos 2
# ip link set em2 up
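
The down/set/up sequence could be wrapped like this. This is a sketch: vf_vlan_cmds is a made-up name, and the range checks follow the limits documented for ip link (VLAN id 0..4095, qos 0..7). It prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: vf_vlan_cmds is a hypothetical helper.
# $1 = PF interface, $2 = VF index, $3 = interface assigned to that VF,
# $4 = VLAN id, $5 = qos (priority)
vf_vlan_cmds() {
    pf=$1 vf=$2 vfdev=$3 vlan=$4 qos=$5
    if [ "$vlan" -lt 0 ] || [ "$vlan" -gt 4095 ]; then
        echo "vlan id must be in 0..4095" >&2
        return 1
    fi
    if [ "$qos" -lt 0 ] || [ "$qos" -gt 7 ]; then
        echo "qos must be in 0..7" >&2
        return 1
    fi
    # Take the VF interface down, change the VLAN on the PF, bring it back.
    echo "ip link set $vfdev down"
    echo "ip link set $pf vf $vf vlan $vlan qos $qos"
    echo "ip link set $vfdev up"
}
```

With the names from the example, vf_vlan_cmds em1 0 em2 2 2 prints the three commands shown above.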

If someone ships you those cards, you can verify the configuration with
the ip link show command, like so:

# ip link show dev em1

And its output should be something like this:

7: em1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:02:c9:e6:01:12 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC <mac>, vlan <id>, spoof checking off, link-state auto
    vf 1 MAC <mac>, vlan <id>, spoof checking on, link-state enable
    vf 2 MAC <mac>, vlan <id>, spoof checking off, link-state disable

etc...
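
For checking many VFs at once, the per-VF lines can be filtered with awk. This is a sketch that assumes the output looks roughly like the above (lines starting with "vf <index>" and containing "spoof checking on|off"). The exact layout varies between iproute2 versions, and vf_spoofchk is a made-up name.

```shell
#!/bin/sh
# Sketch only: vf_spoofchk is a hypothetical helper.
# Reads "ip link show dev <pf>" output on stdin and prints
# "<vf index> <on|off|unknown>" for every VF line.
vf_spoofchk() {
    awk '$1 == "vf" {
        state = "unknown"
        if (match($0, /spoof checking (on|off)/)) {
            # Skip the 15 characters of "spoof checking " to keep on/off.
            state = substr($0, RSTART + 15, RLENGTH - 15)
        }
        print $2, state
    }'
}
```

Usage would be: ip link show dev em1 | vf_spoofchk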

>   Moreover, .link files are
> specific to network devices, so this will not help you with other
> kinds of PFs. I think that may be ok, depending on how common it is to
> use this for non-network hardware. If that is a niche usecase, it will
> always be possible to write an udev rule to achieve the same result as
> the .link file (for any kind of hardware), it is just a bit more
> cumbersome.
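
For reference, a sketch of what such a udev rule might look like, generated from shell so the PCI address and count are not hard-coded. This is untested: numvfs_udev_rule is a made-up name, and whether sriov_numvfs is already writable when the "add" event fires depends on the PF driver being bound at that point, so treat the rule as a starting point only.

```shell
#!/bin/sh
# Sketch only: numvfs_udev_rule is a hypothetical helper.
# $1 = PCI address of the PF, e.g. 0000:01:00.0
# $2 = number of VFs to request
numvfs_udev_rule() {
    # Match the PF by PCI address and assign the sysfs attribute.
    printf 'ACTION=="add", SUBSYSTEM=="pci", KERNEL=="%s", ATTR{sriov_numvfs}="%s"\n' "$1" "$2"
}
```

The output would go into a rules file of your choosing under /etc/udev/rules.d/.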

If I'm not mistaken, some of those cards can support, for example,
InfiniBand, FC and Ethernet at the same time (which used to be
configured when the module was loaded).

But what's missing from .link files here?
Setting the number of VFs?
(Note that the maximum number of VFs that you can create and the maximum
number of VFs that you can use for passthrough can be different.)

That said, it's probably best to get the Intel guys on board with this,
since a) Intel is one of the major forces behind the SR-IOV stuff and b)
Alin Rauta's "Add FDB support" patches to networkd probably added "SR-IOV
and Para Virtualization" support in the process.

JBG