[systemd-devel] systemd-nspawn: cannot join existing macvlan

Fri Jun 19 00:17:13 PDT 2015

Lennart Poettering <lennart at poettering.net> wrote:

> On Sat, 30.05.15 19:55, Kai Krakow (hurikhan77 at gmail.com) wrote:
> 
>> The next issue with your argument is: AFAIR nspawn doesn't create a
>> macvlan interface based on the machine name. You have to pass the name of
>> a physical interface which transports this macvlan. The man page at least
>> states that you use an existing physical interface:
> 
> True, I was a bit confused there...

:-) Fine. I thought I was totally wrong.

>> So your assumption about macvlan seems to be incorrect. The other network
>> types may be based off the machine name but it doesn't work this way with
>> macvlan.
> 
> Yeah, nspawn creates an interface "mv-foo" from a network interface
> "foo" on the host.

Yes, it creates it on the host. In the guest, AFAIR (I cannot currently try
this), it creates host0 as the interface.

Correct me if I'm wrong, but I see macvlan as a sort of peer-to-peer layer 2
LAN. The host endpoint is mv-foo, and each guest has its own endpoint host0,
spanning a virtual switch across all peers, thus between mv-foo and each
host0.

>> I think the logic is wrong here in systemd-nspawn. Instead of trying to
>> create the host-side macvlan itself it should insist on it being there
>> already (to have one well-defined state to start with, and only
>> optionally create it by itself). Then, it can join multiple machines to
>> the same macvlan.
> 
> I don't grok this?
> 
> "the same macvlan"?

Well, the layer 2 peer-to-peer LAN...

So, in this context, mv-foo should only be created once. Successive guests 
should only be joined to the existing macvlan.
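
To make the "create once, then join" idea concrete, here is a rough Python
sketch. It is not what nspawn does; it only builds the ip(8) command lines
one could run by hand. The function name is made up for illustration, and
"mode bridge" is my assumption about the macvlan mode.

```python
import os


def host_macvlan_commands(parent, exists=os.path.exists):
    """Return the ip(8) commands needed to create mv-<parent>, or [] if it exists.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    name = f"mv-{parent}"
    # Every network interface visible in the current namespace has an entry
    # under /sys/class/net.
    if exists(f"/sys/class/net/{name}"):
        return []  # already there: later guests just join the same parent
    return [
        # "mode bridge" is an assumption, not something stated in this thread.
        ["ip", "link", "add", "link", parent, "name", name,
         "type", "macvlan", "mode", "bridge"],
        ["ip", "link", "set", "dev", name, "up"],
    ]
```

Calling it a second time, once mv-foo exists, yields an empty list, which is
the well-defined starting state I have in mind.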

> I have the suspicion that the confusion here stems from the fact that
> nspawn creates the macvlan iface on the host first, then moves it into
> the container. but if you already have an iface by that name on the
> host, then it cannot create the macvlan under that name.

I don't think this is how it worked, as far as I remember, but as already
pointed out: I still have to try that again. Currently my setup refuses to
run the machines; I need to reconfigure the system first to get one machine
up and running.

In this context: I think when it worked, it created mv-foo on the host (so
you are right there), but it didn't move it into the container. It creates a
companion device there called host0. This is a layer 2 peer-to-peer network
in the kernel. So maybe host0 is created on the host, then moved into the
container - I'm not sure. Other peers could be joined.

The mv-foo interface is a virtual MAC address on the host. If you created it 
manually, you would join more virtual interfaces to the physical interface, 
i.e. host0 from the container.

Each peer interface can communicate with the others but not with the
physical interface directly, unless your switch has packet mirroring
capabilities and would send packets back to the port they originated from -
this is usually not encouraged by the Ethernet switch specification.

The kernel's MACVLAN implementation won't pass packets to the physical
interface directly but always through the medium connecting to the switch,
and the switch, by sane reasoning, won't pass them back on the same physical
port. However, the kernel does pass packets between MACVLAN peers it knows
locally without touching the physical interface. The physical interface is
only a transport medium for non-local packets, that is, packets whose
destination is not among the MACs known to MACVLAN.
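
As a toy model of the behaviour I just described (this mirrors my
description, not the kernel source; the class name, the method name and the
MAC addresses are made up):

```python
from dataclasses import dataclass, field


@dataclass
class MacvlanParent:
    """One physical interface plus the macvlan peers stacked on top of it."""
    parent_mac: str
    peer_macs: set = field(default_factory=set)

    def deliver(self, src_mac, dst_mac):
        """Classify what happens to a frame sent by one of the peers."""
        if src_mac not in self.peer_macs:
            raise ValueError("sender is not a macvlan peer of this parent")
        if dst_mac in self.peer_macs:
            # Peer to peer: handled locally, never reaches the wire.
            return "local"
        if dst_mac == self.parent_mac:
            # Goes out to the switch, which will not send it back on the
            # port it arrived on, so the physical interface never sees it.
            return "lost"
        # Anything else is ordinary LAN traffic via the physical interface.
        return "wire"


if __name__ == "__main__":
    lan = MacvlanParent(
        parent_mac="02:00:00:00:00:01",
        peer_macs={"02:00:00:00:00:10", "02:00:00:00:00:11"},
    )
    print(lan.deliver("02:00:00:00:00:10", "02:00:00:00:00:11"))
    print(lan.deliver("02:00:00:00:00:10", "02:00:00:00:00:01"))
    print(lan.deliver("02:00:00:00:00:10", "02:00:00:00:00:fe"))
```

The "lost" case is exactly the one that bites when the host keeps talking
through the physical interface.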

To overcome this issue, I need to configure mv-foo to receive my DHCP lease
instead of the physical interface. Now each peer can communicate with my LAN
and with every other MACVLAN peer of my physical interface (which now
includes my host, via mv-foo, at layer 3). The MAC address of the physical
interface is more or less unused.

> I figure we can fix that by creating the iface under a random name
> first on the host, then move it into the container, and then rename it
> to the right name afterwards.

The problem is with the interface that stays on the host, not with the
interface in the container. This fix may be for a second problem I have not
yet observed.
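
For reference, the create/move/rename sequence proposed in the quote could
look roughly like this when spelled out as ip(8) and nsenter(1) commands.
This is only a sketch: nspawn talks to the kernel directly rather than
calling these tools, the helper names are made up, "mode bridge" is an
assumption, and the final name is left as a parameter because we do not yet
agree on what the interface is called inside the container.

```python
import secrets

IFNAME_MAX = 15  # Linux interface names are limited to 15 characters


def temp_ifname(prefix="mv"):
    """Return a random interface name unlikely to clash with existing ones."""
    name = f"{prefix}{secrets.token_hex(4)}"  # prefix + 8 hex digits
    if len(name) > IFNAME_MAX:
        raise ValueError("interface name too long")
    return name


def create_move_rename(parent, container_pid, final_name, tmp_name=None):
    """Build (not run) the commands: create on host, move, rename inside."""
    tmp = tmp_name or temp_ifname()
    pid = str(container_pid)
    return [
        # 1. Create the macvlan on the host under a throwaway name.
        ["ip", "link", "add", "link", parent, "name", tmp,
         "type", "macvlan", "mode", "bridge"],
        # 2. Move it into the network namespace of the container's process.
        ["ip", "link", "set", "dev", tmp, "netns", pid],
        # 3. Rename it from inside that namespace.
        ["nsenter", "-t", pid, "-n",
         "ip", "link", "set", "dev", tmp, "name", final_name],
    ]
```

Because the temporary name is random, it cannot collide with an mv-foo that
already exists on the host.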

> A work-around would be to name the .netdev iface of yours something
> else than "mv-enp5s0", call it "waldi" or so, so that it doesn't
> conflict with the name for the container in the short time-frame that
> the iface nspawn creates exists on the host...

I need to create this manually with networkd and configure it as a DHCP
client for the reasons above. Otherwise my host communicates through the
physical interface "foo" instead of "mv-foo", which effectively disables
communication with the MACVLAN peers.
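
The networkd side of this can be sketched as three small unit files. The
Python below only renders their contents as strings so the pieces are easy
to see together; the function name and the file names are made up, and
Mode=bridge is an assumption. This is not a copy of my actual
configuration.

```python
def networkd_units(parent, macvlan=None):
    """Render a .netdev and two .network files as {file name: contents}."""
    macvlan = macvlan or f"mv-{parent}"
    return {
        # Defines the host-side macvlan device.
        f"{macvlan}.netdev": (
            "[NetDev]\n"
            f"Name={macvlan}\n"
            "Kind=macvlan\n"
            "\n"
            "[MACVLAN]\n"
            "Mode=bridge\n"  # assumption, not stated in this thread
        ),
        # Attaches the macvlan to the physical interface, which itself
        # gets no address configuration here.
        f"{parent}.network": (
            "[Match]\n"
            f"Name={parent}\n"
            "\n"
            "[Network]\n"
            f"MACVLAN={macvlan}\n"
        ),
        # The macvlan, not the physical interface, takes the DHCP lease.
        f"{macvlan}.network": (
            "[Match]\n"
            f"Name={macvlan}\n"
            "\n"
            "[Network]\n"
            "DHCP=yes\n"
        ),
    }
```

With parent "foo" this gives mv-foo.netdev, foo.network and mv-foo.network,
so the host ends up speaking through mv-foo.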

> Can you verify if such a change fixes your issue? If so, we can
> randomize the name initially, as suggested above.

I'll first restore a configuration which gets one container up and running
again with working MACVLAN, then we can figure out where the problem is. I
somehow believe your guess about the source of the issue is currently not
quite right. Such a machine could communicate with my router and the outside
world, but not with the MAC address of my physical interface, for the
reasons given above.

PS: MACVLAN is completely different from the semantics of veth, in that with
veth you have a pair of virtual Ethernet interfaces, differently prefixed in
the host and the container. It's a 1:1 association. MACVLAN is different:
it's a peer-to-peer association, identified by the physical interface the
peers are joined to. The host is effectively just another peer, no different
from a container. But nspawn currently doesn't handle this fact. It tries to
apply veth semantics.

-- 
Replies to list only preferred.