[systemd-devel] [PATCH] networkd: Introduce ipip tunnel

Susant Sahani susant at redhat.com
Sun Apr 6 21:35:45 PDT 2014


On 04/04/2014 10:00 PM, Tom Gundersen wrote:
> Hi Susant,
Hi Tom,
       Thanks for reviewing .
>
> Thanks for this, looking forward getting this merged!
>
> I have some comments below though.
I have addressed all your comments. However I have some queries
Please find below.

>
> On Fri, Apr 4, 2014 at 11:25 AM, Susant Sahani <susant at redhat.com> wrote:
>>   This patch enables basic ipip tunnel support.
>> It works with kernel module ipip
>>
>> Example configuration
>> File : ipip.netdev
>>
>>   [NetDev]
>>   Name=ipip-tun
>>   Kind=tunnel
>>
>>   [Tunnel]
>>   Kind=ipip
> Maybe we should simply have
>
> [NetDev]
> Kind=ipip
>
> We can still use the same [Tunnel] section for each of the tunnel
> kinds though. This way we are closer to the rtnl interface, and it
> seems a bit simpler to me.
  My intention of kind=tunnel is to keep the all kind of tunnels under
the umbrella tunnel. But this also nice.

>>   Local=192.168.8.102
>>   Remote=10.4.4.4
>>   Dev=em1
> I don't think we should be using the interface name (anywhere, unless
> we really must). I suggest we do the same with tunnel devices as with
> other netdev devices. Simply add a "Tunnel=ipip-tun" to the [Network]
> section of the corresponding interface and match in this way.
>
>>   Ttl=64
>>   Mtu=1480
> I guess these should be upper-case, and MTUBytes should be used as in
> .link files.
>
> So to sum up, I suggest replacing your example with:
>
> /////////////////
>
> ipip.netdev:
> [NetDev]
> Name=ipip-tun
> Kind=tunnel
>
> [Tunnel]
> Local=192.168.8.102
> Remote=10.4.4.4
> TTL=64
> MTUBytes=1480
>
> foo.network:
> [Match]
> Name=em1
>
> [Network]
> Tunnel=ipip-tun
>
> //////////////
Modified .
> Also, we need to make sure that we only start setting up the tunnel
> when the underlying device (em1) has reached the correct state, so we
> really want to initiate the tunnel creation from networkd-link.c (so
> hooking into this from the .network file is the most convenient).
Yes agreed.
>
> In the future, we may want to allow a short-hand, where separate
> .network and .netdev files are not necessarily in some cases, but
> let's delay that for now.
>
>>   Makefile.am                              |   7 +-
>>   src/network/networkd-netdev-gperf.gperf  |   6 +
>>   src/network/networkd-netdev.c            | 240 ++++++++++++++++++++++++++++++-
>>   src/network/networkd-network-gperf.gperf |   1 +
>>   src/network/networkd-network.c           |  37 +++++
>>   src/network/networkd.h                   |  38 +++++
>>   6 files changed, 322 insertions(+), 7 deletions(-)
>>
>> diff --git a/Makefile.am b/Makefile.am
>> index c51f6ae..60c7016 100644
>> --- a/Makefile.am
>> +++ b/Makefile.am
>> @@ -4144,8 +4144,8 @@ systemd_networkd_SOURCES = \
>>          src/network/networkd.c
>>
>>   systemd_networkd_LDADD = \
>> -       libsystemd-networkd-core.la
>> -
>> +       libsystemd-networkd-core.la \
>> +       -lkmod
>>   noinst_LTLIBRARIES += \
>>          libsystemd-networkd-core.la
>>
>> @@ -4189,7 +4189,8 @@ test_network_SOURCES = \
>>          src/network/test-network.c
>>
>>   test_network_LDADD = \
>> -       libsystemd-networkd-core.la
>> +       libsystemd-networkd-core.la \
>> +       -lkmod
>>
>>   tests += \
>>          test-network
>> diff --git a/src/network/networkd-netdev-gperf.gperf b/src/network/networkd-netdev-gperf.gperf
>> index ea7ba57..ecca2bd 100644
>> --- a/src/network/networkd-netdev-gperf.gperf
>> +++ b/src/network/networkd-netdev-gperf.gperf
>> @@ -24,3 +24,9 @@ NetDev.Name,             config_parse_ifname,                0,
>>   NetDev.Kind,             config_parse_netdev_kind,           0,                             offsetof(NetDev, kind)
>>   VLAN.Id,                 config_parse_uint64,                0,                             offsetof(NetDev, vlanid)
>>   MACVLAN.Mode,            config_parse_macvlan_mode,          0,                             offsetof(NetDev, macvlan_mode)
>> +Tunnel.Kind,             config_parse_tunnel_kind,           0,                             offsetof(NetDev, tunnel_kind)
>> +Tunnel.Dev,              config_parse_ifname,                0,                             offsetof(NetDev, tunnel_dev)
>> +Tunnel.Ttl,              config_parse_int,                   0,                             offsetof(NetDev, tunnel_ttl)
>> +Tunnel.Mtu,              config_parse_int,                   0,                             offsetof(NetDev, tunnel_mtu)
>> +Tunnel.Local,            config_parse_tunnel_address,        0,                             offsetof(NetDev, tunnel_local)
>> +Tunnel.Remote,           config_parse_tunnel_address,        0,                             offsetof(NetDev, tunnel_remote)
>> diff --git a/src/network/networkd-netdev.c b/src/network/networkd-netdev.c
>> index 762eff2..6abaf12 100644
>> --- a/src/network/networkd-netdev.c
>> +++ b/src/network/networkd-netdev.c
>> @@ -18,6 +18,12 @@
>>     You should have received a copy of the GNU Lesser General Public License
>>     along with systemd; If not, see <http://www.gnu.org/licenses/>.
>>   ***/
>> +#include <netinet/ether.h>
>> +#include <arpa/inet.h>
>> +#include <net/if.h>
>> +#include <linux/ip.h>
>> +#include <linux/if_tunnel.h>
>> +#include <libkmod.h>
>>
>>   #include "networkd.h"
>>   #include "network-internal.h"
>> @@ -33,6 +39,7 @@ static const char* const netdev_kind_table[_NETDEV_KIND_MAX] = {
>>           [NETDEV_KIND_BOND] = "bond",
>>           [NETDEV_KIND_VLAN] = "vlan",
>>           [NETDEV_KIND_MACVLAN] = "macvlan",
>> +        [NETDEV_KIND_TUNNEL] = "tunnel",
>>   };
>>
>>   DEFINE_STRING_TABLE_LOOKUP(netdev_kind, NetDevKind);
>> @@ -48,6 +55,16 @@ static const char* const macvlan_mode_table[_NETDEV_MACVLAN_MODE_MAX] = {
>>   DEFINE_STRING_TABLE_LOOKUP(macvlan_mode, MacVlanMode);
>>   DEFINE_CONFIG_PARSE_ENUM(config_parse_macvlan_mode, macvlan_mode, MacVlanMode, "Failed to parse macvlan mode");
>>
>> +static const char* const tunnel_kind_table[_TUNNEL_KIND_MAX] = {
>> +        [TUNNEL_KIND_IPIP] = "ipip",
>> +        [TUNNEL_KIND_GRE] = "gre",
>> +        [TUNNEL_KIND_SIT] = "sit",
>> +};
>> +
>> +DEFINE_STRING_TABLE_LOOKUP(tunnel_kind, TunnelKind);
>> +DEFINE_CONFIG_PARSE_ENUM(config_parse_tunnel_kind, tunnel_kind, TunnelKind, "Failed to parse tunnel kind");
>> +
>> +
>>   void netdev_free(NetDev *netdev) {
>>           netdev_enslave_callback *callback;
>>
>> @@ -66,6 +83,7 @@ void netdev_free(NetDev *netdev) {
>>
>>           free(netdev->description);
>>           free(netdev->name);
>> +        free(netdev->tunnel_dev);
>>
>>           condition_free_list(netdev->match_host);
>>           condition_free_list(netdev->match_virt);
>> @@ -242,6 +260,169 @@ static int netdev_create_handler(sd_rtnl *rtnl, sd_rtnl_message *m, void *userda
>>           return 1;
>>   }
>>
>> +static int load_module(const char *mod_name) {
>> +        struct kmod_ctx *ctx;
>> +        struct kmod_list *list = NULL, *l;
>> +        int r;
>> +
>> +        ctx = kmod_new(NULL, NULL);
>> +        if (!ctx) {
>> +                kmod_unref(ctx);
>> +                return -ENOMEM;
>> +        }
> We probably want to avoid loading modules from networkd (see below),
> but if we need it, then store the kmod context in the manager object,
> so we only need to create it once (though may need to invalidate the
> caches if/when we reload the rest of the daemon config (which we
> currently don't)). The reason is that loading the configs is quite
> expensive, but once they are loaded using kmod is cheap.
  Now modified to context in manager.
>
>> +        r = kmod_module_new_from_lookup(ctx, mod_name, &list);
>> +        if (r < 0)
>> +                return -1;
>> +
>> +        kmod_list_foreach(l, list) {
>> +                struct kmod_module *mod = kmod_module_get_module(l);
>> +
>> +                r = kmod_module_probe_insert_module(mod, 0, NULL, NULL, NULL, NULL);
>> +                if (r >= 0)
>> +                        r = 0;
>> +                else
>> +                        r = -1;
>> +
>> +                kmod_module_unref(mod);
>> +        }
>> +
>> +        kmod_module_unref_list(list);
>> +        kmod_unref(ctx);
>> +
>> +        return r;
>> +}
>> +
>> +int config_parse_tunnel_address(const char *unit,
>> +                                const char *filename,
>> +                                unsigned line,
>> +                                const char *section,
>> +                                unsigned section_line,
>> +                                const char *lvalue,
>> +                                int ltype,
>> +                                const char *rvalue,
>> +                                void *data,
>> +                                void *userdata) {
>> +        NetDev *n = userdata;
>> +        int r;
>> +
>> +        assert(filename);
>> +        assert(lvalue);
>> +        assert(rvalue);
>> +        assert(data);
>> +
>> +        if(streq(lvalue, "Local"))
>> +                r = inet_pton(AF_INET, rvalue , (in_addr_t *)&n->tunnel_local.s_addr);
>> +        else
>> +                r = inet_pton(AF_INET, rvalue , (in_addr_t *)&n->tunnel_remote.s_addr);
>> +
>> +        if (r < 0) {
>> +                log_syntax(unit, LOG_ERR, filename, line, EINVAL,
>> +                           "Tunnel address is invalid, ignoring assignment: %s", rvalue);
>> +                return 0;
>> +        }
>> +
>> +        return 0;
>> +}
> Hm, we can probably reuse some of the existing address parsing
> functions don't you think? And we should also check the address
> faimilies here right?

I think I can reuse net_parse_inaddr .

>
>> +static int netdev_create_tunnel(NetDev *netdev, sd_rtnl_message *m) {
>> +        int r;
>> +
>> +        r = sd_rtnl_message_append_string(m, IFLA_IFNAME, netdev->name);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_IFNAME, attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_message_append_u32(m, IFLA_MTU, netdev->tunnel_mtu);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_MTU attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_message_open_container(m, IFLA_LINKINFO);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_LINKINFO attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_message_append_string(m, IFLA_INFO_KIND,
>> +                                          tunnel_kind_to_string(netdev->tunnel_kind));
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append  IFLA_INFO_KIND attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
> The way the sd_rtnl_message_open_container_union() works is that it
> will automatically add the IFLA_INFO_KIND attribute, so you can drop
> this.
  Done !
>> +        r = sd_rtnl_message_open_container_union(m, IFLA_INFO_DATA, "ipip");
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_INFO_DATA attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
> Maybe not hardcode "ipip", but look it up depending on the Kind?
Plan was to replace with a switch case but yes
netdev_kind_to_string should do now.
>
>> +        r = sd_rtnl_message_append_u32(m, IFLA_IPTUN_LINK, if_nametoindex(netdev->tunnel_dev));
> We should never use if_nametoindex(), both because it is racy (the
> interface can be renamed at run-time) and because it is a synchronous
> interfac to RTNL, which we really want to avoid. We keep track of the
> ifindex of netdev's in their Link structure, so we should pass that
> structure into this function and use the ifindex from there.
Done !
>
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_IPTUN_LINK attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_message_append_u32(m, IFLA_IPTUN_LOCAL, netdev->tunnel_local.s_addr);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_IPTUN_LOCAL attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_message_append_u32(m, IFLA_IPTUN_REMOTE, netdev->tunnel_remote.s_addr);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_IPTUN_REMOTE attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
> Hm, I guess these should be _append_in_addr() to get the typesafety
> right (might need to verify that we are using the right types for this
> in rtnl-types.c.
  I am missing something in the code . with the current rtnl code
it does not get appended.  Could you please give a example.

  r= sd_rtnl_message_append_in_addr(m, IFLA_IPTUN_LOCAL, (const struct 
in_addr *)
&netdev->tunnel_local.s_addr);

Could not append IFLA_IPTUN_LOCAL attribute: Invalid argument


>
>> +        r = sd_rtnl_message_close_container(m);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_INFO_DATA attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_message_close_container(m);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not append IFLA_LINKINFO attribute: %s",
>> +                                 strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        r = sd_rtnl_call_async(netdev->manager->rtnl, m, &netdev_create_handler, netdev, 0, NULL);
>> +        if (r < 0) {
>> +                log_error_netdev(netdev,
>> +                                 "Could not send rtnetlink message: %s", strerror(-r));
>> +                return r;
>> +        }
>> +
>> +        log_debug_netdev(netdev, "creating netdev tunnel");
>> +
>> +        netdev->state = NETDEV_STATE_CREATING;
>> +
>> +        return 0;
>> +}
>> +
>>   static int netdev_create(NetDev *netdev, Link *link, sd_rtnl_message_handler_t callback) {
>>           _cleanup_rtnl_message_unref_ sd_rtnl_message *req = NULL;
>>           const char *kind;
>> @@ -262,6 +443,10 @@ static int netdev_create(NetDev *netdev, Link *link, sd_rtnl_message_handler_t c
>>                   return r;
>>           }
>>
>> +        if(netdev->kind == NETDEV_KIND_TUNNEL) {
>> +                return netdev_create_tunnel(netdev, req);
>> +        }
>> +
>>           if (link) {
>>                   r = sd_rtnl_message_append_u32(req, IFLA_LINK, link->ifindex);
>>                   if (r < 0) {
>> @@ -418,9 +603,19 @@ int netdev_set_ifindex(NetDev *netdev, sd_rtnl_message *message) {
>>           }
>>
>>           if (!streq(kind, received_kind)) {
>> -                log_error_netdev(netdev, "Received newlink with wrong KIND");
>> -                netdev_enter_failed(netdev);
>> -                return r;
>> +                if(streq(kind, "tunnel")) {
>> +                        if(streq(received_kind, "ipip")) {
> This will be much nicer if we simply use "ipip" as the kind, rather
> than "tunnel".

Done !
>
>> +                                r = 0;
>> +                        } else
>> +                                r = -1;
>> +                } else
>> +                        r = -1;
>> +
>> +                if(r < 0) {
>> +                        log_error_netdev(netdev, "Received newlink with wrong KIND");
>> +                        netdev_enter_failed(netdev);
>> +                        return -EINVAL;
>> +                }
>>           }
>>
>>           r = sd_rtnl_message_link_get_ifindex(message, &ifindex);
>> @@ -474,7 +669,7 @@ static int netdev_load_one(Manager *manager, const char *filename) {
>>           netdev->macvlan_mode = _NETDEV_MACVLAN_MODE_INVALID;
>>           netdev->vlanid = VLANID_MAX + 1;
>>
>> -        r = config_parse(NULL, filename, file, "Match\0NetDev\0VLAN\0MACVLAN\0",
>> +        r = config_parse(NULL, filename, file, "Match\0NetDev\0VLAN\0MACVLAN\0Tunnel\0",
>>                            config_item_perf_lookup, (void*) network_netdev_gperf_lookup,
>>                            false, false, netdev);
>>           if (r < 0) {
>> @@ -510,6 +705,43 @@ static int netdev_load_one(Manager *manager, const char *filename) {
>>                   return 0;
>>           }
>>
>> +
>> +        if(netdev->kind == NETDEV_KIND_TUNNEL) {
>> +                if(!netdev->tunnel_kind == _TUNNEL_KIND_INVALID) {
>> +                        log_error_netdev(netdev, "Tunnel Kind is misssing Ignoring");
>> +                        return 0;
>> +
>> +                }
>> +
>> +                switch(netdev->tunnel_kind) {
>> +                case TUNNEL_KIND_IPIP:
>> +                        r = load_module("ipip");
> This should be fixed in the kernel I think. All that is needed is to
> add MODULE_ALIA_RTNL_LINK("ipip") to the ipip module (and the same for
> the other modules). This is already done for many (most?) netdev
> kinds, which is why this stuff "just works" for bridges, bonds, etc.

I am not sure how it will be fixed in  kernel. If the module not present
kernel will say not supported . Could you please give a example.
The main reason for loading module from networkd is avoid users
loading manually .


>
>> +                        break;
>> +                case TUNNEL_KIND_GRE:
>> +                case TUNNEL_KIND_SIT:
>> +                default:
>> +                        r = -1;
>> +                }
>> +
>> +                if (r < 0) {
>> +                        log_error_netdev(netdev, "Could not load Kernel module . Ignoring");
>> +                        return 0;
>> +
>> +                }
>> +
>> +                if(netdev->tunnel_mtu <= 0) {
>> +                        log_error_netdev(netdev, "MTU size shold be greater than 0. Ignoring");
>> +                        return 0;
>> +                }
> Hm, is this really necessary. Can we not simply ignore the MTU if it
> is unset and let the kernel set a default, or does that not work?
Removed .
>
>> +                r = if_nametoindex(netdev->tunnel_dev);
> Yeah, this is a no-no. We should simply wait for the device to appear
> and then trigger the tunnel creation from networkd-link.c.
Yes done .
>> +                if(!r) {
>> +                        log_error_netdev(netdev,
>

Thanks,
Susant


More information about the systemd-devel mailing list