Developing a Docker Network Driver

Aug 11, 2023 23:30 · 3564 words · 8 minute read Docker Container Network Linux

Wu ChunYang

Background

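Commonly used Docker network drivers:

  • null: Docker creates a sandbox but does not attach any network interface
  • host: uses the host's network stack
  • bridge: uses a Linux bridge together with veth pairs to communicate across network namespaces
  • macvlan: presents one physical NIC as multiple virtual NICs, each with its own MAC and IP address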

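Requirements

Our requirement is to move a specific NIC into a container to achieve network isolation. For example, a particular virtual machine has two NICs attached: a management NIC for management-plane traffic, and a tenant NIC for communicating with other resources in the user's VPC. The VM also runs containers that only need to talk to the user's VPC network, not the management network. Likewise, the management processes should only talk to the management network, not to other resources in the user's VPC.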

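Approaches

Move the NIC into the container's network namespace (netns) manually after the container is created.

If you insert the NIC manually, the null driver is the first choice because it does not allocate any network device.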

  1. List the current networks

    root@ubuntu:~# docker network list
    NETWORK ID     NAME           DRIVER    SCOPE
    9c2980aec42a   host           host      local
    6153ca0ae1df   none           null      local
    420cba5e13e3   test_macvlan   macvlan   local
    
  2. Create the container

    root@ubuntu:~# docker run --network none --rm --name test -d busybox sleep inf
    
    root@ubuntu:~# docker exec test ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
    

    The container has only a loopback interface and no other network devices.

  3. Create a virtual NIC and move it into the container

    1. Create the dummy0 virtual device

      root@ubuntu:~# ip link add dummy0 type dummy
      root@ubuntu:~# ip link show dev dummy0
      20: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 6e:d7:60:95:b7:59 brd ff:ff:ff:ff:ff:ff
      
    2. Link the container's network namespace under /var/run/netns

      root@ubuntu:~# docker inspect test | jq -r .[0].NetworkSettings.SandboxKey
      /var/run/docker/netns/1f0a15007160
      
      root@ubuntu:~# ln -s /var/run/docker/netns/1f0a15007160 /var/run/netns/
      

      If the /var/run/netns directory does not exist, create it manually, or run ip netns add test to create a network namespace named test, which creates /var/run/netns automatically.

    3. List the network namespaces and move the dummy0 NIC created earlier into the container's network namespace

      root@ubuntu:~# ip netns
      1f0a15007160
      ovnmeta-2cd1a111-1e82-44b9-8473-24b4bbcd90f3 (id: 1)
      ovnmeta-0fc5a758-a903-492d-a87b-3d6215fde2f0 (id: 0)
      
      root@ubuntu:~# ip link set dev dummy0 netns 1f0a15007160
      root@ubuntu:~# ip netns exec 1f0a15007160 ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
      20: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
          link/ether 6e:d7:60:95:b7:59 brd ff:ff:ff:ff:ff:ff
      
    4. Configure the IP address

      root@ubuntu:~# ip netns exec 1f0a15007160 ip address add dev dummy0 192.168.1.0/24
      root@ubuntu:~# ip netns exec 1f0a15007160 ip link set dummy0 up
      root@ubuntu:~# ip netns exec 1f0a15007160 ip route add default  via 192.168.1.1
      root@ubuntu:~# ip netns exec 1f0a15007160 ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
      20: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          link/ether 6e:d7:60:95:b7:59 brd ff:ff:ff:ff:ff:ff
          inet 192.168.1.0/24 scope global dummy0
          valid_lft forever preferred_lft forever
      

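      Note that every command used to configure the IP starts with ip netns exec 1f0a15007160, which means the action runs inside the 1f0a15007160 network namespace. Also note that 192.168.1.0 is conventionally the network address of the /24; on a real network, a host address such as 192.168.1.10 is the usual choice.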

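  4. Summary

This approach moves a NIC into a specific container (network namespace) and lets you configure an IP address so that the container can communicate with the outside. It is generally suitable only for temporary testing. The drawback is that the network namespace (netns) changes whenever the container restarts, so the approach cannot be used where the setup must persist.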
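
The manual steps above can be scripted. The following is a minimal sketch rather than part of the original procedure: expose_netns and build_move_commands are hypothetical helper names, and the commands they produce are the same ip invocations used above. The sketch only builds the argument lists; actually running them (for example with subprocess.run) requires root.

```python
import os


def expose_netns(sandbox_key: str, netns_dir: str = "/var/run/netns") -> str:
    """Symlink a Docker sandbox into netns_dir so that `ip netns` can see it."""
    os.makedirs(netns_dir, exist_ok=True)
    name = os.path.basename(sandbox_key)
    link = os.path.join(netns_dir, name)
    if not os.path.islink(link):
        os.symlink(sandbox_key, link)
    return name


def build_move_commands(netns: str, ifname: str, cidr: str, gateway: str) -> list:
    """Return the `ip` commands that move ifname into netns and configure it."""
    in_ns = ["ip", "netns", "exec", netns]
    return [
        ["ip", "link", "set", "dev", ifname, "netns", netns],
        in_ns + ["ip", "address", "add", cidr, "dev", ifname],
        in_ns + ["ip", "link", "set", ifname, "up"],
        in_ns + ["ip", "route", "add", "default", "via", gateway],
    ]
```

For the container above, build_move_commands("1f0a15007160", "dummy0", "192.168.1.10/24", "192.168.1.1") yields the four commands in order; the address 192.168.1.10 is an illustrative choice. Because the sandbox key changes on every container restart, such a script would have to be rerun each time, which is the limitation noted in the summary.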

Docker host NIC driver

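Write a host NIC driver that configures the IP address and moves the NIC into the container automatically.

Go is strongly recommended for developing this driver, since the open-source Docker SDK libraries make the work more convenient.

Our project is based on Python, so we had to write the driver in Python. It follows Docker's network driver interface specification: https://github.com/moby/libnetwork/blob/3c8e06bc0580a2a1b2440fe0792fbfcd43a9feca/docs/remote.md#remote-drivers

The code, with comments, is as follows: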

import json
import netaddr
import os
import sys

import flask
from flask import Flask
from pyroute2 import IPRoute

# hostnic_config stores the NIC-related information; everything is kept in /etc/docker/hostnic.json
class hostnic_config(object):
    """this class records network id and its host nic"""
    CONFIG_FILE = "/etc/docker/hostnic.json"

    def __init__(self) -> None:
        if not os.path.exists(self.CONFIG_FILE):
            with open(self.CONFIG_FILE, 'w+') as f:
                f.write(json.dumps({}))

    def get_data(self) -> dict:
        with open(self.CONFIG_FILE, 'r') as cfg:
            data = json.loads(cfg.read())
        return data

    def write_config(self, key: str, value: dict):
        data = self.get_data()
        data[key] = value
        with open(self.CONFIG_FILE, 'w+') as cfg:
            cfg.write(json.dumps(data))

    def get_config(self, key: str):
        data = self.get_data()
        return data.get(key, "")

    def delete_config(self, key: str):
        data = self.get_data()
        if not data.get(key):
            return
        data.pop(key)
        with open(self.CONFIG_FILE, 'w+') as cfg:
            cfg.write(json.dumps(data))

driver_config = hostnic_config()

app = Flask(__name__)

SCHEMA={"SUCCESS": {}}

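# Docker calls this endpoint at startup to check whether the driver is available. The response must
# contain an Implements list stating whether a network driver or a volume driver is implemented.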
@app.route('/Plugin.Activate', methods=['POST', 'GET'])
def plugin_activate():
    """Returns the list of the implemented drivers.

    See the following link for more details about the spec:

      https://github.com/docker/libnetwork/blob/master/docs/remote.md#handshake  # noqa
    """
    return flask.jsonify({"Implements": ["NetworkDriver"]})

# Reports the scope of the driver, local or global: local means the network driver applies to this
# host only, while global means it applies across a cluster.
@app.route('/NetworkDriver.GetCapabilities', methods=['POST'])
def plugin_scope():
    """Returns the capability as the remote network driver.

    This function returns the capability of the remote network driver, which is
    ``global`` or ``local`` and defaults to ``local``. With ``global``
    capability, the network information is shared among multiple Docker daemons
    if the distributed store is appropriately configured.

    See the following link for more details about the spec:

      https://github.com/docker/libnetwork/blob/master/docs/remote.md#set-capability  # noqa
    """
    capabilities = {'Scope': 'local'}
    return flask.jsonify(capabilities)

# Discovery notification for a newly discovered resource. This driver ignores the event and returns an empty dict.
@app.route('/NetworkDriver.DiscoverNew', methods=['POST'])
def network_driver_discover_new():
    """The callback function for the DiscoverNew notification.

    The DiscoverNew notification includes the type of the
    resource that has been newly discovered and possibly other
    information associated with the resource.

    See the following link for more details about the spec:

      https://github.com/docker/libnetwork/blob/master/docs/remote.md#discovernew-notification  # noqa
    """
    return flask.jsonify(SCHEMA['SUCCESS'])

# Delete notification. This driver does not care about the event, so returning an empty dict is enough (an empty dict means success).
@app.route('/NetworkDriver.DiscoverDelete', methods=['POST'])
def network_driver_discover_delete():
    """The callback function for the DiscoverDelete notification.

    See the following link for more details about the spec:

      https://github.com/docker/libnetwork/blob/master/docs/remote.md#discoverdelete-notification  # noqa
    """
    return flask.jsonify(SCHEMA['SUCCESS'])

# Create a network: this request arrives when the user runs docker network create.
@app.route('/NetworkDriver.CreateNetwork', methods=['POST'])
def network_driver_create_network():
    """Creates a new network whose name is the given NetworkID.
    example:
    docker network create --driver docker-hostnic --gateway 192.168.1.1  --subnet 192.168.1.0/24 -o hostnic_mac=52:54:00:e1:d9:ef  test_network
    See the following link for more details about the spec:

      https://github.com/docker/libnetwork/blob/master/docs/remote.md#create-network  # noqa
    """
    json_data = flask.request.get_json(force=True)
    hostnic_mac = \
        json_data['Options']['com.docker.network.generic']['hostnic_mac']

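    # The -o hostnic_mac=xxx option is given when the network is created. We must remember
    # this MAC address and later use it as the key to find the corresponding NIC. The
    # NetworkID is allocated by Docker and cannot be chosen by us, so we record the
    # mapping ourselves.
    if driver_config.get_config(json_data['NetworkID']):
        # The remote driver spec reports failures as {"Err": "<message>"}.
        return flask.jsonify({"Err": "network already has a host nic"})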
    gw = json_data.get("IPv4Data")[0].get("Gateway", '')
    netinfo = {"mac_address": hostnic_mac}
    if gw:
        ip = netaddr.IPNetwork(gw)
        netinfo["gateway"] = str(ip.ip)
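    # If no gateway is given, Docker generates one automatically, so it is best to specify it
    # explicitly. Then record the network and the NIC information it needs in the config file.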
    driver_config.write_config(json_data['NetworkID'], netinfo)
    return flask.jsonify(SCHEMA['SUCCESS'])

# When a network is deleted, simply remove it from the config file.
@app.route('/NetworkDriver.DeleteNetwork', methods=['POST'])
def network_driver_delete_network():
    # Just remove the network from the config file.
    json_data = flask.request.get_json(force=True)
    driver_config.delete_config(json_data['NetworkID'])
    return flask.jsonify(SCHEMA['SUCCESS'])

# Join is called when a container is attached to a network, that is, when docker network connect
# is run or when a container is created on the network.
@app.route('/NetworkDriver.Join', methods=['POST'])
def network_driver_join():
    json_data = flask.request.get_json(force=True)
    netid = json_data['NetworkID']
    hostnic_mac = driver_config.get_config(netid).get('mac_address')
    # Use IPRoute to look up the interface name by MAC address
    ipr = IPRoute()
    ifaces = ipr.get_links(address=hostnic_mac)
    ifname = ifaces[0].get_attr('IFLA_IFNAME')
    gw = driver_config.get_config(netid).get('gateway')
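    # ifname is the name of the NIC we want. Docker only needs the device name; it then moves
    # the device into the container itself, so we do not have to do that step. DstPrefix is
    # the prefix of the name the NIC gets inside the container.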
    join_response = {
        "InterfaceName": {
            "SrcName": ifname,
            "DstPrefix": "eth"},
        "Gateway": gw
    }

    return flask.jsonify(join_response)

# Leave corresponds to docker network disconnect. Simply return: once the container's sandbox
# is deleted, the interface returns to the default netns automatically.
@app.route('/NetworkDriver.Leave', methods=['POST'])
def network_driver_leave():
    """Unbinds a hostnic from a sandbox.

    This function takes the following JSON data and delete the veth pair
    corresponding to the given info. ::

        {
            "NetworkID": string,
            "EndpointID": string
        }
    we don't need to remove the port from the sandbox explicitly,
    once the sandbox get deleted, the hostnic comes to default
    netns automatically.
    """
    return flask.jsonify(SCHEMA['SUCCESS'])

@app.route('/NetworkDriver.DeleteEndpoint', methods=['POST'])
def network_driver_delete_endpoint():
    return flask.jsonify(SCHEMA['SUCCESS'])

@app.route('/NetworkDriver.CreateEndpoint', methods=['POST'])
def network_driver_create_endpoint():
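    # An endpoint is the ID of the network device in the container; Docker creates and maintains it.
    # Since a network of our driver can hold only one container, there is no need to create
    # tap/tun devices and the like. Other drivers will most likely need to implement this method
    # (create the NIC) and record the mapping between the endpoint and the NIC they created.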
    return flask.jsonify(SCHEMA['SUCCESS'])

# Endpoint operational info; since the NIC is a physical device, there is nothing to report
@app.route('/NetworkDriver.EndpointOperInfo', methods=['POST'])
def network_driver_endpoint_operational_info():
    return flask.jsonify(SCHEMA['SUCCESS'])

# External connectivity; no implementation needed
@app.route('/NetworkDriver.ProgramExternalConnectivity', methods=['POST'])
def network_driver_program_external_connectivity():
    """provide external connectivity for the given container."""
    return flask.jsonify(SCHEMA['SUCCESS'])

@app.route('/NetworkDriver.RevokeExternalConnectivity', methods=['POST'])
def network_driver_revoke_external_connectivity():
    """Removes external connectivity for a given container.

    Performs the necessary programming to remove the external connectivity
    of a container

    See the following link for more details about the spec:
      https://github.com/docker/libnetwork/blob/master/driverapi/driverapi.go
    """
    return flask.jsonify(SCHEMA['SUCCESS'])

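# The driver takes its name from docker-hostnic.sock: at startup Docker scans this directory for
# Unix domain socket files and uses the file name as the driver name. Do not use app.run() in
# production, though; it has a pitfall that is covered later.
if __name__ == "__main__":
    app.run(host="unix:///run/docker//plugins/docker-hostnic.sock")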
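
The Join handler above assumes that the network is known and that a NIC with the recorded MAC address exists; otherwise it raises an exception, which Flask turns into an HTTP 500. The remote driver specification lets a driver that understood the request but could not complete it answer with {"Err": "<message>"} instead. A minimal sketch of that idea follows. build_join_response and the find_ifname callback are hypothetical names introduced here for illustration and are not part of the original driver.

```python
from typing import Callable, Optional


def build_join_response(netinfo: Optional[dict],
                        find_ifname: Callable[[str], Optional[str]]) -> dict:
    """Build the Join reply, or an {"Err": ...} reply when something is missing.

    netinfo is the record saved by CreateNetwork (mac_address, optional gateway);
    find_ifname maps a MAC address to an interface name, or returns None.
    """
    if not netinfo:
        return {"Err": "network has no host NIC recorded"}
    mac = netinfo.get("mac_address", "")
    ifname = find_ifname(mac)
    if ifname is None:
        return {"Err": "no interface with MAC address %s" % mac}
    response = {"InterfaceName": {"SrcName": ifname, "DstPrefix": "eth"}}
    # Only include a gateway when one was recorded for the network.
    if netinfo.get("gateway"):
        response["Gateway"] = netinfo["gateway"]
    return response
```

In the handler, find_ifname could wrap the pyroute2 lookup and return None when no link matches, and the resulting dict would be passed to flask.jsonify as before.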

Install the required libraries, flask, pyroute2 and netaddr, with pip:

pip install flask pyroute2 netaddr

Run our network driver:

(test-venv) root@ubuntu:~# python docker-hostnic.py
 * Serving Flask app 'docker-hostnic'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on unix:///run/docker//plugins/docker-hostnic.sock
Press CTRL+C to quit

(test-venv) root@ubuntu:~#  ls -al /run/docker/plugins/docker-hostnic.sock
srwxr-xr-x 1 root root 0 Aug 10 15:43 /run/docker/plugins/docker-hostnic.sock

Testing the network driver

Open a new terminal for the test. Restart Docker first, because Docker must start after the driver and must stop before the driver stops.

  1. Create a test NIC to move into the container
root@ubuntu:~# ip link add dummy0 type dummy
root@ubuntu:~# ip link show dummy0
21: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 5e:e8:43:c3:39:6b brd ff:ff:ff:ff:ff:ff
  2. Create the network and the container
root@ubuntu:~# docker network create --driver docker-hostnic --gateway 192.168.1.1  --subnet 192.168.1.0/24 -o hostnic_mac=5e:e8:43:c3:39:6b test-hostnic-network
959315bee36c71bd3cfb279f3d979c53a80f641a429ab2bb72a29563ccd51645
root@ubuntu:~# docker network ls
NETWORK ID     NAME                   DRIVER           SCOPE
9c2980aec42a   host                   host             local
6153ca0ae1df   none                   null             local
959315bee36c   test-hostnic-network   docker-hostnic   local
420cba5e13e3   test_macvlan           macvlan          local
root@ubuntu:~# docker run -d --ip 192.168.1.10 --name test --network test-hostnic-network busybox sleep inf
  3. Inspect the NICs in the container
root@ubuntu:~# docker exec test ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
21: eth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue qlen 1000
    link/ether 5e:e8:43:c3:39:6b brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
root@ubuntu:~# docker exec test ip r
default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0 scope link  src 192.168.1.10
  4. Log of the API requests
Press CTRL+C to quit
<local> - - [10/Aug/2023 15:28:50] "POST /Plugin.Activate HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:28:50] "POST /NetworkDriver.GetCapabilities HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:28:50] "POST /NetworkDriver.CreateNetwork HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:29:00] "POST /NetworkDriver.CreateEndpoint HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:29:00] "POST /NetworkDriver.Join HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:29:00] "POST /NetworkDriver.ProgramExternalConnectivity HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:29:00] "POST /NetworkDriver.EndpointOperInfo HTTP/1.1" 200 -
<local> - - [10/Aug/2023 15:29:00] "POST /NetworkDriver.EndpointOperInfo HTTP/1.1" 200 -
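  5. Start the service with systemd socket activation

    Because the driver must be running before Docker starts, this is the recommended way to start the service. Socket activation works as follows: systemd itself creates the Unix domain socket file that would otherwise belong to the app. When systemd sees a request on the socket, it activates the backing service and hands the socket's file descriptor over to that service, so the service starts only when a request arrives.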

    # /lib/systemd/system/docker-hostnic.socket
    [Unit]
    Description=docker hostnic driver
    
    [Socket]
    ListenStream=/run/docker/plugins/docker-hostnic.sock
    
    [Install]
    WantedBy=sockets.target
    
    # /lib/systemd/system/docker-hostnic.service
    [Unit]
    Description=Docker host NIC plugin Service
    Before=docker.service
    After=network.target docker-hostnic.socket
    Requires=docker-hostnic.socket docker.service
    
    [Service]
    User=root
    Group=root
    ExecStart=/usr/bin/python3 /your/path/docker-hostnic.py
    
    [Install]
    WantedBy=multi-user.target
    

    Run systemctl enable --now docker-hostnic.socket so that the socket, not the service, is enabled at boot.

    Check the status of the systemd units: the socket is active, but the service is not:

    (test-venv) root@ubuntu:~# systemctl  status docker-hostnic.socket  docker-hostnic.service
    ● docker-hostnic.socket - docker hostnic driver
        Loaded: loaded (/lib/systemd/system/docker-hostnic.socket; enabled; vendor preset: enabled)
        Active: active (listening) since Thu 2023-08-10 15:41:01 UTC; 12s ago
    Triggers: ● docker-hostnic.service
        Listen: /run/docker/plugins/docker-hostnic.sock (Stream)
        CGroup: /system.slice/docker-hostnic.socket
    
    Aug 10 15:41:01 ubuntu systemd[1]: Listening on docker hostnic driver.
    
    ● docker-hostnic.service - Docker hostnic plugin Service
        Loaded: loaded (/lib/systemd/system/docker-hostnic.service; disabled; vendor preset: enabled)
        Active: inactive (dead)
    TriggeredBy: ● docker-hostnic.socket
    

Once our app has been activated, it cannot handle requests properly. Sending a /Plugin.Activate request blocks at this step:

root@ubuntu:~# curl --unix-socket /run/docker/plugins/docker-hostnic.sock http://localhost/Plugin.Activate
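
The same request can be issued from Python, which makes it easy to add a timeout so that a hang like this one surfaces as an exception instead of blocking forever. This is a sketch using only the standard library; UnixHTTPConnection and call_plugin are names introduced here for illustration.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that connects to a Unix domain socket path."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # The host is only used for the Host header; routing is by socket path.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def call_plugin(socket_path: str, method: str, payload=None, timeout: float = 5.0) -> dict:
    """POST a JSON payload to /<method> on the plugin socket and decode the reply."""
    conn = UnixHTTPConnection(socket_path, timeout)
    try:
        conn.request("POST", "/" + method, body=json.dumps(payload or {}),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read() or b"{}")
    finally:
        conn.close()
```

For example, call_plugin("/run/docker/plugins/docker-hostnic.sock", "Plugin.Activate") should return the Implements dict from a healthy driver, and raise a timeout error when nothing answers on the socket.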

Check the status of the systemd units:

root@ubuntu:~# systemctl status docker-hostnic.socket  docker-hostnic.service
● docker-hostnic.socket - docker hostnic driver
     Loaded: loaded (/lib/systemd/system/docker-hostnic.socket; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-08-10 15:41:01 UTC; 2min 53s ago
   Triggers: ● docker-hostnic.service
     Listen: /run/docker/plugins/docker-hostnic.sock (Stream)
     CGroup: /system.slice/docker-hostnic.socket

Aug 10 15:41:01 ubuntu systemd[1]: Listening on docker hostnic driver.

● docker-hostnic.service - Docker hostnic plugin Service
     Loaded: loaded (/lib/systemd/system/docker-hostnic.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-08-10 15:43:01 UTC; 53s ago
TriggeredBy: ● docker-hostnic.socket
   Main PID: 270757 (python)
      Tasks: 1 (limit: 30398)
     Memory: 27.3M
     CGroup: /system.slice/docker-hostnic.service
             └─270757 /root/test-venv/bin/python /root/docker-hostnic.py

Aug 10 15:43:01 ubuntu systemd[1]: Started Docker hostnic plugin Service.
Aug 10 15:43:01 ubuntu python[270757]:  * Serving Flask app 'docker-hostnic'
Aug 10 15:43:01 ubuntu python[270757]:  * Debug mode: off
Aug 10 15:43:01 ubuntu python[270757]: WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
Aug 10 15:43:01 ubuntu python[270757]:  * Running on unix:///run/docker//plugins/docker-hostnic.sock
Aug 10 15:43:01 ubuntu python[270757]: Press CTRL+C to quit

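The server has been activated but is not handling the request. Press Ctrl+C and run the request again, and it succeeds. A likely cause is that app.run() binds a new socket of its own rather than taking over the one passed by systemd. The lesson: do not run this service with app.run(); use uWSGI or Gunicorn instead.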
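
For context, systemd hands activated sockets to the service as inherited file descriptors starting at 3 and announces them through the LISTEN_FDS and LISTEN_PID environment variables. The sketch below shows the receiving side of that protocol using only the standard library. systemd_listen_fds and adopt_socket are hypothetical helper names, and the code illustrates the mechanism rather than replacing a production WSGI server.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor, as documented in sd_listen_fds(3)


def systemd_listen_fds(environ=None, pid=None) -> list:
    """Return the file descriptors passed by systemd, or [] if there are none."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def adopt_socket(fd: int) -> socket.socket:
    """Wrap an inherited descriptor; family and type are detected from the fd."""
    return socket.socket(fileno=fd)
```

A server that supports socket activation performs the equivalent of adopt_socket(systemd_listen_fds()[0]) and then accepts connections on that socket, so requests queued before the service started are not lost.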