Multus CNI

Feb 7, 2022 15:30 · 2924 words · 6 minute read

Deploying Multus CNI

The cluster already uses Calico as its network plugin, configured in IPIP mode.

$ cat /etc/cni/net.d/10-calico.conflist | jq
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "multuscni-test0",
      "mtu": 0,
      "ipam": {
        "type": "calico-ipam"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {
        "portMappings": true
      }
    },
    {
      "type": "bandwidth",
      "capabilities": {
        "bandwidth": true
      }
    }
  ]
}
  1. Deploy the Multus CNI network plugin

    $ kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick-plugin.yml
    customresourcedefinition.apiextensions.k8s.io/network-attachment-definitions.k8s.cni.cncf.io created
    clusterrole.rbac.authorization.k8s.io/multus created
    clusterrolebinding.rbac.authorization.k8s.io/multus created
    serviceaccount/multus created
    daemonset.apps/kube-multus-ds created
    $ kubectl get po -n kube-system | grep multus
    kube-multus-ds-5wmn9                       1/1     Running   0          51s
    
  2. Create a NetworkAttachmentDefinition

    $ cat <<EOF | kubectl apply -f -
    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: macvlan-conf
    spec:
      config: '{
          "cniVersion": "0.3.1",
          "type": "macvlan",
          "master": "eth1",
          "mode": "bridge",
          "ipam": {
            "type": "host-local",
            "ranges": [
              [
                {
                  "subnet": "10.37.132.0/24",
                  "rangeStart": "10.37.132.20",
                  "rangeEnd": "10.37.132.50",
                  "gateway": "10.37.132.1"
                }
              ]
            ]
          }
        }'
    EOF
    networkattachmentdefinition.k8s.cni.cncf.io/macvlan-conf created
    $ kubectl get network-attachment-definition
    NAME           AGE
    macvlan-conf   28s
    
  3. Create a test Pod

    $ cat <<EOF | kubectl create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-case-01
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-conf
    spec:
      containers:
      - name: pod-case-01
        image: docker.io/centos/tools:latest
        command:
        - /sbin/init
    EOF
    pod/pod-case-01 created
    $ ps -ef | grep "/sbin/init"
    root     30104 30083  0 11:08 ?        00:00:00 /sbin/init
    $ nsenter -n -t 30104
    $ ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
    2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
        link/ipip 0.0.0.0 brd 0.0.0.0
    4: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
        link/ether 9a:76:9b:84:bd:17 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 192.168.1.12/32 scope global eth0
        valid_lft forever preferred_lft forever
    5: net1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 62:74:fb:df:a8:35 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.37.132.20/24 brd 10.37.132.255 scope global net1
        valid_lft forever preferred_lft forever
    $ ip r
    default via 169.254.1.1 dev eth0
    10.37.132.0/24 dev net1 proto kernel scope link src 10.37.132.20
    169.254.1.1 dev eth0 scope link
    
    • The first interface is eth0, with IP 192.168.1.12
    • The second interface is net1, with IP 10.37.132.20
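
As a quick sanity check on the host-local range used in the NetworkAttachmentDefinition, the following Go sketch counts how many addresses lie between rangeStart and rangeEnd. It relies only on the standard library (net/netip, Go 1.18 or later). The helper name countRange is made up for this illustration and is not part of Multus or the host-local plugin, and the sketch does not model any addresses that host-local may reserve on its own.

```go
package main

import (
	"fmt"
	"net/netip"
)

// countRange is a hypothetical helper: it returns the number of addresses in
// the inclusive range [start, end], or an error if either address lies
// outside the subnet or the range is reversed.
func countRange(subnet, start, end string) (int, error) {
	prefix, err := netip.ParsePrefix(subnet)
	if err != nil {
		return 0, err
	}
	s, err := netip.ParseAddr(start)
	if err != nil {
		return 0, err
	}
	e, err := netip.ParseAddr(end)
	if err != nil {
		return 0, err
	}
	if !prefix.Contains(s) || !prefix.Contains(e) {
		return 0, fmt.Errorf("range %s-%s is not inside %s", start, end, subnet)
	}
	if e.Less(s) {
		return 0, fmt.Errorf("rangeEnd %s is before rangeStart %s", end, start)
	}
	n := 0
	for a := s; ; a = a.Next() {
		n++
		if a == e {
			break
		}
	}
	return n, nil
}

func main() {
	n, err := countRange("10.37.132.0/24", "10.37.132.20", "10.37.132.50")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 31
}
```

With the values from this post the range holds 31 addresses, and the test Pod received the first of them, 10.37.132.20.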

How Multus CNI works

The configuration file of the Multus CNI plugin:

$ cat /etc/cni/net.d/00-multus.conf | jq
{
  "capabilities": {
    "bandwidth": true,
    "portMappings": true
  },
  "cniVersion": "0.3.1",
  "delegates": [
    {
      "cniVersion": "0.3.1",
      "name": "k8s-pod-network",
      "plugins": [
        {
          "datastore_type": "kubernetes",
          "ipam": {
            "type": "calico-ipam"
          },
          "kubernetes": {
            "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
          },
          "log_file_path": "/var/log/calico/cni/cni.log",
          "log_level": "info",
          "mtu": 0,
          "nodename": "multuscni-test0",
          "policy": {
            "type": "k8s"
          },
          "type": "calico"
        },
        {
          "capabilities": {
            "portMappings": true
          },
          "snat": true,
          "type": "portmap"
        },
        {
          "capabilities": {
            "bandwidth": true
          },
          "type": "bandwidth"
        }
      ]
    }
  ],
  "logLevel": "verbose",
  "logToStderr": true,
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "name": "multus-cni-network",
  "type": "multus"
}

The complete Calico CNI configuration appears in the delegates field of the Multus plugin configuration.
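
To make the nesting concrete, here is a minimal Go sketch that decodes a trimmed-down Multus configuration and lists the plugin types chained inside each delegate. The struct names (multusConf, delegateConf, pluginConf) and the helper delegatePluginTypes are illustrative assumptions, not the types Multus itself uses, and the sketch only handles conflist-style delegates, that is, delegates with a plugins array.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginConf, delegateConf and multusConf are simplified, hypothetical
// mirrors of the JSON shown above; they keep only the fields we print.
type pluginConf struct {
	Type string `json:"type"`
}

type delegateConf struct {
	Name    string       `json:"name"`
	Plugins []pluginConf `json:"plugins"`
}

type multusConf struct {
	Name      string         `json:"name"`
	Type      string         `json:"type"`
	Delegates []delegateConf `json:"delegates"`
}

// delegatePluginTypes maps each delegate name to the plugin types it chains.
func delegatePluginTypes(raw []byte) (map[string][]string, error) {
	var conf multusConf
	if err := json.Unmarshal(raw, &conf); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, d := range conf.Delegates {
		for _, p := range d.Plugins {
			out[d.Name] = append(out[d.Name], p.Type)
		}
	}
	return out, nil
}

// sampleConf is a shortened copy of the 00-multus.conf shown in this post.
const sampleConf = `{
  "cniVersion": "0.3.1",
  "name": "multus-cni-network",
  "type": "multus",
  "delegates": [
    {
      "cniVersion": "0.3.1",
      "name": "k8s-pod-network",
      "plugins": [
        {"type": "calico"},
        {"type": "portmap"},
        {"type": "bandwidth"}
      ]
    }
  ]
}`

func main() {
	pluginTypes, err := delegatePluginTypes([]byte(sampleConf))
	if err != nil {
		panic(err)
	}
	fmt.Println(pluginTypes["k8s-pod-network"]) // [calico portmap bandwidth]
}
```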

Deploying the Multus plugin also created a CRD (CustomResourceDefinition) named NetworkAttachmentDefinition. Its API type is defined in the k8snetworkplumbingwg/network-attachment-definition-client project: https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/master/pkg/apis/k8s.cni.cncf.io/v1/types.go

// +genclient
// +genclient:noStatus
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +resourceName=network-attachment-definitions

type NetworkAttachmentDefinition struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec NetworkAttachmentDefinitionSpec `json:"spec"`
}
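
The spec carries the CNI configuration as a plain string in its config field, so reading it takes two decoding passes: one for the object and one for the embedded JSON. The sketch below illustrates this with simplified stand-in types so that it runs without the k8s.io/apimachinery dependency. The type nad and the helper cniType are assumptions made for this example, not the client library's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nad is a simplified stand-in for NetworkAttachmentDefinition; only the
// fields used in this example are kept.
type nad struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Config string `json:"config"` // raw CNI configuration, stored as a string
	} `json:"spec"`
}

// cniType decodes a manifest and returns the object name together with the
// "type" field of the CNI configuration embedded in spec.config.
func cniType(manifest []byte) (name, pluginType string, err error) {
	var obj nad
	if err = json.Unmarshal(manifest, &obj); err != nil {
		return "", "", err
	}
	var cni struct {
		Type string `json:"type"`
	}
	if err = json.Unmarshal([]byte(obj.Spec.Config), &cni); err != nil {
		return "", "", err
	}
	return obj.Metadata.Name, cni.Type, nil
}

// sampleNAD is a JSON rendering of a shortened macvlan-conf object.
const sampleNAD = `{
  "apiVersion": "k8s.cni.cncf.io/v1",
  "kind": "NetworkAttachmentDefinition",
  "metadata": {"name": "macvlan-conf"},
  "spec": {"config": "{\"cniVersion\": \"0.3.1\", \"type\": \"macvlan\", \"master\": \"eth1\"}"}
}`

func main() {
	name, pluginType, err := cniType([]byte(sampleNAD))
	if err != nil {
		panic(err)
	}
	fmt.Println(name, pluginType) // macvlan-conf macvlan
}
```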

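Creating the container network stack

The manifest of the test Pod pod-case-01 carries the annotation k8s.v1.cni.cncf.io/networks: macvlan-conf, where macvlan-conf is the NetworkAttachmentDefinition we created earlier.

As a CNI plugin implementation, Multus follows the CNI specification and implements operations such as ADD and DEL.

This post only looks at how Multus CNI creates the network stack when Kubernetes starts a Pod, that is, the implementation of the ADD operation. The delete path taken when a Pod is destroyed is not covered.

  1. Load the delegate plugins and add them to the Multus configuration: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/k8sclient/k8sclient.go#L309-L374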

    _, kc, err := k8s.TryLoadPodDelegates(pod, n, kubeClient, resourceMap)
    if err != nil {
        return nil, cmdErr(k8sArgs, "error loading k8s delegates k8s args: %v", err)
    }
    
    1. Check whether the Pod annotations carry the v1.multus-cni.io/default-network key (a user-specified default network):

      https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/k8sclient/k8sclient.go#L586-L613

      // TryLoadPodDelegates attempts to load Kubernetes-defined delegates and add them to the Multus config.
      // Returns the number of Kubernetes-defined delegates added or an error.
      func TryLoadPodDelegates(pod *v1.Pod, conf *types.NetConf, clientInfo *ClientInfo, resourceMap map[string]*types.ResourceInfo) (int, *ClientInfo, error) {
          // a lot of code here
          delegate, err := tryLoadK8sPodDefaultNetwork(clientInfo, pod, conf)
          if err != nil {
              return 0, nil, logging.Errorf("TryLoadPodDelegates: error in loading K8s cluster default network from pod annotation: %v", err)
          }
          if delegate != nil {
              logging.Debugf("TryLoadPodDelegates: Overwrite the cluster default network with %v from pod annotations", delegate)
      
              conf.Delegates[0] = delegate
          }
      }
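
The effect of this override can be modeled in a few lines. The following sketch is a deliberately simplified assumption: a delegate is reduced to its name, and applyDefaultNetwork is a hypothetical helper, whereas Multus resolves the annotation value to a full delegate configuration. The network name my-default-net is invented for the example.

```go
package main

import "fmt"

// defaultNetworkAnnot is the annotation key discussed above.
const defaultNetworkAnnot = "v1.multus-cni.io/default-network"

// applyDefaultNetwork is a hypothetical, simplified model of the override:
// delegates[0] stands for the cluster default network and is replaced when
// the Pod carries the default-network annotation.
func applyDefaultNetwork(annotations map[string]string, delegates []string) []string {
	out := append([]string(nil), delegates...) // copy, leave the input untouched
	if name, ok := annotations[defaultNetworkAnnot]; ok && name != "" && len(out) > 0 {
		out[0] = name
	}
	return out
}

func main() {
	clusterDefault := []string{"k8s-pod-network"}

	// Without the annotation the cluster default network is kept.
	fmt.Println(applyDefaultNetwork(nil, clusterDefault))

	// With the annotation the first delegate is replaced.
	fmt.Println(applyDefaultNetwork(
		map[string]string{defaultNetworkAnnot: "my-default-net"}, clusterDefault))
}
```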
      
    2. Check whether the Pod annotations carry the k8s.v1.cni.cncf.io/networks key (user-specified NetworkAttachmentDefinitions):

      https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/k8sclient/k8sclient.go#L436-L452

      // GetPodNetwork gets net-attach-def annotation from pod
      func GetPodNetwork(pod *v1.Pod) ([]*types.NetworkSelectionElement, error) {
          logging.Debugf("GetPodNetwork: %v", pod)
      
          netAnnot := pod.Annotations[networkAttachmentAnnot]
          defaultNamespace := pod.ObjectMeta.Namespace
      
          if len(netAnnot) == 0 {
              return nil, &NoK8sNetworkError{"no kubernetes network found"}
          }
      
          networks, err := parsePodNetworkAnnotation(netAnnot, defaultNamespace)
          if err != nil {
              return nil, err
          }
          return networks, nil
      }
      

      The pod-case-01 Pod we defined does carry the annotation k8s.v1.cni.cncf.io/networks: macvlan-conf.

      parsePodNetworkAnnotation splits the value of k8s.v1.cni.cncf.io/networks, initializes a types.NetworkSelectionElement for each entry, and appends it to a slice: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/k8sclient/k8sclient.go#L173-L240

      func parsePodNetworkAnnotation(podNetworks, defaultNamespace string) ([]*types.NetworkSelectionElement, error) {
          if strings.IndexAny(podNetworks, "[{\"") >= 0 {
              if err := json.Unmarshal([]byte(podNetworks), &networks); err != nil {
                  return nil, logging.Errorf("parsePodNetworkAnnotation: failed to parse pod Network Attachment Selection Annotation JSON format: %v", err)
              }
          } else {
              // Comma-delimited list of network attachment object names
              for _, item := range strings.Split(podNetworks, ",") {
                  // Remove leading and trailing whitespace.
                  item = strings.TrimSpace(item)
      
                  // Parse network name (i.e. <namespace>/<network name>@<ifname>)
                  netNsName, networkName, netIfName, err := parsePodNetworkObjectName(item)
                  if err != nil {
                      return nil, logging.Errorf("parsePodNetworkAnnotation: %v", err)
                  }
      
                  networks = append(networks, &types.NetworkSelectionElement{
                      Name:             networkName,
                      Namespace:        netNsName,
                      InterfaceRequest: netIfName,
                  })
              }
          }
      }
      

      Here networkName is macvlan-conf.
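
To see the comma-delimited format in action, here is a standalone sketch of that branch only. The type selection and the function parseNetworks are simplified stand-ins written for this post. They skip the character validation the real code performs and ignore the JSON form of the annotation. The second network in the example, kube-system/other-net@net5, is invented. The sketch uses strings.Cut, so it needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// selection is a simplified stand-in for types.NetworkSelectionElement.
type selection struct {
	Namespace string
	Name      string
	Ifname    string
}

// parseNetworks is a hypothetical re-implementation of the comma-delimited
// branch: each item has the form [<namespace>/]<name>[@<ifname>]. Items
// without a namespace fall back to defaultNamespace.
func parseNetworks(annotation, defaultNamespace string) ([]selection, error) {
	var out []selection
	for _, item := range strings.Split(annotation, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			return nil, fmt.Errorf("empty network name in %q", annotation)
		}
		sel := selection{Namespace: defaultNamespace}
		if ns, rest, ok := strings.Cut(item, "/"); ok {
			sel.Namespace, item = ns, rest
		}
		if name, ifname, ok := strings.Cut(item, "@"); ok {
			item, sel.Ifname = name, ifname
		}
		if item == "" {
			return nil, fmt.Errorf("missing network name in %q", annotation)
		}
		sel.Name = item
		out = append(out, sel)
	}
	return out, nil
}

func main() {
	nets, err := parseNetworks("macvlan-conf, kube-system/other-net@net5", "default")
	if err != nil {
		panic(err)
	}
	for _, n := range nets {
		fmt.Printf("namespace=%s name=%s ifname=%q\n", n.Namespace, n.Name, n.Ifname)
	}
}
```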

    3. Fetch the specified NetworkAttachmentDefinition from the Kubernetes cluster: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/k8sclient/k8sclient.go#L242-L294

      func getKubernetesDelegate(client *ClientInfo, net *types.NetworkSelectionElement, confdir string, pod *v1.Pod, resourceMap map[string]*types.ResourceInfo) (*types.DelegateNetConf, map[string]*types.ResourceInfo, error) {
      
          logging.Debugf("getKubernetesDelegate: %v, %v, %s, %v, %v", client, net, confdir, pod, resourceMap)
          customResource, err := client.NetClient.NetworkAttachmentDefinitions(net.Namespace).Get(context.TODO(), net.Name, metav1.GetOptions{})
          if err != nil {
              errMsg := fmt.Sprintf("cannot find a network-attachment-definition (%s) in namespace (%s): %v", net.Name, net.Namespace, err)
              if client != nil {
                  client.Eventf(pod, v1.EventTypeWarning, "NoNetworkFound", errMsg)
              }
              return nil, resourceMap, logging.Errorf("getKubernetesDelegate: " + errMsg)
          }
      
          // a lot of code here
      
          configBytes, err := netutils.GetCNIConfig(customResource, confdir)
          if err != nil {
              return nil, resourceMap, err
          }
      
          delegate, err := types.LoadDelegateNetConf(configBytes, net, deviceID, resourceName)
          if err != nil {
              return nil, resourceMap, err
          }
      }
      

      Parsing yields the configuration string stored in the spec field of the NetworkAttachmentDefinition:

      {
          "cniVersion":"0.3.1",
          "type":"macvlan",
          "master":"eth1",
          "mode":"bridge",
          "ipam":{
              "type":"host-local",
              "ranges":[
                  [
                      {
                          "subnet":"10.37.132.0/24",
                          "rangeStart":"10.37.132.20",
                          "rangeEnd":"10.37.132.50",
                          "gateway":"10.37.132.1"
                      }
                  ]
              ]
          }
      }
      

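      That is, the macvlan CNI plugin.

      We used a NetworkAttachmentDefinition to define the configuration of the macvlan CNI plugin (which sets up the second interface in the Pod network stack), and Multus uses the NetworkAttachmentDefinition name carried in the Pod annotation to load that CNI plugin configuration as a delegate.

      The number of values in k8s.v1.cni.cncf.io/networks equals the number of interfaces in the Pod network stack besides eth0. Each of these CNI plugin configurations is appended to the delegates slice.

      Finally, delegates is appended to the Delegates field of the Multus CNI configuration structure: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/k8sclient/k8sclient.go#L338-L351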

      delegates, err := GetNetworkDelegates(clientInfo, pod, networks, conf, resourceMap)
      
      if err != nil {
          if _, ok := err.(*NoK8sNetworkError); ok {
              return 0, clientInfo, nil
          }
          return 0, nil, logging.Errorf("TryLoadPodDelegates: error in getting k8s network for pod: %v", err)
      }
      
      if err = conf.AddDelegates(delegates); err != nil {
          return 0, nil, err
      }
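
The type assertion on *NoK8sNetworkError is what lets a Pod without the networks annotation proceed with the default network only: a missing annotation is treated as "zero extra delegates" rather than a failure. The sketch below reproduces that pattern with a local error type. The names noNetworkError, podNetworks, and extraDelegateCount are assumptions made for this example, and it uses errors.As where the excerpt uses a direct type assertion.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const networksAnnot = "k8s.v1.cni.cncf.io/networks"

// noNetworkError is a local stand-in for Multus's NoK8sNetworkError.
type noNetworkError struct{ msg string }

func (e *noNetworkError) Error() string { return e.msg }

// podNetworks returns the network names from the annotation, or a
// *noNetworkError when the annotation is missing or empty.
func podNetworks(annotations map[string]string) ([]string, error) {
	value := annotations[networksAnnot]
	if value == "" {
		return nil, &noNetworkError{"no kubernetes network found"}
	}
	var names []string
	for _, item := range strings.Split(value, ",") {
		names = append(names, strings.TrimSpace(item))
	}
	return names, nil
}

// extraDelegateCount is a hypothetical caller: a missing annotation is not a
// failure, it simply means zero additional delegates.
func extraDelegateCount(annotations map[string]string) (int, error) {
	names, err := podNetworks(annotations)
	if err != nil {
		var noNet *noNetworkError
		if errors.As(err, &noNet) {
			return 0, nil
		}
		return 0, err
	}
	return len(names), nil
}

func main() {
	fmt.Println(extraDelegateCount(nil))
	fmt.Println(extraDelegateCount(map[string]string{networksAnnot: "macvlan-conf"}))
}
```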
      
  2. Once the configuration structure is complete, iterate over its Delegates field: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/multus/multus.go#L600-L686

    1. Determine the interface name: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/multus/multus.go#L93-L106

      func getIfname(delegate *types.DelegateNetConf, argif string, idx int) string {
          logging.Debugf("getIfname: %v, %s, %d", delegate, argif, idx)
          if delegate.IfnameRequest != "" {
              return delegate.IfnameRequest
          }
          if delegate.MasterPlugin {
              // master plugin always uses the CNI-provided interface name
              return argif
          }
      
          // Otherwise construct a unique interface name from the delegate's
          // position in the delegate list
          return fmt.Sprintf("net%d", idx)
      }
      

      This is why the second interface in the Pod network stack is named net1 by default.
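
The three naming rules are easy to try out in isolation. The sketch below copies the logic of getIfname onto a simplified delegate struct (an assumption for this example, standing in for types.DelegateNetConf) and applies it to three delegates. The third delegate and its requested name data0 are invented to exercise the IfnameRequest branch.

```go
package main

import "fmt"

// delegate keeps only the fields getIfname reads, plus a name for printing.
// It is a simplified stand-in for types.DelegateNetConf.
type delegate struct {
	Name          string
	IfnameRequest string
	MasterPlugin  bool
}

// ifname follows the same three rules as getIfname, minus the logging.
func ifname(d delegate, argif string, idx int) string {
	if d.IfnameRequest != "" {
		return d.IfnameRequest // explicit request, e.g. name@ifname in the annotation
	}
	if d.MasterPlugin {
		return argif // the default network uses the CNI-provided interface name
	}
	return fmt.Sprintf("net%d", idx) // otherwise derive the name from the position
}

func main() {
	delegates := []delegate{
		{Name: "k8s-pod-network", MasterPlugin: true}, // cluster default network
		{Name: "macvlan-conf"},                        // from the Pod annotation
		{Name: "other-net", IfnameRequest: "data0"},   // hypothetical explicit request
	}
	for idx, d := range delegates {
		fmt.Printf("%s -> %s\n", d.Name, ifname(d, "eth0", idx))
	}
}
```

Running it prints eth0 for the default network, net1 for macvlan-conf, and data0 for the delegate with an explicit request.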

    2. Then invoke the delegate plugin's ADD operation to configure the Pod network stack:

      https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/multus/multus.go#L344-L349

      result, err = confAdd(rt, delegate.Bytes, multusNetconf, exec)
      if err != nil {
          return nil, err
      }
      

      https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/multus/multus.go#L168-L186

      func confAdd(rt *libcni.RuntimeConf, rawNetconf []byte, multusNetconf *types.NetConf, exec invoke.Exec) (cnitypes.Result, error) {
          logging.Debugf("confAdd: %v, %s", rt, string(rawNetconf))
          // In part, adapted from K8s pkg/kubelet/dockershim/network/cni/cni.go
          binDirs := filepath.SplitList(os.Getenv("CNI_PATH"))
          binDirs = append([]string{multusNetconf.BinDir}, binDirs...)
          cniNet := libcni.NewCNIConfigWithCacheDir(binDirs, multusNetconf.CNIDir, exec)
      
          conf, err := libcni.ConfFromBytes(rawNetconf)
          if err != nil {
              return nil, logging.Errorf("error in converting the raw bytes to conf: %v", err)
          }
      
          result, err := cniNet.AddNetwork(context.Background(), conf, rt)
          if err != nil {
              return nil, err
          }
      
          return result, nil
      }
      
    3. The ADD operation of the corresponding CNI plugin is executed once per delegate.
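
A condensed model of that loop might look like the sketch below. It is an illustration under stated assumptions rather than the Multus implementation: addFunc stands in for the call into libcni, delegates are reduced to their names, the interface naming is hard-coded to eth0 and netN, and the only error handling shown is stopping at the first failure.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a made-up failure used to demonstrate the error path.
var errSimulated = errors.New("simulated plugin failure")

// addFunc stands in for invoking one delegate plugin's ADD operation.
type addFunc func(network, ifname string) error

// addAll is a hypothetical, condensed version of the loop: it calls ADD once
// per delegate, in order, and stops at the first failure. It returns the
// interface names that were configured successfully.
func addAll(delegates []string, add addFunc) ([]string, error) {
	var configured []string
	for idx, network := range delegates {
		ifname := "eth0" // the first delegate is the cluster default network
		if idx > 0 {
			ifname = fmt.Sprintf("net%d", idx)
		}
		if err := add(network, ifname); err != nil {
			return configured, fmt.Errorf("ADD for %q failed: %w", network, err)
		}
		configured = append(configured, ifname)
	}
	return configured, nil
}

func main() {
	delegates := []string{"k8s-pod-network", "macvlan-conf"}

	// Every ADD succeeds: both interfaces are configured.
	ok := func(network, ifname string) error {
		fmt.Printf("ADD %s as %s\n", network, ifname)
		return nil
	}
	ifnames, err := addAll(delegates, ok)
	fmt.Println(ifnames, err)

	// The second ADD fails: only eth0 is reported as configured.
	failing := func(network, ifname string) error {
		if network == "macvlan-conf" {
			return errSimulated
		}
		return nil
	}
	ifnames, err = addAll(delegates, failing)
	fmt.Println(ifnames, err)
}
```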

    4. Add a default gateway according to the delegate CNI plugin's configuration: https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.8/pkg/multus/multus.go#L650-L656

      if adddefaultgateway {
          tmpResult, err = netutils.SetDefaultGW(args, ifName, delegate.GatewayRequest, &tmpResult)
          if err != nil {
              return nil, cmdErr(k8sArgs, "error setting default gateway: %v", err)
          }
      }
      

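That is the complete process by which the Multus CNI plugin initializes the network stack when a Pod is created. Multus itself does not create interfaces or set up routing tables. Instead, it reads the configuration of the cluster's default network plugin and of the CNI plugins defined by NetworkAttachmentDefinitions, then calls those plugins' ADD operations to configure the Pod network stack.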
