How HPA Works for KubeVirt Virtual Machines

Jan 11, 2024 21:30 · 1231 words · 3 minute read Kubernetes Virtualization Linux KubeVirt

This article is based on KubeVirt v1.0.0.

Usage

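KubeVirt supports horizontal autoscaling of virtual machines through the Kubernetes HorizontalPodAutoscaler (HPA): the number of instances is adjusted automatically based on metrics such as CPU utilization.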

As with a Deployment, first create a VirtualMachineInstanceReplicaSet object to manage multiple replicas of a virtual machine instance (VMI):

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceReplicaSet
metadata:
  name: testreplicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      kubevirt.io/vm: myvmi
  template:
    metadata:
      name: test
      labels:
        kubevirt.io/vm: myvmi
    spec:
      domain:
        cpu:
          sockets: 2
          cores: 1
          threads: 1
        memory:
          guest: 1Gi
        devices:
          disks:
          - disk:
              bus: virtio
            name: bootvol
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            cpu: 2
            memory: 1Gi
      volumes:
      - name: bootvol
        containerDisk:
          image: crazytaxii/fedora36:latest
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            user: root
            password: atomic
            ssh_pwauth: True
            chpasswd: { expire: False }

Just as a Deployment contains a Pod template, a VirtualMachineInstanceReplicaSet contains a VMI template.

$ kubectl get vmi -l "kubevirt.io/vm=myvmi"
NAME        AGE     PHASE     IP            NODENAME   READY
test8d4px   2m54s   Running   172.10.3.22   node172    True
test8n5rb   2m54s   Running   172.10.3.12   node173    True
test9jhrj   2m54s   Running   172.10.3.43   node172    True

virt-controller watches VirtualMachineInstanceReplicaSet objects and creates the requested number of VMIs, three replicas in this case.

Use the kubectl scale command to reduce the number of VMI replicas to 2:

$ kubectl scale vmirs testreplicaset --replicas 2
virtualmachineinstancereplicaset.kubevirt.io/testreplicaset scaled

$ kubectl get vmirs testreplicaset -o jsonpath='{.spec.replicas}'
2

$ kubectl get vmi -l "kubevirt.io/vm=myvmi"
NAME        AGE     PHASE     IP            NODENAME   READY
test8n5rb   6m56s   Running   172.10.3.12   node173    True
test9jhrj   6m56s   Running   172.10.3.43   node172    True

Create a HorizontalPodAutoscaler object:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
spec:
  scaleTargetRef:
    kind: VirtualMachineInstanceReplicaSet
    name: testreplicaset
    apiVersion: kubevirt.io/v1
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

The .spec.scaleTargetRef field points to the VirtualMachineInstanceReplicaSet object created above.

How it works

First, let's look at how Kubernetes changes the .spec.replicas field of a VirtualMachineInstanceReplicaSet object when the kubectl scale command is run.

$ kubectl scale vmirs testreplicaset --replicas 3 -v 8
# a lot of logs here
I0111 11:16:51.800286  280527 request.go:1188] Request Body: {"spec":{"replicas":3}}
I0111 11:16:51.800326  280527 round_trippers.go:463] PATCH https://vip.node173.com:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachineinstancereplicasets/testreplicaset/scale
I0111 11:16:51.800333  280527 round_trippers.go:469] Request Headers:
I0111 11:16:51.800342  280527 round_trippers.go:473]     Accept: application/json, */*
I0111 11:16:51.800350  280527 round_trippers.go:473]     Content-Type: application/merge-patch+json
I0111 11:16:51.800360  280527 round_trippers.go:473]     User-Agent: kubectl/v1.27.2 (linux/amd64) kubernetes/7f6f68f
I0111 11:16:51.807862  280527 round_trippers.go:574] Response Status: 200 OK in 7 milliseconds
I0111 11:16:51.807885  280527 round_trippers.go:577] Response Headers:
I0111 11:16:51.807897  280527 round_trippers.go:580]     Cache-Control: no-cache, private
I0111 11:16:51.807913  280527 round_trippers.go:580]     Content-Type: application/json
I0111 11:16:51.807925  280527 round_trippers.go:580]     X-Kubernetes-Pf-Flowschema-Uid: 821802c2-752e-4e08-9e23-42554cf7e27a
I0111 11:16:51.807939  280527 round_trippers.go:580]     X-Kubernetes-Pf-Prioritylevel-Uid: 9290f24e-fbc0-4d26-a16c-1b4404e1fcb1
I0111 11:16:51.807952  280527 round_trippers.go:580]     Content-Length: 304
I0111 11:16:51.807965  280527 round_trippers.go:580]     Date: Thu, 11 Jan 2024 03:16:51 GMT
I0111 11:16:51.807979  280527 round_trippers.go:580]     Audit-Id: 168328e6-fb44-48d0-a848-7c45de20acbf
I0111 11:16:51.808008  280527 request.go:1188] Response Body: {"kind":"Scale","apiVersion":"autoscaling/v1","metadata":{"name":"testreplicaset","namespace":"default","uid":"ff1dfaa6-044f-487e-a5ec-fcab48248d3d","resourceVersion":"107985186","creationTimestamp":"2024-01-11T02:53:45Z"},"spec":{"replicas":3},"status":{"replicas":2,"selector":"kubevirt.io/vm=myvmi"}}
virtualmachineinstancereplicaset.kubevirt.io/testreplicaset scaled

kubectl calls the API path /xxx/testreplicaset/scale. Compare this with the official HPA documentation https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work:

The HorizontalPodAutoscaler controller accesses corresponding workload resources that support scaling (such as Deployments and StatefulSet). These resources each have a subresource named scale, an interface that allows you to dynamically set the number of replicas and examine each of their current states.

Like Deployment and StatefulSet, VirtualMachineInstanceReplicaSet has a scale subresource. From Kubernetes' point of view, this subresource is the interface through which the kubectl scale command adjusts the replica count.
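To make the request in the verbose log concrete, here is a minimal Go sketch that rebuilds the path and body kubectl sent. The helper names scalePath and scalePatchBody are made up for this illustration and are not part of kubectl or client-go; only the path layout and the JSON body are taken from the log above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scalePath is a hypothetical helper: it returns the REST path of the
// scale subresource for a namespaced custom resource.
func scalePath(group, version, namespace, resource, name string) string {
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s/%s/scale",
		group, version, namespace, resource, name)
}

// scalePatchBody is a hypothetical helper: it builds the JSON merge patch
// body that sets .spec.replicas on the Scale object.
func scalePatchBody(replicas int32) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{"replicas": replicas},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := scalePatchBody(3)
	if err != nil {
		panic(err)
	}
	fmt.Println("PATCH", scalePath("kubevirt.io", "v1", "default",
		"virtualmachineinstancereplicasets", "testreplicaset"))
	fmt.Println("Content-Type: application/merge-patch+json")
	fmt.Println(string(body))
}
```

Nothing in the path or the body is specific to KubeVirt apart from the group and resource names, which is why the same kubectl scale command works for any resource that exposes a scale subresource.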

$ kubectl get crd virtualmachineinstancereplicasets.kubevirt.io -o yaml | grep -A 5 subresources
    subresources:
      scale:
        labelSelectorPath: .status.labelSelector
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
      status: {}
--
    subresources:
      scale:
        labelSelectorPath: .status.labelSelector
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
      status: {}

The VirtualMachineInstanceReplicaSet CRD does define a scale subresource, with specReplicasPath set to .spec.replicas. This matches both the body of the PATCH request above, {"spec":{"replicas":3}}, and the definition of VirtualMachineInstanceReplicaSet itself:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceReplicaSet
metadata:
  name: testreplicaset
spec:
  replicas: 3
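The three paths in the CRD tell the apiserver where to find the values that make up the Scale object seen in the response body earlier. The following Go sketch shows the idea only, assuming simple dotted paths; lookup, scalePaths, scaleView and scaleViewFromJSON are hypothetical names for this example, not the apiserver's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// scalePaths mirrors the three fields under subresources.scale in the CRD.
type scalePaths struct {
	SpecReplicas   string
	StatusReplicas string
	LabelSelector  string
}

// scaleView holds the values a Scale object would expose.
type scaleView struct {
	SpecReplicas   int64
	StatusReplicas int64
	Selector       string
}

// lookup walks a dotted path such as ".spec.replicas" through nested
// JSON objects (hypothetical helper, simple field names only).
func lookup(obj map[string]any, path string) (any, bool) {
	var cur any = obj
	for _, key := range strings.Split(strings.TrimPrefix(path, "."), ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// scaleViewFromJSON extracts the Scale values from a custom resource.
func scaleViewFromJSON(raw string, p scalePaths) (scaleView, error) {
	var obj map[string]any
	if err := json.Unmarshal([]byte(raw), &obj); err != nil {
		return scaleView{}, err
	}
	var view scaleView
	if v, ok := lookup(obj, p.SpecReplicas); ok {
		if n, ok := v.(float64); ok { // encoding/json decodes numbers as float64
			view.SpecReplicas = int64(n)
		}
	}
	if v, ok := lookup(obj, p.StatusReplicas); ok {
		if n, ok := v.(float64); ok {
			view.StatusReplicas = int64(n)
		}
	}
	if v, ok := lookup(obj, p.LabelSelector); ok {
		if s, ok := v.(string); ok {
			view.Selector = s
		}
	}
	return view, nil
}

func main() {
	// A trimmed-down VirtualMachineInstanceReplicaSet, values as in the log above.
	raw := `{"spec":{"replicas":3},"status":{"replicas":2,"labelSelector":"kubevirt.io/vm=myvmi"}}`
	paths := scalePaths{
		SpecReplicas:   ".spec.replicas",
		StatusReplicas: ".status.replicas",
		LabelSelector:  ".status.labelSelector",
	}
	view, err := scaleViewFromJSON(raw, paths)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", view)
}
```

Note that the selector is read from .status.labelSelector of the custom resource but appears as status.selector in the Scale object.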

Next, let's see how kube-apiserver handles the request once it arrives:

https://github.com/kubernetes/kubernetes/blob/a1f97a35fcb3835b6731455d34fcd0ae766bdb13/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go#L344

func (r *crdHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
    ctx := req.Context()
    requestInfo, ok := apirequest.RequestInfoFrom(ctx)
    if !ok {
        responsewriters.ErrorNegotiated(
            apierrors.NewInternalError(fmt.Errorf("no RequestInfo found in the context")),
            Codecs, schema.GroupVersion{}, w, req,
        )
        return
    }

    // a lot of code here
    subresource := requestInfo.Subresource

    // a lot of code here
    switch {
    case subresource == "status" && subresources != nil && subresources.Status != nil:
        handlerFunc = r.serveStatus(w, req, requestInfo, crdInfo, terminating, supportedTypes)
    case subresource == "scale" && subresources != nil && subresources.Scale != nil:
        handlerFunc = r.serveScale(w, req, requestInfo, crdInfo, terminating, supportedTypes)
    case len(subresource) == 0:
        handlerFunc = r.serveResource(w, req, requestInfo, crdInfo, crd, terminating, supportedTypes)
    }

    // a lot of code here
}

The subresource is scale, so the request is dispatched to the serveScale method:

https://github.com/kubernetes/kubernetes/blob/a1f97a35fcb3835b6731455d34fcd0ae766bdb13/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go#L429-L447

func (r *crdHandler) serveScale(w http.ResponseWriter, req *http.Request, requestInfo *apirequest.RequestInfo, crdInfo *crdInfo, terminating bool, supportedTypes []string) http.HandlerFunc {
    // a lot of code here

    switch requestInfo.Verb {
    case "get":
        return handlers.GetResource(storage, requestScope)
    case "update":
        return handlers.UpdateResource(storage, requestScope, r.admission)
    case "patch":
        return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
    }
}

In PatchResource, the apiserver itself updates the VirtualMachineInstanceReplicaSet custom resource (specifically its .spec.replicas field).
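The request used Content-Type application/merge-patch+json, so the body is interpreted as a JSON merge patch (RFC 7386). The Go sketch below implements the merge algorithm from that RFC and applies the patch from the log to a Scale-shaped object. It only illustrates the patch semantics; mergePatch and applyMergePatchJSON are names chosen for this example, and the real apiserver code path involves more steps, such as admission and validation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch applies a JSON merge patch (RFC 7386) to target and returns
// the result. Objects are merged key by key, a null value deletes the key,
// and any non-object patch replaces the target entirely.
func mergePatch(target, patch any) any {
	patchObj, ok := patch.(map[string]any)
	if !ok {
		return patch
	}
	targetObj, ok := target.(map[string]any)
	if !ok {
		targetObj = map[string]any{}
	}
	for k, v := range patchObj {
		if v == nil {
			delete(targetObj, k)
			continue
		}
		targetObj[k] = mergePatch(targetObj[k], v)
	}
	return targetObj
}

// applyMergePatchJSON is a small convenience wrapper working on JSON strings.
func applyMergePatchJSON(targetJSON, patchJSON string) (string, error) {
	var target, patch any
	if err := json.Unmarshal([]byte(targetJSON), &target); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(patchJSON), &patch); err != nil {
		return "", err
	}
	out, err := json.Marshal(mergePatch(target, patch))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Scale-shaped object before the patch: 2 desired and 2 observed replicas.
	scale := `{"spec":{"replicas":2},"status":{"replicas":2,"selector":"kubevirt.io/vm=myvmi"}}`
	// Body sent by kubectl scale --replicas 3.
	patch := `{"spec":{"replicas":3}}`

	out, err := applyMergePatchJSON(scale, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Only spec.replicas changes; status is left untouched, which matches the response in the log where spec.replicas is already 3 while status.replicas is still 2.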

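From here on, adjusting the replica count automatically according to the current and desired metric values is left to the native Kubernetes HPA functionality.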
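The core of that HPA calculation, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Go sketch below applies that formula with the minReplicas, maxReplicas and 50% CPU target from the HPA object above. It is a simplification: the 10% tolerance is assumed to be the default value, and the real controller also accounts for missing metrics, instance readiness and stabilization windows. The function name desiredReplicas is chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is assumed to be the default HPA tolerance of 10%.
const tolerance = 0.1

// desiredReplicas applies the documented HPA formula and clamps the result
// to the [minReplicas, maxReplicas] range. Utilization values are percentages.
func desiredReplicas(current int32, currentUtil, targetUtil float64, minReplicas, maxReplicas int32) int32 {
	desired := current
	ratio := currentUtil / targetUtil
	// Skip scaling when the ratio is close enough to 1.
	if math.Abs(ratio-1.0) > tolerance {
		desired = int32(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		return minReplicas
	}
	if desired > maxReplicas {
		return maxReplicas
	}
	return desired
}

func main() {
	// 2 VMIs at 90% average CPU against a 50% target: ceil(2 * 1.8) = 4.
	fmt.Println(desiredReplicas(2, 90, 50, 2, 5))
	// 4 VMIs at 100%: ceil(4 * 2.0) = 8, capped at maxReplicas = 5.
	fmt.Println(desiredReplicas(4, 100, 50, 2, 5))
	// 3 VMIs at 10%: ceil(3 * 0.2) = 1, raised to minReplicas = 2.
	fmt.Println(desiredReplicas(3, 10, 50, 2, 5))
}
```

Whatever number comes out, the HPA controller writes it through the same scale subresource that kubectl scale used, so the VirtualMachineInstanceReplicaSet is handled like any other scalable workload.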

Note: the targetCPUUtilizationPercentage field exists only in the autoscaling/v1 version of HorizontalPodAutoscaler.

Like VirtualMachineInstanceReplicaSet, KubeVirt's VirtualMachinePool object supports HPA in the same way, so it is not covered again here.