How HPA Works for KubeVirt Virtual Machines
Jan 11, 2024 21:30
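This article uses KubeVirt v1.0.0.
Usage
KubeVirt supports horizontal autoscaling (HPA) for virtual machines: the number of instances is adjusted automatically based on metrics such as CPU utilization.
As with a Deployment, first create a VirtualMachineInstanceReplicaSet object to manage multiple replicas of a virtual machine instance (VMI):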
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceReplicaSet
metadata:
  name: testreplicaset
spec:
  replicas: 3
  selector:
    matchLabels:
      kubevirt.io/vm: myvmi
  template:
    metadata:
      name: test
      labels:
        kubevirt.io/vm: myvmi
    spec:
      domain:
        cpu:
          sockets: 2
          cores: 1
          threads: 1
        memory:
          guest: 1Gi
        devices:
          disks:
          - disk:
              bus: virtio
            name: bootvol
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            cpu: 2
            memory: 1Gi
      volumes:
      - name: bootvol
        containerDisk:
          image: crazytaxii/fedora36:latest
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            user: root
            password: atomic
            ssh_pwauth: True
            chpasswd: { expire: False }
Just as a Deployment contains a Pod template, a VirtualMachineInstanceReplicaSet contains a VMI template.
$ kubectl get vmi -l "kubevirt.io/vm=myvmi"
NAME        AGE     PHASE     IP            NODENAME   READY
test8d4px   2m54s   Running   172.10.3.22   node172    True
test8n5rb   2m54s   Running   172.10.3.12   node173    True
test9jhrj   2m54s   Running   172.10.3.43   node172    True
virt-controller watches the VirtualMachineInstanceReplicaSet object and creates 3 VMI replicas for it.
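The idea behind that behaviour can be sketched as a comparison between the desired and the current replica count. This is a minimal illustration and not KubeVirt's actual controller code; the `plan` type and the `reconcile` function are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// plan describes what a replica-set style controller would do in one pass.
// Hypothetical type, used only for illustration.
type plan struct {
	Create int // instances to create
	Delete int // instances to delete
}

// reconcile compares the desired replica count with the number of
// instances currently matched by the selector.
func reconcile(desired, current int) plan {
	switch {
	case current < desired:
		return plan{Create: desired - current}
	case current > desired:
		return plan{Delete: current - desired}
	default:
		return plan{}
	}
}

func main() {
	// A freshly created object with replicas: 3 and no VMIs yet.
	fmt.Printf("%+v\n", reconcile(3, 0))
	// The replica count lowered to 2 while 3 VMIs are running.
	fmt.Printf("%+v\n", reconcile(2, 3))
}
```

A real controller additionally has to decide which instances to delete and must cope with instances that are still starting or already terminating; the sketch leaves all of that out.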
Use the kubectl scale command to reduce the number of VMI replicas to 2:
$ kubectl scale vmirs testreplicaset --replicas 2
virtualmachineinstancereplicaset.kubevirt.io/testreplicaset scaled
$ kubectl get vmirs testreplicaset -o jsonpath='{.spec.replicas}'
2
$ kubectl get vmi -l "kubevirt.io/vm=myvmi"
NAME        AGE     PHASE     IP            NODENAME   READY
test8n5rb   6m56s   Running   172.10.3.12   node173    True
test9jhrj   6m56s   Running   172.10.3.43   node172    True
Create a HorizontalPodAutoscaler object:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
spec:
  scaleTargetRef:
    kind: VirtualMachineInstanceReplicaSet
    name: testreplicaset
    apiVersion: kubevirt.io/v1
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
The .spec.scaleTargetRef field points to the VirtualMachineInstanceReplicaSet object created above.
How it works
First, let's look at how Kubernetes adjusts the .spec.replicas field of a VirtualMachineInstanceReplicaSet object through the kubectl scale command.
$ kubectl scale vmirs testreplicaset --replicas 3 -v 8
# a lot of logs here
I0111 11:16:51.800286 280527 request.go:1188] Request Body: {"spec":{"replicas":3}}
I0111 11:16:51.800326 280527 round_trippers.go:463] PATCH https://vip.node173.com:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachineinstancereplicasets/testreplicaset/scale
I0111 11:16:51.800333 280527 round_trippers.go:469] Request Headers:
I0111 11:16:51.800342 280527 round_trippers.go:473] Accept: application/json, */*
I0111 11:16:51.800350 280527 round_trippers.go:473] Content-Type: application/merge-patch+json
I0111 11:16:51.800360 280527 round_trippers.go:473] User-Agent: kubectl/v1.27.2 (linux/amd64) kubernetes/7f6f68f
I0111 11:16:51.807862 280527 round_trippers.go:574] Response Status: 200 OK in 7 milliseconds
I0111 11:16:51.807885 280527 round_trippers.go:577] Response Headers:
I0111 11:16:51.807897 280527 round_trippers.go:580] Cache-Control: no-cache, private
I0111 11:16:51.807913 280527 round_trippers.go:580] Content-Type: application/json
I0111 11:16:51.807925 280527 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: 821802c2-752e-4e08-9e23-42554cf7e27a
I0111 11:16:51.807939 280527 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: 9290f24e-fbc0-4d26-a16c-1b4404e1fcb1
I0111 11:16:51.807952 280527 round_trippers.go:580] Content-Length: 304
I0111 11:16:51.807965 280527 round_trippers.go:580] Date: Thu, 11 Jan 2024 03:16:51 GMT
I0111 11:16:51.807979 280527 round_trippers.go:580] Audit-Id: 168328e6-fb44-48d0-a848-7c45de20acbf
I0111 11:16:51.808008 280527 request.go:1188] Response Body: {"kind":"Scale","apiVersion":"autoscaling/v1","metadata":{"name":"testreplicaset","namespace":"default","uid":"ff1dfaa6-044f-487e-a5ec-fcab48248d3d","resourceVersion":"107985186","creationTimestamp":"2024-01-11T02:53:45Z"},"spec":{"replicas":3},"status":{"replicas":2,"selector":"kubevirt.io/vm=myvmi"}}
virtualmachineinstancereplicaset.kubevirt.io/testreplicaset scaled
kubectl calls the API path /xxx/testreplicaset/scale. The official HPA documentation (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work) explains:
The HorizontalPodAutoscaler controller accesses corresponding workload resources that support scaling (such as Deployments and StatefulSet). These resources each have a subresource named scale, an interface that allows you to dynamically set the number of replicas and examine each of their current states.
Like Deployment and StatefulSet, VirtualMachineInstanceReplicaSet has a scale subresource. From Kubernetes' point of view, this subresource is the interface through which the kubectl scale command adjusts the replica count.
$ kubectl get crd virtualmachineinstancereplicasets.kubevirt.io -o yaml | grep -A 5 subresources
subresources:
  scale:
    labelSelectorPath: .status.labelSelector
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
  status: {}
--
subresources:
  scale:
    labelSelectorPath: .status.labelSelector
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
  status: {}
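The VirtualMachineInstanceReplicaSet CRD does indeed define a scale subresource, and its specReplicasPath is .spec.replicas. This lines up with the body of the PATCH request above, {"spec":{"replicas":3}}, and with the definition of VirtualMachineInstanceReplicaSet itself: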
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceReplicaSet
metadata:
  name: testreplicaset
spec:
  replicas: 3
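To make the role of specReplicasPath concrete, the sketch below writes a replica count into a generic object at a dotted path such as .spec.replicas. It is an illustration under simplifying assumptions, not the apiserver's implementation; `setReplicas` is a hypothetical helper and the object is a trimmed-down stand-in for the real resource.

```go
package main

import (
	"fmt"
	"strings"
)

// setReplicas writes replicas into obj at a dotted JSON path such as
// ".spec.replicas", creating intermediate maps when they are missing.
func setReplicas(obj map[string]interface{}, path string, replicas int64) error {
	fields := strings.Split(strings.TrimPrefix(path, "."), ".")
	cur := obj
	for _, f := range fields[:len(fields)-1] {
		next, ok := cur[f]
		if !ok {
			// Intermediate field is absent: create it and descend.
			child := map[string]interface{}{}
			cur[f] = child
			cur = child
			continue
		}
		child, ok := next.(map[string]interface{})
		if !ok {
			return fmt.Errorf("field %q is not an object", f)
		}
		cur = child
	}
	cur[fields[len(fields)-1]] = replicas
	return nil
}

func main() {
	// Trimmed-down stand-in for the VirtualMachineInstanceReplicaSet above.
	vmirs := map[string]interface{}{
		"kind": "VirtualMachineInstanceReplicaSet",
		"spec": map[string]interface{}{"replicas": int64(2)},
	}
	if err := setReplicas(vmirs, ".spec.replicas", 3); err != nil {
		panic(err)
	}
	fmt.Println(vmirs["spec"])
}
```

Keeping the path in the CRD rather than in code is what lets the same scale endpoint work for any custom resource, whatever its field layout.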
Next, let's see how kube-apiserver handles the request once it arrives:
func (r *crdHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	ctx := req.Context()
	requestInfo, ok := apirequest.RequestInfoFrom(ctx)
	if !ok {
		responsewriters.ErrorNegotiated(
			apierrors.NewInternalError(fmt.Errorf("no RequestInfo found in the context")),
			Codecs, schema.GroupVersion{}, w, req,
		)
		return
	}
	// a lot of code here
	subresource := requestInfo.Subresource
	// a lot of code here
	switch {
	case subresource == "status" && subresources != nil && subresources.Status != nil:
		handlerFunc = r.serveStatus(w, req, requestInfo, crdInfo, terminating, supportedTypes)
	case subresource == "scale" && subresources != nil && subresources.Scale != nil:
		handlerFunc = r.serveScale(w, req, requestInfo, crdInfo, terminating, supportedTypes)
	case len(subresource) == 0:
		handlerFunc = r.serveResource(w, req, requestInfo, crdInfo, crd, terminating, supportedTypes)
	}
	// a lot of code here
}
Since subresource is scale, execution reaches the serveScale method:
func (r *crdHandler) serveScale(w http.ResponseWriter, req *http.Request, requestInfo *apirequest.RequestInfo, crdInfo *crdInfo, terminating bool, supportedTypes []string) http.HandlerFunc {
	// a lot of code here
	switch requestInfo.Verb {
	case "get":
		return handlers.GetResource(storage, requestScope)
	case "update":
		return handlers.UpdateResource(storage, requestScope, r.admission)
	case "patch":
		return handlers.PatchResource(storage, requestScope, r.admission, supportedTypes)
	}
}
Inside PatchResource, apiserver directly updates the VirtualMachineInstanceReplicaSet custom resource (specifically, its .spec.replicas field).
From there, adjusting the replica count automatically based on the current and desired metric values is left to Kubernetes' native HPA feature.
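The core of that adjustment is the formula given in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies it with the bounds from the HPA object above. It is a simplified illustration, not the controller's code: `desiredReplicas` is a hypothetical function, the 90% utilization is a made-up reading, and details such as stabilization windows and not-ready pods are ignored.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance mirrors the documented default of the kube-controller-manager
// flag --horizontal-pod-autoscaler-tolerance (0.1).
const tolerance = 0.1

// desiredReplicas applies the documented HPA formula
//
//	desired = ceil(current * currentUtil / targetUtil)
//
// and clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int32, currentUtil, targetUtil float64, minReplicas, maxReplicas int32) int32 {
	ratio := currentUtil / targetUtil
	desired := current
	// No scaling while the ratio stays within the tolerance band around 1.0.
	if math.Abs(ratio-1.0) > tolerance {
		desired = int32(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 2 replicas, target 50%, bounds [2, 5] as in the manifests above;
	// the 90% average CPU utilization is an assumed measurement.
	fmt.Println(desiredReplicas(2, 90, 50, 2, 5))
}
```

Whatever value comes out of this calculation is then written through the same scale subresource that kubectl scale uses.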
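Note that targetCPUUtilizationPercentage exists only in the autoscaling/v1 version of HorizontalPodAutoscaler; autoscaling/v2 expresses the same target through the metrics field.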
KubeVirt's VirtualMachinePool object supports HPA in the same way as VirtualMachineInstanceReplicaSet, so it is not covered separately here.