Custom JSON Deserialization for KubeVirt Backward Compatibility
Jun 16, 2024 00:30
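This is a new problem that came out of fixing a blue screen on Windows libvirt VMs running on Hygon CPUs. The fix changed the CPU Model definition in KubeVirt's Domain structure, so the virt-handler and virt-launcher images had to be updated. Updating KubeVirt components through virt-operator does not automatically restart existing VMs: those VMIs (virt-launcher Pods) keep running with the old image, because upgrading the system should not disrupt customer workloads that are already running.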
Below is the CPU definition in KubeVirt v1.0.0, where the Model member is a string:
type CPU struct {
	Mode     string       `xml:"mode,attr,omitempty"`
	Model    string       `xml:"model,omitempty"`
	Features []CPUFeature `xml:"feature"`
	Topology *CPUTopology `xml:"topology"`
	NUMA     *NUMA        `xml:"numa,omitempty"`
}
Following the libvirt documentation, we changed it to:
type CPUModel struct {
	Fallback string `xml:"fallback,attr,omitempty"`
	VendorID string `xml:"vendor_id,attr,omitempty"`
	Value    string `xml:",chardata"`
}

type CPU struct {
	Check    string       `xml:"check,attr,omitempty"`
	Mode     string       `xml:"mode,attr,omitempty"`
	Model    *CPUModel    `xml:"model,omitempty"`
	Features []CPUFeature `xml:"feature"`
	Topology *CPUTopology `xml:"topology"`
	NUMA     *NUMA        `xml:"numa,omitempty"`
}
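To see what the new struct buys on the libvirt side, here is a minimal sketch that marshals a trimmed copy of it to XML. The XMLName field is added only so the standalone example renders a <cpu> root element, the helper renderCPU is hypothetical, and the attribute values are purely illustrative, not the ones used in the actual fix.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// CPUModel mirrors the struct shown above.
type CPUModel struct {
	Fallback string `xml:"fallback,attr,omitempty"`
	VendorID string `xml:"vendor_id,attr,omitempty"`
	Value    string `xml:",chardata"`
}

// CPU is trimmed to the fields needed here. XMLName is an addition for
// this example only, so that the root element is named <cpu>.
type CPU struct {
	XMLName xml.Name  `xml:"cpu"`
	Mode    string    `xml:"mode,attr,omitempty"`
	Model   *CPUModel `xml:"model,omitempty"`
}

// renderCPU (hypothetical helper) marshals a CPU definition to XML.
func renderCPU(c CPU) (string, error) {
	out, err := xml.Marshal(c)
	return string(out), err
}

func main() {
	out, err := renderCPU(CPU{
		Mode:  "custom",
		Model: &CPUModel{Fallback: "forbid", VendorID: "AuthenticAMD", Value: "EPYC"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// <cpu mode="custom"><model fallback="forbid" vendor_id="AuthenticAMD">EPYC</model></cpu>
}
```

With Model as a plain string, the fallback and vendor_id attributes on the <model> element could not be expressed at all.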
After the virt-handler image was updated, the existing VMs/VMIs ran into trouble. The virt-handler log on the node hosting the VMI showed:
{"component":"virt-handler","level":"error","msg":"unable to retrieve domain at socket //pods/c0467721-7b7b-4be0-b3fb-5fdef4708a30/volumes/kubernetes.io~empty-dir/sockets/launcher-sock during resync","pos":"cache.go:401","reason":"json: cannot unmarshal string into Go struct field CPU.Spec.CPU.Model of type api.CPUModel","timestamp":"2024-05-05T02:28:31.381264Z"}
{"component":"virt-handler","level":"error","msg":"error unmarshalling domain","pos":"client.go:581","reason":"json: cannot unmarshal string into Go struct field CPU.Spec.CPU.Model of type api.CPUModel","timestamp":"2024-05-05T02:28:31.385046Z"}
This happens because the DomainWatcher in virt-handler periodically calls the GetDomain API of the cmd-server inside virt-launcher to fetch the actual state of the VM (Domain):
func (l *Launcher) GetDomain(_ context.Context, _ *cmdv1.EmptyRequest) (*cmdv1.DomainResponse, error) {
	response := &cmdv1.DomainResponse{
		Response: &cmdv1.Response{
			Success: true,
		},
	}
	list, err := l.domainManager.ListAllDomains()
	if err != nil {
		response.Response.Success = false
		response.Response.Message = getErrorMessage(err)
		return response, nil
	}
	if len(list) > 0 {
		domainObj := list[0]
		if osInfo := l.domainManager.GetGuestOSInfo(); osInfo != nil {
			domainObj.Status.OSInfo = *osInfo
		}
		if interfaces := l.domainManager.InterfacesStatus(); interfaces != nil {
			domainObj.Status.Interfaces = interfaces
		}
		if domain, err := json.Marshal(domainObj); err != nil {
			log.Log.Reason(err).Errorf("Failed to marshal domain")
			response.Response.Success = false
			response.Response.Message = getErrorMessage(err)
			return response, nil
		} else {
			response.Domain = string(domain)
		}
	}
	return response, nil
}
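virt-handler talks to the cmd-server in virt-launcher through the virt-launcher cmd-client, which is how it reads and controls the Domain (freezing, soft-rebooting the instance, and so on). I won't go into that part here.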
The resync code in virt-handler, https://github.com/kubevirt/kubevirt/blob/25b37338917f1a7bfbe3fc07d87672ef1e39389c/pkg/virt-handler/cache/cache.go#L377-L410:
func (d *DomainWatcher) handleResync() {
	socketFiles, err := listSockets()
	if err != nil {
		log.Log.Reason(err).Error("failed to list sockets")
		return
	}
	log.Log.Infof("resyncing virt-launcher domains")
	for _, socket := range socketFiles {
		client, err := cmdclient.NewClient(socket)
		if err != nil {
			log.Log.Reason(err).Error("failed to connect to cmd client socket during resync")
			// Ignore failure to connect to client.
			// These are all local connections via unix socket.
			// A failure to connect means there's nothing on the other
			// end listening.
			continue
		}
		defer client.Close()
		domain, exists, err := client.GetDomain()
		if err != nil {
			// this resync is best effort only.
			log.Log.Reason(err).Errorf("unable to retrieve domain at socket %s during resync", socket)
			continue
		} else if !exists {
			// nothing to sync if it doesn't exist
			continue
		}
		d.eventChan <- watch.Event{Type: watch.Modified, Object: domain}
	}
}
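The virt-launcher instances that have not been restarted still serialize CPU Model as a string, while virt-handler now expects the custom struct. The two sides no longer match, so Go's JSON deserialization fails with: json: cannot unmarshal string into Go struct field CPU.Spec.CPU.Model of type api.CPUModel.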
This is the kind of problem Go's custom JSON unmarshalling can handle: define an UnmarshalJSON method on the CPUModel struct:
type CPUModel struct {
	Fallback string `xml:"fallback,attr,omitempty"`
	VendorID string `xml:"vendor_id,attr,omitempty"`
	Value    string `xml:",chardata"`
}

// Compatible with the original string.
func (m *CPUModel) UnmarshalJSON(data []byte) (err error) {
	if str := string(data); strings.Contains(str, `"Fallback":`) || strings.Contains(str, `"VendorID":`) || strings.Contains(str, `"Value":`) {
		type _cpumodel CPUModel
		tmp := &_cpumodel{}
		if err = json.Unmarshal(data, tmp); err != nil {
			return
		}
		m.Fallback = tmp.Fallback
		m.VendorID = tmp.VendorID
		m.Value = tmp.Value
	} else {
		// Old format: data is a quoted JSON string such as "Haswell".
		// Decode it instead of assigning the raw bytes, which would
		// leave the surrounding quotes in Value.
		err = json.Unmarshal(data, &m.Value)
	}
	return
}
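The method checks the JSON text for keywords:
- If Fallback, VendorID, or Value appears, virt-launcher is already up to date (both sides agree), and deserializing with the new CPUModel works fine. As for why a separate type _cpumodel is defined inside the method, that is left for the reader to think through, or just try it and see.
- If none of them appears, virt-launcher is still the old version (the two sides disagree), and the string is filled into the Value member.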
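The keyword check can also be swapped for a look at the first non-whitespace byte of the payload: a JSON string starts with a quote, an object with a brace. The following is a sketch of that variation, not the method used in the fix above. The xml tags are omitted for brevity, and the names cpuModelFields and decodeModel are hypothetical. Since CPUModel carries no json tags, the object form uses the Go field names as keys.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type CPUModel struct {
	Fallback string
	VendorID string
	Value    string
}

// UnmarshalJSON accepts both the old payload (a JSON string) and the new one
// (a JSON object), choosing by the first non-whitespace byte.
func (m *CPUModel) UnmarshalJSON(data []byte) error {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) > 0 && trimmed[0] == '"' {
		// Old virt-launcher: a bare string. Decoding it strips the quotes
		// and resolves any escape sequences.
		*m = CPUModel{}
		return json.Unmarshal(trimmed, &m.Value)
	}
	// New virt-launcher: an object, decoded through a local type with the
	// same fields, as in the version above.
	type cpuModelFields CPUModel
	var tmp cpuModelFields
	if err := json.Unmarshal(trimmed, &tmp); err != nil {
		return err
	}
	*m = CPUModel(tmp)
	return nil
}

// decodeModel (hypothetical helper) decodes one payload into a CPUModel.
func decodeModel(payload string) (CPUModel, error) {
	var m CPUModel
	err := json.Unmarshal([]byte(payload), &m)
	return m, err
}

func main() {
	for _, payload := range []string{
		`"EPYC"`, // old format
		`{"Fallback":"forbid","VendorID":"AuthenticAMD","Value":"EPYC"}`, // new format
	} {
		m, err := decodeModel(payload)
		fmt.Printf("%+v %v\n", m, err)
	}
}
```

Checking the leading byte avoids depending on field names appearing in the text, at the cost of departing from the keyword-based logic described above.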
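This keeps virt-handler backward compatible with older virt-launcher instances, so existing VMs do not have to be restarted (that is, their virt-launcher Pods do not have to be recreated to pick up the new image).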
virt-operator does not register a Domain CRD with Kubernetes, even though Domain is defined following Kubernetes API conventions:
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
type Domain struct {
	metav1.TypeMeta
	metav1.ObjectMeta `json:"ObjectMeta"`
	Spec              DomainSpec
	Status            DomainStatus
}
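Yet the virt-handler controller really does have an EventHandler for Domain. Some readers have said they are curious about this, so I will write a separate post on how virt-handler implements a rather unusual Informer for the Domain API.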