CPUID

Jun 20, 2024 23:30 · 1496 words · 3 minute read Linux Golang

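How does the Linux kernel obtain CPU hardware information (vendor, number of physical cores, and so on) during initialization? This time we start from a third-party Go library, klauspost/cpuid, and explore step by step.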

Golang

package main

import (
    "fmt"

    . "github.com/klauspost/cpuid/v2"
)

func main() {
    fmt.Println("Name:", CPU.BrandName)
    fmt.Println("PhysicalCores:", CPU.PhysicalCores)
    fmt.Println("ThreadsPerCore:", CPU.ThreadsPerCore)
}

Run the snippet above:

$ go run main.go
Name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
PhysicalCores: 18
ThreadsPerCore: 2

Locate the CPU variable: https://github.com/klauspost/cpuid/blob/95e7626938069ea64e5c91ca2fe36945786fead9/cpuid.go#L331-L335

// CPUInfo contains information about the detected system CPU.
type CPUInfo struct {
    BrandName      string  // Brand name reported by the CPU
    VendorID       Vendor  // Comparable CPU vendor ID
    VendorString   string  // Raw vendor string.
    featureSet     flagSet // Features of the CPU
    PhysicalCores  int     // Number of physical processor cores in your CPU. Will be 0 if undetectable.
    ThreadsPerCore int     // Number of threads per physical core. Will be 1 if undetectable.
    LogicalCores   int     // Number of physical cores times threads that can run on each core through the use of hyperthreading. Will be 0 if undetectable.
    Family         int     // CPU family number
    Model          int     // CPU model number
    Stepping       int     // CPU stepping info
    CacheLine      int     // Cache line size in bytes. Will be 0 if undetectable.
    Hz             int64   // Clock speed, if known, 0 otherwise. Will attempt to contain base clock speed.
    BoostFreq      int64   // Max clock speed, if known, 0 otherwise
    Cache          struct {
        L1I int // L1 Instruction Cache (per core or shared). Will be -1 if undetected
        L1D int // L1 Data Cache (per core or shared). Will be -1 if undetected
        L2  int // L2 Cache (per core or shared). Will be -1 if undetected
        L3  int // L3 Cache (per core, per ccx or shared). Will be -1 if undetected
    }
    SGX              SGXSupport
    AMDMemEncryption AMDMemEncryptionSupport
    AVX10Level       uint8
    maxFunc          uint32
    maxExFunc        uint32
}

var CPU CPUInfo

The CPU.XXX fields used in the README are all defined in the CPUInfo struct. The key question is how this variable gets assigned (initialized).

First, look at the project layout:

$ ll | grep .go
-rw-r--r-- 1 root root  52K Jun 20 09:37 cpuid.go
-rw-r--r-- 1 root root 7.9K Apr 30 23:26 cpuid_test.go
-rw-r--r-- 1 root root 9.8K Apr 30 23:26 detect_arm64.go
-rw-r--r-- 1 root root  529 Apr 30 23:26 detect_ref.go
-rw-r--r-- 1 root root 1.3K Apr 30 23:26 detect_x86.go
-rw-r--r-- 1 root root 8.7K Jun 20 09:37 featureid_string.go
-rw-r--r-- 1 root root   79 Apr 30 23:26 go.mod
-rw-r--r-- 1 root root  358 Apr 30 23:26 go.sum
-rw-r--r-- 1 root root 5.7K Apr 30 23:26 mockcpu_test.go
-rw-r--r-- 1 root root 3.8K Apr 30 23:26 os_darwin_arm64.go
-rw-r--r-- 1 root root 1.4K Apr 30 23:26 os_darwin_test.go
-rw-r--r-- 1 root root 3.9K Apr 30 23:26 os_linux_arm64.go
-rw-r--r-- 1 root root  367 Apr 30 23:26 os_other_arm64.go
-rw-r--r-- 1 root root  151 Apr 30 23:26 os_safe_linux_arm64.go
-rw-r--r-- 1 root root  237 Apr 30 23:26 os_unsafe_linux_arm64.go

In cpuid.go:

func init() {
    initCPU()
    Detect()
}

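The initCPU function has a different implementation for each CPU architecture, selected through Go conditional compilation (build constraints).

On x86_64, the asmCpuid function is implemented in assembly: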

// func asmCpuid(op uint32) (eax, ebx, ecx, edx uint32)
TEXT ·asmCpuid(SB), 7, $0
    XORL CX, CX
    MOVL op+0(FP), AX
    CPUID
    MOVL AX, eax+4(FP)
    MOVL BX, ebx+8(FP)
    MOVL CX, ecx+12(FP)
    MOVL DX, edx+16(FP)
    RET

This shows that CPUID is itself a CPU instruction. In other words, the hardware directly supports this way of querying CPU information.
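To make the register-level result of the instruction more concrete, here is a minimal sketch of how the vendor string could be rebuilt from the registers that CPUID leaf 0 returns (EBX, EDX, ECX, in that order). The function name vendorFromRegs is hypothetical and the register values are hard-coded, so this runs without executing CPUID at all; it is not the library's code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// vendorFromRegs (hypothetical helper) rebuilds the 12-byte vendor string
// that CPUID leaf 0 reports in EBX, EDX, ECX, in that order.
// Each register holds four ASCII bytes in little-endian order.
func vendorFromRegs(ebx, edx, ecx uint32) string {
	buf := make([]byte, 12)
	binary.LittleEndian.PutUint32(buf[0:4], ebx)
	binary.LittleEndian.PutUint32(buf[4:8], edx)
	binary.LittleEndian.PutUint32(buf[8:12], ecx)
	return string(buf)
}

func main() {
	// Hard-coded register values as an Intel CPU reports them for leaf 0.
	fmt.Println(vendorFromRegs(0x756e6547, 0x49656e69, 0x6c65746e)) // GenuineIntel
}
```

The odd EBX, EDX, ECX ordering is part of the instruction's definition, which is why a decoder cannot simply concatenate the registers in alphabetical order.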

Next, see how the PhysicalCores field of CPUInfo is filled in:

https://github.com/klauspost/cpuid/blob/f89c8c58bdd5348f54ac22d0d58cf797c35bdc2b/detect_x86.go#L33

func addInfo(c *CPUInfo, safe bool) {
    // a lot of code
    c.PhysicalCores = physicalCores()
    c.VendorID, c.VendorString = vendorID()
    c.AVX10Level = c.supportAVX10()
    c.cacheSize()
    c.frequencies()
}

https://github.com/klauspost/cpuid/blob/95e7626938069ea64e5c91ca2fe36945786fead9/cpuid.go#L847-L868

func logicalCores() int {
    mfi := maxFunctionID()
    v, _ := vendorID()
    switch v {
    case Intel:
        // Use this on old Intel processors
        if mfi < 0xb {
            if mfi < 1 {
                return 0
            }
            // CPUID.1:EBX[23:16] represents the maximum number of addressable IDs (initial APIC ID)
            // that can be assigned to logical processors in a physical package.
            // The value may not be the same as the number of logical processors that are present in the hardware of a physical package.
            _, ebx, _, _ := cpuid(1)
            logical := (ebx >> 16) & 0xff
            return int(logical)
        }
        _, b, _, _ := cpuidex(0xb, 1)
        return int(b & 0xffff)
    case AMD, Hygon:
        _, b, _, _ := cpuid(1)
        return int((b >> 16) & 0xff)
    default:
        return 0
    }
}

func physicalCores() int {
    v, _ := vendorID()
    switch v {
    case Intel:
        return logicalCores() / threadsPerCore()
    case AMD, Hygon:
        lc := logicalCores()
        tpc := threadsPerCore()
        if lc > 0 && tpc > 0 {
            return lc / tpc
        }

        // a lot of code here
    }
    return 0
}
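The bit manipulation in logicalCores and the division in physicalCores can be tried in isolation. The sketch below uses hypothetical helper names and made-up register values chosen to match the 18-core, 2-thread machine from the earlier output; it only mirrors the arithmetic and does not query the CPU.

```go
package main

import "fmt"

// logicalFromLeaf1 extracts CPUID.1:EBX[23:16], the field used on older CPUs.
func logicalFromLeaf1(ebx uint32) int {
	return int((ebx >> 16) & 0xff)
}

// logicalFromLeafB extracts EBX[15:0] of CPUID leaf 0xb, sub-leaf 1.
func logicalFromLeafB(ebx uint32) int {
	return int(ebx & 0xffff)
}

// physicalFrom divides logical processors by threads per core,
// returning 0 when either input is not usable.
func physicalFrom(logical, threadsPerCore int) int {
	if logical <= 0 || threadsPerCore <= 0 {
		return 0
	}
	return logical / threadsPerCore
}

func main() {
	ebxLeaf1 := uint32(0x00240800) // made-up value: bits 23:16 hold 0x24 (36)
	ebxLeafB := uint32(36)         // made-up value: bits 15:0 hold 36

	fmt.Println(logicalFromLeaf1(ebxLeaf1))                  // 36
	fmt.Println(logicalFromLeafB(ebxLeafB))                  // 36
	fmt.Println(physicalFrom(logicalFromLeafB(ebxLeafB), 2)) // 18
}
```

The explicit zero check in physicalFrom is an addition of this sketch; it makes the division safe even if a caller passes an undetected value.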

These functions still call cpuid, whose implementation on x86_64 is the assembly routine asmCpuid.
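One common way to wire an architecture-specific routine behind a generic name is a package-level function variable that an init function overwrites. The sketch below illustrates that pattern in a single file, on the assumption that the library does something similar; fakeAsmCpuid is a hypothetical stand-in for the assembly routine and returns made-up values.

```go
package main

import "fmt"

// cpuid is a function variable. The default is a fallback that reports
// nothing, as a build for an unsupported architecture might do.
var cpuid = func(op uint32) (eax, ebx, ecx, edx uint32) {
	return 0, 0, 0, 0
}

// fakeAsmCpuid is a hypothetical stand-in for the assembly routine.
// For leaf 0 it returns a made-up maximum leaf (0xf) and the Intel vendor bytes.
func fakeAsmCpuid(op uint32) (eax, ebx, ecx, edx uint32) {
	if op == 0 {
		return 0xf, 0x756e6547, 0x6c65746e, 0x49656e69
	}
	return 0, 0, 0, 0
}

// initCPU plays the role of the x86 variant: it points cpuid at the
// architecture-specific routine.
func initCPU() {
	cpuid = fakeAsmCpuid
}

// maxFunctionID reads the highest supported basic leaf from leaf 0, EAX.
func maxFunctionID() uint32 {
	eax, _, _, _ := cpuid(0)
	return eax
}

func main() {
	fmt.Println(maxFunctionID()) // 0: fallback still in place
	initCPU()
	fmt.Println(maxFunctionID()) // 15: routed through fakeAsmCpuid
}
```

In a real multi-architecture package, each initCPU variant would live in its own file guarded by a build constraint, so only one of them is compiled for a given target.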

Linux

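With that experience of a Go library querying CPU information through CPUID, we return to the main question: how does the Linux kernel obtain CPU information during initialization?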

https://github.com/torvalds/linux/blob/97873a3daf611594a7f92cc88bd8c5c8c526e1a3/arch/x86/boot/cpucheck.c#L102-L197

int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr)
{
    // a lot of code here
     else if (err == 0x01 && is_transmeta()) {
        /* Transmeta might have masked feature bits in word 0 */

        u32 ecx = 0x80860004;
        u32 eax, edx;
        u32 level = 1;

        asm("rdmsr" : "=a" (eax), "=d" (edx) : "c" (ecx));
        asm("wrmsr" : : "a" (~0), "d" (edx), "c" (ecx));
        asm("cpuid"
            : "+a" (level), "=d" (cpu.flags[0])
            : : "ecx", "ebx");
        asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));

        err = check_cpuflags();
    }
    // a lot of code here
}

Here is the "familiar" assembly instruction again. The check_cpu function is in turn called by validate_cpu:

int validate_cpu(void)
{
    u32 *err_flags;
    int cpu_level, req_level;

    check_cpu(&cpu_level, &req_level, &err_flags);

    if (cpu_level < req_level) {
        printf("This kernel requires an %s CPU, ",
               cpu_name(req_level));
        printf("but only detected an %s CPU.\n",
               cpu_name(cpu_level));
        return -1;
    }

    // a lot of code here
}
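The idea behind this kind of validation is comparing what the CPU reports against what the kernel requires. Below is a minimal Go sketch of a required-feature check over 32-bit flag words. The names and values are hypothetical, and it illustrates the idea rather than translating the kernel's check_cpuflags.

```go
package main

import "fmt"

// missingFlags returns, for each 32-bit word, the required bits that the CPU
// does not report, plus a bitmask telling which words have missing bits.
// It assumes at most 32 flag words.
func missingFlags(required, present []uint32) (missing []uint32, errWords uint32) {
	missing = make([]uint32, len(required))
	for i := range required {
		var have uint32
		if i < len(present) {
			have = present[i]
		}
		missing[i] = required[i] &^ have // required AND NOT present
		if missing[i] != 0 {
			errWords |= 1 << uint(i)
		}
	}
	return missing, errWords
}

func main() {
	required := []uint32{0b1011, 0b0001} // made-up requirement masks
	present := []uint32{0b0011, 0b0001}  // made-up CPU feature words

	missing, errWords := missingFlags(required, present)
	fmt.Printf("%04b %d\n", missing[0], errWords) // 1000 1
}
```

A word-level error mask like errWords is what makes a test such as err == 0x01 meaningful: it says only word 0 has missing bits.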

validate_cpu is in turn called by the entry function main:

void main(void)
{
    /* First, copy the boot header into the "zeropage" */
    copy_boot_params();

    /* Initialize the early-boot console */
    console_init();
    if (cmdline_find_option_bool("debug"))
        puts("early console in setup code\n");

    /* End of heap check */
    init_heap();

    /* Make sure we have all the proper CPU support */
    if (validate_cpu()) {
        puts("Unable to boot - please use a kernel appropriate "
             "for your CPU.\n");
        die();
    }

    // a lot of code here
}

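When GRUB hands control to the Linux kernel, the entry function above runs first and sets up the kernel's runtime environment. Finally, jmpl *%eax jumps to the protected-mode kernel entry point, from which execution eventually reaches start_kernel.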