Go Map

Feb 27, 2022 11:30 · 4992 words · 10 minute read Golang

One of the most useful data structures in computer science is the hash table. Many hash table implementations exist with varying properties, but in general they offer fast lookups, adds, and deletes. Go provides a built-in map type that implements a hash table.

The hash table is one of the most efficient data structures around (O(1) average time for lookups, inserts, and deletes). Go's built-in map is backed by a hash table and resolves hash collisions by chaining overflow buckets.

In Go, the underlying implementation of a map is the hmap struct:

type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}
  • count: the number of elements stored in the map; the len() builtin returns this value directly
  • B: the map has 2^B hash buckets
  • noverflow: an approximate count of overflow buckets
  • hash0: the hash seed, which adds randomness to the hash function
  • buckets: pointer to the bucket array; it has 2^B elements, i.e. 2^B hash buckets
  • oldbuckets: growing doubles the length of the bucket array; the pre-growth buckets pointer is saved here
  • flags: state bits for the map, defined below
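The flag bits are defined alongside hmap in go1.16 runtime/map.go:

const (
    // flags
    iterator     = 1 // there may be an iterator using buckets
    oldIterator  = 2 // there may be an iterator using oldbuckets
    hashWriting  = 4 // a goroutine is writing to the map
    sameSizeGrow = 8 // the current map growth is to a new map of the same size
)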

Let's first look at bmap, the underlying implementation of a bucket:

const (
    // Maximum number of key/elem pairs a bucket can hold.
    bucketCntBits = 3
    bucketCnt     = 1 << bucketCntBits
)

type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}
  • tophash is a uint8 array of length 8 (bucketCnt); each element holds the top byte of the hash of the corresponding key in this bucket
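The top byte is extracted by the runtime's tophash helper; values below minTopHash are reserved as bucket-evacuation markers, hence the adjustment (go1.16 runtime/map.go):

func tophash(hash uintptr) uint8 {
    top := uint8(hash >> (sys.PtrSize*8 - 8))
    if top < minTopHash {
        top += minTopHash
    }
    return top
}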

At runtime a bmap actually holds more than this, since the bucket has to store real data. During compilation, Go rewrites bmap, deriving an expanded struct from the map's key and element types:

type bmap struct {
    topbits  [8]uint8
    keys     [8]keytype
    elems    [8]elemtype
    overflow uintptr
}
  • topbits: same as tophash, the top byte of each key's hash, at most 8 entries
  • keys: all keys, stored contiguously
  • elems: the values corresponding to the keys (the Go runtime calls values "elements")
  • overflow: pointer to the overflow bucket, used to handle hash collisions
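The NOTE in the source above explains why keys and elems are grouped rather than interleaved. A quick sketch makes the saved padding concrete; pair and packed are hypothetical types for illustration, not runtime code:

package main

import (
    "fmt"
    "unsafe"
)

// Hypothetical interleaved layout for map[int64]int8: each int8 elem is
// followed by 7 bytes of padding to align the next int64 key.
type pair struct {
    k int64
    e int8
}

// Packed layout, as in bmap: all 8 keys first, then all 8 elems.
type packed struct {
    keys  [8]int64
    elems [8]int8
}

func main() {
    fmt.Println(unsafe.Sizeof(pair{}) * 8) // 128 bytes for 8 interleaved pairs
    fmt.Println(unsafe.Sizeof(packed{}))   // 72 bytes when packed
}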

Creating a map

In code, we create a map with the built-in make function:

m1 := make(map[string]int) // makemap_small
m2 := make(map[string]int, 8) // makemap

When make is given no hint, or hint <= 8, the compiler initializes the map with makemap_small; when hint > 8, make(map[k]v, hint) is rewritten at compile time into a call to makemap.

  • makemap_small

    func makemap_small() *hmap {
        h := new(hmap)
        h.hash0 = fastrand()
        return h
    }
    
  • makemap

    // maxAlloc is the maximum size of an allocation. On 64-bit,
    // it's theoretically possible to allocate 1<<heapAddrBits bytes. On
    // 32-bit, however, this is one less than 1<<32 because the
    // number of bytes in the address space doesn't actually fit
    // in a uintptr.
    const (
        maxAlloc = (1 << heapAddrBits) - (1-_64bit)*1
    )
    
    func makemap(t *maptype, hint int, h *hmap) *hmap {
        mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
        if overflow || mem > maxAlloc {
            hint = 0
        }
    
        // initialize Hmap
        if h == nil {
            h = new(hmap)
        }
        h.hash0 = fastrand()
    
        // Find the size parameter B which will hold the requested # of elements.
        // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
        B := uint8(0)
        for overLoadFactor(hint, B) {
            B++
        }
        h.B = B
    
        // allocate initial hash table
        // if B == 0, the buckets field is allocated lazily later (in mapassign)
        // If hint is large zeroing this memory could take a while.
        if h.B != 0 {
            var nextOverflow *bmap
            h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
            if nextOverflow != nil {
                h.extra = new(mapextra)
                h.extra.nextOverflow = nextOverflow
            }
        }
    
        return h
    }
    

    makemap does the following:

    1. Initializes the hmap struct
    2. Sets the hash seed hash0
    3. Finds the smallest B such that 2^B buckets can hold hint elements (the capacity passed to make) without exceeding the load factor (see the sketch below)
    4. If B is non-zero, makeBucketArray allocates a bucket array of length 2^B; otherwise allocation is deferred until the first assignment (mapassign)
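    A minimal, self-contained sketch of step 3, assuming the go1.16 constants (bucketCnt = 8, load factor 13/2); pickB is a hypothetical name for the loop inside makemap:

    package main

    import "fmt"

    // pickB finds the smallest B such that hint elements stay within
    // the load factor for 2^B buckets, mirroring the loop in makemap.
    func pickB(hint int) uint8 {
        const (
            bucketCnt     = 8
            loadFactorNum = 13
            loadFactorDen = 2
        )
        over := func(count int, B uint8) bool {
            return count > bucketCnt &&
                uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
        }
        B := uint8(0)
        for over(hint, B) {
            B++
        }
        return B
    }

    func main() {
        fmt.Println(pickB(8))   // 0: small hints leave bucket allocation for later
        fmt.Println(pickB(100)) // 4: 6.5 * 2^4 = 104 >= 100
    }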

Where the compiler decides which makemap variant to use: https://github.com/golang/go/blob/go1.16.10/src/cmd/compile/internal/gc/walk.go#L1205-L1315

Hash computation

When the Go runtime scheduler initializes (https://github.com/golang/go/blob/go1.16.10/src/runtime/proc.go#L592-L687):

func schedinit() {
    cpuinit()       // must run before alginit
    alginit()       // maps must not be used before this call
}

alginit picks a hash algorithm based on the CPU instruction-set support detected by cpuinit (https://github.com/golang/go/blob/go1.16.10/src/runtime/proc.go#L328-L352):

func alginit() {
    // Install AES hash algorithms if the instructions needed are present.
    if (GOARCH == "386" || GOARCH == "amd64") &&
        cpu.X86.HasAES && // AESENC
        cpu.X86.HasSSSE3 && // PSHUFB
        cpu.X86.HasSSE41 { // PINSR{D,Q}
        initAlgAES()
        return
    }
    if GOARCH == "arm64" && cpu.ARM64.HasAES {
        initAlgAES()
        return
    }
    getRandomData((*[len(hashkey) * sys.PtrSize]byte)(unsafe.Pointer(&hashkey))[:])
    hashkey[0] |= 1 // make sure these numbers are odd
    hashkey[1] |= 1
    hashkey[2] |= 1
    hashkey[3] |= 1
}

func initAlgAES() {
    useAeshash = true
    // Initialize with random data so hash collisions will be hard to engineer.
    getRandomData(aeskeysched[:])
}

With good instruction-set support, the Go runtime on amd64 (and arm64) uses an AES-based hash, initialized with random data so that hash collisions are hard to engineer.

The hash function for string keys is strhash:

func strhash(p unsafe.Pointer, h uintptr) uintptr

The real implementation lives in assembly (https://github.com/golang/go/blob/go1.16.10/src/runtime/asm_amd64.s#L895-L905) and varies by CPU architecture:

// func strhash(p unsafe.Pointer, h uintptr) uintptr
TEXT runtime·strhash(SB),NOSPLIT,$0-24
    CMPB runtime·useAeshash(SB), $0
    JEQ noaes
    MOVQ p+0(FP), AX // ptr to string struct
    MOVQ 8(AX), CX // length of string
    MOVQ (AX), AX // string data
    LEAQ ret+16(FP), DX
    JMP aeshashbody<>(SB)
noaes:
    JMP runtime·strhashFallback(SB)

Some key types also get dedicated hash functions for efficiency, such as memhash32 and memhash64 for 4- and 8-byte keys.

Map access

k1 := m["k1"] // mapaccess1
_, prs := m["k2"] // mapaccess2

Hashing a key produces a 64-bit value (on 64-bit platforms): the low-order bits select a bucket, and the top byte becomes the slot's tophash.

At compile time, reading a map value by key is rewritten into a call to mapaccess1 (one return value) or mapaccess2 (two return values):

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    // a lot of code here
    hash := t.hasher(key, uintptr(h.hash0))
    m := bucketMask(h.B)
    b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
    if c := h.oldbuckets; c != nil {
        if !h.sameSizeGrow() {
            // There used to be half as many buckets; mask down one more power of two.
            m >>= 1
        }
        oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
        if !evacuated(oldb) {
            b = oldb
        }
    }
    top := tophash(hash)
bucketloop:
    for ; b != nil; b = b.overflow(t) {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if t.key.equal(key, k) {
                e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                if t.indirectelem() {
                    e = *((*unsafe.Pointer)(e))
                }
                return e
            }
        }
    }
    return unsafe.Pointer(&zeroVal[0])
}
  1. t.hasher hashes the key
  2. bucketMask computes a mask; ANDing it with the hash gives the bucket number, and from that the bucket's address
  3. tophash extracts the top byte of the hash
  4. The bucket's tophash array is scanned for that byte
  5. On a match, the candidate key[N] is loaded from its offset within the bucket
  6. If it equals the lookup key, the corresponding value[N] is returned
  7. If nothing matches, the zero value is returned

If there were hash collisions, the overflow-bucket chain is walked via b = b.overflow(t).
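The helpers used in steps 2 and 3 are tiny (go1.16 runtime/map.go):

func bucketShift(b uint8) uintptr {
    // Masking the shift amount allows overflow checks to be elided.
    return uintptr(1) << (b & (sys.PtrSize*8 - 1))
}

func bucketMask(b uint8) uintptr {
    return bucketShift(b) - 1
}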

mapaccess2 is nearly the same as mapaccess1, so we won't walk through it; it just returns an extra bool reporting whether the key was present.
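Its signature only differs in that extra bool (go1.16 runtime/map.go):

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)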

As an optimization, the compiler also substitutes specialized mapaccess1_xxx / mapaccess2_xxx implementations based on the key type:
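For example, these variants exist in the go1.16 runtime:

// runtime/map_fast32.go
func mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer
// runtime/map_fast64.go
func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer
// runtime/map_faststr.go
func mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer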

Map assignment

m["k1"] = 7

At compile time, assigning to a map key is rewritten into a call to mapassign:

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
    // a lot of code here
    hash := t.hasher(key, uintptr(h.hash0))

    // Set hashWriting after calling t.hasher, since t.hasher may panic,
    // in which case we have not actually done a write.
    h.flags ^= hashWriting

    if h.buckets == nil {
        h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
    }

again:
    bucket := hash & bucketMask(h.B)
    if h.growing() {
        growWork(t, h, bucket)
    }
    b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))
    top := tophash(hash)

    var inserti *uint8
    var insertk unsafe.Pointer
    var elem unsafe.Pointer
bucketloop:
    for {
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if isEmpty(b.tophash[i]) && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
                }
                if b.tophash[i] == emptyRest {
                    break bucketloop
                }
                continue
            }
            k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if t.indirectkey() {
                k = *((*unsafe.Pointer)(k))
            }
            if !t.key.equal(key, k) {
                continue
            }
            // already have a mapping for key. Update it.
            if t.needkeyupdate() {
                typedmemmove(t.key, k, key)
            }
            elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
            goto done
        }
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.

    // If we hit the max load factor or we have too many overflow buckets,
    // and we're not already in the middle of growing, start growing.
    if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

    if inserti == nil {
        // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
        newb := h.newoverflow(t, b)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        elem = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // store new key/elem at insert position
    if t.indirectkey() {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectelem() {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(elem) = vmem
    }
    typedmemmove(t.key, insertk, key)
    *inserti = top
    h.count++

done:
    if h.flags&hashWriting == 0 {
        throw("concurrent map writes")
    }
    h.flags &^= hashWriting
    if t.indirectelem() {
        elem = *((*unsafe.Pointer)(elem))
    }
    return elem
}
  1. t.hasher hashes the key
  2. If the bucket array was never allocated, it is allocated here with length 1 (B is 0 in that case)
  3. ANDing the mask with the hash gives the bucket number, and from that the bucket's address
  4. tophash extracts the top byte of the hash
  5. The bucket's tophash array is scanned
  6. If a slot's entry matches the top byte and the stored key equals the new key, the value is written at the corresponding address (base address + offset)
  7. If the bucket and its overflow chain are full (inserti is nil), newoverflow allocates a new bucket, hangs it off the original bucket's overflow pointer, and increments hmap's overflow-bucket counter noverflow
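Note the h.flags ^= hashWriting near the top of mapassign: it is how the runtime detects unsynchronized writers. A minimal program that trips the check (the resulting "concurrent map writes" is a fatal runtime throw, not a recoverable panic):

package main

// Two goroutines assign to the same map without synchronization;
// the hashWriting flag check in mapassign aborts the program with
// "fatal error: concurrent map writes".
func main() {
    m := make(map[int]int)
    go func() {
        for i := 0; ; i++ {
            m[i] = i
        }
    }()
    for i := 0; ; i++ {
        m[i] = i
    }
}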

Growing

A map write may trigger hmap growth (https://github.com/golang/go/blob/go1.16.10/src/runtime/map.go#L647-L650):

// If we hit the max load factor or we have too many overflow buckets,
// and we're not already in the middle of growing, start growing.
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
    hashGrow(t, h)
    goto again // Growing the table invalidates everything, so try again
}

Growth is triggered in either of two cases:

  1. The load factor exceeds the threshold of 6.5 (the figure quoted in many articles)

    https://github.com/golang/go/blob/go1.16.10/src/runtime/map.go#L1069-L1072

    func overLoadFactor(count int, B uint8) bool {
        return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
    }
    

    Where does 6.5 come from? loadFactorNum / loadFactorDen = 13 / 2 = 6.5.
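    The constants as defined in go1.16 runtime/map.go:

    const (
        // Maximum average load of a bucket that triggers growth is 6.5.
        // Represent as loadFactorNum/loadFactorDen, to allow integer math.
        loadFactorNum = 13
        loadFactorDen = 2
    )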

  2. Too many overflow buckets are in use (severe hash collisions)

    https://github.com/golang/go/blob/go1.16.10/src/runtime/map.go#L1074-L1087

    // tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
    // Note that most of these overflow buckets must be in sparse use;
    // if use was dense, then we'd have already triggered regular map growth.
    func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
        // If the threshold is too low, we do extraneous work.
        // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
        // "too many" means (approximately) as many overflow buckets as regular buckets.
        // See incrnoverflow for more details.
        if B > 15 {
            B = 15
        }
        // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
        return noverflow >= uint16(1)<<(B&15)
    }
    

Either way, without growing, the hash table's performance would degrade badly.

Once the runtime decides the hmap must grow, it first calls hashGrow:

func hashGrow(t *maptype, h *hmap) {
    // If we've hit the load factor, get bigger.
    // Otherwise, there are too many overflow buckets,
    // so keep the same number of buckets and "grow" laterally.
    bigger := uint8(1)
    if !overLoadFactor(h.count+1, h.B) {
        bigger = 0
        h.flags |= sameSizeGrow
    }
    oldbuckets := h.buckets
    newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

    flags := h.flags &^ (iterator | oldIterator)
    if h.flags&iterator != 0 {
        flags |= oldIterator
    }
    // commit the grow (atomic wrt gc)
    h.B += bigger
    h.flags = flags
    h.oldbuckets = oldbuckets
    h.buckets = newbuckets
    h.nevacuate = 0
    h.noverflow = 0

    if h.extra != nil && h.extra.overflow != nil {
        // Promote current overflow buckets to the old generation.
        if h.extra.oldoverflow != nil {
            throw("oldoverflow is not nil")
        }
        h.extra.oldoverflow = h.extra.overflow
        h.extra.overflow = nil
    }
    if nextOverflow != nil {
        if h.extra == nil {
            h.extra = new(mapextra)
        }
        h.extra.nextOverflow = nextOverflow
    }

    // the actual copying of the hash table data is done incrementally
    // by growWork() and evacuate().
}
  • If the load factor was exceeded, h.B is incremented, i.e. the bucket count (2^B) doubles.
  • Otherwise the growth was caused by too many overflow buckets; this is a same-size grow (sameSizeGrow), and the bucket array does not double. In both cases makeBucketArray allocates a new bucket array, the old array is hung off the oldbuckets field, and the new array goes on the buckets pointer.
  • h.flags is updated, demoting a set iterator flag to oldIterator (the helper predicates are shown below).
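The predicates used here and during evacuation (go1.16 runtime/map.go):

// growing reports whether h is growing. The growth may be to the same size or bigger.
func (h *hmap) growing() bool {
    return h.oldbuckets != nil
}

// sameSizeGrow reports whether the current growth is to a map of the same size.
func (h *hmap) sameSizeGrow() bool {
    return h.flags&sameSizeGrow != 0
}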

The actual data migration is done incrementally by growWork and evacuate:

  • growWork

    func growWork(t *maptype, h *hmap, bucket uintptr) {
        // make sure we evacuate the oldbucket corresponding
        // to the bucket we're about to use
        evacuate(t, h, bucket&h.oldbucketmask())
    
        // evacuate one more oldbucket to make progress on growing
        if h.growing() {
            evacuate(t, h, h.nevacuate)
        }
    }
    
  • evacuate

    func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
        // locate the bucket in the old bucket array
        b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))
        newbit := h.noldbuckets()
        // if it hasn't been evacuated yet
        if !evacuated(b) {
            // TODO: reuse overflow buckets instead of using new ones, if there
            // is no iterator using the old buckets.  (If !oldIterator.)
    
            // xy contains the x and y (low and high) evacuation destinations.
            var xy [2]evacDst
            x := &xy[0]
            // x is always set up; for a same-size grow it is the only destination
            x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.bucketsize)))
            x.k = add(unsafe.Pointer(x.b), dataOffset) // key
            x.e = add(x.k, bucketCnt*uintptr(t.keysize)) // elem
    
            // growing to double the size
            if !h.sameSizeGrow() {
                // Only calculate y pointers if we're growing bigger.
                // Otherwise GC can see bad pointers.
                y := &xy[1]
                // y is the high-half destination
                y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.bucketsize)))
                y.k = add(unsafe.Pointer(y.b), dataOffset) // key
                y.e = add(y.k, bucketCnt*uintptr(t.keysize)) // elem
            }
    
            // walk the bucket, including its overflow chain
            for ; b != nil; b = b.overflow(t) {
                k := add(unsafe.Pointer(b), dataOffset)
                e := add(k, bucketCnt*uintptr(t.keysize))
    
                // iterate over every key/value slot in the bucket
                for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.keysize)), add(e, uintptr(t.elemsize)) {
                    // tophash entry for this slot
                    top := b.tophash[i]
                    // empty slot: mark it as evacuated and move on
                    if isEmpty(top) {
                        b.tophash[i] = evacuatedEmpty
                        continue
                    }
                    if top < minTopHash {
                        throw("bad map state")
                    }
                    k2 := k
                    // if the key is stored indirectly (as a pointer), dereference it
                    if t.indirectkey() {
                        k2 = *((*unsafe.Pointer)(k2))
                    }
                    var useY uint8
                    if !h.sameSizeGrow() {
                        // Compute hash to make our evacuation decision (whether we need
                        // to send this key/elem to bucket x or bucket y).
                        // recompute the hash
                        hash := t.hasher(k2, uintptr(h.hash0))
                        // an iterator is running and the key doesn't equal itself (e.g. NaN)
                        if h.flags&iterator != 0 && !t.reflexivekey() && !t.key.equal(k2, k2) {
                            // If key != key (NaNs), then the hash could be (and probably
                            // will be) entirely different from the old hash. Moreover,
                            // it isn't reproducible. Reproducibility is required in the
                            // presence of iterators, as our evacuation decision must
                            // match whatever decision the iterator made.
                            // Fortunately, we have the freedom to send these keys either
                            // way. Also, tophash is meaningless for these kinds of keys.
                            // We let the low bit of tophash drive the evacuation decision.
                            // We recompute a new random tophash for the next level so
                            // these keys will get evenly distributed across all buckets
                            // after multiple grows.
                            useY = top & 1
                            top = tophash(hash)
                        } else {
                            if hash&newbit != 0 {
                                useY = 1
                            }
                        }
                    }
    
                    if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
                        throw("bad evacuatedN")
                    }
    
                    b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
                    // pick X or Y as the destination
                    dst := &xy[useY]                 // evacuation destination
    
                    // the destination bucket is full
                    if dst.i == bucketCnt {
                        // chain a new overflow bucket
                        dst.b = h.newoverflow(t, dst.b)
                        dst.i = 0
                        dst.k = add(unsafe.Pointer(dst.b), dataOffset)
                        dst.e = add(dst.k, bucketCnt*uintptr(t.keysize))
                    }
                    dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
                    if t.indirectkey() {
                        // key stored indirectly: copy the pointer to the new slot
                        *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
                    } else {
                        // key stored by value: copy it to the new slot
                        typedmemmove(t.key, dst.k, k) // copy key
                    }
                    if t.indirectelem() {
                        // elem stored indirectly: copy the pointer to the new slot
                        *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
                    } else {
                        // elem stored by value: copy it to the new slot
                        typedmemmove(t.elem, dst.e, e)
                    }
                    dst.i++
                    // These updates might push these pointers past the end of the
                    // key or elem arrays.  That's ok, as we have the overflow pointer
                    // at the end of the bucket to protect against pointing past the
                    // end of the bucket.
                    dst.k = add(dst.k, uintptr(t.keysize))
                    dst.e = add(dst.e, uintptr(t.elemsize))
                }
            }
            // Unlink the overflow buckets & clear key/elem to help GC.
            // if no goroutine is iterating over the old buckets, clear them to help GC
            if h.flags&oldIterator == 0 && t.bucket.ptrdata != 0 {
                b := add(h.oldbuckets, oldbucket*uintptr(t.bucketsize))
                // Preserve b.tophash because the evacuation
                // state is maintained there.
                ptr := add(b, dataOffset)
                n := uintptr(t.bucketsize) - dataOffset
                memclrHasPointers(ptr, n)
            }
        }
    
        if oldbucket == h.nevacuate {
            advanceEvacuationMark(h, t, newbit)
        }
    }
    

For performance, Go's hmap grows incrementally, much like Redis's progressive rehashing: each growWork call evacuates at most two buckets.

A same-size grow only evacuates into the X destination; when the bucket array doubles, both X (the low half of the new array) and Y (the high half) are used. This design exists so that reads and writes can proceed while the map grows, which also means growing is not an atomic operation.
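The helpers growWork uses to map a bucket back to its pre-growth index (go1.16 runtime/map.go):

// noldbuckets calculates the number of buckets prior to the current map growth.
func (h *hmap) noldbuckets() uintptr {
    oldB := h.B
    if !h.sameSizeGrow() {
        oldB--
    }
    return bucketShift(oldB)
}

// oldbucketmask provides a mask that can be applied to calculate n % noldbuckets().
func (h *hmap) oldbucketmask() uintptr {
    return h.noldbuckets() - 1
}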

Iteration

for k, v := range m1 {
    // do something
}

At compile time, ranging over a map is rewritten into a call to mapiterinit:

func mapiterinit(t *maptype, h *hmap, it *hiter) {
    if raceenabled && h != nil {
        callerpc := getcallerpc()
        racereadpc(unsafe.Pointer(h), callerpc, funcPC(mapiterinit))
    }

    if h == nil || h.count == 0 {
        return
    }

    if unsafe.Sizeof(hiter{})/sys.PtrSize != 12 {
        throw("hash_iter size incorrect") // see cmd/compile/internal/gc/reflect.go
    }
    it.t = t
    it.h = h

    // grab snapshot of bucket state
    it.B = h.B
    it.buckets = h.buckets
    if t.bucket.ptrdata == 0 {
        // Allocate the current slice and remember pointers to both current and old.
        // This preserves all relevant overflow buckets alive even if
        // the table grows and/or overflow buckets are added to the table
        // while we are iterating.
        h.createOverflow()
        it.overflow = h.extra.overflow
        it.oldoverflow = h.extra.oldoverflow
    }

    // decide where to start
    r := uintptr(fastrand())
    if h.B > 31-bucketCntBits {
        r += uintptr(fastrand()) << 31
    }
    it.startBucket = r & bucketMask(h.B)
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

    // iterator state
    it.bucket = it.startBucket

    // Remember we have an iterator.
    // Can run concurrently with another mapiterinit().
    if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
        atomic.Or8(&h.flags, iterator|oldIterator)
    }

    mapiternext(it)
}

hiter is the iterator struct:

type hiter struct {
    key         unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/compile/internal/gc/range.go).
    elem        unsafe.Pointer // Must be in second position (see cmd/compile/internal/gc/range.go).
    t           *maptype
    h           *hmap
    buckets     unsafe.Pointer // bucket ptr at hash_iter initialization time
    bptr        *bmap          // current bucket
    overflow    *[]*bmap       // keeps overflow buckets of hmap.buckets alive
    oldoverflow *[]*bmap       // keeps overflow buckets of hmap.oldbuckets alive
    startBucket uintptr        // bucket iteration started at
    offset      uint8          // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
    wrapped     bool           // already wrapped around from end of bucket array to beginning
    B           uint8
    i           uint8
    bucket      uintptr
    checkBucket uintptr
}

mapiterinit's main job is to initialize an hiter iterator and then call mapiternext to do the actual iteration.

Why is map iteration unordered? For one thing, growth moves key/value pairs to new positions.

https://github.com/golang/go/blob/go1.16.10/src/runtime/map.go#L831-L837

    // decide where to start
    r := uintptr(fastrand())
    if h.B > 31-bucketCntBits {
        r += uintptr(fastrand()) << 31
    }
    it.startBucket = r & bucketMask(h.B)
    it.offset = uint8(r >> h.B & (bucketCnt - 1))

To stamp out any accidental ordering, Go even starts iteration at a randomly chosen bucket, and the starting offset within each bucket is randomized too.
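A quick demonstration; run it a few times and the output order changes between runs:

package main

import "fmt"

func main() {
    m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4}
    for k, v := range m {
        fmt.Println(k, v) // order varies from run to run
    }
}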