Why Go's stacks live on the heap

Go's memory model and garbage collection are closely tied to its high-concurrency design.

What a goroutine stack is for:
- the goroutine's execution path (its chain of call frames)
- local variables: a variable declared inside a function and used only there is kept on the goroutine stack
- passing arguments into functions
- passing function return values back out

Goroutine stacks live inside Go's heap memory, so stack memory is also released through the GC.
Go's heap in turn lives in the virtual memory that the operating system provides (the OS gives every process its own virtual address space).
What happens when a goroutine stack runs out

A goroutine stack starts small, at roughly 2 KB. Two situations can exhaust it: local variables that are too large, and too many stack frames.

Escape analysis moves some variables that would otherwise sit on the goroutine stack onto the heap. Not every variable can stay on the stack:
- variables that must still be usable after their stack frame is reclaimed
- local variables too large for the frame to hold
There are three kinds of escape:

Pointer escape
- the function returns a pointer to a local object

Empty-interface escape
- when a function's parameter type is interface{} (functions taking interface{} often use reflection). Passing an argument as an empty interface does not by itself make it escape, but using reflection on it inside the function does.

Large-variable escape
- on a 64-bit machine, variables larger than roughly 64 KB generally escape to the heap
Too many stack frames exhaust the stack

Before each function call, the goroutine checks (in the morestack prologue) whether the stack still has enough room; if not, the stack must grow.

There are two strategies for growing the stack:

Segmented stacks (before Go 1.3)
- when the stack runs out, allocate a new segment. But once the callee finishes, execution returns to the old segment and the new one is freed; calling another function then allocates yet another segment, which is freed again when that call returns, over and over (the "hot split" problem).
- advantage: no wasted space
- drawback: the stack pointer keeps jumping back and forth between non-contiguous regions
Contiguous stacks (since Go 1.3)
- when the stack runs short, grow it to 2x its current size; when less than 1/4 of it is in use, shrink it to 1/2
- growing allocates a new, larger stack and copies the entire contents of the old stack into it
- advantage: the stack is always contiguous
- drawback: growing and shrinking are relatively expensive because of the copy
The structure of Go's heap memory

Go's heap architecture is modeled on TCMalloc.

The operating system gives each application a virtual address space (not the same thing as Windows' on-disk "virtual memory"): processes are not allowed to touch physical memory directly, so the OS hands each process virtual memory instead. On Linux this memory can be managed with mmap and madvise.
Go acquires virtual memory through heapArena units: each request claims 64 MB, and at most 4,194,304 (2^22) such units can be mapped.
These memory units are called heapArenas, and all heapArenas together make up the mheap (Go's heap).
A heapArena describes one 64 MB region of virtual memory:
```go
// A heapArena stores metadata for a heap arena. heapArenas are stored
// outside of the Go heap and accessed via the mheap_.arenas index.
type heapArena struct {
	// bitmap stores the pointer/scalar bitmap for the words in
	// this arena. See mbitmap.go for a description.
	// This array uses 1 bit per word of heap, or 1.6% of the heap size (for 64-bit).
	bitmap [heapArenaBitmapWords]uintptr
}
```
All the heapArenas form the mheap:

```go
// Main malloc heap.
// The heap itself is the "free" and "scav" treaps,
// but all the other global data is here too.
//
// mheap must not be heap-allocated because it contains mSpanLists,
// which must not be heap-allocated.
type mheap struct {
	// arenas is the heap arena map. It points to the metadata for
	// the heap for every arena frame of the entire usable virtual
	// address space.
	//
	// Use arenaIndex to compute indexes into this array.
	//
	// For regions of the address space that are not backed by the
	// Go heap, the arena map contains nil.
	//
	// Modifications are protected by mheap_.lock. Reads can be
	// performed without locking; however, a given entry can
	// transition from nil to non-nil at any time when the lock
	// isn't held. (Entries never transitions back to nil.)
	//
	// In general, this is a two-level mapping consisting of an L1
	// map and possibly many L2 maps. This saves space when there
	// are a huge number of arena frames. However, on many
	// platforms (even 64-bit), arenaL1Bits is 0, making this
	// effectively a single-level map. In this case, arenas[0]
	// will never be nil.
	arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

	// central free lists for small size classes.
	// the padding makes sure that the mcentrals are
	// spaced CacheLinePadSize bytes apart, so that each mcentral.lock
	// gets its own cache line.
	// central is indexed by spanClass.
	central [numSpanClasses]struct {
		mcentral mcentral
		pad      [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
	}
}
```
heapArena memory allocation: linear allocation

Linear (bump-pointer) allocation places each object or value right after the previous one. When garbage collection removes some of those objects, it leaves gaps behind, yet the next allocation still continues from where the last one ended: that is linear allocation.

One could gather those gaps into a linked-list-like structure and drop small objects into them, but a larger object still has to go at the end. So, as objects are allocated and collected, memory fills with fragments that only small objects fit into; large objects cannot be placed there.
To solve this, Go uses a tiered (size-class) allocation scheme: it cuts each heapArena into small slots and gives every object the smallest slot that fits it. A run of identical slots forms what Go calls an mspan; every mspan is n slots of the same size.
Go defines 67 mspan size classes (68 counting class 0). The smallest class has 8-byte slots, 1024 of them per 8 KB span, with a worst-case waste of 87.50% of the space. The full mapping is:
class | bytes/obj | bytes/span | objects | tail waste | max waste | min align |
---|---|---|---|---|---|---|
1 | 8 | 8192 | 1024 | 0 | 87.50% | 8 |
2 | 16 | 8192 | 512 | 0 | 43.75% | 16 |
3 | 24 | 8192 | 341 | 0 | 29.24% | 8 |
4 | 32 | 8192 | 256 | 0 | 21.88% | 32 |
5 | 48 | 8192 | 170 | 32 | 31.52% | 16 |
6 | 64 | 8192 | 128 | 0 | 23.44% | 64 |
7 | 80 | 8192 | 102 | 32 | 19.07% | 16 |
8 | 96 | 8192 | 85 | 32 | 15.95% | 32 |
9 | 112 | 8192 | 73 | 16 | 13.56% | 16 |
10 | 128 | 8192 | 64 | 0 | 11.72% | 128 |
11 | 144 | 8192 | 56 | 128 | 11.82% | 16 |
12 | 160 | 8192 | 51 | 32 | 9.73% | 32 |
13 | 176 | 8192 | 46 | 96 | 9.59% | 16 |
14 | 192 | 8192 | 42 | 128 | 9.25% | 64 |
15 | 208 | 8192 | 39 | 80 | 8.12% | 16 |
16 | 224 | 8192 | 36 | 128 | 8.15% | 32 |
17 | 240 | 8192 | 34 | 32 | 6.62% | 16 |
18 | 256 | 8192 | 32 | 0 | 5.86% | 256 |
19 | 288 | 8192 | 28 | 128 | 12.16% | 32 |
20 | 320 | 8192 | 25 | 192 | 11.80% | 64 |
21 | 352 | 8192 | 23 | 96 | 9.88% | 32 |
22 | 384 | 8192 | 21 | 128 | 9.51% | 128 |
23 | 416 | 8192 | 19 | 288 | 10.71% | 32 |
24 | 448 | 8192 | 18 | 128 | 8.37% | 64 |
25 | 480 | 8192 | 17 | 32 | 6.82% | 32 |
26 | 512 | 8192 | 16 | 0 | 6.05% | 512 |
27 | 576 | 8192 | 14 | 128 | 12.33% | 64 |
28 | 640 | 8192 | 12 | 512 | 15.48% | 128 |
29 | 704 | 8192 | 11 | 448 | 13.93% | 64 |
30 | 768 | 8192 | 10 | 512 | 13.94% | 256 |
31 | 896 | 8192 | 9 | 128 | 15.52% | 128 |
32 | 1024 | 8192 | 8 | 0 | 12.40% | 1024 |
33 | 1152 | 8192 | 7 | 128 | 12.41% | 128 |
34 | 1280 | 8192 | 6 | 512 | 15.55% | 256 |
35 | 1408 | 16384 | 11 | 896 | 14.00% | 128 |
36 | 1536 | 8192 | 5 | 512 | 14.00% | 512 |
37 | 1792 | 16384 | 9 | 256 | 15.57% | 256 |
38 | 2048 | 8192 | 4 | 0 | 12.45% | 2048 |
39 | 2304 | 16384 | 7 | 256 | 12.46% | 256 |
40 | 2688 | 8192 | 3 | 128 | 15.59% | 128 |
41 | 3072 | 24576 | 8 | 0 | 12.47% | 1024 |
42 | 3200 | 16384 | 5 | 384 | 6.22% | 128 |
43 | 3456 | 24576 | 7 | 384 | 8.83% | 128 |
44 | 4096 | 8192 | 2 | 0 | 15.60% | 4096 |
45 | 4864 | 24576 | 5 | 256 | 16.65% | 256 |
46 | 5376 | 16384 | 3 | 256 | 10.92% | 256 |
47 | 6144 | 24576 | 4 | 0 | 12.48% | 2048 |
48 | 6528 | 32768 | 5 | 128 | 6.23% | 128 |
49 | 6784 | 40960 | 6 | 256 | 4.36% | 128 |
50 | 6912 | 49152 | 7 | 768 | 3.37% | 256 |
51 | 8192 | 8192 | 1 | 0 | 15.61% | 8192 |
52 | 9472 | 57344 | 6 | 512 | 14.28% | 256 |
53 | 9728 | 49152 | 5 | 512 | 3.64% | 512 |
54 | 10240 | 40960 | 4 | 0 | 4.99% | 2048 |
55 | 10880 | 32768 | 3 | 128 | 6.24% | 128 |
56 | 12288 | 24576 | 2 | 0 | 11.45% | 4096 |
57 | 13568 | 40960 | 3 | 256 | 9.99% | 256 |
58 | 14336 | 57344 | 4 | 0 | 5.35% | 2048 |
59 | 16384 | 16384 | 1 | 0 | 12.49% | 8192 |
60 | 18432 | 73728 | 4 | 0 | 11.11% | 2048 |
61 | 19072 | 57344 | 3 | 128 | 3.57% | 128 |
62 | 20480 | 40960 | 2 | 0 | 6.87% | 4096 |
63 | 21760 | 65536 | 3 | 256 | 6.25% | 256 |
64 | 24576 | 24576 | 1 | 0 | 11.45% | 8192 |
65 | 27264 | 81920 | 3 | 128 | 10.00% | 128 |
66 | 28672 | 57344 | 2 | 0 | 4.91% | 4096 |
67 | 32768 | 32768 | 1 | 0 | 12.50% | 8192 |
An mspan is a linked-list node: next points at the next mspan and prev at the previous one. Each heapArena also stores a spans array:

```go
type mspan struct {
	next *mspan // next span in list, or nil if none
	prev *mspan // previous span in list, or nil if none
}

type heapArena struct {
	// spans maps from virtual address page ID within this arena to *mspan.
	// For allocated spans, their pages map to the span itself.
	// For free spans, only the lowest and highest pages map to the span itself.
	// Internal pages map to an arbitrary span.
	// For pages that have never been allocated, spans entries are nil.
	//
	// Modifications are protected by mheap.lock. Reads can be
	// performed without locking, but ONLY from indexes that are
	// known to contain in-use or stack spans. This means there
	// must not be a safe-point between establishing that an
	// address is live and looking it up in the spans array.
	spans [pagesPerArena]*mspan
}
```
A single heapArena does not contain every kind of mspan; spans are created as objects need them, so a central index is required: mcentral. There are 136 mcentral structures: 68 for mspans whose objects need GC scanning and 68 for mspans whose objects do not. An mcentral chains together all the spans of one class across the heapArenas, keeping spans that need GC on one list and spans that do not on another:

```go
// Central list of free objects of a given size.
type mcentral struct {
	_         sys.NotInHeap
	spanclass spanClass

	// partial and full contain two mspan sets: one of swept in-use
	// spans, and one of unswept in-use spans. These two trade
	// roles on each GC cycle. The unswept set is drained either by
	// allocation or by the background sweeper in every GC cycle,
	// so only two roles are necessary.
	//
	// sweepgen is increased by 2 on each GC cycle, so the swept
	// spans are in partial[sweepgen/2%2] and the unswept spans are in
	// partial[1-sweepgen/2%2]. Sweeping pops spans from the
	// unswept set and pushes spans that are still in-use on the
	// swept set. Likewise, allocating an in-use span pushes it
	// on the swept set.
	//
	// Some parts of the sweeper can sweep arbitrary spans, and hence
	// can't remove them from the unswept set, but will add the span
	// to the appropriate swept list. As a result, the parts of the
	// sweeper and mcentral that do consume from the unswept list may
	// encounter swept spans, and these should be ignored.
	partial [2]spanSet // list of spans with a free object
	full    [2]spanSet // list of spans with no free objects
}
```
mcentral works, but it has a performance problem: modifying any span means taking a lock, and that lock is contended. As with the local run queues described in the scheduling chapter, Go attaches a cache to each P: the mcache, which holds spans locally and removes most of that lock contention. The mcache structure looks like this:

```go
// Per-thread (in Go, per-P) cache for small objects.
// This includes a small object cache and local allocation stats.
// No locking needed because it is per-thread (per-P).
//
// mcaches are allocated from non-GC'd memory, so any heap pointers
// must be specially handled.
type mcache struct {
	_ sys.NotInHeap

	// The following members are accessed on every malloc,
	// so they are grouped here for better caching.
	nextSample uintptr // trigger heap sample after allocating this many bytes
	scanAlloc  uintptr // bytes of scannable heap allocated

	// Allocator cache for tiny objects w/o pointers.
	// See "Tiny allocator" comment in malloc.go.
	//
	// tiny points to the beginning of the current tiny block, or
	// nil if there is no current tiny block.
	//
	// tiny is a heap pointer. Since mcache is in non-GC'd memory,
	// we handle it by clearing it in releaseAll during mark
	// termination.
	//
	// tinyAllocs is the number of tiny allocations performed
	// by the P that owns this mcache.
	tiny       uintptr
	tinyoffset uintptr
	tinyAllocs uintptr

	// The rest is not accessed on every malloc.
	alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass

	stackcache [_NumStackOrders]stackfreelist

	// flushGen indicates the sweepgen during which this mcache
	// was last flushed. If flushGen != mheap_.sweepgen, the spans
	// in this mcache are stale and need to the flushed so they
	// can be swept. This is done in acquirep.
	flushGen atomic.Uint32
}
```
The p struct carries an mcache field:

```go
type p struct {
	mcache *mcache
}
```
How does Go allocate heap memory?

Objects are split into tiers:
- Tiny: micro objects in (0, 16B), containing no pointers
- Small: small objects in [16B, 32KB]
- Large: big objects in (32KB, +∞)

Tiny and small objects are placed into the ordinary mspans (classes 1-67); large objects get a tailor-made class-0 mspan.
Tiny-object allocation

The allocator takes the class-2 mspan from the mcache (one class-2 slot is 16 bytes) and packs several tiny objects together into a single 16-byte block before storing it:

```go
// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	delayedZeroing := false
	if size <= maxSmallSize {
		if noscan && size < maxTinySize {
			// Tiny allocator.
			//
			// Tiny allocator combines several tiny allocation requests
			// into a single memory block. The resulting memory block
			// is freed when all subobjects are unreachable. The subobjects
			// must be noscan (don't have pointers), this ensures that
			// the amount of potentially wasted memory is bounded.
			//
			// Size of the memory block used for combining (maxTinySize) is tunable.
			// Current setting is 16 bytes, which relates to 2x worst case memory
			// wastage (when all but one subobjects are unreachable).
			// 8 bytes would result in no wastage at all, but provides less
			// opportunities for combining.
			// 32 bytes provides more opportunities for combining,
			// but can lead to 4x worst case wastage.
			// The best case winning is 8x regardless of block size.
			//
			// Objects obtained from tiny allocator must not be freed explicitly.
			// So when an object will be freed explicitly, we ensure that
			// its size >= maxTinySize.
			//
			// SetFinalizer has a special case for objects potentially coming
			// from tiny allocator, it such case it allows to set finalizers
			// for an inner byte of a memory block.
			//
			// The main targets of tiny allocator are small strings and
			// standalone escaping variables. On a json benchmark
			// the allocator reduces number of allocations by ~12% and
			// reduces heap size by ~20%.
			off := c.tinyoffset
			// Align tiny pointer for required (conservative) alignment.
			if size&7 == 0 {
				off = alignUp(off, 8)
			} else if goarch.PtrSize == 4 && size == 12 {
				// Conservatively align 12-byte objects to 8 bytes on 32-bit
				// systems so that objects whose first field is a 64-bit
				// value is aligned to 8 bytes and does not cause a fault on
				// atomic access. See issue 37262.
				// TODO(mknyszek): Remove this workaround if/when issue 36606
				// is resolved.
				off = alignUp(off, 8)
			} else if size&3 == 0 {
				off = alignUp(off, 4)
			} else if size&1 == 0 {
				off = alignUp(off, 2)
			}
			if off+size <= maxTinySize && c.tiny != 0 {
				// The object fits into existing tiny block.
				x = unsafe.Pointer(c.tiny + off)
				c.tinyoffset = off + size
				c.tinyAllocs++
				mp.mallocing = 0
				releasem(mp)
				return x
			}
			// Allocate a new maxTinySize block.
			span = c.alloc[tinySpanClass]
			v := nextFreeFast(span)
			if v == 0 {
				v, span, shouldhelpgc = c.nextFree(tinySpanClass)
			}
			x = unsafe.Pointer(v)
			(*[2]uint64)(x)[0] = 0
			(*[2]uint64)(x)[1] = 0
			// See if we need to replace the existing tiny block with the new one
			// based on amount of remaining free space.
			if !raceenabled && (size < c.tinyoffset || c.tiny == 0) {
				// Note: disabled when race detector is on, see comment near end of this function.
				c.tiny = uintptr(x)
				c.tinyoffset = size
			}
			size = maxTinySize
		}
	}
	return x
}
```
Small-object allocation

```go
// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	delayedZeroing := false
	if size <= maxSmallSize {
		if !(noscan && size < maxTinySize) {
			var sizeclass uint8
			if size <= smallSizeMax-8 {
				sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
			} else {
				sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
			}
			size = uintptr(class_to_size[sizeclass])
			spc := makeSpanClass(sizeclass, noscan)
			span = c.alloc[spc]
			v := nextFreeFast(span)
			if v == 0 {
				v, span, shouldhelpgc = c.nextFree(spc)
			}
			x = unsafe.Pointer(v)
			if needzero && span.needzero != 0 {
				memclrNoHeapPointers(x, size)
			}
		}
	}
	return x
}
```
Swapping out an mcache span
- the mcache holds only one mspan per size class; when that mspan fills up, a fresh one is fetched from mcentral

Growing mcentral
- mcentral holds a limited number of mspans; when it runs short, new mspans are carved out of a heapArena

refill fetches a replacement mspan from the central list (and the central list grows when it has none left):

```go
// refill acquires a new span of span class spc for c. This span will
// have at least one free object. The current span in c must be full.
//
// Must run in a non-preemptible context since otherwise the owner of
// c could change.
func (c *mcache) refill(spc spanClass) {
	// Return the current cached span to the central lists.
	s := c.alloc[spc]
	if uintptr(s.allocCount) != s.nelems {
		throw("refill of span with free space remaining")
	}
	if s != &emptymspan {
		// Mark this span as no longer cached.
		if s.sweepgen != mheap_.sweepgen+3 {
			throw("bad sweepgen in refill")
		}
		mheap_.central[spc].mcentral.uncacheSpan(s)

		// Count up how many slots were used and record it.
		stats := memstats.heapStats.acquire()
		slotsUsed := int64(s.allocCount) - int64(s.allocCountBeforeCache)
		atomic.Xadd64(&stats.smallAllocCount[spc.sizeclass()], slotsUsed)

		// Flush tinyAllocs.
		if spc == tinySpanClass {
			atomic.Xadd64(&stats.tinyAllocCount, int64(c.tinyAllocs))
			c.tinyAllocs = 0
		}
		memstats.heapStats.release()

		// Count the allocs in inconsistent, internal stats.
		bytesAllocated := slotsUsed * int64(s.elemsize)
		gcController.totalAlloc.Add(bytesAllocated)

		// Clear the second allocCount just to be safe.
		s.allocCountBeforeCache = 0
	}

	// Get a new cached span from the central lists.
	s = mheap_.central[spc].mcentral.cacheSpan()
	if s == nil {
		throw("out of memory")
	}
	if uintptr(s.allocCount) == s.nelems {
		throw("span has no free space")
	}

	// Indicate that this span is cached and prevent asynchronous
	// sweeping in the next sweep phase.
	s.sweepgen = mheap_.sweepgen + 3

	// Store the current alloc count for accounting later.
	s.allocCountBeforeCache = s.allocCount

	// Update heapLive and flush scanAlloc.
	//
	// We have not yet allocated anything new into the span, but we
	// assume that all of its slots will get used, so this makes
	// heapLive an overestimate.
	//
	// When the span gets uncached, we'll fix up this overestimate
	// if necessary (see releaseAll).
	//
	// We pick an overestimate here because an underestimate leads
	// the pacer to believe that it's in better shape than it is,
	// which appears to lead to more memory used. See #53738 for
	// more details.
	usedBytes := uintptr(s.allocCount) * s.elemsize
	gcController.update(int64(s.npages*pageSize)-int64(usedBytes), int64(c.scanAlloc))
	c.scanAlloc = 0

	c.alloc[spc] = s
}
```
Large-object allocation

A large object gets a class-0 mspan carved directly out of a heapArena; class 0 is sized to fit that specific object:

```go
func mallocgc() {
	if size > maxSmallSize {
		shouldhelpgc = true
		// For large allocations, keep track of zeroed state so that
		// bulk zeroing can be happen later in a preemptible context.
		span = c.allocLarge(size, noscan)
		span.freeindex = 1
		span.allocCount = 1
		size = span.elemsize
		x = unsafe.Pointer(span.base())
		if needzero && span.needzero != 0 {
			if noscan {
				delayedZeroing = true
			} else {
				memclrNoHeapPointers(x, size)
				// We've in theory cleared almost the whole span here,
				// and could take the extra step of actually clearing
				// the whole thing. However, don't. Any GC bits for the
				// uncleared parts will be zero, and it's just going to
				// be needzero = 1 once freed anyway.
			}
		}
	}
}
```
Garbage collection

There are three classic approaches to garbage collection:

Mark-sweep
- start from the roots, mark every reachable object, then sweep away the unmarked ones
- drawback: leaves memory fragmentation

Mark-compact
- mark-sweep plus an extra compaction step that slides all live objects toward the front
- drawback: compaction is expensive

Copying
- after marking, the surviving objects are compacted by copying them into a second memory region
- drawback: half the memory is wasted

Thanks to the particular advantages of its heap layout (size classes already bound fragmentation), Go chose the simplest scheme, mark-sweep: find every object that is still referenced; whatever remains is unreferenced.
Where does the search start?
- objects referenced by pointers on goroutine stacks
- objects referenced by pointers in global variables
- objects referenced by pointers in registers (variables being operated on right now)

Together these are called the Root Set (the GC roots). The search proceeds breadth-first (BFS); this is known as reachability analysis.
Tri-color marking (at the start of every scan, all objects are reset to white):
- Black: live, fully scanned. The object is reachable and all of its internal fields have been analyzed too.
- Grey: live, not yet fully scanned. The object itself has been found reachable, but its internal fields have not been analyzed.
- White: provisionally garbage (what gets swept at the end). Either the scan has not reached it yet, or the scan finished and it really is unreferenced.
Serial GC steps:
1. Stop The World: pause all other goroutines
2. find the unused heap memory via reachability analysis
3. release that heap memory
The hard part of concurrent garbage collection is the marking phase, addressed with write barriers:
- insertion barrier: a newly inserted object is forced grey. Without it, when a field is added to an already-scanned (black) node by concurrent business logic, the new referent can never be scanned and would be swept by mistake.
- deletion barrier: an object whose reference is removed is forced grey. Without it, when an already-scanned node drops one of its fields and that object is then re-attached to another already-scanned node, the object can never be reached by the scan and would be swept by mistake.
When GC is triggered:
- on a timer: sysmon, a loop that runs behind the runtime, checks periodically and forces a collection if none has happened within 2 minutes
- explicitly by the user: calling runtime.GC
- when allocating memory
GC tuning: produce as little garbage on the heap as possible.
- An empty struct points to a fixed address and occupies no space; for example, pass struct{} over a channel.
- Escapes push objects that would have stayed on the stack into the heap: the fmt package (its interface{} parameters), or returning a pointer instead of a copy.
- Cache-like objects, such as channel buffers, are created and torn down frequently.

Remedies: pool memory, reduce escapes, use empty structs.
GC analysis tools:
- go tool pprof
- go tool trace
- go build -gcflags="-m"
- GODEBUG="gctrace=1"