
Go stack allocation optimizations: a visual guide

Why this matters

Go programs allocate memory in two places: the stack and the heap.

  • Stack — fast, free cleanup. Memory lives in the function’s stack frame and vanishes when the function returns. No garbage collector needed.
  • Heap — slower, tracked by GC. The runtime must find free space, record the pointer, and eventually reclaim it.

The fewer heap allocations your program makes, the faster it runs and the less pressure on the garbage collector. Across Go 1.24, 1.25, and 1.26, the compiler learned to keep common slice patterns on the stack instead of the heap — automatically, with no code changes from you.

This post walks through each optimization step by step, with diagrams showing exactly what happens in memory.


Part 1: The problem

Here’s the most natural way to build a slice in Go:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Simple and readable. But before Go 1.26, this code has a hidden cost: every time the slice outgrows its backing array, Go allocates a new, larger array on the heap and copies everything over. The old arrays become garbage.

Here is what happens, step by step, when we append 5 elements.

Step 1 of 5: first append. The slice starts as nil. append allocates a brand-new backing array on the heap with capacity 1.

  Stack (fast, auto-freed):  tasks (len=1, cap=1)
  Heap (slower, GC-managed): [1]task backing array   ← 1 heap alloc

One element, one heap allocation. So far so good.

Steps 2 through 5 repeat the pattern: whenever the slice outgrows its capacity (1 → 2 → 4 → 8), append allocates a larger heap array, copies the elements over, and abandons the old array as garbage.

The takeaway: for just 5 elements, we made 4 heap allocations and created 3 garbage arrays. This “startup cost” of doubling is what Go 1.24–1.26 progressively eliminates.
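You can watch this growth from inside a program. The sketch below counts capacity changes as a proxy for backing-array reallocations; the exact capacities (and therefore the count) depend on your Go version's growth policy, which is precisely what makes it a useful probe across versions:

```go
package main

import "fmt"

func main() {
	var tasks []int
	prevCap := 0
	growths := 0
	for i := 0; i < 5; i++ {
		tasks = append(tasks, i)
		if cap(tasks) != prevCap {
			// Capacity changed: append allocated a new backing array.
			growths++
			prevCap = cap(tasks)
		}
		fmt.Printf("after append %d: len=%d cap=%d\n", i+1, len(tasks), cap(tasks))
	}
	fmt.Println("backing-array reallocations:", growths)
}
```

On a compiler without the stack-buffer optimizations, this typically reports 4 reallocations for 5 elements, matching the count above.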


Part 2: Go 1.24 — constant-capacity slices move to stack

The first optimization targets a specific pattern: when you call make with a compile-time constant capacity.

tasks := make([]task, 0, 10)   // capacity is the constant 10

Because the compiler knows the exact size at compile time, it can reserve space for the backing array directly in the function’s stack frame — no heap allocation at all.

  Stack (fast, auto-freed):  tasks (len=0, cap=10), [10]task backing array
  Heap (slower, GC-managed): (nothing allocated)   ← 0 heap allocs

The backing array lives on the stack. Zero heap allocations, zero GC pressure.

As long as you don’t exceed the capacity, every append is free. If you do exceed it, Go falls back to a heap allocation — same as before.

The catch: the capacity must be a constant. If it comes from a variable, this doesn’t help:

tasks := make([]task, 0, n)   // n is a variable → still heap-allocated in Go 1.24
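You can ask the compiler which call sites qualify. A minimal sketch (the task type and function names here are made up for illustration): build it with `go build -gcflags=-m` and read the escape-analysis diagnostics. The exact wording varies between releases, but a constant-capacity make that stays local is reported as not escaping, while the variable-capacity one goes to the heap:

```go
package main

import "fmt"

type task struct{ id int64 }

// constantCap uses a compile-time constant capacity, so Go 1.24+ can place
// the backing array in this function's stack frame.
func constantCap() int {
	tasks := make([]task, 0, 10)
	tasks = append(tasks, task{id: 1})
	return len(tasks)
}

// variableCap uses a runtime value for the capacity; in Go 1.24 this is
// always heap-allocated.
func variableCap(n int) int {
	tasks := make([]task, 0, n)
	tasks = append(tasks, task{id: 1})
	return len(tasks)
}

func main() {
	fmt.Println(constantCap(), variableCap(10))
}
```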

Part 3: Go 1.25 — the 32-byte speculative buffer

Go 1.25 asks a simple question: what if we optimistically reserve a small fixed buffer on the stack, and use it when the variable-sized request fits?

The compiler inserts a 32-byte buffer into the stack frame. At runtime, if the requested make capacity fits within 32 bytes, the slice uses the stack buffer. If not, it falls back to the heap.

tasks := make([]task, 0, n)   // variable capacity — but might fit in 32 bytes

Conceptually, the compiler transforms this into (pseudocode; the real transformation works on properly typed memory):

var buf [32]byte              // always on the stack
if n * sizeof(task) <= 32 {
    tasks = buf[:0:n]         // point slice at the stack buffer
} else {
    tasks = make([]task, 0, n) // too big — use the heap
}
  Stack (fast, auto-freed):  tasks (len=0, cap=n), [32]byte speculative buffer
  Heap (slower, GC-managed): (nothing allocated)   ← 0 heap allocs
If your task struct is 8 bytes, up to 4 tasks fit in the 32-byte buffer — zero heap allocations.

This happens automatically — no code changes. The 32-byte cost is always paid in stack space, but stack space is effectively free.
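One way to observe the effect is testing.AllocsPerRun from the standard library, which reports average heap allocations per call. A sketch under assumed names (build, task): the slice is reduced to a value inside the function so it never escapes, which is a precondition for the optimization. The printed counts depend on your Go version; on 1.25+ the small case can report zero:

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int64 } // 8 bytes, so four fit in a 32-byte buffer

// build makes a slice with a runtime capacity, fills it, and reduces it to
// a single value so the slice never escapes the function.
func build(n int) int64 {
	tasks := make([]task, 0, n)
	for i := 0; i < n; i++ {
		tasks = append(tasks, task{id: int64(i)})
	}
	var sum int64
	for _, t := range tasks {
		sum += t.id
	}
	return sum
}

var sink int64 // keeps the calls from being optimized away

func main() {
	for _, n := range []int{4, 100} {
		allocs := testing.AllocsPerRun(1000, func() { sink = build(n) })
		fmt.Printf("n=%3d: %.0f heap allocs per call\n", n, allocs)
	}
}
```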


Part 4: Go 1.26 — append on nil slices uses the stack

This is the big one. Go 1.26 applies the same 32-byte buffer trick to the most common pattern of all: append on a nil slice.

var tasks []task              // nil slice — no make at all
for t := range c {
    tasks = append(tasks, t)  // first append uses the stack buffer
}

No make, no capacity guess. Just declare and append. The compiler handles the rest.

The comparison below shows the same 5 appends side by side — the old heap-heavy behavior on the left, Go 1.26’s stack-optimized behavior on the right:

Step 1 of 5: first append. The slice starts as nil. Watch where the first element goes.

Before (Go < 1.26), 1 heap alloc:
  Stack (fast, auto-freed):  tasks (len=1, cap=1)
  Heap (slower, GC-managed): [1]task

Go 1.26, 0 heap allocs:
  Stack (fast, auto-freed):  tasks (len=1, cap=4), [32]byte buffer
  Heap (slower, GC-managed): (nothing allocated)

In the remaining steps, the first four elements fill the 32-byte stack buffer; only the fifth append forces a single heap allocation.

The result for 5 elements:

  • Before (Part 1): 4 heap allocations, 3 garbage arrays
  • Go 1.26: 1 heap allocation, 0 garbage arrays

And if your slice stays under 4 elements (for 8-byte structs)? Zero heap allocations entirely.
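Here is the pattern in a runnable form, with hypothetical names (sumEven, task) for illustration. The slice is declared nil, grown with append, and consumed locally, so it never escapes; under Go 1.26 the first few elements can live entirely in a stack buffer, on older versions the code is identical but heap-allocates:

```go
package main

import "fmt"

type task struct{ id int64 }

// sumEven builds a temporary slice the "natural" way: declare nil, append.
// The slice is consumed inside the function, so it never escapes.
func sumEven(items []task) int64 {
	var evens []task
	for _, t := range items {
		if t.id%2 == 0 {
			evens = append(evens, t)
		}
	}
	var sum int64
	for _, t := range evens {
		sum += t.id
	}
	return sum
}

func main() {
	items := []task{{0}, {1}, {2}, {3}, {4}}
	fmt.Println(sumEven(items)) // 0 + 2 + 4 = 6
}
```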


Part 5: What about returned slices?

There’s one more case: what if the function returns the slice? The slice “escapes” the function, so its data can’t stay on the caller’s stack — the stack frame is about to be freed.

func extract(c chan task) []task {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    return tasks   // slice escapes!
}

Go 1.26 handles this with runtime.move2heap: right before returning, the compiler checks if the slice still points to the stack buffer. If so, it copies the data to a right-sized heap allocation. If the slice already overflowed to the heap, it does nothing.

Step 1 of 3: building the slice (same as Part 4). append uses the stack buffer while building. In this example we have 3 elements, and they all fit.

  Stack (fast, auto-freed):  tasks (len=3, cap=4), [32]byte buffer
  Heap (slower, GC-managed): (nothing allocated)   ← 0 heap allocs

So far, identical to the non-escaping case. In the remaining steps, runtime.move2heap runs at the return: it copies the 3 elements into a right-sized [3]task heap array, and the returned slice points there. One heap allocation, sized exactly to the data.
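The important property is that this is invisible to the caller: the returned slice behaves identically on every Go version, only the allocation strategy differs. A sketch (extract and task are illustration names, as above):

```go
package main

import "fmt"

type task struct{ id int64 }

// extract returns the slice, so its data must outlive this stack frame.
// Under Go 1.26 the elements may be accumulated in a stack buffer and then
// copied to a right-sized heap array just before the return; on older
// versions they are heap-allocated from the start. Either way the caller
// sees the same valid slice.
func extract(ids []int64) []task {
	var tasks []task
	for _, id := range ids {
		tasks = append(tasks, task{id: id})
	}
	return tasks
}

func main() {
	got := extract([]int64{1, 2, 3})
	// The returned slice is safely heap-backed here, long after extract's
	// stack frame is gone.
	fmt.Println(len(got), got[0].id, got[2].id) // 3 1 3
}
```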

Summary: allocation counts by Go version

Pattern           1.24      1.25      1.26
Constant make     0*        0         0
Variable make     1         0–1*      0–1
append on nil     ~log n    ~log n    0–1*
Returned slice    1+        1+        1*

* = version that introduced the optimization. Counts are heap allocations; ~log n reflects the repeated doubling of the backing array as n elements are appended.


What this means for your code

The natural, simple way to write Go is now also the efficient way:

Old style: manual optimization
func collect(c chan task) []task {
    tasks := make([]task, 0, 8)
    for t := range c {
        tasks = append(tasks, t)
    }
    result := make([]task, len(tasks))
    copy(result, tasks)
    return result
}
Go 1.26+: just write it
func collect(c chan task) []task {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    return tasks
}
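If you want to check the difference on your own toolchain, the two styles can be compared directly with testing.AllocsPerRun. This sketch uses assumed names (fill, collectOld, collectSimple); the channel setup allocates too, identically for both calls, so the numbers are comparable, but the absolute counts depend on your Go version:

```go
package main

import (
	"fmt"
	"testing"
)

type task struct{ id int64 }

// fill returns a closed, pre-filled channel of 3 tasks.
func fill() chan task {
	c := make(chan task, 3)
	for i := int64(1); i <= 3; i++ {
		c <- task{id: i}
	}
	close(c)
	return c
}

// collectOld is the manual style: pre-allocate, then copy to a right-sized result.
func collectOld(c chan task) []task {
	tasks := make([]task, 0, 8)
	for t := range c {
		tasks = append(tasks, t)
	}
	result := make([]task, len(tasks))
	copy(result, tasks)
	return result
}

// collectSimple is the natural style: declare nil, append, return.
func collectSimple(c chan task) []task {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	return tasks
}

func main() {
	old := testing.AllocsPerRun(100, func() { _ = collectOld(fill()) })
	simple := testing.AllocsPerRun(100, func() { _ = collectSimple(fill()) })
	fmt.Printf("manual pre-allocate+copy: %.0f allocs/call\n", old)
	fmt.Printf("plain var+append:        %.0f allocs/call\n", simple)
}
```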

When you should still pre-allocate manually:

  • You have a good size estimate and the data exceeds 32 bytes — make([]T, 0, n) avoids the startup doubling entirely
  • Hot paths where even 1 allocation matters — but profile first
  • Very large slices where the 32-byte buffer won’t help

For everything else, write the simple code and let the compiler optimize it. Go is steadily closing the gap between “easy to write” and “fast to run.”

Based on Keith Randall’s Go blog post on allocation optimizations.