|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +nav-class: dark |
| 4 | +categories: mizvekov, clang |
| 5 | +title: Making the Clang AST Leaner and Faster |
| 6 | +author-id: mizvekov |
| 7 | +author-name: Matheus Izvekov |
| 8 | +--- |
| 9 | + |
| 10 | +Modern C++ codebases — from browsers to GPU frameworks — rely heavily on templates, and that often means *massive* abstract syntax trees. Even small inefficiencies in Clang’s AST representation can add up to noticeable compile-time overhead. |
| 11 | + |
| 12 | +This post walks through a set of structural improvements I recently made to Clang’s AST that make type representation smaller, simpler, and faster to create — leading to measurable build-time gains in real-world projects. |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +A couple of months ago, I landed [a large patch](https://github.com/llvm/llvm-project/pull/147835) in Clang that brought substantial compile-time improvements for heavily templated C++ code. |
| 17 | + |
| 18 | +For example, in [stdexec](https://github.com/NVIDIA/stdexec) — the reference implementation of the `std::execution` [feature slated for C++26](http://wg21.link/p2300) — the slowest test ([`test_on2.cpp`](https://github.com/NVIDIA/stdexec/blob/main/test/stdexec/algos/adaptors/test_on2.cpp)) saw a **7% reduction in build time**. |
| 19 | + |
| 20 | +The [Chromium](https://www.chromium.org/Home/) build also showed a **5% improvement** ([source](https://github.com/llvm/llvm-project/pull/147835#issuecomment-3278893447)). |
| 21 | + |
| 22 | +At a high level, the patch makes the Clang AST *leaner*: it reduces the memory footprint of type representations and lowers the cost of creating and uniquing them. |
| 23 | + |
| 24 | +These improvements will ship with **Clang 22**, expected in the next few months. |
| 25 | + |
| 26 | +--- |
| 27 | + |
| 28 | +## How elaboration and qualified names used to work |
| 29 | + |
| 30 | +Consider this simple snippet: |
| 31 | + |
| 32 | +```cpp |
| 33 | +namespace NS { |
| 34 | + struct A {}; |
| 35 | +} |
| 36 | +using T = struct NS::A; |
| 37 | +``` |
| 38 | + |
| 39 | +The type of `T` (`struct NS::A`) carries two pieces of information: |
| 40 | + |
| 41 | +1. It’s *elaborated* — the `struct` keyword appears. |
| 42 | +2. It’s *qualified* — `NS::` acts as a [*nested-name-specifier*](https://eel.is/c++draft/expr.prim.id.qual#:nested-name-specifier). |
| 43 | + |
| 44 | +Here’s how the [AST dump](https://compiler-explorer.com/z/WEWc4817x) looked before this patch: |
| 45 | + |
| 46 | +``` |
| 47 | +ElaboratedType 'struct NS::A' sugar |
| 48 | +`-RecordType 'test::NS::A' |
| 49 | + `-CXXRecord 'A' |
| 50 | +``` |
| 51 | + |
| 52 | +The `RecordType` represents a direct reference to the previously declared `struct A` — a kind of *canonical* view of the type, stripped of syntactic details like `struct` or namespace qualifiers. |
| 53 | + |
| 54 | +Those syntactic details were stored separately in an `ElaboratedType` node that wrapped the `RecordType`. |
| 55 | + |
| 56 | +Interestingly, an `ElaboratedType` node existed even when no elaboration or qualification appeared in the source ([example](https://compiler-explorer.com/z/ncW5bzWrc)). This was needed to distinguish between an explicitly unqualified type and one that lost its qualifiers through template substitution. |
| 57 | + |
| 58 | +However, this design was expensive: every `ElaboratedType` node consumed **48 bytes**, and creating one required extra work to uniquify it — an important step for Clang’s fast type comparisons. |
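To make the uniquing cost concrete, here is a minimal sketch of the general idea. It is not Clang's implementation: `ToySugarType`, `ToyContext`, and the string key are hypothetical names invented for this illustration. Each request for a node goes through a hash lookup so that identical requests share one object, which is what makes pointer comparison a valid equality test.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a sugar node; not Clang's actual ElaboratedType.
struct ToySugarType {
  std::string Keyword;   // "struct", "class", ... or empty
  std::string Qualifier; // "NS::" or empty
  std::string Name;      // "A"
};

// Hypothetical uniquing context: at most one node per distinct key.
class ToyContext {
  std::unordered_map<std::string, std::unique_ptr<ToySugarType>> Nodes;
  std::size_t NumLookups = 0;

public:
  const ToySugarType *get(const std::string &Keyword,
                          const std::string &Qualifier,
                          const std::string &Name) {
    assert(!Name.empty() && "a type needs a name");
    ++NumLookups; // every request pays for a hash lookup, hit or miss
    std::string Key = Keyword + '\n' + Qualifier + '\n' + Name;
    auto It = Nodes.find(Key);
    if (It != Nodes.end())
      return It->second.get(); // reuse the existing node
    std::unique_ptr<ToySugarType> Node(
        new ToySugarType{Keyword, Qualifier, Name});
    const ToySugarType *Raw = Node.get();
    Nodes.emplace(std::move(Key), std::move(Node));
    return Raw;
  }

  std::size_t lookups() const { return NumLookups; }
  std::size_t size() const { return Nodes.size(); }
};
```

In this toy model, two calls to `get("struct", "NS::", "A")` return the same pointer, but both pay for a lookup. The fewer separate nodes a type needs, the fewer lookups its construction costs.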
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## A more compact representation |
| 63 | + |
| 64 | +The new approach removes `ElaboratedType` entirely. Instead, elaboration and qualifiers are now stored **directly inside `RecordType`**. |
| 65 | + |
| 66 | +The [new AST dump](https://compiler-explorer.com/z/asz5q5YGj) for the same example looks like this: |
| 67 | + |
| 68 | +``` |
| 69 | +RecordType 'struct NS::A' struct |
| 70 | +|-NestedNameSpecifier Namespace 'NS' |
| 71 | +`-CXXRecord 'A' |
| 72 | +``` |
| 73 | + |
| 74 | +The `struct` elaboration now fits into previously unused bits within `RecordType`, while the qualifier is *tail-allocated* when present — making the node variably sized. |
| 75 | + |
| 76 | +This change both shrinks the memory footprint and eliminates one level of indirection when traversing the AST. |
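The two techniques mentioned in this section, packing the keyword into spare bits and tail-allocating the qualifier, can be sketched roughly as follows. This is an illustration under assumptions, not the real `RecordType`: `ToyRecordType`, `ToyKeyword`, the field layout, and the use of a plain `std::uintptr_t` as the qualifier are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

enum class ToyKeyword : unsigned { None, Struct, Class, Union, Enum };

// Hypothetical node: the keyword lives in a bit-field, and the qualifier
// (modelled as one opaque word) is stored right after the object, but only
// when the type is actually qualified.
class ToyRecordType {
  const void *Decl;          // the declaration this type refers to
  unsigned Keyword : 3;      // elaboration keyword packed into spare bits
  unsigned HasQualifier : 1; // whether a trailing qualifier word follows

  ToyRecordType(const void *D, ToyKeyword K, bool HasQual)
      : Decl(D), Keyword(static_cast<unsigned>(K)), HasQualifier(HasQual) {}

  const std::uintptr_t *trailing() const {
    return reinterpret_cast<const std::uintptr_t *>(this + 1);
  }

public:
  static std::size_t allocationSize(bool HasQual) {
    return sizeof(ToyRecordType) + (HasQual ? sizeof(std::uintptr_t) : 0);
  }

  // A zero Qualifier means "unqualified"; no trailing storage is reserved.
  static ToyRecordType *create(const void *D, ToyKeyword K,
                               std::uintptr_t Qualifier) {
    bool HasQual = Qualifier != 0;
    void *Mem = ::operator new(allocationSize(HasQual));
    ToyRecordType *T = new (Mem) ToyRecordType(D, K, HasQual);
    if (HasQual)
      new (T + 1) std::uintptr_t(Qualifier); // construct in the tail
    return T;
  }

  static void destroy(ToyRecordType *T) {
    T->~ToyRecordType();
    ::operator delete(T);
  }

  const void *decl() const { return Decl; }
  ToyKeyword keyword() const { return static_cast<ToyKeyword>(Keyword); }
  bool hasQualifier() const { return HasQualifier; }
  std::uintptr_t qualifier() const { return HasQualifier ? *trailing() : 0; }
};

// The tail word must start at a suitably aligned address.
static_assert(sizeof(ToyRecordType) % alignof(std::uintptr_t) == 0,
              "trailing qualifier would be misaligned");
```

In this sketch an unqualified node pays nothing for the qualifier, while a qualified one grows by exactly one word and needs no second allocation or wrapper node.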
| 77 | + |
| 78 | +--- |
| 79 | + |
| 80 | +## Representing `NestedNameSpecifier` |
| 81 | + |
| 82 | +`NestedNameSpecifier` is Clang’s internal representation for name qualifiers. |
| 83 | + |
| 84 | +Before this patch, it was represented by a pointer (`NestedNameSpecifier*`) to a uniqued structure that could describe: |
| 85 | + |
| 86 | +1. The global namespace (`::`) |
| 87 | +2. A named namespace (including aliases) |
| 88 | +3. A type |
| 89 | +4. An identifier naming an unknown entity |
| 90 | +5. A `__super` reference (Microsoft extension) |
| 91 | + |
| 92 | +For all but cases (1) and (5), each `NestedNameSpecifier` also held a *prefix* — the qualifier to its left. |
| 93 | + |
| 94 | +For example: |
| 95 | + |
| 96 | +```cpp |
| 97 | +Namespace::Class::NestedClassTemplate<T>::XX |
| 98 | +``` |
| 99 | + |
| 100 | +This would be stored as a linked list: |
| 101 | + |
| 102 | +``` |
| 103 | +[id: XX] -> [type: NestedClassTemplate<T>] -> [type: Class] -> [namespace: Namespace] |
| 104 | +``` |
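As a rough mental model of that chain, each node can be pictured as a small record holding its own piece of the name plus a pointer to its prefix. The struct below is a hypothetical simplification for illustration (`ToyOldSpecifier`, `spell`, and `chainLength` are made-up names, and the payload is reduced to a string); it is not the old Clang class.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the pre-patch representation: a singly linked list
// that starts at the innermost component and points outwards via Prefix.
struct ToyOldSpecifier {
  enum Kind { Global, Namespace, Type, Identifier, Super };
  Kind K;
  std::string Spelling;          // simplified payload
  const ToyOldSpecifier *Prefix; // the qualifier to the left, or null
};

// Walks the prefix chain to rebuild the written qualifier.
inline std::string spell(const ToyOldSpecifier *S) {
  std::string Out;
  for (; S; S = S->Prefix)
    Out = S->Spelling + "::" + Out;
  return Out;
}

// Counts the specifier nodes in the chain (types they refer to not included).
inline unsigned chainLength(const ToyOldSpecifier *S) {
  unsigned N = 0;
  for (; S; S = S->Prefix)
    ++N;
  return N;
}
```

Building the four-element chain for the example above and calling `spell` on the `XX` node walks every link, which mirrors why each component had to exist as its own node.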
| 105 | + |
| 106 | +Internally, that meant **seven allocations** totaling around **160 bytes**: |
| 107 | + |
| 108 | +1. `NestedNameSpecifier` (identifier) – 16 bytes |
| 109 | +2. `NestedNameSpecifier` (type) – 16 bytes |
| 110 | +3. `TemplateSpecializationType` – 48 bytes |
| 111 | +4. `QualifiedTemplateName` – 16 bytes |
| 112 | +5. `NestedNameSpecifier` (type) – 16 bytes |
| 113 | +6. `RecordType` – 32 bytes |
| 114 | +7. `NestedNameSpecifier` (namespace) – 16 bytes |
| 115 | + |
| 116 | +The real problem wasn’t just size; it was the *uniquing cost*. Every prospective node had to be looked up in a hash table to check for a pre-existing instance. |
| 117 | + |
| 118 | +To make matters worse, `ElaboratedType` nodes sometimes leaked into these chains, which wasn’t supposed to happen and led to [several](https://github.com/llvm/llvm-project/issues/43179) [long-standing](https://github.com/llvm/llvm-project/issues/68670) [bugs](https://github.com/llvm/llvm-project/issues/92757). |
| 119 | + |
| 120 | +--- |
| 121 | + |
| 122 | +## A new, smarter `NestedNameSpecifier` |
| 123 | + |
| 124 | +After this patch, `NestedNameSpecifier` becomes a **compact, tagged pointer** — just one machine word wide. |
| 125 | + |
| 126 | +The pointer uses 8-byte alignment, leaving three spare bits. Two bits are used for kind discrimination, and one remains available for arbitrary use. |
| 127 | + |
| 128 | +When non-null, the tag bits encode: |
| 129 | + |
| 130 | +1. A type |
| 131 | +2. A declaration (either a `__super` class or a namespace) |
| 132 | +3. A namespace prefixed by the global scope (`::Namespace`) |
| 133 | +4. A special object combining a namespace with its prefix |
| 134 | + |
| 135 | +When null, the tag bits instead encode: |
| 136 | + |
| 137 | +1. An empty nested name (the terminator) |
| 138 | +2. The global name |
| 139 | +3. An invalid/tombstone entry (for hash tables) |
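The encoding described above can be sketched as a one-word tagged pointer. This is only an illustration of the technique: the class name `ToySpecifier`, the specific tag values, and the accessor names are assumptions made for this example, not the layout Clang uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-word specifier: the low two bits select the kind, the
// third bit is a free flag, and the remaining bits hold an aligned pointer.
class ToySpecifier {
  std::uintptr_t Bits = 0; // pointer and tag share one machine word

  enum : std::uintptr_t { KindMask = 0x3, FlagMask = 0x4, TagMask = 0x7 };

  static ToySpecifier make(const void *P, std::uintptr_t Tag) {
    std::uintptr_t Raw = reinterpret_cast<std::uintptr_t>(P);
    assert((Raw & TagMask) == 0 && "pointee must be 8-byte aligned");
    ToySpecifier S;
    S.Bits = Raw | Tag;
    return S;
  }

public:
  enum class Kind {
    Empty, Global, Invalid,                          // null pointer
    Type, Decl, GlobalNamespace, NamespaceAndPrefix  // non-null pointer
  };

  ToySpecifier() = default; // the empty terminator
  static ToySpecifier global() { return make(nullptr, 1); }
  static ToySpecifier invalid() { return make(nullptr, 2); }
  static ToySpecifier type(const void *P) { assert(P); return make(P, 0); }
  static ToySpecifier decl(const void *P) { assert(P); return make(P, 1); }
  static ToySpecifier globalNamespace(const void *P) {
    assert(P);
    return make(P, 2);
  }
  static ToySpecifier namespaceAndPrefix(const void *P) {
    assert(P);
    return make(P, 3);
  }

  const void *pointer() const {
    return reinterpret_cast<const void *>(Bits & ~std::uintptr_t(TagMask));
  }

  // The same two tag bits mean different things for null and non-null words.
  Kind kind() const {
    std::uintptr_t Tag = Bits & KindMask;
    if ((Bits & ~std::uintptr_t(TagMask)) == 0)
      return Tag == 0 ? Kind::Empty : Tag == 1 ? Kind::Global : Kind::Invalid;
    switch (Tag) {
    case 0: return Kind::Type;
    case 1: return Kind::Decl;
    case 2: return Kind::GlobalNamespace;
    default: return Kind::NamespaceAndPrefix;
    }
  }

  bool flag() const { return (Bits & FlagMask) != 0; }
  void setFlag(bool On) {
    Bits = On ? (Bits | FlagMask) : (Bits & ~std::uintptr_t(FlagMask));
  }

  friend bool operator==(ToySpecifier A, ToySpecifier B) {
    return A.Bits == B.Bits;
  }
};

static_assert(sizeof(ToySpecifier) == sizeof(void *), "one machine word");
```

Because the whole value fits in a word, a specifier like this can be copied and compared as cheaply as a raw pointer, and the simple cases need no allocation of their own.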
| 140 | + |
| 141 | +Other changes include: |
| 142 | + |
| 143 | +* The “unknown identifier” case is now represented by a `DependentNameType`. |
| 144 | +* Type prefixes are handled directly in the type hierarchy. |
| 145 | + |
| 146 | +Revisiting the earlier example, after the patch its AST dump becomes: |
| 147 | + |
| 148 | +``` |
| 149 | +DependentNameType 'Namespace::Class::NestedClassTemplate<T>::XX' dependent |
| 150 | +`-NestedNameSpecifier TemplateSpecializationType 'Namespace::Class::NestedClassTemplate<T>' dependent |
| 151 | + `-name: 'Namespace::Class::NestedClassTemplate' qualified |
| 152 | + |-NestedNameSpecifier RecordType 'Namespace::Class' |
| 153 | + | |-NestedNameSpecifier Namespace 'Namespace' |
| 154 | + | `-CXXRecord 'Class' |
| 155 | + `-ClassTemplate NestedClassTemplate |
| 156 | +``` |
| 157 | + |
| 158 | +This representation now requires only **four allocations (152 bytes total):** |
| 159 | + |
| 160 | +1. `DependentNameType` – 48 bytes |
| 161 | +2. `TemplateSpecializationType` – 48 bytes |
| 162 | +3. `QualifiedTemplateName` – 16 bytes |
| 163 | +4. `RecordType` – 40 bytes |
| 164 | + |
| 165 | +That’s almost half the number of nodes. |
| 166 | + |
| 167 | +While `DependentNameType` is larger than the previous 16-byte “identifier” node, the additional space isn’t wasted — it holds cached answers to common queries such as “does this type reference a template parameter?” or “what is its canonical form?”. |
| 168 | + |
| 169 | +These caches make those operations significantly cheaper, further improving performance. |
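The trade-off can be illustrated with a small sketch in which a node computes such answers once, at construction, instead of re-deriving them on every query. The type `ToyType`, its fields, and `walkForTemplateParam` are hypothetical and far simpler than what Clang stores; the point is only the shape of the idea.

```cpp
#include <cassert>

// Hypothetical type node that caches two answers when it is built.
struct ToyType {
  bool IsTemplateParam;     // true for a node standing for something like `T`
  const ToyType *Inner;     // qualifier or argument this node wraps; may be null
  bool Dependent;           // cached: does anything inside name a parameter?
  const ToyType *Canonical; // cached: the canonical node for this type

  // SugarFor, when given, is the type this node is another spelling of.
  ToyType(bool IsParam, const ToyType *In, const ToyType *SugarFor = nullptr)
      : IsTemplateParam(IsParam), Inner(In),
        Dependent(IsParam || (In && In->Dependent)),
        Canonical(SugarFor ? SugarFor->Canonical : this) {}

  ToyType(const ToyType &) = delete; // Canonical may point at the node itself
  ToyType &operator=(const ToyType &) = delete;
};

// The uncached alternative: walk the whole chain on every query.
inline bool walkForTemplateParam(const ToyType *T) {
  for (; T; T = T->Inner)
    if (T->IsTemplateParam)
      return true;
  return false;
}
```

Reading `Dependent` or `Canonical` is a single field access, whereas `walkForTemplateParam` revisits every inner node each time it is called. The cached fields cost a few bytes per node and save that walk on every query.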
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +## Wrapping up |
| 174 | + |
| 175 | +There’s more in the patch than what I’ve covered here, including: |
| 176 | + |
| 177 | +* `RecordType` now points directly to the declaration found at creation, enriching the AST without measurable overhead. |
| 178 | +* `RecordType` nodes are now created lazily. |
| 179 | +* The redesigned `NestedNameSpecifier` simplified several template instantiation transforms. |
| 180 | + |
| 181 | +Each of these could warrant its own write-up, but even this high-level overview shows how careful structural changes in the AST can lead to tangible compile-time wins. |
| 182 | + |
| 183 | +I hope you found this deep dive into Clang’s internals interesting — and that it gives a glimpse of the kind of small, structural optimizations that add up to real performance improvements in large C++ builds. |