---
layout: post
nav-class: dark
categories: mizvekov, clang
title: Making the Clang AST Leaner and Faster
author-id: mizvekov
author-name: Matheus Izvekov
---

Modern C++ codebases, from browsers to GPU frameworks, rely heavily on templates, and that often means *massive* abstract syntax trees. Even small inefficiencies in Clang’s AST representation can add up to noticeable compile-time overhead.

This post walks through a set of structural improvements I recently made to Clang’s AST that make type representation smaller, simpler, and faster to create, leading to measurable build-time gains in real-world projects.

---

A couple of months ago, I landed [a large patch](https://github.com/llvm/llvm-project/pull/147835) in Clang that brought substantial compile-time improvements for heavily templated C++ code.

For example, in [stdexec](https://github.com/NVIDIA/stdexec), the reference implementation of the `std::execution` [feature slated for C++26](http://wg21.link/p2300), the slowest test ([`test_on2.cpp`](https://github.com/NVIDIA/stdexec/blob/main/test/stdexec/algos/adaptors/test_on2.cpp)) saw a **7% reduction in build time**.

The [Chromium](https://www.chromium.org/Home/) build also showed a **5% improvement** ([source](https://github.com/llvm/llvm-project/pull/147835#issuecomment-3278893447)).

At a high level, the patch makes the Clang AST *leaner*: it reduces the memory footprint of type representations and lowers the cost of creating and uniquing them.

These improvements will ship with **Clang 22**, expected in the next few months.

---

## How elaboration and qualified names used to work

Consider this simple snippet:

```cpp
namespace NS {
struct A {};
}
using T = struct NS::A; // elaborated (`struct`) and qualified (`NS::`)
```

The type of `T` (`struct NS::A`) carries two pieces of information:

1. It’s *elaborated*: the `struct` keyword appears.
2. It’s *qualified*: `NS::` acts as a [*nested-name-specifier*](https://eel.is/c++draft/expr.prim.id.qual#:nested-name-specifier).

Here’s how the [AST dump](https://compiler-explorer.com/z/WEWc4817x) looked before this patch:

```
ElaboratedType 'struct NS::A' sugar
`-RecordType 'test::NS::A'
  `-CXXRecord 'A'
```

The `RecordType` represents a direct reference to the previously declared `struct A`: a kind of *canonical* view of the type, stripped of syntactic details like `struct` or namespace qualifiers.

Those syntactic details were stored separately in an `ElaboratedType` node that wrapped the `RecordType`.

Interestingly, an `ElaboratedType` node existed even when no elaboration or qualification appeared in the source ([example](https://compiler-explorer.com/z/ncW5bzWrc)). This was needed to distinguish between an explicitly unqualified type and one that lost its qualifiers through template substitution.

However, this design was expensive: every `ElaboratedType` node consumed **48 bytes**, and creating one required extra work to unique it (checking whether an identical node already existed), a step that is essential to Clang’s fast type comparisons.

---

## A more compact representation

The new approach removes `ElaboratedType` entirely. Instead, elaboration and qualifiers are now stored **directly inside `RecordType`**.

The [new AST dump](https://compiler-explorer.com/z/asz5q5YGj) for the same example looks like this:

```
RecordType 'struct NS::A' struct
|-NestedNameSpecifier Namespace 'NS'
`-CXXRecord 'A'
```

The `struct` elaboration now fits into previously unused bits within `RecordType`, while the qualifier is *tail-allocated* when present, making the node variably sized.

This change both shrinks the memory footprint and eliminates one level of indirection when traversing the AST.

---

## Representing `NestedNameSpecifier`

`NestedNameSpecifier` is Clang’s internal representation for name qualifiers.

Before this patch, it was represented by a pointer (`NestedNameSpecifier*`) to a uniqued structure that could describe:

1. The global namespace (`::`)
2. A named namespace (including aliases)
3. A type
4. An identifier naming an unknown entity
5. A `__super` reference (Microsoft extension)

For all but cases (1) and (5), each `NestedNameSpecifier` also held a *prefix*: the qualifier to its left.

For example:

```cpp
Namespace::Class::NestedClassTemplate<T>::XX
```

This would be stored as a linked list:

```
[id: XX] -> [type: NestedClassTemplate<T>] -> [type: Class] -> [namespace: Namespace]
```

Internally, that linked list meant **seven allocations** totaling **160 bytes**:

1. `NestedNameSpecifier` (identifier) – 16 bytes
2. `NestedNameSpecifier` (type) – 16 bytes
3. `TemplateSpecializationType` – 48 bytes
4. `QualifiedTemplateName` – 16 bytes
5. `NestedNameSpecifier` (type) – 16 bytes
6. `RecordType` – 32 bytes
7. `NestedNameSpecifier` (namespace) – 16 bytes

The real problem wasn’t just size; it was the *uniquing cost*. Every prospective node had to be looked up in a hash table to find any pre-existing instance.

To make matters worse, `ElaboratedType` nodes sometimes leaked into these chains, which wasn’t supposed to happen and led to [several](https://github.com/llvm/llvm-project/issues/43179) [long-standing](https://github.com/llvm/llvm-project/issues/68670) [bugs](https://github.com/llvm/llvm-project/issues/92757).

---

## A new, smarter `NestedNameSpecifier`

After this patch, `NestedNameSpecifier` becomes a **compact, tagged pointer**, just one machine word wide.

The objects it points to are 8-byte aligned, so the three low bits of the pointer are always zero and can hold a tag. Two bits are used for kind discrimination, and one remains available for arbitrary use.

When non-null, the tag bits encode:

1. A type
2. A declaration (either a `__super` class or a namespace)
3. A namespace prefixed by the global scope (`::Namespace`)
4. A special object combining a namespace with its prefix

When null, the tag bits instead encode:

1. An empty nested name (the terminator)
2. The global name
3. An invalid/tombstone entry (for hash tables)

Other changes include:

* The “unknown identifier” case is now represented by a `DependentNameType`.
* Type prefixes are handled directly in the type hierarchy.

Revisiting the earlier example, after the patch its AST dump becomes:

```
DependentNameType 'Namespace::Class::NestedClassTemplate<T>::XX' dependent
`-NestedNameSpecifier TemplateSpecializationType 'Namespace::Class::NestedClassTemplate<T>' dependent
  `-name: 'Namespace::Class::NestedClassTemplate' qualified
    |-NestedNameSpecifier RecordType 'Namespace::Class'
    | |-NestedNameSpecifier Namespace 'Namespace'
    | `-CXXRecord 'Class'
    `-ClassTemplate NestedClassTemplate
```

This representation now requires only **four allocations (152 bytes total):**

1. `DependentNameType` – 48 bytes
2. `TemplateSpecializationType` – 48 bytes
3. `QualifiedTemplateName` – 16 bytes
4. `RecordType` – 40 bytes

That’s almost half the number of nodes: four allocations instead of seven.

While `DependentNameType` is larger than the previous 16-byte “identifier” node, the additional space isn’t wasted: it holds cached answers to common queries such as “does this type reference a template parameter?” or “what is its canonical form?”.

These caches make those operations significantly cheaper, further improving performance.

---

## Wrapping up

There’s more in the patch than what I’ve covered here, including:

* `RecordType` now points directly to the declaration found at creation, enriching the AST without measurable overhead.
* `RecordType` nodes are now created lazily.
* The redesigned `NestedNameSpecifier` simplified several template instantiation transforms.

Each of these could warrant its own write-up, but even this high-level overview shows how careful structural changes in the AST can lead to tangible compile-time wins.

I hope you found this deep dive into Clang’s internals interesting, and that it gives a glimpse of the kind of small, structural optimizations that add up to real performance improvements in large C++ builds.
