Commit 34e8d4a

Respond to feedback

Signed-off-by: James Sturtevant <jsturtevant@gmail.com>

1 parent 108f7a2 commit 34e8d4a

1 file changed

Lines changed: 45 additions & 90 deletions

docs/paging-development-notes.md

@@ -7,46 +7,15 @@ design in which the guest is aware of a readonly snapshot from
 which it is being run, and manages its own copy-on-write.
 
 Because of this, there are two very fundamental regions of the guest
-physical address space, which are always populated: one, near the
-bottom of memory (starting at GPA `0x1000`), is a
-(hypervisor-enforced) readonly mapping of the base snapshot from which
-this guest is being evolved. Another, at the top of memory, is simply
+physical address space, which are always populated: one, at the very
+bottom of memory, is a (hypervisor-enforced) readonly mapping of the
+base snapshot from which this guest is being evolved. Another, at the top of memory, is simply
 a large bag of blank pages: scratch memory into which this VM can
 write.
 
-```
-Guest Physical Address Space (GPA)
-
-+-------------------------------+ MAX_GPA
-| Exn Stack, Bookkeeping        |
-| (scratch size, allocator      |
-|  state, reserved PT slot)     |
-+-------------------------------+
-| Free Scratch Memory           |
-+-------------------------------+
-| Output Data                   |
-+-------------------------------+
-| Input Data                    |
-+-------------------------------+
-|                               |
-| (unmapped — no RAM            |
-|  backing these addrs)         |
-|                               |
-+-------------------------------+
-|                               |
-| Snapshot (RO / CoW on write)  |
-| Guest Page Tables             |
-| Init Data                     |
-| Guest Heap                    |
-| PEB                           |
-| Guest Binary                  |
-|                               |
-+-------------------------------+ 0x1000
-| (null guard page)             |
-+-------------------------------+ 0x0000
-```
-
-
+For the detailed layout of each region, including field offsets, see
+the diagrams and comments in [`src/hyperlight_host/src/mem/layout.rs`](../src/hyperlight_host/src/mem/layout.rs)
+and the constants in [`hyperlight_common::layout`](../src/hyperlight_common/src/layout.rs).
 
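The two always-populated regions, plus the unmapped gap between them, can be modeled with a short sketch. The region bounds, the `Region` enum, and `classify` are invented for illustration only; the authoritative layout lives in `src/hyperlight_host/src/mem/layout.rs`:

```rust
/// Illustrative classifier for guest-physical addresses, assuming a
/// readonly snapshot near the bottom of memory and writable scratch at
/// the top. The bounds passed in are made-up example values, not the
/// real Hyperlight constants.
#[derive(Debug, PartialEq)]
enum Region {
    NullGuard, // unmapped guard page at GPA 0
    Snapshot,  // hypervisor-enforced readonly; guest CoWs on write
    Unmapped,  // no RAM backing these addresses
    Scratch,   // blank pages this VM can write
}

fn classify(gpa: u64, snapshot_end: u64, scratch_base: u64) -> Region {
    match gpa {
        0..=0xFFF => Region::NullGuard,
        g if g < snapshot_end => Region::Snapshot,
        g if g >= scratch_base => Region::Scratch,
        _ => Region::Unmapped,
    }
}

fn main() {
    let (snapshot_end, scratch_base) = (0x40_0000, 0xFFF0_0000);
    assert_eq!(classify(0x0, snapshot_end, scratch_base), Region::NullGuard);
    assert_eq!(classify(0x5000, snapshot_end, scratch_base), Region::Snapshot);
    assert_eq!(classify(0x50_0000, snapshot_end, scratch_base), Region::Unmapped);
    assert_eq!(classify(0xFFF5_0000, snapshot_end, scratch_base), Region::Scratch);
}
```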
 ## The scratch map
 
@@ -92,63 +61,56 @@ original virtual address to point to the new page.
 Snapshot page at GPA 0x5000 is untouched.
 ```
 
-The page table
-entries to do this will likely need to be copied themselves, and so a
+The page table entries to do this will likely need to be copied themselves, and so a
 ready supply of already-mapped scratch pages to use for replacement
-page tables is set up by the Host. The guest keeps a mapping of the entire scratch
-physical memory into virtual memory at a fixed offset
-(`scratch_base_gva - scratch_base_gpa`), so that any scratch physical
-address can be accessed by adding this offset.
+page tables is needed. Currently, the guest accomplishes this by
+keeping an identity mapping of the entire scratch memory around.
 
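The translation this mapping provides can be sketched as follows; `ScratchWindow` and its fields are hypothetical names for the example, not actual Hyperlight types. With the identity mapping the offset is zero, while the older fixed-offset scheme added `scratch_base_gva - scratch_base_gpa`:

```rust
/// Hypothetical description of a scratch window mapped into guest
/// virtual memory; invented for illustration.
struct ScratchWindow {
    base_gpa: u64, // where scratch starts in guest-physical space
    base_gva: u64, // where the guest maps it in virtual space
    size: u64,
}

impl ScratchWindow {
    /// Any scratch physical address is reachable by adding the fixed
    /// offset `base_gva - base_gpa`; with an identity mapping
    /// (`base_gva == base_gpa`) the translation is the identity.
    fn gpa_to_gva(&self, gpa: u64) -> Option<u64> {
        if gpa >= self.base_gpa && gpa < self.base_gpa + self.size {
            Some(gpa - self.base_gpa + self.base_gva)
        } else {
            None // address is not backed by scratch
        }
    }
}

fn main() {
    // Identity-mapped scratch, as the guest currently keeps it:
    let ident = ScratchWindow { base_gpa: 0xFFF0_0000, base_gva: 0xFFF0_0000, size: 0x10_0000 };
    assert_eq!(ident.gpa_to_gva(0xFFF0_1000), Some(0xFFF0_1000));
    assert_eq!(ident.gpa_to_gva(0x1000), None);
}
```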
 The host and the guest need to agree on the location of this mapping,
 so that (a) the host can create it when first setting up a blank guest
 and (b) the host can ignore it when taking a snapshot (see below).
 
-The host creates the scratch map at the top of virtual memory
-(`MAX_GVA - scratch_size + 1`) and at the top of physical memory
-(`MAX_GPA - scratch_size + 1`). In the future, we may add support for a guest to
+Currently, the host always creates the scratch map at the top of
+virtual memory. In the future, we may add support for a guest to
 request that it be moved.
 
 ## The snapshot mapping
 
 The snapshot page tables must be mapped at some virtual address so
-that the guest can read and copy them during CoW operations.
-Today, the host simply
-copies the page tables into scratch when restoring a sandbox, and the
-guest works on those scratch copies directly.
+that the guest can read and copy them during CoW operations. The
+preferred approach is to map the snapshot page tables directly from
+the snapshot region into the guest's virtual address space.
 
-## Top-of-scratch metadata layout
+However, on amd64, this is complicated by architectural constraints.
+Currently, the host simply copies the page tables into scratch when
+restoring a sandbox, and the guest works on those scratch copies
+directly. In the near future, we expect to be able to use the
+preferred approach on aarch64, and with some minor hypervisor changes,
+on amd64 as well.
 
-The top page of the scratch region contains structured metadata at
-fixed offsets down from the top:
-
-| Offset from top | Field                             |
-|-----------------|-----------------------------------|
-| `0x08`          | Scratch size (`u64`)              |
-| `0x10`          | Allocator state (`u64`)           |
-| `0x18`          | Reserved snapshot PT base (`u64`) |
-| `0x20`          | Exception stack starts here       |
+## Top-of-scratch metadata layout
 
+The top of the scratch region contains structured metadata at fixed
+offsets, such as the scratch size, the allocator state, and where the exception stack starts.
 These offsets are defined as `SCRATCH_TOP_*` constants in
-`hyperlight_common::layout`.
+[`hyperlight_common::layout`](../src/hyperlight_common/src/layout.rs), which has detailed comments on each
+field.
 
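Locating a metadata field by its fixed offset down from the top of scratch can be sketched like this; the offset values below are illustrative stand-ins, not the real `SCRATCH_TOP_*` constants:

```rust
// Illustrative offsets measured down from the top of scratch; the real
// values are the `SCRATCH_TOP_*` constants in `hyperlight_common::layout`.
const TOP_SCRATCH_SIZE: u64 = 0x08; // scratch size (u64)
const TOP_ALLOC_STATE: u64 = 0x10;  // allocator state (u64)

/// Compute the address of a metadata field, given the (exclusive) top
/// of the scratch region and the field's offset down from that top.
fn meta_field_addr(scratch_top: u64, offset_from_top: u64) -> u64 {
    scratch_top - offset_from_top
}

fn main() {
    let scratch_top = 0x1_0000_0000; // made-up top-of-scratch address
    assert_eq!(meta_field_addr(scratch_top, TOP_SCRATCH_SIZE), 0xFFFF_FFF8);
    assert_eq!(meta_field_addr(scratch_top, TOP_ALLOC_STATE), 0xFFFF_FFF0);
}
```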
 ## The physical page allocator
 
 The host needs to be able to reset the state of the physical page
-allocator when resuming from a snapshot. We use a simple bump
-allocator as a physical page allocator, with no support for free,
+allocator when resuming from a snapshot. Currently, we use a simple
+bump allocator as a physical page allocator, with no support for free,
 since pages not in use will automatically be omitted from a snapshot.
 The allocator state is a single `u64` tracking the address of the
-first free page, located at offset `0x10` from the top of scratch
-(see layout above). The guest advances it atomically via `lock xadd`.
+first free page, located below the metadata at the top of scratch.
+The guest advances it atomically.
 
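The allocator described above amounts to an atomic fetch-add on a single word. A minimal model follows; the struct and method names are assumptions for illustration, not the actual guest code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const PAGE_SIZE: u64 = 0x1000;

/// Minimal model of the physical page allocator: a single u64 holding
/// the address of the first free page. No `free` is needed, because
/// pages never allocated are simply omitted from the next snapshot.
struct BumpAllocator {
    next_free: AtomicU64,
}

impl BumpAllocator {
    /// Atomically claim one page (the guest advances the word with an
    /// atomic fetch-add, e.g. `lock xadd` on amd64).
    fn alloc_page(&self) -> u64 {
        self.next_free.fetch_add(PAGE_SIZE, Ordering::Relaxed)
    }

    /// Resetting after a snapshot restore is just rewriting the u64.
    fn reset(&self, base: u64) {
        self.next_free.store(base, Ordering::Relaxed);
    }
}

fn main() {
    let alloc = BumpAllocator { next_free: AtomicU64::new(0xFFF0_0000) };
    let a = alloc.alloc_page();
    let b = alloc.alloc_page();
    assert_eq!(a, 0xFFF0_0000);
    assert_eq!(b - a, PAGE_SIZE); // consecutive pages
    alloc.reset(0xFFF0_0000);
    assert_eq!(alloc.alloc_page(), 0xFFF0_0000);
}
```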
 ## The guest exception stack
 
 Similarly, the guest needs a stack that is always writable, in order
-to be able to take exceptions to it. The exception stack begins at
-offset `0x20` from the top of the scratch region (below the metadata
-fields described above) and grows downward through the remainder of
-the top page.
+to be able to take exceptions to it. The exception stack begins below
+the metadata at the top of the scratch region and grows downward.
 
 ## Taking a snapshot
 
@@ -177,26 +139,17 @@ calls, i.e. there may be no calls in flight at the time of
 snapshotting. This is not enforced, but odd things may happen if it is
 violated.
 
-I/O buffers are statically allocated at the bottom of the scratch
-region:
+Buffer management between the host and guest is needed to pass call
+arguments and return values. Ideally, buffers would be dynamically
+allocated from the scratch region as needed.
 
-```
-+-------------------------------------------+ (top of scratch)
-| Exn Stack, Bookkeeping                    |
-| (scratch size, allocator state,           |
-|  reserved PT base)                        |
-+-------------------------------------------+
-| Free Scratch Memory                       |
-+-------------------------------------------+
-| Output Data                               |
-+-------------------------------------------+
-| Input Data                                |
-+-------------------------------------------+ (scratch base)
-```
+Currently, I/O buffers are statically allocated at the bottom of the
+scratch region. This is a stopgap pending improved
+physical allocation and buffer management.
 
-The minimum scratch size (`min_scratch_size()`) accounts for these
-buffers plus overhead for the Task State Segment (TSS), Interrupt Descriptor Table (IDT), page table CoW, a minimal
-non-exception stack, and the exception stack and metadata.
+The minimum scratch size is calculated by `min_scratch_size()` in the
+architecture-specific layout modules under `hyperlight_common`; see
+that function for the detailed breakdown of required overhead.
 
 ## Creating a fresh guest
 
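The kind of accounting `min_scratch_size()` performs can be sketched as a sum of per-component reservations rounded up to whole pages. Every size below is a made-up placeholder, not the real Hyperlight figure:

```rust
const PAGE_SIZE: u64 = 0x1000;

/// Round a byte count up to whole pages.
fn pages(bytes: u64) -> u64 {
    bytes.div_ceil(PAGE_SIZE) * PAGE_SIZE
}

/// Sketch of a minimum-scratch-size calculation: statically allocated
/// I/O buffers plus per-component overhead. All sizes are invented
/// placeholders; the real breakdown is in the architecture-specific
/// layout modules under `hyperlight_common`.
fn min_scratch_size_sketch(input_buf: u64, output_buf: u64) -> u64 {
    let metadata_and_exn_stack = PAGE_SIZE; // top page: metadata + exception stack
    let tss_and_idt = pages(0x68 + 0x1000); // TSS + IDT (illustrative guess)
    let pt_cow_reserve = 4 * PAGE_SIZE;     // pages for copied page tables
    let min_main_stack = 4 * PAGE_SIZE;     // minimal non-exception stack
    pages(input_buf) + pages(output_buf)
        + metadata_and_exn_stack + tss_and_idt + pt_cow_reserve + min_main_stack
}

fn main() {
    let min = min_scratch_size_sketch(0x4000, 0x4000);
    assert_eq!(min % PAGE_SIZE, 0); // whole pages only
    assert!(min >= 0x8000);         // at least covers the I/O buffers
}
```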
@@ -207,9 +160,11 @@ virtual memory. If the ELF has segments whose virtual addresses
 overlap with the scratch map, an error will be returned.
 
 In the current startup path, the host enters the guest with
-`RSP` pointing to the exception stack. Early guest init then
-allocates the main stack at `MAIN_STACK_TOP_GVA`, switches to it,
-and continues generic initialization.
+the stack pointer pointing to the exception stack. Early guest init
+then allocates the main stack at `MAIN_STACK_TOP_GVA`, switches to
+it, and continues generic initialization. Note that exception stack
+overflows can be difficult to detect, since there is no guard page
+below the exception stack within the scratch region.
 
 # Architecture-specific details of virtual memory setup
 