@@ -7,46 +7,15 @@ design in which the guest is aware of a readonly snapshot from
 which it is being run, and manages its own copy-on-write.
 
 Because of this, there are two very fundamental regions of the guest
-physical address space, which are always populated: one, near the
-bottom of memory (starting at GPA `0x1000`), is a
-(hypervisor-enforced) readonly mapping of the base snapshot from which
-this guest is being evolved. Another, at the top of memory, is simply
+physical address space, which are always populated: one, at the very
+bottom of memory, is a (hypervisor-enforced) readonly mapping of the
+base snapshot from which this guest is being evolved. Another, at the top of memory, is simply
 a large bag of blank pages: scratch memory into which this VM can
 write.
 
-```
-Guest Physical Address Space (GPA)
-
-+-------------------------------+ MAX_GPA
-| Exn Stack, Bookkeeping        |
-| (scratch size, allocator      |
-|  state, reserved PT slot)     |
-+-------------------------------+
-| Free Scratch Memory           |
-+-------------------------------+
-| Output Data                   |
-+-------------------------------+
-| Input Data                    |
-+-------------------------------+
-|                               |
-| (unmapped — no RAM            |
-|  backing these addrs)         |
-|                               |
-+-------------------------------+
-|                               |
-| Snapshot (RO / CoW on write)  |
-| Guest Page Tables             |
-| Init Data                     |
-| Guest Heap                    |
-| PEB                           |
-| Guest Binary                  |
-|                               |
-+-------------------------------+ 0x1000
-| (null guard page)             |
-+-------------------------------+ 0x0000
-```
-
-
+For the detailed layout of each region, including field offsets, see
+the diagrams and comments in [`src/hyperlight_host/src/mem/layout.rs`](../src/hyperlight_host/src/mem/layout.rs)
+and the constants in [`hyperlight_common::layout`](../src/hyperlight_common/src/layout.rs).
 
 ## The scratch map
 
@@ -92,63 +61,56 @@ original virtual address to point to the new page.
 Snapshot page at GPA 0x5000 is untouched.
 ```
 
-The page table
-entries to do this will likely need to be copied themselves, and so a
+The page table entries to do this will likely need to be copied themselves, and so a
 ready supply of already-mapped scratch pages to use for replacement
-page tables is set up by the Host. The guest keeps a mapping of the entire scratch
-physical memory into virtual memory at a fixed offset
-(`scratch_base_gva - scratch_base_gpa`), so that any scratch physical
-address can be accessed by adding this offset.
+page tables is needed. Currently, the guest accomplishes this by
+keeping an identity mapping of the entire scratch memory around.
 
 The host and the guest need to agree on the location of this mapping,
 so that (a) the host can create it when first setting up a blank guest
 and (b) the host can ignore it when taking a snapshot (see below).
 
-The host creates the scratch map at the top of virtual memory
-(`MAX_GVA - scratch_size + 1`) and at the top of physical memory
-(`MAX_GPA - scratch_size + 1`). In the future, we may add support for a guest to
+Currently, the host always creates the scratch map at the top of
+virtual memory. In the future, we may add support for a guest to
 request that it be moved.
 
 ## The snapshot mapping
 
 The snapshot page tables must be mapped at some virtual address so
-that the guest can read and copy them during CoW operations.
-Today, the host simply
-copies the page tables into scratch when restoring a sandbox, and the
-guest works on those scratch copies directly.
+that the guest can read and copy them during CoW operations. The
+preferred approach is to map the snapshot page tables directly from
+the snapshot region into the guest's virtual address space.
 
-## Top-of-scratch metadata layout
+However, on amd64, this is complicated by architectural constraints.
+Currently, the host simply copies the page tables into scratch when
+restoring a sandbox, and the guest works on those scratch copies
+directly. In the near future, we expect to be able to use the
+preferred approach on aarch64, and, with some minor hypervisor
+changes, on amd64 as well.
 
-The top page of the scratch region contains structured metadata at
-fixed offsets down from the top:
-
-| Offset from top | Field                             |
-|-----------------|-----------------------------------|
-| `0x08`          | Scratch size (`u64`)              |
-| `0x10`          | Allocator state (`u64`)           |
-| `0x18`          | Reserved snapshot PT base (`u64`) |
-| `0x20`          | Exception stack starts here       |
+## Top-of-scratch metadata layout
 
+The top of the scratch region contains structured metadata at fixed
+offsets, such as the scratch size, the allocator state, and where the exception stack starts.
 These offsets are defined as `SCRATCH_TOP_*` constants in
-`hyperlight_common::layout`.
+[`hyperlight_common::layout`](../src/hyperlight_common/src/layout.rs), which has detailed comments on each
+field.
 
 ## The physical page allocator
 
 The host needs to be able to reset the state of the physical page
-allocator when resuming from a snapshot. We use a simple bump
-allocator as a physical page allocator, with no support for free,
+allocator when resuming from a snapshot. Currently, we use a simple
+bump allocator as a physical page allocator, with no support for free,
 since pages not in use will automatically be omitted from a snapshot.
 The allocator state is a single `u64` tracking the address of the
-first free page, located at offset `0x10` from the top of scratch
-(see layout above). The guest advances it atomically via `lock xadd`.
+first free page, located below the metadata at the top of scratch.
+The guest advances it atomically.
 
 ## The guest exception stack
 
 Similarly, the guest needs a stack that is always writable, in order
-to be able to take exceptions to it. The exception stack begins at
-offset `0x20` from the top of the scratch region (below the metadata
-fields described above) and grows downward through the remainder of
-the top page.
+to be able to take exceptions to it. The exception stack begins below
+the metadata at the top of the scratch region and grows downward.
 
 ## Taking a snapshot
 
@@ -177,26 +139,17 @@ calls, i.e. there may be no calls in flight at the time of
 snapshotting. This is not enforced, but odd things may happen if it is
 violated.
 
-I/O buffers are statically allocated at the bottom of the scratch
-region:
+Buffer management between the host and guest is needed to pass call
+arguments and return values. Ideally, buffers would be dynamically
+allocated from the scratch region as needed.
 
-```
-+-------------------------------------------+ (top of scratch)
-| Exn Stack, Bookkeeping                    |
-| (scratch size, allocator state,           |
-|  reserved PT base)                        |
-+-------------------------------------------+
-| Free Scratch Memory                       |
-+-------------------------------------------+
-| Output Data                               |
-+-------------------------------------------+
-| Input Data                                |
-+-------------------------------------------+ (scratch base)
-```
+Currently, I/O buffers are statically allocated at the bottom of the
+scratch region. This is a stopgap pending improved physical
+allocation and buffer management.
 
-The minimum scratch size (`min_scratch_size()`) accounts for these
-buffers plus overhead for the Task State Segment (TSS), Interrupt
-Descriptor Table (IDT), page table CoW, a minimal non-exception
-stack, and the exception stack and metadata.
+The minimum scratch size is calculated by `min_scratch_size()` in the
+architecture-specific layout modules under `hyperlight_common`; see
+that function for the detailed breakdown of required overhead.
 
 ## Creating a fresh guest
 
@@ -207,9 +160,11 @@ virtual memory. If the ELF has segments whose virtual addresses
 overlap with the scratch map, an error will be returned.
 
 In the current startup path, the host enters the guest with
-`RSP` pointing to the exception stack. Early guest init then
-allocates the main stack at `MAIN_STACK_TOP_GVA`, switches to it,
-and continues generic initialization.
+the stack pointer pointing to the exception stack. Early guest init
+then allocates the main stack at `MAIN_STACK_TOP_GVA`, switches to
+it, and continues generic initialization. Note that exception stack
+overflows can be difficult to detect, since there is no guard page
+below the exception stack within the scratch region.
 
 # Architecture-specific details of virtual memory setup
 