Commit afd1a29: Refactor

1 parent b5d6513 commit afd1a29

84 files changed

Lines changed: 708 additions & 7504 deletions


.tool-versions

Lines changed: 0 additions & 1 deletion

@@ -2,4 +2,3 @@ golang 1.24.0
 zig 0.13.0
 bun 1.2.2
 buf 1.50.0
-wrk 4.2.0

Readme.md

Lines changed: 46 additions & 144 deletions

@@ -9,6 +9,8 @@
* [Build a NoSQL Database From Scratch in 1000 Lines of Code](https://medium.com/better-programming/build-a-nosql-database-from-the-scratch-in-1000-lines-of-code-8ed1c15ed924)
* [Writing a SQL database from scratch in Go: 1. SELECT, INSERT, CREATE and a REPL](https://notes.eatonphil.com/database-basics.html)
+* https://github.com/cmu-db/bustub

##

```
@@ -17,147 +19,47 @@
3. Concurrency. How to handle multiple (a large number of) clients. And transactions.
```

### Persistence

Why do we need databases? Why not dump the data directly into files?

Let's say your process crashed midway while writing to a file, or you lost power. What's the state of the file?

* Does the file just lose the last write?
* Does it end up half-written?
* Or does it end up in an even more corrupted state?

Any outcome is possible. Your data is not guaranteed to persist on disk when you simply write to files. This is a concern of databases: a database will recover to a usable state when started after an unexpected shutdown.

Can we achieve persistence without using a database? There is a way:

1. Write the whole updated dataset to a new file.
2. Call fsync on the new file.
3. Overwrite the old file by renaming the new file over it, which file systems guarantee to be atomic.

This is only acceptable when the dataset is tiny. A database like SQLite can do incremental updates.
### Indexing

* Analytical (OLAP) queries typically involve a large amount of data, with aggregation, grouping, or join operations.
* In contrast, transactional (OLTP) queries usually touch only a small amount of indexed data. The most common types are indexed point queries and indexed range queries.

Data structures that persist on disk to look up data are called "indexes" in database systems. Database indexes can be larger than memory. There is a saying: if your problem fits in memory, it's an easy problem. Common data structures for indexing include B-trees and LSM-trees.

Three ways to access the data:

1. Scan the whole data set (no index is used).
2. Point query: query the index by a specific key.
3. Range query: query the index by a range (the index is sorted).
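A toy in-memory sketch of the two indexed query types, using a sorted slice as a stand-in for an on-disk index: a point query is a binary search, and a range query is a binary search for the lower bound followed by a contiguous scan.

```go
package main

import (
	"fmt"
	"sort"
)

// pointQuery reports whether key exists in the sorted slice keys.
func pointQuery(keys []int, key int) bool {
	i := sort.SearchInts(keys, key)
	return i < len(keys) && keys[i] == key
}

// rangeQuery returns every key k in the sorted slice with lo <= k <= hi.
// Because the slice is sorted, the result is one contiguous span.
func rangeQuery(keys []int, lo, hi int) []int {
	start := sort.SearchInts(keys, lo)
	end := sort.SearchInts(keys, hi+1)
	return keys[start:end]
}

func main() {
	keys := []int{2, 3, 5, 8, 13} // a sorted "index"
	fmt.Println(pointQuery(keys, 5))    // true
	fmt.Println(rangeQuery(keys, 3, 8)) // [3 5 8]
}
```

On disk the same idea holds: a sorted structure turns a range query into sequential I/O over one contiguous region.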
#### Data structure

On-disk data structures are often used when the amount of data is so large that keeping an entire dataset in memory is impossible or not feasible. Only a fraction of the data can be cached in memory at any time; the rest has to be stored on disk in a manner that allows accessing it efficiently.

On spinning disks, seeks increase the cost of random reads, because they require disk rotation and mechanical head movements to position the read/write head at the desired location. However, once the expensive part is done, reading or writing contiguous bytes (i.e., sequential operations) is relatively cheap.

The smallest transfer unit of a spinning drive is a sector, so when some operation is performed, at least an entire sector is read or written. Sector sizes typically range from 512 bytes to 4 KB.

Head positioning is the most expensive part of an operation on an HDD. This is one of the reasons we often hear about the positive effects of sequential I/O: reading and writing contiguous segments of the disk.

On SSDs there is no strong emphasis on random versus sequential I/O, as there is with HDDs, because the difference in latency between random and sequential reads is not as large. There is still some difference, caused by prefetching, reading contiguous pages, and internal parallelism.

Writing only full blocks, and combining subsequent writes to the same block, can help reduce the number of required I/O operations.

In summary, on-disk structures are designed with their target storage's specifics in mind and generally optimize for fewer disk accesses. We can do this by improving locality, optimizing the internal representation of the structure, and reducing the number of out-of-page pointers.
##### Hashtable

No sorting or ordering; resizing is problematic.
##### Binary search tree

Unbalanced trees have a worst-case complexity of O(N). Balanced trees give us an average of O(log2 N). At the same time, due to low fanout (fanout is the maximum allowed number of children per node), we have to perform balancing, relocate nodes, and update pointers rather frequently. Increased maintenance costs make BSTs impractical as on-disk data structures.

If we wanted to maintain a BST on disk, we'd face several problems. One problem is locality: since elements are added in random order, there's no guarantee that a newly created node is written close to its parent, which means that node child pointers may span several disk pages. We can improve the situation to a certain extent by modifying the tree layout and using paged binary trees.

Another problem, closely related to the cost of following child pointers, is tree height. Since binary trees have a fanout of just two, the height is a binary logarithm of the number of elements in the tree, and we have to perform O(log2 N) seeks to locate the searched element and, subsequently, perform the same number of disk transfers. 2-3 trees and other low-fanout trees have a similar limitation: while they are useful as in-memory data structures, their small node size makes them impractical for external storage.

A naive on-disk BST implementation would require as many disk seeks as comparisons, since there's no built-in concept of locality.

Considering these factors, a version of the tree better suited for disk implementation has to exhibit the following properties:

* High fanout, to improve locality of neighboring keys.
* Low height, to reduce the number of seeks during traversal.
##### B-tree

A B-tree can be queried and updated in O(log(n)) and can be range-queried. A B-tree is roughly a balanced n-ary tree.

Why use an n-ary tree instead of a binary tree?

1. Less space overhead.
   Every leaf node in a binary tree is reached via a pointer from a parent node, and the parent node may also have a parent. On average, each leaf node requires 1~2 pointers. This is in contrast to B-trees, where multiple data items in a leaf node share one parent. N-ary trees are also shorter. Less space is wasted on pointers.
2. Faster in memory.
   Due to modern CPU memory caching and other factors, n-ary trees can be faster than binary trees, even if their big-O complexity is the same.
3. Less disk I/O.
   * B-trees are shorter, which means fewer disk seeks.
   * The minimum size of a disk I/O is usually the size of a memory page (typically 4K). The operating system will fetch the whole 4K page even if you read a smaller amount. It's optimal to make use of all the information in a 4K page, by choosing a node size of at least one page.
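A quick way to see the effect of fanout on height is to count how many levels a balanced tree needs to index a given number of keys. The numbers below are illustrative only; real B-tree heights also depend on key sizes and node fill factors.

```go
package main

import "fmt"

// treeHeight counts how many levels a balanced tree with the given
// fanout needs before its capacity (fanout^height) reaches n keys.
func treeHeight(n, fanout int) int {
	h := 0
	for capacity := 1; capacity < n; capacity *= fanout {
		h++
	}
	return h
}

func main() {
	n := 1_000_000
	fmt.Println(treeHeight(n, 2))   // 20 levels for a binary tree
	fmt.Println(treeHeight(n, 100)) // 3 levels at fanout 100
}
```

With one disk seek per level, that is roughly 20 seeks versus 3 for a million keys, which is the whole case for high fanout.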
##### Log-structured merge-tree (LSM-tree)

How to query:

1. An LSM-tree contains multiple levels of data.
2. Each level is sorted and split into multiple files.
3. A point query starts at the top level; if the key is not found, the search continues to the next level.
4. A range query merges the results from all levels; higher levels have priority when merging.

How to update:

1. When updating a key, the key is inserted into a file at the top level first.
2. If the file size exceeds a threshold, it is merged with the next level.
3. The file size threshold increases exponentially with each level, which means that the amount of data also increases exponentially.

Let's analyze how this works. For queries:

1. Each level is sorted, so keys can be found via binary search, and range queries are just sequential file I/O. It's efficient.

For updates:

1. The top-level file is small, so inserting into the top level requires only a small amount of I/O.
2. Data is eventually merged to a lower level. Merging is sequential I/O, which is an advantage.
3. Higher levels trigger merging more often, but the merges are also smaller.
4. When merging a file into a lower level, any lower-level files whose range intersects are replaced by the merged results (which can be multiple files). We can see why levels are split into multiple files: to reduce the size of each merge.
5. Merging can be done in the background. However, low-level merging can suddenly cause high I/O usage, which can degrade system performance.
## Vector Db

* https://github.com/skyzh/write-you-a-vector-db

## Search

* [The Art of Searching](https://www.youtube.com/watch?v=yst6VQ7Lwpo)
* [Algorithms & data-structures that power Lucene & ElasticSearch](https://www.youtube.com/watch?v=eQ-rXP-D80U)

* [How do Spell Checkers work? Levenshtein Edit Distance](https://www.youtube.com/watch?v=Cu7Tl7FGigQ)
* [The Algorithm Behind Spell Checkers](https://www.youtube.com/watch?v=d-Eq6x1yssU)

## CRDT

* [Collaborative Text Editing with Eg-Walker](https://www.youtube.com/watch?v=rjbEG7COj7o)
* [Text CRDTs from scratch, in code!](https://www.youtube.com/watch?v=_lQ2Q4Kzi1I)
* [Lets write Eg-walker from scratch! Part 1](https://www.youtube.com/watch?v=ggXka5TTsOs)

* [CRDTs: The Hard Parts](https://www.youtube.com/watch?v=x7drE24geUw)
* [CRDTs and the Quest for Distributed Consistency](https://www.youtube.com/watch?v=B5NULPSiOGw)
* [A CRDT Primer: Defanging Order Theory](https://www.youtube.com/watch?v=OOlnp2bZVRs)
* [Conflict-Free Replicated Data Types (CRDT) for Distributed JavaScript Apps.](https://www.youtube.com/watch?v=M8-WFTjZoA0)

* [Loro Is Local-First State With CRDT](https://www.youtube.com/watch?v=NB7HRfyufLk)
* [How Yjs works from the inside out](https://www.youtube.com/watch?v=0l5XgnQ6rB4)

## HyperLogLog

* [PapersWeLove : HyperLogLog](https://www.youtube.com/watch?v=y3fTaxA8PkU)
* [A problem so hard even Google relies on Random Chance](https://www.youtube.com/watch?v=lJYufx0bfpw)
* [The Algorithm with the Best Name - HyperLogLog Explained](https://www.youtube.com/watch?v=2PlrMCiUN_s)

## LSM-Tree

* [#04 - Database Storage: Log-Structured Merge Trees & Tuples (CMU Intro to Database Systems)](https://www.youtube.com/watch?v=IHtVWGhG0Xg&t=1372s)

* https://github.com/facebook/rocksdb/wiki

* https://github.com/krasun/lsmtree
* https://github.com/skyzh/mini-lsm

## OLAP

* https://github.com/risinglightdb/risinglight
