* [Build a NoSQL Database From Scratch in 1000 Lines of Code](https://medium.com/better-programming/build-a-nosql-database-from-the-scratch-in-1000-lines-of-code-8ed1c15ed924)
* [Writing a SQL database from scratch in Go: 1. SELECT, INSERT, CREATE and a REPL](https://notes.eatonphil.com/database-basics.html)

* https://github.com/cmu-db/bustub

##

```
3. Concurrency. How to handle multiple (a large number of) clients. And transactions.
```
### Persistence
Why do we need databases? Why not dump the data directly into files?

Let's say your process crashed midway while writing to a file, or you lost power. What's the state of the file?

* Does the file just lose the last write?
* Does it end up half-written?
* Does it end up in an even more corrupted state?

Any outcome is possible. Your data is not guaranteed to persist on disk when you simply write to files. This is a concern of databases: a database will recover to a usable state when started after an unexpected shutdown.

Can we achieve persistence without using a database? There is a way:

1. Write the whole updated dataset to a new file.
2. Call fsync on the new file.
3. Overwrite the old file by renaming the new file over it, which file systems guarantee to be atomic.

This is only acceptable when the dataset is tiny. A database like SQLite can do incremental updates.

### Indexing

* Analytical (OLAP) queries typically involve a large amount of data, with aggregation, grouping, or join operations.
* In contrast, transactional (OLTP) queries usually touch only a small amount of indexed data. The most common types are indexed point queries and indexed range queries.

Data structures that persist on disk for looking up data are called "indexes" in database systems. Database indexes can be larger than memory. There is a saying: if your problem fits in memory, it's an easy problem. Common data structures for indexing include B-trees and LSM-trees.

There are three ways to query the data:

1. Scan the whole dataset. (No index is used.)
2. Point query: query the index by a specific key.
3. Range query: query the index by a range. (The index is sorted.)

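The difference between point and range queries can be sketched with a sorted in-memory index. This toy `Index` type is illustrative only; a real index lives on disk, but the access pattern is the same: binary search for a point query, then a sequential scan for a range query.

```go
package main

import (
	"fmt"
	"sort"
)

// Index keeps keys sorted, standing in for an on-disk structure.
type Index struct {
	keys []int
}

// Insert places k at its sorted position.
func (ix *Index) Insert(k int) {
	i := sort.SearchInts(ix.keys, k)
	ix.keys = append(ix.keys, 0)
	copy(ix.keys[i+1:], ix.keys[i:])
	ix.keys[i] = k
}

// Get is a point query: an O(log N) binary search.
func (ix *Index) Get(k int) bool {
	i := sort.SearchInts(ix.keys, k)
	return i < len(ix.keys) && ix.keys[i] == k
}

// Range is a range query: binary search to the first key >= lo,
// then a sequential scan until the key exceeds hi. Only possible
// because the index is kept sorted.
func (ix *Index) Range(lo, hi int) []int {
	i := sort.SearchInts(ix.keys, lo)
	var out []int
	for ; i < len(ix.keys) && ix.keys[i] <= hi; i++ {
		out = append(out, ix.keys[i])
	}
	return out
}

func main() {
	ix := &Index{}
	for _, k := range []int{42, 7, 19, 3, 88} {
		ix.Insert(k)
	}
	fmt.Println(ix.Get(19))      // true
	fmt.Println(ix.Range(5, 50)) // [7 19 42]
}
```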
#### Data structure

On-disk data structures are used when the amount of data is so large that keeping an entire dataset in memory is impossible or not feasible. Only a fraction of the data can be cached in memory at any time, and the rest has to be stored on disk in a manner that allows accessing it efficiently.

On spinning disks, seeks increase the cost of random reads because they require disk rotation and mechanical head movement to position the read/write head at the desired location. However, once that expensive part is done, reading or writing contiguous bytes (i.e., sequential operations) is relatively cheap. The smallest transfer unit of a spinning drive is a sector, so any operation reads or writes at least an entire sector. Sector sizes typically range from 512 bytes to 4 KB. Head positioning is the most expensive part of an operation on an HDD; this is one of the reasons we often hear about the positive effects of sequential I/O, i.e., reading and writing contiguous segments of the disk.

On SSDs there is no strong emphasis on random versus sequential I/O as there is with HDDs, because the difference in latency between random and sequential reads is not as large. Some difference remains, caused by prefetching, reading contiguous pages, and internal parallelism.

Writing only full blocks, and combining subsequent writes to the same block, can help reduce the number of required I/O operations.

In summary, on-disk structures are designed with their target storage specifics in mind and generally optimize for fewer disk accesses. We can do this by improving locality, optimizing the internal representation of the structure, and reducing the number of out-of-page pointers.

##### Hashtable
Hash tables support fast point lookups, but the keys are not sorted or ordered, so range queries are impossible, and resizing a large on-disk table is costly.

##### Binary search tree
Unbalanced trees have a worst-case complexity of O(N). Balanced trees give us an average of O(log2 N). At the same time, due to low fanout (fanout is the maximum allowed number of children per node), we have to perform balancing, relocate nodes, and update pointers rather frequently. These increased maintenance costs make BSTs impractical as on-disk data structures.

If we wanted to maintain a BST on disk, we'd face several problems. One problem is locality: since elements are added in random order, there's no guarantee that a newly created node is written close to its parent, which means that node child pointers may span several disk pages. We can improve the situation to a certain extent by modifying the tree layout and using paged binary trees.

Another problem, closely related to the cost of following child pointers, is tree height. Since binary trees have a fanout of just two, the height is the binary logarithm of the number of elements in the tree, and we have to perform O(log2 N) seeks to locate the searched element and, subsequently, the same number of disk transfers. 2-3-trees and other low-fanout trees have a similar limitation: while they are useful as in-memory data structures, their small node size makes them impractical for external storage.

A naive on-disk BST implementation would require as many disk seeks as comparisons, since there's no built-in concept of locality.

Considering these factors, a version of the tree better suited for a disk implementation has to exhibit the following properties:

* High fanout to improve locality of neighboring keys.
* Low height to reduce the number of seeks during traversal.

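A quick back-of-the-envelope calculation shows how fanout drives height, and therefore seek count. The billion-key figure and the fanout of 500 are illustrative numbers, not from any particular system:

```go
package main

import (
	"fmt"
	"math"
)

// height returns the minimum number of tree levels needed to reach
// one of n keys with a given fanout: ceil(log_fanout(n)).
// Each level costs roughly one disk seek.
func height(n, fanout float64) int {
	return int(math.Ceil(math.Log(n) / math.Log(fanout)))
}

func main() {
	const n = 1e9 // a billion keys (illustrative)
	// Binary tree: fanout 2.
	fmt.Println(height(n, 2)) // 30 levels, so ~30 seeks per lookup
	// High-fanout node, e.g. a 4 KB page holding ~500 keys.
	fmt.Println(height(n, 500)) // 4 levels, so ~4 seeks per lookup
}
```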
##### B-tree
A B-tree can be queried and updated in O(log N) and supports range queries. A B-tree is roughly a balanced n-ary tree. Why use an n-ary tree instead of a binary tree?

1. Less space overhead. Every leaf node in a binary tree is reached via a pointer from a parent node, and the parent node may also have a parent; on average, each leaf node requires 1 to 2 pointers. This is in contrast to B-trees, where multiple keys in a leaf node share one parent. N-ary trees are also shorter. Less space is wasted on pointers.
2. Faster in memory. Due to modern CPU memory caching and other factors, n-ary trees can be faster than binary trees, even if their big-O complexity is the same.
3. Less disk I/O. B-trees are shorter, which means fewer disk seeks. Also, the minimum size of a disk I/O is usually the size of a memory page (typically 4 KB); the operating system will read the whole 4 KB page even if you ask for a smaller size, so it's optimal to use all the information in a page by choosing a node size of at least one page.

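To make point 3 concrete, here is the arithmetic for a hypothetical node layout (a 16-byte header, 8-byte keys, and 8-byte child pointers are assumed for illustration; real formats differ):

```go
package main

import "fmt"

func main() {
	// A node with n keys has n+1 child pointers, so it fits in a
	// page when: header + n*key + (n+1)*ptr <= pageSize.
	const pageSize = 4096
	const header, key, ptr = 16, 8, 8
	n := (pageSize - header - ptr) / (key + ptr)
	fmt.Println(n) // 254 keys per node, i.e. a fanout of about 255
}
```

With a fanout in the hundreds, three or four levels are enough to index billions of keys, which is why sizing nodes to whole pages pays off.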
##### Log-structured merge-tree (LSM-tree)
How to query:

1. An LSM-tree contains multiple levels of data.
2. Each level is sorted and split into multiple files.
3. A point query starts at the top level; if the key is not found, the search continues to the next level.
4. A range query merges the results from all levels, with higher levels taking priority in the merge.

How to update:

1. When updating a key, the key is inserted into a file at the top level first.
2. If the file size exceeds a threshold, the file is merged into the next level.
3. The file size threshold increases exponentially with each level, which means that the amount of data also increases exponentially.

Let's analyze how this works. For queries:

1. Each level is sorted, so keys can be found via binary search, and range queries are just sequential file I/O. It's efficient.

For updates:

1. The top-level file is small, so inserting into the top level requires only a small amount of I/O.
2. Data is eventually merged to a lower level. Merging is sequential I/O, which is an advantage.
3. Higher levels trigger merging more often, but the merges are also smaller.
4. When merging a file into a lower level, any lower files whose range intersects are replaced by the merged results (which can be multiple files). We can see why levels are split into multiple files: to reduce the size of each merge.
5. Merging can be done in the background. However, low-level merging can suddenly cause high I/O usage, which can degrade system performance.

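The query steps above can be sketched as follows. This is a toy model: real levels are sorted files on disk, but here each level is just an in-memory map, and an empty value stands in for a deletion tombstone, which are simplifying assumptions made for illustration:

```go
package main

import "fmt"

// Level maps keys to values; levels[0] is the newest level.
type Level map[string]string

// get performs an LSM-style point query: search the top (newest)
// level first and stop at the first level containing the key, so
// newer writes shadow older ones without rewriting old data.
func get(levels []Level, key string) (string, bool) {
	for _, lv := range levels {
		if v, ok := lv[key]; ok {
			if v == "" { // tombstone: the key was deleted
				return "", false
			}
			return v, true
		}
	}
	return "", false
}

func main() {
	levels := []Level{
		{"a": "new"},             // level 0: most recent writes
		{"a": "old", "b": ""},    // level 1: "b" was deleted here
		{"b": "stale", "c": "x"}, // level 2: oldest data
	}
	v, _ := get(levels, "a")
	fmt.Println(v) // "new": level 0 shadows level 1
	_, ok := get(levels, "b")
	fmt.Println(ok) // false: the tombstone in level 1 hides level 2
}
```

The tombstone trick is why deletes are cheap in an LSM-tree: a delete is just another top-level write, and the stale entries below it are physically removed only during a later merge.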
## Vector Db

* https://github.com/skyzh/write-you-a-vector-db

## Search

* [The Art of Searching](https://www.youtube.com/watch?v=yst6VQ7Lwpo)
* [Algorithms & data-structures that power Lucene & ElasticSearch](https://www.youtube.com/watch?v=eQ-rXP-D80U)

* [How do Spell Checkers work? Levenshtein Edit Distance](https://www.youtube.com/watch?v=Cu7Tl7FGigQ)
* [The Algorithm Behind Spell Checkers](https://www.youtube.com/watch?v=d-Eq6x1yssU)

## CRDT

* [Collaborative Text Editing with Eg-Walker](https://www.youtube.com/watch?v=rjbEG7COj7o)
* [Text CRDTs from scratch, in code!](https://www.youtube.com/watch?v=_lQ2Q4Kzi1I)
* [Lets write Eg-walker from scratch! Part 1](https://www.youtube.com/watch?v=ggXka5TTsOs)

* [CRDTs: The Hard Parts](https://www.youtube.com/watch?v=x7drE24geUw)
* [CRDTs and the Quest for Distributed Consistency](https://www.youtube.com/watch?v=B5NULPSiOGw)
* [A CRDT Primer: Defanging Order Theory](https://www.youtube.com/watch?v=OOlnp2bZVRs)
* [Conflict-Free Replicated Data Types (CRDT) for Distributed JavaScript Apps.](https://www.youtube.com/watch?v=M8-WFTjZoA0)

* [Loro Is Local-First State With CRDT](https://www.youtube.com/watch?v=NB7HRfyufLk)
* [How Yjs works from the inside out](https://www.youtube.com/watch?v=0l5XgnQ6rB4)

## HyperLogLog

* [PapersWeLove : HyperLogLog](https://www.youtube.com/watch?v=y3fTaxA8PkU)
* [A problem so hard even Google relies on Random Chance](https://www.youtube.com/watch?v=lJYufx0bfpw)
* [The Algorithm with the Best Name - HyperLogLog Explained](https://www.youtube.com/watch?v=2PlrMCiUN_s)

## LSM-Tree

* [#04 - Database Storage: Log-Structured Merge Trees & Tuples (CMU Intro to Database Systems)](https://www.youtube.com/watch?v=IHtVWGhG0Xg&t=1372s)

* https://github.com/facebook/rocksdb/wiki

* https://github.com/krasun/lsmtree
* https://github.com/skyzh/mini-lsm

## OLAP

* https://github.com/risinglightdb/risinglight