
Commit 210c544

Docs
1 parent 3a41f4a commit 210c544

3 files changed

Lines changed: 131 additions & 0 deletions


secretary/Todo.md

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
* Delete key: if the deletion removes a node, keep the deleted node in an array so it can be removed from disk later.
*

secretary/bufferpool_pager.md

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
Buffer Pool in Databases

A buffer pool is a memory-management component of a database system that caches data pages in RAM. It reduces disk I/O by keeping recently or frequently used pages in memory, which improves query performance.

How It Works:

1. When a query needs a page, the database engine first checks the buffer pool.
2. If the page is in memory (cache hit), it is returned without touching the disk.
3. If the page is not in memory (cache miss), it is read from disk and placed in the buffer pool.
4. If the buffer pool is full, an existing page is evicted using a replacement policy (e.g., LRU, Least Recently Used).
5. Modified pages (dirty pages) are written back to disk when they are evicted, or periodically (checkpointing or background flushing).
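
The steps above can be sketched as a toy buffer pool in Python. This is a minimal illustration and does not reproduce any particular engine. It makes three assumptions: a plain dict (`disk`) stands in for the data file, eviction is strict LRU via `OrderedDict`, and a dirty page is written back only when it is evicted or when `flush_all` is called. The names `BufferPool`, `get_page`, `mark_dirty` and `flush_all` are invented for this example.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: caches pages from a dict-backed 'disk' with LRU eviction."""

    def __init__(self, disk, capacity):
        self.disk = disk              # page_id -> bytes; stands in for the data file
        self.capacity = capacity      # maximum number of cached pages
        self.frames = OrderedDict()   # page_id -> bytearray, least -> most recently used
        self.dirty = set()            # cached page_ids modified since they were loaded
        self.hits = 0
        self.misses = 0

    def get_page(self, page_id):
        if page_id in self.frames:                 # step 2: cache hit
            self.hits += 1
            self.frames.move_to_end(page_id)       # mark as most recently used
            return self.frames[page_id]
        self.misses += 1                           # step 3: cache miss
        if len(self.frames) >= self.capacity:      # step 4: pool is full, evict first
            self._evict()
        page = bytearray(self.disk[page_id])       # read the page from "disk"
        self.frames[page_id] = page
        return page

    def mark_dirty(self, page_id):
        # Assumes page_id is currently cached (the caller just modified it).
        self.dirty.add(page_id)

    def _evict(self):
        victim, page = self.frames.popitem(last=False)  # least recently used page
        if victim in self.dirty:                         # step 5: write back before dropping
            self.disk[victim] = bytes(page)
            self.dirty.discard(victim)

    def flush_all(self):
        """Write every dirty page back, like a checkpoint would."""
        for page_id in list(self.dirty):
            self.disk[page_id] = bytes(self.frames[page_id])
        self.dirty.clear()
```

For example, with `capacity=2`, suppose you read pages 0 and 1, touch page 0 again, and then read page 2. Page 1 is evicted because page 0 was used more recently.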

Buffer Pool Advantages:

• Minimizes disk I/O by keeping frequently accessed pages in RAM.
• Speeds up query execution by reducing the need for slow disk reads.
• Lets multiple transactions share a single cached copy of each page instead of each reading it from disk.

What is a Pager?

A pager is a low-level component responsible for reading and writing fixed-size pages to and from storage (disk, SSD, or memory). It acts as an abstraction layer between the storage system and higher-level database structures such as B-trees.

Responsibilities of a Pager:

1. Reading Pages: When a page is requested, the pager loads it from disk (if it is not already in memory).
2. Writing Pages: When pages are modified, the pager ensures they are written back to disk correctly.
3. Page Allocation & Freeing: It tracks free pages and allocates new pages as needed.
4. Crash Recovery: It works with journaling or WAL (Write-Ahead Logging) to keep data consistent after a crash.
5. Interacting with the Buffer Pool: The pager reads pages into the buffer pool and writes dirty pages back to disk when the buffer pool evicts or flushes them.
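
A minimal pager sketch follows. It assumes pages are stored back to back in a single file, so page N starts at byte offset N * PAGE_SIZE. Several details are illustrative assumptions only: the 4 KB `PAGE_SIZE`, the in-memory free list, and the `Pager` method names. A real pager would also persist its free list, `fsync` on commit, and coordinate with a journal or WAL.

```python
import io

PAGE_SIZE = 4096  # assumed page size; real engines use 4 KB, 8 KB, 16 KB, ...


class Pager:
    """Toy pager: fixed-size pages stored back to back in one seekable binary file."""

    def __init__(self, f):
        self.f = f
        self.f.seek(0, io.SEEK_END)
        self.num_pages = self.f.tell() // PAGE_SIZE   # pages already in the file
        self.free_pages = []                          # freed page ids (in memory only)

    def read_page(self, page_id):
        if not 0 <= page_id < self.num_pages:
            raise IndexError(f"page {page_id} does not exist")
        self.f.seek(page_id * PAGE_SIZE)              # page N lives at offset N * PAGE_SIZE
        return self.f.read(PAGE_SIZE)

    def write_page(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        self.f.seek(page_id * PAGE_SIZE)
        self.f.write(data)

    def allocate_page(self):
        if self.free_pages:                           # reuse a freed page first
            return self.free_pages.pop()
        page_id = self.num_pages                      # otherwise grow the file by one page
        self.num_pages += 1
        self.write_page(page_id, bytes(PAGE_SIZE))    # zero-fill the new page
        return page_id

    def free_page(self, page_id):
        self.free_pages.append(page_id)

    def sync(self):
        self.f.flush()                                # a real pager would also fsync here
```

Any seekable binary file object works: `open(path, "r+b")` for a real file, or `io.BytesIO()` for experiments.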

Pager vs Buffer Pool:

| Feature | Buffer Pool | Pager |
|---|---|---|
| Purpose | Caches pages in RAM for faster access | Manages storage and handles disk I/O |
| Memory Usage | Uses RAM | Uses disk (persistent storage) |
| Performance | Improves speed by reducing disk access | Ensures data persistence |
| Scope | Works at the RAM level | Works at the storage level |

Example Use Cases:

• SQLite: The pager module reads and writes database pages on disk and uses a page cache (pcache) to keep recently used pages in RAM.
• MySQL InnoDB: A buffer pool holds frequently accessed table and index pages, while the storage layer writes pages to disk.

Buffer Pool and Pager in PostgreSQL

1. PostgreSQL Buffer Pool (Shared Buffers)

PostgreSQL’s buffer pool is called shared buffers. It caches database pages in RAM to reduce disk I/O.

How It Works:

1. When a query requests a page, PostgreSQL first checks shared buffers.
2. If the page is found (cache hit), it is read directly from memory.
3. If it is not found (cache miss), the pager loads the page from disk into shared buffers.
4. If shared buffers are full, PostgreSQL uses the clock-sweep algorithm (an approximation of LRU) to choose a page to evict.
5. Modified pages (dirty pages) are written back to disk, mostly by the background writer and the checkpointer.
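
Tuning Buffer Pool in PostgreSQL:

• Configured via the shared_buffers setting (default 128MB). On a dedicated database server, about 25% of total RAM is a common starting point. Going above roughly 40% rarely helps, because PostgreSQL also relies on the operating system’s page cache.
• Larger values reduce disk reads but consume more memory.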
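
As a worked example of what these sizes mean in buffers, the sketch below divides a `shared_buffers` size by PostgreSQL’s default 8 KB block size. It also applies the 25% starting point from the note above. The helper names are hypothetical, and 25% is only that rule of thumb, not a tuned value.

```python
BLOCK_SIZE = 8 * 1024  # PostgreSQL's default block size, in bytes
MB = 1024 ** 2
GB = 1024 ** 3


def buffers_for(shared_buffers_bytes):
    """Number of 8 KB buffer slots a given shared_buffers size provides."""
    return shared_buffers_bytes // BLOCK_SIZE


def starting_shared_buffers(total_ram_bytes, fraction=0.25):
    """Rule-of-thumb starting value: about 25% of RAM on a dedicated server."""
    return int(total_ram_bytes * fraction)


print(buffers_for(128 * MB))                          # 16384 buffers with the 128MB default
print(starting_shared_buffers(16 * GB) // GB)         # 4 (GB) on a 16 GB machine
print(buffers_for(starting_shared_buffers(16 * GB)))  # 524288 buffers
```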

2. PostgreSQL Pager (Storage Manager)

In PostgreSQL, the pager role is played by the storage manager (smgr). It reads and writes 8KB pages (the default block size) between disk and shared buffers.

Pager’s Responsibilities:

• Fetching Pages: Reads 8KB pages from disk when needed.
• Writing Pages: Writes modified (dirty) pages back to disk.
• Managing Free Space: Extends relation files with new pages as needed. Free space inside existing pages is tracked by the free space map.
• Ensuring Consistency: Works with WAL (Write-Ahead Logging) for crash recovery.
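
Here is what reading an 8KB page involves at the file level. By default, PostgreSQL stores each relation in files split into 1 GB segments, so block N lives in segment N // 131072 at byte offset (N % 131072) * 8192. The sketch below computes that location and reads the block. It assumes default compile-time settings and ignores the extra forks (free space map, visibility map). `locate_block` and `read_block` are hypothetical helpers, not PostgreSQL functions; the real logic lives in the C storage manager.

```python
import os

BLOCK_SIZE = 8192                                # default block size
SEGMENT_SIZE = 1024 ** 3                         # default segment file size: 1 GB
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks per segment


def locate_block(relfilenode, block_number):
    """Return (segment file name, byte offset) holding a block of a relation."""
    segment = block_number // BLOCKS_PER_SEGMENT
    offset = (block_number % BLOCKS_PER_SEGMENT) * BLOCK_SIZE
    # The first segment has no suffix; later ones are named <relfilenode>.1, .2, ...
    name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return name, offset


def read_block(relation_dir, relfilenode, block_number):
    """Read one block from disk, as a storage manager conceptually does."""
    name, offset = locate_block(relfilenode, block_number)
    with open(os.path.join(relation_dir, name), "rb") as f:
        f.seek(offset)
        page = f.read(BLOCK_SIZE)
    if len(page) != BLOCK_SIZE:
        raise EOFError(f"block {block_number} is beyond the end of the relation")
    return page
```

For example, block 131073 of a relation whose file is `16384` lives in `16384.1` at offset 8192.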

Page Flow in PostgreSQL:

1. A query requests data → PostgreSQL checks shared buffers.
2. If the page is missing, the pager reads it from disk into shared buffers.
3. If a page is modified, it is marked dirty and must eventually be written back to disk.
4. PostgreSQL periodically flushes dirty pages using the background writer and at checkpoints.
5. The WAL record describing a change is flushed to disk before the dirty page itself is written, so the change can be replayed after a crash.
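
Step 5 is the write-ahead rule. The sketch below assumes a toy log where each record gets an increasing LSN (log sequence number). Each cached page remembers the LSN of the last record that changed it; PostgreSQL keeps this in the page header. Before a dirty page is written, the log is flushed at least up to that LSN. `WAL`, `Page`, `modify` and `write_back` are hypothetical names for illustration.

```python
class WAL:
    """Toy write-ahead log: records get increasing LSNs; flush() makes them durable."""

    def __init__(self):
        self.records = []      # (lsn, page_id, new_value)
        self.flushed_lsn = 0   # highest LSN assumed to be on durable storage

    def append(self, page_id, new_value):
        lsn = len(self.records) + 1
        self.records.append((lsn, page_id, new_value))
        return lsn

    def flush(self, up_to_lsn):
        # A real system would write and fsync the log up to this point.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


class Page:
    def __init__(self, value=None):
        self.value = value
        self.lsn = 0           # LSN of the last WAL record that changed this page
        self.dirty = False


def modify(wal, page, page_id, new_value):
    """Log the change first, then apply it to the cached page."""
    page.lsn = wal.append(page_id, new_value)
    page.value = new_value
    page.dirty = True


def write_back(wal, page, disk, page_id):
    """Write a dirty page to 'disk' (a dict here), honoring the WAL-before-data rule."""
    if page.lsn > wal.flushed_lsn:
        wal.flush(page.lsn)    # the log must reach disk before the data page does
    disk[page_id] = page.value
    page.dirty = False
```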

Key Differences: Buffer Pool vs Pager

| Feature | Buffer Pool (Shared Buffers) | Pager (Storage Manager) |
|---|---|---|
| Purpose | Caches frequently accessed pages in RAM | Reads/writes pages between disk and the buffer pool |
| Speed | Fast (stored in RAM) | Slow (disk-based) |
| Replacement | Uses the clock-sweep algorithm | None; it only performs file I/O |
| Writes to Disk | Via the background writer & checkpoints | Relies on WAL for durability |

3. Checkpointing & WAL (Write-Ahead Logging)

• Checkpointing: Periodically flushes dirty pages from the buffer pool to disk, so crash recovery only has to replay WAL from the last checkpoint onward.
• WAL: Ensures every modification is logged before the modified page is written, enabling crash recovery.
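
How checkpoints and WAL fit together during crash recovery can be sketched as a redo loop. Replay starts at the redo point recorded by the last checkpoint. A logged change is reapplied only if the page on disk does not already contain it, meaning the page's LSN is older than the record's. The record and page formats here are assumptions made for illustration: `(lsn, page_id, new_value)` tuples and `(page_lsn, value)` pairs. PostgreSQL’s real WAL records and recovery, including full-page images, are considerably more involved.

```python
def redo(wal_records, pages, redo_start_lsn):
    """Replay WAL records from a checkpoint's redo point onto on-disk pages.

    wal_records: list of (lsn, page_id, new_value), in LSN order
    pages: page_id -> (page_lsn, value) as found on disk after the crash
    """
    for lsn, page_id, new_value in wal_records:
        if lsn < redo_start_lsn:
            continue                       # already on disk as of the checkpoint
        page_lsn, _ = pages.get(page_id, (0, None))
        if lsn <= page_lsn:
            continue                       # the page already contains this change
        pages[page_id] = (lsn, new_value)  # reapply the logged change
    return pages
```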

What is Clock Sweep?

Clock Sweep is the page replacement algorithm PostgreSQL uses to manage its buffer pool (shared buffers). It approximates the Least Recently Used (LRU) algorithm while avoiding the overhead of tracking exact recency.

How Clock Sweep Works:

1. PostgreSQL arranges its buffers in a circular array (a clock-like structure), and each buffer has a usage count (a generalization of a reference bit).
2. Each time a page is accessed, its usage count is incremented, up to a small maximum (5 in PostgreSQL).
3. A hand (pointer) sweeps through the buffers in a circular manner.
4. When a page needs to be evicted:
• If usage count > 0, it is decremented (the page gets a second chance).
• If usage count = 0, the page is evicted and replaced with the new page.
5. The hand keeps sweeping until it finds a page with usage count = 0. Each pass decrements the counts, so a victim is always found eventually, as long as some buffer is not pinned (in active use).
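
The procedure above can be sketched in Python. This is a simplified model, not PostgreSQL’s code. It ignores pinned buffers, the free list and locking, and it assumes a newly loaded page starts with a usage count of 1. `ClockSweepPool` and its methods are hypothetical names.

```python
MAX_USAGE = 5  # cap on the usage count, matching the maximum mentioned above


class ClockSweepPool:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.page_ids = [None] * capacity   # which page each buffer slot holds
        self.usage = [0] * capacity         # per-slot usage count
        self.lookup = {}                    # page_id -> slot index
        self.hand = 0                       # the clock hand

    def access(self, page_id):
        """Return the slot holding page_id, loading it (and evicting) on a miss."""
        slot = self.lookup.get(page_id)
        if slot is not None:                                   # hit: bump the count
            self.usage[slot] = min(self.usage[slot] + 1, MAX_USAGE)
            return slot
        slot = self._find_victim()                             # miss: pick a slot
        old = self.page_ids[slot]
        if old is not None:
            del self.lookup[old]                               # evict the old page
        self.page_ids[slot] = page_id
        self.usage[slot] = 1                                   # assumed starting count
        self.lookup[page_id] = slot
        return slot

    def _find_victim(self):
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.page_ids)   # advance the hand
            if self.page_ids[slot] is None or self.usage[slot] == 0:
                return slot                                    # empty or count 0: use it
            self.usage[slot] -= 1                              # second chance
```

Consider three buffers holding A, B and C. An extra access to A raises its count to 2. Loading D then makes the hand decrement counts as it goes round. It evicts B, the first buffer it finds already at 0 on its second lap.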

Advantages of Clock Sweep:

✅ Less Overhead: Unlike strict LRU, it doesn’t have to move a page to the front of a shared list on every access; a hit only bumps a counter.
✅ Adaptive: Frequently used pages build up higher usage counts and survive several sweeps before eviction.
✅ Efficient: Simple to implement and works well for PostgreSQL’s workload.

Clock Sweep vs LRU (Least Recently Used)

| Feature | Clock Sweep (PostgreSQL) | LRU (Traditional) |
|---|---|---|
| Tracking | Circular buffer array with a usage count per buffer | Linked list of pages (most recent → least recent) |
| Eviction Strategy | Sweeps through buffers, decrementing usage counts until a 0-count buffer is found | Evicts the least recently used page directly |
| Complexity | O(1) per access (increment a counter); one eviction may sweep several buffers | O(1) per access and eviction with a hash map + doubly linked list |
| Overhead | Low | Higher: every access updates the shared list, which requires locking |
| Performance | Works well for large buffer pools with minimal tracking overhead | Works well, but the shared list becomes a contention point under heavy concurrent workloads |
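
For comparison, a strict LRU cache is easy to write with a hash map plus a doubly linked list. Python’s `OrderedDict` provides both, and every operation is O(1). The cost that matters in a database is different: every hit reorders the shared list (`move_to_end` below). In a multi-threaded buffer manager, each access would therefore need a lock on that list, whereas clock sweep only increments a per-buffer counter. `LRUCache` is a hypothetical name used for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Strict LRU over page ids: hash map + doubly linked list, O(1) per operation."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used first

    def access(self, page_id):
        """Touch page_id; return the evicted page id, or None if nothing was evicted."""
        if page_id in self.entries:
            self.entries.move_to_end(page_id)   # every hit reorders the shared list
            return None
        evicted = None
        if len(self.entries) >= self.capacity:
            evicted, _ = self.entries.popitem(last=False)
        self.entries[page_id] = True
        return evicted
```

On the access sequence A, B, C, A, D, E with three slots, this cache evicts B and then C.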

Why Does PostgreSQL Use Clock Sweep Instead of LRU?

• LRU has high overhead: maintaining an exact LRU list requires updating (and locking) it on every page access.
• Clock Sweep is lightweight: it approximates LRU at much lower cost.
• Efficient for databases: PostgreSQL can manage millions of pages without excessive bookkeeping.

secretary/kademilia.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
# Kademlia

* https://kelseyc18.github.io/kademlia_vis//basics/1/

* [IPFS](https://research.protocol.ai/publications/ipfs-content-addressed-versioned-p2p-file-system/benet2014.pdf)
* [Kademlia Paper](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf)
* [Distributed Hash Tables with Kademlia](https://codethechange.stanford.edu/guides/guide_kademlia.html#supporting-dynamic-leaves-and-joins)
