Docs

CodeHariK · CodeHariK · commit 210c54400fe6 · 2025-03-05T14:43:21.000+05:30
diff --git a/secretary/Todo.md b/secretary/Todo.md
@@ -0,0 +1,2 @@
+* Delete key, if deletes node, keep deleted node in array for removal from disk
+* 
diff --git a/secretary/bufferpool_pager.md b/secretary/bufferpool_pager.md
@@ -0,0 +1,127 @@
+Buffer Pool in Databases
+
+A buffer pool is a memory management component in a database system that caches frequently accessed data pages in RAM. It helps reduce disk I/O by keeping recently used or frequently needed data in memory, improving query performance.
+
+How It Works:
+	1.	When a query needs a page from the database, the database engine first checks the buffer pool.
+	2.	If the page is in memory (cache hit), it is retrieved quickly.
+	3.	If the page is not in memory (cache miss), it is read from disk and placed in the buffer pool.
+	4.	If the buffer pool is full, an existing page is evicted using a replacement policy (e.g., LRU - Least Recently Used).
+	5.	Modified pages (dirty pages) are periodically written back to disk (checkpointing or background flushing).
+
+Buffer Pool Advantages:
+	•	Minimizes disk I/O by keeping frequently accessed pages in RAM.
+	•	Speeds up query execution by reducing the need for slow disk reads.
+	•	Manages concurrency efficiently by allowing multiple transactions to work on cached pages.
+
+What is a Pager?
+
+A pager is a low-level component responsible for reading and writing fixed-size pages to and from storage (disk, SSD, or memory). It acts as an abstraction layer between the storage system and higher-level database structures.
+
+Responsibilities of a Pager:
+	1.	Reading Pages: When a page is requested, the pager loads it from disk (if not already in memory).
+	2.	Writing Pages: When pages are modified, the pager ensures they are written back to disk properly.
+	3.	Page Allocation & Freeing: It manages free pages and allocates new pages as needed.
+	4.	Crash Recovery: Works with journaling or WAL (Write-Ahead Logging) to ensure data consistency.
+	5.	Interacting with the Buffer Pool: The pager fetches pages into the buffer pool and evicts them when necessary.
+
+Pager vs Buffer Pool:
+
+Feature	Buffer Pool	Pager
+Purpose	Caches pages in RAM for faster access	Manages storage and handles disk I/O
+Memory Usage	Uses RAM	Uses disk (persistent storage)
+Performance	Improves speed by reducing disk access	Ensures data persistence
+Scope	Works at the RAM level	Works at the storage level
+
+Example Use Cases:
+	•	SQLite: The pager component manages pages on disk, while the buffer pool (cache) keeps frequently accessed pages in RAM.
+	•	MySQL InnoDB: Uses a buffer pool to store frequently accessed table data and index pages while the storage layer writes pages to disk.
+
+Would you like more details on a specific database system’s buffer pool or pager? 🚀
+
+
+
+Buffer Pool and Pager in PostgreSQL
+
+1. PostgreSQL Buffer Pool (Shared Buffers)
+
+PostgreSQL’s buffer pool is called shared buffers, and it plays a crucial role in caching database pages in RAM to reduce disk I/O.
+
+How It Works:
+	1.	When a query requests a page, PostgreSQL first checks shared buffers.
+	2.	If the page is found (cache hit), it is retrieved quickly from memory.
+	3.	If not found (cache miss), the pager loads the page from disk into shared buffers.
+	4.	If shared buffers are full, PostgreSQL uses clock-sweep (similar to LRU) to evict pages.
+	5.	Modified pages (dirty pages) are written back to disk periodically.
+
+Tuning Buffer Pool in PostgreSQL:
+	•	Configured via shared_buffers (default is 128MB, but often set to 25-40% of total RAM).
+	•	Larger values reduce disk reads but consume more memory.
+
+2. PostgreSQL Pager (Storage Manager)
+
+The pager in PostgreSQL is responsible for reading and writing 8KB pages between disk and shared buffers.
+
+Pager’s Responsibilities:
+	•	Fetching Pages: Reads 8KB pages from disk when needed.
+	•	Writing Pages: Modified (dirty) pages are written back to disk.
+	•	Managing Free Space: Keeps track of allocated and free pages.
+	•	Ensuring Consistency: Works with WAL (Write-Ahead Logging) for crash recovery.
+
+Page Flow in PostgreSQL:
+	1.	A query requests data → PostgreSQL checks shared buffers.
+	2.	If the page is missing, the pager reads it from disk into shared buffers.
+	3.	If a page is modified, it becomes dirty and is scheduled for writing back to disk.
+	4.	PostgreSQL periodically flushes dirty pages using background writer.
+	5.	WAL ensures durability before dirty pages are written back.
+
+Key Differences: Buffer Pool vs Pager
+
+Feature	Buffer Pool (Shared Buffers)	Pager (Storage Manager)
+Purpose	Caches frequently accessed pages in RAM	Reads/writes pages between disk and buffer pool
+Speed	Fast (stored in RAM)	Slow (disk-based)
+Replacement	Uses clock-sweep algorithm	Manages file I/O
+Writes to Disk	Uses background writer & checkpoints	Ensures durability via WAL
+
+3. Checkpointing & WAL (Write-Ahead Logging)
+	•	Checkpointing: Periodically flushes dirty pages from the buffer pool to disk.
+	•	WAL: Ensures modifications are logged before writing pages, enabling crash recovery.
+
+Would you like more details on PostgreSQL’s shared buffers, page structure, or performance tuning? 🚀
+
+
+
+
+What is Clock Sweep?
+
+Clock Sweep is a page replacement algorithm used in PostgreSQL to manage its buffer pool (shared buffers) efficiently. It is a variant of the Least Recently Used (LRU) algorithm but avoids the performance overhead of strict LRU tracking.
+
+How Clock Sweep Works:
+	1.	PostgreSQL maintains a circular buffer (clock-like structure) where each page has a usage count (similar to a reference bit).
+	2.	A hand (pointer) sweeps through pages in a circular manner.
+	3.	When a page needs to be evicted:
+	•	If usage count > 0, it is decremented (page gets a second chance).
+	•	If usage count = 0, the page is evicted and replaced with a new page.
+	4.	If the buffer pool is full, the hand keeps sweeping until it finds a page with a usage count = 0.
+
+Advantages of Clock Sweep:
+
+✅ Less Overhead – Unlike strict LRU, it doesn’t require frequent sorting of pages.
+✅ Adaptive – Popular pages are given multiple chances before eviction.
+✅ Efficient – Simple to implement and works well for PostgreSQL’s workload.
+
+Clock Sweep vs LRU (Least Recently Used)
+
+Feature	Clock Sweep (PostgreSQL)	LRU (Traditional)
+Tracking	Uses a circular buffer with a usage count	Maintains a linked list of pages (most recent → least recent)
+Eviction Strategy	Sweeps through pages, decrementing usage count until a 0-count page is found	Evicts the least recently used page directly
+Complexity	O(1) per operation (efficient)	O(N) or O(log N) (needs list updates or heap operations)
+Overhead	Low	High (requires list updates and sorting)
+Performance	Works well for large datasets with minimal tracking overhead	Works well but can slow down under heavy workloads
+
+Why PostgreSQL Uses Clock Sweep Instead of LRU?
+	•	LRU has high overhead – maintaining a perfect LRU list requires frequent updates.
+	•	Clock Sweep is lightweight – it provides an approximate LRU with much lower cost.
+	•	Efficient for databases – PostgreSQL can manage millions of pages without excessive bookkeeping.
+
+Would you like a deeper dive into PostgreSQL’s buffer eviction process or how to tune shared_buffers? 🚀
diff --git a/secretary/kademilia.md b/secretary/kademilia.md
@@ -1,5 +1,7 @@
 # Kademilia
 
+* https://kelseyc18.github.io/kademlia_vis//basics/1/
+
 * [IPFS](https://research.protocol.ai/publications/ipfs-content-addressed-versioned-p2p-file-system/benet2014.pdf)
 * [Kademilia Paper](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf)
 * [Distributed Hash Tables with Kademlia](https://codethechange.stanford.edu/guides/guide_kademlia.html#supporting-dynamic-leaves-and-joins)

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,2 @@`
	`1`	`+* Delete key, if deletes node, keep deleted node in array for removal from disk`
	`2`	`+*`