Commit a05e1cc

fix: bug fixes and README revamp
1 parent 7430090 commit a05e1cc

6 files changed

Lines changed: 49 additions & 25 deletions

README.md

Lines changed: 26 additions & 18 deletions
@@ -1,36 +1,40 @@
 <div align="center">

-# OpenKB
+<a href="https://openkb.ai">
+<img src="https://docs.pageindex.ai/images/general/openkb.png" alt="OpenKB (by PageIndex)" />
+</a>

-### Karpathy's LLM Knowledge Base — as a CLI
+# OpenKB (Open Knowledge Base)

-**Drop documents in. Get an auto-maintained, cross-linked wiki out.**
+<h3 align="center">LLM-Powered Wiki Knowledge Base</h3>

-[Getting Started](#getting-started) · [How It Works](#how-it-works) · [Commands](#commands) · [Configuration](#configuration)
+<p align="center"><i>Scale to long documents&nbsp;&nbsp;Reasoning-based retrieval&nbsp;&nbsp;Native multimodality support&nbsp;&nbsp;No Vector DB</i></p>

 </div>

 ---

+# 📑 Introduction to OpenKB
+
 Andrej Karpathy [described](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) a workflow where LLMs compile raw documents into a structured, interlinked markdown wiki — summaries, concept pages, cross-references — all maintained automatically. Knowledge compounds over time instead of being re-derived on every query.

-**OpenKB** is an open-source CLI that implements this workflow, powered by [PageIndex](https://github.com/VectifyAI/PageIndex) for long document understanding and [markitdown](https://github.com/microsoft/markitdown) for broad format support.
+**OpenKB** (Open Knowledge Base) is an open-source CLI that implements this workflow, powered by [PageIndex](https://github.com/VectifyAI/PageIndex) for long document understanding and [markitdown](https://github.com/microsoft/markitdown) for broad format support.

 ### Why not just RAG?

 RAG rediscovers knowledge from scratch on every query. Nothing accumulates. OpenKB compiles knowledge once into a persistent wiki, then keeps it current. Cross-references already exist. Contradictions are flagged. Synthesis reflects everything consumed.

-## Features
+### Features

-- **Any format** — PDF, Word, PowerPoint, Excel, HTML, Markdown, and more via markitdown
+- **Any format** — PDF, Word, PowerPoint, Excel, HTML, Markdown, text, CSV, and more via markitdown
 - **Long documents** — Books and reports that exceed LLM context windows are handled via [PageIndex](https://github.com/VectifyAI/PageIndex) tree indexing
 - **Auto wiki** — LLM generates summaries, concept pages, and cross-links. You curate sources; the LLM does the rest
 - **Query** — Ask questions against your wiki. The LLM navigates your compiled knowledge to answer
 - **Lint** — Health checks find contradictions, gaps, orphans, and stale content
 - **Watch mode** — Drop files into `raw/`, wiki updates automatically
 - **Obsidian compatible** — Wiki is plain `.md` files with `[[wikilinks]]`. Open in Obsidian for graph view and browsing

-## Getting Started
+# 🚀 Getting Started

 ### Install

@@ -67,7 +71,7 @@ OPENAI_API_KEY=sk-...

 OpenKB uses [LiteLLM](https://docs.litellm.ai/docs/providers) — any provider works. Set the model during `okb init` or edit `.okb/config.yaml`.

-## How It Works
+# 🧩 How It Works

 ```
 raw/ You drop files here
@@ -116,7 +120,9 @@ When you add a document, the LLM:

 A single source might touch 10-15 wiki pages. Knowledge accumulates — each document enriches the existing wiki rather than sitting in isolation.

-## Commands
+# 📦 Usage
+
+### Commands

 | Command | Description |
 |---|---|
@@ -126,11 +132,11 @@ A single source might touch 10-15 wiki pages. Knowledge accumulates — each doc
 | `okb query "question" --save` | Ask and save the answer to `wiki/explorations/` |
 | `okb watch` | Watch `raw/` and auto-compile new files |
 | `okb lint` | Run structural + knowledge health checks |
-| `okb lint --fix` | Auto-fix what it can |
+<!-- | `okb lint --fix` | Auto-fix what it can | -->
 | `okb list` | List indexed documents and concepts |
 | `okb status` | Show knowledge base stats |

-## Configuration
+### Configuration

 Generated by `okb init`, stored in `.okb/config.yaml`:

@@ -148,7 +154,7 @@ The `wiki/AGENTS.md` file defines wiki structure and conventions. It's the LLM's

 At runtime, the LLM reads `AGENTS.md` from disk — your edits take effect immediately.

-## Using with Obsidian
+### Using with Obsidian

 OpenKB's wiki is a directory of Markdown files with `[[wikilinks]]` — Obsidian renders it natively.

@@ -157,18 +163,20 @@ OpenKB's wiki is a directory of Markdown files with `[[wikilinks]]` — Obsidian
 3. Use graph view to see knowledge connections
 4. Use Obsidian Web Clipper to add web articles to `raw/`

-## Compared to Karpathy's Approach
+# 🔗 Learn More
+
+### Compared to Karpathy's Approach

 | | Karpathy's workflow | OpenKB |
 |---|---|---|
 | Short documents | LLM reads directly | markitdown → LLM reads |
 | Long documents | Doesn't fit in context | PageIndex tree index |
-| Supported formats | Web clipper → .md | PDF, Word, PPT, Excel, HTML, audio, .md |
+| Supported formats | Web clipper → .md | PDF, Word, PPT, Excel, HTML, text, CSV, .md |
 | Wiki compilation | LLM agent | LLM agent (same) |
 | Q&A | Query over wiki | Wiki + PageIndex retrieval |
 | Open source | No | Yes |

-## Tech Stack
+### Tech Stack

 - [PageIndex](https://github.com/VectifyAI/PageIndex) — Vectorless, reasoning-based document indexing
 - [markitdown](https://github.com/microsoft/markitdown) — Universal file-to-markdown conversion
@@ -177,10 +185,10 @@ OpenKB's wiki is a directory of Markdown files with `[[wikilinks]]` — Obsidian
 - [Click](https://click.palletsprojects.com/) — CLI framework
 - [watchdog](https://github.com/gorakhargosh/watchdog) — Filesystem monitoring

-## License
+### License

 Apache 2.0 — see [LICENSE](LICENSE)

-## Acknowledgments
+### Acknowledgments

 Inspired by [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). Powered by [PageIndex](https://pageindex.ai/).
File renamed without changes.

openkb/agent/tools.py

Lines changed: 12 additions & 3 deletions
@@ -20,7 +20,10 @@ def list_wiki_files(directory: str, wiki_root: str) -> str:
         Newline-separated list of ``.md`` filenames found in *directory*,
         or ``"No files found."`` if the directory is empty or does not exist.
     """
-    target = Path(wiki_root) / directory
+    root = Path(wiki_root).resolve()
+    target = (root / directory).resolve()
+    if not target.is_relative_to(root):
+        return "Access denied: path escapes wiki root."
     if not target.exists() or not target.is_dir():
         return "No files found."

@@ -40,7 +43,10 @@ def read_wiki_file(path: str, wiki_root: str) -> str:
     Returns:
         File contents as a string, or ``"File not found: {path}"`` if missing.
     """
-    full_path = Path(wiki_root) / path
+    root = Path(wiki_root).resolve()
+    full_path = (root / path).resolve()
+    if not full_path.is_relative_to(root):
+        return "Access denied: path escapes wiki root."
     if not full_path.exists():
         return f"File not found: {path}"
     return full_path.read_text(encoding="utf-8")
@@ -59,7 +65,10 @@ def write_wiki_file(path: str, content: str, wiki_root: str) -> str:
     Returns:
         ``"Written: {path}"`` on success.
     """
-    full_path = Path(wiki_root) / path
+    root = Path(wiki_root).resolve()
+    full_path = (root / path).resolve()
+    if not full_path.is_relative_to(root):
+        return "Access denied: path escapes wiki root."
     full_path.parent.mkdir(parents=True, exist_ok=True)
     full_path.write_text(content, encoding="utf-8")
     return f"Written: {path}"
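
The three hunks in `openkb/agent/tools.py` apply the same guard: resolve the wiki root, resolve the joined path, and refuse anything that lands outside the root. The following is a minimal standalone sketch of that check, not code from the commit. `resolve_inside` is a hypothetical helper name, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_inside(root_dir: str, relative: str) -> Optional[Path]:
    """Return the resolved path if it stays under root_dir, else None.

    Hypothetical helper for illustration; the commit inlines this logic
    in each tool function instead.
    """
    root = Path(root_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check runs on the real location, not the spelling.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        return None
    return candidate


with tempfile.TemporaryDirectory() as wiki_root:
    inside = resolve_inside(wiki_root, "concepts/attention.md")
    climbing = resolve_inside(wiki_root, "../outside.md")
    # Joining an absolute path discards the root entirely, so it is rejected too.
    absolute = resolve_inside(wiki_root, "/etc/passwd")
```

Resolving both sides before comparing matters: comparing unresolved paths would let a `..` segment or a symlink pass a purely textual prefix check.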

openkb/cli.py

Lines changed: 9 additions & 2 deletions
@@ -280,14 +280,21 @@ def watch():

     def on_new_files(paths):
         for p in paths:
-            _add_single_file(Path(p), kb_dir)
+            fp = Path(p)
+            if fp.suffix.lower() not in SUPPORTED_EXTENSIONS:
+                click.echo(
+                    f"Skipping unsupported file type: {fp.suffix}. "
+                    f"Supported: {', '.join(sorted(SUPPORTED_EXTENSIONS))}"
+                )
+                continue
+            _add_single_file(fp, kb_dir)

     click.echo(f"Watching {raw_dir} for new documents. Press Ctrl+C to stop.")
     watch_directory(raw_dir, on_new_files)


 @cli.command()
-@click.option("--fix", is_flag=True, default=False, help="Automatically fix lint issues.")
+@click.option("--fix", is_flag=True, default=False, help="Automatically fix lint issues.")  # TODO: --fix not yet implemented
 def lint(fix):
     """Lint the knowledge base for structural and semantic inconsistencies."""
     kb_dir = _find_kb_dir()
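
The change to `on_new_files` filters incoming paths by lowercased file suffix before handing them to `_add_single_file`. A rough sketch of that filtering step in isolation is shown here. The extension set is an assumed stand-in (the real `SUPPORTED_EXTENSIONS` is defined elsewhere in the project and is not shown in this diff), and `partition_by_extension` is a hypothetical name.

```python
from pathlib import Path

# Assumed stand-in values; the project's actual SUPPORTED_EXTENSIONS
# is not visible in this commit.
SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt"}


def partition_by_extension(paths, supported=SUPPORTED_EXTENSIONS):
    """Split paths into (accepted, skipped) by case-insensitive suffix."""
    accepted, skipped = [], []
    for p in paths:
        fp = Path(p)
        # Lowercasing makes "REPORT.PDF" match ".pdf"; a file with no
        # suffix yields "" and is therefore skipped.
        if fp.suffix.lower() in supported:
            accepted.append(fp)
        else:
            skipped.append(fp)
    return accepted, skipped


accepted, skipped = partition_by_extension(
    ["REPORT.PDF", "setup.exe", "notes.md", "README"]
)
```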

openkb/converter.py

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ def convert_document(src: Path, kb_dir: Path) -> ConvertResult:
     raw_dir = kb_dir / "raw"
     raw_dir.mkdir(parents=True, exist_ok=True)
     raw_dest = raw_dir / src.name
-    if raw_dest != src:
+    if raw_dest.resolve() != src.resolve():
         shutil.copy2(src, raw_dest)

     # ------------------------------------------------------------------
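
The one-line change in `convert_document` compares resolved paths instead of raw `Path` objects. `Path` equality is based on how the path is spelled, so a relative and an absolute spelling of the same file compare as different, and `shutil.copy2` raises `shutil.SameFileError` when asked to copy a file onto itself. That is presumably the failure this fix avoids. A small sketch of the difference, using a hypothetical `same_location` helper and made-up file names:

```python
from pathlib import Path


def same_location(a: Path, b: Path) -> bool:
    """Compare two paths by resolved location rather than by spelling."""
    # resolve() makes both paths absolute and normalizes them; by default
    # it does not require the files to exist.
    return a.resolve() == b.resolve()


# Two spellings of the same (hypothetical) file.
relative = Path("raw") / "report.pdf"
absolute = Path.cwd() / "raw" / "report.pdf"

spelled_equal = relative == absolute
located_equal = same_location(relative, absolute)
```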

openkb/schema.py

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@
 - Explorations: name, one-liner description

 ## Log Format
-Each log entry: `## [YYYY-MM-DD] operation | description`
+Each log entry: `## [YYYY-MM-DD HH:MM:SS] operation | description`
 Operations: ingest, query, lint

 ## Format
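
The schema change extends the log-entry timestamp from a date to a date and time. A brief sketch of producing an entry in the new format follows, assuming a hypothetical `format_log_entry` helper; the commit only changes the schema text and does not show the code that writes log entries.

```python
from datetime import datetime
from typing import Optional


def format_log_entry(
    operation: str, description: str, when: Optional[datetime] = None
) -> str:
    """Build a log heading of the form '## [YYYY-MM-DD HH:MM:SS] op | desc'."""
    # Default to the current local time when no timestamp is supplied.
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"## [{stamp}] {operation} | {description}"


entry = format_log_entry(
    "ingest", "added report.pdf", datetime(2025, 1, 2, 3, 4, 5)
)
```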
