Commit 3d760e0

feat: add semantic search demo with embeddings and REST API
1 parent 37bf1c4 commit 3d760e0

8 files changed

Lines changed: 2934 additions & 0 deletions

File tree

demos/embeddings/.gitignore

Lines changed: 14 additions & 0 deletions
```
# Dependencies
node_modules/

# Generated embeddings index
embeddings-index.json

# Logs
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# OS files
.DS_Store
Thumbs.db
```

demos/embeddings/README.md

Lines changed: 254 additions & 0 deletions
# Codebase Embeddings Demo

This demo application enables semantic search over your codebase using AI embeddings. It allows you to ask natural language questions like "Where is the model packaging code?" and get relevant code snippets with similarity scores.

## Overview

The demo consists of:

- **Indexer**: Walks the codebase and generates embeddings for all source files
- **Search Engine**: Uses cosine similarity to find relevant code
- **Web UI**: Interactive interface for searching the codebase
- **REST API**: Backend server for search and indexing operations

## Features

- 🔍 Natural language search over the entire codebase
- 📊 Similarity scores for each result
- 🎯 Smart chunking for large files (respects function boundaries in Go)
- 🚫 Respects `.gitignore` patterns
- 📈 Progress tracking during indexing
- 🔄 Rebuild index from the UI
- 💾 Persistent storage (JSON file)

## Prerequisites

Before running this demo, you need:

1. **Node.js** (version 18 or higher)
2. **Docker Model Runner** with the embedding model loaded
3. **The embedding model** (`ai/qwen3-embedding:0.6B-F16`)

### Pull the Embedding Model

```bash
# Pull the Qwen3 embedding model
docker model pull ai/qwen3-embedding:0.6B-F16

# Verify it's available
docker model list
```
## Installation

1. **Navigate to the demo directory:**
   ```bash
   cd demos/embeddings
   ```

2. **Install dependencies:**
   ```bash
   npm install
   ```

## Usage

### Step 1: Generate the Embeddings Index

Before you can search, you need to index the codebase:

```bash
npm run index
```

This will:
- Scan all source files in the project (respecting `.gitignore`)
- Generate embeddings for each file/chunk
- Save the index to `embeddings-index.json`

**Note**: Indexing may take 5-15 minutes depending on the size of your codebase. Progress will be displayed in the console.

### Step 2: Start the Server

```bash
npm start
```

The server will start on `http://localhost:3000`.

### Step 3: Open the Web Interface

Open your browser and navigate to:

```
http://localhost:3000
```
## Using the Search Interface

1. **Check Index Status**: The status bar shows index information (files indexed, embedding count, last updated)

2. **Enter Your Query**: Type a natural language question or keywords
   - Example: "Where is the model packaging code?"
   - Example: "GPU memory handling implementation"
   - Example: "How does the distribution client work?"

3. **View Results**: Results are ranked by similarity score and show:
   - File path and line numbers
   - Similarity percentage
   - Code snippet preview

4. **Try Example Queries**: Click on any example query to quickly test the search

5. **Rebuild Index**: Click "Rebuild Index" to regenerate embeddings (e.g., after code changes)
## API Reference

### Search Endpoint

**POST** `/api/search`

Search the codebase with a natural language query.

**Request:**
```json
{
  "query": "Where is the model packaging code?",
  "topK": 10
}
```

**Response:**
```json
{
  "query": "Where is the model packaging code?",
  "topK": 10,
  "count": 3,
  "results": [
    {
      "filePath": "cmd/cli/commands/package.go",
      "chunkId": 0,
      "content": "package commands\n\nimport (\n...",
      "startLine": 1,
      "endLine": 50,
      "fileType": ".go",
      "similarity": 0.8542
    }
  ]
}
```
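The endpoint can be called from any HTTP client. As a minimal sketch in Node (18+, so `fetch` is global), assuming the server from `npm start` is running on port 3000 — the helper names here are illustrative, not part of the demo's code:

```javascript
// Build the fetch options for POST /api/search (illustrative helper).
function buildSearchRequest(query, topK = 10) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, topK }),
  };
}

// Query the running demo server and return the parsed response.
async function searchCodebase(query, topK = 10) {
  const res = await fetch('http://localhost:3000/api/search', buildSearchRequest(query, topK));
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return res.json(); // { query, topK, count, results: [...] }
}

// Usage (requires the server to be running):
// searchCodebase('Where is the model packaging code?', 5)
//   .then(({ results }) => results.forEach((r) =>
//     console.log(`${(r.similarity * 100).toFixed(1)}%  ${r.filePath}:${r.startLine}-${r.endLine}`)));
```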
### Index Status

**GET** `/api/index/status`

Get information about the current index.

**Response:**
```json
{
  "exists": true,
  "size": 15728640,
  "sizeHuman": "15 MB",
  "modified": "2024-01-15T10:30:00.000Z",
  "metadata": {
    "projectRoot": "/path/to/project",
    "model": "ai/qwen3-embedding:0.6B-F16",
    "totalFiles": 150,
    "totalEmbeddings": 1250,
    "generatedAt": "2024-01-15T10:30:00.000Z",
    "version": "1.0"
  }
}
```

### Metadata

**GET** `/api/metadata`

Get index metadata.

### Rebuild Index

**POST** `/api/index/rebuild`

Trigger a background indexing process.

## CLI Usage

You can also search from the command line:

```bash
# Search with default settings (top 10 results)
node search.js "model packaging code"

# Specify number of results
node search.js "GPU memory" 5
```

## Configuration

You can modify these settings in the respective files:

### indexer.js Configuration

```javascript
const CONFIG = {
  projectRoot: path.resolve(__dirname, '../..'),
  embeddingsAPI: 'http://localhost:12434/engines/llama.cpp/v1/embeddings',
  model: 'ai/qwen3-embedding:0.6B-F16',
  maxChunkSize: 100, // tokens per chunk
  batchSize: 5,      // files to process in parallel
  fileExtensions: ['.go'],
};
```

### search.js Configuration

```javascript
const CONFIG = {
  embeddingsAPI: 'http://localhost:12434/engines/llama.cpp/v1/embeddings',
  model: 'ai/qwen3-embedding:0.6B-F16',
  defaultTopK: 10,
  similarityThreshold: 0.5, // minimum similarity score
};
```
## How It Works

### 1. File Collection
- Reads `.gitignore` to respect ignore patterns
- Filters by file extension (`.go` by default; add more via `fileExtensions` in `indexer.js`)
- Excludes directories like `node_modules`, `vendor`, `build`
### 2. Chunking Strategy
- Files under 100 tokens: kept as a single chunk
- Go files: split at function boundaries
- Other files: split by line count
- Maintains line number references for each chunk
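As an illustration, line-count chunking for non-Go files could look like the sketch below (the function name and the lines-per-chunk value are assumptions, not the indexer's actual code):

```javascript
// Split file content into fixed-size line chunks, recording 1-based
// start/end line numbers so search results can point back into the file.
function chunkByLines(content, linesPerChunk = 50) {
  const lines = content.split('\n');
  const chunks = [];
  for (let start = 0; start < lines.length; start += linesPerChunk) {
    const slice = lines.slice(start, start + linesPerChunk);
    chunks.push({
      content: slice.join('\n'),
      startLine: start + 1,
      endLine: start + slice.length,
    });
  }
  return chunks;
}
```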
### 3. Embedding Generation
- Each chunk is sent to the embedding API
- Returns a high-dimensional vector (typically 768 or 1024 dimensions)
- Vectors capture the semantic meaning of the code
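The embeddings endpoint exposed by Docker Model Runner is OpenAI-compatible, so a request can be sketched as follows (assuming the endpoint and model from the configuration above; the helper names are illustrative):

```javascript
// Build the request body for the OpenAI-compatible /v1/embeddings endpoint.
function buildEmbeddingRequest(input) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'ai/qwen3-embedding:0.6B-F16', input }),
  };
}

// Fetch the embedding vector for one chunk of text (Node 18+ global fetch).
async function embed(text) {
  const res = await fetch(
    'http://localhost:12434/engines/llama.cpp/v1/embeddings',
    buildEmbeddingRequest(text),
  );
  const { data } = await res.json();
  return data[0].embedding; // array of floats, e.g. 1024 dimensions
}

// Usage (requires Docker Model Runner with the model loaded):
// embed('func main() {}').then((vec) => console.log(vec.length));
```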
### 4. Search Process
- User query is converted to an embedding vector
- Cosine similarity is calculated between the query and all chunks
- Results are sorted by similarity (highest first)
- Top K results are returned

### 5. Similarity Calculation
Uses the cosine similarity formula:
```
similarity = (A · B) / (||A|| × ||B||)
```

Where:
- A = query embedding vector
- B = code chunk embedding vector
- Range: -1 to 1 in general; for these text embeddings scores typically fall between 0 and 1 (higher = more similar)
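Steps 4 and 5 can be sketched together as a self-contained snippet (the function names are illustrative, not the demo's actual exports):

```javascript
// Cosine similarity between two equal-length vectors:
// (A · B) / (||A|| × ||B||)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed chunk against the query embedding,
// sort by similarity (highest first), and keep the top K.
function topKResults(queryVec, chunks, topK = 10) {
  return chunks
    .map((chunk) => ({ ...chunk, similarity: cosineSimilarity(queryVec, chunk.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```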
## Additional Resources

- [Docker Model Runner Documentation](https://docs.docker.com/ai/model-runner/)
- [Embedding Models on Docker Hub](https://hub.docker.com/r/ai)
- [Cosine Similarity Explanation](https://en.wikipedia.org/wiki/Cosine_similarity)
