Commit b0b0fa7

README updated to meet sysadminctl.services format

1 parent b255f01

1 file changed: README.md (65 additions & 51 deletions)

---
# SynapsIA ✍️

**SynapsIA** is the digital scribe of the new era. Its name comes from the fusion of **Synapse**, the neural connections where knowledge resides, and **AI**, the intelligence that brings it to life.

It is not a simple reader; it is a craftsman of knowledge. Just as an ancient scribe forged texts that would last for centuries, **SynapsIA** ingests your documents and forges an intricate network of synaptic connections. The result is a coherent, living digital mind, ready to be interrogated by other AI systems.

> **SynapsIA: Forging knowledge, one synapse at a time.**

---

## 🚀 Features

* **Ollama-Powered:** Leverages local Ollama services for all embedding tasks, keeping your data private.
* **RAG-Ready:** Processes your documents into a persistent, optimized vector index, creating the "knowledge base" for any RAG application (like [Kondoo](https://github.com/sysadminctl-services/kondoo)).
* **Tunable Ingestion:** Provides fine-grained control over `chunk-size` and `chunk-overlap` so you can optimize your knowledge base for Q&A, summarization, or other tasks.
* **Built-in Query Tool:** Includes a companion script, `synapsia_query.py`, to immediately test and debug your new knowledge base with your local LLMs.

## Prerequisites

Before you begin, ensure you have the following installed and running:

1. **Python 3.9+**: A recent version of Python. We highly recommend using a virtual environment (`python -m venv .venv`).
2. **Python Dependencies**: Install the required libraries:

   ```bash
   pip install -r requirements.txt
   ```

3. **Ollama Service**: The script needs to connect to a running Ollama instance. You can launch one using Podman:

   ```bash
   # Launch the Ollama container in the background.
   # This uses a persistent named volume to save your models.
   podman run -d --rm -p 11434:11434 --name ollama-synapsia -v synapsia_storage:/root/.ollama ollama/ollama
   ```

4. **Embedding Model**: After starting the container, pull your desired embedding model:

   ```bash
   # Tell the running 'ollama-synapsia' container to download the model
   podman exec -it ollama-synapsia ollama pull mxbai-embed-large
   ```
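With the container running, you can quickly confirm that the service is reachable and the model is available. A minimal sanity check in Python (our own illustration, not part of SynapsIA; it assumes the default URL and Ollama's standard `/api/tags` model-listing endpoint):

```python
# List the models the local Ollama service currently exposes.
import json
from urllib.request import urlopen

with urlopen("http://localhost:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print(models)  # expect an entry like 'mxbai-embed-large:latest'
```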

---

## ✍️ Ingesting Knowledge (`synapsia.py`)

This is the main script for processing your documents. Run it from the terminal inside the `SynapsIA` project directory.

**Command Syntax:**

```bash
python synapsia.py --docs <path_to_docs> --knowledge <path_to_knowledge_base> [OPTIONS]
```

### Required Arguments

* `--docs <path>`: The relative or absolute path to the directory containing your source documents.

* `--knowledge <path>`: The relative or absolute path to the output directory where the knowledge base will be saved.

### Options

* `--embed-model <model_name>`: The name of the embedding model to use from Ollama.
  Default: `mxbai-embed-large`

* `--ollama-url <url>`: The base URL of the Ollama API service.
  Default: `http://localhost:11434`

* `--embed-batch-size <number>`: Number of chunks to process at a time.
  Default: `5`

* `--chunk-size <number>`: The size of text chunks in tokens.
  Default: `1024`

* `--chunk-overlap <number>`: The number of overlapping tokens between consecutive chunks.
  Default: `20`

* `-h`, `--help`: Show the help message and exit.
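As a rough rule of thumb for how `--chunk-size` and `--chunk-overlap` interact: each chunk after the first contributes `chunk_size - chunk_overlap` new tokens. An idealized estimate, assuming a token-exact splitter (real splitters also respect sentence boundaries, so actual counts vary):

```python
import math

def approx_chunk_count(doc_tokens: int, chunk_size: int = 1024, chunk_overlap: int = 20) -> int:
    """Idealized chunk count: the first chunk covers chunk_size tokens,
    each later chunk adds (chunk_size - chunk_overlap) new ones."""
    if doc_tokens <= chunk_size:
        return 1
    return 1 + math.ceil((doc_tokens - chunk_size) / (chunk_size - chunk_overlap))

print(approx_chunk_count(3000))           # defaults -> 3 chunks
print(approx_chunk_count(3000, 256, 25))  # Q&A tuning below -> 13 chunks
```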

### Ingestion Examples

#### 1. Basic Usage

Process documents from `./my_docs/` and save the index to `./my_kb/`.

```bash
python synapsia.py --docs ./my_docs/ --knowledge ./my_kb/
```

#### 2. Advanced Usage (Custom Model)

Use a different embedding model, such as `nomic-embed-text`:

```bash
python synapsia.py --docs ./my_docs/ --knowledge ./my_kb/ --embed-model nomic-embed-text
```

#### 3. Advanced Usage (Tuned for Q&A)

Optimized for dense, factual documents such as FAQs. A smaller chunk size yields more precise answers, while the overlap maintains context between chunks.

```bash
python synapsia.py \
    --docs ./faq_docs/ \
    --knowledge ./faq_kb/ \
    --chunk-size 256 \
    --chunk-overlap 25
```
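
Under the hood, `synapsia.py` builds on the LlamaIndex library, and ingestion follows the classic read, chunk, embed, persist pattern. A minimal sketch of that pipeline wired to the flags above (an illustration only, not the script's actual source):

```python
# Sketch of a LlamaIndex ingestion pipeline mirroring the CLI flags above.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.ollama import OllamaEmbedding

# Embeddings come from the local Ollama service.
Settings.embed_model = OllamaEmbedding(
    model_name="mxbai-embed-large",     # --embed-model
    base_url="http://localhost:11434",  # --ollama-url
    embed_batch_size=5,                 # --embed-batch-size
)

# Split documents into overlapping token chunks before embedding.
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

documents = SimpleDirectoryReader("./my_docs/").load_data()  # --docs
index = VectorStoreIndex.from_documents(
    documents, transformations=[splitter], show_progress=True
)
index.storage_context.persist(persist_dir="./my_kb/")  # --knowledge
```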
---

## ❓ Querying Knowledge (`synapsia_query.py`)

Use this script to test your new knowledge base. It loads the index, sends your query to a local LLM (via Ollama), and provides a RAG-generated answer.

**Command Syntax:**

```bash
python synapsia_query.py --knowledge <path_to_knowledge_base> --query "Your question here" [OPTIONS]
```

### Required Arguments

* `--knowledge <path>`: The path to the directory where the knowledge base was saved.

* `--query <question>`: The question you want to ask, enclosed in quotes.

### Options

* `--llm-model <model_name>`: The name of the LLM to use from Ollama for generating the answer.
  Default: `llama3`

* `--top-k <number>`: The number of relevant text chunks to retrieve.
  Default: `3`

* `--show-context`: A flag that displays the source text chunks and their relevance scores. Extremely useful for debugging.

* `--embed-model`, `--ollama-url`: These should match the values used during the ingestion process.

### Query Examples

#### 1. Basic Question

Get a direct answer from the knowledge base.

```bash
python synapsia_query.py \
    --knowledge ./my_kb/ \
    --query "What is the main purpose of Ansible?"
```

#### 2. Debugging Query

Get an answer and also see the top 2 source chunks the RAG system used as context. This is ideal for fine-tuning your ingestion parameters.

```bash
python synapsia_query.py \
    --knowledge ./faq_kb/ \
    --query "How do I reset my password?" \
    --top-k 2 \
    --show-context
```
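
The query side of the same LlamaIndex stack is equally compact. Another illustrative sketch wired to the flags above (not the actual source of `synapsia_query.py`):

```python
# Sketch of a LlamaIndex RAG query flow mirroring the CLI flags above.
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Must match the embedding model used during ingestion (--embed-model).
Settings.embed_model = OllamaEmbedding(
    model_name="mxbai-embed-large", base_url="http://localhost:11434"
)
# The LLM that generates the final answer (--llm-model).
Settings.llm = Ollama(model="llama3", base_url="http://localhost:11434")

# Reload the persisted knowledge base (--knowledge).
storage_context = StorageContext.from_defaults(persist_dir="./faq_kb/")
index = load_index_from_storage(storage_context)

# Retrieve the top-k most relevant chunks and answer with them as context (--top-k).
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("How do I reset my password?")
print(response)

# Roughly what --show-context exposes: retrieved chunks and relevance scores.
for source in response.source_nodes:
    print(f"score={source.score:.3f} :: {source.node.get_content()[:120]}")
```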

## ⚖️ License

This project is licensed under the MIT License. See the LICENSE file for details.
