Commit 303fabd

feat: first git
0 parents

20 files changed, 3581 additions, 0 deletions

LICENSE

Lines changed: 21 additions & 0 deletions
MIT License

Copyright (c) 2025 TOON Format Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 358 additions & 0 deletions
# TOON (Token-Oriented Object Notation)

A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

TOON achieves **CSV-like compactness** while adding **explicit structure**, making it ideal for:
- Reducing token costs in LLM API calls
- Improving context window efficiency
- Maintaining human readability
- Preserving data structure and types

### Key Features

- **Compact**: 30-60% smaller than JSON for structured data
- **Readable**: Clean, indentation-based syntax
- **Structured**: Preserves nested objects and arrays
- **Type-safe**: Supports strings, numbers, booleans, null
- **Flexible**: Multiple delimiter options (comma, tab, pipe)
- **Smart**: Automatic tabular format for uniform arrays
- **Efficient**: Key folding for deeply nested objects

## Installation

```bash
pip install toonify
```

For development (quoted so shells such as zsh do not expand the brackets):
```bash
pip install "toonify[dev]"
```

## Quick Start

### Python API

```python
from toon import encode, decode

# Encode Python dict to TOON
data = {
    'products': [
        {'sku': 'LAP-001', 'name': 'Gaming Laptop', 'price': 1299.99},
        {'sku': 'MOU-042', 'name': 'Wireless Mouse', 'price': 29.99}
    ]
}

toon_string = encode(data)
print(toon_string)
# Output:
# products[2]{sku,name,price}:
#   LAP-001,Gaming Laptop,1299.99
#   MOU-042,Wireless Mouse,29.99

# Decode TOON back to Python
result = decode(toon_string)
assert result == data
```

### Command Line

```bash
# Encode JSON to TOON
toon input.json -o output.toon

# Decode TOON to JSON
toon input.toon -o output.json

# Use with pipes
cat data.json | toon -e > data.toon

# Show token statistics
toon data.json --stats
```

## TOON Format Specification

### Basic Syntax

```toon
# Simple key-value pairs
title: Machine Learning Basics
chapters: 12
published: true
```
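To illustrate how a decoder might read such lines back, here is a minimal sketch that parses flat `key: value` pairs and turns the `true`/`false`/`null` literals and plain numbers into Python values. It is not the `toon` package's parser: `parse_scalar` and `parse_flat` are hypothetical helpers, and the sketch assumes unquoted values, no nesting, and input containing only the data lines.

```python
import re

# Integers and simple decimals such as 12, -3 or 1299.99 (assumption: the real
# grammar may also accept exponents and other numeric forms).
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_scalar(raw: str):
    """Convert an unquoted scalar into a Python value (hypothetical helper)."""
    if raw == "true":
        return True
    if raw == "false":
        return False
    if raw == "null":
        return None
    if NUMBER.fullmatch(raw):
        return float(raw) if "." in raw else int(raw)
    return raw  # anything else is kept as a plain string


def parse_flat(text: str) -> dict:
    """Parse top-level `key: value` lines into a dict, skipping blank lines."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        result[key.strip()] = parse_scalar(value.strip())
    return result


doc = "title: Machine Learning Basics\nchapters: 12\npublished: true"
print(parse_flat(doc))
# {'title': 'Machine Learning Basics', 'chapters': 12, 'published': True}
```

Because `api-prod-01` or `us-east` do not match the number pattern, they stay strings, which is why such values need no quotes.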

### Arrays

**Primitive arrays** (inline):
```toon
temperatures: [72.5,68.3,75.1,70.8,73.2]
categories: [electronics,computers,accessories]
```

**Tabular arrays** (uniform objects with header):
```toon
inventory[3]{sku,product,stock}:
  KB-789,Mechanical Keyboard,45
  MS-456,RGB Mouse Pad,128
  HD-234,USB Headset,67
```
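The tabular layout is one header line (array name, length in brackets, field names in braces) followed by one comma-separated row per object. A minimal sketch, assuming every object has the same keys in the same order and that no value needs quoting; `encode_tabular` is a hypothetical helper, not the library's encoder, and it only handles the default comma delimiter.

```python
def encode_tabular(name: str, rows: list, indent: int = 2) -> str:
    """Render uniform dicts as a TOON-style tabular block (sketch)."""
    fields = list(rows[0].keys())
    # Assumption: rows are uniform; a real encoder would pick another layout otherwise.
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError(f"row keys {list(row)} differ from header {fields}")
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    pad = " " * indent
    body = [pad + ",".join(str(row[field]) for field in fields) for row in rows]
    return "\n".join([header] + body)


inventory = [
    {"sku": "KB-789", "product": "Mechanical Keyboard", "stock": 45},
    {"sku": "MS-456", "product": "RGB Mouse Pad", "stock": 128},
    {"sku": "HD-234", "product": "USB Headset", "stock": 67},
]
print(encode_tabular("inventory", inventory))
```

This prints the `inventory` block shown above. The field names are written once in the header instead of once per object, which is where most of the savings over JSON come from.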

**List arrays** (non-uniform or nested):
```toon
tasks[2]:
  Complete documentation
  Review pull requests
```
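Which of the three array layouts applies depends on what the array contains. A minimal sketch of that decision, following the descriptions in this section (inline for primitives, tabular for uniform flat objects, list for everything else); `classify_array` is a hypothetical name, and the library's actual rules may differ in edge cases such as empty arrays or key order.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def classify_array(items: list) -> str:
    """Return 'inline', 'tabular' or 'list' for a Python list (sketch)."""
    if all(isinstance(item, PRIMITIVES) for item in items):
        return "inline"  # e.g. temperatures: [72.5,68.3,...]
    if items and all(isinstance(item, dict) for item in items):
        same_keys = all(set(item) == set(items[0]) for item in items)
        flat_values = all(
            isinstance(value, PRIMITIVES) for item in items for value in item.values()
        )
        if same_keys and flat_values:
            return "tabular"  # one header, one row per object
    return "list"  # non-uniform or nested content


print(classify_array([72.5, 68.3, 75.1]))
print(classify_array([{"sku": "KB-789", "stock": 45}, {"sku": "MS-456", "stock": 128}]))
print(classify_array([{"sku": "KB-789"}, {"name": "Mouse", "tags": ["usb"]}]))
```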

### Nested Objects

```toon
server:
  hostname: api-prod-01
  config:
    port: 8080
    region: us-east
```
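Nesting is expressed purely through indentation: a key whose value is an object ends with a colon, and its fields follow one level deeper. A minimal sketch of that rule, assuming two-space indentation (the documented default), scalar leaves only, and no quoting or arrays; `format_scalar` and `encode_object` are hypothetical helpers.

```python
def format_scalar(value) -> str:
    """Render a Python scalar using the literals shown in this README."""
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    return str(value)


def encode_object(obj: dict, indent: int = 2, level: int = 0) -> list:
    """Return TOON-style lines for a nested dict (sketch: no arrays, no quoting)."""
    pad = " " * (indent * level)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")  # object key on its own line
            lines.extend(encode_object(value, indent, level + 1))  # children one level deeper
        else:
            lines.append(f"{pad}{key}: {format_scalar(value)}")
    return lines


server = {"server": {"hostname": "api-prod-01", "config": {"port": 8080, "region": "us-east"}}}
print("\n".join(encode_object(server)))
```

The output matches the `server` example above line for line.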

### Quoting Rules

Strings are quoted only when necessary:
- Contains special characters (`,`, `:`, `"`, newlines)
- Has leading/trailing whitespace
- Looks like a literal (`true`, `false`, `null`)
- Is empty

```toon
simple: ProductName
quoted: "Product, Description"
escaped: "Size: 15\" display"
multiline: "First feature\nSecond feature"
```
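The four quoting rules map almost directly onto a predicate. This minimal sketch assumes exactly those rules plus backslash escapes for `\`, `"` and newlines, matching the `escaped` and `multiline` examples; `needs_quotes` and `quote` are hypothetical names, and the library may quote in additional cases (for example, strings that would otherwise read as numbers).

```python
SPECIAL_CHARS = {",", ":", '"', "\n"}
LITERALS = {"true", "false", "null"}


def needs_quotes(s: str) -> bool:
    """Apply the quoting rules listed above to a string value."""
    return (
        s == ""  # empty
        or s != s.strip()  # leading or trailing whitespace
        or s in LITERALS  # would be read as a literal
        or any(ch in s for ch in SPECIAL_CHARS)  # special characters
    )


def quote(s: str) -> str:
    """Quote and escape a string only when the rules require it."""
    if not needs_quotes(s):
        return s
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


for value in ["ProductName", "Product, Description", 'Size: 15" display']:
    print(quote(value))
```

Backslashes are escaped first so that the backslashes added for `"` and newlines are not doubled afterwards.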

## API Reference

### `encode(data, options=None)`

Convert a Python object to a TOON string.

**Parameters:**
- `data`: Python dict or list
- `options`: Optional dict with:
  - `delimiter`: `'comma'` (default), `'tab'`, or `'pipe'`
  - `indent`: Number of spaces per level (default: 2)
  - `key_folding`: `'off'` (default) or `'safe'`
  - `flatten_depth`: Max depth for key folding (default: None)

**Example:**
```python
toon = encode(data, {
    'delimiter': 'tab',
    'indent': 4,
    'key_folding': 'safe'
})
```

### `decode(toon_string, options=None)`

Convert a TOON string to a Python object.

**Parameters:**
- `toon_string`: TOON-formatted string
- `options`: Optional dict with:
  - `strict`: Validate structure strictly (default: True)
  - `expand_paths`: `'off'` (default) or `'safe'`
  - `default_delimiter`: Default delimiter (default: `','`)

**Example:**
```python
data = decode(toon_string, {
    'expand_paths': 'safe',
    'strict': False
})
```
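The README does not spell out what `strict` validates. One plausible reading, and the assumption behind this sketch, is that the length declared in `[N]` and the field list in `{...}` must match the rows that follow. `parse_tabular` is a hypothetical helper that handles only comma-delimited, unquoted rows and leaves values as strings.

```python
import re

# Tabular header such as `inventory[3]{sku,product,stock}:`.
HEADER = re.compile(r"(\w+)\[(\d+)\]\{([^}]*)\}:")


def parse_tabular(text: str, strict: bool = True):
    """Parse a comma-delimited tabular block into (name, rows) (sketch)."""
    lines = [line for line in text.splitlines() if line.strip()]
    match = HEADER.fullmatch(lines[0].strip())
    if match is None:
        raise ValueError(f"not a tabular header: {lines[0]!r}")
    name, declared = match.group(1), int(match.group(2))
    fields = match.group(3).split(",")
    rows = []
    for line in lines[1:]:
        values = line.strip().split(",")  # assumes no quoted or escaped values
        if strict and len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(values)}: {line!r}")
        rows.append(dict(zip(fields, values)))
    if strict and len(rows) != declared:
        raise ValueError(f"declared {declared} rows, found {len(rows)}")
    return name, rows


# The header promises three rows, but only two follow.
block = (
    "inventory[3]{sku,product,stock}:\n"
    "  KB-789,Mechanical Keyboard,45\n"
    "  MS-456,RGB Mouse Pad,128"
)

name, rows = parse_tabular(block, strict=False)
print(name, len(rows))  # inventory 2

try:
    parse_tabular(block)
except ValueError as err:
    print(err)  # declared 3 rows, found 2
```

The declared length is what makes truncated input detectable: a lenient parse silently returns two rows, while the strict parse reports the mismatch.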

## CLI Usage

```
usage: toon [-h] [-o OUTPUT] [-e] [-d] [--delimiter {comma,tab,pipe}]
            [--indent INDENT] [--stats] [--no-strict]
            [--key-folding {off,safe}] [--flatten-depth DEPTH]
            [--expand-paths {off,safe}]
            [input]

TOON (Token-Oriented Object Notation) - Convert between JSON and TOON formats

positional arguments:
  input                 Input file path (or "-" for stdin)

optional arguments:
  -h, --help            show this help message and exit
  -o, --output OUTPUT   Output file path (default: stdout)
  -e, --encode          Force encode mode (JSON to TOON)
  -d, --decode          Force decode mode (TOON to JSON)
  --delimiter {comma,tab,pipe}
                        Array delimiter (default: comma)
  --indent INDENT       Indentation size (default: 2)
  --stats               Show token statistics
  --no-strict           Disable strict validation (decode only)
  --key-folding {off,safe}
                        Key folding mode (encode only)
  --flatten-depth DEPTH Maximum key folding depth (encode only)
  --expand-paths {off,safe}
                        Path expansion mode (decode only)
```

## Advanced Features

### Key Folding

Collapse single-key chains into dotted paths:

```python
from toon import encode

data = {
    'api': {
        'response': {
            'product': {
                'title': 'Wireless Keyboard'
            }
        }
    }
}

# With key_folding='safe'
toon = encode(data, {'key_folding': 'safe'})
# Output: api.response.product.title: Wireless Keyboard
```
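Key folding can be sketched as a walk that keeps appending keys while the current value is an object with exactly one field. This minimal sketch assumes that `'safe'` means never folding through a key that already contains a dot (so the folded path stays unambiguous), and it ignores `flatten_depth`; `fold_keys` is a hypothetical helper, not the library's implementation.

```python
def fold_keys(obj: dict) -> dict:
    """Collapse single-key object chains into dotted keys (sketch)."""
    folded = {}
    for key, value in obj.items():
        path = [key]
        # Follow the chain while the value is an object with exactly one field
        # and no key along the way already contains a dot.
        while "." not in key and isinstance(value, dict) and len(value) == 1:
            child_key, child_value = next(iter(value.items()))
            if "." in child_key:
                break
            path.append(child_key)
            value = child_value
        if isinstance(value, dict):
            value = fold_keys(value)  # fold inside objects that were not collapsed
        folded[".".join(path)] = value
    return folded


data = {"api": {"response": {"product": {"title": "Wireless Keyboard"}}}}
print(fold_keys(data))  # {'api.response.product.title': 'Wireless Keyboard'}
```

A chain stops folding at the first object with more than one field, so `{"a": {"b": {"c": 1, "d": 2}}}` becomes `{"a.b": {"c": 1, "d": 2}}`.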

### Path Expansion

Expand dotted keys into nested objects:

```python
from toon import decode

toon = 'store.location.zipcode: 10001'

# With expand_paths='safe'
data = decode(toon, {'expand_paths': 'safe'})
# Result: {'store': {'location': {'zipcode': 10001}}}
```
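Path expansion is the inverse walk: split each key on dots and create intermediate objects as needed. This sketch assumes that `'safe'` means raising an error rather than overwriting an existing value, and that values are already parsed (so `10001` is an `int`); `expand_dotted_keys` is a hypothetical helper, not the library's code.

```python
def expand_dotted_keys(flat: dict) -> dict:
    """Expand dotted keys into nested dicts (sketch)."""
    result = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            child = node.setdefault(part, {})  # create intermediate objects on demand
            if not isinstance(child, dict):
                # Assumed 'safe' behaviour: refuse to replace a non-object value.
                raise ValueError(f"cannot expand {dotted_key!r}: {part!r} is not an object")
            node = child
        if leaf in node:
            raise ValueError(f"duplicate key {dotted_key!r}")
        node[leaf] = value
    return result


print(expand_dotted_keys({"store.location.zipcode": 10001}))
# {'store': {'location': {'zipcode': 10001}}}
```

Keys sharing a prefix merge into the same object, so `{"a.b": 1, "a.c": 2}` expands to `{"a": {"b": 1, "c": 2}}`.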

### Custom Delimiters

Choose the delimiter that best fits your data:

```python
# Tab delimiter (better for spreadsheet-like data)
toon = encode(data, {'delimiter': 'tab'})

# Pipe delimiter (when data contains commas)
toon = encode(data, {'delimiter': 'pipe'})
```
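The pipe option helps because, inside a row, a value presumably only has to be quoted when it contains the active delimiter. This minimal sketch assumes that behaviour (suggested by the "when data contains commas" comment) and assumes the option names map to `,`, a tab character and `|`; `join_row` and `DELIMITERS` are hypothetical names, and the other quoting rules are ignored here.

```python
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def join_row(values: list, delimiter: str = "comma") -> str:
    """Join one tabular row, quoting only cells that contain the delimiter (sketch)."""
    sep = DELIMITERS[delimiter]
    cells = []
    for value in values:
        text = str(value)
        if sep in text:
            text = '"' + text.replace('"', '\\"') + '"'
        cells.append(text)
    return sep.join(cells)


row = ["KB-789", "Keyboard, mechanical", 45]
print(join_row(row, "comma"))  # KB-789,"Keyboard, mechanical",45
print(join_row(row, "pipe"))   # KB-789|Keyboard, mechanical|45
```

With the pipe delimiter the comma inside the product name no longer costs two quote characters per cell, which adds up over many rows.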

## Format Comparison

### JSON vs TOON

**JSON** (182 bytes):
```json
{
  "products": [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19}
  ]
}
```

**TOON** (91 bytes, **50% reduction**):
```toon
products[3]{id,name,price}:
  101,Laptop Pro,1299
  102,Magic Mouse,79
  103,USB-C Cable,19
```
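To check a size comparison like this one, measure the UTF-8 length of each rendering. This sketch hard-codes the two examples with two-space indentation and no trailing newline; byte counts shift with indentation and trailing newlines, and bytes are only a proxy for tokens, which depend on the model's tokenizer.

```python
JSON_TEXT = "\n".join([
    "{",
    '  "products": [',
    '    {"id": 101, "name": "Laptop Pro", "price": 1299},',
    '    {"id": 102, "name": "Magic Mouse", "price": 79},',
    '    {"id": 103, "name": "USB-C Cable", "price": 19}',
    "  ]",
    "}",
])

TOON_TEXT = "\n".join([
    "products[3]{id,name,price}:",
    "  101,Laptop Pro,1299",
    "  102,Magic Mouse,79",
    "  103,USB-C Cable,19",
])

json_bytes = len(JSON_TEXT.encode("utf-8"))
toon_bytes = len(TOON_TEXT.encode("utf-8"))
reduction = 1 - toon_bytes / json_bytes
print(f"JSON: {json_bytes} bytes, TOON: {toon_bytes} bytes, {reduction:.0%} smaller")
# JSON: 182 bytes, TOON: 91 bytes, 50% smaller
```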

### When to Use TOON

**Use TOON when:**
- ✅ Passing data to LLM APIs (reduce token costs)
- ✅ Working with uniform tabular data
- ✅ Context window is limited
- ✅ Human readability matters

**Use JSON when:**
- ❌ Maximum compatibility is required
- ❌ Data is highly irregular/nested
- ❌ Working with existing JSON-only tools

## Development

### Setup

```bash
git clone https://github.com/ScrapeGraphAI/toonify.git
cd toonify
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
pytest --cov=toon --cov-report=term-missing
```

### Running Examples

```bash
python examples/basic_usage.py
python examples/advanced_features.py
```

## Performance

TOON typically achieves:
- **30-60% size reduction** vs JSON for structured data
- **40-70% token reduction** with tabular data
- **Minimal overhead** in encoding/decoding (<1ms for typical payloads)

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run tests (`pytest`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Credits

Python implementation inspired by the TypeScript TOON library at [toon-format/toon](https://github.com/toon-format/toon).

## Links

- **GitHub**: https://github.com/ScrapeGraphAI/toonify
- **PyPI**: https://pypi.org/project/toonify/
- **Documentation**: https://github.com/ScrapeGraphAI/toonify#readme
- **Format Spec**: https://github.com/toon-format/toon

---

Made with love by the [ScrapeGraph team](https://scrapegraphai.com)

<p align="center">
  <img src="https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/main/docs/assets/scrapegraphai_logo.png" alt="ScrapeGraphAI Logo" width="250">
</p>
