|
| 1 | +# TOON (Token-Oriented Object Notation) |
| 2 | + |
| 3 | +A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. |
| 4 | + |
| 7 | + |
| 8 | +## Overview |
| 9 | + |
| 10 | +TOON achieves **CSV-like compactness** while adding **explicit structure**, making it ideal for: |
| 11 | +- Reducing token costs in LLM API calls |
| 12 | +- Improving context window efficiency |
| 13 | +- Maintaining human readability |
| 14 | +- Preserving data structure and types |
| 15 | + |
| 16 | +### Key Features |
| 17 | + |
| 18 | +- ✅ **Compact**: 30-60% smaller than JSON for structured data |
| 19 | +- ✅ **Readable**: Clean, indentation-based syntax |
| 20 | +- ✅ **Structured**: Preserves nested objects and arrays |
| 21 | +- ✅ **Typed**: Preserves strings, numbers, booleans, and null |
| 22 | +- ✅ **Flexible**: Multiple delimiter options (comma, tab, pipe) |
| 23 | +- ✅ **Smart**: Automatic tabular format for uniform arrays |
| 24 | +- ✅ **Efficient**: Key folding for deeply nested objects |
| 25 | + |
| 26 | +## Installation |
| 27 | + |
| 28 | +```bash |
| 29 | +pip install toonify |
| 30 | +``` |
| 31 | + |
| 32 | +For development: |
| 33 | +```bash |
| 34 | +pip install "toonify[dev]" |
| 35 | +``` |
| 36 | + |
| 37 | +## Quick Start |
| 38 | + |
| 39 | +### Python API |
| 40 | + |
| 41 | +```python |
| 42 | +from toon import encode, decode |
| 43 | + |
| 44 | +# Encode Python dict to TOON |
| 45 | +data = { |
| 46 | + 'products': [ |
| 47 | + {'sku': 'LAP-001', 'name': 'Gaming Laptop', 'price': 1299.99}, |
| 48 | + {'sku': 'MOU-042', 'name': 'Wireless Mouse', 'price': 29.99} |
| 49 | + ] |
| 50 | +} |
| 51 | + |
| 52 | +toon_string = encode(data) |
| 53 | +print(toon_string) |
| 54 | +# Output: |
| 55 | +# products[2]{sku,name,price}: |
| 56 | +# LAP-001,Gaming Laptop,1299.99 |
| 57 | +# MOU-042,Wireless Mouse,29.99 |
| 58 | + |
| 59 | +# Decode TOON back to Python |
| 60 | +result = decode(toon_string) |
| 61 | +assert result == data |
| 62 | +``` |
| 63 | + |
| 64 | +### Command Line |
| 65 | + |
| 66 | +```bash |
| 67 | +# Encode JSON to TOON |
| 68 | +toon input.json -o output.toon |
| 69 | + |
| 70 | +# Decode TOON to JSON |
| 71 | +toon input.toon -o output.json |
| 72 | + |
| 73 | +# Use with pipes |
| 74 | +cat data.json | toon -e > data.toon |
| 75 | + |
| 76 | +# Show token statistics |
| 77 | +toon data.json --stats |
| 78 | +``` |
| 79 | + |
| 80 | +## TOON Format Specification |
| 81 | + |
| 82 | +### Basic Syntax |
| 83 | + |
| 84 | +```toon |
| 85 | +# Simple key-value pairs |
| 86 | +title: Machine Learning Basics |
| 87 | +chapters: 12 |
| 88 | +published: true |
| 89 | +``` |
| 90 | + |
| 91 | +### Arrays |
| 92 | + |
| 93 | +**Primitive arrays** (inline): |
| 94 | +```toon |
| 95 | +temperatures: [72.5,68.3,75.1,70.8,73.2] |
| 96 | +categories: [electronics,computers,accessories] |
| 97 | +``` |
| 98 | + |
| 99 | +**Tabular arrays** (uniform objects with header): |
| 100 | +```toon |
| 101 | +inventory[3]{sku,product,stock}: |
| 102 | +  KB-789,Mechanical Keyboard,45 |
| 103 | +  MS-456,RGB Mouse Pad,128 |
| 104 | +  HD-234,USB Headset,67 |
| 105 | +``` |
| 106 | + |
| 107 | +**List arrays** (non-uniform or nested): |
| 108 | +```toon |
| 109 | +tasks[2]: |
| 110 | +  Complete documentation |
| 111 | +  Review pull requests |
| 112 | +``` |
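To make the tabular layout concrete, here is a minimal sketch of how a uniform list of dicts could be rendered as a `name[N]{fields}:` block. The helper `to_tabular` is hypothetical and is not toonify's implementation; it assumes every row has the same keys in the same order and skips quoting entirely.

```python
def to_tabular(name, rows, delimiter=",", indent=2):
    """Render uniform dicts as a TOON-style tabular block (illustrative sketch only)."""
    if not rows:
        raise ValueError("rows must be non-empty")
    fields = list(rows[0].keys())
    # In this sketch, "uniform" means identical keys in identical order.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform")
    # Header: array name, declared length, then the field list in braces.
    header = f"{name}[{len(rows)}]{{{delimiter.join(fields)}}}:"
    pad = " " * indent
    # One indented line per row; values are NOT quoted or escaped here.
    body = [pad + delimiter.join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


inventory = [
    {"sku": "KB-789", "product": "Mechanical Keyboard", "stock": 45},
    {"sku": "MS-456", "product": "RGB Mouse Pad", "stock": 128},
    {"sku": "HD-234", "product": "USB Headset", "stock": 67},
]
print(to_tabular("inventory", inventory))
```

Because the field names appear once in the header rather than once per row, the saving grows with the number of rows.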
| 113 | + |
| 114 | +### Nested Objects |
| 115 | + |
| 116 | +```toon |
| 117 | +server: |
| 118 | +  hostname: api-prod-01 |
| 119 | +  config: |
| 120 | +    port: 8080 |
| 121 | +    region: us-east |
| 122 | +``` |
| 123 | + |
| 124 | +### Quoting Rules |
| 125 | + |
| 126 | +Strings are quoted only when necessary: |
| 127 | +- Contains special characters (`,`, `:`, `"`, newlines) |
| 128 | +- Has leading/trailing whitespace |
| 129 | +- Looks like a literal (`true`, `false`, `null`) |
| 130 | +- Is empty |
| 131 | + |
| 132 | +```toon |
| 133 | +simple: ProductName |
| 134 | +quoted: "Product, Description" |
| 135 | +escaped: "Size: 15\" display" |
| 136 | +multiline: "First feature\nSecond feature" |
| 137 | +``` |
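The quoting rules above can be sketched as a small predicate plus an escaping step. The names `needs_quotes`, `quote`, and `format_string` are hypothetical, and the backslash escaping is an assumption; the library's actual rules may cover more cases.

```python
def needs_quotes(value, delimiter=","):
    """Return True if a string would need quoting under the rules listed above."""
    if value == "":
        return True                      # empty string
    if value != value.strip():
        return True                      # leading/trailing whitespace
    if value in ("true", "false", "null"):
        return True                      # would be read back as a literal
    # Special characters: the active delimiter, colon, double quote, newline.
    return any(ch in value for ch in (delimiter, ":", '"', "\n"))


def quote(value):
    """Wrap in double quotes, escaping backslashes, quotes, and newlines (assumed scheme)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def format_string(value, delimiter=","):
    """Quote only when necessary."""
    return quote(value) if needs_quotes(value, delimiter) else value


for sample in ["ProductName", "Product, Description", 'Size: 15" display']:
    print(format_string(sample))
```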
| 138 | + |
| 139 | +## API Reference |
| 140 | + |
| 141 | +### `encode(data, options=None)` |
| 142 | + |
| 143 | +Convert Python object to TOON string. |
| 144 | + |
| 145 | +**Parameters:** |
| 146 | +- `data`: Python dict or list |
| 147 | +- `options`: Optional dict with: |
| 148 | + - `delimiter`: `'comma'` (default), `'tab'`, or `'pipe'` |
| 149 | + - `indent`: Number of spaces per level (default: 2) |
| 150 | + - `key_folding`: `'off'` (default) or `'safe'` |
| 151 | + - `flatten_depth`: Max depth for key folding (default: None) |
| 152 | + |
| 153 | +**Example:** |
| 154 | +```python |
| 155 | +toon = encode(data, { |
| 156 | +    'delimiter': 'tab', |
| 157 | +    'indent': 4, |
| 158 | +    'key_folding': 'safe' |
| 159 | +}) |
| 160 | +``` |
| 161 | + |
| 162 | +### `decode(toon_string, options=None)` |
| 163 | + |
| 164 | +Convert TOON string to Python object. |
| 165 | + |
| 166 | +**Parameters:** |
| 167 | +- `toon_string`: TOON formatted string |
| 168 | +- `options`: Optional dict with: |
| 169 | + - `strict`: Validate structure strictly (default: True) |
| 170 | + - `expand_paths`: `'off'` (default) or `'safe'` |
| 171 | + - `default_delimiter`: Default delimiter (default: `','`) |
| 172 | + |
| 173 | +**Example:** |
| 174 | +```python |
| 175 | +data = decode(toon_string, { |
| 176 | +    'expand_paths': 'safe', |
| 177 | +    'strict': False |
| 178 | +}) |
| 179 | +``` |
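This README does not spell out what `strict` validates. One plausible check, assumed here purely for illustration, is that a tabular block's declared length and field count match its rows. The helper `check_tabular_block` is hypothetical and splits naively on the delimiter, so it does not handle quoted values.

```python
import re

# Matches headers such as: inventory[3]{sku,product,stock}:
HEADER = re.compile(r"^(?P<name>[^\[\]{}:]+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def check_tabular_block(text, delimiter=","):
    """Validate a tabular block against its header; return (name, fields, rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty block")
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("first line is not a tabular header")
    fields = match["fields"].split(delimiter)
    # Naive split: quoted values containing the delimiter are not supported here.
    rows = [line.strip().split(delimiter) for line in lines[1:]]
    if len(rows) != int(match["count"]):
        raise ValueError(f"header declares {match['count']} rows, found {len(rows)}")
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(row)}: {row}")
    return match["name"], fields, rows


block = "inventory[2]{sku,stock}:\n  KB-789,45\n  MS-456,128"
print(check_tabular_block(block))
```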
| 180 | + |
| 181 | +## CLI Usage |
| 182 | + |
| 183 | +``` |
| 184 | +usage: toon [-h] [-o OUTPUT] [-e] [-d] [--delimiter {comma,tab,pipe}] |
| 185 | +            [--indent INDENT] [--stats] [--no-strict] |
| 186 | +            [--key-folding {off,safe}] [--flatten-depth DEPTH] |
| 187 | +            [--expand-paths {off,safe}] |
| 188 | +            [input] |
| 189 | + |
| 190 | +TOON (Token-Oriented Object Notation) - Convert between JSON and TOON formats |
| 191 | + |
| 192 | +positional arguments: |
| 193 | +  input                 Input file path (or "-" for stdin) |
| 194 | + |
| 195 | +optional arguments: |
| 196 | +  -h, --help            show this help message and exit |
| 197 | +  -o, --output OUTPUT   Output file path (default: stdout) |
| 198 | +  -e, --encode          Force encode mode (JSON to TOON) |
| 199 | +  -d, --decode          Force decode mode (TOON to JSON) |
| 200 | +  --delimiter {comma,tab,pipe} |
| 201 | +                        Array delimiter (default: comma) |
| 202 | +  --indent INDENT       Indentation size (default: 2) |
| 203 | +  --stats               Show token statistics |
| 204 | +  --no-strict           Disable strict validation (decode only) |
| 205 | +  --key-folding {off,safe} |
| 206 | +                        Key folding mode (encode only) |
| 207 | +  --flatten-depth DEPTH Maximum key folding depth (encode only) |
| 208 | +  --expand-paths {off,safe} |
| 209 | +                        Path expansion mode (decode only) |
| 210 | +``` |
| 211 | + |
| 212 | +## Advanced Features |
| 213 | + |
| 214 | +### Key Folding |
| 215 | + |
| 216 | +Collapse single-key chains into dotted paths: |
| 217 | + |
| 218 | +```python |
| 219 | +data = { |
| 220 | +    'api': { |
| 221 | +        'response': { |
| 222 | +            'product': { |
| 223 | +                'title': 'Wireless Keyboard' |
| 224 | +            } |
| 225 | +        } |
| 226 | +    } |
| 227 | +} |
| 228 | + |
| 229 | +# With key_folding='safe' |
| 230 | +toon = encode(data, {'key_folding': 'safe'}) |
| 231 | +# Output: api.response.product.title: Wireless Keyboard |
| 232 | +``` |
| 233 | + |
| 234 | +### Path Expansion |
| 235 | + |
| 236 | +Expand dotted keys into nested objects: |
| 237 | + |
| 238 | +```python |
| 239 | +toon = 'store.location.zipcode: 10001' |
| 240 | + |
| 241 | +# With expand_paths='safe' |
| 242 | +data = decode(toon, {'expand_paths': 'safe'}) |
| 243 | +# Result: {'store': {'location': {'zipcode': 10001}}} |
| 244 | +``` |
| 245 | + |
| 246 | +### Custom Delimiters |
| 247 | + |
| 248 | +Choose the delimiter that best fits your data: |
| 249 | + |
| 250 | +```python |
| 251 | +# Tab delimiter (better for spreadsheet-like data) |
| 252 | +toon = encode(data, {'delimiter': 'tab'}) |
| 253 | + |
| 254 | +# Pipe delimiter (when data contains commas) |
| 255 | +toon = encode(data, {'delimiter': 'pipe'}) |
| 256 | +``` |
| 257 | + |
| 258 | +## Format Comparison |
| 259 | + |
| 260 | +### JSON vs TOON |
| 261 | + |
| 262 | +**JSON** (247 bytes): |
| 263 | +```json |
| 264 | +{ |
| 265 | +  "products": [ |
| 266 | +    {"id": 101, "name": "Laptop Pro", "price": 1299}, |
| 267 | +    {"id": 102, "name": "Magic Mouse", "price": 79}, |
| 268 | +    {"id": 103, "name": "USB-C Cable", "price": 19} |
| 269 | +  ] |
| 270 | +} |
| 271 | +``` |
| 272 | + |
| 273 | +**TOON** (98 bytes, **60% reduction**): |
| 274 | +```toon |
| 275 | +products[3]{id,name,price}: |
| 276 | +  101,Laptop Pro,1299 |
| 277 | +  102,Magic Mouse,79 |
| 278 | +  103,USB-C Cable,19 |
| 279 | +``` |
| 280 | + |
| 281 | +### When to Use TOON |
| 282 | + |
| 283 | +**Use TOON when:** |
| 284 | +- ✅ Passing data to LLM APIs (reduce token costs) |
| 285 | +- ✅ Working with uniform tabular data |
| 286 | +- ✅ Context window is limited |
| 287 | +- ✅ Human readability matters |
| 288 | + |
| 289 | +**Use JSON when:** |
| 290 | +- ❌ Maximum compatibility is required |
| 291 | +- ❌ Data is highly irregular/nested |
| 292 | +- ❌ Working with existing JSON-only tools |
| 293 | + |
| 294 | +## Development |
| 295 | + |
| 296 | +### Setup |
| 297 | + |
| 298 | +```bash |
| 299 | +git clone https://github.com/ScrapeGraphAI/toonify.git |
| 300 | +cd toonify |
| 301 | +pip install -e ".[dev]" |
| 302 | +``` |
| 303 | + |
| 304 | +### Running Tests |
| 305 | + |
| 306 | +```bash |
| 307 | +pytest |
| 308 | +pytest --cov=toon --cov-report=term-missing |
| 309 | +``` |
| 310 | + |
| 311 | +### Running Examples |
| 312 | + |
| 313 | +```bash |
| 314 | +python examples/basic_usage.py |
| 315 | +python examples/advanced_features.py |
| 316 | +``` |
| 317 | + |
| 318 | +## Performance |
| 319 | + |
| 320 | +TOON typically achieves: |
| 321 | +- **30-60% size reduction** vs JSON for structured data |
| 322 | +- **40-70% token reduction** with tabular data |
| 323 | +- **Minimal overhead** in encoding/decoding (<1ms for typical payloads) |
| 324 | + |
| 325 | +## Contributing |
| 326 | + |
| 327 | +Contributions are welcome! Please: |
| 328 | + |
| 329 | +1. Fork the repository |
| 330 | +2. Create a feature branch (`git checkout -b feature/amazing-feature`) |
| 331 | +3. Make your changes with tests |
| 332 | +4. Run tests (`pytest`) |
| 333 | +5. Commit your changes (`git commit -m 'Add amazing feature'`) |
| 334 | +6. Push to the branch (`git push origin feature/amazing-feature`) |
| 335 | +7. Open a Pull Request |
| 336 | + |
| 337 | +## License |
| 338 | + |
| 339 | +MIT License - see [LICENSE](LICENSE) file for details. |
| 340 | + |
| 341 | +## Credits |
| 342 | + |
| 343 | +Python implementation inspired by the TypeScript TOON library at [toon-format/toon](https://github.com/toon-format/toon). |
| 344 | + |
| 345 | +## Links |
| 346 | + |
| 347 | +- **GitHub**: https://github.com/ScrapeGraphAI/toonify |
| 348 | +- **PyPI**: https://pypi.org/project/toonify/ |
| 349 | +- **Documentation**: https://github.com/ScrapeGraphAI/toonify#readme |
| 350 | +- **Format Spec**: https://github.com/toon-format/toon |
| 351 | + |
| 352 | +--- |
| 353 | + |
| 354 | +Made with love by the [ScrapeGraph team](https://scrapegraphai.com) |
| 355 | + |
| 356 | +<p align="center"> |
| 357 | + <img src="https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/main/docs/assets/scrapegraphai_logo.png" alt="ScrapeGraphAI Logo" width="250"> |
| 358 | +</p> |