Commit 7f9eeca

feat: add chinese readme
1 parent 303fabd commit 7f9eeca

2 files changed

Lines changed: 363 additions & 0 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # TOON (Token-Oriented Object Notation)
 
+[English](README.md) | [中文](README.zh-CN.md)
+
 A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.
 
 [![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

README.zh-CN.md

Lines changed: 361 additions & 0 deletions
@@ -0,0 +1,361 @@
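# TOON (Token-Oriented Object Notation)

[English](README.md) | [中文](README.zh-CN.md)

A compact, human-readable serialization format designed for passing structured data to large language models with significantly reduced token usage.

[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

TOON achieves **CSV-like compactness** while adding **explicit structure**, which makes it well suited for:
- Reducing token costs for LLM API calls
- Improving context window efficiency
- Maintaining human readability
- Preserving data structure and types

### Key Features

- **Compact**: 30-60% smaller than JSON for structured data
- **Readable**: Clean, indentation-based syntax
- **Structured**: Preserves nested objects and arrays
- **Type-safe**: Supports strings, numbers, booleans, and null
- **Flexible**: Multiple delimiter options (comma, tab, pipe)
- **Smart**: Automatic tabular format for uniform arrays
- **Efficient**: Key folding for deeply nested objects
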
## Installation

```bash
pip install toonify
```

For a development install:
```bash
pip install toonify[dev]
```

## Quick Start

### Python API

```python
from toon import encode, decode

# Encode a Python dict to TOON
data = {
    'products': [
        {'sku': 'LAP-001', 'name': 'Gaming Laptop', 'price': 1299.99},
        {'sku': 'MOU-042', 'name': 'Wireless Mouse', 'price': 29.99}
    ]
}

toon_string = encode(data)
print(toon_string)
# Output:
# products[2]{sku,name,price}:
#   LAP-001,Gaming Laptop,1299.99
#   MOU-042,Wireless Mouse,29.99

# Decode TOON back to Python
result = decode(toon_string)
assert result == data
```

### Command Line

```bash
# Encode JSON to TOON
toon input.json -o output.toon

# Decode TOON to JSON
toon input.toon -o output.json

# Use with pipes
cat data.json | toon -e > data.toon

# Show token statistics
toon data.json --stats
```

## TOON Format Specification

### Basic Syntax

```toon
# Simple key-value pairs
title: Machine Learning Basics
chapters: 12
published: true
```

### Arrays

**Primitive array** (inline):
```toon
temperatures: [72.5,68.3,75.1,70.8,73.2]
categories: [electronics,computers,accessories]
```

**Tabular array** (uniform objects with a header):
```toon
inventory[3]{sku,product,stock}:
  KB-789,Mechanical Keyboard,45
  MS-456,RGB Mouse Pad,128
  HD-234,USB Headset,67
```

**List array** (non-uniform or nested):
```toon
tasks[2]:
  Complete documentation
  Review pull requests
```

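The tabular form can be illustrated with a small standalone sketch. This is not toonify's implementation: `encode_table` is a hypothetical helper, it assumes the rows are indented by two spaces, and it skips quoting and type formatting entirely.

```python
def encode_table(name, rows, delimiter=","):
    """Encode a list of uniform dicts as a TOON-style tabular array (sketch only)."""
    if not rows:
        raise ValueError("rows must not be empty")
    fields = list(rows[0].keys())
    # A tabular header only makes sense when every row has the same keys.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform; a list array would be needed")
    # Header: name[count]{field1,field2,...}:
    lines = [f"{name}[{len(rows)}]{{{delimiter.join(fields)}}}:"]
    for row in rows:
        # Assumption: values contain no delimiter, so no quoting is applied here.
        lines.append("  " + delimiter.join(str(row[f]) for f in fields))
    return "\n".join(lines)


inventory = [
    {"sku": "KB-789", "product": "Mechanical Keyboard", "stock": 45},
    {"sku": "MS-456", "product": "RGB Mouse Pad", "stock": 128},
    {"sku": "HD-234", "product": "USB Headset", "stock": 67},
]
table_text = encode_table("inventory", inventory)
print(table_text)
```

The field names appear once in the header instead of once per row, which is where most of the savings over JSON come from for uniform data.
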
### Nested Objects

```toon
server:
  hostname: api-prod-01
  config:
    port: 8080
    region: us-east
```

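As a rough illustration of indentation-based nesting, the sketch below walks a dict recursively and indents each level. The names `format_scalar` and `encode_object` are hypothetical, only dicts and scalars are handled, and quoting is ignored; it is not how toonify itself is written.

```python
def format_scalar(value):
    """Render a scalar using TOON-style literals (sketch: no quoting)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before other types: bool is an int subclass
        return "true" if value else "false"
    return str(value)


def encode_object(obj, indent=2, level=0):
    """Return the lines for a nested dict, one indentation step per level."""
    pad = " " * (indent * level)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(encode_object(value, indent, level + 1))
        else:
            lines.append(f"{pad}{key}: {format_scalar(value)}")
    return lines


server = {"server": {"hostname": "api-prod-01", "config": {"port": 8080, "tls": True}}}
nested_text = "\n".join(encode_object(server))
print(nested_text)
```
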
### Quoting Rules

Strings are quoted only when necessary, that is, when they:
- Contain special characters (`,`, `:`, `"`, newlines)
- Have leading or trailing whitespace
- Look like a literal (`true`, `false`, `null`)
- Are the empty string

```toon
simple: ProductName
quoted: "Product, Description"
escaped: "Size: 15\" display"
multiline: "First feature\nSecond feature"
```

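The four rules above can be written as a small predicate. This is a sketch under stated assumptions: `needs_quotes` and `quote` are hypothetical names, backslash escaping is an assumption not spelled out in this README, and strings that look like numbers are not treated specially.

```python
def needs_quotes(value: str, delimiter: str = ",") -> bool:
    """Return True if a string would need quoting under the rules listed above."""
    if value == "":
        return True  # empty string
    if value != value.strip():
        return True  # leading or trailing whitespace
    if value in ("true", "false", "null"):
        return True  # would otherwise be read as a literal
    # Special characters: the delimiter, colon, double quote, newline.
    return any(ch in value for ch in (delimiter, ":", '"', "\n"))


def quote(value: str) -> str:
    """Quote and escape a string only when needed."""
    if not needs_quotes(value):
        return value
    # Assumption: backslashes are escaped first so the result stays unambiguous.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


for sample in ["ProductName", "Product, Description", 'Size: 15" display', "true", ""]:
    print(quote(sample))
```
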
## API Reference

### `encode(data, options=None)`

Converts a Python object to a TOON string.

**Parameters:**
- `data`: Python dict or list
- `options`: Optional dict with:
  - `delimiter`: `'comma'` (default), `'tab'`, or `'pipe'`
  - `indent`: Number of spaces per indentation level (default: 2)
  - `key_folding`: `'off'` (default) or `'safe'`
  - `flatten_depth`: Maximum depth for key folding (default: None)

**Example:**
```python
toon = encode(data, {
    'delimiter': 'tab',
    'indent': 4,
    'key_folding': 'safe'
})
```

### `decode(toon_string, options=None)`

Converts a TOON string to a Python object.

**Parameters:**
- `toon_string`: TOON-formatted string
- `options`: Optional dict with:
  - `strict`: Validate structure strictly (default: True)
  - `expand_paths`: `'off'` (default) or `'safe'`
  - `default_delimiter`: Default delimiter (default: `','`)

**Example:**
```python
data = decode(toon_string, {
    'expand_paths': 'safe',
    'strict': False
})
```

## CLI Usage

```
usage: toon [-h] [-o OUTPUT] [-e] [-d] [--delimiter {comma,tab,pipe}]
            [--indent INDENT] [--stats] [--no-strict]
            [--key-folding {off,safe}] [--flatten-depth DEPTH]
            [--expand-paths {off,safe}]
            [input]

TOON (Token-Oriented Object Notation) - Convert between JSON and TOON formats

positional arguments:
  input                 Input file path (or "-" for stdin)

optional arguments:
  -h, --help            Show this help message and exit
  -o, --output OUTPUT   Output file path (default: stdout)
  -e, --encode          Force encode mode (JSON to TOON)
  -d, --decode          Force decode mode (TOON to JSON)
  --delimiter {comma,tab,pipe}
                        Array delimiter (default: comma)
  --indent INDENT       Indentation size (default: 2)
  --stats               Show token statistics
  --no-strict           Disable strict validation (decode only)
  --key-folding {off,safe}
                        Key folding mode (encode only)
  --flatten-depth DEPTH Maximum key folding depth (encode only)
  --expand-paths {off,safe}
                        Path expansion mode (decode only)
```

## Advanced Features

### Key Folding

Collapses chains of single-key objects into dot-separated paths:

```python
data = {
    'api': {
        'response': {
            'product': {
                'title': 'Wireless Keyboard'
            }
        }
    }
}

# With key_folding='safe'
toon = encode(data, {'key_folding': 'safe'})
# Output: api.response.product.title: Wireless Keyboard
```

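To make the idea concrete, here is a minimal dict-to-dict sketch of folding. `fold_keys` and its `max_depth` parameter are hypothetical, and the rule of refusing to fold across keys that already contain a dot is only one guess at what a "safe" mode might guard against; toonify's actual behavior may differ.

```python
def fold_keys(obj, max_depth=None):
    """Collapse chains of single-key dicts into dotted keys (sketch, string keys assumed)."""
    folded = {}
    for key, value in obj.items():
        path = [key]
        # Follow the chain while the value is a dict holding exactly one key.
        while isinstance(value, dict) and len(value) == 1:
            if max_depth is not None and len(path) >= max_depth:
                break
            next_key, next_value = next(iter(value.items()))
            if "." in next_key:
                break  # folding here would make the dotted path ambiguous
            path.append(next_key)
            value = next_value
        if isinstance(value, dict):
            value = fold_keys(value, max_depth)  # fold any remaining nested levels
        folded[".".join(path)] = value
    return folded


data = {"api": {"response": {"product": {"title": "Wireless Keyboard"}}}}
folded = fold_keys(data)
print(folded)
```

Branches with more than one key stop the fold, so an object such as `{"a": {"b": 1, "c": 2}}` keeps its nesting under `a`.
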
### Path Expansion

Expands dot-separated keys into nested objects:

```python
toon = 'store.location.zipcode: 10001'

# With expand_paths='safe'
data = decode(toon, {'expand_paths': 'safe'})
# Result: {'store': {'location': {'zipcode': 10001}}}
```

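Path expansion is the inverse operation. The sketch below works on an already-parsed flat dict; `expand_paths` here is a hypothetical standalone function, and raising on conflicting paths is an assumed policy rather than documented toonify behavior.

```python
def expand_paths(flat):
    """Expand dotted keys into nested dicts (sketch; conflicts raise ValueError)."""
    nested = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # A scalar already sits where an object is needed.
                raise ValueError(f"conflict at {part!r} while expanding {dotted_key!r}")
            node = child
        leaf = parts[-1]
        if leaf in node:
            raise ValueError(f"duplicate path {dotted_key!r}")
        node[leaf] = value
    return nested


expanded = expand_paths({"store.location.zipcode": 10001, "store.name": "Downtown"})
print(expanded)
```
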
### Custom Delimiters

Choose the delimiter that best fits your data:

```python
# Tab delimiter (better for spreadsheet-like data)
toon = encode(data, {'delimiter': 'tab'})

# Pipe delimiter (when the data contains commas)
toon = encode(data, {'delimiter': 'pipe'})
```

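One way to choose a delimiter automatically is to scan the values and take the first candidate that never appears in them. This is a sketch: `pick_delimiter` is a hypothetical helper, and the mapping of option names to characters (`,`, `|`, tab) is an assumption based on the option names above.

```python
# Candidate delimiters in order of preference (name -> assumed character).
CANDIDATES = [("comma", ","), ("pipe", "|"), ("tab", "\t")]


def pick_delimiter(rows):
    """Return the first delimiter name whose character appears in no cell."""
    cells = [str(value) for row in rows for value in row.values()]
    for name, char in CANDIDATES:
        if not any(char in cell for cell in cells):
            return name
    return "comma"  # every candidate collides; values would have to be quoted


rows = [{"name": "Doe, Jane", "city": "Austin"}]
choice = pick_delimiter(rows)
print(choice)
```

The returned name could then be passed as the `delimiter` option shown above.
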
## Format Comparison

### JSON vs TOON

**JSON** (247 bytes):
```json
{
  "products": [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19}
  ]
}
```

**TOON** (98 bytes, **60% reduction**):
```toon
products[3]{id,name,price}:
  101,Laptop Pro,1299
  102,Magic Mouse,79
  103,USB-C Cable,19
```

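Byte counts depend on how the JSON is formatted, so it can be useful to measure your own payloads. The sketch below uses only the standard library and builds the tabular text by hand to mirror the TOON example above, rather than calling toonify.

```python
import json

products = [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19},
]

# Pretty-printed JSON; a different indent or separators setting changes the size.
json_text = json.dumps({"products": products}, indent=2)

# Hand-built tabular text in the TOON style (assumption: two-space row indent).
fields = ["id", "name", "price"]
toon_lines = [f"products[{len(products)}]{{{','.join(fields)}}}:"]
toon_lines += ["  " + ",".join(str(p[f]) for f in fields) for p in products]
toon_text = "\n".join(toon_lines)

json_bytes = len(json_text.encode("utf-8"))
toon_bytes = len(toon_text.encode("utf-8"))
reduction = 1 - toon_bytes / json_bytes
print(f"JSON: {json_bytes} bytes, TOON: {toon_bytes} bytes, reduction: {reduction:.0%}")
```

Bytes are only a proxy for tokens; the `--stats` flag of the CLI reports token statistics directly.
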
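### When to Use TOON

**Use TOON when:**
- ✅ Passing data to LLM APIs (reduces token costs)
- ✅ Working with uniform tabular data
- ✅ The context window is limited
- ✅ Human readability matters

**Use JSON when:**
- ❌ Maximum compatibility is required
- ❌ The data is highly irregular or deeply nested
- ❌ Existing tools only support JSON
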
## Development

### Setup

```bash
git clone https://github.com/ScrapeGraphAI/toonify.git
cd toonify
pip install -e .[dev]
```

### Running Tests

```bash
pytest
pytest --cov=toon --cov-report=term-missing
```

### Running Examples

```bash
python examples/basic_usage.py
python examples/advanced_features.py
```

## Performance

TOON typically achieves:
- **30-60% size reduction** compared to JSON for structured data
- **40-70% token reduction** for tabular data
- **Minimal overhead** for encoding and decoding (<1ms for typical payloads)

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes and write tests
4. Run the tests (`pytest`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

This Python implementation is inspired by the TypeScript TOON library at [toon-format/toon](https://github.com/toon-format/toon).

## Links

- **GitHub**: https://github.com/ScrapeGraphAI/toonify
- **PyPI**: https://pypi.org/project/toonify/
- **Documentation**: https://github.com/ScrapeGraphAI/toonify#readme
- **Format specification**: https://github.com/toon-format/toon

---

Made with care by the [ScrapeGraph team](https://scrapegraphai.com)

<p align="center">
<img src="https://github.com/ScrapeGraphAI/Scrapegraph-ai/blob/main/docs/assets/scrapegraphai_logo.png" alt="ScrapeGraphAI Logo" width="250">
</p>
