Structure data with ASCII control codes.
One vocabulary, multiple shapes.
Between text formats and binary formats — values are UTF-8, structure is single bytes
{
"users": [
{"name": "Alice", "amount": "1502.30"},
{"name": "Bob", "amount": "340.00"}
],
"products": [
{"id": "01", "product": "Widget"},
{"id": "02", "product": "Gadget"}
]
}
␜mydb ␝users ␁name␟amount ␞Alice␟1502.30 ␞Bob␟340.00 ␝products ␁id␟product ␞01␟Widget ␞02␟Gadget ␄
Each glyph is a single-byte control code. No braces, no quotes, no escaping.
Four hierarchical data separators — the same ones the ASCII committee designed in 1963.
ASCII has 32 control codes (0x00–0x1F) designed for structuring data. C0DATA assigns meaning to 11 of them.
| Glyph | Byte | Abbr | Role |
|---|---|---|---|
| ␜ | 0x1C | FS | File / Database separator |
| ␝ | 0x1D | GS | Group / Table / Section separator |
| ␞ | 0x1E | RS | Record / Row separator |
| ␟ | 0x1F | US | Unit / Field separator |
| ␁ | 0x01 | SOH | Header (field names) |
| ␄ | 0x04 | EOT | End of document |
| ␂ | 0x02 | STX | Open nested sub-structure |
| ␃ | 0x03 | ETX | Close nested sub-structure |
| ␅ | 0x05 | ENQ | Reference (look up named data) |
| ␐ | 0x10 | DLE | Escape (next byte is literal) |
| ␚ | 0x1A | SUB | Substitution (C0DIFF) |
The same codes express every common data shape.
// Like CSV or SQL tables ␝users ␁name␟amount␟type ␞Alice␟1502.30␟DEPOSIT ␞Bob␟340.00␟WITHDRAWAL
// Like TOML or INI ␝database ␞host␟localhost ␞port␟5432 ␝server ␞host␟0.0.0.0 ␞port␟8080
// Like Markdown sections ␜My Document ␝Chapter 1 ␞First paragraph of text. ␞A list: ␟Item one ␟Item two ␝␝Section 1.1 ␞Nested content.
// Atomic multi-file edits ␜foo.txt ␝Hello ␟world␚universe␟! // Find "Hello world!", replace "world" → "universe"
Single-byte delimiters. Zero-copy parsing. The hot loop is one comparison: byte < 0x20.
c compact — canonical wire/storage form p pretty — human-readable with Unicode glyphs
c Zero-copy — no parsing or allocation.
Fields are byte slices into the original buffer.
Benchmarks on Intel Core Ultra 7 155H, 16 GB RAM, Crystal 1.19.1 compiled with --release. Tokenizer benchmark on 10 MB synthetic document. Conversion benchmarks on 10,000-row table (506 KB C0DATA).
Add C0DATA to your Crystal project, or use the c0fmt command-line tool.
# shard.yml dependencies: c0: github: trans/c0data
buf = C0::Builder.build do |b| b.group("users", headers: ["name", "amount"]) do b.record("Alice", "1502.30") b.record("Bob", "340.00") end end
table = C0::Table.new(buf) table.record(0).field(0) # => "Alice" (zero-copy slice)
# Command line
c0fmt import data.csv | c0fmt export json
c0fmt import config.json | c0fmt pretty
c0fmt import yaml settings.yml | c0fmt compact -o data.c0
crystal build src/c0fmt.cr -o bin/c0fmt --release