C0DATA

Structure data with ASCII control codes.
One vocabulary, multiple shapes.

Between text formats and binary formats: values are UTF-8, structure is single bytes.

JSON 198 bytes

{
  "users": [
    {"name": "Alice", "amount": "1502.30"},
    {"name": "Bob", "amount": "340.00"}
  ],
  "products": [
    {"id": "01", "product": "Widget"},
    {"id": "02", "product": "Gadget"}
  ]
}

C0DATA 100 bytes

␜mydb
  ␝users
    ␁name␟amount
    ␞Alice␟1502.30
    ␞Bob␟340.00
  ␝products
    ␁id␟product
    ␞01␟Widget
    ␞02␟Gadget
␄

Each glyph stands for a single-byte control code. No braces, no quotes, and no escaping for ordinary text.
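To make the byte-level layout concrete, here is a minimal sketch that builds the example document above as raw bytes. It is written in Python purely for illustration and is not the C0DATA library's API: the function name `encode_file` and the input shape are hypothetical, and it assumes the compact form carries no whitespace between tokens and that values contain no control bytes.

```python
# Separator bytes taken from the glyphs in the example above.
FS, GS, RS, US = b"\x1c", b"\x1d", b"\x1e", b"\x1f"
SOH, EOT = b"\x01", b"\x04"

def encode_file(name, tables):
    """Encode {table_name: (headers, rows)} as compact bytes (hypothetical helper)."""
    out = bytearray(FS + name.encode("utf-8"))
    for table_name, (headers, rows) in tables.items():
        out += GS + table_name.encode("utf-8")
        # Header line: SOH, then field names joined by US.
        out += SOH + US.join(h.encode("utf-8") for h in headers)
        for row in rows:
            # Each record: RS, then field values joined by US.
            out += RS + US.join(v.encode("utf-8") for v in row)
    out += EOT
    return bytes(out)

doc = encode_file("mydb", {
    "users": (["name", "amount"], [["Alice", "1502.30"], ["Bob", "340.00"]]),
    "products": (["id", "product"], [["01", "Widget"], ["02", "Gadget"]]),
})
print(doc)
```

Every structural marker in the output is exactly one byte, and the values sit between them unmodified.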

␜ File › ␝ Group › ␞ Record › ␟ Unit

Four hierarchical data separators, the same ones ASCII has defined for this purpose since the 1960s.

How It Works

ASCII has 32 control codes (0x00–0x1F), several of them designed for structuring data. C0DATA assigns meaning to 11 of them.

Glyph	Byte	Abbr	Role
␜	0x1C	FS	File / Database separator
␝	0x1D	GS	Group / Table / Section separator
␞	0x1E	RS	Record / Row separator
␟	0x1F	US	Unit / Field separator
␁	0x01	SOH	Header (field names)
␄	0x04	EOT	End of document
␂	0x02	STX	Open nested sub-structure
␃	0x03	ETX	Close nested sub-structure
␅	0x05	ENQ	Reference (look up named data)
␐	0x10	DLE	Escape (next byte is literal)
␚	0x1A	SUB	Substitution (C0DIFF)
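The DLE row says the next byte is literal. A minimal sketch of what that could look like for a value that happens to contain control bytes, in Python for illustration only. The helper names `escape` and `unescape` are hypothetical, and the choice to escape exactly the 11 assigned codes is an assumption; the actual implementation may escape a different set.

```python
DLE = 0x10
# The 11 codes from the table above. Assumption: only these need escaping.
ASSIGNED = frozenset(
    [0x1C, 0x1D, 0x1E, 0x1F, 0x01, 0x04, 0x02, 0x03, 0x05, 0x10, 0x1A]
)

def escape(value: bytes) -> bytes:
    """Prefix every assigned control byte (including DLE itself) with DLE."""
    out = bytearray()
    for byte in value:
        if byte in ASSIGNED:
            out.append(DLE)
        out.append(byte)
    return bytes(out)

def unescape(data: bytes) -> bytes:
    """Drop each DLE and keep the byte that follows it as literal data."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == DLE and i + 1 < len(data):
            i += 1  # skip the DLE; the next byte is taken literally
        out.append(data[i])
        i += 1
    return bytes(out)

raw = b"a\x1fb\x10c"       # a value containing US and DLE
wire = escape(raw)         # b"a\x10\x1fb\x10\x10c"
assert unescape(wire) == raw
```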

Multiple Shapes

The same codes express every common data shape.

// Like CSV or SQL tables
␝users
  ␁name␟amount␟type
  ␞Alice␟1502.30␟DEPOSIT
  ␞Bob␟340.00␟WITHDRAWAL

// Like TOML or INI
␝database
  ␞host␟localhost
  ␞port␟5432
␝server
  ␞host␟0.0.0.0
  ␞port␟8080

// Like Markdown sections
␜My Document
  ␝Chapter 1
    ␞First paragraph of text.
    ␞A list:
      ␟Item one
      ␟Item two
    ␝␝Section 1.1
      ␞Nested content.

// Atomic multi-file edits
␜foo.txt
  ␝Hello ␟world␚universe␟!

// Find "Hello world!", replace "world" → "universe"
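As a rough illustration of how one reader can serve two of these shapes, the sketch below decodes groups that carry a header (␁) into lists of rows, and groups without one into key/value maps. It is Python for illustration, not the library's API; `compact` and `decode_groups` are hypothetical names. It assumes single-line input with no layout whitespace, and it ignores DLE escapes, nesting, and the doubled ␝␝ form.

```python
# The glyph for control byte N is U+2400 + N (Unicode Control Pictures block).
GLYPHS = {0x2400 + n: n for n in range(0x20)}
GS, RS, US, SOH = b"\x1d", b"\x1e", b"\x1f", b"\x01"

def compact(pretty: str) -> bytes:
    """Replace glyphs with real control bytes (input must not contain layout whitespace)."""
    return pretty.translate(GLYPHS).encode("utf-8")

def decode_groups(data: bytes) -> dict:
    """Map each group name to a list of dicts (table) or a dict (key/value)."""
    groups = {}
    for chunk in data.split(GS)[1:]:      # anything before the first GS is skipped
        parts = chunk.split(RS)
        name, _, header = parts[0].partition(SOH)
        rows = [[f.decode("utf-8") for f in p.split(US)] for p in parts[1:]]
        if header:                        # table shape: header names the fields
            fields = [h.decode("utf-8") for h in header.split(US)]
            groups[name.decode("utf-8")] = [dict(zip(fields, r)) for r in rows]
        else:                             # config shape: each record is key, value
            groups[name.decode("utf-8")] = {r[0]: r[1] for r in rows}
    return groups

shapes = decode_groups(
    compact("␝users␁name␟amount␟type␞Alice␟1502.30␟DEPOSIT␞Bob␟340.00␟WITHDRAWAL")
    + compact("␝database␞host␟localhost␞port␟5432␝server␞host␟0.0.0.0␞port␟8080")
)
print(shapes["users"][0]["name"])   # Alice
print(shapes["server"]["port"])     # 8080
```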

Benchmarks

Single-byte delimiters. Zero-copy parsing. The hot loop is one comparison: byte < 0x20.
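The logic of that loop can be sketched in a few lines. This is an illustration in Python, not the Crystal tokenizer that was benchmarked, and it shows the control flow only, not the performance. The function name `tokenize` is hypothetical, DLE escapes are not handled, and any byte below 0x20 (including tab or newline) is treated as a delimiter.

```python
def tokenize(buf: bytes):
    """Yield (control_byte, value) pairs; each value is a memoryview slice of buf."""
    view = memoryview(buf)
    code = None   # control byte that opened the current token
    start = 0     # index of the first value byte after that control byte
    for i, byte in enumerate(buf):
        if byte < 0x20:                      # the single comparison per byte
            if code is not None:
                yield code, view[start:i]    # slice only, no copy of the value
            code, start = byte, i + 1
    if code is not None:
        yield code, view[start:]

buf = b"\x1dusers\x01name\x1famount\x1eAlice\x1f1502.30\x04"
for code, value in tokenize(buf):
    print(hex(code), bytes(value))
```

The values are views into `buf`; bytes are copied only when a caller asks for them, as the `bytes(value)` call does here for printing.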

1,928 MB/s tokenizer

~3x smaller than JSON

Format Size (10K rows, 5 fields)

Format	Size
C0DATA^c	506 KB
CSV	526 KB
C0DATA^p	636 KB
YAML	966 KB
JSON	1,476 KB

^c compact: canonical wire/storage form
^p pretty: human-readable with Unicode glyphs

Parse (format → data)

Format	Time
C0DATA^c	0.4 ms
C0DATA^p	2.7 ms
CSV	3.4 ms
JSON	9.0 ms
YAML	27.5 ms

^c Zero-copy: no parsing or allocation. Fields are byte slices into the original buffer.

Serialize (data → format)

Format	Time
C0DATA^c	0.8 ms
C0DATA^p	1.2 ms
CSV	5.7 ms
JSON	9.5 ms
YAML	24.8 ms

Benchmarks on Intel Core Ultra 7 155H, 16 GB RAM, Crystal 1.19.1 compiled with --release. Tokenizer benchmark on 10 MB synthetic document. Conversion benchmarks on 10,000-row table (506 KB C0DATA).

Get Started

Add C0DATA to your Crystal project, or use the c0fmt command-line tool.

As a library

# shard.yml
dependencies:
  c0:
    github: trans/c0data

Build data

buf = C0::Builder.build do |b|
  b.group("users", headers: ["name", "amount"]) do
    b.record("Alice", "1502.30")
    b.record("Bob", "340.00")
  end
end

Read data

table = C0::Table.new(buf)
table.record(0).field(0)  # => "Alice" (zero-copy slice)

Convert formats

# Command line
c0fmt import data.csv | c0fmt export json
c0fmt import config.json | c0fmt pretty
c0fmt import yaml settings.yml | c0fmt compact -o data.c0

Build c0fmt

crystal build src/c0fmt.cr -o bin/c0fmt --release