C0DATA

Structure data with ASCII control codes.
One vocabulary, multiple shapes.

Between text formats and binary formats: values are UTF-8, structure is single bytes.

JSON 198 bytes
{
  "users": [
    {"name": "Alice", "amount": "1502.30"},
    {"name": "Bob", "amount": "340.00"}
  ],
  "products": [
    {"id": "01", "product": "Widget"},
    {"id": "02", "product": "Gadget"}
  ]
}
C0DATA 100 bytes
␜mydb
  ␝users
    ␁name␟amount
    ␞Alice␟1502.30
    ␞Bob␟340.00
  ␝products
    ␁id␟product
    ␞01␟Widget
    ␞02␟Gadget

Each glyph stands for a single-byte control code. No braces, no quotes, and no escaping for ordinary text (only a literal control byte needs the DLE escape).

␜ File
␝ Group
␞ Record
␟ Unit

Four hierarchical data separators — the same ones the ASCII committee designed in 1963.

How It Works

ASCII reserves 32 control codes (0x00–0x1F), several of which were designed for structuring data. C0DATA assigns meaning to 11 of them.

Glyph  Byte  Abbr  Role
␜      0x1C  FS    File / Database separator
␝      0x1D  GS    Group / Table / Section separator
␞      0x1E  RS    Record / Row separator
␟      0x1F  US    Unit / Field separator
␁      0x01  SOH   Header (field names)
␄      0x04  EOT   End of document
␂      0x02  STX   Open nested sub-structure
␃      0x03  ETX   Close nested sub-structure
␅      0x05  ENQ   Reference (look up named data)
␐      0x10  DLE   Escape (next byte is literal)
␚      0x1A  SUB   Substitution (C0DIFF)
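
To make the byte-level picture concrete, here is a minimal sketch that writes a one-table document by hand using only the Crystal standard library. It does not use the C0 library, and the layout is an assumption: the helper names (`write_value`, `write_row`, `encode_users`) are hypothetical, a lead byte (SOH or RS) is assumed to come before each row, and any control byte inside a value is assumed to be prefixed with DLE. The real encoder may differ on all three points.

```crystal
# Control bytes taken from the table above.
FS  = 0x1c_u8 # file / database
GS  = 0x1d_u8 # group / table
RS  = 0x1e_u8 # record / row
US  = 0x1f_u8 # unit / field
SOH = 0x01_u8 # header (field names)
EOT = 0x04_u8 # end of document
DLE = 0x10_u8 # escape: next byte is literal

# Writes a value, prefixing any control byte inside it with DLE.
# ASSUMPTION: the real escaping rules may be narrower than this.
def write_value(io : IO, value : String) : Nil
  value.each_byte do |byte|
    io.write_byte(DLE) if byte < 0x20_u8
    io.write_byte(byte)
  end
end

# Writes one row: a lead byte (SOH or RS), then fields separated by US.
# ASSUMPTION: the lead byte precedes the row.
def write_row(io : IO, lead : UInt8, fields : Array(String)) : Nil
  io.write_byte(lead)
  fields.each_with_index do |field, i|
    io.write_byte(US) if i > 0
    write_value(io, field)
  end
end

def encode_users : Bytes
  io = IO::Memory.new
  io.write_byte(FS)
  write_value(io, "mydb")
  io.write_byte(GS)
  write_value(io, "users")
  write_row(io, SOH, ["name", "amount"])
  write_row(io, RS, ["Alice", "1502.30"])
  write_row(io, RS, ["Bob", "340.00"])
  io.write_byte(EOT)
  io.to_slice
end

doc = encode_users
puts doc.size                      # 49 bytes in total
puts doc.count { |b| b < 0x20_u8 } # 9 of them are structure
```

Under these assumptions every structural element costs exactly one byte, so the overhead is the number of separators and nothing else.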

Multiple Shapes

The same codes express every common data shape.

// Like CSV or SQL tables
␝users
  ␁name␟amount␟type
  ␞Alice␟1502.30␟DEPOSIT
  ␞Bob␟340.00␟WITHDRAWAL
// Like TOML or INI
␝database
  ␞host␟localhost
  ␞port␟5432
␝server
  ␞host␟0.0.0.0
  ␞port␟8080
// Like Markdown sections
␜My Document
  ␝Chapter 1
    ␞First paragraph of text.
    ␞A list:
      ␟Item one
      ␟Item two
    ␝␝Section 1.1
      ␞Nested content.
// Atomic multi-file edits
foo.txt
  Hello worlduniverse!

// Find "Hello world!", replace "world" → "universe"

Benchmarks

Single-byte delimiters. Zero-copy parsing. The hot loop is one comparison: byte < 0x20.
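
The single-comparison loop can be sketched as follows. This is an illustration written against the Crystal standard library, not the C0 tokenizer itself: the `Token` record and `tokenize` method are hypothetical names, and DLE escapes and any whitespace handling in the pretty form are deliberately left out.

```crystal
# A token is either one control byte (`code` set) or a run of value bytes.
# `text` is a view into the input buffer, not a copy.
record Token, code : UInt8?, text : Bytes

# Splits a buffer into tokens. The per-byte test is the single comparison
# `byte < 0x20`. DLE (0x10) escapes are ignored in this sketch.
def tokenize(buf : Bytes) : Array(Token)
  tokens = [] of Token
  start = 0
  buf.each_with_index do |byte, i|
    next unless byte < 0x20_u8
    # Flush the value bytes collected since the previous control byte.
    tokens << Token.new(nil, buf[start, i - start]) if i > start
    tokens << Token.new(byte, buf[i, 1])
    start = i + 1
  end
  tokens << Token.new(nil, buf[start, buf.size - start]) if start < buf.size
  tokens
end

input = "\u001dusers\u001eAlice\u001f1502.30".to_slice
tokenize(input).each do |t|
  if code = t.code
    puts "ctl 0x#{code.to_s(16).rjust(2, '0')}"
  else
    puts "val #{String.new(t.text)}"
  end
end
```

Because every structural byte falls below 0x20 and printable UTF-8 never does, the loop needs no state machine for quotes or brackets; it only remembers where the current value started.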

1,928 MB/s tokenizer
~3x smaller than JSON

Format Size (10K rows, 5 fields)

C0DATA (c)   506 KB
CSV          526 KB
C0DATA (p)   636 KB
YAML         966 KB
JSON         1,476 KB

c compact — canonical wire/storage form   p pretty — human-readable with Unicode glyphs

Parse (format → data)

C0DATA (c)   0.4 ms
C0DATA (p)   2.7 ms
CSV          3.4 ms
JSON         9.0 ms
YAML         27.5 ms

(c) Zero-copy: no parsing or allocation.
Fields are byte slices into the original buffer.
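
What "byte slices into the original buffer" means can be shown with plain Crystal `Bytes`. The sketch below is not the C0 reader; `split_fields` is a hypothetical helper that splits one record on US (0x1F) and returns views, which is then checked by comparing pointers.

```crystal
# Returns the fields of one record as views into `buf`, split on US (0x1F).
# Slice#[] with a start and count points into the same memory; nothing is copied.
def split_fields(buf : Bytes) : Array(Bytes)
  result = [] of Bytes
  start = 0
  buf.each_with_index do |byte, i|
    if byte == 0x1f_u8
      result << buf[start, i - start]
      start = i + 1
    end
  end
  result << buf[start, buf.size - start]
  result
end

buf = "Alice\u001f1502.30".to_slice
name, amount = split_fields(buf)

puts String.new(name)                      # Alice
puts amount.size                           # 7
puts amount.to_unsafe == buf.to_unsafe + 6 # true: same memory, offset 6
```

A string is only materialized when the caller asks for one (`String.new(name)` copies); until then a field is just a pointer and a length.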

Serialize (data → format)

C0DATA (c)   0.8 ms
C0DATA (p)   1.2 ms
CSV          5.7 ms
JSON         9.5 ms
YAML         24.8 ms

Benchmarks on Intel Core Ultra 7 155H, 16 GB RAM, Crystal 1.19.1 compiled with --release. Tokenizer benchmark on 10 MB synthetic document. Conversion benchmarks on 10,000-row table (506 KB C0DATA).

Get Started

Add C0DATA to your Crystal project, or use the c0fmt command-line tool.

As a library

# shard.yml
dependencies:
  c0:
    github: trans/c0data

Build data

buf = C0::Builder.build do |b|
  b.group("users", headers: ["name", "amount"]) do
    b.record("Alice", "1502.30")
    b.record("Bob", "340.00")
  end
end

Read data

table = C0::Table.new(buf)
table.record(0).field(0)  # => "Alice" (zero-copy slice)

Convert formats

# Command line
c0fmt import data.csv | c0fmt export json
c0fmt import config.json | c0fmt pretty
c0fmt import yaml settings.yml | c0fmt compact -o data.c0

Build c0fmt

crystal build src/c0fmt.cr -o bin/c0fmt --release