File: docs/SPECIFICATION_V2.md

Package: ATON Format PHP, by Stefano D'Agostino
Role: Auxiliary data
Content type: text/markdown
Description: Auxiliary data
Class: ATON Format PHP
Encode and decode values using the ATON format
Date: 3 months ago
Size: 5,840 bytes
 


ATON Format Specification V2

Overview

ATON V2 builds on V1 with advanced features for better compression, querying, and large dataset handling.

New Features in V2

  1. Dictionary Compression: Automatic deduplication of repeated strings
  2. Default Values: Skip encoding when values match defaults
  3. Query Language: SQL-like filtering and sorting
  4. Streaming Encoder: Process large datasets in chunks
  5. Compression Modes: FAST, BALANCED, ULTRA, ADAPTIVE

Format Structure

Complete Syntax

@dict[#0:"repeated value", #1:"another repeated"]
@schema[field1:type1, field2:type2, ...]
@defaults[field1:defaultValue, field2:defaultValue]
@queryable[tableName]

tableName(recordCount):
  value1, value2, ...
  value1, value2, ...
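
As an illustration of how a header directive could be read, here is a minimal sketch that parses a single `@schema[...]` line into a field-to-type map. The function name `parseSchemaLine` is hypothetical and not part of the Aton library, and the sketch assumes simple `name:type` pairs with no commas or brackets inside a type.

```php
<?php
// Hypothetical helper (not part of the Aton library): parses one @schema line
// into an ordered map of field name => type name.
function parseSchemaLine(string $line): array
{
    if (preg_match('/^@schema\[(.*)\]$/', trim($line), $matches) !== 1) {
        throw new InvalidArgumentException('Not a @schema directive: ' . $line);
    }

    $schema = [];
    foreach (explode(',', $matches[1]) as $pair) {
        // Split each "name:type" pair once, at the first colon.
        $parts = array_map('trim', explode(':', $pair, 2));
        if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
            throw new InvalidArgumentException('Malformed field definition: ' . $pair);
        }
        $schema[$parts[0]] = $parts[1];
    }

    return $schema;
}

$schema = parseSchemaLine('@schema[id:int, name:str, category:str, status:str]');
// $schema maps id => int, name => str, category => str, status => str
```

The order of the map matters, since row values are matched to fields by position.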

Dictionary Compression

Purpose

Reduces token usage by replacing repeated strings with short references.

Syntax

@dict[#0:"Long repeated string", #1:"Another common value"]

Usage in Data

@dict[#0:"Electronics", #1:"In Stock"]
@schema[id:int, name:str, category:str, status:str]

products(3):
  1, "Laptop", #0, #1
  2, "Mouse", #0, #1
  3, "Keyboard", #0, "Out of Stock"
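
To make the reference mechanism concrete, the sketch below resolves `#n` tokens in an already-tokenized row against a dictionary. The helper `expandDictRefs` is hypothetical, not an Aton library function, and it assumes the tokenizer has already distinguished bare references from quoted string literals.

```php
<?php
// Hypothetical helper, not part of the Aton library: resolves "#n" tokens
// against a dictionary that was parsed from an @dict[...] header.
function expandDictRefs(array $dict, array $tokens): array
{
    return array_map(
        static function (string $token) use ($dict): string {
            // A reference is a '#' followed by one or more digits.
            if (preg_match('/^#(\d+)$/', $token, $m) === 1) {
                $index = (int) $m[1];
                if (!array_key_exists($index, $dict)) {
                    throw new InvalidArgumentException("Unknown dictionary reference: $token");
                }
                return $dict[$index];
            }
            return $token; // literal value, left untouched
        },
        $tokens
    );
}

$dict = [0 => 'Electronics', 1 => 'In Stock'];
$row  = expandDictRefs($dict, ['1', 'Laptop', '#0', '#1']);
// $row is ['1', 'Laptop', 'Electronics', 'In Stock']
```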

Compression Thresholds

| Mode | Min Length | Min Occurrences |
|------|------------|-----------------|
| FAST | No compression | - |
| BALANCED | 5 chars | 3 times |
| ULTRA | 3 chars | 2 times |
| ADAPTIVE | Auto-selected based on data size | - |
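
The thresholds can be read as a simple selection rule: a string enters the dictionary only if it is long enough and repeated often enough. The following sketch shows that rule under stated assumptions; `buildDictionary` is a hypothetical name, both thresholds are treated as inclusive minimums, and length is measured in bytes with `strlen` rather than in characters.

```php
<?php
// Hypothetical sketch, not the Aton library's code: choose dictionary entries
// from a flat list of string values using a length and an occurrence threshold.
function buildDictionary(array $strings, int $minLength, int $minOccurrences): array
{
    $dict = [];
    // array_count_values keeps first-seen order, so indices are stable.
    foreach (array_count_values($strings) as $value => $count) {
        $value = (string) $value; // numeric-looking keys come back as integers
        if (strlen($value) >= $minLength && $count >= $minOccurrences) {
            $dict[] = $value;
        }
    }
    return $dict;
}

$values = [
    'Electronics', 'Electronics', 'Electronics',
    'In Stock', 'In Stock', 'Out of Stock',
    'TV', 'TV', 'TV',
];

$balanced = buildDictionary($values, 5, 3); // BALANCED thresholds
$ultra    = buildDictionary($values, 3, 2); // ULTRA thresholds
// $balanced is ['Electronics']; $ultra is ['Electronics', 'In Stock']
```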

Default Values

Purpose

Skip encoding values that match the most common value for a field.

Syntax

@defaults[status:"active", verified:true]

Example

@schema[id:int, name:str, status:str, verified:bool]
@defaults[status:"active", verified:true]

users(4):
  1, "Alice"
  2, "Bob"
  3, "Carol", "inactive"
  4, "Dave", "active", false

Users 1 and 2 omit both status and verified, so the defaults apply. User 3 overrides status and omits verified. User 4 overrides verified; because values are positional, its status is written out even though it matches the default.
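
On the decoding side, a shortened row can be completed by filling missing trailing positions from the defaults. This is a minimal sketch of that idea, not the library's decoder; `applyDefaults` is a hypothetical name, and it assumes only trailing values are ever omitted, as in the example above.

```php
<?php
// Hypothetical helper, not part of the Aton library: turns a positional row
// into a full record, using @defaults for any trailing fields left out.
function applyDefaults(array $fields, array $defaults, array $row): array
{
    $record = [];
    foreach ($fields as $position => $field) {
        if (array_key_exists($position, $row)) {
            $record[$field] = $row[$position];      // value present in the row
        } elseif (array_key_exists($field, $defaults)) {
            $record[$field] = $defaults[$field];    // value omitted, default applies
        } else {
            throw new InvalidArgumentException("Missing value for field '$field'");
        }
    }
    return $record;
}

$fields   = ['id', 'name', 'status', 'verified'];
$defaults = ['status' => 'active', 'verified' => true];

$alice = applyDefaults($fields, $defaults, [1, 'Alice']);
$carol = applyDefaults($fields, $defaults, [3, 'Carol', 'inactive']);
// $alice gets status 'active' and verified true; $carol keeps status 'inactive'
```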

Query Language

Syntax

tableName [SELECT fields] [WHERE conditions] [ORDER BY field [ASC|DESC]] [LIMIT n] [OFFSET n]

Operators

| Operator | Description | Example |
|----------|-------------|---------|
| = | Equals | status = 'active' |
| !=, <> | Not equals | status != 'deleted' |
| < | Less than | age < 30 |
| > | Greater than | price > 100 |
| <= | Less or equal | count <= 10 |
| >= | Greater or equal | score >= 80 |
| LIKE | Pattern match | name LIKE '%john%' |
| IN | In set | category IN ('A', 'B') |
| NOT IN | Not in set | status NOT IN ('deleted') |
| BETWEEN | Range | price BETWEEN 10 AND 100 |
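
Most of these operators map directly onto PHP comparisons, while `LIKE` needs a pattern translation. One possible approach is to convert the pattern to a regular expression, as sketched below. The function `likeToRegex` is hypothetical; the sketch assumes SQL conventions (`%` for any sequence, `_` for a single character) and case-sensitive matching, none of which this specification states explicitly.

```php
<?php
// Hypothetical helper, not part of the Aton library: converts a LIKE pattern
// into an anchored PCRE pattern. Assumes SQL-style % and _ wildcards.
function likeToRegex(string $pattern): string
{
    $regex = '';
    foreach (str_split($pattern) as $char) {
        if ($char === '%') {
            $regex .= '.*';                     // any sequence, including empty
        } elseif ($char === '_') {
            $regex .= '.';                      // exactly one character
        } else {
            $regex .= preg_quote($char, '/');   // literal, escaped for the regex
        }
    }
    // s: dot also matches newlines; D: $ matches only at the very end.
    return '/^' . $regex . '$/sD';
}

$matchesGmail = preg_match(likeToRegex('%@gmail.com'), 'alice@gmail.com') === 1;
$matchesOther = preg_match(likeToRegex('%@gmail.com'), 'alice@gmail.com.example') === 1;
// $matchesGmail is true, $matchesOther is false
```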

Logical Operators

  • `AND`: Both conditions must be true
  • `OR`: Either condition must be true
  • `NOT`: Negates condition
  • Parentheses for grouping: `(a OR b) AND c`

Examples

-- Simple filter
users WHERE active = true

-- Multiple conditions
products WHERE price > 100 AND category = 'Electronics'

-- Pattern matching
users WHERE email LIKE '%@gmail.com'

-- Sorting and pagination
orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 10

-- Field selection
users SELECT id, name, email WHERE verified = true
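
For readers who want to check what a query should return, the sorting and pagination example can be reproduced with plain PHP array functions. This is only a sketch of the expected semantics, assuming the SQL-like order of evaluation (filter, then sort, then limit); the sample records are made up for illustration.

```php
<?php
// Illustrative data only; not taken from the Aton library or its tests.
$orders = [
    ['id' => 1, 'status' => 'pending', 'created_at' => '2024-01-03'],
    ['id' => 2, 'status' => 'shipped', 'created_at' => '2024-01-04'],
    ['id' => 3, 'status' => 'pending', 'created_at' => '2024-01-05'],
];

// WHERE status = 'pending'
$result = array_filter(
    $orders,
    static fn (array $order): bool => $order['status'] === 'pending'
);

// ORDER BY created_at DESC (ISO dates sort correctly as strings)
usort(
    $result,
    static fn (array $a, array $b): int => $b['created_at'] <=> $a['created_at']
);

// LIMIT 10
$result = array_slice($result, 0, 10);

$ids = array_column($result, 'id');
// $ids is [3, 1]
```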

Streaming Format

Chunk Structure

The first chunk includes the full schema:

@schema[id:int, name:str]

records(100):
  1, "First"
  2, "Second"
  ...

Subsequent chunks use continuation syntax:

records+(100):
  101, "Next"
  102, "Another"
  ...

Metadata

Each chunk includes:

  • `chunkId`: Current chunk number (0-indexed)
  • `totalChunks`: Total number of chunks
  • `isFirst`: Boolean, true for first chunk
  • `isLast`: Boolean, true for last chunk
  • `metadata.table`: Table name
  • `metadata.recordsInChunk`: Records in this chunk
  • `metadata.startIdx`: Starting record index
  • `metadata.endIdx`: Ending record index
  • `metadata.totalRecords`: Total records across all chunks
  • `metadata.progress`: Completion fraction (0.0 to 1.0)
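
The relationship between these fields can be sketched with a generator that splits records into chunks and computes the metadata for each. This is not the library's `StreamEncoder`; `chunkWithMetadata` and the `records` key are hypothetical, and the sketch assumes `endIdx` is inclusive and `progress` is the share of records emitted so far.

```php
<?php
// Hypothetical sketch, not the Aton StreamEncoder: yields raw record chunks
// together with the metadata fields listed in this section.
function chunkWithMetadata(string $table, array $records, int $chunkSize): Generator
{
    $totalRecords = count($records);
    $chunks       = array_chunk($records, $chunkSize);
    $totalChunks  = count($chunks);

    foreach ($chunks as $chunkId => $chunk) {
        $startIdx = $chunkId * $chunkSize;
        $endIdx   = $startIdx + count($chunk) - 1; // assumed inclusive

        yield [
            'chunkId'     => $chunkId,
            'totalChunks' => $totalChunks,
            'isFirst'     => $chunkId === 0,
            'isLast'      => $chunkId === $totalChunks - 1,
            'records'     => $chunk, // hypothetical key holding the raw records
            'metadata'    => [
                'table'          => $table,
                'recordsInChunk' => count($chunk),
                'startIdx'       => $startIdx,
                'endIdx'         => $endIdx,
                'totalRecords'   => $totalRecords,
                'progress'       => (float) (($endIdx + 1) / $totalRecords),
            ],
        ];
    }
}

// 250 records in chunks of 100 give three chunks: 100, 100, and 50 records.
$chunks = iterator_to_array(chunkWithMetadata('records', range(1, 250), 100), false);
```

An empty input yields no chunks at all, so the division by the record count is never reached in that case.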

Compression Modes

FAST

  • No dictionary compression
  • Fastest encoding
  • Best for: Small datasets, real-time encoding

BALANCED (Default)

  • Dictionary compression for strings ≥5 chars appearing ≥3 times
  • Good balance of speed and compression
  • Best for: General purpose use

ULTRA

  • Aggressive dictionary compression (≥3 chars, ≥2 times)
  • Maximum compression
  • Best for: Large datasets, bandwidth-constrained scenarios

ADAPTIVE

  • Automatically selects mode based on data size:
      - < 1KB: FAST
      - 1KB - 10KB: BALANCED
      - > 10KB: ULTRA
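
The size rule above amounts to a two-threshold lookup. The sketch below shows one reading of it; `selectAdaptiveMode` is a hypothetical function that returns mode names as strings to stay self-contained, and it assumes 1KB means 1024 bytes and that exactly 10KB still counts as BALANCED. How the library measures data size is not specified here.

```php
<?php
// Hypothetical sketch, not the Aton library's code: maps a payload size in
// bytes to a compression mode name, following the ADAPTIVE thresholds.
function selectAdaptiveMode(int $sizeInBytes): string
{
    $oneKb = 1024; // assumption: binary kilobytes

    if ($sizeInBytes < $oneKb) {
        return 'FAST';
    }
    if ($sizeInBytes <= 10 * $oneKb) {
        return 'BALANCED';
    }
    return 'ULTRA';
}

$small  = selectAdaptiveMode(512);    // 'FAST'
$medium = selectAdaptiveMode(4096);   // 'BALANCED'
$large  = selectAdaptiveMode(50000);  // 'ULTRA'
```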

PHP Implementation

Encoder

use Aton\Encoder;
use Aton\Enums\CompressionMode;

$encoder = new Encoder(
    optimize: true,
    compression: CompressionMode::BALANCED,
    queryable: true,
    validate: true
);

// Basic encoding
$aton = $encoder->encode($data);

// With query filter
$aton = $encoder->encodeWithQuery($data, "users WHERE active = true");

// Get stats
$stats = $encoder->getCompressionStats($data);

Decoder

use Aton\Decoder;

$decoder = new Decoder(validate: true);
$data = $decoder->decode($atonString);

Query Engine

use Aton\QueryEngine;

$engine = new QueryEngine();
$query = $engine->parse("products WHERE price > 100 ORDER BY price DESC");
$results = $engine->execute($data, $query);

Stream Encoder

use Aton\StreamEncoder;
use Aton\Enums\CompressionMode;

$encoder = new StreamEncoder(
    chunkSize: 100,
    compression: CompressionMode::BALANCED
);

foreach ($encoder->streamEncode($largeData) as $chunk) {
    processChunk($chunk['data']);
}

Migration from V1

V2 is fully backward compatible with V1. To use V1-style encoding:

use Aton\Encoder;
use Aton\Enums\CompressionMode;

$encoder = new Encoder(
    optimize: false,              // Disable defaults optimization
    compression: CompressionMode::FAST  // No dictionary compression
);

The V2 decoder reads both V1 and V2 formats without any changes.