Storage Adapters

Storage adapters enable Horizon Epoch to work with different data storage systems while providing a unified interface for version control operations.

Supported Backends

Horizon Epoch supports eight storage backends:

Backend               Status      Constraint Support  Use Case
PostgreSQL            Production  Full                Transactional databases
MySQL                 Production  Full                Transactional databases
Microsoft SQL Server  Production  Full                Enterprise databases
SQLite                Production  Partial             Local/embedded databases
S3/Delta Lake         Production  Metadata Only       Data lakes, analytics
Azure Blob Storage    Production  Metadata Only       Cloud data lakes
Google Cloud Storage  Production  Metadata Only       Cloud data lakes
Local Filesystem      Production  Metadata Only       Development, testing

Adapter Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Storage Abstraction Layer                   │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │              StorageAdapter Trait                       │ │
│  │  - read_records()    - write_records()                 │ │
│  │  - delete_records()  - get_schema()                    │ │
│  │  - list_tables()     - scan_table()                    │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ PostgreSQL │ │   MySQL    │ │    MSSQL   │ │   SQLite   │
│  Adapter   │ │  Adapter   │ │   Adapter  │ │  Adapter   │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ S3/Delta   │ │   Azure    │ │    GCS     │ │  LocalFS   │
│  Adapter   │ │   Adapter  │ │  Adapter   │ │  Adapter   │
└────────────┘ └────────────┘ └────────────┘ └────────────┘

Common Interface

All storage adapters implement these core operations:

Record Operations

// Read specific records by primary key
fn read_records(
    &self,
    table: &str,
    keys: &[RecordKey],
    branch_context: &BranchContext,
) -> Result<Vec<Record>>;

// Write records (insert or update)
fn write_records(
    &self,
    table: &str,
    records: &[Record],
    branch_context: &BranchContext,
) -> Result<WriteResult>;

// Delete records by primary key
fn delete_records(
    &self,
    table: &str,
    keys: &[RecordKey],
    branch_context: &BranchContext,
) -> Result<DeleteResult>;

Schema Operations

// Get table schema
fn get_schema(&self, table: &str) -> Result<Schema>;

// Detect schema changes relative to a previously captured schema
fn compare_schema(
    &self,
    table: &str,
    old_schema: &Schema,
) -> Result<SchemaChanges>;

Table Operations

// List available tables
fn list_tables(&self) -> Result<Vec<TableInfo>>;

// Scan entire table (for initial import or full sync)
fn scan_table(
    &self,
    table: &str,
    options: ScanOptions,
) -> Result<RecordStream>;

// Get table statistics
fn get_table_stats(&self, table: &str) -> Result<TableStats>;
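
To make the contract above more concrete, here is a minimal Python sketch of an in-memory adapter that mirrors the record and table operations. It is illustrative only: the real interface is the Rust trait, and `InMemoryAdapter`, the `key_column` parameter, and the dict-based records are assumptions made for this example. Branch contexts, schemas, and error types are left out.

```python
from typing import Dict, Iterator, List


class InMemoryAdapter:
    """Toy adapter (hypothetical): each table maps primary key -> record."""

    def __init__(self, key_column: str = "id") -> None:
        self.key_column = key_column
        self.tables: Dict[str, Dict[object, dict]] = {}

    def write_records(self, table: str, records: List[dict]) -> int:
        """Insert or update (upsert) records; return how many were written."""
        rows = self.tables.setdefault(table, {})
        for record in records:
            rows[record[self.key_column]] = dict(record)
        return len(records)

    def read_records(self, table: str, keys: List[object]) -> List[dict]:
        """Return the records found for the given keys, skipping missing keys."""
        rows = self.tables.get(table, {})
        return [dict(rows[k]) for k in keys if k in rows]

    def delete_records(self, table: str, keys: List[object]) -> int:
        """Delete by primary key; return how many records actually existed."""
        rows = self.tables.get(table, {})
        return sum(1 for k in keys if rows.pop(k, None) is not None)

    def list_tables(self) -> List[str]:
        """Return the names of all tables that have been written to."""
        return sorted(self.tables)

    def scan_table(self, table: str) -> Iterator[dict]:
        """Yield every record of the table, one at a time."""
        for record in list(self.tables.get(table, {}).values()):
            yield dict(record)
```

Writing a record whose key already exists replaces it, which is the "insert or update" behavior described for `write_records`. Returning copies keeps callers from mutating stored rows by accident.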

Relational Database Adapters

PostgreSQL Adapter

Capabilities:

  • Direct SQL execution
  • ACID transactions
  • Full schema introspection
  • Constraint enforcement (foreign keys, unique, check, exclusion)
  • Connection pooling
  • TLS/SSL secure connections

Configuration:

[storage.postgresql.mydb]
url = "postgresql://user:pass@host/database"
pool_size = 10
connect_timeout = 30
statement_timeout = 60000  # 60 seconds
ssl_mode = "require"       # disable, prefer, require

MySQL Adapter

Capabilities:

  • Full CRUD operations
  • Schema introspection via information_schema
  • Full constraint support
  • Five SSL/TLS modes (disabled, preferred, required, verify_ca, verify_identity)
  • SSH tunnel support
  • Connection pooling

Configuration:

[storage.mysql.mydb]
url = "mysql://user:pass@host/database"
pool_size = 10
ssl_mode = "required"  # disabled, preferred, required, verify_ca, verify_identity

Microsoft SQL Server Adapter

Capabilities:

  • Full CRUD operations
  • Schema introspection via sys.* catalog
  • Full constraint support
  • TLS/SSL encryption with three levels (off, on, required)
  • Windows and SQL Server authentication

Configuration:

[storage.mssql.mydb]
url = "mssql://user:pass@host/database"
pool_size = 10
encrypt = "required"  # off, on, required
trust_server_certificate = false
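
The three relational configurations spell their TLS options differently (for example, PostgreSQL uses `require` where MySQL uses `required`), which is easy to get wrong. The sketch below checks those options against the values listed in the configuration comments above. It assumes the configuration has already been parsed into a plain dict; `ALLOWED_VALUES` and `validate_backend` are hypothetical names, not part of Horizon Epoch, and the listed values may not be exhaustive.

```python
# Accepted values copied from the configuration comments in this section.
ALLOWED_VALUES = {
    "postgresql": {"ssl_mode": {"disable", "prefer", "require"}},
    "mysql": {
        "ssl_mode": {
            "disabled", "preferred", "required", "verify_ca", "verify_identity",
        }
    },
    "mssql": {"encrypt": {"off", "on", "required"}},
}


def validate_backend(backend: str, settings: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    if backend not in ALLOWED_VALUES:
        return [f"unknown backend: {backend}"]
    problems = []
    for option, allowed in ALLOWED_VALUES[backend].items():
        value = settings.get(option)
        # Options that are absent are left to the adapter's defaults.
        if value is not None and value not in allowed:
            problems.append(
                f"{backend}.{option}={value!r} is not one of {sorted(allowed)}"
            )
    return problems


if __name__ == "__main__":
    # "required" is valid for MySQL but not for PostgreSQL.
    print(validate_backend("postgresql", {"ssl_mode": "required"}))
```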

SQLite Adapter

Capabilities:

  • File-based and in-memory databases
  • Schema introspection via PRAGMA
  • Partial constraint support (foreign keys require PRAGMA foreign_keys = ON)
  • WAL journal mode for concurrency
  • Connection pooling with health checks

Configuration:

[storage.sqlite.local]
path = "/path/to/database.db"
# Or use in-memory:
# path = ":memory:"
journal_mode = "wal"
foreign_keys = true
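
The effect of the `foreign_keys` and `journal_mode` settings can be seen directly with Python's standard `sqlite3` module. This sketch talks to SQLite itself, not to the Horizon Epoch adapter, and the table and function names are made up for the example.

```python
import os
import sqlite3
import tempfile


def orphan_insert_allowed(enforce_foreign_keys: bool) -> bool:
    """Return True if SQLite accepts a child row whose parent does not exist."""
    conn = sqlite3.connect(":memory:")
    try:
        # The pragma is per-connection and must be set outside a transaction.
        flag = "ON" if enforce_foreign_keys else "OFF"
        conn.execute(f"PRAGMA foreign_keys = {flag}")
        conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE child ("
            "id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id))"
        )
        try:
            conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 999)")
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()


def journal_mode_after_enabling_wal(path: str) -> str:
    """Switch a file-based database to WAL and return the resulting mode."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("orphan accepted without enforcement:", orphan_insert_allowed(False))
    print("orphan accepted with enforcement:", orphan_insert_allowed(True))
    with tempfile.TemporaryDirectory() as tmp:
        print(journal_mode_after_enabling_wal(os.path.join(tmp, "demo.db")))
```

WAL applies to file-based databases; an in-memory database keeps its own journal mode, so the WAL check uses a temporary file.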

Object Storage Adapters

All object storage adapters use the Delta Lake format to provide transactional semantics.

S3/Delta Lake Adapter

Capabilities:

  • Delta Lake protocol support
  • Parquet file format
  • Time travel via Delta versioning
  • Schema evolution
  • Efficient columnar storage
  • AWS S3 and S3-compatible (MinIO, etc.)

Configuration:

[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
endpoint = "https://s3.amazonaws.com"  # Optional, for S3-compatible
delta_log_retention_days = 30

Azure Blob Storage Adapter

Capabilities:

  • Delta Lake format on Azure Blob
  • Multiple authentication methods:
    • Account key
    • SAS token
    • Service principal
    • Managed identity
  • Copy-on-write semantics

Configuration:

[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key"  # account_key, sas_token, service_principal, managed_identity

Google Cloud Storage Adapter

Capabilities:

  • Delta Lake format on GCS
  • Authentication:
    • Service account key
    • Application Default Credentials (ADC)
  • Copy-on-write semantics

Configuration:

[storage.gcs.datalake]
bucket = "my-gcs-bucket"
prefix = "horizon-epoch/"
project_id = "my-project"
auth_method = "service_account"  # service_account, adc

Local Filesystem Adapter

Capabilities:

  • Delta Lake format on local disk
  • SQL query execution via DataFusion
  • Ideal for development and testing
  • Copy-on-write semantics

Configuration:

[storage.local.dev]
path = "/data/horizon-epoch"

Constraint Support Levels

Level          Description                      Backends
Full           Constraints enforced via DDL     PostgreSQL, MySQL, MSSQL
Partial        Some constraints require config  SQLite
Metadata Only  Stored but not enforced          S3, Azure, GCS, LocalFS

For object storage backends, constraints are versioned in metadata but not enforced by the storage layer. Enforcement happens at the application layer or when the data is promoted to a relational database.
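
As an illustration of application-layer enforcement, the following sketch checks a batch of records against a unique constraint before it is written to a metadata-only backend. It is an assumption about how such a check might look, not Horizon Epoch's implementation; `find_unique_violations` and the dict-based records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def find_unique_violations(
    records: Sequence[dict], columns: Sequence[str]
) -> Dict[Tuple, List[int]]:
    """Map each duplicated key tuple to the indexes of the records sharing it."""
    seen: Dict[Tuple, List[int]] = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record.get(column) for column in columns)
        # Many SQL databases treat NULLs as distinct under UNIQUE, so rows
        # with a missing (None) key part are skipped here.
        if any(part is None for part in key):
            continue
        seen[key].append(index)
    return {key: idxs for key, idxs in seen.items() if len(idxs) > 1}


if __name__ == "__main__":
    rows = [{"email": "a@x"}, {"email": "b@x"}, {"email": "a@x"}]
    print(find_unique_violations(rows, ["email"]))
```

A caller would reject or repair the batch when the returned mapping is non-empty. The check only covers the records it is given, so uniqueness against data already in the table needs a separate lookup.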


Adapter Selection

Horizon Epoch selects an adapter based on the URL scheme of each table's storage location:

# PostgreSQL table
client.track_table("users", StorageLocation.postgresql("main", "public", "users"))

# MySQL table
client.track_table("orders", StorageLocation.mysql("main", "mydb", "orders"))

# S3 Delta table
client.track_table("events", StorageLocation.s3("bucket", "delta/events"))

# Local filesystem
client.track_table("test_data", StorageLocation.local("/data/test"))
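
Scheme-based selection can be pictured as a lookup from URL scheme to adapter. The sketch below uses only the standard library. The `postgresql`, `mysql`, and `mssql` schemes come from the configuration examples in this document, while the `sqlite`, `s3`, and `file` schemes, the mapping itself, and `select_adapter` are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical scheme -> adapter mapping; the real registry may differ.
ADAPTERS_BY_SCHEME = {
    "postgresql": "PostgreSQL",
    "mysql": "MySQL",
    "mssql": "MSSQL",
    "sqlite": "SQLite",
    "s3": "S3/Delta",
    "file": "LocalFS",
}


def select_adapter(location_url: str) -> str:
    """Return the adapter name registered for the URL's scheme."""
    scheme = urlparse(location_url).scheme.lower()
    try:
        return ADAPTERS_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"no adapter registered for scheme {scheme!r}") from None


if __name__ == "__main__":
    print(select_adapter("postgresql://user:pass@host/database"))
    print(select_adapter("s3://bucket/delta/events"))
```

A location without a scheme, such as a bare filesystem path, raises `ValueError` in this sketch instead of silently falling back to a default adapter.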

Performance Characteristics

Operation      PostgreSQL  MySQL    MSSQL    SQLite   S3/Delta   Azure/GCS
Point read     Fast        Fast     Fast     Fast     Medium     Medium
Range scan     Fast        Fast     Fast     Fast     Fast       Fast
Write single   Fast        Fast     Fast     Fast     Slow       Slow
Write batch    Fast        Fast     Fast     Medium   Fast       Fast
Branch create  Instant     Instant  Instant  Instant  Instant    Instant
Full scan      Depends     Depends  Depends  Fast     Very Fast  Very Fast
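
Branch creation can be instant on every backend when no data is copied at creation time. One general technique that gives this behavior is a copy-on-write overlay, sketched below. This is an assumption about the technique, not a description of Horizon Epoch internals, and the `Branch` class is hypothetical.

```python
class Branch:
    """Copy-on-write view: reads fall through to the parent unless overridden."""

    _DELETED = object()  # tombstone marking a record deleted on this branch

    def __init__(self, parent=None):
        self.parent = parent
        self.overlay = {}  # holds only the records changed on this branch

    def create_branch(self):
        # Constant time: nothing is copied, the child starts with an empty overlay.
        return Branch(parent=self)

    def write(self, key, record):
        self.overlay[key] = record

    def delete(self, key):
        self.overlay[key] = Branch._DELETED

    def read(self, key):
        # Walk from this branch up through its ancestors; the nearest entry wins.
        branch = self
        while branch is not None:
            if key in branch.overlay:
                value = branch.overlay[key]
                return None if value is Branch._DELETED else value
            branch = branch.parent
        return None
```

In a design like this, reads get slower as overlays grow and parent chains deepen, which is the kind of cost that compaction or materializing a branch is meant to remove.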

Best Practices

Relational Databases (PostgreSQL, MySQL, MSSQL)

  1. Ensure proper indexes on primary keys and commonly queried columns
  2. Use connection pooling for high-throughput workloads
  3. Configure appropriate timeouts for long-running operations
  4. Monitor overlay table size and compact periodically

SQLite

  1. Use WAL mode for better concurrency (journal_mode = "wal")
  2. Enable foreign keys explicitly if needed (foreign_keys = true)
  3. Use a file-based database for persistence and an in-memory database for testing

Object Storage (S3, Azure, GCS, Local)

  1. Use appropriate file sizes (target 100 MB to 1 GB per file)
  2. Partition large tables for efficient queries
  3. Enable predicate pushdown for filtered scans
  4. Run OPTIMIZE periodically for better read performance
  5. Configure retention for Delta log cleanup

Troubleshooting

Connection Issues

PostgreSQL/MySQL/MSSQL:

  • Verify network connectivity and firewall rules
  • Check SSL/TLS configuration matches server requirements
  • Ensure connection pool size is appropriate for workload

Object Storage:

  • Verify credentials and permissions
  • Check bucket/container exists and is accessible
  • Ensure network allows access to storage endpoints

Performance Issues

Slow Queries:

  • Add indexes for overlay joins (relational)
  • Analyze query plans
  • Consider materializing branches for read-heavy workloads

Slow Writes:

  • Batch writes together (especially for object storage)
  • Check network latency to storage
  • Use regional endpoints for cloud storage
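
Batching writes, as suggested above, can be as simple as grouping records before each write call. A minimal sketch follows; `batched` is a hypothetical helper and the right batch size depends on the backend and record size.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size records, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch
```

Each yielded list would then go to a single write call instead of one call per record, which matters most on object storage, where single writes are slow and batch writes are fast.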