Storage Adapters
Storage adapters enable Horizon Epoch to work with different data storage systems while providing a unified interface for version control operations.
Supported Backends
Horizon Epoch supports 8 storage backends:
| Backend | Status | Constraint Support | Use Case |
|---|---|---|---|
| PostgreSQL | Production | Full | Transactional databases |
| MySQL | Production | Full | Transactional databases |
| Microsoft SQL Server | Production | Full | Enterprise databases |
| SQLite | Production | Partial | Local/embedded databases |
| S3/Delta Lake | Production | Metadata Only | Data lakes, analytics |
| Azure Blob Storage | Production | Metadata Only | Cloud data lakes |
| Google Cloud Storage | Production | Metadata Only | Cloud data lakes |
| Local Filesystem | Production | Metadata Only | Development, testing |
Adapter Architecture
┌─────────────────────────────────────────────────────────┐
│                Storage Abstraction Layer                │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │                StorageAdapter Trait                 │ │
│ │  - read_records()        - write_records()          │ │
│ │  - delete_records()      - get_schema()             │ │
│ │  - list_tables()         - scan_table()             │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
      │              │              │              │
      ▼              ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ PostgreSQL │ │   MySQL    │ │   MSSQL    │ │   SQLite   │
│  Adapter   │ │  Adapter   │ │  Adapter   │ │  Adapter   │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
      │              │              │              │
      ▼              ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│  S3/Delta  │ │   Azure    │ │    GCS     │ │  LocalFS   │
│  Adapter   │ │  Adapter   │ │  Adapter   │ │  Adapter   │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
Common Interface
All storage adapters implement these core operations:
Record Operations
// Read specific records by primary key
fn read_records(
    &self,
    table: &str,
    keys: &[RecordKey],
    branch_context: &BranchContext,
) -> Result<Vec<Record>>;

// Write records (insert or update)
fn write_records(
    &self,
    table: &str,
    records: &[Record],
    branch_context: &BranchContext,
) -> Result<WriteResult>;

// Delete records by primary key
fn delete_records(
    &self,
    table: &str,
    keys: &[RecordKey],
    branch_context: &BranchContext,
) -> Result<DeleteResult>;
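To make the insert-or-update and key-lookup behaviour of these operations concrete, here is a minimal Python sketch of an in-memory adapter. It is not the Horizon Epoch implementation: `InMemoryAdapter`, the `key_field` parameter, and the dictionary-shaped results are hypothetical stand-ins for `RecordKey`, `WriteResult`, and `DeleteResult`, and branch context is left out entirely. Skipping missing keys on read is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryAdapter:
    """Toy adapter for illustration only (hypothetical, not the real API)."""

    # table name -> {primary key -> record dict}
    tables: dict = field(default_factory=dict)

    def write_records(self, table, records, key_field="id"):
        """Upsert: insert unseen keys, overwrite existing ones."""
        rows = self.tables.setdefault(table, {})
        inserted = updated = 0
        for record in records:
            key = record[key_field]
            if key in rows:
                updated += 1
            else:
                inserted += 1
            rows[key] = dict(record)  # copy so callers cannot mutate stored state
        return {"inserted": inserted, "updated": updated}

    def read_records(self, table, keys):
        """Return records for the requested keys; missing keys are skipped."""
        rows = self.tables.get(table, {})
        return [dict(rows[key]) for key in keys if key in rows]

    def delete_records(self, table, keys):
        """Delete by primary key and report how many records existed."""
        rows = self.tables.get(table, {})
        deleted = sum(1 for key in keys if rows.pop(key, None) is not None)
        return {"deleted": deleted}
```

Writing the same key twice counts once as an insert and once as an update, which is the behaviour the "insert or update" comment on `write_records` suggests.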
Schema Operations
// Get table schema
fn get_schema(&self, table: &str) -> Result<Schema>;

// Detect schema changes
fn compare_schema(
    &self,
    table: &str,
    old_schema: &Schema,
) -> Result<SchemaChanges>;
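The source does not show what `SchemaChanges` contains, so the following Python sketch only illustrates the general idea of schema comparison. It assumes a schema can be modelled as a mapping from column name to type name; `diff_schemas` and the result keys are hypothetical.

```python
def diff_schemas(old_schema, new_schema):
    """Compare two {column_name: type_name} mappings (assumed schema shape)."""
    added = {col: typ for col, typ in new_schema.items() if col not in old_schema}
    removed = {col: typ for col, typ in old_schema.items() if col not in new_schema}
    type_changed = {
        col: (old_schema[col], new_schema[col])
        for col in old_schema
        if col in new_schema and old_schema[col] != new_schema[col]
    }
    return {"added": added, "removed": removed, "type_changed": type_changed}


old = {"id": "int", "name": "text", "age": "int"}
new = {"id": "bigint", "name": "text", "email": "text"}
changes = diff_schemas(old, new)
```

Here `changes` reports `email` as added, `age` as removed, and `id` as changed from `int` to `bigint`. A real adapter would likely also track nullability, defaults, and constraints.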
Table Operations
// List available tables
fn list_tables(&self) -> Result<Vec<TableInfo>>;

// Scan entire table (for initial import or full sync)
fn scan_table(
    &self,
    table: &str,
    options: ScanOptions,
) -> Result<RecordStream>;

// Get table statistics
fn get_table_stats(&self, table: &str) -> Result<TableStats>;
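`scan_table` returns a stream rather than a fully materialized list, which matters for initial imports of large tables. The Python sketch below shows one way such a stream could be consumed in fixed-size batches. `scan_in_batches` and `batch_size` are hypothetical names; the real `ScanOptions` and `RecordStream` types are not described in the source.

```python
def scan_in_batches(rows, batch_size=1000):
    """Lazily yield lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        # Generators run lazily, so this is raised on the first iteration.
        raise ValueError("batch_size must be at least 1")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so yielded batches are not reused
    if batch:
        yield batch  # final partial batch


batches = list(scan_in_batches(range(5), batch_size=2))
```

Only one batch is held in memory at a time, so memory use stays bounded regardless of table size.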
Relational Database Adapters
PostgreSQL Adapter
Capabilities:
- Direct SQL execution
- ACID transactions
- Full schema introspection
- Constraint enforcement (foreign keys, unique, check, exclusion)
- Connection pooling
- TLS/SSL secure connections
Configuration:
[storage.postgresql.mydb]
url = "postgresql://user:pass@host/database"
pool_size = 10
connect_timeout = 30
statement_timeout = 60000 # 60 seconds
ssl_mode = "require" # disable, prefer, require
MySQL Adapter
Capabilities:
- Full CRUD operations
- Schema introspection via information_schema
- Full constraint support
- 5 SSL/TLS modes
- SSH tunnel support
- Connection pooling
Configuration:
[storage.mysql.mydb]
url = "mysql://user:pass@host/database"
pool_size = 10
ssl_mode = "required" # disabled, preferred, required, verify_ca, verify_identity
Microsoft SQL Server Adapter
Capabilities:
- Full CRUD operations
- Schema introspection via the sys.* catalog
- Full constraint support
- TLS/SSL encryption (3 levels)
- Windows and SQL Server authentication
Configuration:
[storage.mssql.mydb]
url = "mssql://user:pass@host/database"
pool_size = 10
encrypt = "required" # off, on, required
trust_server_certificate = false
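The three relational adapters spell their TLS settings differently: PostgreSQL uses `ssl_mode = "require"`, MySQL uses `ssl_mode = "required"`, and SQL Server uses `encrypt`. The Python sketch below checks a configuration against the values listed in the comments of the configurations above. `check_tls_setting` is a hypothetical helper, not part of Horizon Epoch, and it makes no assumption about what the default is when a setting is omitted.

```python
# Allowed values copied from the configuration comments in this document.
ALLOWED_TLS_VALUES = {
    "postgresql": ("ssl_mode", {"disable", "prefer", "require"}),
    "mysql": (
        "ssl_mode",
        {"disabled", "preferred", "required", "verify_ca", "verify_identity"},
    ),
    "mssql": ("encrypt", {"off", "on", "required"}),
}


def check_tls_setting(backend, config):
    """Return the TLS value if valid, None if omitted, else raise ValueError."""
    key, allowed = ALLOWED_TLS_VALUES[backend]
    value = config.get(key)
    if value is None:
        return None  # setting omitted; the adapter's own default applies
    if value not in allowed:
        raise ValueError(
            f"{backend}: {key}={value!r} is not one of {sorted(allowed)}"
        )
    return value
```

For example, passing `{"ssl_mode": "required"}` for the `postgresql` backend raises `ValueError`, because the PostgreSQL spelling is `require`.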
SQLite Adapter
Capabilities:
- File-based and in-memory databases
- Schema introspection via PRAGMA
- Partial constraint support (foreign keys require PRAGMA foreign_keys = ON)
- WAL journal mode for concurrency
- Connection pooling with health checks
Configuration:
[storage.sqlite.local]
path = "/path/to/database.db"
# Or use in-memory:
# path = ":memory:"
journal_mode = "wal"
foreign_keys = true
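The effect of the `foreign_keys` setting can be observed directly in SQLite. The sketch below uses Python's built-in `sqlite3` module, independent of Horizon Epoch, to show that an insert referencing a missing parent row is accepted when foreign key enforcement is off and rejected when it is on. The table names and the `orphan_insert_allowed` helper are made up for the illustration.

```python
import sqlite3


def orphan_insert_allowed(enforce_foreign_keys):
    """Try to insert a child row whose parent does not exist."""
    conn = sqlite3.connect(":memory:")
    try:
        # The pragma must be set per connection, outside any transaction.
        mode = "ON" if enforce_foreign_keys else "OFF"
        conn.execute(f"PRAGMA foreign_keys = {mode}")
        conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE child ("
            "id INTEGER PRIMARY KEY, "
            "parent_id INTEGER REFERENCES parent(id))"
        )
        try:
            conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 42)")
        except sqlite3.IntegrityError:
            return False  # SQLite rejected the orphan row
        return True  # the orphan row was accepted
    finally:
        conn.close()
```

This is why the constraint support for SQLite is described as partial: the foreign key declaration is always stored, but it is enforced only on connections that enable the pragma.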
Object Storage Adapters
All object storage adapters use Delta Lake format for transactional semantics.
S3/Delta Lake Adapter
Capabilities:
- Delta Lake protocol support
- Parquet file format
- Time travel via Delta versioning
- Schema evolution
- Efficient columnar storage
- AWS S3 and S3-compatible (MinIO, etc.)
Configuration:
[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
endpoint = "https://s3.amazonaws.com" # Optional, for S3-compatible
delta_log_retention_days = 30
Azure Blob Storage Adapter
Capabilities:
- Delta Lake format on Azure Blob
- Multiple authentication methods:
- Account key
- SAS token
- Service principal
- Managed identity
- Copy-on-write semantics
Configuration:
[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key" # account_key, sas_token, service_principal, managed_identity
Google Cloud Storage Adapter
Capabilities:
- Delta Lake format on GCS
- Authentication:
- Service account key
- Application Default Credentials (ADC)
- Copy-on-write semantics
Configuration:
[storage.gcs.datalake]
bucket = "my-gcs-bucket"
prefix = "horizon-epoch/"
project_id = "my-project"
auth_method = "service_account" # service_account, adc
Local Filesystem Adapter
Capabilities:
- Delta Lake format on local disk
- SQL query execution via DataFusion
- Ideal for development and testing
- Copy-on-write semantics
Configuration:
[storage.local.dev]
path = "/data/horizon-epoch"
Constraint Support Levels
| Level | Description | Backends |
|---|---|---|
| Full | Constraints enforced via DDL | PostgreSQL, MySQL, MSSQL |
| Partial | Some constraints require config | SQLite |
| Metadata Only | Stored but not enforced | S3, Azure, GCS, LocalFS |
For object storage backends, constraints are versioned in metadata but enforcement happens at the application layer or when promoting to a relational database.
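As an illustration of application-layer enforcement, the Python sketch below encodes the support levels from the table and runs a uniqueness check in application code. The backend keys, `needs_application_checks`, and `unique_violations` are hypothetical names chosen for the example; Horizon Epoch's own validation hooks are not described in this document.

```python
from collections import Counter

# Support levels taken from the table above; the keys are illustrative.
CONSTRAINT_SUPPORT = {
    "postgresql": "full",
    "mysql": "full",
    "mssql": "full",
    "sqlite": "partial",
    "s3": "metadata_only",
    "azure": "metadata_only",
    "gcs": "metadata_only",
    "local": "metadata_only",
}


def needs_application_checks(backend):
    """True when the backend stores constraints without enforcing them."""
    return CONSTRAINT_SUPPORT[backend] == "metadata_only"


def unique_violations(records, column):
    """Return the values that appear more than once in the given column."""
    counts = Counter(record[column] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
]
duplicates = unique_violations(rows, "email") if needs_application_checks("s3") else []
```

A check like this would run before a batch is written to object storage, since the storage layer itself will not reject the duplicate.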
Adapter Selection
Horizon Epoch selects an adapter based on the URL scheme of each table's storage location:
# PostgreSQL table
client.track_table("users", StorageLocation.postgresql("main", "public", "users"))
# MySQL table
client.track_table("orders", StorageLocation.mysql("main", "mydb", "orders"))
# S3 Delta table
client.track_table("events", StorageLocation.s3("bucket", "delta/events"))
# Local filesystem
client.track_table("test_data", StorageLocation.local("/data/test"))
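Scheme-based selection can be pictured as a lookup on the parsed URL. The Python sketch below uses the standard library's `urlsplit`. The `postgresql`, `mysql`, and `mssql` schemes appear in the configuration examples in this document; the `s3` and `file` schemes, the registry, and `select_adapter` are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Scheme -> adapter label. Only the first three schemes appear in this document.
ADAPTER_BY_SCHEME = {
    "postgresql": "PostgreSQL adapter",
    "mysql": "MySQL adapter",
    "mssql": "Microsoft SQL Server adapter",
    "s3": "S3/Delta Lake adapter",       # assumed scheme
    "file": "Local filesystem adapter",  # assumed scheme
}


def select_adapter(location_url):
    """Pick an adapter label from the URL scheme, or raise ValueError."""
    scheme = urlsplit(location_url).scheme
    try:
        return ADAPTER_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"no adapter registered for scheme {scheme!r}") from None
```

For example, `select_adapter("postgresql://user:pass@host/database")` resolves to the PostgreSQL entry, while an unregistered scheme fails early instead of falling back silently.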
Performance Characteristics
| Operation | PostgreSQL | MySQL | MSSQL | SQLite | S3/Delta | Azure/GCS |
|---|---|---|---|---|---|---|
| Point read | Fast | Fast | Fast | Fast | Medium | Medium |
| Range scan | Fast | Fast | Fast | Fast | Fast | Fast |
| Write single | Fast | Fast | Fast | Fast | Slow | Slow |
| Write batch | Fast | Fast | Fast | Medium | Fast | Fast |
| Branch create | Instant | Instant | Instant | Instant | Instant | Instant |
| Full scan | Depends | Depends | Depends | Fast | Very Fast | Very Fast |
Best Practices
Relational Databases (PostgreSQL, MySQL, MSSQL)
- Ensure proper indexes on primary keys and commonly queried columns
- Use connection pooling for high-throughput workloads
- Configure appropriate timeouts for long-running operations
- Monitor overlay table size and compact periodically
SQLite
- Use WAL mode for better concurrency (journal_mode = "wal")
- Enable foreign keys explicitly if needed (foreign_keys = true)
- Use file-based databases for persistence and in-memory databases for testing
Object Storage (S3, Azure, GCS, Local)
- Keep data files reasonably large (target 100 MB to 1 GB per file)
- Partition large tables for efficient queries
- Enable predicate pushdown for filtered scans
- Run OPTIMIZE periodically for better read performance
- Configure retention for Delta log cleanup
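To illustrate why periodic compaction helps, the Python sketch below greedily groups small files into merge candidates that stay under a target size. This is a simplified stand-in and not Delta Lake's actual OPTIMIZE algorithm; `plan_compaction` and its parameters are hypothetical.

```python
def plan_compaction(file_sizes, target_size):
    """Group files smaller than target_size into bins of at most target_size.

    Sizes can be in any unit as long as it is consistent. Bins containing a
    single file are dropped because rewriting one file gains nothing.
    """
    bins = []
    current, current_size = [], 0
    for size in sorted(s for s in file_sizes if s < target_size):
        if current and current_size + size > target_size:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return [group for group in bins if len(group) > 1]


# Sizes in MB with a 100 MB target: the 150 MB file is already large enough.
plan = plan_compaction([10, 20, 30, 60, 150], target_size=100)
```

Here the 10, 20, and 30 MB files form one merge group, while the 60 MB file is left alone because no remaining partner fits under the target.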
Troubleshooting
Connection Issues
PostgreSQL/MySQL/MSSQL:
- Verify network connectivity and firewall rules
- Check SSL/TLS configuration matches server requirements
- Ensure connection pool size is appropriate for workload
Object Storage:
- Verify credentials and permissions
- Check bucket/container exists and is accessible
- Ensure network allows access to storage endpoints
Performance Issues
Slow Queries:
- Add indexes for overlay joins (relational)
- Analyze query plans
- Consider materializing branches for read-heavy workloads
Slow Writes:
- Batch writes together (especially for object storage)
- Check network latency to storage
- Use regional endpoints for cloud storage
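Batching matters most for object storage, where the performance table rates single writes as slow and batch writes as fast. The Python sketch below buffers records and hands them to a flush callback in groups. `BufferedWriter`, `flush_fn`, and `batch_size` are hypothetical names; in practice the callback would wrap whatever write call the client exposes.

```python
class BufferedWriter:
    """Collect records and flush them in batches (illustrative sketch)."""

    def __init__(self, flush_fn, batch_size=500):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_fn = flush_fn
        self._batch_size = batch_size
        self._pending = []

    def write(self, record):
        """Queue one record and flush automatically when the batch is full."""
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send any queued records; call once more when writing is finished."""
        if self._pending:
            self._flush_fn(self._pending)
            self._pending = []  # new list, so the flushed batch is not mutated


sent = []
writer = BufferedWriter(sent.append, batch_size=3)
for i in range(7):
    writer.write(i)
writer.flush()
```

Seven records result in three calls to the callback instead of seven, which is the reduction in round trips that the batching advice is aiming for.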