Multi-Backend Architecture
Horizon Epoch is designed to manage data across multiple, heterogeneous storage backends through a unified interface.
Design Philosophy
Separation of Concerns
┌─────────────────────────────────────────────────────────────┐
│                        Horizon Epoch                        │
│                                                             │
│ ┌────────────────────────────────────────────────────────┐  │
│ │                     Metadata Layer                     │  │
│ │  - Commits, branches, tags                             │  │
│ │  - Table registrations                                 │  │
│ │  - Change tracking indices                             │  │
│ │  - Version graph                                       │  │
│ └────────────────────────────────────────────────────────┘  │
│                             │                               │
│                             │ (references, not data)        │
│                             ▼                               │
│ ┌────────────────────────────────────────────────────────┐  │
│ │                      Storage Layer                     │  │
│ │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │  │
│ │  │PostgreSQL│ │  MySQL   │ │SQL Server│ │  SQLite  │  │  │
│ │  └──────────┘ └──────────┘ └──────────┘ └──────────┘  │  │
│ │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │  │
│ │  │  AWS S3  │ │  Azure   │ │   GCS    │ │  Local   │  │  │
│ │  └──────────┘ └──────────┘ └──────────┘ └──────────┘  │  │
│ └────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Key insight: The metadata layer stores what changed and when, while storage adapters handle where the data physically lives.
Benefits
- No Data Migration - Keep data where it is
- Best Tool for the Job - Use PostgreSQL for transactional workloads, S3 for analytics
- Gradual Adoption - Add version control to existing infrastructure
- Unified Operations - Same commands work across all backends
Architecture Components
Storage Registry
Central registry of all configured backends:
config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
    "prod_users",                   # Logical name
    "postgresql://prod/users"
).add_postgres(
    "prod_orders",
    "postgresql://prod/orders"
).add_s3(
    "analytics",
    bucket="company-analytics"
)
Storage Location
Each table has a storage location that identifies:
- Protocol - Which adapter to use
- Backend Name - Which configured backend
- Path - Table identifier within the backend
postgresql://prod_users/public.users
│            │          │
protocol     backend    table path
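For illustration, a location string decomposes cleanly with a standard URL parser. This parse_location helper is a hypothetical sketch, not part of the Horizon Epoch API:

from urllib.parse import urlparse

# Hypothetical helper: split a storage location into its three parts.
def parse_location(location: str) -> tuple[str, str, str]:
    parsed = urlparse(location)
    protocol = parsed.scheme        # which adapter to use
    backend = parsed.netloc         # logical backend name from the config
    path = parsed.path.lstrip("/")  # table identifier within the backend
    return protocol, backend, path

assert parse_location("postgresql://prod_users/public.users") == (
    "postgresql", "prod_users", "public.users"
)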
Table Registration
Tables are registered with their location:
# PostgreSQL table
client.register_table("users", "postgresql://prod_users/public.users")
# S3 Delta table
client.register_table("events", "s3://analytics/delta/events")
Metadata References
Metadata stores references to data, not the data itself:
-- In metadata database
SELECT * FROM epoch_tables;
┌──────────┬────────────────────────────────────────┐
│ name     │ location                               │
├──────────┼────────────────────────────────────────┤
│ users    │ postgresql://prod_users/public.users   │
│ orders   │ postgresql://prod_orders/public.orders │
│ events   │ s3://analytics/delta/events            │
└──────────┴────────────────────────────────────────┘
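Because registrations are plain rows in the metadata database, you can inspect them directly. A minimal sketch, assuming the psycopg 3 driver and the metadata URL from the configuration example above:

import psycopg

# Read table registrations straight from the metadata database.
# Table and column names follow the SQL example above.
with psycopg.connect("postgresql://localhost/horizon_epoch") as conn:
    rows = conn.execute("SELECT name, location FROM epoch_tables").fetchall()

for name, location in rows:
    print(f"{name} -> {location}")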
Cross-Backend Operations
Branching
Branches span all registered tables:
# Creates a branch that covers:
# - users (PostgreSQL)
# - orders (PostgreSQL)
# - events (S3)
client.create_branch("feature/new-reporting")
Each backend maintains its own overlay:
- PostgreSQL: Overlay tables
- S3: Separate Delta log
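Purely as an illustration of the overlay idea (overlay naming is an internal detail of each adapter, not a public contract), a branch's writes might be isolated like this:

# Illustrative only: where a branch's writes might live per backend.
branch = "feature/new-reporting"
safe = branch.replace("/", "_")

overlay_for = {
    # PostgreSQL: an overlay table shadows the original until merge
    "postgresql://prod_users/public.users": f"epoch_overlay.users__{safe}",
    # S3/Delta: a separate Delta log tracks the branch's versions
    "s3://analytics/delta/events": f"delta/events/_branches/{safe}/_delta_log",
}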
Committing
Commits can include changes from multiple backends:
# Changes to users (PostgreSQL) and events (S3)
# are captured in a single commit
client.commit(message="Update user events schema")
The commit metadata tracks which backends have changes:
{
  "commit_id": "abc123",
  "message": "Update user events schema",
  "changes": {
    "postgresql://prod_users": ["users"],
    "s3://analytics": ["events"]
  }
}
Diffing
Diff operations aggregate across backends:
diff = client.diff("main", "feature/branch")
# Returns changes from all backends
for table_diff in diff.table_diffs:
    print(f"{table_diff.location}: {table_diff.status}")
Merging
Merges are coordinated across backends:
1. Compute changes per backend
2. Detect conflicts per backend
3. Apply changes per backend (in a transaction where supported)
4. Create a unified merge commit
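A sketch of those four steps, with hypothetical per-backend helpers (compute_changes, detect_conflicts, apply_changes) standing in for adapter internals; Horizon Epoch's real merge driver is not exposed as an API:

class MergeConflictError(Exception):
    """Raised when any backend reports conflicting changes."""

def merge_branches(client, backends, source, target):
    # 1. Compute the change set each backend would have to apply.
    plans = {name: b.compute_changes(source, target)
             for name, b in backends.items()}
    # 2. Detect conflicts per backend before touching any data.
    conflicts = {name: backends[name].detect_conflicts(plan)
                 for name, plan in plans.items()}
    if any(conflicts.values()):
        raise MergeConflictError(conflicts)
    # 3. Apply changes per backend, transactionally where supported.
    for name, plan in plans.items():
        backends[name].apply_changes(plan)
    # 4. Record a single unified merge commit in the metadata layer.
    return client.commit(message=f"Merge {source} into {target}")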
Consistency Model
Within a Backend
Operations within a single backend use that backend’s consistency guarantees:
- PostgreSQL: ACID transactions
- S3/Delta: Serializable via Delta protocol
Across Backends
Cross-backend operations provide best-effort consistency:
┌─────────────┐         ┌─────────────┐
│ PostgreSQL  │         │     S3      │
│   commit    │         │   commit    │
└──────┬──────┘         └──────┬──────┘
       │                       │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │  Metadata   │
            │   Commit    │
            └─────────────┘
If one backend fails:
- The operation is marked as partial
- Rollback is attempted where possible
- User is notified of partial state
try:
    client.commit(message="Multi-backend update")
except PartialCommitError as e:
    print(f"Committed to: {e.successful_backends}")
    print(f"Failed on: {e.failed_backends}")
    # Manual intervention needed
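One possible recovery pattern is to retry with backoff and escalate if the partial state persists. This sketch assumes a retried commit only re-applies the backends that failed; verify that assumption holds for your backends before relying on it:

import logging
import time

logger = logging.getLogger("horizon_epoch.app")

def commit_with_retry(client, message, attempts=3):
    for attempt in range(1, attempts + 1):
        try:
            return client.commit(message=message)
        except PartialCommitError as e:
            logger.warning("Partial commit (attempt %d), failed on: %s",
                           attempt, e.failed_backends)
            if attempt == attempts:
                raise  # escalate for manual intervention
            time.sleep(2 ** attempt)  # back off before retrying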
Configuration Patterns
Separate Environments
# Development
[storage.postgres.dev_db]
url = "postgresql://localhost/dev"
# Staging
[storage.postgres.staging_db]
url = "postgresql://staging-db.internal/staging"
# Production
[storage.postgres.prod_db]
url = "postgresql://prod-db.internal/production"
aws_secret_id = "prod-db-credentials"
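With one logical name per environment, application code can stay identical and select the backend at startup. A sketch assuming an APP_ENV environment variable (an application convention, not a Horizon Epoch one):

import os

# Map the deployment environment to the logical backend names configured above.
env = os.environ.get("APP_ENV", "dev")
backend = {"dev": "dev_db", "staging": "staging_db", "prod": "prod_db"}[env]

client.register_table("users", f"postgresql://{backend}/public.users")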
Mixed Workloads
# Transactional data in PostgreSQL
[storage.postgres.transactional]
url = "postgresql://oltp-db/production"
# Analytics in S3
[storage.s3.analytics]
bucket = "company-analytics"
region = "us-east-1"
# Archive in Glacier
[storage.s3.archive]
bucket = "company-archive"
region = "us-east-1"
storage_class = "GLACIER_IR"
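Registrations then point each table at the backend suited to its workload, reusing the logical names from the config above (table names here are illustrative):

# Transactional tables live next to the OLTP database...
client.register_table("orders", "postgresql://transactional/public.orders")

# ...while append-heavy analytics data goes to S3 as Delta tables.
client.register_table("clickstream", "s3://analytics/delta/clickstream")

# Cold data can be registered against the archive backend.
client.register_table("orders_2019", "s3://archive/delta/orders_2019")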
Cross-Region
[storage.s3.us_data]
bucket = "data-us"
region = "us-east-1"
[storage.s3.eu_data]
bucket = "data-eu"
region = "eu-west-1"
Routing and Discovery
Explicit Routing
Specify backend when registering tables:
client.register_table("users", "postgresql://prod_users/public.users")
Pattern-Based Routing
Configure default routing patterns:
[routing]
# Tables starting with "raw_" go to S3
"raw_*" = "s3://analytics"
# Everything else goes to PostgreSQL
"*" = "postgresql://default"
Auto-Discovery
Discover tables from backends:
# List tables in a backend
tables = client.discover_tables("postgresql://prod_db")
# Register discovered tables
for table in tables:
    client.register_table(table.name, table.location)
Performance Considerations
Query Routing
Queries are routed to the appropriate backend:
# Routed to PostgreSQL
client.query("SELECT * FROM users")
# Routed to S3
client.query("SELECT * FROM events")
Cross-Backend Queries
Currently, joins across backends are not supported in a single query. Use application-level joining:
# Query each backend
users = client.query("SELECT * FROM users")
events = client.query("SELECT * FROM events WHERE user_id IN (...)")
# Join in the application (one concrete option is sketched below)
result = join(users, events, on="user_id")
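If client.query returns row dictionaries, the application-level join can be as simple as a pandas merge. A sketch under that assumption, continuing from the users and events results above:

import pandas as pd

# Convert each backend's result set to a DataFrame and join locally.
# Assumes query() yields rows the DataFrame constructor can consume.
users_df = pd.DataFrame(users)
events_df = pd.DataFrame(events)

result = users_df.merge(events_df, on="user_id", how="inner")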
Caching
Per-backend connection pooling and caching:
[storage.postgres.prod_db]
pool_size = 20
cache_schema = true
[storage.s3.analytics]
cache_metadata = true
cache_ttl = 300
Limitations
- No cross-backend transactions - ACID only within single backend
- No cross-backend joins - Query each backend separately
- Best-effort consistency - Cross-backend commits may be partially applied
- Network latency - Operations touch multiple backends
Best Practices
- Group related tables - Tables that are often queried together should be in the same backend
- Consider latency - Place backends close to where they’re accessed
- Plan for failures - Have recovery procedures for partial commits
- Monitor all backends - Track health and performance per backend
- Document routing - Make it clear which tables are where