Multi-Backend Setup
Configure Horizon Epoch to work with multiple storage backends simultaneously.
Overview
Horizon Epoch can manage data across different storage systems:
- Multiple PostgreSQL databases
- Multiple S3 buckets
- Mixed PostgreSQL + S3 environments
- Cross-region or cross-account configurations
Architecture
┌─────────────────────────────────────────────────┐
│             Horizon Epoch Metadata              │
│          (Single PostgreSQL Database)           │
└─────────────────────┬───────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
  ┌───────────┐ ┌───────────┐ ┌───────────┐
  │ PostgreSQL│ │ PostgreSQL│ │    S3     │
  │  (users)  │ │ (orders)  │ │ (events)  │
  └───────────┘ └───────────┘ └───────────┘
Configuration
Multiple PostgreSQL Databases
import asyncio

from horizon_epoch import Client, StorageBackend


async def setup_multi_backend():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.init("multi-backend-repo")

        # Add multiple PostgreSQL backends
        await client.add_storage(
            name="users_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://localhost/users"}
        )
        await client.add_storage(
            name="orders_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"}
        )
        await client.add_storage(
            name="analytics_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://analytics.internal/warehouse"}
        )


asyncio.run(setup_multi_backend())
Multiple S3 Buckets
# Add multiple S3 backends (run inside an open `async with Client.connect(...)` block)
await client.add_storage(
name="raw_data",
backend=StorageBackend.S3,
config={"bucket": "company-raw-data", "region": "us-east-1"}
)
await client.add_storage(
name="processed_data",
backend=StorageBackend.S3,
config={"bucket": "company-processed", "region": "us-west-2"}
)
await client.add_storage(
name="archive",
backend=StorageBackend.S3,
config={"bucket": "company-archive", "region": "eu-west-1"}
)
Mixed PostgreSQL + S3
# Add both PostgreSQL and S3 backends (run inside an open `async with Client.connect(...)` block)
await client.add_storage(
name="transactional",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/production"}
)
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={"bucket": "company-datalake"}
)
Registering Tables
Specify the storage backend when registering tables:
from horizon_epoch.client import _native
# PostgreSQL table
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)
# Different PostgreSQL database
loc = _native.StorageLocation.postgresql("orders_db", "public", "orders")
await client.track_table("orders", loc)
# S3 Delta table
loc = _native.StorageLocation.s3("datalake", "delta/events")
await client.track_table("events", loc)
CLI Registration
# PostgreSQL
epoch table add users --location "postgresql://users_db/public.users"
# S3
epoch table add events --location "s3://datalake/delta/events"
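The two location strings above appear to follow a `scheme://backend-name/path` shape. The sketch below shows one way such a string could be split into its parts. The interpretation is inferred only from the two examples; `ParsedLocation` and `parse_location` are hypothetical names, and the real CLI may parse locations differently.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ParsedLocation:
    kind: str     # "postgresql" or "s3"
    backend: str  # name of the registered storage backend
    path: str     # "schema.table" for PostgreSQL, key prefix for S3


def parse_location(location: str) -> ParsedLocation:
    """Split a --location string into scheme, backend name, and path."""
    parts = urlsplit(location)
    if parts.scheme not in ("postgresql", "s3"):
        raise ValueError(f"unsupported location scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError(f"location has no backend name: {location!r}")
    # Assumption: the host part is the backend name, not a network host.
    return ParsedLocation(parts.scheme, parts.netloc, parts.path.lstrip("/"))


print(parse_location("postgresql://users_db/public.users"))
print(parse_location("s3://datalake/delta/events"))
```

Read this way, the host position carries the name given to `add_storage`, so the same location string keeps working if the backend's connection details change.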
Configuration File
# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"
# PostgreSQL backends
[storage.postgres.users_db]
url = "postgresql://localhost/users"
[storage.postgres.orders_db]
url = "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"
aws_secret_id = "horizon-epoch/orders-db" # Credentials from Secrets Manager
[storage.postgres.analytics]
host = "analytics.internal"
database = "warehouse"
vault_path = "secret/data/analytics-db" # Credentials from Vault
# S3 backends
[storage.s3.raw]
bucket = "company-raw-data"
region = "us-east-1"
[storage.s3.processed]
bucket = "company-processed"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/DataAccess"
[storage.s3.archive]
bucket = "company-archive"
region = "eu-west-1"
endpoint = "https://s3.eu-west-1.amazonaws.com"
Cross-Storage Operations
Branching
Branches span all registered tables:
# Creates branch affecting all tables
epoch branch create feature/new-schema
Commits
Commits can include changes across backends:
# Commit changes from any backend
epoch commit -m "Update user preferences and events schema"
Diff
Compare changes across storage types:
epoch diff main feature/new-schema
Output:
PostgreSQL (users_db):
  users: 5 modified, 2 added
S3 (datalake):
  events: schema changed, 1000 added
Merge
Merges coordinate across all backends:
epoch merge feature/new-schema
Storage Routing
Default Backend
Set a default storage backend via configuration:
epoch config set default_storage users_db
Explicit Routing
Always specify the backend in the table location:
# Explicit backend reference via StorageLocation
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)
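The source does not spell out how the default backend and an explicit location interact. One plausible reading is that an explicit backend wins and the default is only a fallback. The sketch below illustrates that precedence; `resolve_backend` is a hypothetical helper, not part of the Horizon Epoch API.

```python
from typing import Optional


def resolve_backend(explicit: Optional[str], default: Optional[str], known: set[str]) -> str:
    """Pick the explicit backend if given, else the default; fail if unknown."""
    # Assumption: an explicit backend takes precedence over default_storage.
    name = explicit or default
    if name is None:
        raise ValueError("no backend given and no default_storage configured")
    if name not in known:
        raise LookupError(f"Storage backend '{name}' not found")
    return name


known = {"users_db", "orders_db", "datalake"}
print(resolve_backend(None, "users_db", known))        # falls back to the default
print(resolve_backend("datalake", "users_db", known))  # explicit backend wins
```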
Cross-Account S3
await client.add_storage(
name="partner_data",
backend=StorageBackend.S3,
config={
"bucket": "partner-shared-bucket",
"assume_role_arn": "arn:aws:iam::999888777666:role/HorizonEpochAccess",
"external_id": "partner-integration-id"
}
)
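The `assume_role_arn` and `external_id` keys correspond naturally to the `RoleArn` and `ExternalId` parameters of the AWS STS AssumeRole call. The sketch below shows that mapping as plain data. Whether Horizon Epoch performs the call this way internally is not stated in this guide; `assume_role_kwargs` and the session name are hypothetical.

```python
def assume_role_kwargs(config: dict, session_name: str = "horizon-epoch") -> dict:
    """Translate a storage config into keyword arguments for STS AssumeRole."""
    kwargs = {
        "RoleArn": config["assume_role_arn"],
        "RoleSessionName": session_name,  # hypothetical session name
    }
    # ExternalId is optional in STS; include it only when configured.
    if "external_id" in config:
        kwargs["ExternalId"] = config["external_id"]
    return kwargs


config = {
    "bucket": "partner-shared-bucket",
    "assume_role_arn": "arn:aws:iam::999888777666:role/HorizonEpochAccess",
    "external_id": "partner-integration-id",
}
print(assume_role_kwargs(config))
# With boto3 installed and credentials configured, these could be passed on:
#   boto3.client("sts").assume_role(**assume_role_kwargs(config))
```

When debugging an access-denied error on a cross-account bucket, running the equivalent AssumeRole call by hand is a useful way to confirm that the partner account's trust policy accepts the role and external ID.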
Best Practices
1. Separate Metadata from Data
The metadata database should be separate from data storage:
# Connect to dedicated metadata database
async with Client.connect("postgresql://metadata.internal/horizon_epoch") as client:
    # Add production data storage (different server)
    await client.add_storage(
        name="production",
        backend=StorageBackend.POSTGRESQL,
        config={"url": "postgresql://data.internal/production"}
    )
2. Use Consistent Naming
# Good: Clear, consistent names
await client.add_storage("prod_users", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_orders", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_events", StorageBackend.S3, {...})
# Avoid: Inconsistent or unclear names like "db1" or "bucket"
3. Document Backend Purpose
# epoch.toml
# Users database - primary transactional storage
[storage.postgres.users]
url = "postgresql://..."
# Orders database - order processing system
[storage.postgres.orders]
url = "postgresql://..."
# Event lake - analytics events from all services
[storage.s3.events]
bucket = "company-events"
4. Handle Partial Failures
An operation that spans multiple backends can succeed on some backends and fail on others:
try:
    result = await client.commit(message="Multi-backend update")
except PartialCommitError as e:
    print(f"Committed to: {e.successful_backends}")
    print(f"Failed on: {e.failed_backends}")
    # Handle recovery
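To make the failure mode concrete, the sketch below simulates a best-effort commit across several backends and collects which ones succeeded. It runs without Horizon Epoch installed. The `PartialCommitError` class here is a hypothetical stand-in that only mirrors the two attributes used above, and `commit_all` is an illustration, not the library's implementation.

```python
class PartialCommitError(Exception):
    """Hypothetical stand-in mirroring the attributes used above."""

    def __init__(self, successful_backends, failed_backends):
        super().__init__(f"commit failed on: {sorted(failed_backends)}")
        self.successful_backends = successful_backends
        self.failed_backends = failed_backends


def commit_all(backends: dict) -> list:
    """Run every backend's commit callable; raise if any of them fails."""
    successful, failed = [], {}
    for name, commit in backends.items():
        try:
            commit()
            successful.append(name)
        except Exception as exc:  # keep going so every backend is attempted
            failed[name] = exc
    if failed:
        raise PartialCommitError(successful, failed)
    return successful


def ok():
    pass


def broken():
    raise ConnectionError("orders_db unreachable")


try:
    commit_all({"users_db": ok, "orders_db": broken, "datalake": ok})
except PartialCommitError as e:
    print("Committed to:", e.successful_backends)
    print("Failed on:", list(e.failed_backends))
```

Because the successful backends are not rolled back here, recovery has to be explicit: either retry the failed backends or revert the ones that went through.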
Troubleshooting
Backend Not Found
Error: Storage backend 'mydb' not found
- Verify that the backend is defined in your configuration
- Check the spelling of the backend name
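Misspelled names are the most common cause of this error. A small check like the one below, built on the standard-library `difflib` module, can suggest the closest configured name. The `suggest_backend` helper and the sample backend names are illustrative and not part of Horizon Epoch.

```python
from difflib import get_close_matches


def suggest_backend(name: str, configured: list) -> str:
    """Build a 'not found' message, with a suggestion when a close name exists."""
    message = f"Storage backend '{name}' not found"
    close = get_close_matches(name, configured, n=1)
    if close:
        message += f" (did you mean '{close[0]}'?)"
    return message


print(suggest_backend("user_db", ["users_db", "orders_db", "datalake"]))
```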
Cross-Backend Transaction
Warning: Cross-backend operations are not atomic
- Horizon Epoch provides best-effort consistency
- Use a single backend when you need strict atomicity
Permission Mismatch
Error: Access denied on backend 'orders_db'
- Verify credentials for each backend
- Check that the credentials for each backend grant the required permissions
Next Steps
- Branch-Aware Queries - Query across backends
- Configuration Reference - Full options
- Architecture - Design details