Multi-Backend Setup

Configure Horizon Epoch to work with multiple storage backends simultaneously.

Overview

Horizon Epoch can manage data across different storage systems:

  • Multiple PostgreSQL databases
  • Multiple S3 buckets
  • Mixed PostgreSQL + S3 environments
  • Cross-region or cross-account configurations

Architecture

┌─────────────────────────────────────────────────┐
│           Horizon Epoch Metadata                │
│         (Single PostgreSQL Database)            │
└─────────────────────┬───────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
┌───────────┐  ┌───────────┐  ┌───────────┐
│ PostgreSQL│  │ PostgreSQL│  │    S3     │
│  (users)  │  │  (orders) │  │  (events) │
└───────────┘  └───────────┘  └───────────┘

Configuration

Multiple PostgreSQL Databases

import asyncio
from horizon_epoch import Client, StorageBackend

async def setup_multi_backend():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.init("multi-backend-repo")

        # Add multiple PostgreSQL backends
        await client.add_storage(
            name="users_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://localhost/users"}
        )

        await client.add_storage(
            name="orders_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"}
        )

        await client.add_storage(
            name="analytics_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://analytics.internal/warehouse"}
        )

asyncio.run(setup_multi_backend())

Multiple S3 Buckets

# Add multiple S3 backends
await client.add_storage(
    name="raw_data",
    backend=StorageBackend.S3,
    config={"bucket": "company-raw-data", "region": "us-east-1"}
)

await client.add_storage(
    name="processed_data",
    backend=StorageBackend.S3,
    config={"bucket": "company-processed", "region": "us-west-2"}
)

await client.add_storage(
    name="archive",
    backend=StorageBackend.S3,
    config={"bucket": "company-archive", "region": "eu-west-1"}
)

Mixed PostgreSQL + S3

# Add both PostgreSQL and S3 backends
await client.add_storage(
    name="transactional",
    backend=StorageBackend.POSTGRESQL,
    config={"url": "postgresql://localhost/production"}
)

await client.add_storage(
    name="datalake",
    backend=StorageBackend.S3,
    config={"bucket": "company-datalake"}
)

Registering Tables

Specify the storage backend when registering tables:

from horizon_epoch.client import _native

# PostgreSQL table
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)

# Different PostgreSQL database
loc = _native.StorageLocation.postgresql("orders_db", "public", "orders")
await client.track_table("orders", loc)

# S3 Delta table
loc = _native.StorageLocation.s3("datalake", "delta/events")
await client.track_table("events", loc)

CLI Registration

# PostgreSQL
epoch table add users --location "postgresql://users_db/public.users"

# S3
epoch table add events --location "s3://datalake/delta/events"
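
The two location strings above appear to follow a `scheme://backend/path` shape, where the backend is the name passed to `add_storage`. The sketch below splits such a string with the standard library to show how the parts line up with the arguments given to `StorageLocation`. The `parse_location` helper and the `ParsedLocation` type are hypothetical, and the reading of the path is inferred from these two examples only, not taken from the CLI itself.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ParsedLocation:
    scheme: str   # "postgresql" or "s3"
    backend: str  # storage name, e.g. "users_db"
    path: str     # "public.users" or "delta/events"


def parse_location(location: str) -> ParsedLocation:
    """Split a location string into scheme, backend name, and path (hypothetical helper)."""
    parts = urlsplit(location)
    if parts.scheme not in ("postgresql", "s3"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("missing backend name")
    return ParsedLocation(parts.scheme, parts.netloc, parts.path.lstrip("/"))


# PostgreSQL: the path is assumed to be "<schema>.<table>"
pg = parse_location("postgresql://users_db/public.users")
schema, table = pg.path.split(".", 1)
print(pg.backend, schema, table)

# S3: the path is assumed to be the key prefix inside the backend's bucket
s3 = parse_location("s3://datalake/delta/events")
print(s3.backend, s3.path)
```

Note that the host part of the string is a backend name, not a network host, so a typo there would surface as a "backend not found" error rather than a connection failure.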

Configuration File

# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"

# PostgreSQL backends
[storage.postgres.users_db]
url = "postgresql://localhost/users"

[storage.postgres.orders_db]
url = "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"
aws_secret_id = "horizon-epoch/orders-db"  # Credentials from Secrets Manager

[storage.postgres.analytics]
host = "analytics.internal"
database = "warehouse"
vault_path = "secret/data/analytics-db"  # Credentials from Vault

# S3 backends
[storage.s3.raw]
bucket = "company-raw-data"
region = "us-east-1"

[storage.s3.processed]
bucket = "company-processed"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/DataAccess"

[storage.s3.archive]
bucket = "company-archive"
region = "eu-west-1"
endpoint = "https://s3.eu-west-1.amazonaws.com"

Cross-Storage Operations

Branching

Branches span all registered tables:

# Creates branch affecting all tables
epoch branch create feature/new-schema

Commits

Commits can include changes across backends:

# Commit changes from any backend
epoch commit -m "Update user preferences and events schema"

Diff

Compare changes across storage types:

epoch diff main feature/new-schema

Output:

PostgreSQL (users_db):
  users: 5 modified, 2 added

S3 (datalake):
  events: schema changed, 1000 added

Merge

Merges coordinate across all backends:

epoch merge feature/new-schema

Storage Routing

Default Backend

Set a default storage backend via configuration:

epoch config set default_storage users_db

Explicit Routing

Always specify the backend in the table location:

# Explicit backend reference via StorageLocation
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)

Cross-Account S3

await client.add_storage(
    name="partner_data",
    backend=StorageBackend.S3,
    config={
        "bucket": "partner-shared-bucket",
        "assume_role_arn": "arn:aws:iam::999888777666:role/HorizonEpochAccess",
        "external_id": "partner-integration-id"
    }
)
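
In a cross-account setup, the role ARN names the account that owns the role being assumed. A quick sanity check is to extract that account ID and compare it with the one your partner gave you. The sketch below relies only on the general ARN layout (`arn:partition:service:region:account-id:resource`); the `role_account_id` helper is hypothetical and not part of Horizon Epoch.

```python
def role_account_id(role_arn: str) -> str:
    """Extract the 12-digit account ID from an IAM role ARN (hypothetical helper)."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {role_arn!r}")
    account_id = parts[4]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"unexpected account ID: {account_id!r}")
    if not parts[5].startswith("role/"):
        raise ValueError("ARN does not name a role")
    return account_id


arn = "arn:aws:iam::999888777666:role/HorizonEpochAccess"
print(role_account_id(arn))
```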

Best Practices

1. Separate Metadata from Data

The metadata database should be separate from data storage:

# Connect to dedicated metadata database
async with Client.connect("postgresql://metadata.internal/horizon_epoch") as client:
    # Add production data storage (different server)
    await client.add_storage(
        name="production",
        backend=StorageBackend.POSTGRESQL,
        config={"url": "postgresql://data.internal/production"}
    )

2. Use Consistent Naming

# Good: Clear, consistent names
await client.add_storage("prod_users", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_orders", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_events", StorageBackend.S3, {...})

# Avoid: Inconsistent or unclear names like "db1" or "bucket"

3. Document Backend Purpose

# epoch.toml
# Users database - primary transactional storage
[storage.postgres.users]
url = "postgresql://..."

# Orders database - order processing system
[storage.postgres.orders]
url = "postgresql://..."

# Event lake - analytics events from all services
[storage.s3.events]
bucket = "company-events"

4. Handle Partial Failures

An operation that spans multiple backends can succeed on some and fail on others:

try:
    # commit() is awaited on the assumption that it is async, like the other client calls
    result = await client.commit(message="Multi-backend update")
except PartialCommitError as e:
    print(f"Committed to: {e.successful_backends}")
    print(f"Failed on: {e.failed_backends}")
    # Handle recovery
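
One way to approach the recovery step is to turn the exception into a structured report that an operator or a follow-up job can act on. The sketch below is self-contained so it can run without a database: `PartialCommitError` here is a stand-in class that mirrors the two attributes used in the snippet above, and `FakeClient` and `commit_and_report` are hypothetical names, not part of the Horizon Epoch API.

```python
import asyncio


class PartialCommitError(Exception):
    """Stand-in for the library exception; attribute names follow the snippet above."""

    def __init__(self, successful_backends, failed_backends):
        super().__init__("commit failed on some backends")
        self.successful_backends = successful_backends
        self.failed_backends = failed_backends


class FakeClient:
    """Hypothetical client whose commit always fails on one backend."""

    async def commit(self, message: str):
        raise PartialCommitError(["users_db"], ["datalake"])


async def commit_and_report(client, message: str) -> dict:
    """Commit and return a summary instead of letting a partial failure propagate."""
    try:
        await client.commit(message=message)
        return {"status": "ok", "committed": [], "failed": []}
    except PartialCommitError as e:
        # Record which backends need follow-up; no retry is attempted here
        return {
            "status": "partial",
            "committed": list(e.successful_backends),
            "failed": list(e.failed_backends),
        }


report = asyncio.run(commit_and_report(FakeClient(), "Multi-backend update"))
print(report)
```

The sketch deliberately does not retry the commit. Whether a retry is safe depends on how a repeated commit behaves on the backends that already succeeded, which this page does not specify.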

Troubleshooting

Backend Not Found

Error: Storage backend 'mydb' not found
  • Verify that the backend is defined in your configuration
  • Check the spelling of the backend name

Cross-Backend Transaction

Warning: Cross-backend operations are not atomic
  • Horizon Epoch provides best-effort consistency
  • Use a single backend when you need strict atomicity

Permission Mismatch

Error: Access denied on backend 'orders_db'
  • Verify the credentials configured for each backend
  • Check that each backend's credentials grant the required permissions

Next Steps