
How to Migrate from Snapshots

If you’re currently using database snapshots or manual backups for versioning, this guide helps you migrate to Horizon Epoch.

Current Snapshot-Based Workflow

Many teams use approaches like:

  • Daily database dumps
  • PostgreSQL pg_dump snapshots
  • AWS RDS snapshots
  • Manual CREATE TABLE ... AS SELECT copies

These approaches have limitations:

  • Full data copies are expensive
  • No granular change tracking
  • Difficult to compare versions
  • Merging changes is manual
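To put the first limitation in numbers, here is a back-of-envelope comparison of full daily dumps versus delta-only commits. All figures are hypothetical; substitute your own database size and daily churn rate:

```python
# Rough storage comparison: full daily copies vs. delta-only commits.
# All numbers are hypothetical; plug in your own values.

DB_SIZE_GB = 500     # total database size
DAILY_CHURN = 0.02   # fraction of data that changes per day
DAYS = 30            # retention window

# Full dumps store the entire database every day
full_copies_gb = DB_SIZE_GB * DAYS

# Delta-based versioning stores one base copy plus daily changes
delta_commits_gb = DB_SIZE_GB + DB_SIZE_GB * DAILY_CHURN * DAYS

print(f"30 days of full dumps:  {full_copies_gb:,.0f} GB")
print(f"base + 30 day deltas:   {delta_commits_gb:,.0f} GB")
```

With these example numbers, a month of full dumps costs roughly 19x the storage of a base copy plus deltas; the gap widens as churn drops.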

Migration Overview

  1. Initialize a Horizon Epoch repository
  2. Import your existing data as the initial commit
  3. Set up tables for tracking
  4. Train your team on the new workflow

Step-by-Step Migration

1. Prepare Your Environment

# Start services
docker compose -f docker/docker-compose.yml up -d

# Build Horizon Epoch from source (see Installation guide)
cargo build --release

2. Initialize Repository

epoch init production-data \
    --metadata-url "postgresql://localhost/horizon_epoch" \
    --description "Migrated from daily snapshots"

3. Register Existing Tables

# Register each table you want to version
epoch table add customers \
    --location "postgresql://localhost/mydb/public.customers"

epoch table add orders \
    --location "postgresql://localhost/mydb/public.orders"

epoch table add products \
    --location "postgresql://localhost/mydb/public.products"

4. Create Initial Commit

epoch commit -m "Initial import from production snapshot 2024-01-15"

5. Tag Important Snapshots

If you have historical snapshots you want to reference:

# Tag the current state
epoch tag snapshot-2024-01-15 \
    --message "Migration baseline from daily snapshot"

6. Import Historical Snapshots (Optional)

If you have historical data you want to preserve:

import asyncio
from horizon_epoch import Client, Author

async def import_snapshots():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # historical_dates, temp_db, and restore_snapshot are placeholders:
        # supply your own list of snapshot dates, a scratch database, and a
        # restore helper for your backup format.
        for snapshot_date in historical_dates:
            # Restore this snapshot into the temporary database
            restore_snapshot(snapshot_date, temp_db)

            # Create a branch for this point in time
            branch_name = f"history/{snapshot_date}"
            await client.branch(branch_name)

            # Register tables from restored snapshot
            # (pointing to temp database)

            # Commit
            await client.commit(
                message=f"Historical snapshot: {snapshot_date}",
                author=Author(name="Migration", email="ops@example.com")
            )

            # Tag for easy reference
            await client.tag_create(
                name=f"snapshot-{snapshot_date}",
                message="Imported from backup"
            )

asyncio.run(import_snapshots())

Mapping Snapshot Workflows

Daily Backup Replacement

Before:

# Nightly cron job
pg_dump mydb > /backups/mydb-$(date +%Y%m%d).sql

After:

# Nightly cron job
epoch commit -m "Daily snapshot $(date +%Y-%m-%d)"
epoch tag daily-$(date +%Y%m%d)
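If your nightly dump script also rotated old backup files, the same retention idea carries over to tags. A sketch of picking which daily tags have aged out, assuming the daily-YYYYMMDD naming above (how you then remove a tag depends on your Horizon Epoch version, so this only selects the candidates):

```python
from datetime import date, timedelta

def daily_tag(d: date) -> str:
    # Matches the daily-YYYYMMDD convention from the cron example
    return f"daily-{d:%Y%m%d}"

def tags_to_prune(tags: list[str], today: date, keep_days: int = 7) -> list[str]:
    """Return daily tags older than the retention window."""
    cutoff = daily_tag(today - timedelta(days=keep_days))
    # daily-YYYYMMDD sorts lexicographically in date order
    return sorted(t for t in tags if t.startswith("daily-") and t < cutoff)

tags = [daily_tag(date(2024, 1, 1) + timedelta(days=i)) for i in range(15)]
print(tags_to_prune(tags, today=date(2024, 1, 15), keep_days=7))
```

Because tags are lightweight references rather than full dump files, you may also simply keep them all; pruning is a policy choice, not a storage necessity.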

Pre-Change Backup

Before:

# Before making changes
pg_dump mydb > /backups/pre-migration.sql
# Make changes
# If something goes wrong, restore from pre-migration.sql

After:

# Before making changes
epoch tag pre-migration-$(date +%Y%m%d)
# Make changes
epoch commit -m "Applied migration X"
# If something goes wrong
epoch reset --hard pre-migration-$(date +%Y%m%d)

Environment Copies

Before:

# Create staging from production
pg_dump prod_db | psql staging_db

After:

# Create staging branch from production
epoch branch create staging --from main
# Staging now has zero-copy access to production data
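The zero-copy claim can be illustrated with a toy copy-on-write model: a new branch starts by sharing its parent's table references, and a table is stored separately only once a branch modifies it. This is an illustration of the idea, not Horizon Epoch's actual storage format:

```python
class Repo:
    """Toy copy-on-write model: branches share table references
    until a table is modified on one of them."""

    def __init__(self):
        # branch name -> {table name: version id}
        self.branches = {"main": {}}
        self.next_version = 0

    def write(self, branch: str, table: str) -> None:
        # Writing assigns a new version on this branch only
        self.branches[branch][table] = self.next_version
        self.next_version += 1

    def branch(self, name: str, source: str = "main") -> None:
        # A new branch copies the reference map, not the data
        self.branches[name] = dict(self.branches[source])

repo = Repo()
repo.write("main", "customers")
repo.branch("staging")  # zero-copy: staging sees the same versions
assert repo.branches["staging"] == repo.branches["main"]
repo.write("staging", "customers")  # diverges only on write
assert repo.branches["staging"] != repo.branches["main"]
```

Contrast this with `pg_dump prod_db | psql staging_db`, which duplicates every row up front whether or not staging ever changes it.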

Handling Large Datasets

For very large databases:

1. Incremental Registration

Register tables in batches:

# Critical tables first
epoch table add customers orders products
epoch commit -m "Core business tables"

# Then secondary tables
epoch table add logs analytics events
epoch commit -m "Add operational tables"
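The batched registration above can be scripted when there are many tables. A sketch that chunks a table list and prints one `epoch table add` / `epoch commit` pair per batch; the table names and batch size are examples, and the commands are printed rather than executed:

```python
def batches(items: list[str], size: int) -> list[list[str]]:
    """Split a table list into fixed-size registration batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

tables = ["customers", "orders", "products", "logs", "analytics", "events"]
for i, batch in enumerate(batches(tables, size=3), start=1):
    # Emit the commands for review; pipe to sh once verified
    print(f"epoch table add {' '.join(batch)}")
    print(f"epoch commit -m 'Batch {i}: {', '.join(batch)}'")
```

Listing critical tables first keeps the important history usable even if a later batch needs rework.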

2. Exclude Large/Non-Critical Tables

Some tables might not need versioning:

# Configure exclusions via CLI
# epoch config set exclude_patterns "audit_logs_*,*_archive,temp_*"

# Or skip these tables when registering
# Only register the tables you want to version

3. Use Partitioned Commits

For very large initial imports:

# Commit in chunks
epoch commit --tables customers,orders -m "Batch 1: customer data"
epoch commit --tables products,inventory -m "Batch 2: product data"

Validation

After migration, verify everything works:

# Check repository status
epoch status

# View commit history
epoch log

# Verify tables are registered
epoch table list

# Test branching
epoch branch create test-migration
epoch checkout test-migration
# Make a small change, commit, merge back
epoch checkout main
epoch merge test-migration
epoch branch delete test-migration

Team Training

Key concepts to communicate:

  1. Branches replace copies - No more copying entire databases
  2. Commits are lightweight - Only changes are stored
  3. Tags mark important points - Like naming a backup
  4. Merging combines changes - No more manual comparison

Rollback Plan

If migration doesn’t go smoothly:

  1. Your existing snapshot workflow still works
  2. Horizon Epoch metadata is separate from your data
  3. Remove Horizon Epoch without affecting production:
    # Remove metadata database
    dropdb horizon_epoch
    

Next Steps