Horizon Epoch

Git-like version control for data across heterogeneous storage systems.

Horizon Epoch brings the power of Git-style version control to your data infrastructure. Whether your data lives in relational databases, cloud object storage, or local filesystems, Horizon Epoch provides unified branching, merging, and history tracking.

Supported Storage Backends

  • Relational Databases - PostgreSQL, MySQL, SQL Server, SQLite
  • Cloud Object Storage - AWS S3, Azure Blob Storage, Google Cloud Storage
  • Local Storage - Local filesystem with Delta Lake format

Key Features

  • Multi-Storage Support - Work with data across 8 different storage backends
  • Zero-Copy Branching - Create instant branches without duplicating data
  • True Three-Way Merge - Smart conflict detection and resolution at the field level
  • Copy-on-Write Semantics - Efficient storage with minimal overhead
  • Git-Like Commands - Familiar CLI interface for data engineers

Who Is This For?

Horizon Epoch is designed for:

  • Data Engineers who need to safely test ETL changes before production
  • Analytics Teams who want to experiment with transformations in isolation
  • DevOps Engineers managing data across multiple environments
  • Anyone who has wished for Git-like workflows for their data

Quick Example

# Initialize a repository
epoch init my-data-repo --metadata-url "postgresql://localhost/horizon_epoch"

# Register a table to track
epoch table add users --location "postgresql://localhost/mydb/public.users"

# Create a feature branch
epoch branch create feature/new-schema

# Make changes, then commit
epoch commit -m "Add user preferences column"

# Merge back to main
epoch merge feature/new-schema

How It Works

Horizon Epoch tracks data versions through a metadata layer that sits alongside your existing storage:

  1. Metadata Database - Stores commit history, branch pointers, and version indices
  2. Storage Adapters - Connect to your data where it lives (PostgreSQL, MySQL, SQL Server, SQLite, S3, Azure, GCS, local filesystem)
  3. Copy-on-Write - Only modified records are stored separately on branches

This architecture means:

  • No data migration required
  • Works with your existing infrastructure
  • Minimal storage overhead for branching

Getting Started

Ready to try Horizon Epoch? Start with the Installation Guide and then follow the Quickstart tutorial.

Documentation Structure

This documentation is organized into several sections, beginning with installation and a quickstart, followed by core concepts, tutorials, and per-backend how-to guides.

Installation

Horizon Epoch is currently in private development.

Installation packages and instructions will be available upon public release. For early access inquiries, please contact Horizon Analytic Studios.

Requirements

When released, Horizon Epoch will require:

  • PostgreSQL 14+ (for metadata storage)
  • One or more supported storage backends

Supported Storage Backends

  • PostgreSQL
  • MySQL
  • Microsoft SQL Server
  • SQLite
  • AWS S3 / S3-compatible (MinIO, etc.)
  • Azure Blob Storage
  • Google Cloud Storage
  • Local filesystem

Quickstart

Installation instructions coming soon.

For early access inquiries, please contact Horizon Analytic Studios.

What is Horizon Epoch?

Horizon Epoch is a Git-like version control system for data. It provides:

  • Branching - Create isolated environments for testing changes
  • Commits - Track changes with full history
  • Merging - Combine changes with conflict detection
  • Multi-backend support - Works with PostgreSQL, MySQL, SQL Server, SQLite, S3, Azure, GCS, local filesystem

Core Concepts

Repositories

A repository is the top-level container for versioned data. It contains:

  • Multiple tables (data sources)
  • Multiple branches (parallel versions)
  • A history of commits (snapshots in time)

Branches

Branches are independent lines of development. Like Git:

  • main is the default branch
  • Create branches for features, experiments, or environments
  • Branches are zero-copy - creating one is instant

Commits

A commit is a snapshot of one or more tables at a point in time:

  • Contains a message describing the changes
  • Links to parent commit(s) forming a history graph
  • Tracks which records changed

Tables

Tables are data sources that Horizon Epoch tracks:

  • Can be database tables (PostgreSQL, MySQL, SQL Server, SQLite) or object storage (S3, Azure, GCS, local)
  • Each table has a schema and primary key
  • Changes are tracked at the record level

Copy-on-Write

Horizon Epoch uses copy-on-write (CoW) semantics:

  • When you create a branch, no data is copied
  • Only modified records are stored separately
  • Queries automatically overlay changes on base data

This makes operations fast and storage efficient.
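
For intuition, here is a toy model of the overlay behavior (an illustrative sketch only, not Horizon Epoch’s actual engine): reads on a branch consult a per-branch delta first and fall back to the shared base data.

# Conceptual sketch of copy-on-write reads (not the real implementation)
base = {1: {"name": "Alice"}, 2: {"name": "Bob"}}  # shared data, never copied

class Branch:
    def __init__(self, base):
        self.base = base
        self.delta = {}  # only modified records live here

    def update(self, record_id, record):
        self.delta[record_id] = record  # writes go to the branch delta

    def get(self, record_id):
        # reads overlay branch changes on the shared base data
        return self.delta.get(record_id, self.base.get(record_id))

feature = Branch(base)  # "branching" copies nothing
feature.update(2, {"name": "Bobby"})
assert feature.get(2) == {"name": "Bobby"}  # the branch sees its change
assert base[2] == {"name": "Bob"}           # main's data is untouched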

Three-Way Merge

When merging branches, Horizon Epoch uses three-way merge:

  1. Find the common ancestor
  2. Identify changes from ancestor to each branch
  3. Combine non-conflicting changes
  4. Report conflicts for overlapping changes

Conflicts can be resolved at the field level.
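
As a rough sketch of the idea (illustrative, not the engine’s actual code), a field-level three-way merge compares each field’s base, “ours”, and “theirs” values:

# Field-level three-way merge of a single record (conceptual sketch)
def merge_record(base, ours, theirs):
    merged, conflicts = {}, {}
    for field in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:        # both sides agree (or neither changed)
            merged[field] = o
        elif o == b:      # only theirs changed
            merged[field] = t
        elif t == b:      # only ours changed
            merged[field] = o
        else:             # both changed differently: a conflict
            conflicts[field] = {"base": b, "ours": o, "theirs": t}
    return merged, conflicts

merged, conflicts = merge_record(
    base={"email": "alice@example.com", "status": "active"},
    ours={"email": "alice@company.com", "status": "active"},
    theirs={"email": "alice.smith@example.com", "status": "inactive"},
)
# status merges cleanly to "inactive"; email is reported as a conflict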

Storage Adapters

Horizon Epoch works with your existing storage through adapters:

Relational Databases:

  • PostgreSQL - Full constraint support, TLS/SSL
  • MySQL - Full constraint support, SSH tunnels
  • SQL Server - Full constraint support, Windows auth
  • SQLite - File-based or in-memory, partial constraint support

Object Storage (Delta Lake format):

  • AWS S3 - And S3-compatible (MinIO, etc.)
  • Azure Blob Storage - Multiple auth methods
  • Google Cloud Storage - Service account or ADC
  • Local Filesystem - For development and edge deployments

Each adapter implements the same interface, so version control operations work identically across storage types.
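
As a mental model, the adapter contract can be sketched as a small interface (the names below are hypothetical, not the published API):

from typing import Any, Protocol

class StorageAdapter(Protocol):
    """Hypothetical sketch of the common adapter interface."""
    async def read_records(self, table: str, version: str) -> list[dict[str, Any]]: ...
    async def write_records(self, table: str, records: list[dict[str, Any]]) -> None: ...
    async def introspect_schema(self, table: str) -> dict[str, Any]: ...

# A PostgreSQL adapter and an S3/Delta adapter would both satisfy this
# protocol, which is why branch, commit, and merge behave identically
# regardless of where the table lives.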

Metadata Layer

All versioning information is stored in a metadata database (PostgreSQL):

  • Commit graph (directed acyclic graph of commits)
  • Branch pointers (references to commits)
  • Table registrations and schema versions
  • Change tracking indices

The metadata layer is separate from your data storage, meaning Horizon Epoch doesn’t modify how your data is stored.
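
For intuition, the stored structures boil down to records like these (a simplified sketch; the real metadata schema is not documented here):

# Simplified model of the metadata layer (illustrative only)
commits = {  # commit id -> parent ids, forming a DAG
    "a1": [], "b2": ["a1"], "c3": ["b2"], "d4": ["b2"],
}
branches = {"main": "c3", "feature": "d4"}  # branch pointers

def ancestors(commit_id):
    seen, stack = set(), [commit_id]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c])
    return seen

# the merge base of two branches is their nearest shared ancestor
common = ancestors(branches["main"]) & ancestors(branches["feature"])
print(sorted(common))  # ['a1', 'b2'] -> b2 is the merge base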

Tutorial: Your First Repository

In this tutorial, you’ll:

  • Create a Horizon Epoch repository
  • Register a PostgreSQL table
  • Make your first commit
  • Explore the history

Step 1: Create Sample Data

First, let’s create a sample table in PostgreSQL:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com'),
    ('Charlie', 'charlie@example.com');

Step 2: Initialize Repository

epoch init my-first-repo \
  --metadata-url "postgresql://localhost/horizon_epoch" \
  --description "My first Horizon Epoch repository"

Output:

Initialized repository 'my-first-repo' on branch 'main'

Step 3: Register the Table

epoch table add users \
  --location "postgresql://localhost/mydb/public.users"

Output:

Registered table 'users' with 3 records
  Schema: id (int), name (text), email (text), created_at (timestamp)
  Primary key: id

Step 4: Create Initial Commit

epoch commit -m "Initial import of users table"

Output:

Created commit abc123f
  1 table, 3 records

Step 5: View the Log

epoch log

Output:

commit abc123f
Author: Your Name <you@example.com>
Date:   Thu Dec 12 10:30:00 2025

    Initial import of users table

    Tables: users
    Stats: 3 records added

Step 6: Check Status

epoch status

Output:

On branch main
nothing to commit, working tree clean

Congratulations!

You’ve created your first Horizon Epoch repository. Next, learn how branches work in the Branching Workflow tutorial.

Tutorial: Branching Workflow

Learn how to use branches effectively in Horizon Epoch for parallel development and safe experimentation.

Understanding Branches

Branches in Horizon Epoch work like Git branches:

  • Each branch is an independent line of development
  • Branches are zero-copy - creating one doesn’t duplicate data
  • Changes on one branch don’t affect others until merged

How Zero-Copy Branching Works

When you create a branch in Horizon Epoch, no data is actually copied. Instead, the new branch simply points to the same data as its parent. Only when you make changes does Horizon Epoch store the modified records separately.

flowchart LR
    subgraph "Before Branch Creation"
        M1[("main branch")] --> D1[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
    end

flowchart LR
    subgraph "After Branch Creation (Zero-Copy)"
        M2[("main")] --> D2[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
        F2[("feature")] --> D2
    end

flowchart LR
    subgraph "After Modifications on Feature Branch"
        M3[("main")] --> D3[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
        F3[("feature")] --> D3
        F3 --> C3[("Branch Changes<br/>users: 5 modified<br/>users: 2 added")]
    end

This architecture means:

  • Instant branch creation - No waiting for data to copy
  • Efficient storage - Only changed records are stored twice
  • Fast queries - Branch-specific changes are overlaid on base data

Common Branching Patterns

Pattern 1: Feature Branch

Create a branch for a new feature or experiment:

gitGraph
    commit id: "Initial"
    commit id: "Add users table"
    branch feature/user-preferences
    checkout feature/user-preferences
    commit id: "Add preferences column"
    commit id: "Populate default values"
    checkout main
    merge feature/user-preferences id: "Merge feature"
    commit id: "Continue development"

epoch branch create feature/user-preferences
epoch checkout feature/user-preferences

Pattern 2: Environment Branches

Maintain branches for different environments:

gitGraph
    commit id: "v1.0"
    branch staging
    branch development
    checkout development
    commit id: "New feature A"
    commit id: "New feature B"
    checkout staging
    merge development id: "Promote to staging"
    checkout main
    commit id: "v1.1 Hotfix"
    checkout staging
    merge main id: "Sync hotfix"
    checkout main
    merge staging id: "Release v1.2"

epoch branch create development
epoch branch create staging
epoch branch create production

Pattern 3: Hotfix Branch

Quick fixes that need to go directly to production:

gitGraph
    commit id: "v1.0 Release"
    commit id: "v1.1 Release"
    branch hotfix/fix-email
    checkout hotfix/fix-email
    commit id: "Fix email format"
    checkout main
    merge hotfix/fix-email id: "v1.1.1 Hotfix"

epoch branch create hotfix/fix-email-format --from production

Pattern 4: Multiple Feature Branches

Working on multiple features in parallel:

gitGraph
    commit id: "Base"
    branch feature/auth
    branch feature/reporting
    checkout feature/auth
    commit id: "Add login"
    commit id: "Add logout"
    checkout feature/reporting
    commit id: "Add daily report"
    commit id: "Add weekly report"
    checkout main
    merge feature/auth id: "Merge auth"
    merge feature/reporting id: "Merge reporting"

Step-by-Step: Feature Branch Workflow

1. Create a Feature Branch

epoch branch create feature/add-status-field

2. Switch to the Branch

epoch checkout feature/add-status-field

3. Make Changes to Your Data

Modify the underlying data (e.g., add records, update schema):

-- In your PostgreSQL database
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
UPDATE users SET status = 'active';

4. Review Changes

epoch diff

Output:

Modified: users
  Schema changes:
    + status (varchar)
  Record changes:
    ~ 3 records modified (new field values)

5. Commit Your Changes

epoch commit -m "Add status field to users table"

6. View Branch History

epoch log --oneline

Output:

def456g Add status field to users table
abc123f Initial import of users table

7. Compare with Main

epoch diff main..feature/add-status-field

8. Merge to Main

epoch checkout main
epoch merge feature/add-status-field

9. Clean Up

epoch branch delete feature/add-status-field

Branch Lifecycle Visualization

The complete lifecycle of a feature branch:

stateDiagram-v2
    [*] --> Created: epoch branch create
    Created --> Active: epoch checkout
    Active --> Modified: Make data changes
    Modified --> Committed: epoch commit
    Committed --> Modified: More changes
    Committed --> Reviewed: epoch diff main..branch
    Reviewed --> Merged: epoch merge
    Merged --> Deleted: epoch branch delete
    Deleted --> [*]

Environment Promotion Flow

A typical flow for promoting changes through environments:

flowchart TD
    subgraph Development
        D1[Feature Branch] --> D2[Development Branch]
        D2 --> D3{Tests Pass?}
        D3 -->|No| D1
    end

    subgraph Staging
        D3 -->|Yes| S1[Promote to Staging]
        S1 --> S2{QA Approved?}
        S2 -->|No| D1
    end

    subgraph Production
        S2 -->|Yes| P1[Promote to Production]
        P1 --> P2[Tag Release]
    end

# Promote development to staging
epoch checkout staging
epoch merge development

# After QA approval, promote to production
epoch checkout production
epoch merge staging
epoch tag v1.2.0

Branch Comparison Summary

Pattern     | Use Case             | Lifetime       | Merge Target
Feature     | New functionality    | Days to weeks  | development or main
Environment | Deployment stages    | Permanent      | Next environment
Hotfix      | Critical fixes       | Hours to days  | production + main
Release     | Version preparation  | Days           | main

Best Practices

  1. Keep branches short-lived - Merge frequently to avoid conflicts
  2. Use descriptive names - feature/user-auth is better than branch1
  3. One feature per branch - Makes merging and rollback easier
  4. Test before merging - Validate changes on the branch first
  5. Delete merged branches - Keep your branch list clean

Visualizing Your Repository

You can visualize your repository’s branch structure:

# List all branches with their relationships
epoch branch list --graph

# Show commit history across branches
epoch log --graph --all

Tutorial: Merging Data

Learn how to merge branches, handle conflicts, and understand merge strategies in Horizon Epoch.

Merge Types

Fast-Forward Merge

When the target branch hasn’t changed since the source branch was created:

Before:
main:     A---B
               \
feature:        C---D

After (fast-forward):
main:     A---B---C---D

epoch merge feature/simple-change
# Fast-forward merge: main -> def456g
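
In graph terms, a fast-forward is possible exactly when the target branch’s head is an ancestor of the source branch’s head, so the branch pointer can simply move forward. A sketch using the parent-map model from Core Concepts (assumes an acyclic history):

# Fast-forward check: can `target_head` simply advance to `source_head`?
def can_fast_forward(commits, target_head, source_head):
    stack = [source_head]
    while stack:
        c = stack.pop()
        if c == target_head:
            return True  # target's head lies in the source's history
        stack.extend(commits[c])
    return False

commits = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(can_fast_forward(commits, "B", "D"))  # True: main at B, feature at D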

Three-Way Merge

When both branches have diverged:

Before:
main:     A---B---C
               \
feature:        D---E

After (three-way merge):
main:     A---B---C---F
               \     /
feature:        D---E

epoch merge feature/complex-change
# Merged feature/complex-change into main (commit: ghi789h)

Handling Conflicts

Conflicts occur when both branches modify the same record’s fields.

Detecting Conflicts

epoch merge feature/conflicting-changes
# CONFLICT: Merge conflict in users
# Automatic merge failed; fix conflicts and then commit.

Viewing Conflicts

epoch status

Output:

On branch main
You have unmerged paths.
  (fix conflicts and run "epoch merge --continue")
  (use "epoch merge --abort" to abort the merge)

Unmerged paths:
  both modified:   users (3 conflicts)

Inspecting Conflict Details

epoch conflicts show

Output:

Table: users
Record ID: 42
Field: email
  BASE:   alice@example.com
  OURS:   alice@company.com
  THEIRS: alice.smith@example.com

Field: status
  BASE:   active
  OURS:   active
  THEIRS: inactive

Resolving Conflicts

Interactive Resolution

epoch conflicts resolve --interactive

Accept One Side

# Accept all changes from target branch (ours)
epoch conflicts resolve --ours

# Accept all changes from source branch (theirs)
epoch conflicts resolve --theirs

Field-Level Resolution

epoch conflicts resolve users --record 42 --field email --value "alice.smith@company.com"

Completing the Merge

epoch merge --continue

Aborting the Merge

If you need to start over:

epoch merge --abort

Merge Strategies

Default Strategy

The default strategy uses three-way merge with field-level conflict detection:

epoch merge feature/my-branch

Ours Strategy

Keep all conflicting values from the target branch:

epoch merge feature/my-branch --strategy ours

Theirs Strategy

Accept all conflicting values from the source branch:

epoch merge feature/my-branch --strategy theirs

Best Practices

  1. Merge frequently - Small, frequent merges have fewer conflicts
  2. Review diffs first - Use epoch diff before merging
  3. Test after merge - Validate data integrity post-merge
  4. Document conflicts - Add context in merge commit messages

Example: Complete Merge Workflow

# 1. Ensure you're on the target branch
epoch checkout main

# 2. Pull latest changes (if using remote)
epoch pull

# 3. Preview what will be merged
epoch diff main..feature/my-feature

# 4. Perform the merge
epoch merge feature/my-feature

# 5. If conflicts, resolve them
epoch conflicts show
epoch conflicts resolve --interactive

# 6. Complete the merge
epoch merge --continue

# 7. Verify the result
epoch log --oneline -5
epoch status

Tutorial: Environment Promotion

Learn how to safely promote data changes from development through staging to production using Horizon Epoch’s branching model.

Prerequisites

  • Completed Merging Data tutorial
  • Understanding of environment-based workflows

The Environment Branch Model

A typical setup uses long-lived branches for each environment:

production    A---B-----------F---G
                   \         /
staging             C---D---E
                         \
development               H---I---J

Setting Up Environment Branches

1. Create Environment Branches

# Start from main (which represents production)
epoch branch create staging --from main
epoch branch create development --from staging

2. Protect Production

Configure branch protection to prevent direct commits:

epoch branch protect production --require-review --no-force-push

Development Workflow

1. Work on Development

All feature work starts on development or feature branches:

epoch checkout development
epoch branch create feature/new-report --from development

2. Develop and Test

Make changes and commit:

# Make data changes
epoch commit -m "Add sales report aggregations"

3. Merge to Development

epoch checkout development
epoch merge feature/new-report

Promoting to Staging

1. Review Changes

epoch diff staging..development

2. Create a Release Branch (Optional)

For complex releases:

epoch branch create release/v1.2 --from development

3. Promote to Staging

epoch checkout staging
epoch merge development  # or release/v1.2

4. Run Staging Tests

Validate the data in staging environment before proceeding.
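
The same gate can be scripted with the Python SDK. A sketch assuming the async client calls shown in the ETL tutorial (merge accepts dry_run and reports has_conflicts):

import asyncio
from horizon_epoch import Client

async def promote_to_staging():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Preview the merge first; dry_run reports conflicts without applying
        preview = await client.merge("development", "staging", dry_run=True)
        if preview.has_conflicts:
            raise SystemExit("Resolve conflicts before promoting to staging")
        await client.checkout("staging")
        await client.merge("development", "staging")

asyncio.run(promote_to_staging())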

Promoting to Production

1. Final Review

epoch diff production..staging

2. Promote to Production

epoch checkout production
epoch merge staging

3. Tag the Release

epoch tag v1.2.0 --message "Release 1.2.0: Sales report aggregations"

Rollback Procedures

Quick Rollback

If issues are found immediately after promotion:

# Revert the last merge
epoch revert HEAD

Rollback to Specific Version

# Find the commit to rollback to
epoch log --oneline -10

# Reset to that commit
epoch reset --hard abc123f

Emergency Production Rollback

# Create a hotfix from the last known good state
epoch checkout production
epoch reset --hard v1.1.0
epoch tag v1.1.1-rollback --message "Emergency rollback from v1.2.0"

Hotfix Workflow

For urgent production fixes:

# 1. Create hotfix from production
epoch branch create hotfix/critical-fix --from production

# 2. Make the fix
epoch checkout hotfix/critical-fix
# ... apply fix ...
epoch commit -m "Fix critical data issue"

# 3. Merge directly to production
epoch checkout production
epoch merge hotfix/critical-fix

# 4. Backport to other environments
epoch checkout staging
epoch merge hotfix/critical-fix

epoch checkout development
epoch merge hotfix/critical-fix

Automation Tips

CI/CD Integration

# Example GitHub Actions workflow
name: Promote to Staging

on:
  workflow_dispatch:
    inputs:
      source_branch:
        description: 'Branch to promote'
        default: 'development'

jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - name: Promote to staging
        run: |
          epoch checkout staging
          epoch merge ${{ inputs.source_branch }}
          epoch push

Scheduled Promotions

# Promote development to staging every night
0 2 * * * epoch checkout staging && epoch merge development && epoch push

Best Practices

  1. Never commit directly to production - Always promote through the pipeline
  2. Tag production releases - Makes rollback and auditing easier
  3. Keep environments in sync - Regularly merge production back to development
  4. Automate where possible - Reduces human error
  5. Document promotions - Keep a changelog of what was promoted and when

Tutorial: ETL Branch Testing

Learn how to safely test ETL (Extract, Transform, Load) changes using Horizon Epoch branches before deploying to production.

Prerequisites

  • Completed Environment Promotion tutorial
  • Understanding of ETL workflows
  • Python environment with Horizon Epoch SDK

The Problem

ETL changes are risky:

  • Transformations might produce incorrect results
  • Schema changes might break downstream systems
  • Volume changes might cause performance issues

Traditional approaches:

  • Test on copied data (expensive, often stale)
  • Test in staging (still risky, shared environment)
  • Hope for the best (not recommended)

The Solution: Branch-Based ETL Testing

With Horizon Epoch, you can:

  1. Create a branch from production data
  2. Run your ETL changes against the branch
  3. Validate the results
  4. Merge only when confident

Step-by-Step Guide

1. Set Up Your Repository

import asyncio
from horizon_epoch import Client, StorageBackend

async def setup():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.init("my-repo")

        # Add warehouse storage
        await client.add_storage(
            name="warehouse",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://localhost/data_warehouse"}
        )
        return client

# Note: In practice, you'd keep the client connection open throughout

2. Create a Test Branch

# Create branch from production
test_branch = await client.branch(
    name="etl-test/sales-aggregation-v2",
    start_point="main"  # production
)
print(f"Created test branch: {test_branch.name}")

3. Run Your ETL Transformation

import pandas as pd
from horizon_epoch import Author

# Checkout the test branch
await client.checkout("etl-test/sales-aggregation-v2")

# Your ETL logic here; `connection` is assumed to be an open
# DB-API/SQLAlchemy connection to the warehouse database
df = pd.read_sql("""
    SELECT
        date_trunc('month', sale_date) as month,
        product_category,
        SUM(amount) as total_sales
    FROM sales
    GROUP BY 1, 2
""", connection)

# Write transformed data
df.to_sql(
    "monthly_sales_summary",
    connection,
    if_exists="replace",
    index=False
)

# Stage and commit the transformation
await client.stage_all()
await client.commit(
    message="Apply new sales aggregation logic v2",
    author=Author(name="ETL Pipeline", email="etl@example.com")
)

4. Validate the Results

# Compare with expected results
async def validate_transformation():
    # Already on the test branch from checkout
    # Check row counts
    result = connection.execute(
        "SELECT COUNT(*) FROM monthly_sales_summary"
    ).scalar()
    assert result > 0, "No rows in summary table"

    # Check for nulls in required fields
    nulls = connection.execute("""
        SELECT COUNT(*) FROM monthly_sales_summary
        WHERE month IS NULL OR total_sales IS NULL
    """).scalar()
    assert nulls == 0, f"Found {nulls} rows with null values"

    # Verify totals match
    branch_total = connection.execute(
        "SELECT SUM(total_sales) FROM monthly_sales_summary"
    ).scalar()

    # Compare with production
    prod_total = connection.execute(
        "SELECT SUM(amount) FROM sales"
    ).scalar()

    assert abs(branch_total - prod_total) < 0.01, \
        f"Total mismatch: {branch_total} vs {prod_total}"

    print("All validations passed!")

await validate_transformation()

5. Compare with Production

# Get detailed diff
diff = await client.diff(
    source="main",
    target="etl-test/sales-aggregation-v2"
)

for table_diff in diff.table_diffs:
    print(f"\nTable: {table_diff.table_name}")
    print(f"  Added: {table_diff.added_count}")
    print(f"  Modified: {table_diff.modified_count}")
    print(f"  Deleted: {table_diff.deleted_count}")

6. Review and Merge

If validations pass:

# Merge to production
result = await client.merge(
    source="etl-test/sales-aggregation-v2",
    target="main"
)

if result.status == "success":
    print("ETL changes merged to production!")
    # Clean up test branch
    await client.delete_branch("etl-test/sales-aggregation-v2")
else:
    print(f"Merge failed: {result.conflicts}")

Complete Example Script

#!/usr/bin/env python3
"""
ETL Branch Testing Example

Usage:
    python etl_branch_test.py
"""

import asyncio
from horizon_epoch import Client, Author, StorageBackend
from datetime import datetime

async def main():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Add warehouse storage if needed
        await client.add_storage(
            name="warehouse",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://localhost/data_warehouse"}
        )

        branch_name = f"etl-test/transform-{datetime.now():%Y%m%d-%H%M%S}"

        try:
            # 1. Create test branch
            print(f"Creating branch: {branch_name}")
            await client.branch(branch_name, start_point="main")

            # 2. Run transformation
            print("Running ETL transformation...")
            await client.checkout(branch_name)
            await run_transformation()
            await client.stage_all()
            await client.commit(
                message="Apply transformation",
                author=Author(name="ETL", email="etl@example.com")
            )

            # 3. Validate
            print("Validating results...")
            await validate_results()

            # 4. Review diff
            print("\nChanges summary:")
            diff = await client.diff(source="main", target=branch_name)
            for td in diff.table_diffs:
                print(f"  {td.table_name}: {td.total_changes} changes")

            # 5. Prompt for merge
            if input("\nMerge to production? (y/n): ").lower() == 'y':
                result = await client.merge(source=branch_name, target="main")
                if result.has_conflicts:
                    print(f"Conflicts: {result.conflict_count}")
                else:
                    print(f"Merged: {result.result_commit_id}")

        finally:
            # Clean up
            branches = await client.branches()
            if any(b.name == branch_name for b in branches):
                await client.delete_branch(branch_name)
                print(f"Cleaned up branch: {branch_name}")

async def run_transformation():
    # Your ETL logic here
    pass

async def validate_results():
    # Your validation logic here
    pass

if __name__ == "__main__":
    asyncio.run(main())

Best Practices

  1. Name branches descriptively - Include ETL name and timestamp
  2. Automate validation - Write comprehensive checks
  3. Keep transformations atomic - One logical change per branch
  4. Document expected changes - Record what the ETL should do
  5. Time-box testing - Don’t let test branches linger

Tutorial: Schema and Constraint Versioning

This tutorial demonstrates how to version database constraints alongside your data, including schema introspection, constraint diffing, merge conflict resolution, and DDL migration generation.

Prerequisites

  • Horizon Epoch installed (Installation Guide)
  • PostgreSQL running with a sample database
  • Basic familiarity with Git-like version control concepts

Part 1: Schema Introspection with Constraints

Step 1: Create a Table with Constraints

First, let’s create a table with various constraints in PostgreSQL:

-- Create enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');

-- Create customers table
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT uq_customers_email UNIQUE (email),
    CONSTRAINT chk_customers_email CHECK (email ~ '@')
);

-- Create orders table with foreign key
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status order_status DEFAULT 'pending',
    total NUMERIC(10,2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
        REFERENCES customers (id) ON DELETE CASCADE,
    CONSTRAINT chk_orders_total CHECK (total >= 0)
);

-- Create index
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status) WHERE status != 'delivered';

Step 2: Register Tables with Horizon Epoch

# Initialize repository
epoch init ecommerce-repo \
  --metadata-url "postgresql://localhost/horizon_epoch"

# Register tables (constraints are automatically detected)
epoch table add customers \
  --location "postgresql://localhost/mydb/public.customers"

epoch table add orders \
  --location "postgresql://localhost/mydb/public.orders"

# Initial commit captures schema with constraints
epoch commit -m "Initial schema with constraints"

Step 3: View Introspected Schema

# Show schema including constraints
epoch schema show customers

# Output:
# Table: customers
#
# Columns:
#   id          SERIAL       PRIMARY KEY
#   email       VARCHAR(255) NOT NULL
#   name        VARCHAR(255) NOT NULL
#   created_at  TIMESTAMP    DEFAULT CURRENT_TIMESTAMP
#
# Constraints:
#   uq_customers_email  UNIQUE (email)
#   chk_customers_email CHECK (email ~ '@')

Using Python:

import asyncio
from horizon_epoch import Client

async def view_constraints():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraints for a table
        constraints = await client.get_constraints("customers", branch="main")

        print(f"\nUnique Constraints: {len(constraints.unique_constraints)}")
        print(f"Check Constraints: {len(constraints.check_constraints)}")
        print(f"Foreign Keys: {len(constraints.foreign_keys)}")

asyncio.run(view_constraints())

Part 2: Constraint Diffing Between Branches

Step 1: Create a Feature Branch

epoch branch create feature/add-phone

Step 2: Modify Constraints on the Branch

Add a phone number column with constraints:

-- On feature/add-phone branch
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);
ALTER TABLE customers ADD CONSTRAINT uq_customers_phone UNIQUE (phone);
ALTER TABLE customers ADD CONSTRAINT chk_customers_phone
    CHECK (phone ~ '^\+?[0-9]{10,15}$');
epoch commit -m "Add phone column with constraints"

Step 3: View the Constraint Diff

# Compare branches
epoch diff main feature/add-phone --schema

# Output:
# Schema changes for 'customers':
#
# Added columns:
#   + phone VARCHAR(20)
#
# Added constraints:
#   + uq_customers_phone UNIQUE (phone)
#   + chk_customers_phone CHECK (phone ~ '^\+?[0-9]{10,15}$')

Using Python:

# Get constraint diff
diff = client.diff_schema("main", "feature/add-phone", table="customers")

print("Added constraints:")
for c in diff.constraints.added:
    print(f"  + {c.constraint_type}: {c.name}")

print("\nRemoved constraints:")
for c in diff.constraints.removed:
    print(f"  - {c.constraint_type}: {c.name}")

print("\nModified constraints:")
for m in diff.constraints.modified:
    print(f"  ~ {m.name}")
    for change in m.changes:
        print(f"      {change}")

Part 3: Constraint Merge Conflict Resolution

Constraint conflicts occur when the same constraint is modified differently on two branches.

Step 1: Create Conflicting Changes

# On main branch
epoch checkout main

-- Modify the email check constraint on main
ALTER TABLE customers DROP CONSTRAINT chk_customers_email;
ALTER TABLE customers ADD CONSTRAINT chk_customers_email
    CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

epoch commit -m "Stricter email validation"

# On feature branch (already has different changes)
epoch checkout feature/add-phone

-- Also modify the email check constraint differently
ALTER TABLE customers DROP CONSTRAINT chk_customers_email;
ALTER TABLE customers ADD CONSTRAINT chk_customers_email
    CHECK (email LIKE '%@%.%');

epoch commit -m "Simpler email validation"

Step 2: Attempt Merge

epoch checkout main
epoch merge feature/add-phone

# Output:
# CONFLICT (constraint): chk_customers_email modified in both branches
#
# On main:
#   CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
#
# On feature/add-phone:
#   CHECK (email LIKE '%@%.%')
#
# Automatic merge failed. Fix conflicts and commit.

Step 3: View Conflicts

epoch status

# Output:
# On branch main
# You have unmerged paths.
#
# Constraint conflicts:
#   (use "epoch resolve <constraint>" to mark resolution)
#
#   both modified:   customers.chk_customers_email

Using Python:

# Get merge conflicts
merge_result = client.merge("feature/add-phone", "main", dry_run=True)

for conflict in merge_result.constraint_conflicts:
    print(f"Conflict: {conflict.constraint_name}")
    print(f"  Type: {conflict.constraint_type}")
    print(f"  Base: {conflict.base}")
    print(f"  Ours: {conflict.ours}")
    print(f"  Theirs: {conflict.theirs}")
    print(f"  Differences: {conflict.differences}")

Step 4: Resolve Conflicts

Option A: Use ours (main branch version)

epoch resolve customers.chk_customers_email --ours
epoch commit -m "Merge feature/add-phone with stricter email validation"

Option B: Use theirs (feature branch version)

epoch resolve customers.chk_customers_email --theirs
epoch commit -m "Merge feature/add-phone with simpler email validation"

Option C: Use custom resolution

# Create custom constraint
epoch resolve customers.chk_customers_email --custom \
    "CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$')"
epoch commit -m "Merge feature/add-phone with balanced email validation"

Using Python:

from horizon_epoch import ConstraintResolution

# Resolve using ours
client.resolve_constraint(
    "customers",
    "chk_customers_email",
    resolution=ConstraintResolution.UseOurs
)

# Or use theirs
client.resolve_constraint(
    "customers",
    "chk_customers_email",
    resolution=ConstraintResolution.UseTheirs
)

# Or provide custom
from horizon_epoch import CheckConstraint

custom_constraint = CheckConstraint(
    name="chk_customers_email",
    expression="email ~ '^[^@]+@[^@]+\\.[^@]+$'"
)
client.resolve_constraint(
    "customers",
    "chk_customers_email",
    resolution=ConstraintResolution.UseCustom(custom_constraint)
)

# Complete the merge
client.commit(message="Merge with resolved constraint conflict")

Part 4: DDL Migration Generation

Horizon Epoch can generate migration scripts for schema changes.

Step 1: View Pending Changes

# Show what migrations would be generated
epoch schema diff main feature/new-indexes --ddl

# Output:
# -- Migration: main -> feature/new-indexes
#
# -- UP
# CREATE INDEX CONCURRENTLY idx_customers_name ON customers (name);
# ALTER TABLE orders ADD CONSTRAINT chk_orders_status
#     CHECK (status IN ('pending', 'processing', 'shipped', 'delivered'));
#
# -- DOWN
# DROP INDEX CONCURRENTLY IF EXISTS idx_customers_name;
# ALTER TABLE orders DROP CONSTRAINT IF EXISTS chk_orders_status;

Step 2: Generate Migration Script

# Generate migration file
epoch schema migrate main feature/new-indexes \
    --output migrations/V2__add_indexes.sql

# Output:
# Generated migration: migrations/V2__add_indexes.sql
#   UP: 2 statements
#   DOWN: 2 statements
#   Warnings: 0

Using Python:

from horizon_epoch import PostgresDdlGenerator

# Get the schema diff
diff = client.diff_schema("main", "feature/new-indexes")

# Generate migration
generator = PostgresDdlGenerator()
migration = generator.generate_migration("customers", diff.constraints)

print("UP statements:")
for stmt in migration.up:
    print(f"  {stmt}")

print("\nDOWN statements:")
for stmt in migration.down:
    print(f"  {stmt}")

if migration.warnings:
    print("\nWarnings:")
    for warn in migration.warnings:
        print(f"  ! {warn}")

# Write to file
with open("migrations/V2__add_indexes.sql", "w") as f:
    f.write("-- UP\n")
    for stmt in migration.up:
        f.write(f"{stmt};\n")
    f.write("\n-- DOWN\n")
    for stmt in migration.down:
        f.write(f"{stmt};\n")

Step 3: Understanding Migration Ordering

The DDL generator follows a specific order to handle dependencies:

UP (Apply) Order:

  1. Create new enum types (before columns using them)
  2. Add values to modified enums
  3. Drop constraints being modified/removed (FK first)
  4. Drop indexes being modified/removed
  5. Handle primary key changes
  6. Add new/modified constraints (check, unique, then FK last)
  7. Create new indexes

DOWN (Reverse) Order:

  1. Drop new indexes
  2. Drop new constraints
  3. Re-add dropped indexes
  4. Re-add dropped constraints
  5. Re-add dropped enum types (with warnings)
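
Conceptually, the ordering can be implemented by assigning each operation a phase and sorting, as in this simplified sketch (the phase numbers mirror the UP order above; the real generator also tracks dependencies between objects):

# Simplified phase-based ordering of UP statements (illustrative only)
UP_PHASE = {
    "create_enum": 1, "alter_enum_add_value": 2,
    "drop_constraint": 3, "drop_index": 4,
    "alter_primary_key": 5, "add_constraint": 6, "create_index": 7,
}

def order_up(operations):
    # operations: (kind, sql) pairs in arbitrary order
    return [sql for _, sql in sorted((UP_PHASE[kind], sql) for kind, sql in operations)]

ops = [
    ("create_index", "CREATE INDEX idx_customers_name ON customers (name)"),
    ("add_constraint", "ALTER TABLE orders ADD CONSTRAINT chk_orders_total CHECK (total >= 0)"),
    ("create_enum", "CREATE TYPE order_status AS ENUM ('pending', 'shipped')"),
]
for stmt in order_up(ops):
    print(stmt)  # enum first, then constraint, then index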

Step 4: Check for Destructive Changes

# Check if migration has destructive changes
migration = generator.generate_migration("orders", diff.constraints)

if migration.has_destructive_changes():
    print("WARNING: Migration contains destructive changes!")
    for warn in migration.warnings:
        print(f"  {warn}")

Destructive changes include:

  • Dropping enum values (data may reference them)
  • Dropping foreign keys to tables being dropped
  • Removing NOT NULL on columns with NULL values

Part 5: Handling Enum Type Changes

Enum types require special handling because PostgreSQL only allows additive changes.

Step 1: Add Enum Values

epoch checkout main

-- Add new enum value (safe operation)
ALTER TYPE order_status ADD VALUE 'cancelled' AFTER 'delivered';

epoch commit -m "Add cancelled status"

Step 2: Check Enum Safety

# Get enum diff
diff = client.diff_schema("main", "feature-branch")

for enum_diff in diff.enum_types:
    if enum_diff.is_additive():
        print(f"Enum {enum_diff.name}: Safe to merge (additive only)")
    else:
        print(f"Enum {enum_diff.name}: UNSAFE - values removed or reordered")
        print(f"  Removed: {enum_diff.removed}")
        print(f"  Reordered: {enum_diff.reordered}")

Step 3: Handle Non-Additive Enum Changes

If you need to remove or reorder enum values, you must:

  1. Create a new enum type
  2. Migrate data to the new type
  3. Drop the old type
  4. Rename the new type

-- Create new enum
CREATE TYPE order_status_new AS ENUM ('pending', 'processing', 'shipped');

-- Migrate data (handle 'delivered' and 'cancelled' values)
UPDATE orders SET status = 'shipped'
    WHERE status IN ('delivered', 'cancelled');

-- Change column type
ALTER TABLE orders ALTER COLUMN status TYPE order_status_new
    USING status::text::order_status_new;

-- Drop old, rename new
DROP TYPE order_status;
ALTER TYPE order_status_new RENAME TO order_status;

Best Practices

1. Always Review Constraint Diffs Before Merge

# Check what constraints changed
epoch diff main feature-branch --schema --constraints-only

2. Use Dry-Run for Merges

# Preview merge without applying
result = client.merge("feature", "main", dry_run=True)
if result.has_conflicts:
    print("Resolve these conflicts before merging:")
    for c in result.conflicts:
        print(f"  {c}")

3. Generate and Review Migrations

# Always review generated DDL
epoch schema migrate main feature --output migration.sql --dry-run
cat migration.sql

4. Test Migrations on Branch Data

# Create test branch
epoch branch create test/migration

# Apply migration
psql -f migration.sql

# Verify data integrity
epoch validate test/migration

5. Use Semantic Constraint Names

Follow a consistent naming convention:

  • Foreign keys: fk_{table}_{column} or fk_{table}_{referenced_table}
  • Unique: uq_{table}_{columns}
  • Check: chk_{table}_{description}
  • Index: idx_{table}_{columns}
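
A small helper can enforce this convention in code generation or reviews; a hypothetical sketch (not part of the Horizon Epoch SDK):

# Hypothetical helper for the naming convention above
PREFIXES = {"foreign_key": "fk", "unique": "uq", "check": "chk", "index": "idx"}

def constraint_name(kind: str, table: str, detail: str) -> str:
    return f"{PREFIXES[kind]}_{table}_{detail}"

assert constraint_name("unique", "customers", "email") == "uq_customers_email"
assert constraint_name("index", "orders", "customer_id") == "idx_orders_customer_id"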

How to Connect PostgreSQL

Basic Connection

import asyncio
from horizon_epoch import Client

async def main():
    async with Client.connect("postgresql://user:pass@localhost/horizon_epoch") as client:
        # Client is now connected to metadata database
        await client.init("my-repo")

asyncio.run(main())

CLI Connection

# Set connection in environment
export EPOCH_METADATA_URL="postgresql://user:pass@localhost/horizon_epoch"

# Or use connection string directly
epoch init my-repo --metadata-url "postgresql://localhost/horizon_epoch"

# Register PostgreSQL table
epoch table add users \
    --location "postgresql://localhost/mydb/public.users"

Register a PostgreSQL Table

from horizon_epoch.client import _native

# Track a PostgreSQL table
loc = _native.StorageLocation.postgresql("main", "public", "users")
await client.track_table("users", loc)

# From a specific schema
loc = _native.StorageLocation.postgresql("main", "sales", "orders")
await client.track_table("orders", loc)

Add PostgreSQL as Storage Backend

# CLI - add storage backend
epoch storage add mydb \
    --type postgresql \
    --url "postgresql://user:pass@localhost/production"

from horizon_epoch import StorageBackend

await client.add_storage(
    name="mydb",
    backend=StorageBackend.POSTGRESQL,
    config={"url": "postgresql://user:pass@localhost/production"}
)

With SSL

# CLI with SSL
epoch storage add mydb \
    --type postgresql \
    --url "postgresql://user:pass@host/db?sslmode=require"

For stricter SSL verification:

epoch storage add mydb \
    --type postgresql \
    --url "postgresql://user:pass@host/db?sslmode=verify-full&sslrootcert=/path/to/ca.crt"

Connection String Format

postgresql://[user[:password]@][host][:port][/database][?param=value&...]

Common parameters:

  • sslmode - disable, require, verify-ca, verify-full
  • sslrootcert - Path to root certificate
  • sslcert - Path to client certificate
  • sslkey - Path to client key
  • connect_timeout - Connection timeout in seconds
  • application_name - Application name for logging
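
Credentials containing special characters must be percent-encoded or the URL will not parse. A small sketch using only the Python standard library to assemble a safe DSN:

from urllib.parse import quote, urlencode

def postgres_dsn(user, password, host, database, port=5432, **params):
    # percent-encode user and password so '@', ':' or '/' don't break parsing
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = f"?{urlencode(params)}" if params else ""
    return f"postgresql://{auth}@{host}:{port}/{database}{query}"

print(postgres_dsn("etl user", "p@ss:word", "localhost", "horizon_epoch",
                   sslmode="verify-full", sslrootcert="/path/to/ca.crt"))
# postgresql://etl%20user:p%40ss%3Aword@localhost:5432/horizon_epoch?sslmode=verify-full&sslrootcert=%2Fpath%2Fto%2Fca.crt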

Advanced Authentication

For enterprise environments requiring additional security, options such as SSH tunnels and managed secrets are covered in their own guides.

Troubleshooting

Connection Refused

Error: connection refused (os error 111)

  • Verify PostgreSQL is running: pg_isready -h localhost
  • Check host and port in connection string
  • Verify firewall rules

Authentication Failed

Error: password authentication failed

  • Verify username and password
  • Check pg_hba.conf for allowed authentication methods
  • Ensure user has access to the database

SSL Errors

Error: SSL connection required

Add ?sslmode=require to connection string, or for self-signed certificates:

?sslmode=verify-ca&sslrootcert=/path/to/ca.crt

Connect MySQL

Connect Horizon Epoch to MySQL databases.

CLI Configuration

# Add MySQL storage backend
epoch storage add mydb \
    --type mysql \
    --url "mysql://user:pass@localhost:3306/database"

Connection String Format

mysql://[user[:password]@][host][:port]/database[?param=value&...]

Common parameters:

  • ssl-mode - disabled, preferred, required, verify_ca, verify_identity
  • ssl-ca - Path to CA certificate
  • connect_timeout - Connection timeout in seconds

SSL/TLS Modes

Mode            | Description
disabled        | No SSL
preferred       | Use SSL if available (default)
required        | Require SSL, don’t verify certificate
verify_ca       | Require SSL, verify CA
verify_identity | Require SSL, verify CA and hostname

With SSL

epoch storage add mydb \
    --type mysql \
    --url "mysql://user:pass@host/db?ssl-mode=required"

With Certificate Verification

epoch storage add mydb \
    --type mysql \
    --url "mysql://user:pass@host/db?ssl-mode=verify_ca&ssl-ca=/path/to/ca.pem"

Register Tables

epoch table add customers \
    --location "mysql://mydb/customers"

SSH Tunnel Support

MySQL connections support SSH tunneling for secure access:

epoch storage add mydb \
    --type mysql \
    --url "mysql://user:pass@localhost:3306/db" \
    --ssh-host "bastion.example.com" \
    --ssh-user "ec2-user" \
    --ssh-key "~/.ssh/id_rsa"

See SSH Tunnels for more details.

AWS RDS MySQL

# Standard connection
epoch storage add rds \
    --type mysql \
    --url "mysql://admin:pass@mydb.cluster.us-east-1.rds.amazonaws.com:3306/mydb?ssl-mode=required"

For IAM authentication, see AWS Secrets.

Troubleshooting

Connection Refused

  • Verify MySQL is running and accepting connections
  • Check host, port, and firewall rules
  • Verify user has remote access permissions

SSL Errors

  • Ensure SSL mode matches server configuration
  • For self-signed certs, use verify_ca with ssl-ca parameter

Authentication Failed

  • Verify username and password
  • Check user has access from your host ('user'@'%' vs 'user'@'localhost')

Connect Microsoft SQL Server

Connect Horizon Epoch to Microsoft SQL Server databases.

CLI Configuration

# Add SQL Server storage backend
epoch storage add mydb \
    --type mssql \
    --url "mssql://user:pass@localhost:1433/database"

Connection String Format

mssql://[user[:password]@][host][:port]/database[?param=value&...]

Common parameters:

  • encrypt - off, on, required
  • trust_server_certificate - true/false
  • application_name - Application name for logging

Encryption Modes

Mode     | Description
off      | No encryption
on       | Encrypt if server supports it
required | Require encryption (recommended)

With Encryption

epoch storage add mydb \
    --type mssql \
    --url "mssql://user:pass@host/db?encrypt=required"

Trust Self-Signed Certificate

epoch storage add mydb \
    --type mssql \
    --url "mssql://user:pass@host/db?encrypt=required&trust_server_certificate=true"

Authentication Methods

SQL Server Authentication

epoch storage add mydb \
    --type mssql \
    --url "mssql://sa:password@localhost/database"

Windows Authentication

epoch storage add mydb \
    --type mssql \
    --url "mssql://localhost/database?integrated_security=true"

Register Tables

epoch table add customers \
    --location "mssql://mydb/dbo.customers"

Azure SQL Database

epoch storage add azure \
    --type mssql \
    --url "mssql://user@server:pass@server.database.windows.net:1433/database?encrypt=required"

Troubleshooting

Connection Refused

  • Verify SQL Server is running and TCP/IP is enabled
  • Check SQL Server Configuration Manager for network settings
  • Verify firewall allows port 1433

Login Failed

  • Verify SQL Server authentication is enabled (mixed mode)
  • Check username and password
  • Verify user has access to the database

Encryption Errors

  • Ensure encrypt setting matches server configuration
  • For self-signed certs, use trust_server_certificate=true

Connect SQLite

Connect Horizon Epoch to SQLite databases.

CLI Configuration

# File-based database
epoch storage add local \
    --type sqlite \
    --path "/path/to/database.db"

# In-memory database (for testing)
epoch storage add test \
    --type sqlite \
    --path ":memory:"

Configuration Options

Option       | Description                    | Default
journal_mode | WAL, DELETE, TRUNCATE, etc.    | WAL
foreign_keys | Enable foreign key enforcement | true
busy_timeout | Lock wait timeout (ms)         | 5000

epoch storage add local \
    --type sqlite \
    --path "/data/mydb.db" \
    --journal-mode wal

Enable Foreign Keys

epoch storage add local \
    --type sqlite \
    --path "/data/mydb.db" \
    --foreign-keys true

Register Tables

epoch table add users \
    --location "sqlite://local/users"

Use Cases

Development/Testing

SQLite is ideal for local development:

# Create a local test database
epoch storage add dev \
    --type sqlite \
    --path "./test.db"

epoch init my-repo --storage dev

Embedded Applications

For applications that need embedded data versioning without external databases.

Edge/Offline

SQLite works well for edge deployments or offline-first applications.

Constraint Support

SQLite has partial constraint support:

Constraint  | Support
Primary Key | Full
Unique      | Full
Foreign Key | Requires foreign_keys=true
Check       | Full
Index       | Full

Foreign keys must be explicitly enabled:

epoch storage add local \
    --type sqlite \
    --path "/data/mydb.db" \
    --foreign-keys true
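
The default-off behavior is easy to confirm with Python’s built-in sqlite3 module:

import sqlite3

conn = sqlite3.connect(":memory:")
print(conn.execute("PRAGMA foreign_keys").fetchone())  # (0,) -> off by default
conn.execute("PRAGMA foreign_keys = ON")               # enable per connection
print(conn.execute("PRAGMA foreign_keys").fetchone())  # (1,) -> enforced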

Troubleshooting

Database Locked

  • Increase busy_timeout for high-concurrency scenarios
  • Use WAL mode for better concurrent access
  • Ensure only one process writes at a time

Foreign Key Violations Not Detected

  • Ensure foreign_keys=true is set
  • Foreign key enforcement is off by default in SQLite

Permission Denied

  • Check file and directory permissions
  • Ensure the directory exists

How to Connect S3

Connect Horizon Epoch to S3-compatible storage for Delta Lake tables.

CLI Configuration

# Environment variables
export EPOCH_S3_ENDPOINT="http://localhost:9000"
export EPOCH_S3_BUCKET="horizon-data"
export EPOCH_S3_ACCESS_KEY="minioadmin"
export EPOCH_S3_SECRET_KEY="minioadmin"

# Add S3 storage backend
epoch storage add datalake \
    --type s3 \
    --bucket my-data-bucket \
    --region us-east-1

# Register a Delta table
epoch table add sales \
    --location "s3://my-bucket/delta/sales"

Python SDK

import asyncio
from horizon_epoch import Client, StorageBackend

async def main():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.init("my-repo")

        # Add S3 storage backend
        await client.add_storage(
            name="datalake",
            backend=StorageBackend.S3,
            config={
                "bucket": "my-data-bucket",
                "region": "us-east-1"
            }
        )

asyncio.run(main())

Using Environment Variables

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

The SDK will use AWS environment variables automatically.

MinIO / S3-Compatible Endpoints

# CLI
epoch storage add minio \
    --type s3 \
    --bucket horizon-data \
    --endpoint "http://localhost:9000" \
    --force-path-style

await client.add_storage(
    name="minio",
    backend=StorageBackend.S3,
    config={
        "bucket": "horizon-data",
        "endpoint": "http://localhost:9000",
        "access_key": "minioadmin",
        "secret_key": "minioadmin",
        "force_path_style": True
    }
)

IAM Roles (EC2/ECS/EKS)

When running on AWS infrastructure with IAM roles:

await client.add_storage(
    name="datalake",
    backend=StorageBackend.S3,
    config={
        "bucket": "my-data-bucket",
        "use_instance_credentials": True
    }
)

Assume Role

For cross-account access:

await client.add_storage(
    name="cross-account",
    backend=StorageBackend.S3,
    config={
        "bucket": "other-account-bucket",
        "assume_role_arn": "arn:aws:iam::123456789012:role/HorizonEpochAccess",
        "external_id": "optional-external-id"
    }
)

Registering Delta Tables

# CLI
epoch table add sales \
    --location "s3://my-bucket/delta/sales"

from horizon_epoch.client import _native

loc = _native.StorageLocation.s3("my-bucket", "delta/sales")
await client.track_table("sales", loc)

Table Paths

Tables are stored as Delta Lake tables:

s3://my-bucket/
  delta/
    sales/
      _delta_log/        # Delta transaction log
      part-00000.parquet
      part-00001.parquet
    users/
      _delta_log/
      part-00000.parquet

Configuration File

# ~/.epoch/config.toml
[storage.s3.default]
bucket = "horizon-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"

Advanced Authentication

For enterprise environments, see the IAM role and assume-role configurations above, or manage credentials through a secrets manager.

Required Bucket Permissions

Minimum IAM permissions needed:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}

Troubleshooting

Access Denied

  • Verify IAM permissions include all required actions
  • Check bucket policy allows access from your account/role
  • Ensure bucket region matches configuration

Endpoint Not Found

  • For MinIO, ensure force_path_style=True is set
  • Verify the endpoint URL is correct
  • Check network connectivity to the endpoint

Slow Operations

  • Consider enabling transfer acceleration for large files
  • Use regional endpoints when possible
  • Check for network latency to the S3 region

Connect Azure Blob Storage

Connect Horizon Epoch to Azure Blob Storage for Delta Lake tables.

CLI Configuration

# Add Azure storage backend
epoch storage add datalake \
    --type azure \
    --account "mystorageaccount" \
    --container "data"

Authentication Methods

Account Key

epoch storage add datalake \
    --type azure \
    --account "mystorageaccount" \
    --container "data" \
    --account-key "your-account-key"

Or via environment variable:

export AZURE_STORAGE_ACCOUNT_KEY="your-account-key"

epoch storage add datalake \
    --type azure \
    --account "mystorageaccount" \
    --container "data"

SAS Token

epoch storage add datalake \
    --type azure \
    --account "mystorageaccount" \
    --container "data" \
    --sas-token "?sv=2021-06-08&ss=b&srt=sco..."

Service Principal

export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
export AZURE_TENANT_ID="your-tenant-id"

epoch storage add datalake \
    --type azure \
    --account "mystorageaccount" \
    --container "data" \
    --auth-method service_principal

Managed Identity

For Azure VMs, AKS, or App Service with managed identity:

epoch storage add datalake \
    --type azure \
    --account "mystorageaccount" \
    --container "data" \
    --auth-method managed_identity

Register Delta Tables

epoch table add events \
    --location "azure://datalake/delta/events"

Table Storage Structure

Tables are stored as Delta Lake format:

container/
  delta/
    events/
      _delta_log/
      part-00000.parquet
      part-00001.parquet

Configuration File

# ~/.epoch/config.toml
[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key"

Required Permissions

Minimum permissions for the storage account:

  • Storage Blob Data Reader (read operations)
  • Storage Blob Data Contributor (read/write operations)

Troubleshooting

AuthorizationFailure

  • Verify account name and credentials
  • Check container exists
  • Ensure identity has proper RBAC roles

Container Not Found

  • Verify container name is correct
  • Check storage account name

Slow Operations

  • Use regional endpoints when possible
  • Consider Azure Private Link for private network access

Connect Google Cloud Storage

Connect Horizon Epoch to Google Cloud Storage for Delta Lake tables.

CLI Configuration

# Add GCS storage backend
epoch storage add datalake \
    --type gcs \
    --bucket "my-gcs-bucket" \
    --project "my-project-id"

Authentication Methods

Service Account Key

# Set credentials file path
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

epoch storage add datalake \
    --type gcs \
    --bucket "my-gcs-bucket" \
    --project "my-project-id"

Or specify directly:

epoch storage add datalake \
    --type gcs \
    --bucket "my-gcs-bucket" \
    --project "my-project-id" \
    --credentials-file "/path/to/service-account.json"

Application Default Credentials (ADC)

For GCE, GKE, Cloud Run, or local development with gcloud auth:

# Local development: authenticate with gcloud
gcloud auth application-default login

epoch storage add datalake \
    --type gcs \
    --bucket "my-gcs-bucket" \
    --project "my-project-id" \
    --auth-method adc

Workload Identity (GKE)

For GKE with Workload Identity configured:

epoch storage add datalake \
    --type gcs \
    --bucket "my-gcs-bucket" \
    --project "my-project-id" \
    --auth-method adc

Register Delta Tables

epoch table add events \
    --location "gcs://datalake/delta/events"

Table Storage Structure

Tables are stored as Delta Lake format:

bucket/
  delta/
    events/
      _delta_log/
      part-00000.parquet
      part-00001.parquet

Configuration File

# ~/.epoch/config.toml
[storage.gcs.datalake]
bucket = "my-gcs-bucket"
project_id = "my-project-id"
prefix = "horizon-epoch/"
auth_method = "service_account"
credentials_file = "/path/to/service-account.json"

Required Permissions

Minimum IAM permissions for the service account:

roles/storage.objectViewer  # Read operations
roles/storage.objectCreator # Write operations
roles/storage.objectAdmin   # Full access (recommended)

Or custom role with:

  • storage.objects.get
  • storage.objects.list
  • storage.objects.create
  • storage.objects.delete

Troubleshooting

Permission Denied

  • Verify service account has required IAM roles
  • Check bucket-level permissions
  • Ensure project ID is correct

Bucket Not Found

  • Verify bucket name is correct
  • Check bucket is in the correct project

Credentials Not Found

  • Set GOOGLE_APPLICATION_CREDENTIALS environment variable
  • Or use --credentials-file parameter
  • For ADC, run gcloud auth application-default login

Connect Local Filesystem

Connect Horizon Epoch to local filesystem storage for Delta Lake tables.

CLI Configuration

# Add local filesystem storage backend
epoch storage add local \
    --type local \
    --path "/data/horizon-epoch"

Use Cases

Development and Testing

Local storage is ideal for development:

epoch storage add dev \
    --type local \
    --path "./data"

epoch init my-repo --storage dev

Single-Machine Deployments

For applications that don’t need distributed storage:

epoch storage add data \
    --type local \
    --path "/var/lib/horizon-epoch"

Air-Gapped Environments

For environments without cloud access:

epoch storage add offline \
    --type local \
    --path "/mnt/data/epoch"

Register Delta Tables

epoch table add events \
    --location "local://local/events"

Table Storage Structure

Tables are stored in Delta Lake format:

/data/horizon-epoch/
  events/
    _delta_log/
      00000000000000000000.json
    part-00000-*.parquet
    part-00001-*.parquet
  users/
    _delta_log/
    part-00000-*.parquet

Configuration File

# ~/.epoch/config.toml
[storage.local.dev]
path = "/data/horizon-epoch"

Features

  • Delta Lake format - Same format as S3/Azure/GCS
  • SQL queries - Query via DataFusion
  • Copy-on-write - Efficient branching
  • Time travel - Access historical versions

Permissions

Ensure the Epoch process has read/write access:

# Create directory with proper permissions
mkdir -p /data/horizon-epoch
chown -R epoch:epoch /data/horizon-epoch
chmod 750 /data/horizon-epoch

Troubleshooting

Permission Denied

  • Check directory exists and is writable
  • Verify process user has access

Disk Full

  • Monitor disk space
  • Use epoch gc to clean up old versions
  • Configure retention policies

Path Not Found

  • Ensure parent directory exists
  • Use absolute paths to avoid ambiguity

How to Resolve Conflicts

When merging branches, conflicts occur when both branches modify the same record’s fields. This guide explains how to identify, understand, and resolve conflicts.

Understanding Conflicts

A conflict happens when:

  1. Both branches modify the same record (identified by primary key)
  2. Both branches change the same field to different values

Example:

  • Branch A changes user 42’s email to alice@company.com
  • Branch B changes user 42’s email to alice.smith@example.com
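
In code terms, the three-way conflict test reduces to a simple field-level predicate. The sketch below is illustrative only, not Horizon Epoch's actual implementation:

def is_conflict(base, ours, theirs):
    # A field conflicts only when both branches changed it relative to
    # the common base and they disagree on the new value
    return ours != base and theirs != base and ours != theirs

print(is_conflict("alice@example.com",
                  "alice@company.com",
                  "alice.smith@example.com"))  # True: both sides changed it
print(is_conflict("old", "old", "new"))        # False: only one side changed it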

Detecting Conflicts

When you run a merge:

epoch merge feature/branch

If conflicts exist:

CONFLICT: Merge conflict in users
Automatic merge failed; fix conflicts and then commit.

Viewing Conflicts

List Conflicting Tables

epoch status

Output:

On branch main
You have unmerged paths.

Unmerged paths:
  both modified:   users (3 conflicts)
  both modified:   orders (1 conflict)

View Conflict Details

epoch conflicts show

Output:

Table: users
------------------------------------------------------------
Record ID: 42
  Field: email
    BASE:   alice@example.com
    OURS:   alice@company.com
    THEIRS: alice.smith@example.com

  Field: phone
    BASE:   555-0100
    OURS:   555-0100
    THEIRS: 555-0199

Record ID: 87
  Field: status
    BASE:   active
    OURS:   inactive
    THEIRS: suspended
------------------------------------------------------------

Export Conflicts to File

epoch conflicts show --format json > conflicts.json

Resolution Methods

1. Interactive Resolution

epoch conflicts resolve --interactive

This walks you through each conflict:

Conflict 1 of 4: users.email (record 42)
  BASE:   alice@example.com
  OURS:   alice@company.com
  THEIRS: alice.smith@example.com

Choose resolution:
  [o] Keep ours (alice@company.com)
  [t] Keep theirs (alice.smith@example.com)
  [b] Keep base (alice@example.com)
  [c] Enter custom value
  [s] Skip (resolve later)
>

2. Accept All from One Side

Accept all changes from your branch (target):

epoch conflicts resolve --ours

Accept all changes from the source branch:

epoch conflicts resolve --theirs

3. Per-Table Resolution

# Accept theirs for users table
epoch conflicts resolve users --theirs

# Accept ours for orders table
epoch conflicts resolve orders --ours

4. Per-Record Resolution

# Keep our version of record 42
epoch conflicts resolve users --record 42 --ours

# Keep their version of record 87
epoch conflicts resolve users --record 87 --theirs

5. Per-Field Resolution

# Set specific value for a field
epoch conflicts resolve users \
    --record 42 \
    --field email \
    --value "alice.final@company.com"

6. Python SDK Resolution

# Get conflicts
conflicts = client.get_conflicts()

for conflict in conflicts:
    print(f"Table: {conflict.table_name}")
    print(f"Record: {conflict.record_id}")

    for field in conflict.fields:
        print(f"  {field.name}:")
        print(f"    base={field.base_value}")
        print(f"    ours={field.ours_value}")
        print(f"    theirs={field.theirs_value}")

        # Resolve programmatically
        client.resolve_conflict(
            table=conflict.table_name,
            record_id=conflict.record_id,
            field=field.name,
            value=determine_winner(field)  # Your logic
        )

# Continue merge
client.merge_continue()
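
determine_winner above stands in for your own resolution policy. A minimal sketch (purely illustrative) that keeps whichever side actually changed the field, and prefers the incoming branch on a true conflict:

def determine_winner(field):
    # Only one side changed the field: keep that change
    if field.ours_value == field.base_value:
        return field.theirs_value
    if field.theirs_value == field.base_value:
        return field.ours_value
    # Both sides changed it: prefer the incoming (source) branch
    return field.theirs_value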

Completing the Merge

After resolving all conflicts:

epoch merge --continue

Or if you want to abandon the merge:

epoch merge --abort

Conflict Prevention Strategies

  1. Merge frequently - Smaller, more frequent merges have fewer conflicts
  2. Communicate - Coordinate with team on who owns which data
  3. Use feature flags - Test changes without modifying shared records
  4. Partition work - Different team members work on different record sets

Best Practices

  1. Understand the context - Read the commit messages from both branches
  2. Check business rules - Some conflicts have a “correct” answer based on business logic
  3. Document decisions - Add context in your merge commit message
  4. Verify after merge - Run validation queries after resolving conflicts

Example Workflow

# 1. Start merge
epoch merge feature/user-updates

# 2. See what conflicted
epoch conflicts show

# 3. Resolve interactively
epoch conflicts resolve --interactive

# 4. Verify resolutions
epoch status
epoch diff HEAD

# 5. Complete merge with descriptive message
epoch merge --continue -m "Merge user-updates: resolved email conflicts using company domain policy"

How to Migrate from Snapshots

If you’re currently using database snapshots or manual backups for versioning, this guide helps you migrate to Horizon Epoch.

Current Snapshot-Based Workflow

Many teams use approaches like:

  • Daily database dumps
  • PostgreSQL pg_dump snapshots
  • AWS RDS snapshots
  • Manual CREATE TABLE ... AS SELECT copies

These approaches have limitations:

  • Full data copies are expensive
  • No granular change tracking
  • Difficult to compare versions
  • Merging changes is manual

Migration Overview

  1. Initialize a Horizon Epoch repository
  2. Import your existing data as the initial commit
  3. Set up tables for tracking
  4. Train your team on the new workflow

Step-by-Step Migration

1. Prepare Your Environment

# Start services
docker compose -f docker/docker-compose.yml up -d

# Build Horizon Epoch from source (see Installation guide)
cargo build --release

2. Initialize Repository

epoch init production-data \
    --metadata-url "postgresql://localhost/horizon_epoch" \
    --description "Migrated from daily snapshots"

3. Register Existing Tables

# Register each table you want to version
epoch table add customers \
    --location "postgresql://localhost/mydb/public.customers"

epoch table add orders \
    --location "postgresql://localhost/mydb/public.orders"

epoch table add products \
    --location "postgresql://localhost/mydb/public.products"

4. Create Initial Commit

epoch commit -m "Initial import from production snapshot 2024-01-15"

5. Tag Important Snapshots

If you have historical snapshots you want to reference:

# Tag the current state
epoch tag snapshot-2024-01-15 \
    --message "Migration baseline from daily snapshot"

6. Import Historical Snapshots (Optional)

If you have historical data you want to preserve:

import asyncio
from horizon_epoch import Client, Author

async def import_snapshots():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # For each historical snapshot; historical_dates, restore_snapshot,
        # and temp_db below are placeholders for your own backup tooling
        for snapshot_date in historical_dates:
            # Restore snapshot to a temporary database
            restore_snapshot(snapshot_date, temp_db)

            # Create a branch for this point in time
            branch_name = f"history/{snapshot_date}"
            await client.branch(branch_name)

            # Register tables from restored snapshot
            # (pointing to temp database)

            # Commit
            await client.commit(
                message=f"Historical snapshot: {snapshot_date}",
                author=Author(name="Migration", email="ops@example.com")
            )

            # Tag for easy reference
            await client.tag_create(
                name=f"snapshot-{snapshot_date}",
                message="Imported from backup"
            )

asyncio.run(import_snapshots())

Mapping Snapshot Workflows

Daily Backup Replacement

Before:

# Nightly cron job
pg_dump mydb > /backups/mydb-$(date +%Y%m%d).sql

After:

# Nightly cron job
epoch commit -m "Daily snapshot $(date +%Y-%m-%d)"
epoch tag daily-$(date +%Y%m%d)

Pre-Change Backup

Before:

# Before making changes
pg_dump mydb > /backups/pre-migration.sql
# Make changes
# If something goes wrong, restore from pre-migration.sql

After:

# Before making changes
epoch tag pre-migration-$(date +%Y%m%d)
# Make changes
epoch commit -m "Applied migration X"
# If something goes wrong
epoch reset --hard pre-migration-$(date +%Y%m%d)

Environment Copies

Before:

# Create staging from production
pg_dump prod_db | psql staging_db

After:

# Create staging branch from production
epoch branch create staging --from main
# Staging now has zero-copy access to production data

Handling Large Datasets

For very large databases:

1. Incremental Registration

Register tables in batches:

# Critical tables first (assumes a default storage backend is configured)
epoch table add customers orders products
epoch commit -m "Core business tables"

# Then secondary tables
epoch table add logs analytics events
epoch commit -m "Add operational tables"

2. Exclude Large/Non-Critical Tables

Some tables might not need versioning:

# Configure exclusions via CLI
# epoch config set exclude_patterns "audit_logs_*,*_archive,temp_*"

# Or skip these tables when registering
# Only register the tables you want to version

3. Use Partitioned Commits

For very large initial imports:

# Commit in chunks
epoch commit --tables customers,orders -m "Batch 1: customer data"
epoch commit --tables products,inventory -m "Batch 2: product data"

Validation

After migration, verify everything works:

# Check repository status
epoch status

# View commit history
epoch log

# Verify tables are registered
epoch table list

# Test branching
epoch branch create test-migration
epoch checkout test-migration
# Make a small change, commit, merge back
epoch checkout main
epoch merge test-migration
epoch branch delete test-migration

Team Training

Key concepts to communicate:

  1. Branches replace copies - No more copying entire databases
  2. Commits are lightweight - Only changes are stored
  3. Tags mark important points - Like naming a backup
  4. Merging combines changes - No more manual comparison

Rollback Plan

If migration doesn’t go smoothly:

  1. Your existing snapshot workflow still works
  2. Horizon Epoch metadata is separate from your data
  3. Remove Horizon Epoch without affecting production:
    # Remove metadata database
    dropdb horizon_epoch
    

Next Steps

Multi-Backend Setup

Configure Horizon Epoch to work with multiple storage backends simultaneously.

Overview

Horizon Epoch can manage data across different storage systems:

  • Multiple PostgreSQL databases
  • Multiple S3 buckets
  • Mixed PostgreSQL + S3 environments
  • Cross-region or cross-account configurations

Architecture

┌─────────────────────────────────────────────────┐
│           Horizon Epoch Metadata                │
│         (Single PostgreSQL Database)            │
└─────────────────────┬───────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
┌───────────┐  ┌───────────┐  ┌───────────┐
│ PostgreSQL│  │ PostgreSQL│  │    S3     │
│  (users)  │  │  (orders) │  │  (events) │
└───────────┘  └───────────┘  └───────────┘

Configuration

Multiple PostgreSQL Databases

import asyncio
from horizon_epoch import Client, StorageBackend

async def setup_multi_backend():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.init("multi-backend-repo")

        # Add multiple PostgreSQL backends
        await client.add_storage(
            name="users_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://localhost/users"}
        )

        await client.add_storage(
            name="orders_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"}
        )

        await client.add_storage(
            name="analytics_db",
            backend=StorageBackend.POSTGRESQL,
            config={"url": "postgresql://analytics.internal/warehouse"}
        )

asyncio.run(setup_multi_backend())

Multiple S3 Buckets

# Add multiple S3 backends
await client.add_storage(
    name="raw_data",
    backend=StorageBackend.S3,
    config={"bucket": "company-raw-data", "region": "us-east-1"}
)

await client.add_storage(
    name="processed_data",
    backend=StorageBackend.S3,
    config={"bucket": "company-processed", "region": "us-west-2"}
)

await client.add_storage(
    name="archive",
    backend=StorageBackend.S3,
    config={"bucket": "company-archive", "region": "eu-west-1"}
)

Mixed PostgreSQL + S3

# Add both PostgreSQL and S3 backends
await client.add_storage(
    name="transactional",
    backend=StorageBackend.POSTGRESQL,
    config={"url": "postgresql://localhost/production"}
)

await client.add_storage(
    name="datalake",
    backend=StorageBackend.S3,
    config={"bucket": "company-datalake"}
)

Registering Tables

Specify the storage backend when registering tables:

from horizon_epoch.client import _native

# PostgreSQL table
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)

# Different PostgreSQL database
loc = _native.StorageLocation.postgresql("orders_db", "public", "orders")
await client.track_table("orders", loc)

# S3 Delta table
loc = _native.StorageLocation.s3("datalake", "delta/events")
await client.track_table("events", loc)

CLI Registration

# PostgreSQL
epoch table add users --location "postgresql://users_db/public.users"

# S3
epoch table add events --location "s3://datalake/delta/events"

Configuration File

# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"

# PostgreSQL backends
[storage.postgres.users_db]
url = "postgresql://localhost/users"

[storage.postgres.orders_db]
url = "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"
aws_secret_id = "horizon-epoch/orders-db"  # Credentials from Secrets Manager

[storage.postgres.analytics]
host = "analytics.internal"
database = "warehouse"
vault_path = "secret/data/analytics-db"  # Credentials from Vault

# S3 backends
[storage.s3.raw]
bucket = "company-raw-data"
region = "us-east-1"

[storage.s3.processed]
bucket = "company-processed"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/DataAccess"

[storage.s3.archive]
bucket = "company-archive"
region = "eu-west-1"
endpoint = "https://s3.eu-west-1.amazonaws.com"

Cross-Storage Operations

Branching

Branches span all registered tables:

# Creates branch affecting all tables
epoch branch create feature/new-schema

Commits

Commits can include changes across backends:

# Commit changes from any backend
epoch commit -m "Update user preferences and events schema"

Diff

Compare changes across storage types:

epoch diff main feature/new-schema

Output:

PostgreSQL (users_db):
  users: 5 modified, 2 added

S3 (datalake):
  events: schema changed, 1000 added

Merge

Merges coordinate across all backends:

epoch merge feature/new-schema

Storage Routing

Default Backend

Set a default storage backend via configuration:

epoch config set default_storage users_db

Explicit Routing

Always specify backend in table location:

# Explicit backend reference via StorageLocation
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)

Cross-Account S3

await client.add_storage(
    name="partner_data",
    backend=StorageBackend.S3,
    config={
        "bucket": "partner-shared-bucket",
        "assume_role_arn": "arn:aws:iam::999888777666:role/HorizonEpochAccess",
        "external_id": "partner-integration-id"
    }
)

Best Practices

1. Separate Metadata from Data

The metadata database should be separate from data storage:

# Connect to dedicated metadata database
async with Client.connect("postgresql://metadata.internal/horizon_epoch") as client:
    # Add production data storage (different server)
    await client.add_storage(
        name="production",
        backend=StorageBackend.POSTGRESQL,
        config={"url": "postgresql://data.internal/production"}
    )

2. Use Consistent Naming

# Good: Clear, consistent names
await client.add_storage("prod_users", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_orders", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_events", StorageBackend.S3, {...})

# Avoid: Inconsistent or unclear names like "db1" or "bucket"

3. Document Backend Purpose

# epoch.toml
# Users database - primary transactional storage
[storage.postgres.users]
url = "postgresql://..."

# Orders database - order processing system
[storage.postgres.orders]
url = "postgresql://..."

# Event lake - analytics events from all services
[storage.s3.events]
bucket = "company-events"

4. Handle Partial Failures

Operations across multiple backends may partially fail:

try:
    result = await client.commit(message="Multi-backend update")
except PartialCommitError as e:
    print(f"Committed to: {e.successful_backends}")
    print(f"Failed on: {e.failed_backends}")
    # Handle recovery (e.g. retry or abort the failed backends)

Troubleshooting

Backend Not Found

Error: Storage backend 'mydb' not found
  • Verify backend is configured in config
  • Check backend name spelling

Cross-Backend Transaction

Warning: Cross-backend operations are not atomic
  • Horizon Epoch provides best-effort consistency
  • Use single backend for strict atomicity requirements

Permission Mismatch

Error: Access denied on backend 'orders_db'
  • Verify credentials for each backend
  • Check each backend has required permissions

Next Steps

Branch-Aware Queries

Query data from specific branches, compare between branches, and understand how branch context affects data access.

Overview

When you query data through Horizon Epoch:

  • Each branch shows its own version of the data
  • Changes on a branch don’t affect other branches
  • You can query multiple branches simultaneously

Setting Branch Context

CLI

# Set current branch
epoch checkout feature/new-schema

# All subsequent operations use this branch
epoch query "SELECT * FROM users"

Python SDK

import asyncio
from horizon_epoch import Client

async def query_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Checkout the branch
        await client.checkout("feature/new-schema")

        # Get status to see current branch
        status = await client.status()
        print(f"On branch: {status.branch}")

        # Get data diff between branches
        diff = await client.diff(
            base="main",
            target="feature/new-schema"
        )
        for table_diff in diff.table_diffs:
            print(f"{table_diff.table_name}: {table_diff.total_changes}")

asyncio.run(query_branch())

How Branch Queries Work

Copy-on-Write Overlay

When you query a branch:

  1. Base data comes from the parent branch
  2. Branch-specific changes overlay the base
  3. You see the combined result

main:     users = [Alice, Bob, Charlie]
                            │
feature:  overlay = [Charlie → Charles]
                            │
query:    result = [Alice, Bob, Charles]

Query Flow

-- On branch 'feature/update-names'
SELECT * FROM users WHERE id = 3;

-- Horizon Epoch executes:
-- 1. Check branch overlay for id=3
-- 2. If found, return overlay version
-- 3. If not found, return base version
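
The same resolution order can be sketched in Python. This is a conceptual illustration of the copy-on-write lookup, not the actual engine internals; the overlay structures here are assumptions:

def resolve_record(record_id, overlay_changes, overlay_deletes, base_table):
    # 1. Deletions on the branch hide the base record
    if record_id in overlay_deletes:
        return None
    # 2. The branch overlay wins when the record was modified
    if record_id in overlay_changes:
        return overlay_changes[record_id]
    # 3. Otherwise fall back to the parent branch's data
    return base_table.get(record_id)

# Matches the diagram above
base = {1: "Alice", 2: "Bob", 3: "Charlie"}
overlay = {3: "Charles"}
print([resolve_record(i, overlay, set(), base) for i in (1, 2, 3)])
# ['Alice', 'Bob', 'Charles']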

Comparing Branches

Diff Query

Compare data between two branches:

# Get records that differ between branches
diff = client.diff_query(
    table="users",
    source="main",
    target="feature/updates"
)

for change in diff:
    print(f"{change.type}: {change.record}")

Side-by-Side Comparison

# Query both branches
main_data = client.query("SELECT * FROM users", branch="main")
feature_data = client.query("SELECT * FROM users", branch="feature/updates")

# Compare programmatically
import pandas as pd
main_df = pd.DataFrame(main_data)
feature_df = pd.DataFrame(feature_data)
comparison = main_df.compare(feature_df)

CLI Comparison

# Show diff
epoch diff main feature/updates --table users

# Show records only in one branch
epoch diff main feature/updates --only-in-target

Querying Multiple Branches

Union Across Branches

# Get all unique records across branches
results = client.query_branches(
    query="SELECT * FROM users",
    branches=["main", "feature/a", "feature/b"],
    mode="union"
)

Compare Aggregate Results

# Run same query on multiple branches
for branch in ["main", "staging", "production"]:
    result = client.query(
        "SELECT COUNT(*) as user_count FROM users",
        branch=branch
    )
    print(f"{branch}: {result[0]['user_count']} users")

Time-Travel Queries

Query data at a specific point in time:

By Commit ID

# Query at specific commit
client.query(
    "SELECT * FROM users",
    at_commit="abc123f"
)

By Tag

# Query at tagged version
client.query(
    "SELECT * FROM users",
    at_tag="v1.0.0"
)

By Timestamp

from datetime import datetime, timedelta

# Query data as of yesterday
yesterday = datetime.now() - timedelta(days=1)
client.query(
    "SELECT * FROM users",
    at_time=yesterday
)

CLI Time-Travel

# Query at commit
epoch query "SELECT * FROM users" --at abc123f

# Query at tag
epoch query "SELECT * FROM users" --at v1.0.0

# Query at time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"

Performance Considerations

Branch Depth

Deeply nested branches may be slower:

main → dev → feature → sub-feature → experiment

Each level adds overlay resolution. Keep branches shallow when possible.
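
Conceptually, a query on a nested branch probes each overlay from the current branch back toward the root, and the nearest overlay wins. An illustrative sketch (not the engine's implementation):

def resolve_through_chain(record_id, overlays, base):
    # overlays is ordered from the current branch up to the root,
    # so each nesting level adds one more probe per record
    for overlay in overlays:
        if record_id in overlay:
            return overlay[record_id]
    return base.get(record_id)

overlays = [{}, {3: "Charlz"}, {}, {3: "Charles"}]  # four nesting levels
print(resolve_through_chain(3, overlays, {3: "Charlie"}))  # 'Charlz' (nearest wins)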

Indexing

Branch overlays use primary keys for efficient lookup:

# Fast: uses primary key
client.query("SELECT * FROM users WHERE id = 123", branch="feature")

# Slower: scans overlay + base
client.query("SELECT * FROM users WHERE name = 'Alice'", branch="feature")

Caching

Query caching can be configured via the CLI:

epoch config set query_cache_enabled true
epoch config set query_cache_ttl 300

Branch Isolation Guarantees

Read Isolation

Each branch sees consistent data:

  • Changes on other branches are invisible
  • Commits create point-in-time snapshots
  • Queries always return consistent results

Write Isolation

Changes on a branch don’t affect others until merged:

# On feature branch
await client.checkout("feature")
# Make changes via your database connection...
await client.stage_all()
await client.commit(message="Deactivate user 1")

# Main branch is unchanged - checkout and diff to compare
await client.checkout("main")
# Main still has the original data

Common Patterns

A/B Testing

# Set up test branches
await client.branch("experiment/new-algo", start_point="main")

# Run algorithm on each branch
for branch in ["main", "experiment/new-algo"]:
    await client.checkout(branch)
    run_algorithm()
    results = measure_metrics()
    print(f"{branch}: {results}")

Data Validation

# Validate changes before merge
await client.checkout("feature/migration")

# Use diff to check for issues
diff = await client.diff(base="main", target="feature/migration")
for table_diff in diff.table_diffs:
    if table_diff.total_changes > 1000:
        raise ValidationError(f"Too many changes in {table_diff.table_name}")

Environment Comparison

# Compare data across environments
for env in ["development", "staging", "production"]:
    result = client.query(
        "SELECT COUNT(*) as count, MAX(updated_at) as latest FROM orders",
        branch=env
    )
    print(f"{env}: {result}")

Troubleshooting

Wrong Branch Data

Expected data from 'feature' but got 'main' data
  • Verify branch context is set correctly
  • Check if query specifies explicit branch
  • Verify commit was made on correct branch

Missing Changes

Can't see changes I just made
  • Changes must be committed to be visible in queries
  • Uncommitted changes are in working state only
  • Use epoch status to see uncommitted changes

Slow Queries

  • Reduce branch nesting depth
  • Ensure queries use primary keys when possible
  • Consider materializing long-lived branches

Next Steps

How to: Schema Versioning

This guide covers schema versioning workflows including upgrading existing repositories, migrating schemas, handling rollbacks, and working with schemas across multiple branches.

Prerequisites

  • Horizon Epoch installed (Installation Guide)
  • PostgreSQL running with a repository configured
  • Familiarity with basic version control concepts

Understanding Schema Versioning

Horizon Epoch versions database schemas alongside your data. Every commit captures:

  • Column definitions: Name, type, nullability, defaults
  • Constraints: Foreign keys, unique, check, exclusion
  • Indexes: B-tree, GIN, GiST, BRIN, hash
  • Enum types: Custom PostgreSQL enums

Schema snapshots are stored as JSONB in the table_versions table, enabling full schema history and three-way merge for schema changes.
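
For a feel of what a snapshot captures, here is an illustrative sketch of the kind of structure stored per table version; the field names are assumptions, not the exact storage format:

schema_snapshot = {
    "columns": [
        {"name": "id", "type": "bigint", "nullable": False, "default": None},
        {"name": "email", "type": "text", "nullable": False, "default": None},
    ],
    "constraints": [
        {"name": "users_pkey", "kind": "primary_key", "columns": ["id"]},
        {"name": "chk_users_email", "kind": "check",
         "expression": "email ~ '^[^@]+@[^@]+$'"},
    ],
    "indexes": [
        {"name": "idx_users_email", "method": "btree", "columns": ["email"]},
    ],
    "enums": {},
}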


Upgrading Existing Repositories

If you have an existing Horizon Epoch repository created before schema versioning was added, you can backfill schema information for historical commits.

Step 1: Check Current Status

# See which tables have schema data
epoch schema status

# Output:
# Table      | Latest Schema | Historical | Live DB  | Status
# -----------|---------------|------------|----------|--------
# users      | Yes           | Partial    | Synced   | OK
# orders     | Yes           | No         | Synced   | Needs backfill
# products   | No            | No         | Drifted  | Needs migration

Step 2: Backfill Historical Schemas

For tables where live database is accessible, backfill historical commits:

# Backfill schema for specific table
epoch migrate-schemas users --backfill

# Backfill all tables with missing schemas
epoch migrate-schemas --backfill --all

# Preview what would be backfilled (dry-run)
epoch migrate-schemas --backfill --all --dry-run

What happens during backfill:

  1. For each commit without schema data, Epoch checks if the table location is accessible
  2. If accessible, it introspects the current schema and stores it
  3. If not accessible, the commit is marked as “schema unavailable”

Step 3: Verify Migration

# Verify all tables have schema data
epoch schema status

# View schema at a specific commit
epoch schema show users --at abc123f

Migration Guide for Existing Repositories

Scenario: Adding Schema Versioning to Production

If you have a production repository and want to enable full schema versioning:

1. Create a Baseline

# Create a new commit that captures current schema
epoch commit -m "Capture schema baseline" --include-schema

This creates a commit with full schema snapshots for all registered tables.

2. Update Your Workflow

From this point forward, all commits automatically include schema snapshots.

# Normal workflow - schema is captured automatically
epoch branch create feature/add-column
# ... make schema changes ...
epoch commit -m "Add email_verified column"

3. Handle Historical Data (Optional)

If you need schema history for older commits:

# Option A: Backfill from live database (assumes schema hasn't changed)
epoch migrate-schemas --backfill --since "2024-01-01"

# Option B: Import from external migrations
epoch migrate-schemas --import-from flyway ./migrations
epoch migrate-schemas --import-from liquibase ./changelog.xml

Rollback Procedures

Rolling Back Schema Changes

If a schema change causes issues, you can roll back using several methods:

Method 1: Revert the Commit

Create a new commit that reverses the schema change:

# Revert the problematic commit
epoch revert abc123f

# This creates a new commit with:
# - Data changes reversed
# - Schema changes reversed
# - DDL generated to undo schema modifications

Method 2: Reset to Previous Commit

Reset the branch to a known-good state:

# Soft reset (keep changes for review)
epoch reset --soft HEAD~1

# Hard reset (discard changes)
epoch reset --hard HEAD~1

# Reset to specific commit
epoch reset --hard v1.0.0

After reset, apply the DDL to synchronize the live database:

# Generate rollback DDL
epoch schema diff HEAD v1.0.0 --show-ddl > rollback.sql

# Review and apply
psql -f rollback.sql

Method 3: Checkout and Apply

Create a new branch from a known-good state:

# Create branch from good state
epoch branch create hotfix/rollback --from v1.0.0

# Generate DDL to bring live DB to that state
epoch schema apply users --from v1.0.0 --dry-run

# Apply after review
epoch schema apply users --from v1.0.0

Handling Rollback Failures

Some schema changes cannot be automatically rolled back:

Change Type        | Rollback Difficulty | Notes
-------------------|---------------------|---------------------------
Add column         | Easy                | DROP COLUMN
Add index          | Easy                | DROP INDEX
Add constraint     | Easy                | DROP CONSTRAINT
Drop column        | Data loss           | Column data is gone
Change column type | May fail            | Data may not fit old type
Remove enum value  | May fail            | Rows may use removed value

For destructive changes, consider:

# 1. Check for data dependencies
epoch schema diff HEAD~1 HEAD --check-data-impact

# 2. Create data backup before proceeding
epoch export users --at HEAD~1 > users_backup.parquet

# 3. Apply rollback with force flag
epoch schema apply users --from HEAD~1 --force

Multi-Branch Schema Workflows

Feature Branch Schema Changes

When developing schema changes on a feature branch:

# 1. Create feature branch
epoch branch create feature/add-preferences

# 2. Make schema changes
psql -c "ALTER TABLE users ADD COLUMN preferences JSONB"

# 3. Commit (schema captured automatically)
epoch commit -m "Add user preferences column"

# 4. Preview merge
epoch merge main --dry-run
# Shows: Schema changes to apply on main

# 5. Merge when ready
epoch merge main

Parallel Schema Development

When multiple branches modify the same table:

# Branch A adds column
epoch checkout feature-a
psql -c "ALTER TABLE users ADD COLUMN theme VARCHAR(20)"
epoch commit -m "Add theme column"

# Branch B adds different column
epoch checkout feature-b
psql -c "ALTER TABLE users ADD COLUMN locale VARCHAR(10)"
epoch commit -m "Add locale column"

# Merge A into main
epoch checkout main
epoch merge feature-a  # Success

# Merge B into main
epoch merge feature-b  # Also success - no conflict

Both columns are added because they don’t conflict.

Schema Conflict Resolution

When branches modify the same constraint:

# Branch A modifies check constraint
epoch checkout feature-a
psql -c "ALTER TABLE users DROP CONSTRAINT chk_age"
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 18)"
epoch commit -m "Require adult users"

# Branch B modifies same constraint differently
epoch checkout feature-b
psql -c "ALTER TABLE users DROP CONSTRAINT chk_age"
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 13)"
epoch commit -m "Allow teen users"

# Attempt merge
epoch checkout main
epoch merge feature-a  # Success
epoch merge feature-b  # CONFLICT

# Resolve conflict
epoch conflicts show
# CONFLICT (constraint): chk_age modified in both branches

# Choose resolution
epoch conflicts resolve --constraint chk_age --custom "CHECK (age >= 16)"
epoch commit -m "Merge feature-b with compromise age check"

Environment Promotion with Schema

When promoting schema changes between environments:

# 1. Develop on dev branch
epoch checkout dev
# ... make changes ...
epoch commit -m "Schema updates for v2.0"

# 2. Generate migration for staging
epoch schema diff staging dev --show-ddl > staging_migration.sql

# 3. Review migration
cat staging_migration.sql

# 4. Apply to staging environment
PGHOST=staging-db psql -f staging_migration.sql

# 5. Merge to staging branch
epoch checkout staging
epoch merge dev

# 6. Repeat for production
epoch schema diff production staging --show-ddl > prod_migration.sql

Best Practices

1. Always Review Schema Diffs

# Before merging, check schema changes
epoch diff main feature-branch --schema

# Generate DDL preview
epoch schema diff main feature-branch --show-ddl

2. Use Descriptive Commit Messages

# Bad
epoch commit -m "Update schema"

# Good
epoch commit -m "Add email_verified column with default false for GDPR compliance"

3. Test Migrations on Branch Data

# Create test branch
epoch branch create test/migration-validation

# Apply migration
epoch schema apply users --from feature-branch --dry-run

# Validate
epoch validate --table users

4. Handle Large Tables Carefully

# For large tables, use online mode
epoch schema apply users --from feature-branch --online

# Monitor progress
epoch schema apply --status

5. Version Your Enum Changes Carefully

# Adding values is safe
psql -c "ALTER TYPE status ADD VALUE 'archived'"

# Removing values requires migration
# See: docs/how-to/ddl-migrations.md

Troubleshooting

Schema Not Captured

If schema isn’t being captured in commits:

# Check table is registered
epoch table list

# Re-register with schema introspection
epoch table add users --location "postgresql://mydb/public.users" --introspect-schema

Schema Drift Detected

If live database differs from stored schema:

# See differences
epoch schema diff users --live

# Capture current state
epoch commit -m "Sync schema from live database" --rescan-schema

# Or apply stored schema to database
epoch schema apply users --from HEAD --force

Backfill Fails

If schema backfill fails:

# Check connectivity
epoch doctor

# Try specific table
epoch migrate-schemas users --backfill --verbose

# Skip inaccessible tables
epoch migrate-schemas --backfill --skip-inaccessible

See Also

How to: DDL Migrations

This guide covers generating DDL migration scripts, integrating with external migration tools, and safely applying schema changes to production databases.

Prerequisites

  • Horizon Epoch installed (Installation Guide)
  • PostgreSQL database with schema versioning enabled
  • Understanding of DDL and database migrations

Generating Migrations with PostgresDdlGenerator

Horizon Epoch includes a DDL generator that produces properly ordered PostgreSQL migration scripts.

Basic Migration Generation

# Generate migration between branches
epoch schema diff main feature-branch --show-ddl

# Output migration to file
epoch schema diff main feature-branch --ddl-output migrations/V002__add_preferences.sql

Migration Script Structure

Generated migrations include both UP and DOWN sections:

-- Migration: main -> feature/add-preferences
-- Generated: 2024-12-18T10:30:00Z

-- ============================================================
-- UP: Apply changes
-- ============================================================

-- Create new enum types first
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');

-- Drop constraints that will be modified
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;

-- Add new columns
ALTER TABLE users ADD COLUMN preferences JSONB;
ALTER TABLE users ADD COLUMN notification_pref notification_type DEFAULT 'email';

-- Add new constraints
ALTER TABLE users ADD CONSTRAINT chk_users_email
    CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');

-- Create indexes
CREATE INDEX CONCURRENTLY idx_users_preferences
    ON users USING gin (preferences);

-- ============================================================
-- DOWN: Reverse changes
-- ============================================================

DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;

ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;

ALTER TABLE users DROP COLUMN IF EXISTS notification_pref;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;

-- WARNING: Cannot remove enum values in PostgreSQL
-- Enum 'notification_type' preserved (manual cleanup required)

Statement Ordering

The DDL generator follows a specific order to handle dependencies:

UP (Apply) Order:

  1. Create new enum types (before columns using them)
  2. Add values to modified enums
  3. Drop constraints being modified/removed (FK first, then others)
  4. Drop indexes being modified/removed
  5. Handle primary key changes
  6. Add/modify columns
  7. Add new constraints (check, unique, exclusion, then FK last)
  8. Create new indexes

DOWN (Reverse) Order:

  1. Drop new indexes
  2. Drop new constraints (FK first)
  3. Drop new columns
  4. Re-add dropped columns (if data preserved)
  5. Re-add dropped constraints
  6. Re-add dropped indexes
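
One way to express such phase ordering is to tag each statement with a phase and sort by phase index. The sketch below is illustrative; the phase names are assumptions, not the generator's internals:

# Dependency-safe UP ordering: enums before columns that use them,
# constraints after columns, indexes last
UP_PHASES = [
    "create_enum", "add_enum_value", "drop_constraint", "drop_index",
    "alter_primary_key", "alter_column", "add_constraint", "create_index",
]

def order_up(statements):
    # statements: list of (phase, sql) tuples
    return sorted(statements, key=lambda s: UP_PHASES.index(s[0]))

stmts = [
    ("create_index", "CREATE INDEX idx_users_prefs ON users USING gin (preferences)"),
    ("create_enum", "CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push')"),
    ("alter_column", "ALTER TABLE users ADD COLUMN preferences JSONB"),
]
for phase, sql in order_up(stmts):
    print(sql + ";")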

DDL Safety Features

Dry-Run Mode

Preview changes without applying them:

# Show what would be executed
epoch schema apply users --from feature-branch --dry-run

# Output:
# Dry-run mode - no changes will be applied
#
# Statements to execute:
#   1. ALTER TABLE users ADD COLUMN preferences JSONB
#   2. CREATE INDEX CONCURRENTLY idx_users_preferences ON users USING gin (preferences)
#
# Estimated impact:
#   Table size: 2.3 GB
#   Row count: 15,000,000
#   Recommended: Use --online mode for large tables

Destructive Operation Warnings

The DDL generator identifies and warns about destructive operations:

epoch schema diff main feature-branch --show-ddl

# WARNING: Destructive operations detected:
#   - DROP COLUMN: users.legacy_field
#   - ALTER COLUMN TYPE: orders.amount (numeric -> integer, may lose precision)
#   - DROP INDEX: idx_users_legacy
#
# Use --force to generate DDL anyway

Destructive Operation Types:

Operation         | Risk Level | Description
------------------|------------|---------------------------
DROP COLUMN       | High       | Data permanently lost
DROP TABLE        | Critical   | Table and all data lost
ALTER COLUMN TYPE | Medium     | May fail or lose precision
DROP CONSTRAINT   | Low        | Constraint removed
DROP INDEX        | Low        | Index removed (can rebuild)
DROP TYPE         | Medium     | Enum type removed
TRUNCATE TABLE    | Critical   | All data removed

Lock Timeout Configuration

Prevent long-running locks on production tables:

# Set lock timeout (default: 30 seconds)
epoch schema apply users --from feature-branch --lock-timeout 10s

# Set statement timeout (default: 5 minutes)
epoch schema apply users --from feature-branch --statement-timeout 2m

Retry Logic

Configure automatic retries for transient failures:

# Retry up to 5 times with exponential backoff
epoch schema apply users --from feature-branch --retries 5

# Configure backoff
epoch schema apply users --from feature-branch \
    --retries 5 \
    --initial-delay 500ms \
    --max-delay 30s

Integration with External Tools

pt-online-schema-change (Percona Toolkit for MySQL)

While Horizon Epoch primarily targets PostgreSQL, you can generate compatible scripts for MySQL migrations:

# Generate MySQL-compatible DDL
epoch schema diff main feature-branch --dialect mysql --ddl-output migration.sql

# Use with pt-online-schema-change
pt-online-schema-change \
    --execute \
    --alter "ADD COLUMN preferences JSON" \
    D=mydb,t=users

Recommended workflow:

  1. Generate migration with Epoch
  2. Extract ALTER TABLE statements
  3. Apply using pt-online-schema-change for large tables
  4. Update Epoch commit after successful application

# After applying via pt-osc
epoch commit -m "Applied preferences column via pt-osc" --rescan-schema

pg_repack (PostgreSQL)

For table reorganization and bloat reduction:

# Identify tables needing repack
epoch schema status --check-bloat

# Output:
# Table      | Size    | Dead Tuples | Bloat % | Recommendation
# -----------|---------|-------------|---------|----------------
# users      | 2.3 GB  | 150,000     | 15%     | Consider pg_repack
# orders     | 8.1 GB  | 2,500,000   | 28%     | Recommended

# Run pg_repack
pg_repack -d mydb -t users

# Rescan schema after repack
epoch commit -m "Reorganized users table" --rescan-schema

Using pg_repack with index changes:

# Generate index DDL
epoch schema diff main feature-branch --indexes-only --ddl-output indexes.sql

# Create indexes with pg_repack (minimal locking)
pg_repack -d mydb -t users --only-indexes

# Or use CONCURRENTLY option in Epoch
epoch schema apply users --from feature-branch --online

Flyway Integration

Export migrations for Flyway:

# Generate Flyway-compatible migration
epoch schema diff main feature-branch \
    --format flyway \
    --ddl-output db/migration/V2__add_preferences.sql

# Migration file follows Flyway naming convention
cat db/migration/V2__add_preferences.sql

Liquibase Integration

Export migrations for Liquibase:

# Generate Liquibase changeset
epoch schema diff main feature-branch \
    --format liquibase \
    --ddl-output db/changelog/changes/002-add-preferences.xml

Example output:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
    <changeSet id="002-add-preferences" author="epoch">
        <addColumn tableName="users">
            <column name="preferences" type="JSONB"/>
        </addColumn>
        <createIndex indexName="idx_users_preferences" tableName="users">
            <column name="preferences" type="gin"/>
        </createIndex>
        <rollback>
            <dropIndex indexName="idx_users_preferences" tableName="users"/>
            <dropColumn tableName="users" columnName="preferences"/>
        </rollback>
    </changeSet>
</databaseChangeLog>

Online Mode for Production

Using CONCURRENTLY

For indexes on large tables, use online mode to avoid blocking:

# Create indexes without blocking writes
epoch schema apply users --from feature-branch --online

# This generates:
# CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
#
# Instead of:
# CREATE INDEX idx_users_email ON users (email);

Limitations of CONCURRENTLY

  • Cannot run inside a transaction
  • May take longer than regular index creation
  • Requires more resources (CPU, I/O)
  • Index must be validated after creation

# Check index validity after CONCURRENTLY creation
epoch schema status users --check-indexes

# Output:
# Index                    | Valid | Size   | Last Used
# -------------------------|-------|--------|----------
# idx_users_email          | Yes   | 45 MB  | 2 min ago
# idx_users_preferences    | No    | 0 MB   | Never

If an index is invalid:

# Reindex with CONCURRENTLY
epoch schema apply users --reindex idx_users_preferences --online

Transactional DDL

PostgreSQL supports transactional DDL, allowing atomic migrations:

# Apply migration in a transaction (default)
epoch schema apply users --from feature-branch

# Disable transaction (required for CONCURRENTLY)
epoch schema apply users --from feature-branch --no-transaction --online

What happens on failure:

# With transaction (default):
# - All statements run in one transaction
# - On failure, all changes rolled back
# - Database remains unchanged

# Without transaction (--no-transaction):
# - Statements run individually
# - On failure, partial changes may remain
# - Manual cleanup may be needed

Table Size Considerations

Small Tables (< 100MB)

Standard migration is safe:

epoch schema apply users --from feature-branch

Medium Tables (100MB - 1GB)

Use lock timeouts:

epoch schema apply users --from feature-branch \
    --lock-timeout 10s \
    --retries 3

Large Tables (1GB - 10GB)

Use online mode:

epoch schema apply users --from feature-branch \
    --online \
    --statement-timeout 30m

Very Large Tables (> 10GB)

Consider specialized tools:

# 1. Generate DDL
epoch schema diff main feature-branch --ddl-output migration.sql

# 2. Review for large table operations
epoch schema diff main feature-branch --analyze-impact

# 3. Use pg_repack or manual process
pg_repack -d mydb -t users --no-superuser-check

Python SDK Usage

import asyncio
from horizon_epoch import Client
from horizon_epoch.ddl import DdlExecutor, ExecutionConfig
# Import path for the generator is assumed here; adjust if it differs
from horizon_epoch.ddl import PostgresDdlGenerator

async def generate_migration():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraint diff between branches
        diff = await client.diff_constraints("main", "feature-branch", table_name="users")

        # Generate migration
        generator = PostgresDdlGenerator()
        migration = generator.generate_migration("users", diff.constraints)

        # Print statements
        print("UP statements:")
        for stmt in migration.up:
            print(f"  {stmt};")

        print("\nDOWN statements:")
        for stmt in migration.down:
            print(f"  {stmt};")

        # Check for warnings
        if migration.warnings:
            print("\nWARNINGS:")
            for warn in migration.warnings:
                print(f"  - {warn}")

        # Write to file
        migration.write_to_file("migrations/V002__update_users.sql")

        # Execute with safety features
        executor = DdlExecutor(client.connection)
        config = ExecutionConfig(
            dry_run=True,
            lock_timeout_seconds=10,
            statement_timeout_seconds=300,
            online=True,
        )

        result = executor.execute(migration.up, config)
        print(f"Executed {result.statements_run} statements")

asyncio.run(generate_migration())

Best Practices

1. Always Use Dry-Run First

epoch schema apply users --from feature-branch --dry-run

2. Backup Before Destructive Changes

# Export table data
epoch export users --format parquet > backup/users_$(date +%Y%m%d).parquet

# Then apply
epoch schema apply users --from feature-branch --force

3. Use Online Mode for Production

epoch schema apply users --from feature-branch --online

4. Set Appropriate Timeouts

epoch schema apply users --from feature-branch \
    --lock-timeout 10s \
    --statement-timeout 5m

5. Monitor Long-Running Migrations

# In another terminal
epoch schema apply --status

# Or query PostgreSQL directly
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'ALTER%' OR query LIKE 'CREATE INDEX%';

See Also

Credential Providers

Horizon Epoch supports multiple methods for providing credentials to storage backends. This guide covers when to use each provider and how to configure them.

Overview

Credential providers abstract how secrets are retrieved:

  • Static - Hardcoded credentials (dev/testing only)
  • Environment - From environment variables
  • File - From files on disk
  • Vault - From HashiCorp Vault
  • AWS - From AWS Secrets Manager or IAM

Static Credentials

Use for development and testing only.

from horizon_epoch import Config

config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
    name="mydb",
    url="postgresql://user:password@localhost/db"
)

# epoch.toml
[storage.postgres.mydb]
url = "postgresql://user:password@localhost/db"

Environment Variables

Recommended for containers and CI/CD.

export EPOCH_POSTGRES_MYDB_URL="postgresql://user:pass@host/db"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

config.add_postgres(
    name="mydb",
    url="${EPOCH_POSTGRES_MYDB_URL}"  # Variable substitution
)

# epoch.toml - variables are substituted at runtime
[storage.postgres.mydb]
url = "${EPOCH_POSTGRES_MYDB_URL}"

File-Based Credentials

For credentials managed by external tools (e.g., Kubernetes secrets).

config.add_postgres(
    name="mydb",
    credentials_file="/run/secrets/db-password"
)

# epoch.toml
[storage.postgres.mydb]
host = "db.example.com"
database = "production"
username = "app_user"
password_file = "/run/secrets/db-password"

File format (JSON):

{
    "password": "secret-value"
}

Or plain text:

secret-value

Credential Caching

Credentials are cached to reduce external calls:

config = Config(
    credential_cache_ttl=300,  # 5 minutes
    credential_cache_max_size=100
)

For frequently rotating credentials:

config = Config(
    credential_cache_ttl=60,  # 1 minute
    credential_refresh_before_expiry=30  # Refresh 30s before expiry
)

Credential Refresh

Some providers support automatic refresh:

# AWS credentials auto-refresh when using IAM roles
config.add_s3(
    name="datalake",
    bucket="my-bucket",
    use_instance_credentials=True  # Auto-refresh enabled
)

Provider Priority

When multiple credential sources are available:

  1. Explicit configuration (URL, password)
  2. Environment variables
  3. Credential files
  4. External providers (Vault, AWS)
  5. Instance credentials (IAM roles)
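
Conceptually, resolution walks those sources in order and returns the first hit. A sketch under assumed helper names (this is not the actual Horizon Epoch API):

import os

def resolve_credential(name, explicit=None, file_path=None, external_lookup=None):
    # 1. Explicit configuration wins
    if explicit is not None:
        return explicit
    # 2. Environment variables
    value = os.environ.get(name)
    if value is not None:
        return value
    # 3. Credential files
    if file_path is not None and os.path.exists(file_path):
        with open(file_path) as f:
            return f.read().strip()
    # 4./5. External providers (Vault, AWS) or instance credentials
    if external_lookup is not None:
        return external_lookup(name)
    raise LookupError(f"Credential {name!r} not found")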

Security Best Practices

1. Never Commit Credentials

# .gitignore
epoch.toml
.env
*.pem
*.key

2. Use Separate Credentials Per Environment

# epoch.dev.toml
[storage.postgres.mydb]
url = "${DEV_DATABASE_URL}"

# epoch.prod.toml
[storage.postgres.mydb]
url = "${PROD_DATABASE_URL}"

3. Rotate Credentials Regularly

Use providers that support rotation:

  • AWS Secrets Manager with rotation
  • Vault dynamic database credentials
  • Short-lived IAM credentials

4. Audit Credential Access

Enable logging for credential operations:

config = Config(
    log_credential_access=True,  # Log when credentials are accessed
    log_level="info"
)

5. Use Least Privilege

Create dedicated credentials for Horizon Epoch with only required permissions.

Troubleshooting

Credential Not Found

Error: Credential 'EPOCH_POSTGRES_MYDB_URL' not found
  • Check environment variable is set
  • Verify file path exists and is readable
  • Check Vault/AWS connectivity

Credential Expired

Error: Credential expired and refresh failed
  • Check IAM role permissions
  • Verify Vault token is valid
  • Check system clock sync (for time-based auth)

Permission Denied

Error: Permission denied reading credential file
  • Check file permissions: chmod 600 /path/to/credential
  • Verify process user has read access

Next Steps

HashiCorp Vault Integration

Integrate Horizon Epoch with HashiCorp Vault for secure credential management.

Overview

Vault integration supports:

  • Static secrets (KV v2)
  • Dynamic database credentials
  • PKI certificates for mTLS
  • Multiple authentication methods

Prerequisites

  • HashiCorp Vault server (1.x or later)
  • Vault CLI (for setup)
  • Network access from Horizon Epoch to Vault

Authentication Methods

Token Authentication

Simplest method, suitable for development:

from horizon_epoch import Config

config = Config(
    vault_addr="https://vault.example.com:8200",
    vault_token="hvs.xxx..."
)

export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.xxx..."

AppRole Authentication

Recommended for applications:

config = Config(
    vault_addr="https://vault.example.com:8200",
    vault_auth_method="approle",
    vault_role_id="xxx-xxx-xxx",
    vault_secret_id="${VAULT_SECRET_ID}"  # From env or file
)

Setup in Vault:

# Enable AppRole
vault auth enable approle

# Create policy
vault policy write horizon-epoch - <<EOF
path "secret/data/horizon-epoch/*" {
  capabilities = ["read"]
}
path "database/creds/horizon-epoch-*" {
  capabilities = ["read"]
}
EOF

# Create role
vault write auth/approle/role/horizon-epoch \
    token_policies="horizon-epoch" \
    token_ttl=1h \
    token_max_ttl=4h

Kubernetes Authentication

For Kubernetes deployments:

config = Config(
    vault_addr="https://vault.example.com:8200",
    vault_auth_method="kubernetes",
    vault_role="horizon-epoch"
)

AWS IAM Authentication

For AWS environments:

config = Config(
    vault_addr="https://vault.example.com:8200",
    vault_auth_method="aws",
    vault_role="horizon-epoch-ec2"
)

KV Secrets (Static)

Store database credentials in KV v2:

# Store secret
vault kv put secret/horizon-epoch/postgres \
    username=epoch_user \
    password=secret123

config.add_postgres(
    name="mydb",
    host="db.example.com",
    database="production",
    vault_path="secret/data/horizon-epoch/postgres"
)

Dynamic Database Credentials

Vault can generate short-lived database credentials:

# Configure database secrets engine
vault secrets enable database

vault write database/config/postgres \
    plugin_name=postgresql-database-plugin \
    allowed_roles="horizon-epoch-*" \
    connection_url="postgresql://{{username}}:{{password}}@db:5432/production" \
    username="vault" \
    password="vault-password"

vault write database/roles/horizon-epoch-readonly \
    db_name=postgres \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl="1h" \
    max_ttl="24h"
config.add_postgres(
    name="mydb",
    host="db.example.com",
    database="production",
    vault_role="database/creds/horizon-epoch-readonly"
)

PKI Certificates

For mTLS authentication:

# Enable PKI
vault secrets enable pki

# Configure PKI (simplified)
vault write pki/root/generate/internal \
    common_name="Horizon Epoch CA"

vault write pki/roles/horizon-epoch-client \
    allowed_domains="epoch.internal" \
    allow_subdomains=true \
    max_ttl="720h"
config.add_postgres(
    name="mydb",
    host="db.example.com",
    vault_pki_role="pki/issue/horizon-epoch-client",
    sslmode="verify-full"
)

Configuration File

# epoch.toml
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"

[storage.postgres.production]
host = "db.example.com"
database = "production"
vault_path = "secret/data/horizon-epoch/postgres"

[storage.postgres.readonly]
host = "db.example.com"
database = "production"
vault_role = "database/creds/horizon-epoch-readonly"

Troubleshooting

Permission Denied

Error: permission denied on path "secret/data/..."
  • Check policy is attached to token/role
  • Verify path is correct (KV v2 uses secret/data/ prefix)
  • Check Vault audit log for details

Token Expired

Error: Vault token expired
  • Token TTL may be too short
  • Enable token renewal in config
  • Use AppRole for automatic re-authentication

Connection Failed

Error: failed to connect to Vault
  • Verify VAULT_ADDR is correct
  • Check network connectivity
  • Verify TLS certificates if using HTTPS

Certificate Issues

Error: x509: certificate signed by unknown authority
  • Provide the CA certificate that signed Vault's TLS certificate:

config = Config(
    vault_addr="https://vault.example.com:8200",
    vault_ca_cert="/path/to/ca.crt"
)

Best Practices

  1. Use AppRole or Kubernetes auth in production
  2. Enable audit logging in Vault
  3. Use dynamic credentials when possible
  4. Set appropriate TTLs - short enough for security, long enough for operations
  5. Rotate static secrets regularly

Next Steps

AWS Secrets Integration

Integrate Horizon Epoch with AWS Secrets Manager, IAM authentication, and other AWS credential services.

AWS Secrets Manager

Basic Configuration

from horizon_epoch import Config

config = Config(
    aws_region="us-east-1"
).add_postgres(
    name="production",
    host="mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
    database="production",
    aws_secret_id="horizon-epoch/production/postgres"
)

Secret Format

Store credentials in Secrets Manager:

{
    "username": "epoch_user",
    "password": "secret-password",
    "host": "mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
    "port": 5432,
    "database": "production"
}

aws secretsmanager create-secret \
    --name horizon-epoch/production/postgres \
    --secret-string '{"username":"epoch_user","password":"secret-password"}'

Secret Rotation

Enable automatic rotation:

aws secretsmanager rotate-secret \
    --secret-id horizon-epoch/production/postgres \
    --rotation-lambda-arn arn:aws:lambda:us-east-1:xxx:function:SecretsRotation \
    --rotation-rules AutomaticallyAfterDays=30

Horizon Epoch automatically uses refreshed credentials.

RDS IAM Authentication

Connect to RDS without passwords using IAM:

Prerequisites

  1. Enable IAM authentication on RDS:
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --enable-iam-database-authentication
  2. Create IAM policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "rds-db:connect",
            "Resource": "arn:aws:rds-db:us-east-1:123456789012:dbuser:mydb/epoch_user"
        }
    ]
}
  3. Create database user:
CREATE USER epoch_user WITH LOGIN;
GRANT rds_iam TO epoch_user;

Configuration

config.add_postgres(
    name="production",
    host="mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
    database="production",
    username="epoch_user",
    use_iam_auth=True
)

IAM Roles for S3

Instance Credentials (EC2/ECS/Lambda)

config.add_s3(
    name="datalake",
    bucket="my-data-bucket",
    use_instance_credentials=True  # Uses instance profile
)

Assume Role

For cross-account access:

config.add_s3(
    name="partner-data",
    bucket="partner-bucket",
    assume_role_arn="arn:aws:iam::999888777666:role/HorizonEpochAccess",
    external_id="partner-external-id"  # If required
)

Role Chaining

For complex permission structures:

config.add_s3(
    name="restricted",
    bucket="restricted-bucket",
    assume_roles=[
        "arn:aws:iam::111222333444:role/JumpRole",
        "arn:aws:iam::555666777888:role/TargetRole"
    ]
)

AWS SSO / Identity Center

For development with SSO:

# Configure SSO profile
aws configure sso

# Set profile
export AWS_PROFILE=my-sso-profile

# Or in config
config = Config(
    aws_profile="my-sso-profile"
)

Configuration File

# epoch.toml
[aws]
region = "us-east-1"
profile = "production"  # Optional: use specific profile

[storage.postgres.production]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
database = "production"
aws_secret_id = "horizon-epoch/production/postgres"

[storage.postgres.analytics]
host = "analytics.cluster-xxx.us-east-1.rds.amazonaws.com"
database = "analytics"
username = "epoch_user"
use_iam_auth = true

[storage.s3.datalake]
bucket = "company-datalake"
assume_role_arn = "arn:aws:iam::123456789012:role/DataLakeAccess"

Required IAM Permissions

Secrets Manager Access

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": "arn:aws:secretsmanager:us-east-1:*:secret:horizon-epoch/*"
        }
    ]
}

S3 Access

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}

Assume Role

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::*:role/HorizonEpoch*"
        }
    ]
}

Troubleshooting

Access Denied (Secrets Manager)

Error: AccessDeniedException when calling GetSecretValue
  • Check IAM policy allows secretsmanager:GetSecretValue
  • Verify secret ARN matches policy
  • Check if secret has resource-based policy

IAM Auth Token Failed

Error: PAM authentication failed for user
  • Verify IAM authentication is enabled on RDS
  • Check user has rds_iam role granted
  • Verify IAM policy has correct resource ARN

Credentials Expired

Error: ExpiredTokenException
  • Instance credentials refresh automatically
  • For assumed roles, check session duration
  • For SSO, run aws sso login

Best Practices

  1. Use IAM authentication over passwords when possible
  2. Enable secret rotation in Secrets Manager
  3. Use instance roles instead of access keys
  4. Scope permissions narrowly
  5. Enable CloudTrail for audit logging

Next Steps

SSH Tunnels

Connect to databases through SSH tunnels for secure access to private networks.

When to Use SSH Tunnels

  • Database is in a private network (no direct access)
  • Connecting through a bastion/jump host
  • Security policy requires encrypted connections
  • Accessing on-premises databases from cloud

Basic SSH Tunnel

from horizon_epoch import Config

config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
    name="production",
    host="internal-db.private",  # Internal hostname
    database="production",
    username="epoch_user",
    password="secret",
    ssh_tunnel={
        "host": "bastion.example.com",
        "user": "ubuntu",
        "key_file": "~/.ssh/id_rsa"
    }
)

Configuration Options

SSH Key Authentication

ssh_tunnel={
    "host": "bastion.example.com",
    "port": 22,  # Default
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "key_passphrase": "${SSH_KEY_PASSPHRASE}"  # If key is encrypted
}

SSH Agent

Use keys loaded in SSH agent:

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "use_agent": True
}

SSH Certificate Authentication

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "cert_file": "~/.ssh/id_rsa-cert.pub"
}

Jump Hosts (ProxyJump)

For multi-hop connections:

ssh_tunnel={
    "host": "internal-bastion.private",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "jump_host": {
        "host": "external-bastion.example.com",
        "user": "ubuntu",
        "key_file": "~/.ssh/id_rsa"
    }
}

CLI Configuration

# Environment variables
export EPOCH_SSH_HOST="bastion.example.com"
export EPOCH_SSH_USER="ubuntu"
export EPOCH_SSH_KEY_FILE="~/.ssh/id_rsa"

Or in the configuration file:

# epoch.toml
[storage.postgres.production]
host = "internal-db.private"
database = "production"
username = "epoch_user"
password_file = "/run/secrets/db-password"

[storage.postgres.production.ssh_tunnel]
host = "bastion.example.com"
user = "ubuntu"
key_file = "~/.ssh/id_rsa"

Host Key Verification

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "known_hosts_file": "~/.ssh/known_hosts",
    "strict_host_key_checking": True
}

Accept on First Use

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "strict_host_key_checking": "accept-new"
}

Disable Host Key Checking (Testing Only)

# Only for testing/development
ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "strict_host_key_checking": False
}

Connection Management

Keep-Alive

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "keepalive_interval": 60,  # Send keepalive every 60 seconds
    "keepalive_count_max": 3   # Disconnect after 3 missed keepalives
}

Auto-Reconnection

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "auto_reconnect": True,
    "reconnect_delay": 5,      # Wait 5 seconds between attempts
    "max_reconnect_attempts": 3
}

Connection Timeout

ssh_tunnel={
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa",
    "connect_timeout": 30  # 30 second timeout
}

Multiple Tunnels

For accessing multiple databases:

bastion_config = {
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa"
}

config.add_postgres(
    name="prod_users",
    host="users-db.private",
    database="users",
    ssh_tunnel=bastion_config
).add_postgres(
    name="prod_orders",
    host="orders-db.private",
    database="orders",
    ssh_tunnel=bastion_config  # Same tunnel reused
)

Troubleshooting

Connection Refused

Error: ssh: connect to host bastion.example.com port 22: Connection refused
  • Verify bastion host is running
  • Check security groups/firewall allow port 22
  • Verify DNS resolution

Authentication Failed

Error: ssh: handshake failed: ssh: unable to authenticate
  • Verify SSH key is correct
  • Check key has correct permissions: chmod 600 ~/.ssh/id_rsa
  • Verify user exists on bastion
  • Try connecting manually: ssh -i ~/.ssh/id_rsa ubuntu@bastion.example.com

Host Key Verification Failed

Error: Host key verification failed
  • Add host to known_hosts: ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
  • Or use strict_host_key_checking: "accept-new"

Tunnel Connection Lost

Error: SSH tunnel disconnected
  • Enable keepalive settings
  • Enable auto-reconnect
  • Check bastion stability
  • Review SSH server ClientAliveInterval setting

Slow Connection

  • Use connection pooling (tunnel is reused)
  • Consider running Horizon Epoch closer to the bastion
  • Check network latency between client and bastion

Security Best Practices

  1. Use key-based authentication - Never use passwords
  2. Protect private keys - Use encrypted keys, proper permissions
  3. Verify host keys - Enable strict checking in production
  4. Limit bastion access - Restrict who can SSH to bastion
  5. Audit connections - Enable logging on bastion host
  6. Rotate keys regularly - Use short-lived certificates when possible

Next Steps

mTLS Authentication

Configure mutual TLS (client certificate) authentication for database connections.

Overview

mTLS provides:

  • Strong authentication via X.509 certificates
  • Encrypted connections (TLS)
  • No password management required
  • Certificate-based identity

Prerequisites

  • TLS-enabled database server
  • Client certificate and key
  • CA certificate (for verification)

Basic Configuration

from horizon_epoch import Config

config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
    name="production",
    host="db.example.com",
    database="production",
    username="epoch_user",
    sslmode="verify-full",
    ssl_cert="/path/to/client.crt",
    ssl_key="/path/to/client.key",
    ssl_rootcert="/path/to/ca.crt"
)

TLS Modes

Mode        | Server Cert | Client Cert | Description
------------|-------------|-------------|-------------------------------
disable     | No          | No          | Unencrypted connection
require     | No          | No          | Encrypted, no verification
verify-ca   | Yes         | Optional    | Verify server cert against CA
verify-full | Yes         | Optional    | Verify cert + hostname

For mTLS, use verify-full with client certificates.

Certificate Files

Client Certificate

PEM-encoded X.509 certificate:

-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAJC1HiIAZAiU...
-----END CERTIFICATE-----

Client Key

PEM-encoded private key:

-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA7Zq7k...
-----END RSA PRIVATE KEY-----

Or encrypted key:

-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFHzBJBgkqhkiG9w0BBQ0wPDA...
-----END ENCRYPTED PRIVATE KEY-----

CA Certificate

For verifying server certificate:

-----BEGIN CERTIFICATE-----
MIIDrzCCApegAwIBAgIQCDvg...
-----END CERTIFICATE-----

Configuration File

# epoch.toml
[storage.postgres.production]
host = "db.example.com"
database = "production"
username = "epoch_user"
sslmode = "verify-full"
ssl_cert = "/etc/epoch/certs/client.crt"
ssl_key = "/etc/epoch/certs/client.key"
ssl_rootcert = "/etc/epoch/certs/ca.crt"

Using Vault PKI

Dynamically issue certificates from Vault:

config = Config(
    vault_addr="https://vault.example.com:8200",
    vault_token="${VAULT_TOKEN}"
).add_postgres(
    name="production",
    host="db.example.com",
    database="production",
    username="epoch_user",
    sslmode="verify-full",
    vault_pki_role="pki/issue/epoch-client",
    ssl_rootcert="/etc/epoch/certs/ca.crt"
)

Certificates are automatically renewed before expiry.

PostgreSQL Server Setup

Enable SSL on PostgreSQL server:

# postgresql.conf
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file = '/var/lib/postgresql/server.key'
ssl_ca_file = '/var/lib/postgresql/ca.crt'

Configure client certificate authentication:

# pg_hba.conf
# TYPE  DATABASE  USER         ADDRESS        METHOD
hostssl all       epoch_user   0.0.0.0/0      cert clientcert=verify-full

Create user with certificate CN:

CREATE USER epoch_user;
-- The CN in the client cert must match the username

Encrypted Private Keys

If your key is password-protected:

config.add_postgres(
    name="production",
    host="db.example.com",
    ssl_key="/path/to/encrypted.key",
    ssl_key_password="${SSL_KEY_PASSWORD}"
)

Certificate Rotation

Manual Rotation

  1. Generate new certificate
  2. Update configuration
  3. Restart/reload application
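
For example, a replacement client certificate could be issued with OpenSSL (a sketch; the key size, validity, file paths, and CN are illustrative and must match your CA setup and pg_hba.conf rules):

# Generate a new key and a CSR whose CN matches the database username
openssl genrsa -out client.key 2048
openssl req -new -key client.key -subj "/CN=epoch_user" -out client.csr

# Sign with your CA, then update ssl_cert/ssl_key in the configuration
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 90 -out client.crt
chmod 600 client.key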

Automatic with Vault

Certificates are automatically rotated:

config = Config(
    vault_pki_role="pki/issue/epoch-client",
    cert_renewal_threshold=0.7  # Renew when 70% of TTL elapsed
)

Troubleshooting

Certificate Expired

Error: SSL error: certificate has expired
  • Check certificate expiry: openssl x509 -enddate -noout -in client.crt
  • Renew certificate
  • If using Vault, check renewal is working

Certificate Verification Failed

Error: SSL error: certificate verify failed
  • Verify CA certificate is correct
  • Check certificate chain is complete
  • Verify server hostname matches certificate

Permission Denied on Key File

Error: could not load private key file
  • Check file permissions: chmod 600 client.key
  • Verify file is readable by process user

Key Doesn’t Match Certificate

Error: key values mismatch
  • Verify key matches certificate:
    openssl x509 -noout -modulus -in client.crt | md5sum
    openssl rsa -noout -modulus -in client.key | md5sum
    

Wrong Password for Encrypted Key

Error: bad decrypt
  • Verify password is correct
  • Check key file format (PKCS#8 vs traditional)

Security Best Practices

  1. Protect private keys - Use encrypted keys, restrict file permissions
  2. Use short-lived certificates - Rotate frequently
  3. Verify server certificates - Always use verify-full
  4. Audit certificate usage - Log certificate fingerprints
  5. Use separate certificates - Don’t share between environments

CLI Usage

# Using environment variables
export EPOCH_SSL_CERT="/path/to/client.crt"
export EPOCH_SSL_KEY="/path/to/client.key"
export EPOCH_SSL_ROOTCERT="/path/to/ca.crt"

epoch init my-repo \
    --metadata-url "postgresql://user@db.example.com/horizon_epoch?sslmode=verify-full"

Next Steps

CLI Reference

Complete reference for the epoch command-line interface.

Global Options

Option                 | Description
-----------------------|-------------------------------
-C, --directory <PATH> | Run as if started in PATH
--format <FORMAT>      | Output format: text, json
-q, --quiet            | Suppress non-essential output
-v, --verbose          | Enable verbose output
--config <FILE>        | Path to config file
--version              | Show version
--help                 | Show help
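
For example, combining global options with a command (the repository path is a placeholder):

# Run against a repository in another directory and emit JSON
epoch -C /path/to/repo --format json status

# Verbose output while reviewing history
epoch -v log --oneline -10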

Commands

epoch init

Initialize a new repository.

epoch init <NAME> [OPTIONS]

Arguments:

  • <NAME> - Repository name (required)

Options:

  • -d, --description <DESC> - Repository description
  • -b, --default-branch <BRANCH> - Default branch name (default: main)
  • --metadata-url <URL> - Metadata database URL

Examples:

epoch init my-repo
epoch init my-repo --description "Production data"
epoch init my-repo --metadata-url "postgresql://localhost/horizon_epoch"

epoch branch

Manage branches.

epoch branch [SUBCOMMAND]

Subcommands:

epoch branch list

List all branches.

epoch branch list
epoch branch list --format json

epoch branch create

Create a new branch.

epoch branch create <NAME> [--from <SOURCE>]

Options:

  • --from <SOURCE> - Source branch (default: current branch)

Examples:

epoch branch create feature-x
epoch branch create hotfix --from production

epoch branch delete

Delete a branch.

epoch branch delete <NAME> [--force]

Options:

  • -f, --force - Force delete even if unmerged

epoch checkout

Switch to a branch.

epoch checkout <BRANCH>

Examples:

epoch checkout main
epoch checkout feature/new-schema

epoch commit

Create a new commit.

epoch commit -m <MESSAGE> [OPTIONS]

Options:

  • -m, --message <MSG> - Commit message (required)
  • -a, --all - Commit all changed tables
  • --tables <TABLES> - Comma-separated list of tables to commit
  • --author <NAME> - Override author name
  • --email <EMAIL> - Override author email

Examples:

epoch commit -m "Add user preferences"
epoch commit -m "Update orders" --tables orders,order_items

epoch merge

Merge a branch into current branch.

epoch merge <BRANCH> [OPTIONS]

Options:

  • --strategy <STRATEGY> - Merge strategy: default, ours, theirs
  • --no-commit - Merge but don’t commit
  • --message <MSG> - Custom merge commit message
  • --continue - Continue after conflict resolution
  • --abort - Abort in-progress merge

Examples:

epoch merge feature/updates
epoch merge feature/updates --strategy theirs
epoch merge --continue
epoch merge --abort

epoch diff

Show differences between branches or commits.

epoch diff [SOURCE] [TARGET] [OPTIONS]

Arguments:

  • <SOURCE> - Source branch/commit (default: HEAD)
  • <TARGET> - Target branch/commit

Options:

  • --table <TABLE> - Show diff for specific table
  • --stat - Show statistics only
  • --name-only - Show only changed table names

Examples:

epoch diff                           # Uncommitted changes
epoch diff main                      # Current branch vs main
epoch diff main feature/updates      # Between two branches
epoch diff abc123f def456g           # Between commits
epoch diff --stat                    # Summary only

epoch log

Show commit history.

epoch log [OPTIONS]

Options:

  • -n, --max-count <N> - Limit output to N commits
  • --oneline - One line per commit
  • --since <DATE> - Show commits since date
  • --until <DATE> - Show commits until date
  • --author <PATTERN> - Filter by author
  • --table <TABLE> - Show commits affecting table

Examples:

epoch log
epoch log --oneline -10
epoch log --since "2024-01-01"
epoch log --table users

epoch status

Show working tree status.

epoch status [OPTIONS]

Options:

  • --short - Short format
  • --porcelain - Machine-readable format

Examples:

epoch status
epoch status --short

epoch table

Manage tables.

epoch table [SUBCOMMAND]

epoch table add

Register a table for tracking.

epoch table add <NAME> --location <LOCATION> [OPTIONS]

Options:

  • --location <URL> - Table location (required)
  • --primary-key <COLS> - Primary key columns
  • --storage <NAME> - Storage backend name (alternative to location URL)

Location Formats by Backend:

Backend    | Format                                                      | Example
-----------|-------------------------------------------------------------|------------------------------------------
PostgreSQL | postgresql://[user:pass@]host[:port]/database/schema.table | postgresql://localhost/mydb/public.users
MySQL      | mysql://[user:pass@]host[:port]/database/table             | mysql://localhost/analytics/events
SQL Server | mssql://[user:pass@]host[:port]/database/schema.table      | mssql://sqlserver/warehouse/dbo.orders
SQLite     | sqlite:///path/to/file.db/table                            | sqlite:///data/local.db/users
S3         | s3://bucket/prefix/path                                    | s3://datalake/delta/events
Azure Blob | azure://account/container/prefix                           | azure://myaccount/data/tables/users
GCS        | gs://bucket/prefix/path                                    | gs://my-bucket/delta/products
Local      | file:///path/to/directory                                  | file:///data/parquet/users

Examples:

# PostgreSQL
epoch table add users --location "postgresql://localhost/mydb/public.users" \
    --primary-key id

# MySQL
epoch table add orders --location "mysql://mysql-host/shop/orders" \
    --primary-key order_id

# SQL Server
epoch table add inventory --location "mssql://sqlserver/warehouse/dbo.inventory" \
    --primary-key sku

# SQLite
epoch table add config --location "sqlite:///data/app.db/config" \
    --primary-key key

# S3/Delta Lake
epoch table add events --location "s3://datalake/delta/events" \
    --primary-key event_id

# Azure Blob
epoch table add telemetry --location "azure://myaccount/data/telemetry" \
    --primary-key id

# GCS
epoch table add products --location "gs://my-bucket/delta/products" \
    --primary-key product_id

# Local filesystem
epoch table add local_data --location "file:///data/parquet/sales" \
    --primary-key transaction_id

# Using storage backend name (when storage is already registered)
epoch table add users --storage prod-pg --primary-key id

epoch table list

List registered tables.

epoch table list [--format json]

epoch table remove

Unregister a table.

epoch table remove <NAME> [--force]

epoch tag

Manage tags.

epoch tag [SUBCOMMAND]

epoch tag create

Create a tag.

epoch tag <NAME> [COMMIT] [OPTIONS]

Options:

  • -m, --message <MSG> - Tag message

Examples:

epoch tag v1.0.0
epoch tag v1.0.0 abc123f -m "Release 1.0.0"

epoch tag list

List all tags.

epoch tag list

epoch tag delete

Delete a tag.

epoch tag delete <NAME>

epoch reset

Reset current branch to a commit.

epoch reset [OPTIONS] <COMMIT>

Options:

  • --soft - Keep changes staged
  • --hard - Discard all changes

Examples:

epoch reset --soft HEAD~1
epoch reset --hard v1.0.0

epoch revert

Create a commit that undoes a previous commit.

epoch revert <COMMIT> [OPTIONS]

Options:

  • --no-commit - Revert but don’t commit

Examples:

epoch revert abc123f
epoch revert HEAD

epoch stash

Stash uncommitted changes.

epoch stash [SUBCOMMAND]

epoch stash push

Save changes to stash.

epoch stash push [-m <MESSAGE>]

epoch stash pop

Apply and remove latest stash.

epoch stash pop [<STASH>]

epoch stash list

List stashes.

epoch stash list
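
A typical flow, for example (the message and branch name are illustrative):

epoch stash push -m "WIP: schema experiment"
epoch checkout main
# ...do other work...
epoch checkout feature/new-schema
epoch stash pop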

epoch storage

Manage storage backends.

epoch storage [SUBCOMMAND]

epoch storage add

Register a new storage backend.

epoch storage add <NAME> --type <TYPE> [OPTIONS]

Arguments:

  • <NAME> - Unique name for the storage backend (required)

Options:

  • --type <TYPE> - Backend type (required): postgresql, mysql, mssql, sqlite, s3, azureblob, gcs, local
  • --description <DESC> - Optional description
  • --set-default - Set as the default storage backend

PostgreSQL/MySQL/MSSQL Options:

  • --host <HOST> - Database host (required)
  • --port <PORT> - Database port (PostgreSQL: 5432, MySQL: 3306, MSSQL: 1433)
  • --database <DB> - Database name (required)
  • --schema <SCHEMA> - Schema name (PostgreSQL only)
  • --username <USER> - Username
  • --ssl-mode <MODE> - SSL mode

SQLite Options:

  • --database <PATH> - Database file path (required)

S3 Options:

  • --bucket <BUCKET> - Bucket name (required)
  • --region <REGION> - AWS region
  • --endpoint <URL> - Custom endpoint (for MinIO, etc.)
  • --path-prefix <PREFIX> - Key prefix
  • --path-style - Use path-style URLs

Azure Blob Options:

  • --account <ACCOUNT> - Storage account name (required)
  • --container <CONTAINER> - Container name (required)
  • --path-prefix <PREFIX> - Blob prefix

GCS Options:

  • --bucket <BUCKET> - Bucket name (required)
  • --project <PROJECT> - GCP project ID
  • --path-prefix <PREFIX> - Object prefix

Local Filesystem Options:

  • --base-path <PATH> - Base directory path (required)

Credential Options:

  • --credentials <TYPE> - Credential resolution: environment, aws-profile, gcp-service-account, azure-managed-identity, vault
  • --profile <NAME> - AWS profile name (with aws-profile)
  • --key-file <PATH> - GCP service account key file (with gcp-service-account)
  • --client-id <ID> - Azure client ID (with azure-managed-identity)
  • --vault-path <PATH> - Vault secret path (with vault)
  • --vault-role <ROLE> - Vault role name (with vault)

Examples:

# PostgreSQL
epoch storage add prod-pg --type postgresql \
    --host db.example.com --database production

# MySQL
epoch storage add analytics-mysql --type mysql \
    --host mysql.example.com --port 3306 --database analytics

# SQL Server
epoch storage add warehouse-mssql --type mssql \
    --host sqlserver.example.com --database warehouse

# SQLite
epoch storage add local-sqlite --type sqlite --database ./data/local.db

# S3
epoch storage add datalake --type s3 \
    --bucket company-datalake --region us-west-2

# S3-compatible (MinIO)
epoch storage add dev-minio --type s3 \
    --bucket local-data --endpoint http://localhost:9000 --path-style

# Azure Blob
epoch storage add azure-store --type azureblob \
    --account myaccount --container data

# GCS
epoch storage add gcs-store --type gcs \
    --bucket my-bucket --project my-project

# Local Filesystem
epoch storage add local-fs --type local --base-path /data/epoch

# With credentials
epoch storage add prod-s3 --type s3 --bucket prod-data \
    --credentials aws-profile --profile production

epoch storage list

List configured storage backends.

epoch storage list [--type <TYPE>] [--format json]

Options:

  • --type <TYPE> - Filter by backend type

epoch storage show

Show detailed information about a storage backend.

epoch storage show <NAME>

epoch storage update

Update a storage backend configuration.

epoch storage update <NAME> [OPTIONS]

Options:

  • --description <DESC> - Update description
  • --disable - Disable the backend
  • --enable - Enable the backend

epoch storage remove

Remove a storage backend.

epoch storage remove <NAME> [--force]

epoch storage default

Manage the default storage backend.

epoch storage default set <NAME>   # Set default
epoch storage default show         # Show current default
epoch storage default clear        # Clear default

epoch storage test-credentials

Test credential provider for a storage backend.

epoch storage test-credentials <NAME>
epoch storage test-credentials --all

epoch storage credentials

Show credential status for storage backends.

epoch storage credentials <NAME>
epoch storage credentials --all

epoch storage tunnels

Show SSH tunnel status for storage backends.

epoch storage tunnels [NAME]

Schema Commands

epoch schema show

Display schema for a table at a specific commit or branch.

epoch schema show <TABLE> [OPTIONS]

Arguments:

  • <TABLE> - Table name (required)

Options:

  • --at <REF> - Show schema at specific commit, branch, or tag
  • --live - Show live database schema (not stored snapshot)
  • --generate-ddl - Output as PostgreSQL DDL
  • --format <FORMAT> - Output format: text, json

Examples:

# Show current schema
epoch schema show users

# Show schema at specific commit
epoch schema show users --at abc123f

# Show schema on a branch
epoch schema show users --at feature/new-fields

# Compare with live database
epoch schema show users --live

# Generate DDL from schema
epoch schema show users --generate-ddl

epoch schema diff

Compare schemas between two refs and optionally generate migration DDL.

epoch schema diff <TABLE> <SOURCE> <TARGET> [OPTIONS]

Arguments:

  • <TABLE> - Table name (required)
  • <SOURCE> - Source ref (branch, commit, or tag)
  • <TARGET> - Target ref (branch, commit, or tag)

Options:

  • --show-ddl - Show DDL migration statements
  • --ddl-output <FILE> - Write DDL to file
  • --format <FORMAT> - Output format: text, json, flyway, liquibase
  • --columns-only - Show only column changes
  • --constraints-only - Show only constraint changes
  • --indexes-only - Show only index changes

Examples:

# Compare schemas between branches
epoch schema diff users main feature/updates

# Generate migration DDL
epoch schema diff users main feature/updates --show-ddl

# Write DDL to file
epoch schema diff users main feature/updates --ddl-output migrations/V2.sql

# Generate Flyway-compatible migration
epoch schema diff users main feature/updates --format flyway --ddl-output db/migration/V2__update_users.sql

epoch schema apply

Apply schema changes from a ref to the live database.

epoch schema apply <TABLE> --from <REF> [OPTIONS]

Arguments:

  • <TABLE> - Table name (required)

Options:

  • --from <REF> - Source ref with desired schema (required)
  • --dry-run - Show what would be executed without applying
  • --force - Allow destructive operations
  • --online - Use CONCURRENTLY for index operations
  • --lock-timeout <DURATION> - Lock timeout (default: 30s)
  • --statement-timeout <DURATION> - Statement timeout (default: 5m)
  • --retries <N> - Number of retries on failure (default: 3)
  • --no-transaction - Run statements individually (required for CONCURRENTLY)

Examples:

# Preview changes (dry-run)
epoch schema apply users --from feature/updates --dry-run

# Apply schema changes
epoch schema apply users --from feature/updates

# Apply with online mode for large tables
epoch schema apply users --from feature/updates --online

# Apply with custom timeouts
epoch schema apply users --from feature/updates \
    --lock-timeout 10s \
    --statement-timeout 10m

# Force destructive changes (after backup)
epoch schema apply users --from feature/updates --force

epoch schema status

Show schema drift status for tables.

epoch schema status [TABLES...] [OPTIONS]

Arguments:

  • [TABLES...] - Specific tables to check (optional, all if omitted)

Options:

  • --check-bloat - Also check table bloat
  • --check-indexes - Check index validity
  • --format <FORMAT> - Output format: text, json

Examples:

# Check all tables
epoch schema status

# Check specific tables
epoch schema status users orders

# Check with bloat analysis
epoch schema status --check-bloat

# JSON output for automation
epoch schema status --format json

Output:

Table      | Latest Schema | Historical | Live DB  | Status
-----------|---------------|------------|----------|--------
users      | Yes           | Yes        | Synced   | OK
orders     | Yes           | Partial    | Synced   | OK
products   | Yes           | No         | Drifted  | Needs sync
events     | No            | No         | N/A      | S3 table

epoch migrate-schemas

Backfill schema data for existing commits.

epoch migrate-schemas [OPTIONS]

Options:

  • --backfill - Backfill missing schemas from live database
  • --all - Process all tables
  • --table <TABLE> - Process specific table
  • --since <DATE> - Only backfill commits since date
  • --skip-inaccessible - Skip tables that can’t be introspected
  • --import-from <TOOL> - Import from external tool (flyway, liquibase)
  • --dry-run - Show what would be done

Examples:

# Backfill all tables (dry-run)
epoch migrate-schemas --backfill --all --dry-run

# Backfill specific table
epoch migrate-schemas --backfill --table users

# Backfill since a date
epoch migrate-schemas --backfill --all --since "2024-01-01"

# Import from Flyway migrations
epoch migrate-schemas --import-from flyway ./db/migration

epoch conflicts

Manage merge conflicts.

epoch conflicts [SUBCOMMAND]

epoch conflicts show

Show current conflicts.

epoch conflicts show [--format json]

epoch conflicts resolve

Resolve conflicts.

epoch conflicts resolve [OPTIONS]

Options:

  • --interactive - Interactive resolution
  • --ours - Accept our version
  • --theirs - Accept their version
  • --table <TABLE> - Resolve specific table
  • --record <ID> - Resolve specific record
  • --field <FIELD> - Resolve specific field
  • --value <VALUE> - Set specific value
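
For example, combining the flags above (table, record, and field names are illustrative):

# Accept our version for an entire table
epoch conflicts resolve --table users --ours

# Set an explicit value for one field of one record
epoch conflicts resolve --table users --record 42 --field email --value "jane@example.com"

# Walk through the remaining conflicts interactively
epoch conflicts resolve --interactive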

epoch query

Run a query against the repository.

epoch query <SQL> [OPTIONS]

Options:

  • --branch <BRANCH> - Query specific branch
  • --at <REF> - Query at commit/tag/time
  • --format <FMT> - Output format: table, json, csv

Examples:

epoch query "SELECT * FROM users"
epoch query "SELECT COUNT(*) FROM orders" --branch staging
epoch query "SELECT * FROM users" --at v1.0.0 --format json

epoch local

Manage local development environment.

epoch local [SUBCOMMAND]

epoch local start

Start local services.

epoch local start

epoch local stop

Stop local services.

epoch local stop

epoch local status

Show local service status.

epoch local status

Configuration File

~/.epoch/config.toml:

[defaults]
format = "text"
author_name = "Your Name"
author_email = "you@example.com"

[metadata]
url = "postgresql://localhost/horizon_epoch"

Environment Variables

Variable           | Description
-------------------|--------------------------------------
EPOCH_METADATA_URL | Metadata database URL
EPOCH_CONFIG_FILE  | Config file path
EPOCH_LOG_LEVEL    | Log level (debug, info, warn, error)
EPOCH_FORMAT       | Default output format
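
For example, to point the CLI at a metadata database and default to JSON output with debug logging:

export EPOCH_METADATA_URL="postgresql://localhost/horizon_epoch"
export EPOCH_FORMAT=json
export EPOCH_LOG_LEVEL=debug
epoch status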

Python SDK Reference

Complete reference for the Horizon Epoch Python SDK.

Installation

The Python SDK requires building from source (see Installation Guide).

# From project root
cd python && uv sync
uv run maturin develop --manifest-path ../crates/horizon-epoch-py/Cargo.toml
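
After building, you can confirm the native bindings loaded using the helpers documented under Native Module Access below:

from horizon_epoch.client import is_native_available, get_native_version

if is_native_available():
    print(f"Native bindings loaded, version {get_native_version()}")
else:
    print("Native module not available; check the maturin build output")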

Quick Start

from horizon_epoch import Client, ClientConfig, Author

# Create client configuration
config = ClientConfig(
    metadata_url="postgresql://localhost/horizon_epoch"
)

# Use async context manager for connection management
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
    # Initialize a repository
    await client.init("my-data-repo")

    # Create a branch
    await client.branch("feature-x")

    # Make changes and commit
    await client.commit("Initial commit")

Classes

ClientConfig

Configuration for the Horizon Epoch client.

from horizon_epoch import ClientConfig, Author

config = ClientConfig(
    metadata_url="postgresql://user:pass@localhost/horizon_epoch",
    default_branch="main",
    author=Author(name="Dev", email="dev@example.com"),
    timeout_seconds=30.0,
    retry_count=3,
    log_level="INFO",
)

Parameter       | Type   | Default  | Description
----------------|--------|----------|------------------------------------
metadata_url    | str    | required | Metadata database URL
default_branch  | str    | "main"   | Default branch name
author          | Author | None     | Default commit author
timeout_seconds | float  | 30.0     | Operation timeout
retry_count     | int    | 3        | Retry count for transient failures
log_level       | str    | "INFO"   | Logging level

Client

Main client for interacting with Horizon Epoch.

Connection Management

from horizon_epoch import Client

# Option 1: Async context manager (recommended)
async with Client.connect("postgresql://...") as client:
    await client.init("my-repo")

# Option 2: Manual connection management
client = Client.from_url("postgresql://...")
await client.open()
try:
    await client.init("my-repo")
finally:
    await client.close()

Repository Operations

# Initialize a new repository
config = await client.init(
    name="my-repo",
    storage_configs={"primary": {"backend": "postgresql", "url": "..."}},
    description="My data repository"
)

# Open an existing repository
config = await client.open_repository("my-repo")

# Get repository status
status = await client.status(branch="main")

Branch Operations

# Create a branch
branch = await client.branch(
    name="feature/new-schema",
    start_point="main",  # Optional, defaults to default branch
    checkout=True        # Optional, defaults to True
)

# List all branches
branches = await client.branches()

# Get a specific branch
branch = await client.get_branch("main")

# Checkout a branch
status = await client.checkout("feature/new-schema")

# Delete a branch
await client.delete_branch("old-feature", force=False)

# Compare branches
comparison = await client.compare_branches("feature-x", "main")
print(f"Ahead: {comparison.ahead_count}, Behind: {comparison.behind_count}")

Commit Operations

from horizon_epoch import Author

# Create a commit
commit = await client.commit(
    message="Add user preferences",
    author=Author(name="Dev", email="dev@example.com"),
    branch="feature-x",      # Optional
    tables=["users"]         # Optional, defaults to all staged
)

# View commit history
log = await client.log(ref="main", limit=10, skip=0)
for entry in log.entries:
    print(f"{entry.commit.short_id}: {entry.commit.message}")

# Get commit details
commit = await client.show("abc123")

Diff Operations

# Diff between branches
diff = await client.diff(
    base="main",
    target="feature-x",
    tables=["users"],           # Optional
    include_record_diff=True    # Optional
)

for table_diff in diff.table_diffs:
    print(f"{table_diff.table_name}: {table_diff.total_changes} changes")

Merge Operations

from horizon_epoch import MergeStrategy

# Merge a branch
result = await client.merge(
    source="feature-x",
    target="main",                      # Optional, defaults to default branch
    strategy=MergeStrategy.THREE_WAY,   # Optional
    message="Merge feature-x",          # Optional
    dry_run=False                       # Optional
)

if result.has_conflicts:
    print(f"Conflicts: {result.conflict_count}")
else:
    print(f"Merged: {result.result_commit_id}")

# Abort a merge
await client.abort_merge(conflict_id="...")

Storage Operations

Horizon Epoch supports 8 storage backend types across databases, object storage, and filesystems.

from horizon_epoch import StorageBackend

# =============================================================================
# DATABASE BACKENDS
# =============================================================================

# PostgreSQL
backend_id = await client.add_storage(
    name="postgres-warehouse",
    backend=StorageBackend.POSTGRESQL,
    config={"url": "postgresql://user:pass@localhost:5432/mydb"},
    description="PostgreSQL data warehouse"
)

# MySQL
backend_id = await client.add_storage(
    name="mysql-analytics",
    backend=StorageBackend.MYSQL,
    config={"url": "mysql://user:pass@localhost:3306/analytics"},
    description="MySQL analytics database"
)

# Microsoft SQL Server
backend_id = await client.add_storage(
    name="mssql-legacy",
    backend=StorageBackend.MSSQL,
    config={"url": "mssql://user:pass@localhost:1433/legacy_db"},
    description="SQL Server legacy system"
)

# SQLite
backend_id = await client.add_storage(
    name="sqlite-local",
    backend=StorageBackend.SQLITE,
    config={"path": "/data/local.db"},
    description="Local SQLite database"
)

# =============================================================================
# OBJECT STORAGE BACKENDS
# =============================================================================

# AWS S3 / S3-compatible storage
backend_id = await client.add_storage(
    name="s3-datalake",
    backend=StorageBackend.S3,
    config={
        "bucket": "my-datalake",
        "region": "us-west-2",
        "endpoint": None,  # Use AWS default, or specify for MinIO/LocalStack
    },
    description="S3 data lake"
)

# Azure Blob Storage
backend_id = await client.add_storage(
    name="azure-warehouse",
    backend=StorageBackend.AZURE,
    config={
        "account": "mystorageaccount",
        "container": "data-warehouse",
    },
    description="Azure Blob data warehouse"
)

# Google Cloud Storage
backend_id = await client.add_storage(
    name="gcs-analytics",
    backend=StorageBackend.GCS,
    config={
        "bucket": "my-analytics-bucket",
        "project": "my-gcp-project",
    },
    description="GCS analytics storage"
)

# =============================================================================
# FILESYSTEM BACKEND
# =============================================================================

# Local filesystem
backend_id = await client.add_storage(
    name="local-dev",
    backend=StorageBackend.LOCAL,
    config={"path": "/data/epoch-storage"},
    description="Local development storage"
)

# =============================================================================
# STORAGE MANAGEMENT
# =============================================================================

# List all storage backends
backends = await client.list_storage()
for name, backend_type in backends.items():
    print(f"{name}: {backend_type.value}")

# Get storage details
config = await client.get_storage("s3-datalake")
print(f"Backend: {config.name}, Type: {config.backend_type}")

# Get detailed storage info
backends_info = await client.list_storage_info()
for backend in backends_info:
    status = "default" if backend.is_default else "enabled"
    print(f"{backend.name} ({backend.backend_type}): {status}")

# Remove storage backend
await client.remove_storage("old-storage")

Table Operations

Tables are tracked using StorageLocation objects that specify where the data lives. The native module provides factory methods for all 8 storage backends.

# Access native module for StorageLocation
from horizon_epoch.client import _native

# =============================================================================
# DATABASE STORAGE LOCATIONS
# =============================================================================

# PostgreSQL: connection_name, schema, table
loc = _native.StorageLocation.postgresql("main", "public", "users")
reg = await client.track_table("users", loc)

# MySQL: connection_name, database, table
loc = _native.StorageLocation.mysql("analytics", "sales_db", "orders")
reg = await client.track_table("orders", loc)

# Microsoft SQL Server: connection_name, database, schema, table
loc = _native.StorageLocation.mssql("legacy", "warehouse", "dbo", "customers")
reg = await client.track_table("customers", loc)

# SQLite: connection_name, table
loc = _native.StorageLocation.sqlite("local-db", "products")
reg = await client.track_table("products", loc)

# =============================================================================
# OBJECT STORAGE LOCATIONS
# =============================================================================

# S3: bucket, prefix, format (parquet, delta, csv)
loc = _native.StorageLocation.s3("my-datalake", "tables/events/", format="parquet")
reg = await client.track_table("events", loc)

# S3 with Delta Lake format
loc = _native.StorageLocation.s3("my-datalake", "delta/transactions/", format="delta")
reg = await client.track_table("transactions", loc)

# Azure Blob Storage: account, container, prefix, format
loc = _native.StorageLocation.azure_blob(
    "mystorageaccount", "warehouse", "data/metrics/", format="parquet"
)
reg = await client.track_table("metrics", loc)

# Google Cloud Storage: bucket, prefix, format
loc = _native.StorageLocation.gcs("my-analytics", "tables/sessions/", format="parquet")
reg = await client.track_table("sessions", loc)

# =============================================================================
# FILESYSTEM STORAGE LOCATIONS
# =============================================================================

# Local filesystem: path, format
loc = _native.StorageLocation.local_filesystem("/data/exports/reports/", format="parquet")
reg = await client.track_table("reports", loc)

# =============================================================================
# TABLE MANAGEMENT
# =============================================================================

# View registration result
print(f"Tracked: {reg.table_name}")
print(f"Content hash: {reg.content_hash}")
print(f"Record count: {reg.record_count}")
print(f"Location: {reg.storage_location_uri}")

# List all tracked tables
tables = await client.list_tables()
for table in tables:
    print(f"  - {table}")

# Untrack a table (data is preserved, only tracking is removed)
await client.untrack_table("old-table")

Staging Operations

# Stage all changes
result = await client.stage_all(branch="main")
print(f"Staged {result.tables_affected} tables")

# Stage specific table
await client.stage("users", branch="main")

# Unstage all
result = await client.unstage_all(branch="main")

# Unstage specific table
await client.unstage("users", branch="main")

Constraint Operations

# Get constraints for a table
constraints = await client.get_constraints("users", branch="main")
print(f"Foreign keys: {len(constraints.foreign_keys)}")
print(f"Unique: {len(constraints.unique_constraints)}")

# Add a constraint (using native types)
from horizon_epoch.client import _native
constraint = _native.UniqueConstraint("uq_email", ["email"])
await client.add_constraint("users", constraint)

# Remove a constraint
await client.remove_constraint("users", "uq_email")

# Diff constraints between branches
diff = await client.diff_constraints("main", "feature-x", table_name="users")

# Add an index
col = _native.IndexColumn("email")
idx = _native.IndexDefinition("idx_email", [col])
await client.add_index("users", idx)

# Remove an index
await client.remove_index("users", "idx_email")

Models

Author

from horizon_epoch import Author

author = Author(name="Jane Developer", email="jane@example.com")

Branch

from horizon_epoch import Branch

# Returned from client.branch(), client.branches(), client.get_branch()
branch.name           # str
branch.head_commit_id # str
branch.created_at     # datetime
branch.is_default     # bool

Commit

from horizon_epoch import Commit

# Returned from client.commit(), client.show()
commit.id             # str (full commit ID)
commit.short_id       # str (abbreviated)
commit.message        # str
commit.author         # Author
commit.timestamp      # datetime
commit.parent_ids     # List[str]

MergeResult

from horizon_epoch import MergeResult

result.source_branch     # str
result.target_branch     # str
result.is_fast_forward   # bool
result.has_conflicts     # bool
result.conflict_count    # int
result.result_commit_id  # Optional[str]
result.conflicts         # List[RecordConflict]

Status

from horizon_epoch import Status

status.branch            # Optional[str]
status.commit_id         # Optional[str]
status.is_detached       # bool
status.staged_tables     # List[str]
status.unstaged_tables   # List[str]

Enums

from horizon_epoch import MergeStrategy, StorageBackend

# Merge strategies
MergeStrategy.THREE_WAY   # Standard three-way merge
MergeStrategy.OURS        # Keep target branch version
MergeStrategy.THEIRS      # Keep source branch version
MergeStrategy.MANUAL      # Fail on conflicts

# Storage backends - Database
StorageBackend.POSTGRESQL  # PostgreSQL database
StorageBackend.MYSQL       # MySQL database
StorageBackend.MSSQL       # Microsoft SQL Server
StorageBackend.SQLITE      # SQLite database

# Storage backends - Object Storage
StorageBackend.S3          # AWS S3 / S3-compatible storage
StorageBackend.AZURE       # Azure Blob Storage
StorageBackend.GCS         # Google Cloud Storage

# Storage backends - Filesystem
StorageBackend.LOCAL       # Local filesystem

Exceptions

from horizon_epoch import (
    HorizonEpochError,           # Base exception
    RepositoryNotInitializedError,
    BranchNotFoundError,
    BranchAlreadyExistsError,
    CommitNotFoundError,
    MergeConflictError,
    ConnectionFailedError,
    StorageError,
    ValidationError,
)
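
A short sketch of handling these around a merge; whether conflicts are raised as MergeConflictError or returned on the result can depend on the strategy, so both paths are shown (see Merge Operations above):

import asyncio
from horizon_epoch import Client, BranchNotFoundError, MergeConflictError, HorizonEpochError

async def merge_feature():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        try:
            result = await client.merge(source="feature-x", target="main")
            if result.has_conflicts:
                print(f"Resolve {result.conflict_count} conflicts before committing")
            else:
                print(f"Merged: {result.result_commit_id}")
        except BranchNotFoundError:
            print("Source or target branch does not exist")
        except MergeConflictError as exc:
            print(f"Merge conflict: {exc}")
        except HorizonEpochError as exc:
            # Base class catches any other Horizon Epoch error
            print(f"Operation failed: {exc}")

asyncio.run(merge_feature())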

Async Support

All client methods are async and should be awaited:

import asyncio
from horizon_epoch import Client

async def main():
    async with Client.connect("postgresql://...") as client:
        await client.init("my-repo")
        await client.branch("feature")
        await client.commit("Initial commit")

asyncio.run(main())

Native Module Access

For advanced operations, access the native Rust module directly:

from horizon_epoch.client import (
    _native,                  # Native module (if available)
    is_native_available,      # Check if native bindings loaded
    get_native_version,       # Get native module version
    get_native_info,          # Get diagnostic info
    require_native,           # Get native module or raise ImportError
)

if is_native_available():
    print(f"Native version: {get_native_version()}")

    # StorageLocation factory methods for all 8 backends
    # Database backends
    pg_loc = _native.StorageLocation.postgresql("conn", "schema", "table")
    mysql_loc = _native.StorageLocation.mysql("conn", "database", "table")
    mssql_loc = _native.StorageLocation.mssql("conn", "database", "schema", "table")
    sqlite_loc = _native.StorageLocation.sqlite("conn", "table")

    # Object storage backends
    s3_loc = _native.StorageLocation.s3("bucket", "prefix/", format="parquet")
    azure_loc = _native.StorageLocation.azure_blob("account", "container", "prefix/", format="parquet")
    gcs_loc = _native.StorageLocation.gcs("bucket", "prefix/", format="parquet")

    # Filesystem backend
    local_loc = _native.StorageLocation.local_filesystem("/path/to/data/", format="parquet")

    # Constraint types
    unique = _native.UniqueConstraint("uq_email", ["email"])
    check = _native.CheckConstraint("ck_age", "age >= 0")
    fk = _native.ForeignKeyConstraint("fk_user", ["user_id"], "users", ["id"])

    # Index types
    col = _native.IndexColumn("email")
    idx = _native.IndexDefinition("idx_email", [col])

Example Usage

import asyncio
from horizon_epoch import Client, Author, MergeStrategy

async def main():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Initialize repository
        await client.init("my-data-repo")

        # Create and checkout feature branch
        await client.branch("feature/new-schema")

        # Make changes (via your data pipeline)
        # ...

        # Stage and commit
        await client.stage_all()
        await client.commit(
            message="Add user preferences",
            author=Author(name="Dev", email="dev@example.com")
        )

        # Merge back to main
        result = await client.merge(
            source="feature/new-schema",
            strategy=MergeStrategy.THREE_WAY
        )

        if result.has_conflicts:
            print(f"Resolve {result.conflict_count} conflicts")
        else:
            print(f"Merged successfully: {result.result_commit_id}")

asyncio.run(main())

REST API Reference

The Horizon Epoch REST API provides HTTP endpoints for all version control operations. This reference documents all available endpoints, their parameters, and response formats.

Overview

Base URL

http://localhost:8000/api/v1

The API prefix is configurable via the EPOCH_API_PREFIX environment variable (default: /api/v1).
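
For example, to serve the API under a different prefix (the value shown is illustrative):

EPOCH_API_PREFIX=/data-api/v1 uv run uvicorn horizon_epoch.api:app --host 0.0.0.0 --port 8000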

Content Type

All requests and responses use JSON format:

Content-Type: application/json

Running the API Server

# Start the server
uv run uvicorn horizon_epoch.api:app --reload --host 0.0.0.0 --port 8000

# With debug mode (enables Swagger UI)
EPOCH_DEBUG=true uv run uvicorn horizon_epoch.api:app --reload

Authentication

Current Status: The API does not currently implement authentication. All endpoints are open.

When authentication is implemented, the following methods will be supported:

Planned: API Key Authentication

curl -H "Authorization: Bearer <api-key>" \
  http://localhost:8000/api/v1/repositories

Planned: Basic Auth

curl -u username:password \
  http://localhost:8000/api/v1/repositories

For production deployments, we recommend placing the API behind a reverse proxy (nginx, Traefik) that handles authentication.


Health Endpoints

Health check endpoints are available at the root level (not under /api/v1).

Health Check

Check if the API server is running.

GET /health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "connected": true
}

curl Example:

curl http://localhost:8000/health

Readiness Check

Check if the API is ready to handle requests.

GET /ready

Response:

{
  "ready": true,
  "metadata_connected": true,
  "repository": "my-repo"
}

curl Example:

curl http://localhost:8000/ready

API Root

Get basic API information.

GET /

Response:

{
  "name": "Horizon Epoch API",
  "version": "0.1.0",
  "api_prefix": "/api/v1",
  "docs": "/docs"
}

curl Example:

curl http://localhost:8000/

Repository Endpoints

Manage repositories and their configuration.

Create Repository

Initialize a new Horizon Epoch repository.

POST /api/v1/repositories

Request Body:

{
  "name": "my-data-repo",
  "storage_configs": {
    "primary": {
      "backend": "postgresql",
      "url": "postgresql://user:pass@localhost/mydb"
    }
  }
}

Field           | Type   | Required | Description
----------------|--------|----------|------------------------------------------
name            | string | Yes      | Repository name (min 1 character)
storage_configs | object | No       | Optional storage backend configurations

Response: 201 Created

{
  "name": "my-data-repo",
  "default_branch": "main",
  "metadata_url": "postgresql://localhost/horizon_epoch"
}

curl Example:

curl -X POST http://localhost:8000/api/v1/repositories \
  -H "Content-Type: application/json" \
  -d '{"name": "my-data-repo"}'

Open Repository

Open an existing repository.

POST /api/v1/repositories/{name}/open

Path Parameters:

Parameter | Type   | Description
----------|--------|----------------
name      | string | Repository name

Response: 200 OK

{
  "name": "my-data-repo",
  "default_branch": "main",
  "metadata_url": "postgresql://localhost/horizon_epoch"
}

curl Example:

curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/open

Get Repository Status

Get the current status of the repository.

GET /api/v1/repositories/status

Query Parameters:

Parameter  | Type   | Required | Description
-----------|--------|----------|-------------------------------------------------
repository | string | No       | Repository name (uses current if not specified)

Response:

{
  "branch": "main",
  "commit_id": "abc123def456789",
  "is_detached": false,
  "is_clean": true,
  "staged_count": 0,
  "unstaged_count": 0,
  "merge_in_progress": false
}

curl Example:

# Current repository
curl http://localhost:8000/api/v1/repositories/status

# Specific repository
curl "http://localhost:8000/api/v1/repositories/status?repository=my-data-repo"

List Tracked Tables

List all tables being tracked for version control.

GET /api/v1/repositories/tables

Query Parameters:

Parameter  | Type   | Required | Description
-----------|--------|----------|----------------
repository | string | No       | Repository name

Response:

["users", "orders", "products"]

curl Example:

curl http://localhost:8000/api/v1/repositories/tables

Add Storage Backend

Add a new storage backend to the repository.

POST /api/v1/repositories/storage

Request Body:

{
  "name": "warehouse",
  "backend": "postgresql",
  "config": {
    "url": "postgresql://user:pass@warehouse-host/analytics"
  }
}

Field   | Type   | Required | Description
--------|--------|----------|--------------------------------
name    | string | Yes      | Storage configuration name
backend | string | Yes      | Backend type (see below)
config  | object | Yes      | Backend-specific configuration

Supported Backend Types:

Backend          | Value      | Required Config Fields
-----------------|------------|--------------------------
PostgreSQL       | postgresql | url or (host, database)
MySQL            | mysql      | url or (host, database)
SQL Server       | mssql      | url or (host, database)
SQLite           | sqlite     | path
S3               | s3         | bucket
Azure Blob       | azure      | account, container
GCS              | gcs        | bucket
Local Filesystem | local      | path

Backend Configuration Examples:

// PostgreSQL
{
  "name": "prod-pg",
  "backend": "postgresql",
  "config": {
    "host": "db.example.com",
    "port": 5432,
    "database": "production",
    "schema": "public"
  }
}

// MySQL
{
  "name": "analytics-mysql",
  "backend": "mysql",
  "config": {
    "host": "mysql.example.com",
    "port": 3306,
    "database": "analytics"
  }
}

// SQL Server
{
  "name": "warehouse-mssql",
  "backend": "mssql",
  "config": {
    "host": "sqlserver.example.com",
    "port": 1433,
    "database": "warehouse"
  }
}

// SQLite
{
  "name": "local-sqlite",
  "backend": "sqlite",
  "config": {
    "path": "/data/local.db"
  }
}

// S3
{
  "name": "datalake",
  "backend": "s3",
  "config": {
    "bucket": "company-datalake",
    "region": "us-west-2",
    "prefix": "epoch/"
  }
}

// Azure Blob
{
  "name": "azure-store",
  "backend": "azure",
  "config": {
    "account": "myaccount",
    "container": "data",
    "prefix": "epoch/"
  }
}

// GCS
{
  "name": "gcs-store",
  "backend": "gcs",
  "config": {
    "bucket": "my-bucket",
    "project": "my-project",
    "prefix": "epoch/"
  }
}

// Local Filesystem
{
  "name": "local-fs",
  "backend": "local",
  "config": {
    "path": "/data/epoch"
  }
}

Response: 201 Created

{
  "status": "created",
  "name": "warehouse"
}

curl Examples:

# PostgreSQL
curl -X POST http://localhost:8000/api/v1/repositories/storage \
  -H "Content-Type: application/json" \
  -d '{
    "name": "warehouse",
    "backend": "postgresql",
    "config": {"host": "db.example.com", "database": "warehouse"}
  }'

# S3
curl -X POST http://localhost:8000/api/v1/repositories/storage \
  -H "Content-Type: application/json" \
  -d '{
    "name": "datalake",
    "backend": "s3",
    "config": {"bucket": "company-datalake", "region": "us-west-2"}
  }'

# Azure Blob
curl -X POST http://localhost:8000/api/v1/repositories/storage \
  -H "Content-Type: application/json" \
  -d '{
    "name": "azure-store",
    "backend": "azure",
    "config": {"account": "myaccount", "container": "data"}
  }'

Track Table

Start tracking a table for version control.

POST /api/v1/repositories/tables/track

Request Body:

{
  "table_name": "users",
  "storage_name": "primary",
  "primary_key": ["id"]
}

Field        | Type   | Required | Description
-------------|--------|----------|---------------------------------------
table_name   | string | Yes      | Table name to track
storage_name | string | Yes      | Storage backend containing the table
primary_key  | array  | Yes      | Primary key column(s)

Response: 201 Created

{
  "status": "tracking",
  "table": "users"
}

curl Example:

curl -X POST http://localhost:8000/api/v1/repositories/tables/track \
  -H "Content-Type: application/json" \
  -d '{
    "table_name": "users",
    "storage_name": "primary",
    "primary_key": ["id"]
  }'
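
The same request from Python, a minimal sketch using the third-party requests library (not part of Horizon Epoch):

import requests

resp = requests.post(
    "http://localhost:8000/api/v1/repositories/tables/track",
    json={
        "table_name": "users",
        "storage_name": "primary",
        "primary_key": ["id"],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # {"status": "tracking", "table": "users"}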

Branch Endpoints

Create, list, and manage branches.

Create Branch

Create a new branch.

POST /api/v1/branches

Request Body:

{
  "name": "feature/add-preferences",
  "start_point": "main",
  "checkout": true
}

Field       | Type    | Required | Default | Description
------------|---------|----------|---------|------------------------------------
name        | string  | Yes      | -       | Branch name (min 1 character)
start_point | string  | No       | HEAD    | Commit ID or branch to start from
checkout    | boolean | No       | true    | Whether to checkout the new branch

Response: 201 Created

{
  "name": "feature/add-preferences",
  "head_commit_id": "abc123def456789",
  "is_protected": false,
  "created_at": "2024-12-18T10:30:00Z"
}

curl Example:

# Create and checkout a new branch
curl -X POST http://localhost:8000/api/v1/branches \
  -H "Content-Type: application/json" \
  -d '{"name": "feature/add-preferences"}'

# Create from specific commit without checkout
curl -X POST http://localhost:8000/api/v1/branches \
  -H "Content-Type: application/json" \
  -d '{"name": "hotfix/urgent", "start_point": "abc123", "checkout": false}'

List Branches

List all branches in the repository.

GET /api/v1/branches

Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| repository | string | No | Repository name |

Response:

{
  "branches": [
    {
      "name": "main",
      "head_commit_id": "abc123def456789",
      "is_protected": true,
      "created_at": "2024-12-01T00:00:00Z"
    },
    {
      "name": "feature/add-preferences",
      "head_commit_id": "def456abc789012",
      "is_protected": false,
      "created_at": "2024-12-18T10:30:00Z"
    }
  ],
  "current": "main"
}

curl Example:

curl http://localhost:8000/api/v1/branches

Checkout Branch

Switch to a different branch.

POST /api/v1/branches/{name}/checkout

Path Parameters:

| Parameter | Type | Description |
|---|---|---|
| name | string | Branch name to checkout |

Response:

{
  "status": "checked_out",
  "branch": "feature/add-preferences",
  "commit_id": "def456abc789012"
}

curl Example:

curl -X POST http://localhost:8000/api/v1/branches/feature%2Fadd-preferences/checkout

Note: URL-encode branch names containing / (e.g., feature/xyz becomes feature%2Fxyz).
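
A quick way to produce the encoded form in a script (a sketch; assumes jq is installed):

# Percent-encode a branch name for use in URL path parameters
jq -rn --arg name "feature/add-preferences" '$name|@uri'
# feature%2Fadd-preferences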

Delete Branch

Delete a branch.

DELETE /api/v1/branches/{name}

Path Parameters:

| Parameter | Type | Description |
|---|---|---|
| name | string | Branch name to delete |

Query Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| force | boolean | No | false | Force delete even if unmerged |
| repository | string | No | - | Repository name |

Response: 204 No Content

curl Example:

# Normal delete (fails if unmerged)
curl -X DELETE http://localhost:8000/api/v1/branches/feature%2Fadd-preferences

# Force delete
curl -X DELETE "http://localhost:8000/api/v1/branches/feature%2Fadd-preferences?force=true"

Compare Branches

Compare two branches to see how they’ve diverged.

GET /api/v1/branches/compare

Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| source | string | Yes | Source branch |
| target | string | Yes | Target branch |
| repository | string | No | Repository name |

Response:

{
  "source_branch": "feature/add-preferences",
  "target_branch": "main",
  "ahead_count": 3,
  "behind_count": 1,
  "can_fast_forward": false,
  "merge_base_id": "abc123def456789"
}

curl Example:

curl "http://localhost:8000/api/v1/branches/compare?source=feature/add-preferences&target=main"

Merge Branch

Merge a branch into the current branch.

POST /api/v1/branches/merge

Request Body:

{
  "source": "feature/add-preferences",
  "strategy": "THREE_WAY",
  "message": "Merge feature/add-preferences into main",
  "no_commit": false
}

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source | string | Yes | - | Source branch to merge from |
| strategy | string | No | THREE_WAY | Merge strategy: THREE_WAY, FAST_FORWARD, OURS, THEIRS |
| message | string | No | auto | Custom merge commit message |
| no_commit | boolean | No | false | Stage but don't commit merge |

Response:

{
  "source_branch": "feature/add-preferences",
  "target_branch": "main",
  "is_fast_forward": false,
  "has_conflicts": false,
  "conflict_count": 0,
  "result_commit_id": "ghi789jkl012345"
}

Response (with conflicts): 409 Conflict

{
  "error": "MergeConflictError",
  "message": "Merge conflict in 1 table(s)",
  "details": {
    "conflict_tables": ["users"],
    "conflict_count": 3
  }
}

curl Examples:

# Standard merge
curl -X POST http://localhost:8000/api/v1/branches/merge \
  -H "Content-Type: application/json" \
  -d '{"source": "feature/add-preferences"}'

# Fast-forward only
curl -X POST http://localhost:8000/api/v1/branches/merge \
  -H "Content-Type: application/json" \
  -d '{"source": "feature/add-preferences", "strategy": "FAST_FORWARD"}'

# Merge without committing
curl -X POST http://localhost:8000/api/v1/branches/merge \
  -H "Content-Type: application/json" \
  -d '{"source": "feature/add-preferences", "no_commit": true}'

Abort Merge

Abort an in-progress merge operation.

POST /api/v1/branches/merge/abort

Response:

{
  "status": "aborted"
}

curl Example:

curl -X POST http://localhost:8000/api/v1/branches/merge/abort

Commit Endpoints

Create commits and view commit history.

Create Commit

Create a new commit with staged changes.

POST /api/v1/commits

Request Body:

{
  "message": "Add user preferences feature",
  "author_name": "Jane Developer",
  "author_email": "jane@example.com"
}

| Field | Type | Required | Description |
|---|---|---|---|
| message | string | Yes | Commit message (min 1 character) |
| author_name | string | No | Author name (uses config default) |
| author_email | string | No | Author email (uses config default) |

Response: 201 Created

{
  "id": "abc123def456789abcdef0123456789abcdef01",
  "short_id": "abc123d",
  "message": "Add user preferences feature",
  "author_name": "Jane Developer",
  "author_email": "jane@example.com",
  "created_at": "2024-12-18T10:45:00Z",
  "parent_count": 1
}

curl Example:

curl -X POST http://localhost:8000/api/v1/commits \
  -H "Content-Type: application/json" \
  -d '{"message": "Add user preferences feature"}'

List Commits

Get commit history.

GET /api/v1/commits

Query Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| ref | string | No | HEAD | Branch or commit reference |
| limit | integer | No | 100 | Max commits to return (1-1000) |
| offset | integer | No | 0 | Offset for pagination |
| repository | string | No | - | Repository name |

Response:

{
  "commits": [
    {
      "id": "abc123def456789abcdef0123456789abcdef01",
      "short_id": "abc123d",
      "message": "Add user preferences feature",
      "author_name": "Jane Developer",
      "author_email": "jane@example.com",
      "created_at": "2024-12-18T10:45:00Z",
      "parent_count": 1
    }
  ],
  "total": 42,
  "has_more": true
}

curl Examples:

# Get recent commits
curl http://localhost:8000/api/v1/commits

# Get commits from specific branch
curl "http://localhost:8000/api/v1/commits?ref=feature/add-preferences"

# Pagination
curl "http://localhost:8000/api/v1/commits?limit=10&offset=20"

Get Commit

Get details of a specific commit.

GET /api/v1/commits/{commit_id}

Path Parameters:

| Parameter | Type | Description |
|---|---|---|
| commit_id | string | Full or short commit ID |

Response:

{
  "id": "abc123def456789abcdef0123456789abcdef01",
  "short_id": "abc123d",
  "message": "Add user preferences feature",
  "author_name": "Jane Developer",
  "author_email": "jane@example.com",
  "created_at": "2024-12-18T10:45:00Z",
  "parent_count": 1
}

curl Example:

curl http://localhost:8000/api/v1/commits/abc123def456789

Get Diff

Show differences between commits or branches.

GET /api/v1/commits/diff

Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| base | string | No | Base commit/branch (defaults to parent) |
| target | string | No | Target commit/branch (defaults to HEAD) |
| repository | string | No | Repository name |

Response:

{
  "base_commit_id": "abc123def456789",
  "target_commit_id": "def456ghi789012",
  "tables_changed": 2,
  "total_inserts": 15,
  "total_updates": 8,
  "total_deletes": 3
}

curl Examples:

# Diff working state vs HEAD
curl http://localhost:8000/api/v1/commits/diff

# Diff between branches
curl "http://localhost:8000/api/v1/commits/diff?base=main&target=feature/add-preferences"

# Diff between commits
curl "http://localhost:8000/api/v1/commits/diff?base=abc123&target=def456"

Stage All Changes

Stage all pending changes for commit.

POST /api/v1/commits/stage

Response:

{
  "status": "staged",
  "count": 5
}

curl Example:

curl -X POST http://localhost:8000/api/v1/commits/stage

Unstage All Changes

Remove all changes from the staging area.

POST /api/v1/commits/unstage

Response:

{
  "status": "unstaged",
  "count": 5
}

curl Example:

curl -X POST http://localhost:8000/api/v1/commits/unstage
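
Staging and committing typically run back to back; a minimal sketch of that flow:

# Stage all pending changes, then commit them
curl -X POST http://localhost:8000/api/v1/commits/stage
curl -X POST http://localhost:8000/api/v1/commits \
  -H "Content-Type: application/json" \
  -d '{"message": "Snapshot staged changes"}'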

Error Responses

With one exception noted below, all errors follow a consistent format:

{
  "error": "ErrorClassName",
  "message": "Human-readable error description",
  "details": {
    "additional": "context information"
  }
}

HTTP Status Codes

| Code | Meaning | When Used |
|---|---|---|
| 200 | OK | Successful GET/POST operations |
| 201 | Created | Resource successfully created |
| 204 | No Content | Successful DELETE operations |
| 400 | Bad Request | Invalid request format or parameters |
| 404 | Not Found | Resource not found |
| 409 | Conflict | Merge conflicts, duplicate resources |
| 422 | Unprocessable Entity | Validation errors |
| 503 | Service Unavailable | Database connection issues |

Error Types

| Error | HTTP Status | Description |
|---|---|---|
| BranchNotFoundError | 404 | Specified branch doesn't exist |
| CommitNotFoundError | 404 | Specified commit doesn't exist |
| RepositoryNotInitializedError | 404 | Repository not initialized or opened |
| BranchAlreadyExistsError | 409 | Branch name already in use |
| MergeConflictError | 409 | Merge has unresolved conflicts |
| ConnectionFailedError | 503 | Unable to connect to metadata database |
| ValidationError | 422 | Request body validation failed |

Example Error Responses

Branch Not Found:

{
  "error": "BranchNotFoundError",
  "message": "Branch 'feature/nonexistent' not found",
  "details": {
    "branch": "feature/nonexistent"
  }
}

Validation Error (the exception: request-body validation failures return a field-level detail array rather than the error envelope):

{
  "detail": [
    {
      "type": "string_too_short",
      "loc": ["body", "name"],
      "msg": "String should have at least 1 character",
      "input": "",
      "ctx": {"min_length": 1}
    }
  ]
}

Connection Error:

{
  "error": "ConnectionFailedError",
  "message": "Failed to connect to metadata database",
  "details": {
    "url": "postgresql://localhost/horizon_epoch"
  }
}
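
Because the error class is machine-readable, scripts can branch on it; a sketch (assumes jq and a branch name that does not exist):

# Inspect the error class of a failed request
resp=$(curl -s -X POST http://localhost:8000/api/v1/branches/feature%2Fnonexistent/checkout)
if [ "$(echo "$resp" | jq -r '.error')" = "BranchNotFoundError" ]; then
  echo "Branch is missing; create it first."
fi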

Pagination

List endpoints support pagination via limit and offset query parameters.

GET /api/v1/commits?limit=10&offset=20

| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| limit | integer | 100 | 1-1000 | Maximum items per page |
| offset | integer | 0 | 0+ | Number of items to skip |

Paginated responses include metadata:

{
  "items": [...],
  "total": 150,
  "has_more": true
}
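
A minimal loop over commit history, sketched in bash (assumes jq; the commits endpoint returns its items under commits, as shown earlier):

# Page through commits 100 at a time until has_more is false
offset=0; limit=100
while :; do
  page=$(curl -s "http://localhost:8000/api/v1/commits?limit=${limit}&offset=${offset}")
  echo "$page" | jq -r '.commits[].short_id'
  [ "$(echo "$page" | jq '.has_more')" = "true" ] || break
  offset=$((offset + limit))
done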

OpenAPI Specification

The API provides auto-generated OpenAPI 3.0 documentation:

| Endpoint | Description |
|---|---|
| GET /openapi.json | Raw OpenAPI specification |
| GET /docs | Swagger UI (interactive, debug mode only) |
| GET /redoc | ReDoc (alternative UI, debug mode only) |

Enable debug mode to access Swagger UI:

EPOCH_DEBUG=true uv run uvicorn horizon_epoch.api:app --reload

Then visit: http://localhost:8000/docs

Export OpenAPI Spec

You can export the OpenAPI specification using the provided script:

# Export as JSON
cd python
uv run python scripts/export_openapi.py --output ../docs/openapi.json

# Export as YAML
uv run python scripts/export_openapi.py --format yaml --output ../docs/openapi.yaml

Environment Configuration

The API can be configured via environment variables:

| Variable | Default | Description |
|---|---|---|
| EPOCH_METADATA_URL | postgresql://localhost:5432/horizon_epoch | Metadata database URL |
| EPOCH_DEFAULT_BRANCH | main | Default branch name |
| EPOCH_API_PREFIX | /api/v1 | API route prefix |
| EPOCH_DEBUG | false | Enable debug mode (exposes Swagger UI) |
| EPOCH_LOG_LEVEL | INFO | Logging level |
| EPOCH_AUTHOR_NAME | - | Default author name for commits |
| EPOCH_AUTHOR_EMAIL | - | Default author email for commits |

Example:

export EPOCH_METADATA_URL="postgresql://user:pass@localhost:5432/horizon_epoch"
export EPOCH_DEBUG=true
export EPOCH_AUTHOR_NAME="API User"
export EPOCH_AUTHOR_EMAIL="api@example.com"

uv run uvicorn horizon_epoch.api:app --host 0.0.0.0 --port 8000

Rate Limiting

Current Status: Rate limiting is not yet implemented.

When rate limiting is implemented, limits will be returned in response headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1705312200

CORS

Cross-Origin Resource Sharing is enabled for all origins by default. For production, configure allowed origins appropriately.

Current configuration:

  • Origins: * (all)
  • Methods: All HTTP methods
  • Headers: All headers
  • Credentials: Allowed

Configuration Reference

Complete reference for Horizon Epoch configuration options.

Configuration Sources

Configuration is loaded from (in order of precedence):

  1. Command-line arguments (highest priority)
  2. Environment variables
  3. Local repository configuration (.epoch/config)
  4. Configuration file (epoch.toml)
  5. Default values (lowest priority)

Configuration File Search Order

The CLI searches for configuration files in this order:

  1. ./epoch.toml (current directory)
  2. ./config/epoch.toml
  3. ./.epoch/config.toml
  4. ~/.config/epoch/config.toml
  5. /etc/epoch/config.toml

The first file found is used. Override with --config or EPOCH_CONFIG_FILE.
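
For example:

# Point the CLI at an explicit config file (path illustrative)
epoch --config /srv/epoch/epoch.toml status

# Equivalent via the environment
EPOCH_CONFIG_FILE=/srv/epoch/epoch.toml epoch status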

Configuration File

Default location: ~/.epoch/config.toml or ./epoch.toml

Full Example

# epoch.toml

[metadata]
url = "postgresql://localhost/horizon_epoch"
pool_size = 10
connect_timeout = 30

[defaults]
branch = "main"
author_name = "Your Name"
author_email = "you@example.com"
format = "text"

[logging]
level = "info"
format = "pretty"  # or "json"
file = "/var/log/epoch/epoch.log"

# PostgreSQL storage backends
[storage.postgres.production]
url = "postgresql://user:pass@prod-db.example.com/production"
pool_size = 20
connect_timeout = 30

[storage.postgres.staging]
host = "staging-db.example.com"
port = 5432
database = "staging"
username = "epoch_user"
password_file = "/run/secrets/db-password"
sslmode = "verify-full"
ssl_rootcert = "/etc/ssl/certs/ca.crt"

[storage.postgres.secure]
host = "secure-db.internal"
database = "secure"
vault_path = "secret/data/secure-db"

[storage.postgres.secure.ssh_tunnel]
host = "bastion.example.com"
user = "ubuntu"
key_file = "~/.ssh/id_rsa"

# S3 storage backends
[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"

[storage.s3.partner]
bucket = "partner-shared"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/PartnerAccess"
external_id = "partner-123"

[storage.s3.minio]
bucket = "local-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"
force_path_style = true

# Vault integration
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
ca_cert = "/etc/ssl/certs/vault-ca.crt"

# AWS integration
[aws]
region = "us-east-1"
profile = "production"

# Credential caching
[credentials]
cache_enabled = true
cache_ttl = 300
refresh_before_expiry = 60

Environment Variables

All configuration can be set via environment variables with EPOCH_ prefix:

Core Settings

| Variable | Description | Default |
|---|---|---|
| EPOCH_METADATA_URL | Metadata database URL | - |
| DATABASE_URL | Fallback for metadata URL (SQLx compatible) | - |
| EPOCH_CONFIG_FILE | Config file path | ~/.epoch/config.toml |
| EPOCH_DEFAULT_BRANCH | Default branch name | main |
| EPOCH_FORMAT | Output format | text |

Author Settings

| Variable | Description | Default |
|---|---|---|
| EPOCH_AUTHOR_NAME | Default author name for commits | git config user.name |
| EPOCH_AUTHOR_EMAIL | Default author email for commits | git config user.email |

Author information is resolved in this priority order:

  1. Command-line arguments (--author-name, --author-email)
  2. Environment variables (EPOCH_AUTHOR_NAME, EPOCH_AUTHOR_EMAIL)
  3. Local repository config (.epoch/config)
  4. Git configuration (git config user.name, git config user.email)
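
For example, the command-line flag beats the exported variable:

export EPOCH_AUTHOR_NAME="Env User"
epoch commit -m "Update totals" --author-name "Flag User"
# Recorded author: Flag User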

Editor Settings

| Variable | Description | Default |
|---|---|---|
| EPOCH_EDITOR | Editor for interactive operations | - |
| EDITOR | Fallback editor | - |
| VISUAL | Secondary fallback editor | - |

Editor is selected in priority order: EPOCH_EDITOR > EDITOR > VISUAL > platform default (vi on Unix, notepad.exe on Windows).

Commit Signing

| Variable | Description | Default |
|---|---|---|
| EPOCH_SIGNING_KEY | Path to signing key file (ASCII-armored) | - |
| EPOCH_SIGNING_KEY_PASSWORD | Password for signing key | - (prompts) |
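
For example (key path illustrative; leave the password variable unset to be prompted instead):

# Configure commit signing
export EPOCH_SIGNING_KEY="$HOME/.epoch/signing-key.asc"
export EPOCH_SIGNING_KEY_PASSWORD="key-passphrase"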

Logging

| Variable | Description | Default |
|---|---|---|
| EPOCH_LOG_LEVEL | Log level (error/warn/info/debug/trace) | info |
| EPOCH_LOG_FORMAT | Log format (pretty/json) | pretty |
| EPOCH_LOG_FILE | Log file path | - |
| RUST_LOG | Rust logging filter (module=level format) | - |

API Server

| Variable | Description | Default |
|---|---|---|
| EPOCH_API_HOST | API server bind address | 0.0.0.0 |
| EPOCH_API_PORT | API server port | 8000 |
| EPOCH_API_PREFIX | API route prefix | /api/v1 |
| EPOCH_DEBUG | Enable debug mode (exposes Swagger UI) | false |
| EPOCH_SERVER_URL | Remote server URL for CLI | - |

OAuth/OIDC Authentication

| Variable | Description |
|---|---|
| EPOCH_OAUTH_CLIENT_ID | Default OAuth client ID |
| EPOCH_OAUTH_CLIENT_SECRET | Default OAuth client secret |
| EPOCH_OAUTH_{PROVIDER}_CLIENT_ID | Provider-specific client ID |
| EPOCH_OAUTH_{PROVIDER}_CLIENT_SECRET | Provider-specific client secret |
| EPOCH_OAUTH_OIDC_ISSUER | OIDC issuer URL for custom provider |

Supported providers: GITHUB, GOOGLE, AZURE, OKTA, OIDC (custom).

Example:

# GitHub OAuth
export EPOCH_OAUTH_GITHUB_CLIENT_ID="your-client-id"
export EPOCH_OAUTH_GITHUB_CLIENT_SECRET="your-client-secret"
epoch auth login --method oauth --provider github

# Custom OIDC provider
export EPOCH_OAUTH_CLIENT_ID="your-client-id"
export EPOCH_OAUTH_OIDC_ISSUER="https://your-idp.example.com"
epoch auth login --method oauth --provider oidc

PostgreSQL

| Variable | Description |
|---|---|
| EPOCH_POSTGRES_{NAME}_URL | Full connection URL |
| EPOCH_POSTGRES_{NAME}_HOST | Database host |
| EPOCH_POSTGRES_{NAME}_PORT | Database port |
| EPOCH_POSTGRES_{NAME}_DATABASE | Database name |
| EPOCH_POSTGRES_{NAME}_USERNAME | Username |
| EPOCH_POSTGRES_{NAME}_PASSWORD | Password |
| EPOCH_POSTGRES_{NAME}_SSLMODE | SSL mode |

S3

| Variable | Description |
|---|---|
| EPOCH_S3_{NAME}_BUCKET | Bucket name |
| EPOCH_S3_{NAME}_REGION | AWS region |
| EPOCH_S3_{NAME}_ENDPOINT | Custom endpoint |
| EPOCH_S3_{NAME}_ACCESS_KEY | Access key ID |
| EPOCH_S3_{NAME}_SECRET_KEY | Secret access key |
| AWS_ACCESS_KEY_ID | Default AWS access key |
| AWS_SECRET_ACCESS_KEY | Default AWS secret key |
| AWS_DEFAULT_REGION | Default AWS region |

MySQL

| Variable | Description |
|---|---|
| EPOCH_MYSQL_{NAME}_URL | Full connection URL |
| EPOCH_MYSQL_{NAME}_HOST | Database host |
| EPOCH_MYSQL_{NAME}_PORT | Database port |
| EPOCH_MYSQL_{NAME}_DATABASE | Database name |
| EPOCH_MYSQL_{NAME}_USERNAME | Username |
| EPOCH_MYSQL_{NAME}_PASSWORD | Password |

SQL Server

| Variable | Description |
|---|---|
| EPOCH_MSSQL_{NAME}_URL | Full connection URL |
| EPOCH_MSSQL_{NAME}_HOST | Database host |
| EPOCH_MSSQL_{NAME}_PORT | Database port |
| EPOCH_MSSQL_{NAME}_DATABASE | Database name |
| EPOCH_MSSQL_{NAME}_USERNAME | Username |
| EPOCH_MSSQL_{NAME}_PASSWORD | Password |

SQLite

| Variable | Description |
|---|---|
| EPOCH_SQLITE_{NAME}_PATH | Database file path |

Azure Blob Storage

| Variable | Description |
|---|---|
| EPOCH_AZURE_{NAME}_ACCOUNT | Storage account name |
| EPOCH_AZURE_{NAME}_CONTAINER | Container name |
| EPOCH_AZURE_{NAME}_CONNECTION_STRING | Connection string |
| EPOCH_AZURE_{NAME}_ACCESS_KEY | Storage access key |
| AZURE_TENANT_ID | Azure AD tenant ID |
| AZURE_CLIENT_ID | Azure AD client ID |
| AZURE_CLIENT_SECRET | Azure AD client secret |

Google Cloud Storage

| Variable | Description |
|---|---|
| EPOCH_GCS_{NAME}_BUCKET | Bucket name |
| EPOCH_GCS_{NAME}_PROJECT | GCP project ID |
| GOOGLE_APPLICATION_CREDENTIALS | Service account key file path |

Local Filesystem

| Variable | Description |
|---|---|
| EPOCH_LOCAL_{NAME}_PATH | Base directory path |

Vault

| Variable | Description |
|---|---|
| VAULT_ADDR | Vault server address |
| VAULT_TOKEN | Vault token |
| VAULT_ROLE_ID | AppRole role ID |
| VAULT_SECRET_ID | AppRole secret ID |

SSH

| Variable | Description |
|---|---|
| EPOCH_SSH_HOST | SSH tunnel host |
| EPOCH_SSH_USER | SSH username |
| EPOCH_SSH_KEY_FILE | SSH private key path |
| SSH_AUTH_SOCK | SSH agent socket |

Configuration Sections

[metadata]

Metadata database configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | PostgreSQL connection URL |
| pool_size | int | 10 | Connection pool size |
| connect_timeout | int | 30 | Connection timeout (seconds) |
| idle_timeout | int | 600 | Idle connection timeout |
| max_lifetime | int | 3600 | Max connection lifetime |

[defaults]

Default values for operations.

| Option | Type | Default | Description |
|---|---|---|---|
| branch | string | "main" | Default branch |
| author_name | string | - | Default author name |
| author_email | string | - | Default author email |
| format | string | "text" | Output format |

[logging]

Logging configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| level | string | "info" | Log level |
| format | string | "pretty" | Log format |
| file | string | - | Log file path |
| json | bool | false | JSON log format |

[storage.postgres.{name}]

PostgreSQL backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | Full connection URL |
| host | string | - | Database host |
| port | int | 5432 | Database port |
| database | string | - | Database name |
| username | string | - | Username |
| password | string | - | Password |
| password_file | string | - | Password file path |
| sslmode | string | "prefer" | SSL mode |
| ssl_cert | string | - | Client certificate |
| ssl_key | string | - | Client key |
| ssl_rootcert | string | - | CA certificate |
| pool_size | int | 10 | Pool size |
| connect_timeout | int | 30 | Timeout |
| vault_path | string | - | Vault secret path |
| vault_role | string | - | Vault dynamic role |
| aws_secret_id | string | - | AWS Secrets Manager ID |
| use_iam_auth | bool | false | Use RDS IAM auth |

[storage.postgres.{name}.ssh_tunnel]

SSH tunnel configuration for PostgreSQL.

| Option | Type | Default | Description |
|---|---|---|---|
| host | string | - | Bastion host |
| port | int | 22 | SSH port |
| user | string | - | SSH username |
| key_file | string | - | Private key path |
| key_passphrase | string | - | Key passphrase |
| use_agent | bool | false | Use SSH agent |
| known_hosts_file | string | - | Known hosts file |
| strict_host_key_checking | bool/string | true | Host key checking |
| keepalive_interval | int | 60 | Keepalive interval |
| connect_timeout | int | 30 | Connection timeout |
| auto_reconnect | bool | true | Auto-reconnect |

[storage.s3.{name}]

S3 backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| bucket | string | - | Bucket name |
| region | string | - | AWS region |
| endpoint | string | - | Custom endpoint |
| prefix | string | "" | Key prefix |
| access_key | string | - | Access key ID |
| secret_key | string | - | Secret access key |
| session_token | string | - | Session token |
| force_path_style | bool | false | Path-style URLs |
| use_instance_credentials | bool | false | Use instance profile |
| assume_role_arn | string | - | Role to assume |
| external_id | string | - | External ID |
| vault_path | string | - | Vault secret path |
| aws_secret_id | string | - | Secrets Manager ID |

[storage.mysql.{name}]

MySQL backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | Full connection URL |
| host | string | - | Database host |
| port | int | 3306 | Database port |
| database | string | - | Database name |
| username | string | - | Username |
| password | string | - | Password |
| password_file | string | - | Password file path |
| ssl_mode | string | "preferred" | SSL mode |
| ssl_ca | string | - | CA certificate |
| pool_size | int | 10 | Pool size |
| connect_timeout | int | 30 | Timeout |
| vault_path | string | - | Vault secret path |

[storage.mysql.{name}.ssh_tunnel]

SSH tunnel configuration for MySQL (same options as PostgreSQL).

[storage.mssql.{name}]

SQL Server backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | Full connection URL |
| host | string | - | Database host |
| port | int | 1433 | Database port |
| database | string | - | Database name |
| username | string | - | Username |
| password | string | - | Password |
| password_file | string | - | Password file path |
| trust_cert | bool | false | Trust server certificate |
| encrypt | bool | true | Encrypt connection |
| pool_size | int | 10 | Pool size |
| connect_timeout | int | 30 | Timeout |
| vault_path | string | - | Vault secret path |
| use_windows_auth | bool | false | Use Windows auth |

[storage.sqlite.{name}]

SQLite backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| path | string | - | Database file path |
| mode | string | "rwc" | Open mode (r/rw/rwc) |
| in_memory | bool | false | Use in-memory database |
| busy_timeout | int | 5000 | Busy timeout (ms) |
| journal_mode | string | "wal" | Journal mode |

[storage.azure.{name}]

Azure Blob Storage backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| account | string | - | Storage account name |
| container | string | - | Container name |
| prefix | string | "" | Blob prefix |
| auth_method | string | "connection_string" | Auth method |
| connection_string | string | - | Connection string |
| access_key | string | - | Storage access key |
| sas_token | string | - | SAS token |
| tenant_id | string | - | Azure AD tenant ID |
| client_id | string | - | Azure AD client ID |
| client_secret | string | - | Azure AD client secret |
| use_managed_identity | bool | false | Use managed identity |
| vault_path | string | - | Vault secret path |

[storage.gcs.{name}]

Google Cloud Storage backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| bucket | string | - | Bucket name |
| project_id | string | - | GCP project ID |
| prefix | string | "" | Object prefix |
| auth_method | string | "service_account" | Auth method |
| credentials_file | string | - | Service account key file |
| use_adc | bool | false | Use Application Default Credentials |
| vault_path | string | - | Vault secret path |

[storage.local.{name}]

Local filesystem backend configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| path | string | - | Base directory path |
| create_dirs | bool | true | Create directories if missing |

[vault]

HashiCorp Vault configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| addr | string | - | Vault address |
| token | string | - | Vault token |
| auth_method | string | "token" | Auth method |
| role_id | string | - | AppRole role ID |
| secret_id | string | - | AppRole secret ID |
| secret_id_file | string | - | Secret ID file |
| role | string | - | K8s/AWS role |
| ca_cert | string | - | CA certificate |
| skip_verify | bool | false | Skip TLS verify |
| namespace | string | - | Vault namespace |

[aws]

AWS configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| region | string | - | Default region |
| profile | string | - | AWS profile |
| access_key_id | string | - | Access key |
| secret_access_key | string | - | Secret key |

[credentials]

Credential caching configuration.

| Option | Type | Default | Description |
|---|---|---|---|
| cache_enabled | bool | true | Enable caching |
| cache_ttl | int | 300 | Cache TTL (seconds) |
| cache_max_size | int | 100 | Max cache entries |
| refresh_before_expiry | int | 60 | Refresh threshold |

Variable Substitution

Environment variables can be referenced in config:

[storage.postgres.production]
password = "${DB_PASSWORD}"

File contents can be referenced:

[storage.postgres.production]
password = "@file:/run/secrets/db-password"

Profiles

Multiple profiles can be defined:

[profiles.development]
metadata.url = "postgresql://localhost/horizon_epoch_dev"

[profiles.production]
metadata.url = "postgresql://prod-db/horizon_epoch"

Switch profiles:

epoch --profile production status

Local Repository Configuration

When you initialize a repository with epoch init, a .epoch/ directory is created with local configuration.

Directory Structure

.epoch/
├── config          # Repository configuration (TOML)
├── HEAD            # Current branch reference
└── refs/
    ├── heads/      # Branch references
    └── tags/       # Tag references

Local Config File (.epoch/config)

# Repository identity
repository_name = "my-data-repo"
repository_id = "550e8400-e29b-41d4-a716-446655440000"

# Metadata database connection
metadata_url = "postgresql://localhost/horizon_epoch"

# Current branch (also tracked in HEAD file)
current_branch = "main"

# Author information (optional, overrides global config)
[author]
name = "Your Name"
email = "you@example.com"

# Branching configuration (optional)
[branching]
strategy = "environment"  # or "gitflow", "trunk"

[branching.promotion]
paths = [
    { source = "development", target = "staging" },
    { source = "staging", target = "production" }
]
require_approval = true

HEAD File

The HEAD file tracks the current branch:

ref: refs/heads/main

Or in detached HEAD state (direct commit reference):

abc123def456...

DDL Configuration

DDL (Data Definition Language) execution settings for schema changes.

[ddl]

| Option | Type | Default | Description |
|---|---|---|---|
| lock_timeout_secs | int | 30 | Lock acquisition timeout |
| statement_timeout_secs | int | 300 | Statement execution timeout |
| check_table_stats | bool | true | Check table size before DDL |
| use_concurrent | bool | false | Use concurrent index operations |
| warn_large_table_rows | int | 1000000 | Warn if table exceeds rows |
| warn_large_table_bytes | int | 1073741824 | Warn if table exceeds bytes (1GB) |

[ddl.retry]

| Option | Type | Default | Description |
|---|---|---|---|
| max_retries | int | 3 | Maximum retry attempts |
| initial_delay_ms | int | 500 | Initial retry delay |
| max_delay_ms | int | 30000 | Maximum retry delay |
| backoff_multiplier | float | 2.0 | Exponential backoff factor |

Example:

[ddl]
lock_timeout_secs = 30
statement_timeout_secs = 300
check_table_stats = true

[ddl.retry]
max_retries = 3
initial_delay_ms = 500
max_delay_ms = 30000

Branching Configuration

[branching]

| Option | Type | Default | Description |
|---|---|---|---|
| strategy | string | "environment" | Branching strategy |
| default_branch | string | "main" | Default branch name |
| protected_branches | array | ["main", "production"] | Protected branch patterns |

Branching Strategies

environment (default):

  • Branches represent environments (development, staging, production)
  • Promotion flows upward through environments
  • Good for data pipelines and ETL workflows

gitflow:

  • Feature branches, develop, release, hotfix, main
  • Traditional Git Flow pattern adapted for data
  • Good for teams familiar with Git Flow

trunk:

  • Single main branch with short-lived feature branches
  • Continuous integration style
  • Good for fast-moving projects

[branching.promotion]

| Option | Type | Default | Description |
|---|---|---|---|
| paths | array | [] | Allowed promotion paths |
| require_approval | bool | false | Require approval for promotions |
| auto_merge | bool | true | Auto-merge if no conflicts |
| conflict_strategy | string | "fail" | Conflict handling (fail/ours/theirs) |

Example:

[branching]
strategy = "environment"
default_branch = "development"
protected_branches = ["staging", "production"]

[branching.promotion]
paths = [
    { source = "development", target = "staging" },
    { source = "staging", target = "production" }
]
require_approval = true
conflict_strategy = "fail"

Conflict Handling

[conflict]

| Option | Type | Default | Description |
|---|---|---|---|
| default_strategy | string | "manual" | Default conflict resolution |
| auto_resolve_schema | bool | false | Auto-resolve schema conflicts |

[conflict.table_rules]

Per-table conflict rules:

[conflict.table_rules.users]
strategy = "last_write_wins"
timestamp_field = "updated_at"

[conflict.table_rules.audit_log]
strategy = "append_only"

[conflict.field_rules]

Per-field conflict rules:

[conflict.field_rules."users.email"]
strategy = "source_wins"

[conflict.field_rules."users.login_count"]
strategy = "sum"

Strategies: manual, source_wins, target_wins, last_write_wins, sum, max, min, append_only.

Testing and Development

Development Environment Variables

| Variable | Description |
|---|---|
| DATABASE_URL | Database URL for tests/development |
| TEST_DATABASE_URL | Alternative test database URL |
| SQLX_OFFLINE | Enable SQLx offline mode (true) |

S3/MinIO Testing

| Variable | Description |
|---|---|
| S3_ENDPOINT | S3/MinIO endpoint URL |
| S3_BUCKET | Test bucket name |
| S3_ACCESS_KEY | Access key for testing |
| S3_SECRET_KEY | Secret key for testing |

E2E Testing

| Variable | Description | Default |
|---|---|---|
| EPOCH_BIN | Path to CLI binary for tests | ./target/debug/epoch |
| E2E_KEEP_TEMP | Keep temporary test directories | false |
| E2E_VERBOSE | Verbose test output | false |

Example Development Setup

# .env file for development
DATABASE_URL=postgresql://horizon:horizon_dev_password@localhost:5432/horizon_epoch
EPOCH_METADATA_URL=postgresql://horizon:horizon_dev_password@localhost:5432/horizon_epoch

# MinIO for local S3 testing
EPOCH_S3_ENDPOINT=http://localhost:9000
EPOCH_S3_ACCESS_KEY=minioadmin
EPOCH_S3_SECRET_KEY=minioadmin
EPOCH_S3_BUCKET=horizon-epoch-data

# Development settings
EPOCH_DEBUG=true
EPOCH_LOG_LEVEL=debug
RUST_LOG=horizon_epoch=debug,sqlx=warn

# Author info
EPOCH_AUTHOR_NAME="Development User"
EPOCH_AUTHOR_EMAIL="dev@example.com"

Complete Configuration Example

# epoch.toml - Complete configuration example

# =============================================================================
# Metadata Database
# =============================================================================
[metadata_db]
url = "postgresql://localhost:5432/horizon_epoch"
max_connections = 10
min_connections = 2

# =============================================================================
# Logging
# =============================================================================
[logging]
level = "info"
format = "json"  # "pretty" or "json"

# =============================================================================
# DDL Execution
# =============================================================================
[ddl]
lock_timeout_secs = 30
statement_timeout_secs = 300
check_table_stats = true
use_concurrent = false

[ddl.retry]
max_retries = 3
initial_delay_ms = 500
max_delay_ms = 30000

# =============================================================================
# Branching Strategy
# =============================================================================
[branching]
strategy = "environment"
default_branch = "development"
protected_branches = ["staging", "production"]

[branching.promotion]
paths = [
    { source = "development", target = "staging" },
    { source = "staging", target = "production" }
]
require_approval = true

# =============================================================================
# Storage Backends
# =============================================================================

# PostgreSQL backends
[[storage.postgresql]]
name = "primary"
url = "postgresql://localhost:5432/data"

[[storage.postgresql]]
name = "analytics"
host = "analytics-db.internal"
port = 5432
database = "analytics"

[storage.postgresql.credentials]
type = "vault"
addr = "https://vault.example.com:8200"
secret_path = "database/creds/analytics"

[storage.postgresql.tls]
mode = "verify_full"
ca_cert = "/etc/ssl/certs/ca.pem"

# S3 backends
[[storage.s3]]
name = "datalake"
endpoint = "https://s3.amazonaws.com"
bucket = "company-datalake"
region = "us-east-1"
prefix = "epoch/"

[[storage.s3]]
name = "archive"
endpoint = "http://minio.internal:9000"
bucket = "archive"
access_key_id = "${MINIO_ACCESS_KEY}"
secret_access_key = "${MINIO_SECRET_KEY}"
force_path_style = true

Credential Provider Reference

Complete reference for the Horizon Epoch credential provider abstraction. This document covers the architecture, all available providers, credential types, caching, and error handling.

Architecture Overview

The credential provider system provides a unified abstraction for managing credentials across all storage backends.

                    ┌─────────────────────────────────────────┐
                    │          Storage Backend                │
                    │  (PostgreSQL, S3, Vault, etc.)          │
                    └─────────────────────┬───────────────────┘
                                          │
                                          │ get_credentials()
                                          ▼
                    ┌─────────────────────────────────────────┐
                    │       CachedCredentialProvider          │
                    │  (optional caching layer)               │
                    └─────────────────────┬───────────────────┘
                                          │
          ┌───────────────────────────────┼───────────────────────────────┐
          │                               │                               │
          ▼                               ▼                               ▼
┌─────────────────┐             ┌─────────────────┐             ┌─────────────────┐
│    Static       │             │  Environment    │             │     File        │
│   Provider      │             │   Provider      │             │   Provider      │
└─────────────────┘             └─────────────────┘             └─────────────────┘
          │                               │                               │
          ▼                               ▼                               ▼
┌─────────────────┐             ┌─────────────────┐             ┌─────────────────┐
│     Vault       │             │     AWS         │             │   Web Identity  │
│   Provider      │             │   Provider      │             │   Provider      │
└─────────────────┘             └─────────────────┘             └─────────────────┘

Core Trait: CredentialProvider

All credential providers implement this async trait:

#[async_trait]
pub trait CredentialProvider: Send + Sync {
    /// Retrieves credentials from this provider.
    async fn get_credentials(&self) -> CredentialResult<StorageCredentials>;

    /// Returns true if credentials need refresh.
    fn requires_refresh(&self) -> bool;

    /// Refreshes the credentials.
    async fn refresh(&self) -> CredentialResult<StorageCredentials>;

    /// Returns the recommended refresh interval.
    fn refresh_interval(&self) -> Option<Duration>;

    /// Returns the provider name for logging.
    fn provider_name(&self) -> &str;

    /// Returns true if refresh is supported.
    fn supports_refresh(&self) -> bool {
        self.refresh_interval().is_some()
    }

    /// Returns metadata for diagnostics.
    fn provider_info(&self) -> ProviderInfo;
}

Extension Trait: CredentialProviderExt

Additional utility methods available on all providers:

| Method | Description |
|---|---|
| get_fresh_credentials() | Gets credentials, refreshing first if needed |
| validate() | Validates that credentials can be retrieved |

Credential Types

StorageCredentials

The unified credential container with expiration tracking:

pub struct StorageCredentials {
    /// The specific type of credential
    pub credential_type: StorageCredentialType,

    /// When these credentials expire (optional)
    pub expires_at: Option<DateTime<Utc>>,

    /// Additional metadata
    pub metadata: HashMap<String, String>,
}

Factory Methods:

| Method | Description |
|---|---|
| username_password(user, pass) | Create username/password credentials |
| token(token) | Create token-based credentials |
| aws(access_key, secret_key, session_token) | Create AWS credentials |
| oauth2(access_token, refresh_token, token_type) | Create OAuth2 credentials |
| certificate(cert, key) | Create certificate credentials |

Builder Methods:

| Method | Description |
|---|---|
| .with_expiration(datetime) | Set expiration time |
| .with_ttl(duration) | Set TTL relative to now |
| .with_metadata(key, value) | Add metadata |

Inspection Methods:

| Method | Description |
|---|---|
| is_expired() | Check if credentials are expired |
| time_to_expiry() | Get time until expiration |
| is_username_password() | Type check |
| is_token() | Type check |
| is_aws() | Type check |
| is_certificate() | Type check |
| is_oauth2() | Type check |

StorageCredentialType

The underlying credential type enum:

| Variant | Fields | Use Case |
|---|---|---|
| UsernamePassword | username, password | Database authentication |
| Token | token | API key/token authentication |
| AwsCredentials | access_key_id, secret_access_key, session_token? | AWS services |
| Certificate | cert, key | mTLS client certificates |
| OAuth2 | access_token, refresh_token?, token_type | OAuth2 authentication |

Certificate Types

CertificateData:

pub struct CertificateData {
    pub pem: SecretString,           // PEM-encoded certificate
    pub chain: Option<Vec<SecretString>>,  // Optional chain
}

PrivateKeyData:

pub struct PrivateKeyData {
    pub pem: SecretString,           // PEM-encoded private key
    pub passphrase: Option<SecretString>,  // Optional passphrase
}

Built-in Providers

Static Provider

Pre-configured credentials for testing and development. Never use in production.

Factory Methods:

StaticCredentialProvider::username_password("user", "pass")
StaticCredentialProvider::token("api-key")
StaticCredentialProvider::aws("AKIA...", "secret", None)
StaticCredentialProvider::oauth2("access", Some("refresh"), "Bearer")
StaticCredentialProvider::certificate(cert_data, key_data)

Builder Pattern:

StaticCredentialProviderBuilder::new()
    .username("admin")
    .password("secret")
    .build()

Configuration (TOML):

[storage.postgres.mydb]
# Static credentials embedded in config (dev only!)
url = "postgresql://user:password@localhost/db"

| Property | Required | Description |
|---|---|---|
| url | Yes | Full connection URL with credentials |
| username | Alternative | Username (with separate password) |
| password | With username | Password |

Refresh Support: No


Environment Provider

Reads credentials from environment variables.

Factory Methods:

// PostgreSQL standard: PGUSER, PGPASSWORD
EnvCredentialProvider::postgres_default()

// AWS standard: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
EnvCredentialProvider::aws_default()

// Custom prefix: MYAPP_DB_USERNAME, MYAPP_DB_PASSWORD
EnvCredentialProvider::with_prefix("MYAPP_DB")

Builder Pattern:

EnvCredentialProviderBuilder::new()
    .username_var("DB_USER")
    .password_var("DB_PASS")
    .credential_type(EnvCredentialType::UsernamePassword)
    .build()

Environment Variable Mappings:

| Preset | Username | Password | Access Key | Secret Key | Session Token |
|---|---|---|---|---|---|
| postgres_default() | PGUSER | PGPASSWORD | - | - | - |
| mysql_default() | MYSQL_USER | MYSQL_PASSWORD | - | - | - |
| mssql_default() | MSSQL_USER | MSSQL_PASSWORD | - | - | - |
| aws_default() | - | - | AWS_ACCESS_KEY_ID | AWS_SECRET_ACCESS_KEY | AWS_SESSION_TOKEN |
| with_prefix("X") | X_USERNAME | X_PASSWORD | X_ACCESS_KEY | X_SECRET_KEY | X_SESSION_TOKEN |

Configuration (TOML):

[storage.postgres.mydb]
url = "${EPOCH_POSTGRES_MYDB_URL}"  # Variable substitution

# Or individual fields
host = "db.example.com"
username = "${DB_USER}"
password = "${DB_PASS}"

Refresh Support: Yes (re-reads environment on refresh)


File Provider

Reads credentials from files in multiple formats.

Factory Methods:

FileCredentialProvider::json("/etc/secrets/db.json")
FileCredentialProvider::yaml("/etc/secrets/db.yaml")
FileCredentialProvider::ini("/etc/secrets/db.ini")
FileCredentialProvider::env("/etc/secrets/.env")
FileCredentialProvider::aws_credentials()  // ~/.aws/credentials
FileCredentialProvider::certificate("/path/cert.pem", "/path/key.pem")

Builder Pattern:

FileCredentialProviderBuilder::new()
    .path("/etc/secrets/creds.json")
    .format(CredentialFileFormat::Json)
    .profile("production")  // For INI files
    .build()

Supported Formats:

| Format | Extension | Example |
|---|---|---|
| JSON | .json | {"username": "x", "password": "y"} |
| YAML | .yaml, .yml | username: x / password: y |
| INI | .ini, .cfg | [default] username = x |
| ENV | .env | USERNAME=x PASSWORD=y |
| PEM | .pem, .crt, .key | Certificate/key files |

Configuration (TOML):

[storage.postgres.mydb]
password_file = "/run/secrets/db-password"

# Or JSON credentials file
credentials_file = "/etc/secrets/db.json"

Refresh Support: Yes (re-reads file on refresh)


AWS Provider

Retrieves credentials from AWS services.

Secrets Manager Provider

SecretsManagerProvider::new("my-secret-id")
    .region("us-east-1")
    .field_mapping(SecretFieldMapping::default())
    .build()

Configuration (TOML):

[storage.postgres.mydb]
aws_secret_id = "prod/db/credentials"

[aws]
region = "us-east-1"

| Property | Required | Description |
|---|---|---|
| aws_secret_id | Yes | Secrets Manager secret ID or ARN |
| region | No | AWS region (default: from environment) |

RDS IAM Provider

Generates temporary IAM authentication tokens for RDS.

RdsIamProvider::new()
    .host("mydb.cluster-xxx.us-east-1.rds.amazonaws.com")
    .port(5432)
    .username("iam_user")
    .region("us-east-1")
    .build()

Configuration (TOML):

[storage.postgres.mydb]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
use_iam_auth = true
username = "iam_user"

| Property | Required | Description |
|---|---|---|
| use_iam_auth | Yes | Enable RDS IAM authentication |
| host | Yes | RDS endpoint |
| username | Yes | IAM database user |

Credential Sources

pub enum AwsCredentialSource {
    /// Default AWS credential chain
    DefaultChain,

    /// Explicit credentials
    Explicit {
        access_key_id: String,
        secret_access_key: String,
        session_token: Option<String>,
    },

    /// Assume a role
    AssumeRole {
        role_arn: String,
        external_id: Option<String>,
        session_name: Option<String>,
        session_tags: Option<HashMap<String, String>>,
    },

    /// Named AWS profile
    Profile { name: String },

    /// Web Identity (OIDC)
    WebIdentity { /* ... */ },
}

Refresh Support: Yes (automatic token refresh)


Vault Provider

Retrieves credentials from HashiCorp Vault.

VaultCredentialProvider::new(VaultConfig {
    addr: "https://vault.example.com:8200".to_string(),
    namespace: Some("my-namespace".to_string()),
    auth: VaultAuthMethod::AppRole {
        role_id: "xxx".to_string(),
        secret_id: "yyy".to_string(),
    },
    tls: Some(VaultTlsConfig {
        ca_cert: Some("/etc/ssl/vault-ca.crt".to_string()),
        client_cert: None,
        client_key: None,
    }),
})
.secret_path("secret/data/mydb")
.secret_type(VaultSecretType::KvV2)
.build()

Authentication Methods:

| Method | Configuration |
|---|---|
| Token | auth_method = "token", token = "hvs.xxx" |
| AppRole | auth_method = "approle", role_id, secret_id |
| Kubernetes | auth_method = "kubernetes", role |
| AWS IAM | auth_method = "aws", role |
| TLS | auth_method = "tls", client certificate |
| UserPass | auth_method = "userpass", username, password |

Secret Types:

| Type | Description |
|---|---|
| KvV2 | Key-Value v2 secrets engine (static secrets) |
| Database | Database secrets engine (dynamic credentials) |

Configuration (TOML):

[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
ca_cert = "/etc/ssl/certs/vault-ca.crt"
namespace = "my-namespace"

[storage.postgres.mydb]
vault_path = "secret/data/mydb"
# Or for dynamic credentials:
vault_role = "my-db-role"

| Property | Required | Description |
|---|---|---|
| addr | Yes | Vault server address |
| auth_method | Yes | Authentication method |
| vault_path | For KV | Path to KV secret |
| vault_role | For DB | Database role name |
| namespace | No | Vault namespace |

Refresh Support: Yes (lease renewal, token refresh)


Vault PKI Provider

Issues X.509 certificates from Vault’s PKI secrets engine.

VaultPkiProvider::new(vault_config)
    .pki_path("pki/issue/my-role")
    .common_name("myapp.example.com")
    .ttl(Duration::from_secs(86400))
    .dns_sans(vec!["myapp.example.com", "myapp-alt.example.com"])
    .build()

Configuration (TOML):

[storage.postgres.secure]
vault_pki_path = "pki/issue/postgres-client"
vault_pki_common_name = "epoch-client"
vault_pki_ttl = "24h"

| Property | Required | Description |
|---|---|---|
| vault_pki_path | Yes | PKI issue path |
| vault_pki_common_name | Yes | Certificate CN |
| vault_pki_ttl | No | Certificate TTL |
| vault_pki_dns_sans | No | DNS Subject Alt Names |
| vault_pki_ip_sans | No | IP Subject Alt Names |

Refresh Support: Yes (automatic certificate renewal)


Web Identity Provider

Uses OIDC tokens for AWS authentication.

WebIdentityCredentialProvider::new(WebIdentityConfig {
    role_arn: "arn:aws:iam::123456789012:role/MyRole".to_string(),
    web_identity_token_file: Some("/var/run/secrets/token".to_string()),
    session_name: Some("my-session".to_string()),
    provider_url: Some("https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E".to_string()),
    duration_seconds: Some(3600),
    policy_arns: None,
    inline_policy: None,
})
.build()

Supported OIDC Providers:

| Provider | Token Source |
|---|---|
| GitHub Actions | GITHUB_TOKEN or ACTIONS_ID_TOKEN_REQUEST_* |
| GitLab CI/CD | CI_JOB_JWT |
| Azure AD / Entra ID | Azure managed identity |
| Google Cloud | GCP service account |
| Kubernetes (EKS/GKE) | Service account token file |
| Custom | Token file or direct token |

Configuration (TOML):

[aws]
role_arn = "arn:aws:iam::123456789012:role/MyRole"
web_identity_token_file = "/var/run/secrets/token"

Refresh Support: Yes (token refresh on expiration)


SSO Provider

Integrates with AWS IAM Identity Center (formerly AWS SSO).

SsoCredentialProvider::new(SsoConfig {
    start_url: "https://my-sso-portal.awsapps.com/start".to_string(),
    region: "us-east-1".to_string(),
    account_id: "123456789012".to_string(),
    role_name: "MyRole".to_string(),
})
.build()

Uses the OAuth 2.0 Device Authorization Grant flow for browser-based authentication.

Configuration (TOML):

[aws.sso]
start_url = "https://my-sso-portal.awsapps.com/start"
region = "us-east-1"
account_id = "123456789012"
role_name = "MyRole"

Refresh Support: Yes (automatic token refresh, cached tokens)


Credential Caching

Wrap any provider with CachedCredentialProvider to reduce external calls.

// Simple caching
let cached = CachedCredentialProvider::new(inner_provider);

// With configuration
let cached = CachedCredentialProvider::new(inner_provider)
    .with_refresh_buffer(Duration::from_secs(120))   // Refresh 2 min before expiry
    .with_max_cache_age(Duration::from_secs(3600));  // Max 1 hour cache

// Using extension trait
let cached = inner_provider.with_cache();

// Using builder
let cached = inner_provider.cached()
    .refresh_buffer(Duration::from_secs(60))
    .max_cache_age(Duration::from_secs(3600))
    .build();

Cache Configuration:

| Parameter | Default | Description |
|---|---|---|
| refresh_buffer | 60s | How long before expiry to refresh |
| max_cache_age | None | Max cache age for non-expiring credentials |

Cache Operations:

| Method | Description |
|---|---|
| invalidate() | Force cache invalidation |
| cache_stats() | Get cache statistics |

CacheStats Fields:

| Field | Type | Description |
|---|---|---|
| is_cached | bool | Whether credentials are cached |
| cache_age | Option<Duration> | How long cached |
| time_to_refresh | Option<Duration> | Time until auto-refresh |
| is_expired | bool | Whether cached credentials expired |

Configuration (TOML):

[credentials]
cache_enabled = true
cache_ttl = 300           # 5 minutes
cache_max_size = 100      # Max entries
refresh_before_expiry = 60  # Refresh 60s before expiry

Provider Chains

Chain multiple providers for fallback behavior:

use std::sync::Arc;

let env_provider = Arc::new(EnvCredentialProvider::postgres_default());
let file_provider = Arc::new(FileCredentialProvider::json("/etc/db.json"));
let vault_provider = Arc::new(VaultCredentialProvider::new(config));

let chain = ChainedCredentialProvider::new(vec![
    env_provider,
    file_provider,
    vault_provider,
]).with_name("my-credential-chain");

// Tries each provider in order until one succeeds
let creds = chain.get_credentials().await?;

Chain Behavior:

  • Tries providers in order
  • Returns first successful result
  • Falls through on any error
  • Collects last error if all fail
  • Refresh interval = minimum of all providers

Error Types

CredentialError Variants

| Variant | Description | Retryable |
|---|---|---|
| NotFound | Credentials not found | No |
| Expired | Credentials expired | No |
| ProviderUnavailable | Provider unreachable | Yes |
| AuthFailed | Authentication failed | No |
| Network | Network error | Yes |
| Config | Configuration error | No |
| FileRead | File read error | No |
| FileParse | File parse error | No |
| EnvVarNotSet | Missing env var | No |
| EnvVarInvalid | Invalid env var value | No |
| MissingField | Required field missing | No |
| RefreshNotSupported | Refresh not supported | No |
| RefreshFailed | Refresh failed | Maybe |
| InvalidFormat | Invalid format | No |
| Certificate | Certificate error | No |
| PrivateKey | Private key error | No |

Error Helper Methods:

| Method | Description |
|---|---|
| is_not_found() | Check if NotFound |
| is_expired() | Check if Expired |
| is_retryable() | Check if safe to retry |

Implementing Custom Providers

Implement the CredentialProvider trait for custom credential sources:

use async_trait::async_trait;
use horizon_epoch_core::storage::credentials::{
    CredentialProvider, StorageCredentials, CredentialError, CredentialResult,
};
use std::time::Duration;

struct MyProvider {
    config: MyConfig,
}

#[async_trait]
impl CredentialProvider for MyProvider {
    async fn get_credentials(&self) -> CredentialResult<StorageCredentials> {
        // Fetch from your credential source
        let secret = self.fetch_secret().await?;

        Ok(StorageCredentials::username_password(
            secret.username,
            secret.password,
        ).with_ttl(Duration::from_secs(3600)))
    }

    fn requires_refresh(&self) -> bool {
        // Check if credentials need refresh
        false
    }

    async fn refresh(&self) -> CredentialResult<StorageCredentials> {
        // Refresh credentials
        self.get_credentials().await
    }

    fn refresh_interval(&self) -> Option<Duration> {
        // Recommended refresh interval
        Some(Duration::from_secs(300))
    }

    fn provider_name(&self) -> &str {
        "my_provider"
    }
}

Security Considerations

Secret Protection

All sensitive values use secrecy::SecretString:

  • Not printed in Debug output
  • Zeroized on drop
  • Explicit .expose_secret() required

Best Practices

  1. Never log credentials - Use structured logging carefully
  2. Use environment variables - Avoid hardcoded credentials
  3. Enable caching - Reduce calls to external services
  4. Check expiration - Always verify before use
  5. Rotate regularly - Use providers that support rotation
  6. Least privilege - Grant minimal required permissions

Secure Configuration Order

  1. Vault/Secrets Manager (production)
  2. Environment variables (containers)
  3. Credential files with proper permissions (Kubernetes secrets)
  4. Never: hardcoded credentials in config files

Feature Flags

Credential providers are enabled via Cargo features:

| Feature | Providers Enabled |
|---|---|
| vault | VaultCredentialProvider, VaultPkiProvider |
| aws | SecretsManagerProvider, RdsIamProvider, SsoProvider, WebIdentityProvider |

Core providers (Static, Environment, File) are always available.
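
For example, when depending on the core crate from a workspace member (crate path illustrative, inferred from the import paths above; adjust to your layout):

# Enable the Vault and AWS provider sets; core providers need no features
cargo add --path ../horizon-epoch-core --features "vault,aws"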


Database Backend Credentials

PostgreSQL

PostgreSQL credentials can be provided via:

| Method | Configuration |
|---|---|
| Environment | PGUSER, PGPASSWORD or custom prefix |
| Connection URL | postgresql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| AWS RDS IAM | Token-based authentication |
| AWS Secrets Manager | Stored credentials |

Example (Environment):

export PGUSER="myuser"
export PGPASSWORD="mypassword"

Example (Vault):

[storage.postgres.prod]
vault_path = "secret/data/postgres/prod"

MySQL

MySQL credentials can be provided via:

| Method | Configuration |
|---|---|
| Environment | MYSQL_USER, MYSQL_PASSWORD or custom prefix |
| Connection URL | mysql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| AWS RDS IAM | Token-based authentication (Aurora MySQL) |
| AWS Secrets Manager | Stored credentials |

Example (Environment):

export MYSQL_USER="myuser"
export MYSQL_PASSWORD="mypassword"

Example (TOML):

[storage.mysql.analytics]
host = "mysql.example.com"
database = "analytics"
username = "${MYSQL_USER}"
password = "${MYSQL_PASSWORD}"

SQL Server (MSSQL)

SQL Server credentials can be provided via:

| Method | Configuration |
|---|---|
| Environment | MSSQL_USER, MSSQL_PASSWORD or custom prefix |
| Connection URL | mssql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| Windows Authentication | Integrated security (Windows only) |
| Azure AD | Token-based authentication (Azure SQL) |

Example (Environment):

export MSSQL_USER="sa"
export MSSQL_PASSWORD="YourPassword123!"

Example (TOML):

[storage.mssql.warehouse]
host = "sqlserver.example.com"
database = "warehouse"
username = "${MSSQL_USER}"
password = "${MSSQL_PASSWORD}"
encrypt = true
trust_cert = false

Example (Windows Authentication):

[storage.mssql.warehouse]
host = "sqlserver.internal"
database = "warehouse"
use_windows_auth = true

SQLite

SQLite typically does not require credentials as it uses file-based storage. However, encrypted SQLite databases (SQLCipher) may require a passphrase.

| Method | Configuration |
|---|---|
| No credentials | Default for unencrypted databases |
| Environment | SQLITE_PASSPHRASE for encrypted databases |
| File | Passphrase file for SQLCipher |

Example (Unencrypted):

[storage.sqlite.local]
path = "/data/local.db"
# No credentials needed

Example (Encrypted with SQLCipher):

[storage.sqlite.secure]
path = "/data/secure.db"
passphrase = "${SQLITE_PASSPHRASE}"

Constraint Reference

Horizon Epoch versions database constraints alongside your data, enabling schema evolution with the same Git-like branching, merging, and conflict detection you use for data.

Constraint Types

Horizon Epoch supports seven constraint types that map to PostgreSQL DDL:

| Type | Rust Struct | PostgreSQL DDL |
|---|---|---|
| Foreign Key | ForeignKeyConstraint | ALTER TABLE ADD CONSTRAINT ... FOREIGN KEY ... REFERENCES |
| Unique | UniqueConstraint | ALTER TABLE ADD CONSTRAINT ... UNIQUE |
| Check | CheckConstraint | ALTER TABLE ADD CONSTRAINT ... CHECK (expression) |
| Exclusion | ExclusionConstraint | ALTER TABLE ADD CONSTRAINT ... EXCLUDE USING |
| Index | IndexDefinition | CREATE INDEX |
| Enum Type | EnumType | CREATE TYPE ... AS ENUM |
| Default Value | DefaultValue | ALTER COLUMN ... SET DEFAULT |

Foreign Key Constraints

Foreign keys define referential integrity between tables.

Structure

pub struct ForeignKeyConstraint {
    name: String,                    // Constraint name
    columns: Vec<String>,            // Local columns
    references_table: String,        // Referenced table
    references_columns: Vec<String>, // Referenced columns
    on_update: ReferentialAction,    // Action on parent update
    on_delete: ReferentialAction,    // Action on parent delete
    deferrable: bool,                // Can be deferred
    initially_deferred: bool,        // Deferred by default
}

Referential Actions

| Action | Description |
|---|---|
| NoAction | Raise error if referenced rows exist (default) |
| Restrict | Same as NoAction, checked immediately |
| Cascade | Delete/update child rows automatically |
| SetNull | Set child columns to NULL |
| SetDefault | Set child columns to default value |

PostgreSQL Mapping

-- Generated DDL
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_id) REFERENCES customers (id)
    ON DELETE CASCADE ON UPDATE NO ACTION;

Python Example

from horizon_epoch import Schema, ForeignKeyConstraint

schema = Schema.builder() \
    .add_field("customer_id", "integer") \
    .add_foreign_key(
        name="fk_orders_customer",
        columns=["customer_id"],
        references_table="customers",
        references_columns=["id"],
        on_delete="cascade"
    ) \
    .build()

Unique Constraints

Unique constraints ensure no duplicate values exist in the specified columns.

Structure

pub struct UniqueConstraint {
    name: String,               // Constraint name
    columns: Vec<String>,       // Columns in the constraint
    nulls_distinct: bool,       // PostgreSQL 15+ NULLS NOT DISTINCT
    deferrable: bool,           // Can be deferred
    initially_deferred: bool,   // Deferred by default
    index_method: Option<String>, // Index method (btree, hash)
}
}

PostgreSQL Mapping

-- Simple unique constraint
ALTER TABLE users ADD CONSTRAINT uq_users_email
    UNIQUE (email);

-- Composite unique constraint
ALTER TABLE order_items ADD CONSTRAINT uq_order_product
    UNIQUE (order_id, product_id);

-- PostgreSQL 15+ with NULLS NOT DISTINCT
ALTER TABLE users ADD CONSTRAINT uq_users_phone
    UNIQUE NULLS NOT DISTINCT (phone);

Python Example

schema = Schema.builder() \
    .add_field("email", "text") \
    .add_unique_constraint(
        name="uq_users_email",
        columns=["email"]
    ) \
    .build()

Check Constraints

Check constraints validate that column values satisfy a boolean expression.

Structure

#![allow(unused)]
fn main() {
CheckConstraint {
    name: String,                      // Constraint name
    expression: String,                // SQL boolean expression
    referenced_columns: Vec<String>,   // Columns used in expression
    is_not_null: bool,                 // Is this a NOT NULL constraint?
    no_inherit: bool,                  // PostgreSQL NO INHERIT option
}
}

PostgreSQL Mapping

-- Value range check
ALTER TABLE products ADD CONSTRAINT chk_products_price
    CHECK (price > 0);

-- Pattern check
ALTER TABLE users ADD CONSTRAINT chk_users_email
    CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

-- NOT NULL (special case)
ALTER TABLE users ALTER COLUMN name SET NOT NULL;

Python Example

schema = Schema.builder() \
    .add_field("price", "numeric") \
    .add_check_constraint(
        name="chk_products_price",
        expression="price > 0"
    ) \
    .build()

Exclusion Constraints

Exclusion constraints prevent rows from containing overlapping or otherwise conflicting values, compared using the specified operators.

Structure

#![allow(unused)]
fn main() {
ExclusionConstraint {
    name: String,                      // Constraint name
    index_method: String,              // Usually "gist" or "btree"
    columns: Vec<String>,              // Columns for display
    elements: Vec<ExclusionElement>,   // Column/operator pairs
    where_clause: Option<String>,      // Partial exclusion
    deferrable: bool,                  // Can be deferred
    initially_deferred: bool,          // Deferred by default
}

ExclusionElement {
    column: String,          // Column name or expression
    operator: String,        // Comparison operator
    opclass: Option<String>, // Operator class
}
}

PostgreSQL Mapping

-- Prevent overlapping time ranges
ALTER TABLE reservations ADD CONSTRAINT excl_reservations_overlap
    EXCLUDE USING gist (
        room_id WITH =,
        tsrange(start_time, end_time) WITH &&
    );

-- Prevent overlapping circles (geometric)
ALTER TABLE zones ADD CONSTRAINT excl_zones_overlap
    EXCLUDE USING gist (area WITH &&);

Python Example

schema = Schema.builder() \
    .add_field("room_id", "integer") \
    .add_field("start_time", "timestamp") \
    .add_field("end_time", "timestamp") \
    .add_exclusion_constraint(
        name="excl_reservations_overlap",
        index_method="gist",
        elements=[
            {"column": "room_id", "operator": "="},
            {"column": "tsrange(start_time, end_time)", "operator": "&&"}
        ]
    ) \
    .build()

Index Definitions

Indexes improve query performance and can enforce uniqueness.

Structure

#![allow(unused)]
fn main() {
IndexDefinition {
    name: String,                        // Index name
    method: IndexMethod,                 // btree, hash, gist, gin, brin
    columns: Vec<IndexColumn>,           // Columns with sort options
    is_unique: bool,                     // UNIQUE index
    where_clause: Option<String>,        // Partial index
    include_columns: Vec<String>,        // Covering index (INCLUDE)
    storage_parameters: HashMap<String, String>, // fillfactor, etc.
    concurrent: bool,                    // CONCURRENTLY option
    if_not_exists: bool,                 // IF NOT EXISTS
}

IndexColumn {
    name: String,                    // Column name or expression
    sort_order: Option<SortOrder>,   // ASC or DESC
    nulls_order: Option<NullsOrder>, // NULLS FIRST or NULLS LAST
    opclass: Option<String>,         // Operator class
}
}

Index Methods

Method | Use Case
btree | Default; equality and range queries
hash | Equality comparisons only
gist | Geometric, full-text, range types
gin | Arrays, JSONB, full-text search
brin | Large tables with natural ordering

PostgreSQL Mapping

-- Basic index
CREATE INDEX idx_users_email ON users (email);

-- Composite index with sort order
CREATE INDEX idx_orders_date_desc ON orders (created_at DESC NULLS LAST);

-- Partial index
CREATE INDEX idx_orders_pending ON orders (status)
    WHERE status = 'pending';

-- Covering index (PostgreSQL 11+)
CREATE INDEX idx_users_email_include ON users (email)
    INCLUDE (name, created_at);

-- GIN index for JSONB
CREATE INDEX idx_products_attrs ON products USING gin (attributes);

Python Example

schema = Schema.builder() \
    .add_field("email", "text") \
    .add_index(
        name="idx_users_email",
        columns=[{"name": "email"}],
        method="btree"
    ) \
    .build()

Enum Types

Enum types define a fixed set of allowed values.

Structure

#![allow(unused)]
fn main() {
EnumType {
    name: String,           // Type name
    values: Vec<String>,    // Ordered enum values
    schema: Option<String>, // Schema name (optional)
}
}

PostgreSQL Mapping

-- Create enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');

-- Add value to enum (PostgreSQL is additive-only for enums)
ALTER TYPE order_status ADD VALUE 'cancelled' AFTER 'delivered';

-- Use in table
ALTER TABLE orders ALTER COLUMN status TYPE order_status
    USING status::order_status;

Enum Diffing

When comparing enum types between branches, Horizon Epoch tracks:

  • Added values: Safe to add in migrations
  • Removed values: Requires data migration (destructive)
  • Reordered values: May affect comparisons

# Check if enum change is safe (additive only)
enum_diff = client.diff_enums("main", "feature-branch")
if enum_diff.is_additive():
    print("Safe to merge - only new values added")
else:
    print("Warning: Enum values removed or reordered")

Python Example

schema = Schema.builder() \
    .add_enum_type(
        name="order_status",
        values=["pending", "processing", "shipped", "delivered"]
    ) \
    .add_field("status", "order_status") \
    .build()

Default Values

Default values specify what value to use when none is provided.

Structure

#![allow(unused)]
fn main() {
enum DefaultValue {
    Literal(String),      // Literal value: 'active', 42, true
    Function(String),     // Function call: gen_random_uuid()
    Sequence(String),     // Sequence: nextval('users_id_seq')
    Expression(String),   // Expression: CURRENT_DATE + INTERVAL '30 days'
    CurrentTimestamp,     // CURRENT_TIMESTAMP / NOW()
    CurrentDate,          // CURRENT_DATE
    CurrentUser,          // CURRENT_USER
    Null,                 // NULL (explicit)
}
}

PostgreSQL Mapping

-- Literal default
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';

-- Function default
ALTER TABLE users ALTER COLUMN id SET DEFAULT gen_random_uuid();

-- Timestamp default
ALTER TABLE users ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP;

-- Expression default
ALTER TABLE subscriptions ALTER COLUMN expires_at
    SET DEFAULT CURRENT_DATE + INTERVAL '30 days';

Determinism

Horizon Epoch tracks whether defaults are deterministic:

Default Type | Deterministic | Notes
Literal | Yes | Same value every time
Null | Yes | Always NULL
Function | No | May return different values
Sequence | No | Returns next value
CurrentTimestamp | No | Current time
CurrentDate | No | Current date
CurrentUser | No | Depends on session
Expression | Varies | Analyzed at runtime
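
As an illustration only (this helper is not part of the SDK shown elsewhere in these docs), the classification above can be expressed as a small lookup, with Expression defaults treated as unknown until analyzed:

from typing import Optional

# Hypothetical helper mirroring the table above; not part of the shipped SDK.
DETERMINISTIC_KINDS = {"literal", "null"}
NON_DETERMINISTIC_KINDS = {"function", "sequence", "current_timestamp", "current_date", "current_user"}

def default_is_deterministic(kind: str) -> Optional[bool]:
    """Return True/False for known default kinds, None for expressions needing runtime analysis."""
    if kind in DETERMINISTIC_KINDS:
        return True
    if kind in NON_DETERMINISTIC_KINDS:
        return False
    return None  # "expression" defaults are analyzed at runtime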

Storage Backend Support

Different storage backends support constraints at different levels:

Constraint Support Matrix

Backend | Support Level | Primary Key | Foreign Key | Unique | Check | Index
PostgreSQL | Full | Enforced | Enforced | Enforced | Enforced | Enforced
MySQL | Full | Enforced | Enforced | Enforced | Enforced | Enforced
SQL Server | Full | Enforced | Enforced | Enforced | Enforced | Enforced
SQLite | Partial | Enforced | Optional* | Enforced | Enforced | Enforced
S3/Delta Lake | Metadata | Merge key | Metadata | Metadata | Metadata | N/A
Azure Blob | Metadata | Merge key | Metadata | Metadata | Metadata | N/A
GCS | Metadata | Merge key | Metadata | Metadata | Metadata | N/A
Local FS | Metadata | Merge key | Metadata | Metadata | Metadata | N/A

* SQLite requires PRAGMA foreign_keys = ON to enforce foreign keys.

PostgreSQL (Full Support)

PostgreSQL provides ConstraintSupportLevel::Full:

Constraint Type | Support | Enforcement
Primary Key | Full | Enforced via DDL
Foreign Key | Full | Enforced via DDL
Unique | Full | Enforced via DDL
Check | Full | Enforced via DDL
Exclusion | Full | Enforced via DDL
Index | Full | Created via DDL
Enum Type | Full | Created via DDL
Default Value | Full | Applied via DDL

MySQL (Full Support)

MySQL provides ConstraintSupportLevel::Full:

Constraint Type | Support | Notes
Primary Key | Full | Enforced via DDL
Foreign Key | Full | Enforced via DDL (InnoDB only)
Unique | Full | Enforced via DDL
Check | Full | Enforced via DDL (MySQL 8.0.16+)
Exclusion | Not Supported | Use unique constraints as an alternative
Index | Full | Created via DDL
Enum Type | Full | Native ENUM type support
Default Value | Full | Applied via DDL

SQL Server (Full Support)

SQL Server provides ConstraintSupportLevel::Full:

Constraint Type | Support | Notes
Primary Key | Full | Enforced via DDL
Foreign Key | Full | Enforced via DDL
Unique | Full | Enforced via DDL
Check | Full | Enforced via DDL
Exclusion | Not Supported | Use unique constraints or triggers
Index | Full | Created via DDL
Enum Type | Partial | Use CHECK constraints for enum-like behavior
Default Value | Full | Applied via DDL

SQLite (Partial Support)

SQLite provides ConstraintSupportLevel::Partial:

Constraint Type | Support | Notes
Primary Key | Full | Enforced via DDL
Foreign Key | Partial | Requires PRAGMA foreign_keys = ON
Unique | Full | Enforced via DDL
Check | Full | Enforced via DDL
Exclusion | Not Supported | PostgreSQL-specific; not available in SQLite
Index | Full | Created via DDL
Enum Type | Metadata | Use CHECK constraints for validation
Default Value | Full | Applied via DDL

S3/Delta Lake (Metadata Only)

S3/Delta Lake provides ConstraintSupportLevel::MetadataOnly:

Constraint Type | Support | Notes
Primary Key | Partial | Used as merge key; uniqueness not enforced
NOT NULL | Enforced | Via Arrow schema nullability
Foreign Key | Metadata | Stored for documentation; no enforcement
Unique | Metadata | Stored for documentation; duplicates allowed
Check | Metadata | Stored for documentation; expressions not evaluated
Exclusion | Metadata | PostgreSQL-specific; not enforced
Index | Metadata | Delta Lake handles its own indexing
Enum Type | Partial | Can use Arrow dictionary encoding
Default Value | Metadata | Stored for documentation

Azure Blob Storage (Metadata Only)

Azure Blob Storage provides ConstraintSupportLevel::MetadataOnly:

Constraint Type | Support | Notes
Primary Key | Partial | Used as merge key; uniqueness not enforced
NOT NULL | Enforced | Via Arrow schema nullability
Foreign Key | Metadata | Stored for documentation; no enforcement
Unique | Metadata | Stored for documentation; duplicates allowed
Check | Metadata | Stored for documentation; expressions not evaluated
Exclusion | Metadata | Not applicable to blob storage
Index | Metadata | Not applicable to blob storage
Enum Type | Partial | Can use Arrow dictionary encoding
Default Value | Metadata | Stored for documentation

Google Cloud Storage (Metadata Only)

GCS provides ConstraintSupportLevel::MetadataOnly:

Constraint Type | Support | Notes
Primary Key | Partial | Used as merge key; uniqueness not enforced
NOT NULL | Enforced | Via Arrow schema nullability
Foreign Key | Metadata | Stored for documentation; no enforcement
Unique | Metadata | Stored for documentation; duplicates allowed
Check | Metadata | Stored for documentation; expressions not evaluated
Exclusion | Metadata | Not applicable to object storage
Index | Metadata | Not applicable to object storage
Enum Type | Partial | Can use Arrow dictionary encoding
Default Value | Metadata | Stored for documentation

Local Filesystem (Metadata Only)

Local filesystem provides ConstraintSupportLevel::MetadataOnly:

Constraint Type | Support | Notes
Primary Key | Partial | Used as merge key; uniqueness not enforced
NOT NULL | Enforced | Via Arrow schema nullability
Foreign Key | Metadata | Stored for documentation; no enforcement
Unique | Metadata | Stored for documentation; duplicates allowed
Check | Metadata | Stored for documentation; expressions not evaluated
Exclusion | Metadata | Not applicable to file storage
Index | Metadata | Parquet/Delta handles its own indexing
Enum Type | Partial | Can use Arrow dictionary encoding
Default Value | Metadata | Stored for documentation

ConstraintSupportLevel Enum

#![allow(unused)]
fn main() {
pub enum ConstraintSupportLevel {
    Full,          // All constraints enforced (PostgreSQL)
    MetadataOnly,  // Constraints stored but not enforced (S3)
    Partial,       // Some constraints work, others don't
    None,          // No constraint support
}

impl ConstraintSupportLevel {
    pub fn can_enforce(&self) -> bool;      // True only for Full
    pub fn can_store(&self) -> bool;        // True for Full, MetadataOnly, Partial
    pub fn can_execute_ddl(&self) -> bool;  // True only for Full
}
}

Cross-Backend Considerations

When working with mixed storage backends:

  1. Constraints are always stored in metadata - Even if not enforced, constraints are tracked
  2. DDL generation targets PostgreSQL - Use PostgresDdlGenerator for migrations
  3. Merge conflicts work everywhere - Constraint conflicts are detected regardless of enforcement
  4. Validation is backend-specific - Check can_enforce() before relying on constraint enforcement
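
For the last point, here is a minimal sketch of the pattern; the constraint_support helper is assumed for illustration and mirrors the Rust ConstraintSupportLevel API above:

# Hedged sketch: assumes a Python-side helper client.constraint_support(table)
# that mirrors the Rust ConstraintSupportLevel API.
support = client.constraint_support("users")

if support.can_enforce():
    # Full-support backends: the database itself rejects violating rows.
    client.execute("INSERT INTO users (id, email) VALUES (1, 'alice@example.com')")
else:
    # Metadata-only or partial backends: validate in the application first.
    print("Constraints are stored but not enforced here; validate before writing")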

TableConstraints Container

All constraints for a table are grouped in TableConstraints:

#![allow(unused)]
fn main() {
pub struct TableConstraints {
    pub foreign_keys: Vec<ForeignKeyConstraint>,
    pub unique_constraints: Vec<UniqueConstraint>,
    pub check_constraints: Vec<CheckConstraint>,
    pub exclusion_constraints: Vec<ExclusionConstraint>,
    pub indexes: Vec<IndexDefinition>,
    pub enum_types: Vec<EnumType>,
}
}

Key Methods

#![allow(unused)]
fn main() {
// Get all constraints as enum variants
let all: Vec<Constraint> = constraints.all_constraints();

// Find constraint by name
let fk: Option<&ForeignKeyConstraint> = constraints.find_foreign_key("fk_orders_customer");

// Get constraints affecting a column
let affecting: Vec<&Constraint> = constraints.constraints_on_column("customer_id");

// Validate all constraints
let errors: Vec<String> = constraints.validate();
}

Architecture

System Overview

┌─────────────────────────────────────────────────────────────┐
│                     User Interfaces                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    CLI      │  │ Python SDK  │  │    REST API         │  │
│  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘  │
└─────────┼────────────────┼─────────────────────┼────────────┘
          │                │                     │
          └────────────────┼─────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                     Rust Core Engine                         │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Version Control Operations                            │ │
│  │  branch, commit, merge, diff, promote                  │ │
│  └────────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Metadata Layer                                        │ │
│  │  commit graph, branch pointers, table versions         │ │
│  └────────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Storage Abstraction                                   │ │
│  │  unified interface for all backends                    │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│                    Storage Backends                          │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐  │
│  │ PostgreSQL │ │   MySQL    │ │ SQL Server │ │  SQLite  │  │
│  └────────────┘ └────────────┘ └────────────┘ └──────────┘  │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐  │
│  │  AWS S3    │ │Azure Blob  │ │    GCS     │ │  Local   │  │
│  └────────────┘ └────────────┘ └────────────┘ └──────────┘  │
└─────────────────────────────────────────────────────────────┘

Design Principles

1. Git Semantics

If you know Git, you know Horizon Epoch. We use the same conceptual model:

  • Repositories contain versioned data
  • Branches are independent lines of development
  • Commits are immutable snapshots
  • Merges combine branches with conflict detection

2. Storage Agnostic

Work with data where it lives:

  • No data migration required
  • Each storage backend has an adapter
  • Same operations work across all backends
  • Mix and match storage types

3. Zero-Copy Branching

Creating a branch is instant:

  • No data is duplicated
  • Branch is just a pointer to a commit
  • Changes are tracked incrementally
  • Storage is only used for actual changes

4. Record-Level Tracking

Granular change detection:

  • Track individual record changes (not just files)
  • Field-level conflict detection
  • Efficient storage of deltas
  • Precise merge resolution

5. Metadata Separation

Version control metadata is separate from data:

  • Metadata stored in dedicated PostgreSQL database
  • Data stays in original storage
  • No modification to existing data infrastructure
  • Easy to add/remove without affecting data

Component Details

Rust Core Engine

The heart of Horizon Epoch, written in Rust for:

  • Performance and memory safety
  • Concurrent operations
  • Cross-platform compatibility

Key modules:

  • operations/ - Branch, commit, merge, diff logic
  • metadata/ - Commit graph, branch management
  • storage/ - Storage adapter traits and implementations

Metadata Layer

Stores all versioning information:

┌─────────────────────────────────────────┐
│           Metadata Database              │
│  ┌─────────────┐  ┌─────────────────┐   │
│  │ Repositories│  │ Branches        │   │
│  └─────────────┘  └─────────────────┘   │
│  ┌─────────────┐  ┌─────────────────┐   │
│  │ Commits     │  │ Tables          │   │
│  └─────────────┘  └─────────────────┘   │
│  ┌─────────────┐  ┌─────────────────┐   │
│  │ Tags        │  │ Change Tracking │   │
│  └─────────────┘  └─────────────────┘   │
└─────────────────────────────────────────┘

Storage Adapters

Each adapter implements a common trait:

#![allow(unused)]
fn main() {
trait StorageAdapter {
    fn read_records(&self, table: &str, keys: &[RecordKey]) -> Result<Vec<Record>>;
    fn write_records(&self, table: &str, records: &[Record]) -> Result<()>;
    fn delete_records(&self, table: &str, keys: &[RecordKey]) -> Result<()>;
    fn get_schema(&self, table: &str) -> Result<Schema>;
    fn list_tables(&self) -> Result<Vec<String>>;
    // ... more methods
}
}

Python Bindings (PyO3)

Python SDK wraps Rust core via PyO3:

  • Zero-copy data exchange where possible
  • Async support via tokio
  • Pythonic API design
  • Type hints for IDE support
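
As a rough sketch of what this looks like from Python (the Client constructor and the branch/commit method names are placeholders rather than the final SDK surface; track_table and execute appear elsewhere in these docs):

from horizon_epoch import Client, StorageLocation  # "Client" is an assumed name

client = Client("postgresql://localhost/horizon_epoch")  # metadata database
client.track_table("users", StorageLocation.postgresql("main", "public", "users"))

# Branch/commit method names below are placeholders for illustration only
client.create_branch("feature/new-schema")
client.execute("UPDATE users SET status = 'vip' WHERE id = 1")
client.commit("Promote user 1 to VIP")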

REST API (FastAPI)

HTTP API for external integrations:

  • OpenAPI documentation
  • Authentication/authorization
  • Rate limiting
  • Webhook support

CLI

Command-line interface for developers:

  • Git-like commands
  • Interactive prompts
  • JSON output option
  • Shell completion

Data Flow

Commit Operation

1. User: epoch commit -m "Update users"
         │
2. CLI:  Parse command, validate
         │
3. Core: Identify changed records
         │
4. Core: Create commit metadata
         │
5. Core: Store change delta
         │
6. Meta: Update branch pointer
         │
7. CLI:  Return success + commit ID
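
In pseudocode, the core of this flow looks roughly like the following; it is a conceptual sketch of steps 3-7, not the actual Rust implementation, and the helper names are invented for illustration:

def commit(branch, message):
    # 3. Identify records changed in the branch overlay
    changes = collect_overlay_changes(branch)

    # 4-5. Create commit metadata and store the change delta
    commit_id = metadata.create_commit(
        parent=branch.head,
        message=message,
        delta=changes,
    )

    # 6. Advance the branch pointer to the new commit
    metadata.update_branch_pointer(branch.name, commit_id)

    # 7. Return the new commit ID
    return commit_id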

Merge Operation

1. User: epoch merge feature/branch
         │
2. Core: Find merge base (common ancestor)
         │
3. Core: Compute changes from base to source
         │
4. Core: Compute changes from base to target
         │
5. Core: Identify conflicts
         │
    ┌────┴────┐
    │         │
 No conflicts  Has conflicts
    │         │
    ▼         ▼
6a. Apply    6b. Report
    changes       conflicts
    │         │
7a. Create   7b. Wait for
    merge         resolution
    commit

Query Operation

1. User: SELECT * FROM users WHERE id = 42
         │
2. Core: Determine current branch
         │
3. Core: Check branch overlay for record
         │
    ┌────┴────┐
    │         │
 Found in    Not in
 overlay     overlay
    │         │
    ▼         ▼
4a. Return  4b. Query
    overlay      base data
    record       │
         ┌──────┴──────┐
         │             │
      Found         Not found
         │             │
         ▼             ▼
      Return        Return
      record        empty

Scalability Considerations

Horizontal Scaling

  • Metadata database can be replicated
  • Storage backends scale independently
  • API servers are stateless

Performance Optimization

  • Connection pooling for databases
  • Lazy loading of large datasets
  • Caching of frequently accessed data
  • Batch operations for bulk changes
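
As an illustration of the batch-operations point, a sketch that chunks records before writing; the client.write_records call is hypothetical and simply echoes the StorageAdapter trait method shown earlier:

def write_in_batches(client, table, records, batch_size=1000):
    """Sketch only: submit records in chunks to reduce round trips.
    Assumes a hypothetical client.write_records(table, batch) method."""
    for start in range(0, len(records), batch_size):
        client.write_records(table, records[start:start + batch_size])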

Large Repository Support

  • Incremental operations
  • Streaming for large diffs
  • Pagination for history
  • Sparse checkout support

Security Model

Authentication

  • Multiple auth methods supported
  • Credential providers for secrets
  • Token-based API access

Authorization

  • Repository-level permissions
  • Branch protection rules
  • Audit logging

Data Protection

  • TLS for all connections
  • Encryption at rest (via storage backends)
  • Credential encryption

Future Architecture

Planned enhancements:

  • Distributed metadata for global deployments
  • Real-time collaboration features
  • Advanced caching layer
  • Plugin system for custom adapters

Storage Adapters

Storage adapters enable Horizon Epoch to work with different data storage systems while providing a unified interface for version control operations.

Supported Backends

Horizon Epoch supports 8 storage backends:

Backend | Status | Constraint Support | Use Case
PostgreSQL | Production | Full | Transactional databases
MySQL | Production | Full | Transactional databases
Microsoft SQL Server | Production | Full | Enterprise databases
SQLite | Production | Partial | Local/embedded databases
S3/Delta Lake | Production | Metadata Only | Data lakes, analytics
Azure Blob Storage | Production | Metadata Only | Cloud data lakes
Google Cloud Storage | Production | Metadata Only | Cloud data lakes
Local Filesystem | Production | Metadata Only | Development, testing

Adapter Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Storage Abstraction Layer                   │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │              StorageAdapter Trait                       │ │
│  │  - read_records()    - write_records()                 │ │
│  │  - delete_records()  - get_schema()                    │ │
│  │  - list_tables()     - scan_table()                    │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ PostgreSQL │ │   MySQL    │ │    MSSQL   │ │   SQLite   │
│  Adapter   │ │  Adapter   │ │   Adapter  │ │  Adapter   │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ S3/Delta   │ │   Azure    │ │    GCS     │ │  LocalFS   │
│  Adapter   │ │   Adapter  │ │  Adapter   │ │  Adapter   │
└────────────┘ └────────────┘ └────────────┘ └────────────┘

Common Interface

All storage adapters implement these core operations:

Record Operations

#![allow(unused)]
fn main() {
// Read specific records by primary key
fn read_records(
    &self,
    table: &str,
    keys: &[RecordKey],
    branch_context: &BranchContext
) -> Result<Vec<Record>>;

// Write records (insert or update)
fn write_records(
    &self,
    table: &str,
    records: &[Record],
    branch_context: &BranchContext
) -> Result<WriteResult>;

// Delete records by primary key
fn delete_records(
    &self,
    table: &str,
    keys: &[RecordKey],
    branch_context: &BranchContext
) -> Result<DeleteResult>;
}

Schema Operations

#![allow(unused)]
fn main() {
// Get table schema
fn get_schema(&self, table: &str) -> Result<Schema>;

// Detect schema changes
fn compare_schema(
    &self,
    table: &str,
    old_schema: &Schema
) -> Result<SchemaChanges>;
}

Table Operations

#![allow(unused)]
fn main() {
// List available tables
fn list_tables(&self) -> Result<Vec<TableInfo>>;

// Scan entire table (for initial import or full sync)
fn scan_table(
    &self,
    table: &str,
    options: ScanOptions
) -> Result<RecordStream>;

// Get table statistics
fn get_table_stats(&self, table: &str) -> Result<TableStats>;
}

Relational Database Adapters

PostgreSQL Adapter

Capabilities:

  • Direct SQL execution
  • ACID transactions
  • Full schema introspection
  • Constraint enforcement (foreign keys, unique, check, exclusion)
  • Connection pooling
  • TLS/SSL secure connections

Configuration:

[storage.postgresql.mydb]
url = "postgresql://user:pass@host/database"
pool_size = 10
connect_timeout = 30
statement_timeout = 60000  # 60 seconds
ssl_mode = "require"       # disable, prefer, require

MySQL Adapter

Capabilities:

  • Full CRUD operations
  • Schema introspection via information_schema
  • Full constraint support
  • 5 SSL/TLS modes
  • SSH tunnel support
  • Connection pooling

Configuration:

[storage.mysql.mydb]
url = "mysql://user:pass@host/database"
pool_size = 10
ssl_mode = "required"  # disabled, preferred, required, verify_ca, verify_identity

Microsoft SQL Server Adapter

Capabilities:

  • Full CRUD operations
  • Schema introspection via sys.* catalog
  • Full constraint support
  • TLS/SSL encryption (3 levels)
  • Windows and SQL Server authentication

Configuration:

[storage.mssql.mydb]
url = "mssql://user:pass@host/database"
pool_size = 10
encrypt = "required"  # off, on, required
trust_server_certificate = false

SQLite Adapter

Capabilities:

  • File-based and in-memory databases
  • Schema introspection via PRAGMA
  • Partial constraint support (foreign keys require PRAGMA foreign_keys = ON)
  • WAL journal mode for concurrency
  • Connection pooling with health checks

Configuration:

[storage.sqlite.local]
path = "/path/to/database.db"
# Or use in-memory:
# path = ":memory:"
journal_mode = "wal"
foreign_keys = true

Object Storage Adapters

All object storage adapters use Delta Lake format for transactional semantics.

S3/Delta Lake Adapter

Capabilities:

  • Delta Lake protocol support
  • Parquet file format
  • Time travel via Delta versioning
  • Schema evolution
  • Efficient columnar storage
  • AWS S3 and S3-compatible (MinIO, etc.)

Configuration:

[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
endpoint = "https://s3.amazonaws.com"  # Optional, for S3-compatible
delta_log_retention_days = 30

Azure Blob Storage Adapter

Capabilities:

  • Delta Lake format on Azure Blob
  • Multiple authentication methods:
    • Account key
    • SAS token
    • Service principal
    • Managed identity
  • Copy-on-write semantics

Configuration:

[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key"  # account_key, sas_token, service_principal, managed_identity

Google Cloud Storage Adapter

Capabilities:

  • Delta Lake format on GCS
  • Authentication:
    • Service account key
    • Application Default Credentials (ADC)
  • Copy-on-write semantics

Configuration:

[storage.gcs.datalake]
bucket = "my-gcs-bucket"
prefix = "horizon-epoch/"
project_id = "my-project"
auth_method = "service_account"  # service_account, adc

Local Filesystem Adapter

Capabilities:

  • Delta Lake format on local disk
  • SQL query execution via DataFusion
  • Ideal for development and testing
  • Copy-on-write semantics

Configuration:

[storage.local.dev]
path = "/data/horizon-epoch"

Constraint Support Levels

Level | Description | Backends
Full | Constraints enforced via DDL | PostgreSQL, MySQL, MSSQL
Partial | Some constraints require config | SQLite
Metadata Only | Stored but not enforced | S3, Azure, GCS, LocalFS

For object storage backends, constraints are versioned in metadata but enforcement happens at the application layer or when promoting to a relational database.


Adapter Selection

Horizon Epoch selects adapters based on table location URL scheme:

# PostgreSQL table
client.track_table("users", StorageLocation.postgresql("main", "public", "users"))

# MySQL table
client.track_table("orders", StorageLocation.mysql("main", "mydb", "orders"))

# S3 Delta table
client.track_table("events", StorageLocation.s3("bucket", "delta/events"))

# Local filesystem
client.track_table("test_data", StorageLocation.local("/data/test"))

Performance Characteristics

Operation | PostgreSQL | MySQL | MSSQL | SQLite | S3/Delta | Azure/GCS
Point read | Fast | Fast | Fast | Fast | Medium | Medium
Range scan | Fast | Fast | Fast | Fast | Fast | Fast
Write single | Fast | Fast | Fast | Fast | Slow | Slow
Write batch | Fast | Fast | Fast | Medium | Fast | Fast
Branch create | Instant | Instant | Instant | Instant | Instant | Instant
Full scan | Depends | Depends | Depends | Fast | Very Fast | Very Fast

Best Practices

Relational Databases (PostgreSQL, MySQL, MSSQL)

  1. Ensure proper indexes on primary keys and commonly queried columns
  2. Use connection pooling for high-throughput workloads
  3. Configure appropriate timeouts for long-running operations
  4. Monitor overlay table size and compact periodically

SQLite

  1. Use WAL mode for better concurrency (journal_mode = "wal")
  2. Enable foreign keys explicitly if needed (foreign_keys = true)
  3. Use file-based for persistence, in-memory for testing

Object Storage (S3, Azure, GCS, Local)

  1. Use appropriate file sizes (target 100MB-1GB per file)
  2. Partition large tables for efficient queries
  3. Enable predicate pushdown for filtered scans
  4. Run OPTIMIZE periodically for better read performance
  5. Configure retention for Delta log cleanup

Troubleshooting

Connection Issues

PostgreSQL/MySQL/MSSQL:

  • Verify network connectivity and firewall rules
  • Check SSL/TLS configuration matches server requirements
  • Ensure connection pool size is appropriate for workload

Object Storage:

  • Verify credentials and permissions
  • Check bucket/container exists and is accessible
  • Ensure network allows access to storage endpoints

Performance Issues

Slow Queries:

  • Add indexes for overlay joins (relational)
  • Analyze query plans
  • Consider materializing branches for read-heavy workloads

Slow Writes:

  • Batch writes together (especially for object storage)
  • Check network latency to storage
  • Use regional endpoints for cloud storage

Copy-on-Write

Copy-on-Write (CoW) is the fundamental mechanism that enables Horizon Epoch to create instant branches and track changes efficiently.

Overview

When you create a branch, no data is copied. Instead:

  1. The branch is created as a pointer to the current commit
  2. When you modify a record, only that record is copied
  3. Queries combine the original data with branch-specific changes

How It Works

Initial State

main branch:
┌─────────────────────────────────────────┐
│              users table                 │
│  id │ name    │ email                   │
│─────┼─────────┼─────────────────────────│
│  1  │ Alice   │ alice@example.com       │
│  2  │ Bob     │ bob@example.com         │
│  3  │ Charlie │ charlie@example.com     │
└─────────────────────────────────────────┘

Branch Creation (Zero-Copy)

epoch branch create feature/update-emails

                     ┌─── feature/update-emails
                     │    (pointer to same data)
                     ▼
main ───────────────►┌─────────────────────────────────────────┐
                     │              users table                 │
                     │  (shared, no copy)                      │
                     └─────────────────────────────────────────┘

No data is copied. Both branches point to the same underlying data.

First Modification (Copy-on-Write)

# On feature/update-emails branch
client.execute("UPDATE users SET email = 'alice@company.com' WHERE id = 1")

Now the modified record is stored separately:

main branch:                    feature/update-emails:
┌───────────────────────┐       ┌─────────────────────────────┐
│     users (base)      │       │     users (overlay)         │
│  id │ name  │ email   │       │  id │ name  │ email         │
│─────┼───────┼─────────│       │─────┼───────┼───────────────│
│  1  │ Alice │ alice@  │       │  1  │ Alice │ alice@company │
│  2  │ Bob   │ bob@    │       └─────────────────────────────┘
│  3  │ Charl │ charlie@│       Only modified records stored
└───────────────────────┘

Querying a Branch

When you query the feature branch:

SELECT * FROM users;

Horizon Epoch:

  1. Checks the branch overlay for each record
  2. Returns overlay version if found
  3. Returns base version otherwise

Result on feature/update-emails:

│  1  │ Alice   │ alice@company.com    │  ← from overlay
│  2  │ Bob     │ bob@example.com      │  ← from base
│  3  │ Charlie │ charlie@example.com  │  ← from base
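
Conceptually, per-record resolution follows this rule (a simplified sketch; the SQL the engine actually generates is shown under Query Overlay Mechanism below):

def resolve_record(key, overlay, base):
    """Return the branch's view of one record: the overlay wins, otherwise the base."""
    if key in overlay:
        record = overlay[key]
        if record.get("_deleted"):   # deletions live in the overlay as tombstones
            return None
        return record
    return base.get(key)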

Storage Layout: Before and After Modifications

Understanding how data is physically stored helps you reason about performance and storage costs.

PostgreSQL Implementation

Horizon Epoch uses version tables with a naming convention {table}__v_{version_id}:

State 0: Initial Setup (Main Branch Only)

┌─────────────────────────────────────────────────────────────────────────┐
│  Database: my_database                                                   │
│                                                                          │
│  users (base table)                                                      │
│  ┌──────┬─────────┬─────────────────────┬──────────┐                    │
│  │  id  │  name   │       email         │  status  │                    │
│  ├──────┼─────────┼─────────────────────┼──────────┤                    │
│  │  1   │ Alice   │ alice@example.com   │ active   │                    │
│  │  2   │ Bob     │ bob@example.com     │ active   │                    │
│  │  3   │ Charlie │ charlie@example.com │ inactive │                    │
│  └──────┴─────────┴─────────────────────┴──────────┘                    │
│                                                                          │
│  Total storage: ~3 records                                               │
└─────────────────────────────────────────────────────────────────────────┘

State 1: Branch Created (Zero-Copy)

epoch branch create feature/update-users
┌─────────────────────────────────────────────────────────────────────────┐
│  Database: my_database                                                   │
│                                                                          │
│  users (base table)                        users__v_feature-update-users │
│  ┌──────┬─────────┬─────────────────────┐  ┌───────────────────────────┐│
│  │  id  │  name   │       email         │  │      (empty table)        ││
│  ├──────┼─────────┼─────────────────────┤  │                           ││
│  │  1   │ Alice   │ alice@example.com   │  │  Created with same schema ││
│  │  2   │ Bob     │ bob@example.com     │  │  using LIKE ... INCLUDING ││
│  │  3   │ Charlie │ charlie@example.com │  │  ALL (copies constraints) ││
│  └──────┴─────────┴─────────────────────┘  └───────────────────────────┘│
│                                                                          │
│  Total storage: ~3 records (0 additional for branch)                     │
│  Branch creation time: O(1) - just creates empty table                   │
└─────────────────────────────────────────────────────────────────────────┘

State 2: First Modification (Copy-on-Write Triggers)

# On feature branch: Update Alice's email
client.execute("UPDATE users SET email = 'alice@company.com' WHERE id = 1")
┌─────────────────────────────────────────────────────────────────────────┐
│  Database: my_database                                                   │
│                                                                          │
│  users (base table)                        users__v_feature-update-users │
│  ┌──────┬─────────┬─────────────────────┐  ┌──────┬───────┬────────────┐│
│  │  id  │  name   │       email         │  │  id  │ name  │   email    ││
│  ├──────┼─────────┼─────────────────────┤  ├──────┼───────┼────────────┤│
│  │  1   │ Alice   │ alice@example.com   │  │  1   │ Alice │ alice@co.. ││
│  │  2   │ Bob     │ bob@example.com     │  └──────┴───────┴────────────┘│
│  │  3   │ Charlie │ charlie@example.com │                                │
│  └──────┴─────────┴─────────────────────┘  Only 1 record copied!        │
│                                                                          │
│  Total storage: ~4 records (1 additional for branch)                     │
│  Base table: UNCHANGED                                                   │
└─────────────────────────────────────────────────────────────────────────┘

State 3: Multiple Modifications

# More changes on feature branch
client.execute("UPDATE users SET status = 'vip' WHERE id = 1")  # Updates overlay
client.execute("INSERT INTO users VALUES (4, 'Dana', 'dana@company.com', 'active')")
client.execute("DELETE FROM users WHERE id = 2")  # Marks as deleted
┌─────────────────────────────────────────────────────────────────────────┐
│  Database: my_database                                                   │
│                                                                          │
│  users (base table)                        users__v_feature-update-users │
│  ┌──────┬─────────┬─────────────────────┐  ┌──────┬───────┬────────────┐│
│  │  id  │  name   │       email         │  │  id  │ name  │   status   ││
│  ├──────┼─────────┼─────────────────────┤  ├──────┼───────┼────────────┤│
│  │  1   │ Alice   │ alice@example.com   │  │  1   │ Alice │ vip (mod)  ││
│  │  2   │ Bob     │ bob@example.com     │  │  4   │ Dana  │ active(new)││
│  │  3   │ Charlie │ charlie@example.com │  │  2   │ NULL  │ _DELETED_  ││
│  └──────┴─────────┴─────────────────────┘  └──────┴───────┴────────────┘│
│                                                                          │
│  Total storage: ~6 records (3 additional for branch)                     │
│  Main branch queries: See only original 3 records                        │
│  Feature branch queries: See Alice(mod) + Charlie + Dana = 3 records     │
└─────────────────────────────────────────────────────────────────────────┘

S3/Delta Lake Implementation

Delta Lake uses a similar pattern with separate directories:

State 0: Initial Table

s3://horizon-data/
└── repositories/my-repo/
    └── tables/
        └── users/                          # Base table (main branch)
            ├── _delta_log/
            │   └── 00000000000000000000.json   # Transaction log
            ├── part-00000-abc123.parquet       # Data file 1
            └── part-00001-def456.parquet       # Data file 2

Total: ~100MB (all records)

State 1: Branch Created

s3://horizon-data/
└── repositories/my-repo/
    └── tables/
        ├── users/                          # Base table (unchanged)
        │   ├── _delta_log/
        │   └── *.parquet
        │
        └── users__v_feature-xyz/           # Version table (empty)
            └── _delta_log/
                └── 00000000000000000000.json   # Schema only, no data files

Additional storage: ~1KB (just metadata)

State 2: After Modifications

s3://horizon-data/
└── repositories/my-repo/
    └── tables/
        ├── users/                          # Base table (unchanged: 100MB)
        │   ├── _delta_log/
        │   └── *.parquet
        │
        └── users__v_feature-xyz/           # Version table
            ├── _delta_log/
            │   ├── 00000000000000000000.json   # Initial schema
            │   └── 00000000000000000001.json   # Insert/update transaction
            └── part-00000-xyz789.parquet       # Only modified records (~1MB)

Additional storage: ~1MB (only changed records)

Query Overlay Mechanism

The query overlay mechanism is the heart of CoW - it makes branched data appear seamless to applications.

The UNION ALL + NOT EXISTS Pattern

Horizon Epoch uses a highly optimized SQL pattern that leverages PostgreSQL’s query planner:

-- Overlay query for feature branch
SELECT * FROM schema.users__v_feature_branch
UNION ALL
SELECT b.* FROM schema.users b
WHERE NOT EXISTS (
    SELECT 1 FROM schema.users__v_feature_branch v
    WHERE b.id = v.id  -- Primary key match
)

How it works:

  1. First SELECT: Returns ALL records from the version table (modified/new records)
  2. UNION ALL: Combines without deduplication (faster than UNION)
  3. Second SELECT: Returns base records WHERE they DON’T exist in version table
  4. NOT EXISTS: Anti-join pattern - highly optimized in PostgreSQL

Query Flow Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                        SELECT * FROM users                               │
│                        (on feature branch)                               │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Horizon Epoch Query Rewriter                                            │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │  1. Detect current branch context                                  │ │
│  │  2. Identify version table: users__v_feature_branch                │ │
│  │  3. Rewrite query with UNION ALL pattern                           │ │
│  └────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                    ┌─────────────┴─────────────┐
                    │                           │
                    ▼                           ▼
┌──────────────────────────────┐  ┌──────────────────────────────────────┐
│   Version Table Scan         │  │   Base Table Scan with Anti-Join     │
│   users__v_feature_branch    │  │   users WHERE NOT EXISTS (version)   │
│                              │  │                                      │
│   Returns: Modified records  │  │   Returns: Unmodified records        │
│   (Alice with new email)     │  │   (Bob, Charlie - unchanged)         │
└──────────────┬───────────────┘  └──────────────────┬───────────────────┘
               │                                      │
               └──────────────┬───────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        UNION ALL Result                                  │
│  ┌──────┬─────────┬─────────────────────┐                               │
│  │  id  │  name   │       email         │                               │
│  ├──────┼─────────┼─────────────────────┤                               │
│  │  1   │ Alice   │ alice@company.com   │  ← from version table         │
│  │  2   │ Bob     │ bob@example.com     │  ← from base (not in version) │
│  │  3   │ Charlie │ charlie@example.com │  ← from base (not in version) │
│  └──────┴─────────┴─────────────────────┘                               │
└─────────────────────────────────────────────────────────────────────────┘

Composite Primary Key Handling

For tables with composite primary keys, the pattern adapts:

-- Table with composite PK (order_id, line_item_id)
SELECT * FROM schema.order_lines__v_feature_branch
UNION ALL
SELECT b.* FROM schema.order_lines b
WHERE NOT EXISTS (
    SELECT 1 FROM schema.order_lines__v_feature_branch v
    WHERE b.order_id = v.order_id
      AND b.line_item_id = v.line_item_id
)

Filtered Queries

When you add a WHERE clause, it’s applied to both parts:

-- Original query
SELECT * FROM users WHERE status = 'active'

-- Rewritten overlay query
SELECT * FROM schema.users__v_feature_branch
WHERE status = 'active'
UNION ALL
SELECT b.* FROM schema.users b
WHERE b.status = 'active'
  AND NOT EXISTS (
    SELECT 1 FROM schema.users__v_feature_branch v
    WHERE b.id = v.id
)

Query Optimization Details

Point Lookups (O(1))

SELECT * FROM users WHERE id = 1

Optimized path:

  1. Check version table first (indexed lookup)
  2. If found → return immediately
  3. If not found → check base table
  4. Return result

-- Horizon Epoch generates:
SELECT * FROM schema.users__v_feature_branch WHERE id = 1
UNION ALL
SELECT * FROM schema.users
WHERE id = 1
  AND NOT EXISTS (
    SELECT 1 FROM schema.users__v_feature_branch WHERE id = 1
)

With proper indexing, this is two index lookups - very fast.

Full Table Scans

SELECT * FROM users

Performance depends on overlay size:

  • Empty overlay: ~0% overhead (NOT EXISTS returns quickly)
  • Small overlay (< 1% of base): ~5-10% overhead
  • Large overlay (> 10% of base): Consider materialization

Aggregations

SELECT status, COUNT(*) FROM users GROUP BY status

Aggregations work correctly across the union:

SELECT status, COUNT(*) FROM (
    SELECT status FROM schema.users__v_feature_branch
    UNION ALL
    SELECT b.status FROM schema.users b
    WHERE NOT EXISTS (...)
) combined
GROUP BY status

S3/Delta Lake Query Pattern

For Delta Lake tables, Horizon Epoch uses DataFusion SQL engine:

-- Register tables in DataFusion context
CREATE EXTERNAL TABLE base STORED AS DELTA LOCATION 's3://bucket/users/'
CREATE EXTERNAL TABLE branch STORED AS DELTA LOCATION 's3://bucket/users__v_feature/'

-- Query with overlay
SELECT * FROM branch
UNION ALL
SELECT * FROM base
WHERE id NOT IN (SELECT id FROM branch)

Performance Benchmarks

Scenario | Base Query | Overlay Query | Overhead
Point lookup (indexed) | 0.5ms | 0.6ms | +20%
Range scan (1000 rows) | 5ms | 5.5ms | +10%
Full scan (1M rows, 1K overlay) | 500ms | 520ms | +4%
Full scan (1M rows, 100K overlay) | 500ms | 600ms | +20%
Aggregation (1M rows) | 200ms | 220ms | +10%

Key insight: Overhead is proportional to overlay size, not base table size.

Index Recommendations

For optimal overlay query performance:

-- Version tables automatically inherit indexes via LIKE ... INCLUDING ALL
-- But you may want to ensure these exist:

-- Primary key index (automatic)
CREATE INDEX ON users__v_feature_branch (id);

-- Covering indexes for common queries
CREATE INDEX ON users__v_feature_branch (status) INCLUDE (name, email);

Benefits

1. Instant Branch Creation

Creating a branch takes constant time regardless of data size:

  • 1 row table: ~1ms
  • 1 billion row table: ~1ms

2. Storage Efficiency

Only store what changes:

  • Base: 1TB
  • Branch with 1000 changes: ~1MB additional

3. Fast Merging

Merge only examines changed records:

  • Base: 1 billion rows
  • Changes: 1000 rows
  • Merge compares: 1000 rows (not 1 billion)

4. Time Travel

Every commit preserves the overlay state:

  • Query any historical version
  • No additional storage for old versions
  • Instant access to history

Considerations

Query Performance

Overlaid queries have slight overhead:

  • Must check overlay before/after base
  • Overhead is proportional to overlay size
  • Negligible for typical branch sizes

For large overlays:

# Materialize long-lived branches
client.materialize_branch("long-running-branch")

Garbage Collection

Old overlays must be cleaned up:

# Merged branches can have overlays removed
client.gc_branch("merged-feature")

# Or automatic cleanup
config = Config(
    gc_merged_branches=True,
    gc_retention_days=7
)

Deep Branch Hierarchies

main → dev → feature → sub-feature → experiment

Each level adds overlay resolution. Keep hierarchies shallow.
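
The cost comes from walking every overlay between the current branch and main; conceptually, resolution looks like this sketch (assuming each branch exposes its overlay and parent):

def resolve_through_hierarchy(key, branch):
    """Sketch: check each overlay from the leaf branch up toward main."""
    current = branch
    while current is not None:
        if key in current.overlay:    # first hit wins
            return current.overlay[key]
        current = current.parent      # one extra lookup per level of nesting
    return None  # not modified on any branch; read from the base table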

Comparison with Full Copy

Aspect | Copy-on-Write | Full Copy
Branch creation | O(1) | O(n)
Storage per branch | O(changes) | O(data)
Query (point) | O(1) | O(1)
Query (scan) | O(n + overlay) | O(n)
Merge | O(changes) | O(n)

Best Practices

  1. Keep branches short-lived - Smaller overlays are faster
  2. Merge frequently - Reduces overlay accumulation
  3. Monitor overlay size - Large overlays may need materialization
  4. Use appropriate indexes - Index overlay tables for performance
  5. Clean up merged branches - Run garbage collection regularly

Merge Algorithm

Horizon Epoch uses a three-way merge algorithm similar to Git, extended for structured data with field-level conflict detection.

Overview

Three-way merge compares:

  1. Base - The common ancestor (merge base)
  2. Ours - The target branch (where we’re merging into)
  3. Theirs - The source branch (what we’re merging)

Visual Overview

                    ┌─────────────────────────────────────────────────────────┐
                    │                  THREE-WAY MERGE                         │
                    └─────────────────────────────────────────────────────────┘

    Source Branch                      Target Branch
    (feature)                          (main)
         │                                  │
         ▼                                  ▼
    ┌─────────┐                        ┌─────────┐
    │Commit E │                        │Commit G │
    │ users:  │                        │ users:  │
    │ Alice→  │                        │ Alice→  │
    │ Smith   │                        │ alice@  │
    └────┬────┘                        │ company │
         │                             └────┬────┘
         │                                  │
         │      ┌──────────────────┐        │
         └──────►    MERGE BASE    ◄────────┘
                │    Commit B      │
                │    users:        │
                │    Alice         │
                │    alice@example │
                └────────┬─────────┘
                         │
                         ▼
                ┌────────────────────┐
                │  MERGE RESULT      │
                │  users:            │
                │  Alice Smith       │  ← from source (name changed)
                │  alice@company.com │  ← from target (email changed)
                └────────────────────┘

Finding the Merge Base

Commit Graph

        C---D---E  (feature)
       /
  A---B---F---G    (main)

The merge base for merging feature into main is commit B - the most recent common ancestor.

Algorithm

def find_merge_base(source_commit, target_commit):
    """Find the lowest common ancestor in the commit DAG."""
    source_ancestors = get_all_ancestors(source_commit)

    # BFS from target to find first intersection
    queue = [target_commit]
    visited = set()
    while queue:
        commit = queue.pop(0)
        if commit in source_ancestors:
            return commit
        for parent in commit.parents:
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)

    return None  # No common ancestor (shouldn't happen)

Content Hash Optimization

Before performing record-level comparison, Horizon Epoch uses content hashes for quick diff detection:

def should_merge_table(base_version, source_version, target_version):
    """Use content hashes to skip unchanged tables."""
    base_hash = base_version.content_hash
    source_hash = source_version.content_hash
    target_hash = target_version.content_hash

    if source_hash == target_hash:
        # Both branches have identical data - no merge needed
        return MergeAction.SKIP

    if source_hash == base_hash:
        # Only target changed - keep target (fast-forward)
        return MergeAction.KEEP_TARGET

    if target_hash == base_hash:
        # Only source changed - take source
        return MergeAction.TAKE_SOURCE

    # Both changed differently - full record merge required
    return MergeAction.RECORD_LEVEL_MERGE

Three-Way Merge Process

Step 1: Compute Changes

For each table, compute:

  • Changes from base to ours (ours_changes)
  • Changes from base to theirs (theirs_changes)

def compute_changes(base_state, current_state):
    changes = {}

    for record in current_state:
        base_record = base_state.get(record.key)

        if base_record is None:
            changes[record.key] = Change(type=INSERT, record=record)
        elif base_record != record:
            changes[record.key] = Change(type=UPDATE,
                                         base=base_record,
                                         current=record)

    for key in base_state:
        if key not in current_state:
            changes[key] = Change(type=DELETE, base=base_state[key])

    return changes

Step 2: Classify Changes

For each changed record:

Ours | Theirs | Result
No change | Changed | Accept theirs
Changed | No change | Keep ours
Same change | Same change | Keep either (identical)
Different changes | Different changes | CONFLICT
Deleted | Changed | CONFLICT
Changed | Deleted | CONFLICT
Deleted | Deleted | Keep deleted

Step 3: Apply Non-Conflicting Changes

def apply_changes(base, ours_changes, theirs_changes):
    result = base.copy()
    conflicts = []

    for key in set(ours_changes.keys()) | set(theirs_changes.keys()):
        ours = ours_changes.get(key)
        theirs = theirs_changes.get(key)

        if ours is None:
            # Only theirs changed
            result.apply(theirs)
        elif theirs is None:
            # Only ours changed
            result.apply(ours)
        elif ours == theirs:
            # Identical change on both sides - apply once
            result.apply(ours)
        else:
            # Potential conflict - check field level
            conflict = check_field_conflict(key, ours, theirs)
            if conflict:
                conflicts.append(conflict)
            else:
                result.apply(merge_changes(ours, theirs))

    return result, conflicts

Field-Level Conflict Detection

Unlike file-based version control, Horizon Epoch detects conflicts at the field level. This is one of the most powerful features - it allows more changes to merge automatically.

How Field-Level Merge Works

┌─────────────────────────────────────────────────────────────────────────────┐
│                    FIELD-LEVEL MERGE ALGORITHM                               │
└─────────────────────────────────────────────────────────────────────────────┘

For each field in a record where both branches made changes:

┌──────────────────┬──────────────────┬──────────────────┬────────────────────┐
│   Ancestor (A)   │   Source (S)     │   Target (T)     │   Result           │
├──────────────────┼──────────────────┼──────────────────┼────────────────────┤
│   value_a        │   value_a        │   value_a        │   value_a (same)   │
│   value_a        │   value_s        │   value_a        │   value_s (source) │
│   value_a        │   value_a        │   value_t        │   value_t (target) │
│   value_a        │   value_s        │   value_s        │   value_s (same)   │
│   value_a        │   value_s        │   value_t        │   ⚠️ CONFLICT      │
│   NULL           │   value_s        │   NULL           │   value_s (added)  │
│   NULL           │   NULL           │   value_t        │   value_t (added)  │
│   NULL           │   value_s        │   value_t        │   ⚠️ CONFLICT      │
│   value_a        │   NULL           │   value_a        │   NULL (deleted)   │
│   value_a        │   value_a        │   NULL           │   NULL (deleted)   │
│   value_a        │   NULL           │   value_t        │   ⚠️ CONFLICT      │
└──────────────────┴──────────────────┴──────────────────┴────────────────────┘

Implementation

def field_level_merge(ancestor, source, target, strategy):
    """
    Merge two records field-by-field using three-way merge.
    Returns merged record or conflict information.
    """
    result = {}
    conflicting_fields = []

    # Get all fields across all three versions
    all_fields = set(ancestor.keys()) | set(source.keys()) | set(target.keys())

    for field in all_fields:
        ancestor_val = ancestor.get(field)
        source_val = source.get(field)
        target_val = target.get(field)

        # Case 1: Same in both source and target → no conflict
        if source_val == target_val:
            result[field] = source_val
            continue

        # Case 2: Only source changed from ancestor
        if ancestor_val == target_val and ancestor_val != source_val:
            result[field] = source_val
            continue

        # Case 3: Only target changed from ancestor
        if ancestor_val == source_val and ancestor_val != target_val:
            result[field] = target_val
            continue

        # Case 4: Field added in source only
        if ancestor_val is None and target_val is None and source_val is not None:
            result[field] = source_val
            continue

        # Case 5: Field added in target only
        if ancestor_val is None and source_val is None and target_val is not None:
            result[field] = target_val
            continue

        # Case 6: CONFLICT - both changed differently
        conflicting_fields.append(FieldConflict(
            field_name=field,
            ancestor_value=ancestor_val,
            source_value=source_val,
            target_value=target_val
        ))

    if conflicting_fields:
        return MergeRecordResult.Conflict(
            merged_fields=result,
            conflicting_fields=conflicting_fields
        )

    return MergeRecordResult.Resolved(result)

Example: Successful Auto-Merge

Base record:

{"id": 1, "name": "Alice", "email": "alice@example.com", "status": "active"}

Ours (main):

{"id": 1, "name": "Alice", "email": "alice@company.com", "status": "active"}

Theirs (feature):

{"id": 1, "name": "Alice Smith", "email": "alice@example.com", "status": "inactive"}

Field-by-Field Analysis

┌─────────┬───────────────┬───────────────┬───────────────┬────────────────────┐
│ Field   │ Base          │ Ours (main)   │ Theirs (feat) │ Result             │
├─────────┼───────────────┼───────────────┼───────────────┼────────────────────┤
│ id      │ 1             │ 1             │ 1             │ 1 (unchanged)      │
│ name    │ Alice         │ Alice         │ Alice Smith   │ Alice Smith ✓      │
│         │               │ (unchanged)   │ (changed)     │ (take theirs)      │
│ email   │ alice@example │ alice@company │ alice@example │ alice@company ✓    │
│         │               │ (changed)     │ (unchanged)   │ (keep ours)        │
│ status  │ active        │ active        │ inactive      │ inactive ✓         │
│         │               │ (unchanged)   │ (changed)     │ (take theirs)      │
└─────────┴───────────────┴───────────────┴───────────────┴────────────────────┘

Result (auto-merged):

{"id": 1, "name": "Alice Smith", "email": "alice@company.com", "status": "inactive"}

Example: Field-Level Conflict

If both branches changed the same field to different values:

┌─────────┬───────────────┬────────────────────┬────────────────────┬──────────┐
│ Field   │ Base          │ Ours (main)        │ Theirs (feature)   │ Result   │
├─────────┼───────────────┼────────────────────┼────────────────────┼──────────┤
│ email   │ alice@example │ alice@company.com  │ a.smith@example    │ CONFLICT │
│         │               │ (changed)          │ (changed)          │ ⚠️       │
└─────────┴───────────────┴────────────────────┴────────────────────┴──────────┘

Conflict Types

Horizon Epoch distinguishes several conflict types:

pub enum RecordConflictType {
    /// Both sides modified the same record differently
    ModifyModify,

    /// Source deleted, target modified
    DeleteModify,

    /// Source modified, target deleted
    ModifyDelete,

    /// Both sides added a record with the same primary key
    AddAdd,

    /// Other edge cases
    Other,
}

Each type may require different resolution strategies.
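
For instance, an automated pipeline might map each conflict type to a handling policy before attempting resolution. The sketch below is illustrative only: the RecordConflict dataclass and the policy names are stand-ins, not the shipped API.

from dataclasses import dataclass

@dataclass
class RecordConflict:
    """Assumed shape for illustration; the real enum lives in the Rust core."""
    type: str   # "ModifyModify", "DeleteModify", "ModifyDelete", "AddAdd", "Other"

def resolution_policy(conflict: RecordConflict) -> str:
    """Pick a handling policy for each conflict type."""
    if conflict.type == "ModifyModify":
        return "field-merge-then-review"   # field-level merge resolves most of these
    if conflict.type in ("DeleteModify", "ModifyDelete"):
        return "manual-review"             # deletes vs. edits rarely auto-resolve safely
    if conflict.type == "AddAdd":
        return "re-key-or-dedupe"          # two inserts share one primary key
    return "manual-review"

print(resolution_policy(RecordConflict(type="AddAdd")))  # re-key-or-dedupe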

Merge Strategies

Horizon Epoch supports four merge strategies, each suited for different workflows:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MERGE STRATEGIES                                     │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────┬────────────────────────────────────────────────────────────┐
│ Strategy        │ Behavior                                                    │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ ThreeWay        │ Standard 3-way merge, reports conflicts for resolution     │
│ (default)       │ Use when: You want to review and manually resolve conflicts│
├─────────────────┼────────────────────────────────────────────────────────────┤
│ AcceptSource    │ Auto-resolve conflicts by taking source branch values      │
│ (--strategy     │ Use when: Source branch should "win" (e.g., hotfixes)      │
│  theirs)        │                                                            │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ AcceptTarget    │ Auto-resolve conflicts by keeping target branch values     │
│ (--strategy     │ Use when: Target branch should "win" (preserve main)       │
│  ours)          │                                                            │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ FailOnConflict  │ Abort immediately if any conflict is detected              │
│ (--fail-on-     │ Use when: CI/CD pipelines, automated processes             │
│  conflict)      │                                                            │
└─────────────────┴────────────────────────────────────────────────────────────┘

Default Strategy (ThreeWay)

Field-level three-way merge with conflict detection. Non-conflicting changes merge automatically; conflicts are reported for manual resolution.

epoch merge feature/branch

Accept Source (Theirs) Strategy

On conflicts, automatically take the source branch's values (the branch being merged in):

epoch merge feature/branch --strategy theirs

Use cases:

  • Hotfix branches that must override any conflicting changes
  • Feature branches developed by domain experts
  • Emergency deployments

Accept Target (Ours) Strategy

On conflicts, automatically keep the target branch values (the branch you’re merging into):

epoch merge feature/branch --strategy ours

Use cases:

  • Integrating experimental branches where existing data should be preserved
  • Rebasing feature branches onto main
  • Pulling updates while preserving local modifications

Fail on Conflict Strategy

Fail immediately if any conflict is detected. Useful for automated pipelines.

epoch merge feature/branch --fail-on-conflict

Use cases:

  • CI/CD pipelines where conflicts indicate process failures
  • Automated data sync jobs
  • Pre-merge validation checks

Strategy Comparison

| Scenario                     | ThreeWay  | AcceptSource | AcceptTarget  | FailOnConflict |
|------------------------------|-----------|--------------|---------------|----------------|
| Non-conflicting changes      | ✓ Merge   | ✓ Merge      | ✓ Merge       | ✓ Merge        |
| Same field, different values | ⚠️ Report | Take source  | Keep target   | ❌ Abort       |
| Delete vs modify             | ⚠️ Report | Delete       | Keep modified | ❌ Abort       |
| Both add same key            | ⚠️ Report | Take source  | Keep target   | ❌ Abort       |
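
The same strategies should be reachable from the Python SDK. The snippet below assumes a strategy keyword argument mirroring the CLI --strategy flag; treat it as a sketch and check the SDK reference for the confirmed signature.

import asyncio
from horizon_epoch import Client

async def merge_with_strategy():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # `strategy="theirs"` is assumed to mirror `--strategy theirs`
        result = await client.merge("feature/branch", strategy="theirs")
        print(result)

asyncio.run(merge_with_strategy())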

Conflict Resolution

Interactive Resolution

epoch conflicts resolve --interactive

Walks through each conflict:

Conflict in users.email (record id=1):
  BASE:   alice@example.com
  OURS:   alice@company.com
  THEIRS: alice.smith@example.com

Choose:
  [o] Keep ours
  [t] Keep theirs
  [b] Keep base
  [c] Custom value
>

Programmatic Resolution

conflicts = client.get_conflicts()

for conflict in conflicts:
    for field in conflict.fields:
        # Custom resolution logic
        if field.name == "email":
            # Business rule: prefer company domain
            if "@company.com" in field.ours_value:
                resolution = field.ours_value
            else:
                resolution = field.theirs_value

            client.resolve_conflict(
                table=conflict.table_name,
                record_id=conflict.record_id,
                field=field.name,
                value=resolution
            )

client.merge_continue()

Schema Merge

Schema changes are merged alongside data changes:

Compatible Changes

These merge automatically:

  • Adding a column (either branch)
  • Adding an index
  • Changing a default value (non-conflicting)

Conflicting Changes

These require resolution:

  • Same column changed to different types
  • Same column renamed differently
  • Conflicting constraints

The interactive prompt for a schema conflict looks like:

Schema conflict in users:
  Column 'status':
    OURS:   VARCHAR(20)
    THEIRS: INTEGER

Choose resolution:
  [o] Keep VARCHAR(20)
  [t] Change to INTEGER
  [c] Custom type
>

Merge Commits

After successful merge, a merge commit is created:

Commit: ghi789
Parents: abc123 (main), def456 (feature)
Message: Merge feature/branch into main

Changes:
  users: 5 records merged, 2 conflicts resolved

Performance

Complexity

  • Finding merge base: O(commit history depth)
  • Computing changes: O(changed records)
  • Conflict detection: O(changed records)

Optimization

  1. Change tracking - Only examine records marked as changed
  2. Parallel comparison - Compare multiple tables concurrently (see the sketch below)
  3. Early termination - Stop at first conflict if not auto-resolving
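
As a sketch of the parallel comparison, per-table diffs can be dispatched concurrently. The diff_table coroutine here is a hypothetical stand-in for the real per-table change computation.

import asyncio

async def diff_table(table: str) -> tuple[str, int]:
    """Hypothetical per-table comparison; returns (table, changed_record_count)."""
    await asyncio.sleep(0)  # placeholder for the real change computation
    return table, 0

async def diff_all(tables: list[str]) -> dict[str, int]:
    # Compare all tables concurrently rather than one at a time
    results = await asyncio.gather(*(diff_table(t) for t in tables))
    return dict(results)

print(asyncio.run(diff_all(["users", "orders", "events"])))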

Best Practices

  1. Merge frequently - Smaller merges have fewer conflicts
  2. Keep branches focused - One feature per branch reduces conflicts
  3. Communicate on shared data - Coordinate changes to same records
  4. Test after merge - Verify data integrity post-merge
  5. Document conflict resolution - Include context in merge commit

Multi-Backend Architecture

Horizon Epoch is designed to manage data across multiple, heterogeneous storage backends through a unified interface.

Design Philosophy

Separation of Concerns

┌─────────────────────────────────────────────────────────────┐
│                    Horizon Epoch                             │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                 Metadata Layer                          │ │
│  │  - Commits, branches, tags                             │ │
│  │  - Table registrations                                  │ │
│  │  - Change tracking indices                              │ │
│  │  - Version graph                                        │ │
│  └────────────────────────────────────────────────────────┘ │
│                           │                                  │
│                           │ (references, not data)          │
│                           ▼                                  │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                  Storage Layer                          │ │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │ │
│  │  │PostgreSQL│ │  MySQL   │ │SQL Server│ │  SQLite  │  │ │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘  │ │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │ │
│  │  │  AWS S3  │ │  Azure   │ │   GCS    │ │  Local   │  │ │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘  │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Key insight: The metadata layer stores what changed and when, while storage adapters handle where the data physically lives.

Benefits

  1. No Data Migration - Keep data where it is
  2. Best Tool for Job - Use PostgreSQL for transactional, S3 for analytics
  3. Gradual Adoption - Add version control to existing infrastructure
  4. Unified Operations - Same commands work across all backends

Architecture Components

Storage Registry

Central registry of all configured backends:

config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
    "prod_users",      # Logical name
    "postgresql://prod/users"
).add_postgres(
    "prod_orders",
    "postgresql://prod/orders"
).add_s3(
    "analytics",
    bucket="company-analytics"
)

Storage Location

Each table has a storage location that identifies:

  1. Protocol - Which adapter to use
  2. Backend Name - Which configured backend
  3. Path - Table identifier within the backend

postgresql://prod_users/public.users
│            │          │
protocol     backend    table path
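
This decomposition follows standard URL structure, so it can be reproduced with a stock URL parser. A minimal illustration (not internal Horizon Epoch code):

from urllib.parse import urlparse

def split_location(location: str) -> tuple[str, str, str]:
    """Split a storage location into (protocol, backend, table path)."""
    parsed = urlparse(location)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")

print(split_location("postgresql://prod_users/public.users"))
# ('postgresql', 'prod_users', 'public.users')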

Table Registration

Tables are registered with their location:

# PostgreSQL table
client.register_table("users", "postgresql://prod_users/public.users")

# S3 Delta table
client.register_table("events", "s3://analytics/delta/events")

Metadata References

Metadata stores references to data, not the data itself:

-- In metadata database
SELECT * FROM epoch_tables;
┌──────────┬────────────────────────────────────────┐
│ name     │ location                                │
├──────────┼────────────────────────────────────────┤
│ users    │ postgresql://prod_users/public.users   │
│ orders   │ postgresql://prod_orders/public.orders │
│ events   │ s3://analytics/delta/events            │
└──────────┴────────────────────────────────────────┘

Cross-Backend Operations

Branching

Branches span all registered tables:

# Creates a branch that covers:
# - users (PostgreSQL)
# - orders (PostgreSQL)
# - events (S3)
client.create_branch("feature/new-reporting")

Each backend maintains its own overlay:

  • PostgreSQL: Overlay tables
  • S3: Separate Delta log

Committing

Commits can include changes from multiple backends:

# Changes to users (PostgreSQL) and events (S3)
# are captured in a single commit
client.commit(message="Update user events schema")

The commit metadata tracks which backends have changes:

{
  "commit_id": "abc123",
  "message": "Update user events schema",
  "changes": {
    "postgresql://prod_users": ["users"],
    "s3://analytics": ["events"]
  }
}

Diffing

Diff operations aggregate across backends:

diff = client.diff("main", "feature/branch")

# Returns changes from all backends
for table_diff in diff.table_diffs:
    print(f"{table_diff.location}: {table_diff.status}")

Merging

Merges are coordinated across backends:

  1. Compute changes per backend
  2. Detect conflicts per backend
  3. Apply changes per backend (in transaction if supported)
  4. Create unified merge commit
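
A sketch of that coordination loop, with a stubbed Backend class standing in for the real storage adapters:

from contextlib import contextmanager

class Backend:
    """Hypothetical storage adapter stub for illustration."""
    def __init__(self, name):
        self.name = name
    def compute_changes(self, source, target):
        return []                            # step 1 (stubbed)
    def detect_conflicts(self, changes):
        return []                            # step 2 (stubbed)
    @contextmanager
    def transaction(self):                   # step 3, where supported
        yield
    def apply(self, changes):
        pass

def merge_across_backends(backends, source, target):
    plans, conflicts = {}, []
    for b in backends:
        changes = b.compute_changes(source, target)
        conflicts.extend(b.detect_conflicts(changes))
        plans[b] = changes
    if conflicts:
        return conflicts                     # surface conflicts before applying anything
    for b, changes in plans.items():
        with b.transaction():                # per-backend transaction where supported
            b.apply(changes)
    return f"merge-commit({source} -> {target})"   # step 4: unified merge commit

print(merge_across_backends([Backend("pg"), Backend("s3")], "feature", "main"))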

Consistency Model

Within a Backend

Operations within a single backend use that backend’s consistency guarantees:

  • PostgreSQL: ACID transactions
  • S3/Delta: Serializable via Delta protocol

Across Backends

Cross-backend operations provide best-effort consistency:

┌─────────────┐     ┌─────────────┐
│ PostgreSQL  │     │     S3      │
│   commit    │     │   commit    │
└──────┬──────┘     └──────┬──────┘
       │                    │
       └────────┬───────────┘
                │
         ┌──────▼──────┐
         │   Metadata  │
         │   Commit    │
         └─────────────┘

If one backend fails:

  • The operation is marked as partial
  • Rollback is attempted where possible
  • The user is notified of the partial state

try:
    client.commit(message="Multi-backend update")
except PartialCommitError as e:
    print(f"Committed to: {e.successful_backends}")
    print(f"Failed on: {e.failed_backends}")
    # Manual intervention needed

Configuration Patterns

Separate Environments

# Development
[storage.postgres.dev_db]
url = "postgresql://localhost/dev"

# Staging
[storage.postgres.staging_db]
url = "postgresql://staging-db.internal/staging"

# Production
[storage.postgres.prod_db]
url = "postgresql://prod-db.internal/production"
aws_secret_id = "prod-db-credentials"

Mixed Workloads

# Transactional data in PostgreSQL
[storage.postgres.transactional]
url = "postgresql://oltp-db/production"

# Analytics in S3
[storage.s3.analytics]
bucket = "company-analytics"
region = "us-east-1"

# Archive in Glacier
[storage.s3.archive]
bucket = "company-archive"
region = "us-east-1"
storage_class = "GLACIER_IR"

Cross-Region

[storage.s3.us_data]
bucket = "data-us"
region = "us-east-1"

[storage.s3.eu_data]
bucket = "data-eu"
region = "eu-west-1"

Routing and Discovery

Explicit Routing

Specify backend when registering tables:

client.register_table("users", "postgresql://prod_users/public.users")

Pattern-Based Routing

Configure default routing patterns:

[routing]
# Tables starting with "raw_" go to S3
"raw_*" = "s3://analytics"

# Everything else goes to PostgreSQL
"*" = "postgresql://default"

Auto-Discovery

Discover tables from backends:

# List tables in a backend
tables = client.discover_tables("postgresql://prod_db")

# Register discovered tables
for table in tables:
    client.register_table(table.name, table.location)

Performance Considerations

Query Routing

Queries are routed to the appropriate backend:

# Routed to PostgreSQL
client.query("SELECT * FROM users")

# Routed to S3
client.query("SELECT * FROM events")

Cross-Backend Queries

Currently, joins across backends are not supported in a single query. Use application-level joining:

# Query each backend
users = client.query("SELECT * FROM users")
events = client.query("SELECT * FROM events WHERE user_id IN (...)")

# Join in application
result = join(users, events, on="user_id")

Caching

Per-backend connection pooling and caching:

[storage.postgres.prod_db]
pool_size = 20
cache_schema = true

[storage.s3.analytics]
cache_metadata = true
cache_ttl = 300

Limitations

  1. No cross-backend transactions - ACID only within single backend
  2. No cross-backend joins - Query each backend separately
  3. Eventual consistency - Cross-backend commits may be partially applied
  4. Network latency - Operations touch multiple backends

Best Practices

  1. Group related tables - Tables that are often queried together should be in the same backend
  2. Consider latency - Place backends close to where they’re accessed
  3. Plan for failures - Have recovery procedures for partial commits
  4. Monitor all backends - Track health and performance per backend
  5. Document routing - Make it clear which tables are where

Schema Storage Across Backends

Horizon Epoch stores schemas consistently regardless of the underlying storage backend, but the level of enforcement varies. This document explains how schemas are stored, introspected, and enforced across different backends.

Storage Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                           Metadata Database                               │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  table_versions                                                     │  │
│  │  ┌──────────────────────────────────────────────────────────────┐  │  │
│  │  │ id | table_id | commit_id | schema (JSONB)                   │  │  │
│  │  │ ----------------------------------------------------------   │  │  │
│  │  │ 1  | users    | abc123    | {"fields": [...], ...}           │  │  │
│  │  │ 2  | users    | def456    | {"fields": [...], ...}           │  │  │
│  │  │ 3  | orders   | abc123    | {"fields": [...], ...}           │  │  │
│  │  └──────────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘
                                      │
    ┌─────────────┬─────────────┬─────┴─────┬─────────────┬─────────────┐
    │             │             │           │             │             │
    ▼             ▼             ▼           ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌────────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│   SQL    │  │   SQL    │  │   Object   │  │  File  │  │  File  │  │  File  │
│  (Full)  │  │(Partial) │  │  Storage   │  │Storage │  │Storage │  │Storage │
├──────────┤  ├──────────┤  │ (Metadata) │  │(Meta.) │  │(Meta.) │  │(Meta.) │
│PostgreSQL│  │  SQLite  │  ├────────────┤  ├────────┤  ├────────┤  ├────────┤
│  MySQL   │  │          │  │     S3     │  │ Azure  │  │  GCS   │  │ Local  │
│  MSSQL   │  │          │  │            │  │  Blob  │  │        │  │   FS   │
└──────────┘  └──────────┘  └────────────┘  └────────┘  └────────┘  └────────┘

Key Principle: Schema is always stored in the metadata database as JSONB. The storage backend may or may not enforce the schema.
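
As an illustration of the shape (field names here are indicative, not the exact on-disk format), a table_versions.schema document might look like:

{
  "fields": [
    {"name": "id", "type": "int64", "nullable": false},
    {"name": "email", "type": "varchar(255)", "nullable": false}
  ],
  "primary_key": ["id"],
  "unique_constraints": [
    {"name": "uq_users_email", "columns": ["email"]}
  ]
}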


PostgreSQL: Full Schema Support

PostgreSQL provides ConstraintSupportLevel::Full - all schema elements are enforced by the database engine.

Schema Storage

Schemas are stored in two places:

  1. Metadata database: JSONB in table_versions.schema
  2. Live database: Actual DDL (tables, constraints, indexes)

Introspection

Horizon Epoch introspects PostgreSQL schemas using system catalogs:

-- Columns from information_schema
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'users';

-- Constraints from pg_catalog
SELECT conname, contype, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'users'::regclass;

-- Indexes from pg_indexes
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';

-- Enum types from pg_type
SELECT typname, enumlabel
FROM pg_type t
JOIN pg_enum e ON t.oid = e.enumtypid
WHERE typname = 'order_status';

What’s Captured

| Element               | Captured | Enforced | Notes                          |
|-----------------------|----------|----------|--------------------------------|
| Column names          | Yes      | Yes      |                                |
| Column types          | Yes      | Yes      | Full PostgreSQL type mapping   |
| Nullability           | Yes      | Yes      | NOT NULL constraints           |
| Defaults              | Yes      | Yes      | Literals, functions, sequences |
| Primary keys          | Yes      | Yes      |                                |
| Foreign keys          | Yes      | Yes      | With referential actions       |
| Unique constraints    | Yes      | Yes      | Including NULLS NOT DISTINCT   |
| Check constraints     | Yes      | Yes      | Full expression capture        |
| Exclusion constraints | Yes      | Yes      | GiST-based                     |
| Indexes               | Yes      | Yes      | B-tree, GIN, GiST, BRIN        |
| Enum types            | Yes      | Yes      | With value order               |

CLI Examples

# Show schema from PostgreSQL table
epoch schema show users

# Output:
# Table: users (PostgreSQL)
# Source: Live database introspection
#
# Columns:
#   id         SERIAL        PRIMARY KEY
#   email      VARCHAR(255)  NOT NULL
#   name       VARCHAR(255)  NOT NULL
#   status     user_status   DEFAULT 'active'
#   created_at TIMESTAMP     DEFAULT CURRENT_TIMESTAMP
#
# Constraints:
#   pk_users              PRIMARY KEY (id)
#   uq_users_email        UNIQUE (email)
#   chk_users_email       CHECK (email ~ '^[^@]+@[^@]+$')
#   fk_users_org          FOREIGN KEY (org_id) REFERENCES orgs (id)
#
# Indexes:
#   idx_users_email       btree (email)
#   idx_users_status      btree (status) WHERE status = 'active'
#
# Enum Types:
#   user_status           (active, inactive, suspended)

Python Example

import asyncio
from horizon_epoch import Client

async def inspect_schema():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraints for a table
        constraints = await client.get_constraints("users", branch="main")

        print(f"Unique Constraints: {len(constraints.unique_constraints)}")
        print(f"Foreign Keys: {len(constraints.foreign_keys)}")
        print(f"Check Constraints: {len(constraints.check_constraints)}")

asyncio.run(inspect_schema())

S3/Delta Lake: Metadata-Only Support

S3/Delta Lake provides ConstraintSupportLevel::MetadataOnly - schemas are stored for documentation and merge operations but not enforced at the storage level.

Schema Storage

Schemas are stored in:

  1. Metadata database: JSONB in table_versions.schema
  2. Delta Lake: Parquet schema (columns and types only)
  3. Delta Lake table properties: Constraint metadata

What’s Stored vs Enforced

| Element               | Stored | Enforced | Notes                         |
|-----------------------|--------|----------|-------------------------------|
| Column names          | Yes    | Yes      | Via Arrow/Parquet schema      |
| Column types          | Yes    | Yes      | Mapped to Arrow types         |
| Nullability           | Yes    | Partial  | Arrow schema nullability      |
| Defaults              | Yes    | No       | Stored as metadata only       |
| Primary keys          | Yes    | No       | Used as merge key, not unique |
| Foreign keys          | Yes    | No       | Stored for documentation      |
| Unique constraints    | Yes    | No       | Duplicates allowed            |
| Check constraints     | Yes    | No       | Expressions not evaluated     |
| Exclusion constraints | Yes    | No       | PostgreSQL-specific           |
| Indexes               | Yes    | No       | Delta Lake handles indexing   |
| Enum types            | Yes    | Partial  | Arrow dictionary encoding     |

Schema Conversion

Horizon Epoch converts schemas to/from Arrow format:

Horizon Epoch Schema ←→ Apache Arrow Schema ←→ Parquet Files

Type Mapping:

| Horizon Epoch | Arrow Type          | Notes                 |
|---------------|---------------------|-----------------------|
| int8          | Int8                |                       |
| int16         | Int16               |                       |
| int32         | Int32               |                       |
| int64         | Int64               |                       |
| float32       | Float32             |                       |
| float64       | Float64             |                       |
| decimal(p,s)  | Decimal128          | Precision preserved   |
| varchar(n)    | Utf8                | Length not enforced   |
| text          | LargeUtf8           |                       |
| boolean       | Boolean             |                       |
| timestamp     | Timestamp           | With timezone         |
| date          | Date32              |                       |
| time          | Time64              | Nanoseconds           |
| uuid          | FixedSizeBinary(16) |                       |
| jsonb         | LargeUtf8           | Stored as JSON string |
| bytea         | LargeBinary         |                       |
| enum          | Dictionary          | Values preserved      |
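
A few rows of this mapping, expressed with pyarrow as a sketch of the correspondence (not Horizon Epoch's converter):

import pyarrow as pa

arrow_schema = pa.schema([
    pa.field("id", pa.binary(16)),                         # uuid -> FixedSizeBinary(16)
    pa.field("amount", pa.decimal128(10, 2)),              # decimal(10,2) -> Decimal128
    pa.field("email", pa.utf8()),                          # varchar(n): length not enforced
    pa.field("payload", pa.large_utf8()),                  # jsonb stored as a JSON string
    pa.field("created_at", pa.timestamp("us", tz="UTC")),  # timestamp with timezone
    pa.field("status", pa.dictionary(pa.int32(), pa.utf8())),  # enum -> Dictionary
])
print(arrow_schema)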

CLI Examples

# Show schema from S3/Delta table
epoch schema show events

# Output:
# Table: events (S3/Delta Lake)
# Source: Stored metadata (constraints not enforced)
#
# Columns:
#   id         UUID
#   user_id    INT64
#   event_type VARCHAR(100)
#   payload    JSONB
#   created_at TIMESTAMP
#
# Constraints (metadata only - not enforced):
#   pk_events             PRIMARY KEY (id)
#   fk_events_user        FOREIGN KEY (user_id) REFERENCES users (id)
#
# Indexes (not created - Delta Lake manages indexing):
#   idx_events_user       btree (user_id)
#   idx_events_type       btree (event_type)
#
# Note: S3/Delta Lake does not enforce constraints at the storage level.
# Constraints are stored for documentation and merge conflict detection.

Python Example

import asyncio
from horizon_epoch import Client

async def check_delta_constraints():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraints from Delta table
        constraints = await client.get_constraints("events", branch="main")

        # Constraints are stored but not enforced by S3/Delta
        print(f"Foreign Keys: {len(constraints.foreign_keys)}")
        print(f"Unique Constraints: {len(constraints.unique_constraints)}")

        # For S3/Delta backends, constraints are metadata-only
        # Validation happens at write time (application level)

asyncio.run(check_delta_constraints())

Constraint Storage in Delta Lake

Constraints are stored as Delta Lake table properties:

{
  "horizon_epoch.constraints": {
    "primary_key": ["id"],
    "foreign_keys": [
      {
        "name": "fk_events_user",
        "columns": ["user_id"],
        "references_table": "users",
        "references_columns": ["id"]
      }
    ],
    "unique_constraints": [],
    "check_constraints": [
      {
        "name": "chk_events_type",
        "expression": "event_type IN ('click', 'view', 'purchase')"
      }
    ]
  }
}
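
These properties can be read back with the deltalake Python package. The table URI below is a placeholder; the property key matches the example above.

import json
from deltalake import DeltaTable

# Read constraint metadata stored as a Delta table property
dt = DeltaTable("s3://analytics/delta/events")
props = dt.metadata().configuration
constraints = json.loads(props.get("horizon_epoch.constraints", "{}"))
print(constraints.get("primary_key"))   # e.g. ["id"]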

Cross-Backend Operations

Merge Behavior

Constraint conflicts are detected regardless of enforcement level:

# Branch A (PostgreSQL) modifies constraint
epoch checkout main
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 18)"
epoch commit -m "Add age constraint"

# Branch B (same table) modifies same constraint
epoch checkout feature
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 21)"
epoch commit -m "Different age constraint"

# Merge detects conflict
epoch merge feature
# CONFLICT (constraint): chk_age modified in both branches

Even for S3/Delta tables where constraints aren’t enforced, merge conflicts are detected:

# Both branches modify the same constraint metadata
# Merge requires resolution, even though neither is enforced
epoch merge feature
# CONFLICT (constraint): chk_events_type modified in both branches

Generating DDL for S3 Schemas

You can generate PostgreSQL DDL from schemas stored with S3/Delta tables:

# Generate DDL from S3 table schema
epoch schema show events --generate-ddl

# Output:
# -- DDL for events (from S3/Delta metadata)
# CREATE TABLE events (
#     id UUID PRIMARY KEY,
#     user_id INT64 NOT NULL,
#     event_type VARCHAR(100),
#     payload JSONB,
#     created_at TIMESTAMP
# );
#
# ALTER TABLE events ADD CONSTRAINT fk_events_user
#     FOREIGN KEY (user_id) REFERENCES users (id);
#
# CREATE INDEX idx_events_user ON events (user_id);

This is useful for:

  • Creating PostgreSQL replicas of S3 data
  • Validating schema compatibility
  • Documentation

Constraint Support Level API

from horizon_epoch import ConstraintSupportLevel

# Check what a backend supports
level = client.get_constraint_support("postgresql://...")

# Methods available
level.can_enforce()       # True for Full only
level.can_store()         # True for Full, MetadataOnly, Partial
level.can_execute_ddl()   # True for Full only
level.validation_mode()   # "database", "application", or "none"

# Use in application logic
if level.can_enforce():
    # Let database handle constraint enforcement
    client.insert(data)
else:
    # Validate in application before insert
    violations = client.validate_constraints(data)
    if violations:
        raise ValueError(f"Constraint violations: {violations}")
    client.insert(data)

Best Practices

1. Know Your Enforcement Level

support = client.get_constraint_support(table_location)
if support != ConstraintSupportLevel.Full:
    logger.warning(f"Constraints not enforced for {table_location}")

2. Validate Before Writing to S3

# For S3/Delta tables, validate in application
def insert_to_s3_table(client, table, data):
    schema = client.get_schema(table)

    # Validate primary key uniqueness (assumes the query returns iterable rows)
    existing_keys = {row["id"] for row in client.query(f"SELECT id FROM {table}")}
    if data["id"] in existing_keys:
        raise ValueError("Duplicate primary key")

    # Validate foreign keys exist
    for fk in schema.foreign_keys:
        # Check referenced table
        pass

    # Then insert
    client.insert(table, data)

3. Use PostgreSQL for Critical Constraints

If constraint enforcement is critical, use PostgreSQL tables:

# Analytics data - S3 is fine (no strict constraints needed)
client.register_table("events", "s3://bucket/events")

# User data - PostgreSQL for enforcement
client.register_table("users", "postgresql://mydb/public.users")

4. Document Constraint Semantics

In mixed-backend environments, document which constraints are enforced:

## Data Model

| Table | Backend | Constraints Enforced |
|-------|---------|---------------------|
| users | PostgreSQL | All (FK, UK, CHECK) |
| events | S3/Delta | None (metadata only) |
| orders | PostgreSQL | All |
| analytics | S3/Delta | None |

See Also

Common Issues

Solutions to frequently encountered problems with Horizon Epoch.

Connection Problems

Could not connect to metadata database

Symptoms:

Error: Database connection failed: connection refused

Causes:

  1. PostgreSQL not running
  2. Wrong connection string
  3. Firewall blocking port
  4. Authentication failure

Solutions:

  1. Check PostgreSQL is running:

    pg_isready -h localhost -p 5432
    
  2. Verify connection string format:

    postgresql://user:password@host:port/database
    
  3. Check firewall settings:

    # Check if port is accessible
    nc -zv localhost 5432
    
  4. Test connection directly:

    psql "postgresql://user:pass@localhost/horizon_epoch"
    

S3 bucket not accessible

Symptoms:

Error: S3 error: Access Denied

Causes:

  1. Invalid credentials
  2. Bucket doesn’t exist
  3. Incorrect permissions
  4. Wrong region

Solutions:

  1. Verify AWS credentials:

    aws sts get-caller-identity
    
  2. Check bucket exists:

    aws s3 ls s3://bucket-name
    
  3. Review IAM permissions (need s3:GetObject, s3:PutObject, s3:ListBucket)

  4. Verify region matches bucket location:

    aws s3api get-bucket-location --bucket bucket-name
    

SSH tunnel connection failed

Symptoms:

Error: SSH tunnel failed: Connection refused

Solutions:

  1. Test SSH connection manually:

    ssh -i ~/.ssh/key user@bastion
    
  2. Check SSH key permissions:

    chmod 600 ~/.ssh/id_rsa
    
  3. Verify bastion host is accessible

  4. Check known_hosts:

    ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
    

Merge Issues

Merge conflict in table X

Symptoms:

CONFLICT (content): Merge conflict in users
Automatic merge failed; fix conflicts and then commit.

Understanding the conflict:

  • Same record modified in both branches
  • Same field changed to different values

Resolution:

  1. View conflicts:

    epoch conflicts show
    
  2. Resolve interactively:

    epoch conflicts resolve --interactive
    
  3. Or accept one side:

    epoch conflicts resolve --ours    # Keep target branch
    epoch conflicts resolve --theirs  # Keep source branch
    
  4. Complete merge:

    epoch merge --continue
    
  5. Or abort:

    epoch merge --abort
    

Merge base not found

Symptoms:

Error: Could not find common ancestor for branches

Causes:

  • Branches have no common history
  • Orphaned branch
  • Corrupted commit graph

Solutions:

  1. Check branch history:

    epoch log main --oneline
    epoch log feature/branch --oneline
    
  2. Find common commits:

    epoch log --all --graph
    
  3. If no common ancestor, use --allow-unrelated:

    epoch merge feature/branch --allow-unrelated
    

Branch Issues

Branch not found

Symptoms:

Error: Branch 'feature/xyz' not found

Solutions:

  1. List available branches:

    epoch branch list
    
  2. Check for typos in branch name

  3. Verify branch exists in repository:

    epoch branch list --all
    

Cannot delete branch

Symptoms:

Error: Cannot delete branch 'feature/xyz': not fully merged

Solutions:

  1. Merge the branch first:

    epoch merge feature/xyz
    epoch branch delete feature/xyz
    
  2. Force delete (loses unmerged changes):

    epoch branch delete feature/xyz --force
    

Table Issues

Table not found

Symptoms:

Error: Table 'users' not found

Solutions:

  1. List registered tables:

    epoch table list
    
  2. Register the table:

    epoch table add users --location "postgresql://mydb/public.users"
    
  3. Check table exists in storage backend


Schema mismatch

Symptoms:

Error: Schema mismatch for table 'users': expected 5 columns, found 6

Causes:

  • Table schema changed outside Horizon Epoch
  • Uncommitted schema changes

Solutions:

  1. View current schema:

    epoch table show users --schema
    
  2. Refresh schema:

    epoch table refresh users
    
  3. Commit schema changes:

    epoch commit -m "Update schema for users table"
    

Performance Issues

Slow queries on branches

Symptoms:

  • Queries on branches significantly slower than main
  • Query times increase with branch age

Causes:

  • Large overlay size
  • Deep branch hierarchy
  • Missing indexes

Solutions:

  1. Check overlay size:

    epoch branch info feature/branch --stats
    
  2. Materialize long-lived branches:

    epoch branch materialize feature/branch
    
  3. Merge to reduce hierarchy depth

  4. Add indexes to overlay tables


Slow commit operations

Symptoms:

  • Commits take longer than expected
  • Timeout during commit

Causes:

  • Large number of changes
  • Network latency
  • Lock contention

Solutions:

  1. Commit in smaller batches:

    epoch commit --tables users -m "Part 1"
    epoch commit --tables orders -m "Part 2"
    
  2. Check for locks:

    SELECT * FROM pg_locks WHERE relation = 'epoch_commits'::regclass;
    
  3. Increase timeout:

    epoch commit -m "Large update" --timeout 600
    

Authentication Issues

Vault authentication failed

Symptoms:

Error: Vault authentication failed: permission denied

Solutions:

  1. Check Vault connectivity:

    vault status
    
  2. Verify token/credentials:

    vault token lookup
    
  3. Check policy permissions:

    vault policy read horizon-epoch
    
  4. For AppRole, verify role_id and secret_id are correct


AWS credentials expired

Symptoms:

Error: ExpiredTokenException: The security token included in the request is expired

Solutions:

  1. For IAM roles, credentials refresh automatically

  2. For SSO:

    aws sso login
    
  3. For access keys, rotate them:

    aws iam create-access-key
    

Data Integrity Issues

Corrupted commit

Symptoms:

Error: Commit 'abc123' is corrupted or missing

Solutions:

  1. Check commit exists:

    epoch show abc123
    
  2. Verify metadata database integrity:

    SELECT * FROM epoch_commits WHERE id = 'abc123';
    
  3. If commit is referenced but missing, contact support


Orphaned records

Symptoms:

  • Records in overlay not connected to commits
  • Disk usage growing unexpectedly

Solutions:

  1. Run garbage collection:

    epoch gc
    
  2. Check for orphaned overlays:

    epoch gc --dry-run
    
  3. Manually clean up if needed:

    epoch gc --force
    

Certificate/TLS Issues

Certificate expired

Symptoms:

Error: SSL error: certificate has expired

Causes:

  1. Client certificate expired
  2. Server certificate expired
  3. CA certificate expired

Solutions:

  1. Check certificate expiry:

    # Client certificate
    openssl x509 -enddate -noout -in /path/to/client.crt
    
    # Server certificate (remote)
    openssl s_client -connect db.example.com:5432 -starttls postgres 2>/dev/null | \
      openssl x509 -noout -enddate
    
  2. Renew the certificate:

    # If using Vault PKI
    vault write pki/issue/my-role common_name="client"
    
    # Manual renewal - contact your CA
    
  3. If using Vault dynamic certificates, check renewal is working:

    epoch doctor --check vault
    

Certificate verification failed

Symptoms:

Error: SSL error: certificate verify failed
Error: CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate

Causes:

  1. Wrong CA certificate
  2. Incomplete certificate chain
  3. Self-signed certificate not trusted
  4. Server hostname mismatch

Solutions:

  1. Verify CA certificate is correct:

    # Check certificate chain
    openssl verify -CAfile /path/to/ca.crt /path/to/client.crt
    
  2. Check certificate chain is complete:

    # View full chain
    openssl crl2pkcs7 -nocrl -certfile /path/to/cert.pem | \
      openssl pkcs7 -print_certs -noout
    
  3. For hostname mismatch, check server certificate SANs:

    openssl x509 -noout -text -in server.crt | grep -A1 "Subject Alternative Name"
    
  4. Verify you’re connecting to the correct hostname matching the certificate


Permission denied reading certificate files

Symptoms:

Error: could not load private key file: Permission denied
Error: could not load certificate file: No such file or directory

Solutions:

  1. Check file permissions:

    ls -la /path/to/client.key
    # Should show -rw------- (600) or -r-------- (400)
    
    chmod 600 /path/to/client.key
    chmod 644 /path/to/client.crt
    
  2. Verify file exists and path is correct:

    ls -la /path/to/client.crt /path/to/client.key /path/to/ca.crt
    
  3. Check process user has access:

    # Run as same user
    sudo -u epoch_user cat /path/to/client.crt
    

Key doesn’t match certificate

Symptoms:

Error: key values mismatch
Error: SSL error: private key does not match certificate

Solutions:

  1. Verify key matches certificate:

    # Compare modulus hashes - they should match
    openssl x509 -noout -modulus -in client.crt | openssl md5
    openssl rsa -noout -modulus -in client.key | openssl md5
    
  2. If mismatched, regenerate key/cert pair or locate correct files


RDS IAM Authentication Issues

Failed to generate IAM auth token

Symptoms:

Error: Failed to generate RDS IAM authentication token
Error: The security token included in the request is invalid

Causes:

  1. IAM role doesn’t have rds-db:connect permission
  2. Wrong AWS region
  3. Invalid instance endpoint

Solutions:

  1. Verify IAM policy allows rds-db:connect:

    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": "arn:aws:rds-db:REGION:ACCOUNT:dbuser:DBI_RESOURCE_ID/DB_USER"
    }
    
  2. Check AWS region matches RDS instance:

    aws rds describe-db-instances --db-instance-identifier mydb \
      --query 'DBInstances[0].DBInstanceArn'
    
  3. Verify you’re using the correct endpoint:

    # epoch.toml - use the actual endpoint, not a custom DNS
    [storage.postgres.mydb]
    host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
    use_iam_auth = true
    

RDS IAM token expired immediately

Symptoms:

Error: PAM authentication failed
Error: password authentication failed for user "iam_user"

Causes:

  1. System clock skew
  2. Token generated too far in advance
  3. Wrong database user

Solutions:

  1. Check system time is accurate:

    date -u
    # Compare with actual UTC time
    
    # Sync time if needed
    sudo ntpdate pool.ntp.org
    
  2. Verify database user is configured for IAM:

    -- In RDS
    CREATE USER iam_user WITH LOGIN;
    GRANT rds_iam TO iam_user;
    
  3. Check the token is being generated correctly:

    # Generate token manually for testing
    aws rds generate-db-auth-token \
      --hostname mydb.cluster-xxx.us-east-1.rds.amazonaws.com \
      --port 5432 \
      --username iam_user
    

Cannot assume role for RDS IAM

Symptoms:

Error: User is not authorized to perform: sts:AssumeRole
Error: Access denied when assuming role for RDS authentication

Solutions:

  1. Check assume role trust policy:

    aws iam get-role --role-name MyRdsRole \
      --query 'Role.AssumeRolePolicyDocument'
    
  2. Verify trust policy allows your principal:

    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/YourExecutionRole"
      },
      "Action": "sts:AssumeRole"
    }
    
  3. Check for external_id requirement:

    [storage.s3.mydb]
    assume_role_arn = "arn:aws:iam::123456789012:role/RdsRole"
    external_id = "required-external-id"  # If configured
    

Credential Refresh Issues

Credential refresh failed

Symptoms:

Error: Credential refresh failed: provider returned error
Error: Failed to refresh credentials: connection refused

Causes:

  1. Credential provider unavailable
  2. Network connectivity issues
  3. Token/secret expired

Solutions:

  1. Check provider connectivity:

    # For Vault
    vault status
    
    # For AWS
    aws sts get-caller-identity
    
  2. Verify credentials haven’t expired beyond refresh:

    # Check Vault token
    vault token lookup
    
    # Check current credentials are valid
    epoch doctor --check credentials
    
  3. Force credential refresh:

    # Clear cached credentials
    epoch config cache clear
    
    # Re-authenticate
    epoch auth login
    

Credentials expired and refresh not supported

Symptoms:

Error: Credentials expired
Error: Refresh not supported for static credentials

Causes:

  1. Using static credentials without refresh capability
  2. Credential cache disabled
  3. Refresh interval too long

Solutions:

  1. Use a provider that supports refresh:

    # Instead of static password
    [storage.postgres.mydb]
    vault_path = "secret/data/mydb"  # Dynamic refresh
    
    # Or environment variables (refresh on re-read)
    [storage.postgres.mydb]
    url = "${DB_URL}"
    
  2. Enable credential caching with refresh:

    [credentials]
    cache_enabled = true
    cache_ttl = 300
    refresh_before_expiry = 60  # Refresh 60s before expiry
    

Token lease expired (Vault)

Symptoms:

Error: Vault token expired
Error: permission denied (token expired)

Solutions:

  1. Check token status:

    vault token lookup
    
  2. Renew token if renewable:

    vault token renew
    
  3. For non-renewable tokens, re-authenticate:

    # AppRole
    vault write auth/approle/login \
      role_id=$ROLE_ID \
      secret_id=$SECRET_ID
    
  4. Configure automatic renewal:

    [vault]
    auto_renew_token = true
    renew_threshold = 0.7  # Renew when 70% of TTL elapsed
    

Multi-Backend Routing Issues

Storage backend not found

Symptoms:

Error: Storage backend 'mydb' not found
Error: Unknown storage location: postgresql://mydb/...

Causes:

  1. Backend not configured
  2. Typo in backend name
  3. Configuration not loaded

Solutions:

  1. List configured backends:

    epoch config show storage
    
  2. Check configuration file:

    # epoch.toml
    [storage.postgres.mydb]  # Backend name is 'mydb'
    url = "postgresql://..."
    
  3. Verify configuration is loaded:

    epoch doctor --check config
    

Wrong backend selected for table

Symptoms:

Error: Table 'users' not found in storage 'datalake'
Error: Cannot access PostgreSQL table through S3 backend

Causes:

  1. Table registered with wrong backend
  2. Backend mismatch in storage location

Solutions:

  1. Check table registration:

    epoch table show users
    
  2. Re-register with correct backend:

    epoch table remove users
    epoch table add users \
      --location "postgresql://correct_backend/public.users"
    
  3. Verify storage location format:

    PostgreSQL: postgresql://backend_name/schema.table
    S3:         s3://backend_name/path/to/table
    

Cannot connect to multiple backends simultaneously

Symptoms:

Error: Connection pool exhausted
Error: Too many connections

Causes:

  1. Pool size too small for multi-backend operations
  2. Connection leak
  3. Long-running transactions

Solutions:

  1. Increase pool sizes:

    [storage.postgres.backend1]
    pool_size = 20
    
    [storage.postgres.backend2]
    pool_size = 20
    
  2. Check for connection leaks:

    -- PostgreSQL
    SELECT * FROM pg_stat_activity
    WHERE application_name LIKE '%epoch%';
    
  3. Set idle connection timeouts:

    [storage.postgres.mydb]
    idle_timeout = 300  # Close idle connections after 5 min
    

Cross-backend operations failing

Symptoms:

Error: Cannot merge tables from different storage backends
Error: Cross-storage operation not supported

Causes:

  1. Tables on different backends can’t be joined directly
  2. Merge requires compatible storage types

Solutions:

  1. Understand cross-backend limitations:

    • Merges work within same storage type
    • Queries are executed per-backend
    • Results merged in memory
  2. For cross-backend data access, use Python SDK:

    # Read from both backends
    users = client.query_table("users", backend="postgres1")
    orders = client.query_table("orders", backend="postgres2")
    
    # Join in Python/Pandas
    merged = users.merge(orders, on="user_id")
    
  3. Consider consolidating frequently-joined tables on same backend


Backend credentials conflict

Symptoms:

Error: Credential 'default' conflicts with existing credential
Warning: Multiple credentials found for backend 'mydb'

Solutions:

  1. Use unique credential names per backend:

    [storage.postgres.backend1]
    vault_path = "secret/data/db1"
    
    [storage.postgres.backend2]
    vault_path = "secret/data/db2"  # Different path
    
  2. Clear credential cache if switching configurations:

    epoch config cache clear
    

Getting Help

If these solutions don’t resolve your issue:

  1. Check logs:

    epoch --verbose command
    
  2. Enable debug logging:

    RUST_LOG=debug epoch command
    
  3. Report issues:

    • Contact Horizon Analytic Studios for support
    • Include: version (epoch --version), error message, steps to reproduce

FAQ

Frequently asked questions about Horizon Epoch.

General Questions

What is Horizon Epoch?

Horizon Epoch is a Git-like version control system for data. It provides branching, merging, commits, and history tracking for data stored in PostgreSQL, MySQL, SQL Server, SQLite, S3/Delta Lake, Azure Blob, GCS, and local filesystems.

How is it different from Git?

Git tracks changes to files (text). Horizon Epoch tracks changes to structured data (rows/records). Key differences:

  • Record-level tracking instead of line-level
  • Field-level conflict detection instead of text diff
  • Works with live databases instead of files
  • Storage-agnostic - supports multiple backends

Do I need to migrate my data?

No. Horizon Epoch works with data where it lives. You register existing tables and start versioning. Your data stays in its original location.

Is my data copied when I create a branch?

No. Branching is zero-copy. Only modified records are stored separately. Creating a branch is instant regardless of data size.


Architecture

Where is the metadata stored?

Metadata (commits, branches, change tracking) is stored in a dedicated PostgreSQL database. This is separate from your data storage.

Can I use different databases for metadata and data?

Yes. The metadata database and data storage are completely separate. For example:

  • Metadata: PostgreSQL on localhost
  • Data: PostgreSQL on AWS RDS + S3 for analytics

Does Horizon Epoch modify my existing tables?

Horizon Epoch creates overlay tables alongside your existing tables to track changes. Your original table structure is not modified.

What happens if I modify data directly without using Epoch?

Changes made outside of Horizon Epoch’s context won’t be tracked. For full version control, all changes should go through Epoch or you should periodically sync.


Performance

How fast is branch creation?

Branch creation is O(1) - constant time regardless of data size. Creating a branch from a 1TB dataset takes the same time as from a 1KB dataset (~milliseconds).

What’s the storage overhead?

Storage overhead is proportional to changes, not data size:

  • Base data: Original size (no overhead)
  • Per branch: Only modified records are stored
  • Typical overhead: 1-10% for active branches

Are queries slower on branches?

There’s a small overhead for branch queries due to overlay resolution. For typical branch sizes (< 1% of data modified), overhead is negligible (<10%). For large overlays, consider materializing the branch.

Can it handle large datasets?

Yes. Horizon Epoch is designed for large-scale data:

  • Zero-copy branching works at any scale
  • Only changed records are processed
  • Streaming for large operations
  • Pagination for history

Operations

Can I undo a merge?

Yes. Use epoch revert HEAD to create a new commit that undoes the merge. The original merge commit remains in history.

How do I see what changed?

# Changes since last commit
epoch status

# Changes between branches
epoch diff main feature/branch

# Commit history
epoch log

Can I query historical data?

Yes. Use time-travel queries:

# At specific commit
epoch query "SELECT * FROM users" --at abc123

# At specific time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"

# At tag
epoch query "SELECT * FROM users" --at v1.0.0

How do I resolve merge conflicts?

# See conflicts
epoch conflicts show

# Interactive resolution
epoch conflicts resolve --interactive

# Accept one side
epoch conflicts resolve --ours   # or --theirs

# Complete merge
epoch merge --continue

Storage Backends

Which storage systems are supported?

Horizon Epoch supports 8 storage backends:

Relational Databases (Full Constraint Support):

  • PostgreSQL
  • MySQL
  • Microsoft SQL Server
  • SQLite (partial constraint support)

Object Storage (Delta Lake Format):

  • AWS S3 (and S3-compatible like MinIO)
  • Azure Blob Storage
  • Google Cloud Storage
  • Local Filesystem

Can I use multiple storage backends?

Yes. You can register tables from different backends and manage them in a single repository. Operations like branching and merging work across backends.

Does it work with AWS RDS?

Yes. Use the PostgreSQL or MySQL adapter with your RDS connection string. IAM authentication is supported for enhanced security.

What about cloud data warehouses?

Snowflake, BigQuery, and Redshift are not currently supported. These have unique architectures that would require specialized adapters.


Security

Are credentials stored securely?

Credentials can be sourced from:

  • Environment variables
  • Encrypted files
  • HashiCorp Vault
  • AWS Secrets Manager

Never store credentials in plain text configuration files.

Does it support SSO?

AWS SSO/Identity Center is supported for S3 access. Database SSO depends on your database’s support.

What access controls are available?

Horizon Epoch includes comprehensive security features:

  • Branch protection rules - Prevent direct commits to protected branches
  • Role-based access control (RBAC) - Define roles with specific permissions
  • Row-level security (RLS) - Fine-grained data access policies
  • Audit logging - Track all operations for compliance
  • Commit signing - PGP/OpenPGP signature verification
  • Secret scanning - Detect accidentally committed secrets
  • Field-level masking - Redact sensitive data

Integration

Does it work with dbt?

You can use Horizon Epoch to version the data that dbt transforms. Create a branch, run dbt, validate results, then merge.

Can I use it with Airflow/Dagster?

Yes. Use the Python SDK in your DAG tasks:

from horizon_epoch import Client

async def branch_and_transform():
    async with Client.connect("postgresql://...") as client:
        await client.branch("etl/daily-run")
        # ... run transformations ...
        await client.commit("Daily ETL run")
        await client.merge("etl/daily-run")

Is there a REST API?

Yes. The FastAPI server provides a REST API for all operations. See the REST API Reference.

Does it have a web UI?

A web UI is planned for future releases. Currently, use the CLI or SDK.


Comparison

vs. dbt snapshots?

dbt snapshots capture point-in-time state. Horizon Epoch provides:

  • Full branching and merging
  • Efficient storage (only changes)
  • Any storage backend

vs. Delta Lake time travel?

Delta Lake provides time travel within a single table. Horizon Epoch adds:

  • Cross-table consistency
  • Branching and merging
  • Multi-backend support
  • Git-like workflow

vs. lakeFS?

Both provide Git-like versioning for data. Key differences:

  • Horizon Epoch: Works with databases (PostgreSQL, MySQL, etc.) and object storage
  • lakeFS: Focused on object storage

vs. Nessie/Project Nessie?

Nessie provides Git-like versioning for Iceberg/Delta tables. Horizon Epoch:

  • Supports more storage backends (including relational databases)
  • Provides record-level tracking
  • Works with transactional databases

Troubleshooting

Where are logs?

# Verbose output
epoch --verbose command

# Debug logging
RUST_LOG=debug epoch command

How do I check system health?

# Run diagnostics
epoch doctor

# Check local services
epoch local status

Where can I get help?

  • Documentation: This site
  • Run epoch tips for getting started help
  • Run epoch --help for command reference

Licensing

Licensing terms to be announced. Please contact Horizon Analytic Studios for licensing inquiries.