Horizon Epoch
Git-like version control for data across heterogeneous storage systems.
Horizon Epoch brings the power of Git-style version control to your data infrastructure. Whether your data lives in relational databases, cloud object storage, or local filesystems, Horizon Epoch provides unified branching, merging, and history tracking.
Supported Storage Backends
- Relational Databases - PostgreSQL, MySQL, SQL Server, SQLite
- Cloud Object Storage - AWS S3, Azure Blob Storage, Google Cloud Storage
- Local Storage - Local filesystem with Delta Lake format
Key Features
- Multi-Storage Support - Work with data across 8 different storage backends
- Zero-Copy Branching - Create instant branches without duplicating data
- True Three-Way Merge - Smart conflict detection and resolution at the field level
- Copy-on-Write Semantics - Efficient storage with minimal overhead
- Git-Like Commands - Familiar CLI interface for data engineers
Who Is This For?
Horizon Epoch is designed for:
- Data Engineers who need to safely test ETL changes before production
- Analytics Teams who want to experiment with transformations in isolation
- DevOps Engineers managing data across multiple environments
- Anyone who has wished for Git-like workflows for their data
Quick Example
# Initialize a repository
epoch init my-data-repo --metadata-url "postgresql://localhost/horizon_epoch"
# Register a table to track
epoch table add users --location "postgresql://localhost/mydb/public.users"
# Create a feature branch
epoch branch create feature/new-schema
# Make changes, then commit
epoch commit -m "Add user preferences column"
# Merge back to main
epoch merge feature/new-schema
How It Works
Horizon Epoch tracks data versions through a metadata layer that sits alongside your existing storage:
- Metadata Database - Stores commit history, branch pointers, and version indices
- Storage Adapters - Connect to your data where it lives (PostgreSQL, MySQL, SQL Server, SQLite, S3, Azure, GCS, local filesystem)
- Copy-on-Write - Only modified records are stored separately on branches
This architecture means:
- No data migration required
- Works with your existing infrastructure
- Minimal storage overhead for branching
Getting Started
Ready to try Horizon Epoch? Start with the Installation Guide and then follow the Quickstart tutorial.
Documentation Structure
This documentation is organized into several sections:
- Getting Started - Installation and your first steps
- Tutorials - Step-by-step guided walkthroughs
- How-To Guides - Task-oriented recipes for specific goals
- Reference - Complete API and CLI documentation
- Concepts - Deep dives into how Horizon Epoch works
- Troubleshooting - Solutions to common problems
Installation
Horizon Epoch is currently in private development.
Installation packages and instructions will be available upon public release. For early access inquiries, please contact Horizon Analytic Studios.
Requirements
When released, Horizon Epoch will require:
- PostgreSQL 14+ (for metadata storage)
- One or more supported storage backends
Supported Storage Backends
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite
- AWS S3 / S3-compatible (MinIO, etc.)
- Azure Blob Storage
- Google Cloud Storage
- Local filesystem
Next Steps
- Core Concepts - Understand how Horizon Epoch works
- FAQ - Common questions
Quickstart
Installation instructions coming soon.
For early access inquiries, please contact Horizon Analytic Studios.
What is Horizon Epoch?
Horizon Epoch is a Git-like version control system for data. It provides:
- Branching - Create isolated environments for testing changes
- Commits - Track changes with full history
- Merging - Combine changes with conflict detection
- Multi-backend support - Works with PostgreSQL, MySQL, SQL Server, SQLite, S3, Azure, GCS, local filesystem
Next Steps
- Core Concepts - Understand how Horizon Epoch works
- FAQ - Common questions
Core Concepts
Repositories
A repository is the top-level container for versioned data. It contains:
- Multiple tables (data sources)
- Multiple branches (parallel versions)
- A history of commits (snapshots in time)
Branches
Branches are independent lines of development. Like Git:
- main is the default branch
- Create branches for features, experiments, or environments
- Branches are zero-copy - creating one is instant
Commits
A commit is a snapshot of one or more tables at a point in time:
- Contains a message describing the changes
- Links to parent commit(s) forming a history graph
- Tracks which records changed
Tables
Tables are data sources that Horizon Epoch tracks:
- Can be database tables (PostgreSQL, MySQL, SQL Server, SQLite) or object storage (S3, Azure, GCS, local)
- Each table has a schema and primary key
- Changes are tracked at the record level
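To make "record level" concrete, a single tracked change can be pictured roughly like the structure below. This is an illustrative sketch only, not Horizon Epoch's internal representation.
# Hypothetical shape of one tracked change (illustrative only)
change = {
    "table": "users",
    "primary_key": 42,
    "op": "update",                 # insert / update / delete
    "fields": {
        "email": {"old": "alice@example.com", "new": "alice@company.com"},
    },
}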
Copy-on-Write
Horizon Epoch uses copy-on-write (CoW) semantics:
- When you create a branch, no data is copied
- Only modified records are stored separately
- Queries automatically overlay changes on base data
This makes operations fast and storage efficient.
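As a rough mental model, the read path behaves like a dictionary overlay: the branch's own changes win, and everything else falls through to the shared base data. The snippet below is plain illustrative Python, not the Horizon Epoch API; base_data, branch_changes, and read_record are invented for this example.
base_data = {
    # records shared by all branches, keyed by primary key
    1: {"name": "Alice", "email": "alice@example.com"},
    2: {"name": "Bob", "email": "bob@example.com"},
}
branch_changes = {
    # only records touched on this branch
    2: {"name": "Bob", "email": "bob@company.com"},       # modified on the branch
    3: {"name": "Carol", "email": "carol@example.com"},   # added on the branch
}

def read_record(record_id):
    """The branch's view of a record: branch overlay first, then shared base."""
    return branch_changes.get(record_id, base_data.get(record_id))

read_record(1)  # unchanged record -> served from the shared base
read_record(2)  # modified record  -> served from the branch overlay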
Three-Way Merge
When merging branches, Horizon Epoch uses three-way merge:
- Find the common ancestor
- Identify changes from ancestor to each branch
- Combine non-conflicting changes
- Report conflicts for overlapping changes
Conflicts can be resolved at the field level.
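The field-level rule can be sketched in a few lines of plain Python (illustrative only, not the actual merge engine): a field changed on only one side takes that side's value; a field changed to different values on both sides becomes a conflict.
def merge_record(base, ours, theirs):
    """Field-level three-way merge of one record (illustrative sketch)."""
    merged, conflicts = {}, {}
    for field in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:                  # both sides agree (or neither changed)
            merged[field] = o
        elif o == b:                # only theirs changed this field
            merged[field] = t
        elif t == b:                # only ours changed this field
            merged[field] = o
        else:                       # changed differently on both sides
            conflicts[field] = {"base": b, "ours": o, "theirs": t}
    return merged, conflicts

base   = {"email": "alice@example.com", "status": "active"}
ours   = {"email": "alice@company.com", "status": "active"}
theirs = {"email": "alice.smith@example.com", "status": "inactive"}

merged, conflicts = merge_record(base, ours, theirs)
# status merges cleanly to "inactive"; email is reported as a conflict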
Storage Adapters
Horizon Epoch works with your existing storage through adapters:
Relational Databases:
- PostgreSQL - Full constraint support, TLS/SSL
- MySQL - Full constraint support, SSH tunnels
- SQL Server - Full constraint support, Windows auth
- SQLite - File-based or in-memory, partial constraint support
Object Storage (Delta Lake format):
- AWS S3 - And S3-compatible (MinIO, etc.)
- Azure Blob Storage - Multiple auth methods
- Google Cloud Storage - Service account or ADC
- Local Filesystem - For development and edge deployments
Each adapter implements the same interface, so version control operations work identically across storage types.
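The adapter interface itself is not spelled out in these docs, but conceptually it is a small, uniform contract that every backend fulfills. The sketch below is a hypothetical Python protocol (names and signatures invented for illustration) showing the kind of operations an adapter would expose.
from typing import Any, Protocol

class StorageAdapter(Protocol):
    """Hypothetical sketch of a common adapter interface (illustrative only)."""

    async def read_records(self, table: str, version: str) -> list[dict[str, Any]]:
        """Read a table's records as seen at a given version."""
        ...

    async def write_records(self, table: str, records: list[dict[str, Any]]) -> None:
        """Persist branch-local record changes."""
        ...

    async def introspect_schema(self, table: str) -> dict[str, Any]:
        """Return column and constraint metadata for a table."""
        ...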
Metadata Layer
All versioning information is stored in a metadata database (PostgreSQL):
- Commit graph (directed acyclic graph of commits)
- Branch pointers (references to commits)
- Table registrations and schema versions
- Change tracking indices
The metadata layer is separate from your data storage, meaning Horizon Epoch doesn’t modify how your data is stored.
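The dataclasses below are a hypothetical sketch, not the real metadata schema, but they capture the essentials: commits link to their parents and so form a DAG, and a branch is just a named pointer to a commit, which is why creating one copies no data.
from dataclasses import dataclass

@dataclass
class Commit:
    id: str
    parent_ids: list[str]        # merge commits have two parents -> a DAG
    message: str
    changed_tables: list[str]

@dataclass
class Branch:
    name: str
    head_commit_id: str          # a movable pointer into the commit graph

main = Branch(name="main", head_commit_id="abc123f")
# Creating a branch copies a pointer, not data:
feature = Branch(name="feature/new-schema", head_commit_id=main.head_commit_id)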
Tutorial: Your First Repository
In this tutorial, you’ll:
- Create a Horizon Epoch repository
- Register a PostgreSQL table
- Make your first commit
- Explore the history
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL running with a sample database
Step 1: Create Sample Data
First, let’s create a sample table in PostgreSQL:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO users (name, email) VALUES
('Alice', 'alice@example.com'),
('Bob', 'bob@example.com'),
('Charlie', 'charlie@example.com');
Step 2: Initialize Repository
epoch init my-first-repo \
--metadata-url "postgresql://localhost/horizon_epoch" \
--description "My first Horizon Epoch repository"
Output:
Initialized repository 'my-first-repo' on branch 'main'
Step 3: Register the Table
epoch table add users \
--location "postgresql://localhost/mydb/public.users"
Output:
Registered table 'users' with 3 records
Schema: id (int), name (text), email (text), created_at (timestamp)
Primary key: id
Step 4: Create Initial Commit
epoch commit -m "Initial import of users table"
Output:
Created commit abc123f
1 table, 3 records
Step 5: View the Log
epoch log
Output:
commit abc123f
Author: Your Name <you@example.com>
Date: Thu Dec 12 10:30:00 2025
Initial import of users table
Tables: users
Stats: 3 records added
Step 6: Check Status
epoch status
Output:
On branch main
nothing to commit, working tree clean
Congratulations!
You’ve created your first Horizon Epoch repository. Next steps:
- Branching Workflow - Learn about branches
- Merging Data - Combine changes
Tutorial: Branching Workflow
Learn how to use branches effectively in Horizon Epoch for parallel development and safe experimentation.
Prerequisites
- Completed Your First Repository tutorial
- A repository with at least one registered table
Understanding Branches
Branches in Horizon Epoch work like Git branches:
- Each branch is an independent line of development
- Branches are zero-copy - creating one doesn’t duplicate data
- Changes on one branch don’t affect others until merged
How Zero-Copy Branching Works
When you create a branch in Horizon Epoch, no data is actually copied. Instead, the new branch simply points to the same data as its parent. Only when you make changes does Horizon Epoch store the modified records separately.
flowchart LR
subgraph "Before Branch Creation"
M1[("main branch")] --> D1[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
end
flowchart LR
subgraph "After Branch Creation (Zero-Copy)"
M2[("main")] --> D2[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
F2[("feature")] --> D2
end
flowchart LR
subgraph "After Modifications on Feature Branch"
M3[("main")] --> D3[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
F3[("feature")] --> D3
F3 --> C3[("Branch Changes<br/>users: 5 modified<br/>users: 2 added")]
end
This architecture means:
- Instant branch creation - No waiting for data to copy
- Efficient storage - Only changed records are stored a second time (in the branch overlay)
- Fast queries - Branch-specific changes are overlaid on base data
Common Branching Patterns
Pattern 1: Feature Branch
Create a branch for a new feature or experiment:
gitGraph
commit id: "Initial"
commit id: "Add users table"
branch feature/user-preferences
checkout feature/user-preferences
commit id: "Add preferences column"
commit id: "Populate default values"
checkout main
merge feature/user-preferences id: "Merge feature"
commit id: "Continue development"
epoch branch create feature/user-preferences
epoch checkout feature/user-preferences
Pattern 2: Environment Branches
Maintain branches for different environments:
gitGraph
commit id: "v1.0"
branch staging
branch development
checkout development
commit id: "New feature A"
commit id: "New feature B"
checkout staging
merge development id: "Promote to staging"
checkout main
commit id: "v1.1 Hotfix"
checkout staging
merge main id: "Sync hotfix"
checkout main
merge staging id: "Release v1.2"
epoch branch create development
epoch branch create staging
epoch branch create production
Pattern 3: Hotfix Branch
Quick fixes that need to go directly to production:
gitGraph
commit id: "v1.0 Release"
commit id: "v1.1 Release"
branch hotfix/fix-email
checkout hotfix/fix-email
commit id: "Fix email format"
checkout main
merge hotfix/fix-email id: "v1.1.1 Hotfix"
epoch branch create hotfix/fix-email-format --from production
Pattern 4: Multiple Feature Branches
Working on multiple features in parallel:
gitGraph
commit id: "Base"
branch feature/auth
branch feature/reporting
checkout feature/auth
commit id: "Add login"
commit id: "Add logout"
checkout feature/reporting
commit id: "Add daily report"
commit id: "Add weekly report"
checkout main
merge feature/auth id: "Merge auth"
merge feature/reporting id: "Merge reporting"
Step-by-Step: Feature Branch Workflow
1. Create a Feature Branch
epoch branch create feature/add-status-field
2. Switch to the Branch
epoch checkout feature/add-status-field
3. Make Changes to Your Data
Modify the underlying data (e.g., add records, update schema):
-- In your PostgreSQL database
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
UPDATE users SET status = 'active';
4. Review Changes
epoch diff
Output:
Modified: users
Schema changes:
+ status (varchar)
Record changes:
~ 3 records modified (new field values)
5. Commit Your Changes
epoch commit -m "Add status field to users table"
6. View Branch History
epoch log --oneline
Output:
def456g Add status field to users table
abc123f Initial import of users table
7. Compare with Main
epoch diff main..feature/add-status-field
8. Merge to Main
epoch checkout main
epoch merge feature/add-status-field
9. Clean Up
epoch branch delete feature/add-status-field
Branch Lifecycle Visualization
The complete lifecycle of a feature branch:
stateDiagram-v2
[*] --> Created: epoch branch create
Created --> Active: epoch checkout
Active --> Modified: Make data changes
Modified --> Committed: epoch commit
Committed --> Modified: More changes
Committed --> Reviewed: epoch diff main..branch
Reviewed --> Merged: epoch merge
Merged --> Deleted: epoch branch delete
Deleted --> [*]
Environment Promotion Flow
A typical flow for promoting changes through environments:
flowchart TD
subgraph Development
D1[Feature Branch] --> D2[Development Branch]
D2 --> D3{Tests Pass?}
D3 -->|No| D1
end
subgraph Staging
D3 -->|Yes| S1[Promote to Staging]
S1 --> S2{QA Approved?}
S2 -->|No| D1
end
subgraph Production
S2 -->|Yes| P1[Promote to Production]
P1 --> P2[Tag Release]
end
# Promote development to staging
epoch checkout staging
epoch merge development
# After QA approval, promote to production
epoch checkout production
epoch merge staging
epoch tag v1.2.0
Branch Comparison Summary
| Pattern | Use Case | Lifetime | Merge Target |
|---|---|---|---|
| Feature | New functionality | Days to weeks | development or main |
| Environment | Deployment stages | Permanent | Next environment |
| Hotfix | Critical fixes | Hours to days | production + main |
| Release | Version preparation | Days | main |
Best Practices
- Keep branches short-lived - Merge frequently to avoid conflicts
- Use descriptive names - feature/user-auth is better than branch1
- One feature per branch - Makes merging and rollback easier
- Test before merging - Validate changes on the branch first
- Delete merged branches - Keep your branch list clean
Visualizing Your Repository
You can visualize your repository’s branch structure:
# List all branches with their relationships
epoch branch list --graph
# Show commit history across branches
epoch log --graph --all
Next Steps
- Merging Data - Learn about merge strategies and conflict resolution
- Environment Promotion - Deploy changes safely across environments
Tutorial: Merging Data
Learn how to merge branches, handle conflicts, and understand merge strategies in Horizon Epoch.
Prerequisites
- Completed Branching Workflow tutorial
- Understanding of basic branch operations
Merge Types
Fast-Forward Merge
When the target branch hasn’t changed since the source branch was created:
Before:
main:    A---B
              \
feature:       C---D

After (fast-forward):

main:    A---B---C---D
epoch merge feature/simple-change
# Fast-forward merge: main -> def456g
Three-Way Merge
When both branches have diverged:
Before:
main:    A---B---C
              \
feature:       D---E

After (three-way merge):

main:    A---B---C---F
              \     /
feature:       D---E
epoch merge feature/complex-change
# Merged feature/complex-change into main (commit: ghi789h)
Handling Conflicts
Conflicts occur when both branches modify the same record’s fields.
Detecting Conflicts
epoch merge feature/conflicting-changes
# CONFLICT: Merge conflict in users
# Automatic merge failed; fix conflicts and then commit.
Viewing Conflicts
epoch status
Output:
On branch main
You have unmerged paths.
(fix conflicts and run "epoch merge --continue")
(use "epoch merge --abort" to abort the merge)
Unmerged paths:
both modified: users (3 conflicts)
Inspecting Conflict Details
epoch conflicts show
Output:
Table: users
Record ID: 42
Field: email
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Field: status
BASE: active
OURS: active
THEIRS: inactive
Resolving Conflicts
Interactive Resolution
epoch conflicts resolve --interactive
Accept One Side
# Accept all changes from target branch (ours)
epoch conflicts resolve --ours
# Accept all changes from source branch (theirs)
epoch conflicts resolve --theirs
Field-Level Resolution
epoch conflicts resolve users --record 42 --field email --value "alice.smith@company.com"
Completing the Merge
epoch merge --continue
Aborting the Merge
If you need to start over:
epoch merge --abort
Merge Strategies
Default Strategy
The default strategy uses three-way merge with field-level conflict detection:
epoch merge feature/my-branch
Ours Strategy
Keep all conflicting values from the target branch:
epoch merge feature/my-branch --strategy ours
Theirs Strategy
Accept all conflicting values from the source branch:
epoch merge feature/my-branch --strategy theirs
Best Practices
- Merge frequently - Small, frequent merges have fewer conflicts
- Review diffs first - Use epoch diff before merging
- Test after merge - Validate data integrity post-merge
- Document conflicts - Add context in merge commit messages
Example: Complete Merge Workflow
# 1. Ensure you're on the target branch
epoch checkout main
# 2. Pull latest changes (if using remote)
epoch pull
# 3. Preview what will be merged
epoch diff main..feature/my-feature
# 4. Perform the merge
epoch merge feature/my-feature
# 5. If conflicts, resolve them
epoch conflicts show
epoch conflicts resolve --interactive
# 6. Complete the merge
epoch merge --continue
# 7. Verify the result
epoch log --oneline -5
epoch status
Next Steps
- Environment Promotion - Promote changes across environments
- Resolve Conflicts - Detailed conflict resolution guide
Tutorial: Environment Promotion
Learn how to safely promote data changes from development through staging to production using Horizon Epoch’s branching model.
Prerequisites
- Completed Merging Data tutorial
- Understanding of environment-based workflows
The Environment Branch Model
A typical setup uses long-lived branches for each environment:
production    A---B-----------F---G
                   \         /
staging             C---D---E
                         \
development               H---I---J
Setting Up Environment Branches
1. Create Environment Branches
# Start from main (which represents production)
epoch branch create staging --from main
epoch branch create development --from staging
2. Protect Production
Configure branch protection to prevent direct commits:
epoch branch protect production --require-review --no-force-push
Development Workflow
1. Work on Development
All feature work starts on development or feature branches:
epoch checkout development
epoch branch create feature/new-report --from development
2. Develop and Test
Make changes and commit:
# Make data changes
epoch commit -m "Add sales report aggregations"
3. Merge to Development
epoch checkout development
epoch merge feature/new-report
Promoting to Staging
1. Review Changes
epoch diff staging..development
2. Create a Release Branch (Optional)
For complex releases:
epoch branch create release/v1.2 --from development
3. Promote to Staging
epoch checkout staging
epoch merge development # or release/v1.2
4. Run Staging Tests
Validate the data in staging environment before proceeding.
Promoting to Production
1. Final Review
epoch diff production..staging
2. Promote to Production
epoch checkout production
epoch merge staging
3. Tag the Release
epoch tag v1.2.0 --message "Release 1.2.0: Sales report aggregations"
Rollback Procedures
Quick Rollback
If issues are found immediately after promotion:
# Revert the last merge
epoch revert HEAD
Rollback to Specific Version
# Find the commit to rollback to
epoch log --oneline -10
# Reset to that commit
epoch reset --hard abc123f
Emergency Production Rollback
# Create a hotfix from the last known good state
epoch checkout production
epoch reset --hard v1.1.0
epoch tag v1.1.1-rollback --message "Emergency rollback from v1.2.0"
Hotfix Workflow
For urgent production fixes:
# 1. Create hotfix from production
epoch branch create hotfix/critical-fix --from production
# 2. Make the fix
epoch checkout hotfix/critical-fix
# ... apply fix ...
epoch commit -m "Fix critical data issue"
# 3. Merge directly to production
epoch checkout production
epoch merge hotfix/critical-fix
# 4. Backport to other environments
epoch checkout staging
epoch merge hotfix/critical-fix
epoch checkout development
epoch merge hotfix/critical-fix
Automation Tips
CI/CD Integration
# Example GitHub Actions workflow
name: Promote to Staging
on:
workflow_dispatch:
inputs:
source_branch:
description: 'Branch to promote'
default: 'development'
jobs:
promote:
runs-on: ubuntu-latest
steps:
- name: Promote to staging
run: |
epoch checkout staging
epoch merge ${{ inputs.source_branch }}
epoch push
Scheduled Promotions
# Promote development to staging every night
0 2 * * * epoch checkout staging && epoch merge development && epoch push
Best Practices
- Never commit directly to production - Always promote through the pipeline
- Tag production releases - Makes rollback and auditing easier
- Keep environments in sync - Regularly merge production back to development
- Automate where possible - Reduces human error
- Document promotions - Keep a changelog of what was promoted and when
Next Steps
- ETL Branch Testing - Test ETL changes safely
- Configuration Reference - Environment configuration options
Tutorial: ETL Branch Testing
Learn how to safely test ETL (Extract, Transform, Load) changes using Horizon Epoch branches before deploying to production.
Prerequisites
- Completed Environment Promotion tutorial
- Understanding of ETL workflows
- Python environment with Horizon Epoch SDK
The Problem
ETL changes are risky:
- Transformations might produce incorrect results
- Schema changes might break downstream systems
- Volume changes might cause performance issues
Traditional approaches:
- Test on copied data (expensive, often stale)
- Test in staging (still risky, shared environment)
- Hope for the best (not recommended)
The Solution: Branch-Based ETL Testing
With Horizon Epoch, you can:
- Create a branch from production data
- Run your ETL changes against the branch
- Validate the results
- Merge only when confident
Step-by-Step Guide
1. Set Up Your Repository
import asyncio
from horizon_epoch import Client, StorageBackend
async def setup():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
await client.init("my-repo")
# Add warehouse storage
await client.add_storage(
name="warehouse",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/data_warehouse"}
)
return client
# Note: In practice, you'd keep the client connection open throughout
2. Create a Test Branch
# Create branch from production
test_branch = await client.branch(
name="etl-test/sales-aggregation-v2",
start_point="main" # production
)
print(f"Created test branch: {test_branch.name}")
3. Run Your ETL Transformation
import pandas as pd
from horizon_epoch import Author
# Checkout the test branch
await client.checkout("etl-test/sales-aggregation-v2")
# Your ETL logic here
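# NOTE: `connection` below is assumed to be an already-open connection or engine
# for the warehouse database (for example via SQLAlchemy); creating it is not shown.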
df = pd.read_sql("""
SELECT
date_trunc('month', sale_date) as month,
product_category,
SUM(amount) as total_sales
FROM sales
GROUP BY 1, 2
""", connection)
# Write transformed data
df.to_sql(
"monthly_sales_summary",
connection,
if_exists="replace",
index=False
)
# Stage and commit the transformation
await client.stage_all()
await client.commit(
message="Apply new sales aggregation logic v2",
author=Author(name="ETL Pipeline", email="etl@example.com")
)
4. Validate the Results
# Compare with expected results
async def validate_transformation():
# Already on the test branch from checkout
# Check row counts
result = connection.execute(
"SELECT COUNT(*) FROM monthly_sales_summary"
).scalar()
assert result > 0, "No rows in summary table"
# Check for nulls in required fields
nulls = connection.execute("""
SELECT COUNT(*) FROM monthly_sales_summary
WHERE month IS NULL OR total_sales IS NULL
""").scalar()
assert nulls == 0, f"Found {nulls} rows with null values"
# Verify totals match
branch_total = connection.execute(
"SELECT SUM(total_sales) FROM monthly_sales_summary"
).scalar()
# Compare with production
prod_total = connection.execute(
"SELECT SUM(amount) FROM sales"
).scalar()
assert abs(branch_total - prod_total) < 0.01, \
f"Total mismatch: {branch_total} vs {prod_total}"
print("All validations passed!")
await validate_transformation()
5. Compare with Production
# Get detailed diff
diff = await client.diff(
source="main",
target="etl-test/sales-aggregation-v2"
)
for table_diff in diff.table_diffs:
print(f"\nTable: {table_diff.table_name}")
print(f" Added: {table_diff.added_count}")
print(f" Modified: {table_diff.modified_count}")
print(f" Deleted: {table_diff.deleted_count}")
6. Review and Merge
If validations pass:
# Merge to production
result = await client.merge(
    source="etl-test/sales-aggregation-v2",
    target="main"
)
if result.status == "success":
    print("ETL changes merged to production!")
    # Clean up test branch
    await client.delete_branch("etl-test/sales-aggregation-v2")
else:
    print(f"Merge failed: {result.conflicts}")
Complete Example Script
#!/usr/bin/env python3
"""
ETL Branch Testing Example
Usage:
python etl_branch_test.py
"""
import asyncio
from horizon_epoch import Client, Author, StorageBackend
from datetime import datetime
async def main():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Add warehouse storage if needed
await client.add_storage(
name="warehouse",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/data_warehouse"}
)
branch_name = f"etl-test/transform-{datetime.now():%Y%m%d-%H%M%S}"
try:
# 1. Create test branch
print(f"Creating branch: {branch_name}")
await client.branch(branch_name, start_point="main")
# 2. Run transformation
print("Running ETL transformation...")
await client.checkout(branch_name)
await run_transformation()
await client.stage_all()
await client.commit(
message="Apply transformation",
author=Author(name="ETL", email="etl@example.com")
)
# 3. Validate
print("Validating results...")
await validate_results()
# 4. Review diff
print("\nChanges summary:")
diff = await client.diff(base="main", target=branch_name)
for td in diff.table_diffs:
print(f" {td.table_name}: {td.total_changes} changes")
# 5. Prompt for merge
if input("\nMerge to production? (y/n): ").lower() == 'y':
result = await client.merge(source=branch_name)
if result.has_conflicts:
print(f"Conflicts: {result.conflict_count}")
else:
print(f"Merged: {result.result_commit_id}")
finally:
# Clean up
branches = await client.branches()
if any(b.name == branch_name for b in branches):
await client.delete_branch(branch_name)
print(f"Cleaned up branch: {branch_name}")
async def run_transformation():
    # Your ETL logic here
    pass

async def validate_results():
    # Your validation logic here
    pass

if __name__ == "__main__":
    asyncio.run(main())
Best Practices
- Name branches descriptively - Include ETL name and timestamp
- Automate validation - Write comprehensive checks
- Keep transformations atomic - One logical change per branch
- Document expected changes - Record what the ETL should do
- Time-box testing - Don’t let test branches linger
Next Steps
- Branch-Aware Queries - Query across branches
- Multi-Backend Setup - ETL across storage types
Tutorial: Schema and Constraint Versioning
This tutorial demonstrates how to version database constraints alongside your data, including schema introspection, constraint diffing, merge conflict resolution, and DDL migration generation.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL running with a sample database
- Basic familiarity with Git-like version control concepts
Part 1: Schema Introspection with Constraints
Step 1: Create a Table with Constraints
First, let’s create a table with various constraints in PostgreSQL:
-- Create enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');
-- Create customers table
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT uq_customers_email UNIQUE (email),
CONSTRAINT chk_customers_email CHECK (email ~ '@')
);
-- Create orders table with foreign key
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
status order_status DEFAULT 'pending',
total NUMERIC(10,2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
REFERENCES customers (id) ON DELETE CASCADE,
CONSTRAINT chk_orders_total CHECK (total >= 0)
);
-- Create index
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status) WHERE status != 'delivered';
Step 2: Register Tables with Horizon Epoch
# Initialize repository
epoch init ecommerce-repo \
--metadata-url "postgresql://localhost/horizon_epoch"
# Register tables (constraints are automatically detected)
epoch table add customers \
--location "postgresql://localhost/mydb/public.customers"
epoch table add orders \
--location "postgresql://localhost/mydb/public.orders"
# Initial commit captures schema with constraints
epoch commit -m "Initial schema with constraints"
Step 3: View Introspected Schema
# Show schema including constraints
epoch schema show customers
# Output:
# Table: customers
#
# Columns:
# id SERIAL PRIMARY KEY
# email VARCHAR(255) NOT NULL
# name VARCHAR(255) NOT NULL
# created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
#
# Constraints:
# uq_customers_email UNIQUE (email)
# chk_customers_email CHECK (email ~ '@')
Using Python:
import asyncio
from horizon_epoch import Client
async def view_constraints():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraints for a table
constraints = await client.get_constraints("customers", branch="main")
print(f"\nUnique Constraints: {len(constraints.unique_constraints)}")
print(f"Check Constraints: {len(constraints.check_constraints)}")
print(f"Foreign Keys: {len(constraints.foreign_keys)}")
asyncio.run(view_constraints())
Part 2: Constraint Diffing Between Branches
Step 1: Create a Feature Branch
epoch branch create feature/add-phone
Step 2: Modify Constraints on the Branch
Add a phone number column with constraints:
-- On feature/add-phone branch
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);
ALTER TABLE customers ADD CONSTRAINT uq_customers_phone UNIQUE (phone);
ALTER TABLE customers ADD CONSTRAINT chk_customers_phone
CHECK (phone ~ '^\+?[0-9]{10,15}$');
epoch commit -m "Add phone column with constraints"
Step 3: View the Constraint Diff
# Compare branches
epoch diff main feature/add-phone --schema
# Output:
# Schema changes for 'customers':
#
# Added columns:
# + phone VARCHAR(20)
#
# Added constraints:
# + uq_customers_phone UNIQUE (phone)
# + chk_customers_phone CHECK (phone ~ '^\+?[0-9]{10,15}$')
Using Python:
# Get constraint diff
diff = client.diff_schema("main", "feature/add-phone", table="customers")
print("Added constraints:")
for c in diff.constraints.added:
print(f" + {c.constraint_type}: {c.name}")
print("\nRemoved constraints:")
for c in diff.constraints.removed:
print(f" - {c.constraint_type}: {c.name}")
print("\nModified constraints:")
for m in diff.constraints.modified:
print(f" ~ {m.name}")
for change in m.changes:
print(f" {change}")
Part 3: Constraint Merge Conflict Resolution
Constraint conflicts occur when the same constraint is modified differently on two branches.
Step 1: Create Conflicting Changes
# On main branch
epoch checkout main
-- Modify the email check constraint on main
ALTER TABLE customers DROP CONSTRAINT chk_customers_email;
ALTER TABLE customers ADD CONSTRAINT chk_customers_email
CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
epoch commit -m "Stricter email validation"
# On feature branch (already has different changes)
epoch checkout feature/add-phone
-- Also modify the email check constraint differently
ALTER TABLE customers DROP CONSTRAINT chk_customers_email;
ALTER TABLE customers ADD CONSTRAINT chk_customers_email
CHECK (email LIKE '%@%.%');
epoch commit -m "Simpler email validation"
Step 2: Attempt Merge
epoch checkout main
epoch merge feature/add-phone
# Output:
# CONFLICT (constraint): chk_customers_email modified in both branches
#
# On main:
# CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
#
# On feature/add-phone:
# CHECK (email LIKE '%@%.%')
#
# Automatic merge failed. Fix conflicts and commit.
Step 3: View Conflicts
epoch status
# Output:
# On branch main
# You have unmerged paths.
#
# Constraint conflicts:
# (use "epoch resolve <constraint>" to mark resolution)
#
# both modified: customers.chk_customers_email
Using Python:
# Get merge conflicts
merge_result = client.merge("feature/add-phone", "main", dry_run=True)
for conflict in merge_result.constraint_conflicts:
print(f"Conflict: {conflict.constraint_name}")
print(f" Type: {conflict.constraint_type}")
print(f" Base: {conflict.base}")
print(f" Ours: {conflict.ours}")
print(f" Theirs: {conflict.theirs}")
print(f" Differences: {conflict.differences}")
Step 4: Resolve Conflicts
Option A: Use ours (main branch version)
epoch resolve customers.chk_customers_email --ours
epoch commit -m "Merge feature/add-phone with stricter email validation"
Option B: Use theirs (feature branch version)
epoch resolve customers.chk_customers_email --theirs
epoch commit -m "Merge feature/add-phone with simpler email validation"
Option C: Use custom resolution
# Create custom constraint
epoch resolve customers.chk_customers_email --custom \
"CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$')"
epoch commit -m "Merge feature/add-phone with balanced email validation"
Using Python:
from horizon_epoch import ConstraintResolution
# Resolve using ours
client.resolve_constraint(
"customers",
"chk_customers_email",
resolution=ConstraintResolution.UseOurs
)
# Or use theirs
client.resolve_constraint(
"customers",
"chk_customers_email",
resolution=ConstraintResolution.UseTheirs
)
# Or provide custom
from horizon_epoch import CheckConstraint
custom_constraint = CheckConstraint(
name="chk_customers_email",
expression="email ~ '^[^@]+@[^@]+\\.[^@]+$'"
)
client.resolve_constraint(
"customers",
"chk_customers_email",
resolution=ConstraintResolution.UseCustom(custom_constraint)
)
# Complete the merge
client.commit(message="Merge with resolved constraint conflict")
Part 4: DDL Migration Generation
Horizon Epoch can generate migration scripts for schema changes.
Step 1: View Pending Changes
# Show what migrations would be generated
epoch schema diff main feature/new-indexes --ddl
# Output:
# -- Migration: main -> feature/new-indexes
#
# -- UP
# CREATE INDEX CONCURRENTLY idx_customers_name ON customers (name);
# ALTER TABLE orders ADD CONSTRAINT chk_orders_status
# CHECK (status IN ('pending', 'processing', 'shipped', 'delivered'));
#
# -- DOWN
# DROP INDEX CONCURRENTLY IF EXISTS idx_customers_name;
# ALTER TABLE orders DROP CONSTRAINT IF EXISTS chk_orders_status;
Step 2: Generate Migration Script
# Generate migration file
epoch schema migrate main feature/new-indexes \
--output migrations/V2__add_indexes.sql
# Output:
# Generated migration: migrations/V2__add_indexes.sql
# UP: 2 statements
# DOWN: 2 statements
# Warnings: 0
Using Python:
from horizon_epoch import PostgresDdlGenerator
# Get the schema diff
diff = client.diff_schema("main", "feature/new-indexes")
# Generate migration
generator = PostgresDdlGenerator()
migration = generator.generate_migration("customers", diff.constraints)
print("UP statements:")
for stmt in migration.up:
print(f" {stmt}")
print("\nDOWN statements:")
for stmt in migration.down:
print(f" {stmt}")
if migration.warnings:
print("\nWarnings:")
for warn in migration.warnings:
print(f" ! {warn}")
# Write to file
with open("migrations/V2__add_indexes.sql", "w") as f:
f.write("-- UP\n")
for stmt in migration.up:
f.write(f"{stmt};\n")
f.write("\n-- DOWN\n")
for stmt in migration.down:
f.write(f"{stmt};\n")
Step 3: Understanding Migration Ordering
The DDL generator follows a specific order to handle dependencies:
UP (Apply) Order:
- Create new enum types (before columns using them)
- Add values to modified enums
- Drop constraints being modified/removed (FK first)
- Drop indexes being modified/removed
- Handle primary key changes
- Add new/modified constraints (check, unique, then FK last)
- Create new indexes
DOWN (Reverse) Order:
- Drop new indexes
- Drop new constraints
- Re-add dropped indexes
- Re-add dropped constraints
- Re-add dropped enum types (with warnings)
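As a rough illustration of why this ordering matters, an UP script can be assembled phase by phase in the order listed above. The helper below is a hypothetical sketch, not the generator's internals; the statement lists passed in are example inputs.
# Hypothetical sketch: assemble UP statements in the dependency-safe order above.
UP_PHASES = [
    "create_enums",        # new enum types before columns/constraints that use them
    "alter_enums",         # additive enum value changes
    "drop_constraints",    # modified/removed constraints (foreign keys first)
    "drop_indexes",
    "primary_key_changes",
    "add_constraints",     # check, unique, then foreign keys last
    "create_indexes",
]

def build_up_script(changes: dict[str, list[str]]) -> list[str]:
    statements = []
    for phase in UP_PHASES:
        statements.extend(changes.get(phase, []))
    return statements

up = build_up_script({
    "add_constraints": ["ALTER TABLE orders ADD CONSTRAINT chk_orders_total CHECK (total >= 0)"],
    "create_indexes": ["CREATE INDEX idx_customers_name ON customers (name)"],
})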
Step 4: Check for Destructive Changes
# Check if migration has destructive changes
migration = generator.generate_migration("orders", diff.constraints)
if migration.has_destructive_changes():
print("WARNING: Migration contains destructive changes!")
for warn in migration.warnings:
print(f" {warn}")
Destructive changes include:
- Dropping enum values (data may reference them)
- Dropping foreign keys to tables being dropped
- Removing NOT NULL on columns with NULL values
Part 5: Handling Enum Type Changes
Enum types require special handling because PostgreSQL only allows additive changes.
Step 1: Add Enum Values
epoch checkout main
-- Add new enum value (safe operation)
ALTER TYPE order_status ADD VALUE 'cancelled' AFTER 'delivered';
epoch commit -m "Add cancelled status"
Step 2: Check Enum Safety
# Get enum diff
diff = client.diff_schema("main", "feature-branch")
for enum_diff in diff.enum_types:
if enum_diff.is_additive():
print(f"Enum {enum_diff.name}: Safe to merge (additive only)")
else:
print(f"Enum {enum_diff.name}: UNSAFE - values removed or reordered")
print(f" Removed: {enum_diff.removed}")
print(f" Reordered: {enum_diff.reordered}")
Step 3: Handle Non-Additive Enum Changes
If you need to remove or reorder enum values, you must:
- Create a new enum type
- Migrate data to the new type
- Drop the old type
- Rename the new type
-- Create new enum
CREATE TYPE order_status_new AS ENUM ('pending', 'processing', 'shipped');
-- Migrate data (handle 'delivered' and 'cancelled' values)
UPDATE orders SET status = 'shipped'
WHERE status IN ('delivered', 'cancelled');
-- Change column type
ALTER TABLE orders ALTER COLUMN status TYPE order_status_new
USING status::text::order_status_new;
-- Drop old, rename new
DROP TYPE order_status;
ALTER TYPE order_status_new RENAME TO order_status;
Best Practices
1. Always Review Constraint Diffs Before Merge
# Check what constraints changed
epoch diff main feature-branch --schema --constraints-only
2. Use Dry-Run for Merges
# Preview merge without applying
result = client.merge("feature", "main", dry_run=True)
if result.has_conflicts:
print("Resolve these conflicts before merging:")
for c in result.conflicts:
print(f" {c}")
3. Generate and Review Migrations
# Always review generated DDL
epoch schema migrate main feature --output migration.sql --dry-run
cat migration.sql
4. Test Migrations on Branch Data
# Create test branch
epoch branch create test/migration
# Apply migration
psql -f migration.sql
# Verify data integrity
epoch validate test/migration
5. Use Semantic Constraint Names
Follow a consistent naming convention:
- Foreign keys: fk_{table}_{column} or fk_{table}_{referenced_table}
- Unique: uq_{table}_{columns}
- Check: chk_{table}_{description}
- Index: idx_{table}_{columns}
See Also
- Constraint Reference - Complete constraint type documentation
- Merge Algorithm - How three-way merge works
- Resolve Conflicts - Conflict resolution strategies
How to Connect PostgreSQL
Basic Connection
import asyncio
from horizon_epoch import Client
async def main():
async with Client.connect("postgresql://user:pass@localhost/horizon_epoch") as client:
# Client is now connected to metadata database
await client.init("my-repo")
asyncio.run(main())
CLI Connection
# Set connection in environment
export EPOCH_METADATA_URL="postgresql://user:pass@localhost/horizon_epoch"
# Or use connection string directly
epoch init my-repo --metadata-url "postgresql://localhost/horizon_epoch"
# Register PostgreSQL table
epoch table add users \
--location "postgresql://localhost/mydb/public.users"
Register a PostgreSQL Table
from horizon_epoch.client import _native
# Track a PostgreSQL table
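# (positional arguments appear to be: registered storage name, schema, table name)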
loc = _native.StorageLocation.postgresql("main", "public", "users")
await client.track_table("users", loc)
# From a specific schema
loc = _native.StorageLocation.postgresql("main", "sales", "orders")
await client.track_table("orders", loc)
Add PostgreSQL as Storage Backend
# CLI - add storage backend
epoch storage add mydb \
--type postgresql \
--url "postgresql://user:pass@localhost/production"
from horizon_epoch import StorageBackend
await client.add_storage(
name="mydb",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://user:pass@localhost/production"}
)
With SSL
# CLI with SSL
epoch storage add mydb \
--type postgresql \
--url "postgresql://user:pass@host/db?sslmode=require"
For stricter SSL verification:
epoch storage add mydb \
--type postgresql \
--url "postgresql://user:pass@host/db?sslmode=verify-full&sslrootcert=/path/to/ca.crt"
Connection String Format
postgresql://[user[:password]@][host][:port][/database][?param=value&...]
Common parameters:
- sslmode - disable, require, verify-ca, verify-full
- sslrootcert - Path to root certificate
- sslcert - Path to client certificate
- sslkey - Path to client key
- connect_timeout - Connection timeout in seconds
- application_name - Application name for logging
Advanced Authentication
For enterprise environments requiring additional security:
- SSH Tunnels: See SSH Tunnel Guide
- Client Certificates (mTLS): See mTLS Authentication
- AWS RDS IAM Auth: See AWS Secrets Guide
- HashiCorp Vault: See Vault Integration
Troubleshooting
Connection Refused
Error: connection refused (os error 111)
- Verify PostgreSQL is running: pg_isready -h localhost
- Check host and port in connection string
- Verify firewall rules
Authentication Failed
Error: password authentication failed
- Verify username and password
- Check pg_hba.conf for allowed authentication methods
- Ensure user has access to the database
SSL Errors
Error: SSL connection required
Add ?sslmode=require to the connection string, or for self-signed certificates:
?sslmode=verify-ca&sslrootcert=/path/to/ca.crt
Connect MySQL
Connect Horizon Epoch to MySQL databases.
CLI Configuration
# Add MySQL storage backend
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@localhost:3306/database"
Connection String Format
mysql://[user[:password]@][host][:port]/database[?param=value&...]
Common parameters:
- ssl-mode - disabled, preferred, required, verify_ca, verify_identity
- ssl-ca - Path to CA certificate
- connect_timeout - Connection timeout in seconds
SSL/TLS Modes
| Mode | Description |
|---|---|
| disabled | No SSL |
| preferred | Use SSL if available (default) |
| required | Require SSL, don’t verify certificate |
| verify_ca | Require SSL, verify CA |
| verify_identity | Require SSL, verify CA and hostname |
With SSL
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@host/db?ssl-mode=required"
With Certificate Verification
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@host/db?ssl-mode=verify_ca&ssl-ca=/path/to/ca.pem"
Register Tables
epoch table add customers \
--location "mysql://mydb/customers"
SSH Tunnel Support
MySQL connections support SSH tunneling for secure access:
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@localhost:3306/db" \
--ssh-host "bastion.example.com" \
--ssh-user "ec2-user" \
--ssh-key "~/.ssh/id_rsa"
See SSH Tunnels for more details.
AWS RDS MySQL
# Standard connection
epoch storage add rds \
--type mysql \
--url "mysql://admin:pass@mydb.cluster.us-east-1.rds.amazonaws.com:3306/mydb?ssl-mode=required"
For IAM authentication, see AWS Secrets.
Troubleshooting
Connection Refused
- Verify MySQL is running and accepting connections
- Check host, port, and firewall rules
- Verify user has remote access permissions
SSL Errors
- Ensure SSL mode matches server configuration
- For self-signed certs, use verify_ca with the ssl-ca parameter
Authentication Failed
- Verify username and password
- Check the user has access from your host ('user'@'%' vs 'user'@'localhost')
Connect Microsoft SQL Server
Connect Horizon Epoch to Microsoft SQL Server databases.
CLI Configuration
# Add SQL Server storage backend
epoch storage add mydb \
--type mssql \
--url "mssql://user:pass@localhost:1433/database"
Connection String Format
mssql://[user[:password]@][host][:port]/database[?param=value&...]
Common parameters:
- encrypt - off, on, required
- trust_server_certificate - true/false
- application_name - Application name for logging
Encryption Modes
| Mode | Description |
|---|---|
| off | No encryption |
| on | Encrypt if server supports it |
| required | Require encryption (recommended) |
With Encryption
epoch storage add mydb \
--type mssql \
--url "mssql://user:pass@host/db?encrypt=required"
Trust Self-Signed Certificate
epoch storage add mydb \
--type mssql \
--url "mssql://user:pass@host/db?encrypt=required&trust_server_certificate=true"
Authentication Methods
SQL Server Authentication
epoch storage add mydb \
--type mssql \
--url "mssql://sa:password@localhost/database"
Windows Authentication
epoch storage add mydb \
--type mssql \
--url "mssql://localhost/database?integrated_security=true"
Register Tables
epoch table add customers \
--location "mssql://mydb/dbo.customers"
Azure SQL Database
epoch storage add azure \
--type mssql \
--url "mssql://user@server:pass@server.database.windows.net:1433/database?encrypt=required"
Troubleshooting
Connection Refused
- Verify SQL Server is running and TCP/IP is enabled
- Check SQL Server Configuration Manager for network settings
- Verify firewall allows port 1433
Login Failed
- Verify SQL Server authentication is enabled (mixed mode)
- Check username and password
- Verify user has access to the database
Encryption Errors
- Ensure the encrypt setting matches server configuration
- For self-signed certs, use trust_server_certificate=true
Connect SQLite
Connect Horizon Epoch to SQLite databases.
CLI Configuration
# File-based database
epoch storage add local \
--type sqlite \
--path "/path/to/database.db"
# In-memory database (for testing)
epoch storage add test \
--type sqlite \
--path ":memory:"
Configuration Options
| Option | Description | Default |
|---|---|---|
| journal_mode | WAL, DELETE, TRUNCATE, etc. | WAL |
| foreign_keys | Enable foreign key enforcement | true |
| busy_timeout | Lock wait timeout (ms) | 5000 |
With WAL Mode (Recommended)
epoch storage add local \
--type sqlite \
--path "/data/mydb.db" \
--journal-mode wal
Enable Foreign Keys
epoch storage add local \
--type sqlite \
--path "/data/mydb.db" \
--foreign-keys true
Register Tables
epoch table add users \
--location "sqlite://local/users"
Use Cases
Development/Testing
SQLite is ideal for local development:
# Create a local test database
epoch storage add dev \
--type sqlite \
--path "./test.db"
epoch init my-repo --storage dev
Embedded Applications
For applications that need embedded data versioning without external databases.
Edge/Offline
SQLite works well for edge deployments or offline-first applications.
Constraint Support
SQLite has partial constraint support:
| Constraint | Support |
|---|---|
| Primary Key | Full |
| Unique | Full |
| Foreign Key | Requires foreign_keys=true |
| Check | Full |
| Index | Full |
Foreign keys must be explicitly enabled:
epoch storage add local \
--type sqlite \
--path "/data/mydb.db" \
--foreign-keys true
Troubleshooting
Database Locked
- Increase busy_timeout for high-concurrency scenarios
- Use WAL mode for better concurrent access
- Ensure only one process writes at a time
Foreign Key Violations Not Detected
- Ensure foreign_keys=true is set
- Foreign key enforcement is off by default in SQLite
Permission Denied
- Check file and directory permissions
- Ensure the directory exists
How to Connect S3
Connect Horizon Epoch to S3-compatible storage for Delta Lake tables.
CLI Configuration
# Environment variables
export EPOCH_S3_ENDPOINT="http://localhost:9000"
export EPOCH_S3_BUCKET="horizon-data"
export EPOCH_S3_ACCESS_KEY="minioadmin"
export EPOCH_S3_SECRET_KEY="minioadmin"
# Add S3 storage backend
epoch storage add datalake \
--type s3 \
--bucket my-data-bucket \
--region us-east-1
# Register a Delta table
epoch table add sales \
--location "s3://my-bucket/delta/sales"
Python SDK
import asyncio
from horizon_epoch import Client, StorageBackend
async def main():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
await client.init("my-repo")
# Add S3 storage backend
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={
"bucket": "my-data-bucket",
"region": "us-east-1"
}
)
asyncio.run(main())
Using Environment Variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
The SDK will use AWS environment variables automatically.
MinIO / S3-Compatible Endpoints
# CLI
epoch storage add minio \
--type s3 \
--bucket horizon-data \
--endpoint "http://localhost:9000" \
--force-path-style
await client.add_storage(
name="minio",
backend=StorageBackend.S3,
config={
"bucket": "horizon-data",
"endpoint": "http://localhost:9000",
"access_key": "minioadmin",
"secret_key": "minioadmin",
"force_path_style": True
}
)
IAM Roles (EC2/ECS/EKS)
When running on AWS infrastructure with IAM roles:
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={
"bucket": "my-data-bucket",
"use_instance_credentials": True
}
)
Assume Role
For cross-account access:
await client.add_storage(
name="cross-account",
backend=StorageBackend.S3,
config={
"bucket": "other-account-bucket",
"assume_role_arn": "arn:aws:iam::123456789012:role/HorizonEpochAccess",
"external_id": "optional-external-id"
}
)
Registering Delta Tables
# CLI
epoch table add sales \
--location "s3://my-bucket/delta/sales"
from horizon_epoch.client import _native
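# (positional arguments appear to be: bucket name, table path within the bucket)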
loc = _native.StorageLocation.s3("my-bucket", "delta/sales")
await client.track_table("sales", loc)
Table Paths
Tables are stored as Delta Lake tables:
s3://my-bucket/
delta/
sales/
_delta_log/ # Delta transaction log
part-00000.parquet
part-00001.parquet
users/
_delta_log/
part-00000.parquet
Configuration File
# ~/.epoch/config.toml
[storage.s3.default]
bucket = "horizon-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"
Advanced Authentication
For enterprise environments:
- AWS Secrets Manager: See AWS Secrets Guide
- HashiCorp Vault: See Vault Integration
- IAM Identity Center (SSO): See AWS Secrets Guide
Required Bucket Permissions
Minimum IAM permissions needed:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
}
]
}
Troubleshooting
Access Denied
- Verify IAM permissions include all required actions
- Check bucket policy allows access from your account/role
- Ensure bucket region matches configuration
Endpoint Not Found
- For MinIO, ensure force_path_style=True is set
- Verify the endpoint URL is correct
- Check network connectivity to the endpoint
Slow Operations
- Consider enabling transfer acceleration for large files
- Use regional endpoints when possible
- Check for network latency to the S3 region
Connect Azure Blob Storage
Connect Horizon Epoch to Azure Blob Storage for Delta Lake tables.
CLI Configuration
# Add Azure storage backend
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data"
Authentication Methods
Account Key
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--account-key "your-account-key"
Or via environment variable:
export AZURE_STORAGE_ACCOUNT_KEY="your-account-key"
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data"
SAS Token
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--sas-token "?sv=2021-06-08&ss=b&srt=sco..."
Service Principal
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
export AZURE_TENANT_ID="your-tenant-id"
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--auth-method service_principal
Managed Identity
For Azure VMs, AKS, or App Service with managed identity:
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--auth-method managed_identity
Register Delta Tables
epoch table add events \
--location "azure://datalake/delta/events"
Table Storage Structure
Tables are stored in Delta Lake format:
container/
delta/
events/
_delta_log/
part-00000.parquet
part-00001.parquet
Configuration File
# ~/.epoch/config.toml
[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key"
Required Permissions
Minimum permissions for the storage account:
- Storage Blob Data Reader (read operations)
- Storage Blob Data Contributor (read/write operations)
Troubleshooting
AuthorizationFailure
- Verify account name and credentials
- Check container exists
- Ensure identity has proper RBAC roles
Container Not Found
- Verify container name is correct
- Check storage account name
Slow Operations
- Use regional endpoints when possible
- Consider Azure Private Link for private network access
Connect Google Cloud Storage
Connect Horizon Epoch to Google Cloud Storage for Delta Lake tables.
CLI Configuration
# Add GCS storage backend
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id"
Authentication Methods
Service Account Key
# Set credentials file path
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id"
Or specify directly:
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id" \
--credentials-file "/path/to/service-account.json"
Application Default Credentials (ADC)
For GCE, GKE, Cloud Run, or local development with gcloud auth:
# Local development: authenticate with gcloud
gcloud auth application-default login
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id" \
--auth-method adc
Workload Identity (GKE)
For GKE with Workload Identity configured:
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id" \
--auth-method adc
Register Delta Tables
epoch table add events \
--location "gcs://datalake/delta/events"
Table Storage Structure
Tables are stored in Delta Lake format:
bucket/
delta/
events/
_delta_log/
part-00000.parquet
part-00001.parquet
Configuration File
# ~/.epoch/config.toml
[storage.gcs.datalake]
bucket = "my-gcs-bucket"
project_id = "my-project-id"
prefix = "horizon-epoch/"
auth_method = "service_account"
credentials_file = "/path/to/service-account.json"
Required Permissions
Minimum IAM permissions for the service account:
roles/storage.objectViewer # Read operations
roles/storage.objectCreator # Write operations
roles/storage.objectAdmin # Full access (recommended)
Or custom role with:
- storage.objects.get
- storage.objects.list
- storage.objects.create
- storage.objects.delete
Troubleshooting
Permission Denied
- Verify service account has required IAM roles
- Check bucket-level permissions
- Ensure project ID is correct
Bucket Not Found
- Verify bucket name is correct
- Check bucket is in the correct project
Credentials Not Found
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable
- Or use the --credentials-file parameter
- For ADC, run gcloud auth application-default login
Connect Local Filesystem
Connect Horizon Epoch to local filesystem storage for Delta Lake tables.
CLI Configuration
# Add local filesystem storage backend
epoch storage add local \
--type local \
--path "/data/horizon-epoch"
Use Cases
Development and Testing
Local storage is ideal for development:
epoch storage add dev \
--type local \
--path "./data"
epoch init my-repo --storage dev
Single-Machine Deployments
For applications that don’t need distributed storage:
epoch storage add data \
--type local \
--path "/var/lib/horizon-epoch"
Air-Gapped Environments
For environments without cloud access:
epoch storage add offline \
--type local \
--path "/mnt/data/epoch"
Register Delta Tables
epoch table add events \
--location "local://local/events"
Table Storage Structure
Tables are stored in Delta Lake format:
/data/horizon-epoch/
events/
_delta_log/
00000000000000000000.json
part-00000-*.parquet
part-00001-*.parquet
users/
_delta_log/
part-00000-*.parquet
Configuration File
# ~/.epoch/config.toml
[storage.local.dev]
path = "/data/horizon-epoch"
Features
- Delta Lake format - Same format as S3/Azure/GCS
- SQL queries - Query via DataFusion
- Copy-on-write - Efficient branching
- Time travel - Access historical versions
Permissions
Ensure the Epoch process has read/write access:
# Create directory with proper permissions
mkdir -p /data/horizon-epoch
chown -R epoch:epoch /data/horizon-epoch
chmod 750 /data/horizon-epoch
Troubleshooting
Permission Denied
- Check directory exists and is writable
- Verify process user has access
Disk Full
- Monitor disk space
- Use epoch gc to clean up old versions
- Configure retention policies
Path Not Found
- Ensure parent directory exists
- Use absolute paths to avoid ambiguity
How to Resolve Conflicts
When merging branches, conflicts occur when both branches modify the same record’s fields. This guide explains how to identify, understand, and resolve conflicts.
Understanding Conflicts
A conflict happens when:
- Both branches modify the same record (identified by primary key)
- Both branches change the same field to different values
Example:
- Branch A changes user 42’s email to alice@company.com
- Branch B changes user 42’s email to alice.smith@example.com
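To make the rule concrete, here is a minimal sketch of the three-way field check. It illustrates the definition above with the sample values from the example; it is not Horizon Epoch’s merge code.
def field_conflicts(base, ours, theirs):
    """True when both sides changed the same field to different values."""
    ours_changed = ours != base
    theirs_changed = theirs != base
    return ours_changed and theirs_changed and ours != theirs

# Only one side changed the email: no conflict, the single change wins.
print(field_conflicts("alice@example.com", "alice@company.com", "alice@example.com"))        # False
# Both sides changed it, to different values: conflict.
print(field_conflicts("alice@example.com", "alice@company.com", "alice.smith@example.com"))  # True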
Detecting Conflicts
When you run a merge:
epoch merge feature/branch
If conflicts exist:
CONFLICT: Merge conflict in users
Automatic merge failed; fix conflicts and then commit.
Viewing Conflicts
List Conflicting Tables
epoch status
Output:
On branch main
You have unmerged paths.
Unmerged paths:
both modified: users (3 conflicts)
both modified: orders (1 conflict)
View Conflict Details
epoch conflicts show
Output:
Table: users
------------------------------------------------------------
Record ID: 42
Field: email
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Field: phone
BASE: 555-0100
OURS: 555-0100
THEIRS: 555-0199
Record ID: 87
Field: status
BASE: active
OURS: inactive
THEIRS: suspended
------------------------------------------------------------
Export Conflicts to File
epoch conflicts show --format json > conflicts.json
Resolution Methods
1. Interactive Resolution (Recommended)
epoch conflicts resolve --interactive
This walks you through each conflict:
Conflict 1 of 4: users.email (record 42)
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Choose resolution:
[o] Keep ours (alice@company.com)
[t] Keep theirs (alice.smith@example.com)
[b] Keep base (alice@example.com)
[c] Enter custom value
[s] Skip (resolve later)
>
2. Accept All from One Side
Accept all changes from your branch (target):
epoch conflicts resolve --ours
Accept all changes from the source branch:
epoch conflicts resolve --theirs
3. Per-Table Resolution
# Accept theirs for users table
epoch conflicts resolve users --theirs
# Accept ours for orders table
epoch conflicts resolve orders --ours
4. Per-Record Resolution
# Keep our version of record 42
epoch conflicts resolve users --record 42 --ours
# Keep their version of record 87
epoch conflicts resolve users --record 87 --theirs
5. Per-Field Resolution
# Set specific value for a field
epoch conflicts resolve users \
--record 42 \
--field email \
--value "alice.final@company.com"
6. Python SDK Resolution
# Get conflicts
conflicts = client.get_conflicts()
for conflict in conflicts:
print(f"Table: {conflict.table_name}")
print(f"Record: {conflict.record_id}")
for field in conflict.fields:
print(f" {field.name}:")
print(f" base={field.base_value}")
print(f" ours={field.ours_value}")
print(f" theirs={field.theirs_value}")
# Resolve programmatically
client.resolve_conflict(
table=conflict.table_name,
record_id=conflict.record_id,
field=field.name,
value=determine_winner(field) # Your logic
)
# Continue merge
client.merge_continue()
Completing the Merge
After resolving all conflicts:
epoch merge --continue
Or if you want to abandon the merge:
epoch merge --abort
Conflict Prevention Strategies
- Merge frequently - Smaller, more frequent merges have fewer conflicts
- Communicate - Coordinate with team on who owns which data
- Use feature flags - Test changes without modifying shared records
- Partition work - Different team members work on different record sets
Best Practices
- Understand the context - Read the commit messages from both branches
- Check business rules - Some conflicts have a “correct” answer based on business logic
- Document decisions - Add context in your merge commit message
- Verify after merge - Run validation queries after resolving conflicts
Example Workflow
# 1. Start merge
epoch merge feature/user-updates
# 2. See what conflicted
epoch conflicts show
# 3. Resolve interactively
epoch conflicts resolve --interactive
# 4. Verify resolutions
epoch status
epoch diff HEAD
# 5. Complete merge with descriptive message
epoch merge --continue -m "Merge user-updates: resolved email conflicts using company domain policy"
How to Migrate from Snapshots
If you’re currently using database snapshots or manual backups for versioning, this guide helps you migrate to Horizon Epoch.
Current Snapshot-Based Workflow
Many teams use approaches like:
- Daily database dumps
- PostgreSQL pg_dump snapshots
- AWS RDS snapshots
- Manual CREATE TABLE ... AS SELECT copies
These approaches have limitations:
- Full data copies are expensive
- No granular change tracking
- Difficult to compare versions
- Merging changes is manual
Migration Overview
- Initialize a Horizon Epoch repository
- Import your existing data as the initial commit
- Set up tables for tracking
- Train your team on the new workflow
Step-by-Step Migration
1. Prepare Your Environment
# Start services
docker compose -f docker/docker-compose.yml up -d
# Build Horizon Epoch from source (see Installation guide)
cargo build --release
2. Initialize Repository
epoch init production-data \
--metadata-url "postgresql://localhost/horizon_epoch" \
--description "Migrated from daily snapshots"
3. Register Existing Tables
# Register each table you want to version
epoch table add customers \
--location "postgresql://localhost/mydb/public.customers"
epoch table add orders \
--location "postgresql://localhost/mydb/public.orders"
epoch table add products \
--location "postgresql://localhost/mydb/public.products"
4. Create Initial Commit
epoch commit -m "Initial import from production snapshot 2024-01-15"
5. Tag Important Snapshots
If you have historical snapshots you want to reference:
# Tag the current state
epoch tag snapshot-2024-01-15 \
--message "Migration baseline from daily snapshot"
6. Import Historical Snapshots (Optional)
If you have historical data you want to preserve:
import asyncio
from horizon_epoch import Client, Author
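# NOTE: historical_dates, restore_snapshot, and temp_db below are placeholders for
# your own backup/restore tooling; substitute your snapshot helpers before running.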
async def import_snapshots():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# For each historical snapshot
for snapshot_date in historical_dates:
# Restore snapshot to a temporary database
restore_snapshot(snapshot_date, temp_db)
# Create a branch for this point in time
branch_name = f"history/{snapshot_date}"
await client.branch(branch_name)
# Register tables from restored snapshot
# (pointing to temp database)
# Commit
await client.commit(
message=f"Historical snapshot: {snapshot_date}",
author=Author(name="Migration", email="ops@example.com")
)
# Tag for easy reference
await client.tag_create(
name=f"snapshot-{snapshot_date}",
message="Imported from backup"
)
asyncio.run(import_snapshots())
Mapping Snapshot Workflows
Daily Backup Replacement
Before:
# Nightly cron job
pg_dump mydb > /backups/mydb-$(date +%Y%m%d).sql
After:
# Nightly cron job
epoch commit -m "Daily snapshot $(date +%Y-%m-%d)"
epoch tag daily-$(date +%Y%m%d)
Pre-Change Backup
Before:
# Before making changes
pg_dump mydb > /backups/pre-migration.sql
# Make changes
# If something goes wrong, restore from pre-migration.sql
After:
# Before making changes
epoch tag pre-migration-$(date +%Y%m%d)
# Make changes
epoch commit -m "Applied migration X"
# If something goes wrong
epoch reset --hard pre-migration-$(date +%Y%m%d)
Environment Copies
Before:
# Create staging from production
pg_dump prod_db | psql staging_db
After:
# Create staging branch from production
epoch branch create staging --from main
# Staging now has zero-copy access to production data
Handling Large Datasets
For very large databases:
1. Incremental Registration
Register tables in batches:
# Critical tables first
epoch table add customers orders products
epoch commit -m "Core business tables"
# Then secondary tables
epoch table add logs analytics events
epoch commit -m "Add operational tables"
2. Exclude Large/Non-Critical Tables
Some tables might not need versioning:
# Configure exclusions via CLI
# epoch config set exclude_patterns "audit_logs_*,*_archive,temp_*"
# Or skip these tables when registering
# Only register the tables you want to version
3. Use Partitioned Commits
For very large initial imports:
# Commit in chunks
epoch commit --tables customers,orders -m "Batch 1: customer data"
epoch commit --tables products,inventory -m "Batch 2: product data"
Validation
After migration, verify everything works:
# Check repository status
epoch status
# View commit history
epoch log
# Verify tables are registered
epoch table list
# Test branching
epoch branch create test-migration
epoch checkout test-migration
# Make a small change, commit, merge back
epoch checkout main
epoch merge test-migration
epoch branch delete test-migration
Team Training
Key concepts to communicate:
- Branches replace copies - No more copying entire databases
- Commits are lightweight - Only changes are stored
- Tags mark important points - Like naming a backup
- Merging combines changes - No more manual comparison
Rollback Plan
If migration doesn’t go smoothly:
- Your existing snapshot workflow still works
- Horizon Epoch metadata is separate from your data
- Remove Horizon Epoch without affecting production:
# Remove metadata database
dropdb horizon_epoch
Next Steps
- Branching Workflow - Learn the new workflow
- Environment Promotion - Replace environment copies
Multi-Backend Setup
Configure Horizon Epoch to work with multiple storage backends simultaneously.
Overview
Horizon Epoch can manage data across different storage systems:
- Multiple PostgreSQL databases
- Multiple S3 buckets
- Mixed PostgreSQL + S3 environments
- Cross-region or cross-account configurations
Architecture
┌─────────────────────────────────────────────────┐
│ Horizon Epoch Metadata │
│ (Single PostgreSQL Database) │
└─────────────────────┬───────────────────────────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ PostgreSQL│ │ PostgreSQL│ │ S3 │
│ (users) │ │ (orders) │ │ (events) │
└───────────┘ └───────────┘ └───────────┘
Configuration
Multiple PostgreSQL Databases
import asyncio
from horizon_epoch import Client, StorageBackend
async def setup_multi_backend():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
await client.init("multi-backend-repo")
# Add multiple PostgreSQL backends
await client.add_storage(
name="users_db",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/users"}
)
await client.add_storage(
name="orders_db",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"}
)
await client.add_storage(
name="analytics_db",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://analytics.internal/warehouse"}
)
asyncio.run(setup_multi_backend())
Multiple S3 Buckets
# Add multiple S3 backends
await client.add_storage(
name="raw_data",
backend=StorageBackend.S3,
config={"bucket": "company-raw-data", "region": "us-east-1"}
)
await client.add_storage(
name="processed_data",
backend=StorageBackend.S3,
config={"bucket": "company-processed", "region": "us-west-2"}
)
await client.add_storage(
name="archive",
backend=StorageBackend.S3,
config={"bucket": "company-archive", "region": "eu-west-1"}
)
Mixed PostgreSQL + S3
# Add both PostgreSQL and S3 backends
await client.add_storage(
name="transactional",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/production"}
)
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={"bucket": "company-datalake"}
)
Registering Tables
Specify the storage backend when registering tables:
from horizon_epoch.client import _native
# PostgreSQL table
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)
# Different PostgreSQL database
loc = _native.StorageLocation.postgresql("orders_db", "public", "orders")
await client.track_table("orders", loc)
# S3 Delta table
loc = _native.StorageLocation.s3("datalake", "delta/events")
await client.track_table("events", loc)
CLI Registration
# PostgreSQL
epoch table add users --location "postgresql://users_db/public.users"
# S3
epoch table add events --location "s3://datalake/delta/events"
Configuration File
# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"
# PostgreSQL backends
[storage.postgres.users_db]
url = "postgresql://localhost/users"
[storage.postgres.orders_db]
url = "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"
aws_secret_id = "horizon-epoch/orders-db" # Credentials from Secrets Manager
[storage.postgres.analytics]
host = "analytics.internal"
database = "warehouse"
vault_path = "secret/data/analytics-db" # Credentials from Vault
# S3 backends
[storage.s3.raw]
bucket = "company-raw-data"
region = "us-east-1"
[storage.s3.processed]
bucket = "company-processed"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/DataAccess"
[storage.s3.archive]
bucket = "company-archive"
region = "eu-west-1"
endpoint = "https://s3.eu-west-1.amazonaws.com"
Cross-Storage Operations
Branching
Branches span all registered tables:
# Creates branch affecting all tables
epoch branch create feature/new-schema
Commits
Commits can include changes across backends:
# Commit changes from any backend
epoch commit -m "Update user preferences and events schema"
Diff
Compare changes across storage types:
epoch diff main feature/new-schema
Output:
PostgreSQL (users_db):
users: 5 modified, 2 added
S3 (datalake):
events: schema changed, 1000 added
Merge
Merges coordinate across all backends:
epoch merge feature/new-schema
Storage Routing
Default Backend
Set a default storage backend via configuration:
epoch config set default_storage users_db
Explicit Routing
Always specify backend in table location:
# Explicit backend reference via StorageLocation
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)
Cross-Account S3
await client.add_storage(
name="partner_data",
backend=StorageBackend.S3,
config={
"bucket": "partner-shared-bucket",
"assume_role_arn": "arn:aws:iam::999888777666:role/HorizonEpochAccess",
"external_id": "partner-integration-id"
}
)
Best Practices
1. Separate Metadata from Data
The metadata database should be separate from data storage:
# Connect to dedicated metadata database
async with Client.connect("postgresql://metadata.internal/horizon_epoch") as client:
# Add production data storage (different server)
await client.add_storage(
name="production",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://data.internal/production"}
)
2. Use Consistent Naming
# Good: Clear, consistent names
await client.add_storage("prod_users", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_orders", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_events", StorageBackend.S3, {...})
# Avoid: Inconsistent or unclear names like "db1" or "bucket"
3. Document Backend Purpose
# epoch.toml
# Users database - primary transactional storage
[storage.postgres.users]
url = "postgresql://..."
# Orders database - order processing system
[storage.postgres.orders]
url = "postgresql://..."
# Event lake - analytics events from all services
[storage.s3.events]
bucket = "company-events"
4. Handle Partial Failures
Operations across multiple backends may partially fail:
try:
result = client.commit(message="Multi-backend update")
except PartialCommitError as e:
print(f"Committed to: {e.successful_backends}")
print(f"Failed on: {e.failed_backends}")
# Handle recovery
Troubleshooting
Backend Not Found
Error: Storage backend 'mydb' not found
- Verify the backend is defined in your configuration file
- Check backend name spelling
Cross-Backend Transaction
Warning: Cross-backend operations are not atomic
- Horizon Epoch provides best-effort consistency
- Use single backend for strict atomicity requirements
Permission Mismatch
Error: Access denied on backend 'orders_db'
- Verify credentials for each backend
- Check each backend has required permissions
Next Steps
- Branch-Aware Queries - Query across backends
- Configuration Reference - Full options
- Architecture - Design details
Branch-Aware Queries
Query data from specific branches, compare between branches, and understand how branch context affects data access.
Overview
When you query data through Horizon Epoch:
- Each branch shows its own version of the data
- Changes on a branch don’t affect other branches
- You can query multiple branches simultaneously
Setting Branch Context
CLI
# Set current branch
epoch checkout feature/new-schema
# All subsequent operations use this branch
epoch query "SELECT * FROM users"
Python SDK
import asyncio
from horizon_epoch import Client
async def query_branch():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Checkout the branch
await client.checkout("feature/new-schema")
# Get status to see current branch
status = await client.status()
print(f"On branch: {status.branch}")
# Get data diff between branches
diff = await client.diff(
base="main",
target="feature/new-schema"
)
for table_diff in diff.table_diffs:
print(f"{table_diff.table_name}: {table_diff.total_changes}")
asyncio.run(query_branch())
How Branch Queries Work
Copy-on-Write Overlay
When you query a branch:
- Base data comes from the parent branch
- Branch-specific changes overlay the base
- You see the combined result
main: users = [Alice, Bob, Charlie]
│
feature: overlay = [Charlie → Charles]
│
query: result = [Alice, Bob, Charles]
Query Flow
-- On branch 'feature/update-names'
SELECT * FROM users WHERE id = 3;
-- Horizon Epoch executes:
-- 1. Check branch overlay for id=3
-- 2. If found, return overlay version
-- 3. If not found, return base version
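The same lookup can be pictured in plain Python. This is a conceptual sketch of the overlay idea, not Horizon Epoch’s internals; the dictionaries stand in for the base table and the branch overlay.
# Base data as seen on the parent branch, keyed by primary key.
base = {
    1: {"id": 1, "name": "Alice"},
    2: {"id": 2, "name": "Bob"},
    3: {"id": 3, "name": "Charlie"},
}

# Copy-on-write overlay: only the records this branch modified.
overlay = {3: {"id": 3, "name": "Charles"}}

def read_record(record_id):
    """Return the overlay version if the branch changed the record, else the base version."""
    if record_id in overlay:
        return overlay[record_id]
    return base.get(record_id)

print(read_record(1))  # {'id': 1, 'name': 'Alice'}   (unchanged, served from base)
print(read_record(3))  # {'id': 3, 'name': 'Charles'} (branch overlay wins)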
Comparing Branches
Diff Query
Compare data between two branches:
# Get records that differ between branches
diff = client.diff_query(
table="users",
source="main",
target="feature/updates"
)
for change in diff:
print(f"{change.type}: {change.record}")
Side-by-Side Comparison
# Query both branches
main_data = client.query("SELECT * FROM users", branch="main")
feature_data = client.query("SELECT * FROM users", branch="feature/updates")
# Compare programmatically
import pandas as pd
main_df = pd.DataFrame(main_data)
feature_df = pd.DataFrame(feature_data)
comparison = main_df.compare(feature_df)
CLI Comparison
# Show diff
epoch diff main feature/updates --table users
# Show records only in one branch
epoch diff main feature/updates --only-in-target
Querying Multiple Branches
Union Across Branches
# Get all unique records across branches
results = client.query_branches(
query="SELECT * FROM users",
branches=["main", "feature/a", "feature/b"],
mode="union"
)
Compare Aggregate Results
# Run same query on multiple branches
for branch in ["main", "staging", "production"]:
result = client.query(
"SELECT COUNT(*) as user_count FROM users",
branch=branch
)
print(f"{branch}: {result[0]['user_count']} users")
Time-Travel Queries
Query data at a specific point in time:
By Commit ID
# Query at specific commit
client.query(
"SELECT * FROM users",
at_commit="abc123f"
)
By Tag
# Query at tagged version
client.query(
"SELECT * FROM users",
at_tag="v1.0.0"
)
By Timestamp
from datetime import datetime, timedelta
# Query data as of yesterday
yesterday = datetime.now() - timedelta(days=1)
client.query(
"SELECT * FROM users",
at_time=yesterday
)
CLI Time-Travel
# Query at commit
epoch query "SELECT * FROM users" --at abc123f
# Query at tag
epoch query "SELECT * FROM users" --at v1.0.0
# Query at time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"
Performance Considerations
Branch Depth
Deeply nested branches may be slower:
main → dev → feature → sub-feature → experiment
Each level adds overlay resolution. Keep branches shallow when possible.
Indexing
Branch overlays use primary keys for efficient lookup:
# Fast: uses primary key
client.query("SELECT * FROM users WHERE id = 123", branch="feature")
# Slower: scans overlay + base
client.query("SELECT * FROM users WHERE name = 'Alice'", branch="feature")
Caching
Query caching can be configured via the CLI:
epoch config set query_cache_enabled true
epoch config set query_cache_ttl 300
Branch Isolation Guarantees
Read Isolation
Each branch sees consistent data:
- Changes on other branches are invisible
- Commits create point-in-time snapshots
- Queries always return consistent results
Write Isolation
Changes on a branch don’t affect others until merged:
# On feature branch
await client.checkout("feature")
# Make changes via your database connection...
await client.stage_all()
await client.commit(message="Deactivate user 1")
# Main branch is unchanged - checkout and diff to compare
await client.checkout("main")
# Main still has the original data
Common Patterns
A/B Testing
# Set up test branches
await client.branch("experiment/new-algo", start_point="main")
# Run algorithm on each branch
for branch in ["main", "experiment/new-algo"]:
await client.checkout(branch)
run_algorithm()
results = measure_metrics()
print(f"{branch}: {results}")
Data Validation
# Validate changes before merge
await client.checkout("feature/migration")
# Use diff to check for issues
diff = await client.diff(base="main", target="feature/migration")
for table_diff in diff.table_diffs:
if table_diff.total_changes > 1000:
raise ValidationError(f"Too many changes in {table_diff.table_name}")
Environment Comparison
# Compare data across environments
for env in ["development", "staging", "production"]:
result = client.query(
"SELECT COUNT(*) as count, MAX(updated_at) as latest FROM orders",
branch=env
)
print(f"{env}: {result}")
Troubleshooting
Wrong Branch Data
Expected data from 'feature' but got 'main' data
- Verify branch context is set correctly
- Check if query specifies explicit branch
- Verify commit was made on correct branch
Missing Changes
Can't see changes I just made
- Changes must be committed to be visible in queries
- Uncommitted changes are in working state only
- Use epoch status to see uncommitted changes
Slow Queries
- Reduce branch nesting depth
- Ensure queries use primary keys when possible
- Consider materializing long-lived branches
Next Steps
- Copy-on-Write - How branch isolation works
- Merge Algorithm - Combining branch changes
- Time-Travel Queries - More on historical queries
How to: Schema Versioning
This guide covers schema versioning workflows including upgrading existing repositories, migrating schemas, handling rollbacks, and working with schemas across multiple branches.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL running with a repository configured
- Familiarity with basic version control concepts
Understanding Schema Versioning
Horizon Epoch versions database schemas alongside your data. Every commit captures:
- Column definitions: Name, type, nullability, defaults
- Constraints: Foreign keys, unique, check, exclusion
- Indexes: B-tree, GIN, GiST, BRIN, hash
- Enum types: Custom PostgreSQL enums
Schema snapshots are stored as JSONB in the table_versions table, enabling full schema history and three-way merge for schema changes.
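As a rough illustration only (the exact JSONB layout is not shown here and may differ), a snapshot for a table carries information along these lines; the field names below are illustrative, not the stored format.
# Hypothetical shape of a schema snapshot; for illustration only.
users_schema_snapshot = {
    "columns": [
        {"name": "id", "type": "bigint", "nullable": False, "default": None},
        {"name": "email", "type": "text", "nullable": False, "default": None},
    ],
    "constraints": [
        {"name": "users_pkey", "kind": "primary_key", "columns": ["id"]},
        {"name": "chk_users_email", "kind": "check", "expression": "email ~ '^[^@]+@[^@]+$'"},
    ],
    "indexes": [
        {"name": "idx_users_email", "method": "btree", "columns": ["email"]},
    ],
    "enums": {
        "status": ["active", "inactive"],
    },
}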
Upgrading Existing Repositories
If you have an existing Horizon Epoch repository created before schema versioning was added, you can backfill schema information for historical commits.
Step 1: Check Current Status
# See which tables have schema data
epoch schema status
# Output:
# Table | Latest Schema | Historical | Live DB | Status
# -----------|---------------|------------|----------|--------
# users | Yes | Partial | Synced | OK
# orders | Yes | No | Synced | Needs backfill
# products | No | No | Drifted | Needs migration
Step 2: Backfill Historical Schemas
For tables where the live database is accessible, backfill historical commits:
# Backfill schema for specific table
epoch migrate-schemas users --backfill
# Backfill all tables with missing schemas
epoch migrate-schemas --backfill --all
# Preview what would be backfilled (dry-run)
epoch migrate-schemas --backfill --all --dry-run
What happens during backfill (a sketch follows this list):
- For each commit without schema data, Epoch checks if the table location is accessible
- If accessible, it introspects the current schema and stores it
- If not accessible, the commit is marked as “schema unavailable”
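The backfill pass can be pictured as the loop below. This is a conceptual sketch with stub helpers, not the epoch migrate-schemas implementation; the CLI does this work for you.
# Conceptual sketch of the backfill pass; the helper functions are stand-ins.
def list_commits_without_schema(table):
    return ["abc123f", "def456a"]          # stub: commits missing schema data

def location_is_accessible(table):
    return True                             # stub: pretend the live table is reachable

def introspect_schema(table):
    return {"columns": [{"name": "id", "type": "bigint"}]}   # stub: pretend introspection result

def backfill_schemas(table, dry_run=False):
    for commit in list_commits_without_schema(table):
        if not location_is_accessible(table):
            print(f"{table}@{commit}: marked schema unavailable")
            continue
        if dry_run:
            print(f"would backfill schema for {table} at commit {commit}")
            continue
        schema = introspect_schema(table)
        print(f"stored {len(schema['columns'])} columns for {table} at commit {commit}")

backfill_schemas("users", dry_run=True)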
Step 3: Verify Migration
# Verify all tables have schema data
epoch schema status
# View schema at a specific commit
epoch schema show users --at abc123f
Migration Guide for Existing Repositories
Scenario: Adding Schema Versioning to Production
If you have a production repository and want to enable full schema versioning:
1. Create a Baseline
# Create a new commit that captures current schema
epoch commit -m "Capture schema baseline" --include-schema
This creates a commit with full schema snapshots for all registered tables.
2. Update Your Workflow
From this point forward, all commits automatically include schema snapshots.
# Normal workflow - schema is captured automatically
epoch branch create feature/add-column
# ... make schema changes ...
epoch commit -m "Add email_verified column"
3. Handle Historical Data (Optional)
If you need schema history for older commits:
# Option A: Backfill from live database (assumes schema hasn't changed)
epoch migrate-schemas --backfill --since "2024-01-01"
# Option B: Import from external migrations
epoch migrate-schemas --import-from flyway ./migrations
epoch migrate-schemas --import-from liquibase ./changelog.xml
Rollback Procedures
Rolling Back Schema Changes
If a schema change causes issues, you can roll back using several methods:
Method 1: Revert the Commit
Create a new commit that reverses the schema change:
# Revert the problematic commit
epoch revert abc123f
# This creates a new commit with:
# - Data changes reversed
# - Schema changes reversed
# - DDL generated to undo schema modifications
Method 2: Reset to Previous Commit
Reset the branch to a known-good state:
# Soft reset (keep changes for review)
epoch reset --soft HEAD~1
# Hard reset (discard changes)
epoch reset --hard HEAD~1
# Reset to specific commit
epoch reset --hard v1.0.0
After reset, apply the DDL to synchronize the live database:
# Generate rollback DDL
epoch schema diff HEAD v1.0.0 --show-ddl > rollback.sql
# Review and apply
psql -f rollback.sql
Method 3: Checkout and Apply
Create a new branch from a known-good state:
# Create branch from good state
epoch branch create hotfix/rollback --from v1.0.0
# Generate DDL to bring live DB to that state
epoch schema apply users --from v1.0.0 --dry-run
# Apply after review
epoch schema apply users --from v1.0.0
Handling Rollback Failures
Some schema changes cannot be automatically rolled back:
| Change Type | Rollback Difficulty | Notes |
|---|---|---|
| Add column | Easy | DROP COLUMN |
| Add index | Easy | DROP INDEX |
| Add constraint | Easy | DROP CONSTRAINT |
| Drop column | Data Loss | Column data is gone |
| Change column type | May fail | Data may not fit old type |
| Remove enum value | May fail | Rows may use removed value |
For destructive changes, consider:
# 1. Check for data dependencies
epoch schema diff HEAD~1 HEAD --check-data-impact
# 2. Create data backup before proceeding
epoch export users --at HEAD~1 > users_backup.parquet
# 3. Apply rollback with force flag
epoch schema apply users --from HEAD~1 --force
Multi-Branch Schema Workflows
Feature Branch Schema Changes
When developing schema changes on a feature branch:
# 1. Create feature branch
epoch branch create feature/add-preferences
# 2. Make schema changes
psql -c "ALTER TABLE users ADD COLUMN preferences JSONB"
# 3. Commit (schema captured automatically)
epoch commit -m "Add user preferences column"
# 4. Preview merge
epoch merge main --dry-run
# Shows: Schema changes to apply on main
# 5. Merge when ready
epoch merge main
Parallel Schema Development
When multiple branches modify the same table:
# Branch A adds column
epoch checkout feature-a
psql -c "ALTER TABLE users ADD COLUMN theme VARCHAR(20)"
epoch commit -m "Add theme column"
# Branch B adds different column
epoch checkout feature-b
psql -c "ALTER TABLE users ADD COLUMN locale VARCHAR(10)"
epoch commit -m "Add locale column"
# Merge A into main
epoch checkout main
epoch merge feature-a # Success
# Merge B into main
epoch merge feature-b # Also success - no conflict
Both columns are added because they don’t conflict.
Schema Conflict Resolution
When branches modify the same constraint:
# Branch A modifies check constraint
epoch checkout feature-a
psql -c "ALTER TABLE users DROP CONSTRAINT chk_age"
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 18)"
epoch commit -m "Require adult users"
# Branch B modifies same constraint differently
epoch checkout feature-b
psql -c "ALTER TABLE users DROP CONSTRAINT chk_age"
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 13)"
epoch commit -m "Allow teen users"
# Attempt merge
epoch checkout main
epoch merge feature-a # Success
epoch merge feature-b # CONFLICT
# Resolve conflict
epoch conflicts show
# CONFLICT (constraint): chk_age modified in both branches
# Choose resolution
epoch conflicts resolve --constraint chk_age --custom "CHECK (age >= 16)"
epoch commit -m "Merge feature-b with compromise age check"
Environment Promotion with Schema
When promoting schema changes between environments:
# 1. Develop on dev branch
epoch checkout dev
# ... make changes ...
epoch commit -m "Schema updates for v2.0"
# 2. Generate migration for staging
epoch schema diff staging dev --show-ddl > staging_migration.sql
# 3. Review migration
cat staging_migration.sql
# 4. Apply to staging environment
PGHOST=staging-db psql -f staging_migration.sql
# 5. Merge to staging branch
epoch checkout staging
epoch merge dev
# 6. Repeat for production
epoch schema diff production staging --show-ddl > prod_migration.sql
Best Practices
1. Always Review Schema Diffs
# Before merging, check schema changes
epoch diff main feature-branch --schema
# Generate DDL preview
epoch schema diff main feature-branch --show-ddl
2. Use Descriptive Commit Messages
# Bad
epoch commit -m "Update schema"
# Good
epoch commit -m "Add email_verified column with default false for GDPR compliance"
3. Test Migrations on Branch Data
# Create test branch
epoch branch create test/migration-validation
# Apply migration
epoch schema apply users --from feature-branch --dry-run
# Validate
epoch validate --table users
4. Handle Large Tables Carefully
# For large tables, use online mode
epoch schema apply users --from feature-branch --online
# Monitor progress
epoch schema apply --status
5. Version Your Enum Changes Carefully
# Adding values is safe
psql -c "ALTER TYPE status ADD VALUE 'archived'"
# Removing values requires migration
# See: docs/how-to/ddl-migrations.md
Troubleshooting
Schema Not Captured
If schema isn’t being captured in commits:
# Check table is registered
epoch table list
# Re-register with schema introspection
epoch table add users --location "postgresql://mydb/public.users" --introspect-schema
Schema Drift Detected
If the live database differs from the stored schema:
# See differences
epoch schema diff users --live
# Capture current state
epoch commit -m "Sync schema from live database" --rescan-schema
# Or apply stored schema to database
epoch schema apply users --from HEAD --force
Backfill Fails
If schema backfill fails:
# Check connectivity
epoch doctor
# Try specific table
epoch migrate-schemas users --backfill --verbose
# Skip inaccessible tables
epoch migrate-schemas --backfill --skip-inaccessible
See Also
- DDL Migrations - Generating and applying DDL
- Constraint Versioning Tutorial - Step-by-step guide
- CLI Schema Commands - Command reference
- Constraint Reference - Constraint type documentation
How to: DDL Migrations
This guide covers generating DDL migration scripts, integrating with external migration tools, and safely applying schema changes to production databases.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL database with schema versioning enabled
- Understanding of DDL and database migrations
Generating Migrations with PostgresDdlGenerator
Horizon Epoch includes a DDL generator that produces properly ordered PostgreSQL migration scripts.
Basic Migration Generation
# Generate migration between branches
epoch schema diff main feature-branch --show-ddl
# Output migration to file
epoch schema diff main feature-branch --ddl-output migrations/V002__add_preferences.sql
Migration Script Structure
Generated migrations include both UP and DOWN sections:
-- Migration: main -> feature/add-preferences
-- Generated: 2024-12-18T10:30:00Z
-- ============================================================
-- UP: Apply changes
-- ============================================================
-- Create new enum types first
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');
-- Drop constraints that will be modified
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;
-- Add new columns
ALTER TABLE users ADD COLUMN preferences JSONB;
ALTER TABLE users ADD COLUMN notification_pref notification_type DEFAULT 'email';
-- Add new constraints
ALTER TABLE users ADD CONSTRAINT chk_users_email
CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');
-- Create indexes
CREATE INDEX CONCURRENTLY idx_users_preferences
ON users USING gin (preferences);
-- ============================================================
-- DOWN: Reverse changes
-- ============================================================
DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;
ALTER TABLE users DROP COLUMN IF EXISTS notification_pref;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;
-- WARNING: Cannot remove enum values in PostgreSQL
-- Enum 'notification_type' preserved (manual cleanup required)
Statement Ordering
The DDL generator follows a specific order to handle dependencies; a simplified sketch of the UP ordering follows the two lists below:
UP (Apply) Order:
- Create new enum types (before columns using them)
- Add values to modified enums
- Drop constraints being modified/removed (FK first, then others)
- Drop indexes being modified/removed
- Handle primary key changes
- Add/modify columns
- Add new constraints (check, unique, exclusion, then FK last)
- Create new indexes
DOWN (Reverse) Order:
- Drop new indexes
- Drop new constraints (FK first)
- Drop new columns
- Re-add dropped columns (if data preserved)
- Re-add dropped constraints
- Re-add dropped indexes
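A simplified way to picture the UP ordering is to bucket statements by kind and emit the buckets in the fixed sequence above. The sketch below is a toy illustration only; PostgresDdlGenerator handles dependencies, primary key changes, and FK-before-other-constraint ordering that this version ignores.
# Toy illustration of ordering UP statements by kind; not the real generator.
UP_ORDER = [
    "create_enum",       # 1. new enum types before columns that use them
    "alter_enum_add",    # 2. add values to existing enums
    "drop_constraint",   # 3. drop constraints being modified/removed
    "drop_index",        # 4. drop indexes being modified/removed
    "alter_column",      # 5. add/modify columns
    "add_constraint",    # 6. new constraints
    "create_index",      # 7. new indexes
]

def order_up_statements(changes):
    """changes: list of (kind, sql) tuples; returns SQL in dependency-safe order."""
    rank = {kind: i for i, kind in enumerate(UP_ORDER)}
    return [sql for kind, sql in sorted(changes, key=lambda c: rank[c[0]])]

changes = [
    ("create_index", "CREATE INDEX idx_users_preferences ON users USING gin (preferences)"),
    ("alter_column", "ALTER TABLE users ADD COLUMN preferences JSONB"),
    ("create_enum", "CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push')"),
]
for stmt in order_up_statements(changes):
    print(stmt + ";")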
DDL Safety Features
Dry-Run Mode
Preview changes without applying them:
# Show what would be executed
epoch schema apply users --from feature-branch --dry-run
# Output:
# Dry-run mode - no changes will be applied
#
# Statements to execute:
# 1. ALTER TABLE users ADD COLUMN preferences JSONB
# 2. CREATE INDEX CONCURRENTLY idx_users_preferences ON users USING gin (preferences)
#
# Estimated impact:
# Table size: 2.3 GB
# Row count: 15,000,000
# Recommended: Use --online mode for large table
Destructive Operation Warnings
The DDL generator identifies and warns about destructive operations:
epoch schema diff main feature-branch --show-ddl
# WARNING: Destructive operations detected:
# - DROP COLUMN: users.legacy_field
# - ALTER COLUMN TYPE: orders.amount (numeric -> integer, may lose precision)
# - DROP INDEX: idx_users_legacy
#
# Use --force to generate DDL anyway
Destructive Operation Types:
| Operation | Risk Level | Description |
|---|---|---|
| DROP COLUMN | High | Data permanently lost |
| DROP TABLE | Critical | Table and all data lost |
| ALTER COLUMN TYPE | Medium | May fail or lose precision |
| DROP CONSTRAINT | Low | Constraint removed |
| DROP INDEX | Low | Index removed (can rebuild) |
| DROP TYPE | Medium | Enum type removed |
| TRUNCATE TABLE | Critical | All data removed |
Lock Timeout Configuration
Prevent long-running locks on production tables:
# Set lock timeout (default: 30 seconds)
epoch schema apply users --from feature-branch --lock-timeout 10s
# Set statement timeout (default: 5 minutes)
epoch schema apply users --from feature-branch --statement-timeout 2m
Retry Logic
Configure automatic retries for transient failures:
# Retry up to 5 times with exponential backoff
epoch schema apply users --from feature-branch --retries 5
# Configure backoff
epoch schema apply users --from feature-branch \
--retries 5 \
--initial-delay 500ms \
--max-delay 30s
Integration with External Tools
pt-online-schema-change (Percona Toolkit for MySQL)
While Horizon Epoch primarily targets PostgreSQL, you can generate compatible scripts for MySQL migrations:
# Generate MySQL-compatible DDL
epoch schema diff main feature-branch --dialect mysql --ddl-output migration.sql
# Use with pt-online-schema-change
pt-online-schema-change \
--execute \
--alter "ADD COLUMN preferences JSON" \
D=mydb,t=users
Recommended workflow:
- Generate migration with Epoch
- Extract ALTER TABLE statements
- Apply using pt-online-schema-change for large tables
- Update Epoch commit after successful application
# After applying via pt-osc
epoch commit -m "Applied preferences column via pt-osc" --rescan-schema
pg_repack (PostgreSQL)
For table reorganization and bloat reduction:
# Identify tables needing repack
epoch schema status --check-bloat
# Output:
# Table | Size | Dead Tuples | Bloat % | Recommendation
# -----------|---------|-------------|---------|----------------
# users | 2.3 GB | 150,000 | 15% | Consider pg_repack
# orders | 8.1 GB | 2,500,000 | 28% | Recommended
# Run pg_repack
pg_repack -d mydb -t users
# Rescan schema after repack
epoch commit -m "Reorganized users table" --rescan-schema
Using pg_repack with index changes:
# Generate index DDL
epoch schema diff main feature-branch --indexes-only --ddl-output indexes.sql
# Create indexes with pg_repack (minimal locking)
pg_repack -d mydb -t users --only-indexes
# Or use CONCURRENTLY option in Epoch
epoch schema apply users --from feature-branch --online
Flyway Integration
Export migrations for Flyway:
# Generate Flyway-compatible migration
epoch schema diff main feature-branch \
--format flyway \
--ddl-output db/migration/V2__add_preferences.sql
# Migration file follows Flyway naming convention
cat db/migration/V2__add_preferences.sql
Liquibase Integration
Export migrations for Liquibase:
# Generate Liquibase changeset
epoch schema diff main feature-branch \
--format liquibase \
--ddl-output db/changelog/changes/002-add-preferences.xml
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
<changeSet id="002-add-preferences" author="epoch">
<addColumn tableName="users">
<column name="preferences" type="JSONB"/>
</addColumn>
<createIndex indexName="idx_users_preferences" tableName="users">
<column name="preferences" type="gin"/>
</createIndex>
<rollback>
<dropIndex indexName="idx_users_preferences" tableName="users"/>
<dropColumn tableName="users" columnName="preferences"/>
</rollback>
</changeSet>
</databaseChangeLog>
Online Mode for Production
Using CONCURRENTLY
For indexes on large tables, use online mode to avoid blocking:
# Create indexes without blocking writes
epoch schema apply users --from feature-branch --online
# This generates:
# CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
#
# Instead of:
# CREATE INDEX idx_users_email ON users (email);
Limitations of CONCURRENTLY
- Cannot run inside a transaction
- May take longer than regular index creation
- Requires more resources (CPU, I/O)
- Index must be validated after creation
# Check index validity after CONCURRENTLY creation
epoch schema status users --check-indexes
# Output:
# Index | Valid | Size | Last Used
# -------------------------|-------|--------|----------
# idx_users_email | Yes | 45 MB | 2 min ago
# idx_users_preferences | No | 0 MB | Never
If an index is invalid:
# Reindex with CONCURRENTLY
epoch schema apply users --reindex idx_users_preferences --online
Transactional DDL
PostgreSQL supports transactional DDL, allowing atomic migrations:
# Apply migration in a transaction (default)
epoch schema apply users --from feature-branch
# Disable transaction (required for CONCURRENTLY)
epoch schema apply users --from feature-branch --no-transaction --online
What happens on failure:
# With transaction (default):
# - All statements run in one transaction
# - On failure, all changes rolled back
# - Database remains unchanged
# Without transaction (--no-transaction):
# - Statements run individually
# - On failure, partial changes may remain
# - Manual cleanup may be needed
Table Size Considerations
Small Tables (< 100MB)
Standard migration is safe:
epoch schema apply users --from feature-branch
Medium Tables (100MB - 1GB)
Use lock timeouts:
epoch schema apply users --from feature-branch \
--lock-timeout 10s \
--retries 3
Large Tables (1GB - 10GB)
Use online mode:
epoch schema apply users --from feature-branch \
--online \
--statement-timeout 30m
Very Large Tables (> 10GB)
Consider specialized tools:
# 1. Generate DDL
epoch schema diff main feature-branch --ddl-output migration.sql
# 2. Review for large table operations
epoch schema diff main feature-branch --analyze-impact
# 3. Use pg_repack or manual process
pg_repack -d mydb -t users --no-superuser-check
Python SDK Usage
import asyncio
from horizon_epoch import Client
async def generate_migration():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraint diff between branches
diff = await client.diff_constraints("main", "feature-branch", table_name="users")
# Generate migration
generator = PostgresDdlGenerator()
migration = generator.generate_migration("users", diff.constraints)
# Print statements
print("UP statements:")
for stmt in migration.up:
print(f" {stmt};")
print("\nDOWN statements:")
for stmt in migration.down:
print(f" {stmt};")
# Check for warnings
if migration.warnings:
print("\nWARNINGS:")
for warn in migration.warnings:
print(f" - {warn}")
# Write to file
migration.write_to_file("migrations/V002__update_users.sql")
# Execute with safety features
from horizon_epoch.ddl import DdlExecutor, ExecutionConfig
executor = DdlExecutor(client.connection)
config = ExecutionConfig(
dry_run=True,
lock_timeout_seconds=10,
statement_timeout_seconds=300,
online=True,
)
result = executor.execute(migration.up, config)
print(f"Executed {result.statements_run} statements")
Best Practices
1. Always Use Dry-Run First
epoch schema apply users --from feature-branch --dry-run
2. Backup Before Destructive Changes
# Export table data
epoch export users --format parquet > backup/users_$(date +%Y%m%d).parquet
# Then apply
epoch schema apply users --from feature-branch --force
3. Use Online Mode for Production
epoch schema apply users --from feature-branch --online
4. Set Appropriate Timeouts
epoch schema apply users --from feature-branch \
--lock-timeout 10s \
--statement-timeout 5m
5. Monitor Long-Running Migrations
# In another terminal
epoch schema apply --status
# Or query PostgreSQL directly
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'ALTER%' OR query LIKE 'CREATE INDEX%';
See Also
- Schema Versioning - Schema versioning workflows
- Constraint Reference - Constraint types
- CLI Reference - Schema commands
Credential Providers
Horizon Epoch supports multiple methods for providing credentials to storage backends. This guide covers when to use each provider and how to configure them.
Overview
Credential providers abstract how secrets are retrieved:
- Static - Hardcoded credentials (dev/testing only)
- Environment - From environment variables
- File - From files on disk
- Vault - From HashiCorp Vault
- AWS - From AWS Secrets Manager or IAM
Static Credentials
Use for development and testing only.
from horizon_epoch import Config
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
name="mydb",
url="postgresql://user:password@localhost/db"
)
# epoch.toml
[storage.postgres.mydb]
url = "postgresql://user:password@localhost/db"
Environment Variables
Recommended for containers and CI/CD.
export EPOCH_POSTGRES_MYDB_URL="postgresql://user:pass@host/db"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
config.add_postgres(
name="mydb",
url="${EPOCH_POSTGRES_MYDB_URL}" # Variable substitution
)
# epoch.toml - variables are substituted at runtime
[storage.postgres.mydb]
url = "${EPOCH_POSTGRES_MYDB_URL}"
File-Based Credentials
For credentials managed by external tools (e.g., Kubernetes secrets).
config.add_postgres(
name="mydb",
credentials_file="/run/secrets/db-password"
)
# epoch.toml
[storage.postgres.mydb]
host = "db.example.com"
database = "production"
username = "app_user"
password_file = "/run/secrets/db-password"
File format (JSON):
{
"password": "secret-value"
}
Or plain text:
secret-value
Credential Caching
Credentials are cached to reduce external calls:
config = Config(
credential_cache_ttl=300, # 5 minutes
credential_cache_max_size=100
)
For frequently rotating credentials:
config = Config(
credential_cache_ttl=60, # 1 minute
credential_refresh_before_expiry=30 # Refresh 30s before expiry
)
Credential Refresh
Some providers support automatic refresh:
# AWS credentials auto-refresh when using IAM roles
config.add_s3(
name="datalake",
bucket="my-bucket",
use_instance_credentials=True # Auto-refresh enabled
)
Provider Priority
When multiple credential sources are available, they are tried in the following priority order (see the sketch after this list):
- Explicit configuration (URL, password)
- Environment variables
- Credential files
- External providers (Vault, AWS)
- Instance credentials (IAM roles)
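A minimal sketch of that resolution order, assuming hypothetical parameter names and a hypothetical EPOCH_POSTGRES_MYDB_PASSWORD variable; the real provider chain in Horizon Epoch may differ in detail.
import os

def resolve_password(explicit=None, env_var=None, password_file=None):
    """Toy resolver mirroring the priority list above (explicit > environment > file)."""
    if explicit:                                          # 1. explicit configuration wins
        return explicit
    if env_var and os.environ.get(env_var):               # 2. environment variable
        return os.environ[env_var]
    if password_file and os.path.exists(password_file):   # 3. credential file on disk
        with open(password_file) as f:
            return f.read().strip()
    # 4./5. external providers (Vault, AWS) and instance credentials would be tried here
    raise LookupError("no credential source available")

# Example: nothing explicit, so the environment variable is used if set.
os.environ["EPOCH_POSTGRES_MYDB_PASSWORD"] = "example-only"
print(resolve_password(env_var="EPOCH_POSTGRES_MYDB_PASSWORD"))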
Security Best Practices
1. Never Commit Credentials
# .gitignore
epoch.toml
.env
*.pem
*.key
2. Use Separate Credentials Per Environment
# epoch.dev.toml
[storage.postgres.mydb]
url = "${DEV_DATABASE_URL}"
# epoch.prod.toml
[storage.postgres.mydb]
url = "${PROD_DATABASE_URL}"
3. Rotate Credentials Regularly
Use providers that support rotation:
- AWS Secrets Manager with rotation
- Vault dynamic database credentials
- Short-lived IAM credentials
4. Audit Credential Access
Enable logging for credential operations:
config = Config(
log_credential_access=True, # Log when credentials are accessed
log_level="info"
)
5. Use Least Privilege
Create dedicated credentials for Horizon Epoch with only required permissions.
Troubleshooting
Credential Not Found
Error: Credential 'EPOCH_POSTGRES_MYDB_URL' not found
- Check environment variable is set
- Verify file path exists and is readable
- Check Vault/AWS connectivity
Credential Expired
Error: Credential expired and refresh failed
- Check IAM role permissions
- Verify Vault token is valid
- Check system clock sync (for time-based auth)
Permission Denied
Error: Permission denied reading credential file
- Check file permissions: chmod 600 /path/to/credential
- Verify process user has read access
Next Steps
- Vault Integration - HashiCorp Vault setup
- AWS Secrets - AWS credential management
- Configuration Reference - Full configuration options
HashiCorp Vault Integration
Integrate Horizon Epoch with HashiCorp Vault for secure credential management.
Overview
Vault integration supports:
- Static secrets (KV v2)
- Dynamic database credentials
- PKI certificates for mTLS
- Multiple authentication methods
Prerequisites
- HashiCorp Vault server (1.x or later)
- Vault CLI (for setup)
- Network access from Horizon Epoch to Vault
Authentication Methods
Token Authentication
Simplest method, suitable for development:
from horizon_epoch import Config
config = Config(
vault_addr="https://vault.example.com:8200",
vault_token="hvs.xxx..."
)
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.xxx..."
AppRole Authentication
Recommended for applications:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_auth_method="approle",
vault_role_id="xxx-xxx-xxx",
vault_secret_id="${VAULT_SECRET_ID}" # From env or file
)
Setup in Vault:
# Enable AppRole
vault auth enable approle
# Create policy
vault policy write horizon-epoch - <<EOF
path "secret/data/horizon-epoch/*" {
capabilities = ["read"]
}
path "database/creds/horizon-epoch-*" {
capabilities = ["read"]
}
EOF
# Create role
vault write auth/approle/role/horizon-epoch \
token_policies="horizon-epoch" \
token_ttl=1h \
token_max_ttl=4h
Kubernetes Authentication
For Kubernetes deployments:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_auth_method="kubernetes",
vault_role="horizon-epoch"
)
AWS IAM Authentication
For AWS environments:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_auth_method="aws",
vault_role="horizon-epoch-ec2"
)
KV Secrets (Static)
Store database credentials in KV v2:
# Store secret
vault kv put secret/horizon-epoch/postgres \
username=epoch_user \
password=secret123
config.add_postgres(
name="mydb",
host="db.example.com",
database="production",
vault_path="secret/data/horizon-epoch/postgres"
)
Dynamic Database Credentials
Vault can generate short-lived database credentials:
# Configure database secrets engine
vault secrets enable database
vault write database/config/postgres \
plugin_name=postgresql-database-plugin \
allowed_roles="horizon-epoch-*" \
connection_url="postgresql://{{username}}:{{password}}@db:5432/production" \
username="vault" \
password="vault-password"
vault write database/roles/horizon-epoch-readonly \
db_name=postgres \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
default_ttl="1h" \
max_ttl="24h"
config.add_postgres(
name="mydb",
host="db.example.com",
database="production",
vault_role="database/creds/horizon-epoch-readonly"
)
PKI Certificates
For mTLS authentication:
# Enable PKI
vault secrets enable pki
# Configure PKI (simplified)
vault write pki/root/generate/internal \
common_name="Horizon Epoch CA"
vault write pki/roles/horizon-epoch-client \
allowed_domains="epoch.internal" \
allow_subdomains=true \
max_ttl="720h"
config.add_postgres(
name="mydb",
host="db.example.com",
vault_pki_role="pki/issue/horizon-epoch-client",
sslmode="verify-full"
)
Configuration File
# epoch.toml
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
[storage.postgres.production]
host = "db.example.com"
database = "production"
vault_path = "secret/data/horizon-epoch/postgres"
[storage.postgres.readonly]
host = "db.example.com"
database = "production"
vault_role = "database/creds/horizon-epoch-readonly"
Troubleshooting
Permission Denied
Error: permission denied on path "secret/data/..."
- Check policy is attached to token/role
- Verify the path is correct (KV v2 uses the secret/data/ prefix)
- Check the Vault audit log for details
Token Expired
Error: Vault token expired
- Token TTL may be too short
- Enable token renewal in config
- Use AppRole for automatic re-authentication
Connection Failed
Error: failed to connect to Vault
- Verify VAULT_ADDR is correct
- Check network connectivity
- Verify TLS certificates if using HTTPS
Certificate Issues
Error: x509: certificate signed by unknown authority
config = Config(
vault_addr="https://vault.example.com:8200",
vault_ca_cert="/path/to/ca.crt"
)
Best Practices
- Use AppRole or Kubernetes auth in production
- Enable audit logging in Vault
- Use dynamic credentials when possible
- Set appropriate TTLs - short enough for security, long enough for operations
- Rotate static secrets regularly
Next Steps
- AWS Secrets - Alternative cloud-native solution
- mTLS Authentication - Certificate-based auth
- Credential Providers - Overview of all providers
AWS Secrets Integration
Integrate Horizon Epoch with AWS Secrets Manager, IAM authentication, and other AWS credential services.
AWS Secrets Manager
Basic Configuration
from horizon_epoch import Config
config = Config(
aws_region="us-east-1"
).add_postgres(
name="production",
host="mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
database="production",
aws_secret_id="horizon-epoch/production/postgres"
)
Secret Format
Store credentials in Secrets Manager:
{
"username": "epoch_user",
"password": "secret-password",
"host": "mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
"port": 5432,
"database": "production"
}
aws secretsmanager create-secret \
--name horizon-epoch/production/postgres \
--secret-string '{"username":"epoch_user","password":"secret-password"}'
Secret Rotation
Enable automatic rotation:
aws secretsmanager rotate-secret \
--secret-id horizon-epoch/production/postgres \
--rotation-lambda-arn arn:aws:lambda:us-east-1:xxx:function:SecretsRotation \
--rotation-rules AutomaticallyAfterDays=30
Horizon Epoch automatically uses refreshed credentials.
RDS IAM Authentication
Connect to RDS without passwords using IAM:
Prerequisites
- Enable IAM authentication on RDS:
aws rds modify-db-instance \
--db-instance-identifier mydb \
--enable-iam-database-authentication
- Create IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "rds-db:connect",
"Resource": "arn:aws:rds-db:us-east-1:123456789012:dbuser:mydb/epoch_user"
}
]
}
- Create database user:
CREATE USER epoch_user WITH LOGIN;
GRANT rds_iam TO epoch_user;
Configuration
config.add_postgres(
name="production",
host="mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
database="production",
username="epoch_user",
use_iam_auth=True
)
IAM Roles for S3
Instance Credentials (EC2/ECS/Lambda)
config.add_s3(
name="datalake",
bucket="my-data-bucket",
use_instance_credentials=True # Uses instance profile
)
Assume Role
For cross-account access:
config.add_s3(
name="partner-data",
bucket="partner-bucket",
assume_role_arn="arn:aws:iam::999888777666:role/HorizonEpochAccess",
external_id="partner-external-id" # If required
)
Role Chaining
For complex permission structures:
config.add_s3(
name="restricted",
bucket="restricted-bucket",
assume_roles=[
"arn:aws:iam::111222333444:role/JumpRole",
"arn:aws:iam::555666777888:role/TargetRole"
]
)
AWS SSO / Identity Center
For development with SSO:
# Configure SSO profile
aws configure sso
# Set profile
export AWS_PROFILE=my-sso-profile
# Or in config
config = Config(
aws_profile="my-sso-profile"
)
Configuration File
# epoch.toml
[aws]
region = "us-east-1"
profile = "production" # Optional: use specific profile
[storage.postgres.production]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
database = "production"
aws_secret_id = "horizon-epoch/production/postgres"
[storage.postgres.analytics]
host = "analytics.cluster-xxx.us-east-1.rds.amazonaws.com"
database = "analytics"
username = "epoch_user"
use_iam_auth = true
[storage.s3.datalake]
bucket = "company-datalake"
assume_role_arn = "arn:aws:iam::123456789012:role/DataLakeAccess"
Required IAM Permissions
Secrets Manager Access
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret"
],
"Resource": "arn:aws:secretsmanager:us-east-1:*:secret:horizon-epoch/*"
}
]
}
S3 Access
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
}
]
}
Assume Role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::*:role/HorizonEpoch*"
}
]
}
Troubleshooting
Access Denied (Secrets Manager)
Error: AccessDeniedException when calling GetSecretValue
- Check the IAM policy allows secretsmanager:GetSecretValue
- Verify the secret ARN matches the policy
- Check if secret has resource-based policy
IAM Auth Token Failed
Error: PAM authentication failed for user
- Verify IAM authentication is enabled on RDS
- Check the user has the rds_iam role granted
- Verify the IAM policy has the correct resource ARN
Credentials Expired
Error: ExpiredTokenException
- Instance credentials refresh automatically
- For assumed roles, check session duration
- For SSO, run aws sso login
Best Practices
- Use IAM authentication over passwords when possible
- Enable secret rotation in Secrets Manager
- Use instance roles instead of access keys
- Scope permissions narrowly
- Enable CloudTrail for audit logging
Next Steps
- Vault Integration - Alternative secrets management
- Multi-Backend Setup - Multiple AWS accounts
- Configuration Reference - Full options
SSH Tunnels
Connect to databases through SSH tunnels for secure access to private networks.
When to Use SSH Tunnels
- Database is in a private network (no direct access)
- Connecting through a bastion/jump host
- Security policy requires encrypted connections
- Accessing on-premises databases from cloud
Basic SSH Tunnel
from horizon_epoch import Config
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
name="production",
host="internal-db.private", # Internal hostname
database="production",
username="epoch_user",
password="secret",
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa"
}
)
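The ssh_tunnel option opens a local port forward to the private database through the bastion. The manual equivalent is sketched below with the third-party sshtunnel and psycopg2 packages (both are assumptions for illustration; Horizon Epoch manages the tunnel for you):
# Manual equivalent of the ssh_tunnel option, for illustration only.
import psycopg2
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
    ("bastion.example.com", 22),
    ssh_username="ubuntu",
    ssh_pkey="~/.ssh/id_rsa",
    remote_bind_address=("internal-db.private", 5432),  # database behind the bastion
    local_bind_address=("127.0.0.1", 0),                # pick any free local port
) as tunnel:
    # Connect to the forwarded local port instead of the private hostname.
    conn = psycopg2.connect(
        host="127.0.0.1", port=tunnel.local_bind_port,
        dbname="production", user="epoch_user", password="secret",
    )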
Configuration Options
SSH Key Authentication
ssh_tunnel={
"host": "bastion.example.com",
"port": 22, # Default
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"key_passphrase": "${SSH_KEY_PASSPHRASE}" # If key is encrypted
}
SSH Agent
Use keys loaded in SSH agent:
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"use_agent": True
}
SSH Certificate Authentication
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"cert_file": "~/.ssh/id_rsa-cert.pub"
}
Jump Hosts (ProxyJump)
For multi-hop connections:
ssh_tunnel={
"host": "internal-bastion.private",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"jump_host": {
"host": "external-bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa"
}
}
CLI Configuration
# Environment variables
export EPOCH_SSH_HOST="bastion.example.com"
export EPOCH_SSH_USER="ubuntu"
export EPOCH_SSH_KEY_FILE="~/.ssh/id_rsa"
# epoch.toml
[storage.postgres.production]
host = "internal-db.private"
database = "production"
username = "epoch_user"
password_file = "/run/secrets/db-password"
[storage.postgres.production.ssh_tunnel]
host = "bastion.example.com"
user = "ubuntu"
key_file = "~/.ssh/id_rsa"
Host Key Verification
Strict (Recommended)
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"known_hosts_file": "~/.ssh/known_hosts",
"strict_host_key_checking": True
}
Accept on First Use
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"strict_host_key_checking": "accept-new"
}
Disable (Not Recommended)
# Only for testing/development
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"strict_host_key_checking": False
}
Connection Management
Keep-Alive
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"keepalive_interval": 60, # Send keepalive every 60 seconds
"keepalive_count_max": 3 # Disconnect after 3 missed keepalives
}
Auto-Reconnection
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"auto_reconnect": True,
"reconnect_delay": 5, # Wait 5 seconds between attempts
"max_reconnect_attempts": 3
}
Connection Timeout
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"connect_timeout": 30 # 30 second timeout
}
Multiple Tunnels
For accessing multiple databases:
# Shared tunnel settings, reused for each connection below
bastion_config = {
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa"
}
config.add_postgres(
name="prod_users",
host="users-db.private",
database="users",
ssh_tunnel=bastion_config
).add_postgres(
name="prod_orders",
host="orders-db.private",
database="orders",
ssh_tunnel=bastion_config # Same tunnel reused
)
Troubleshooting
Connection Refused
Error: ssh: connect to host bastion.example.com port 22: Connection refused
- Verify bastion host is running
- Check security groups/firewall allow port 22
- Verify DNS resolution
Authentication Failed
Error: ssh: handshake failed: ssh: unable to authenticate
- Verify SSH key is correct
- Check key has correct permissions: chmod 600 ~/.ssh/id_rsa
- Verify user exists on bastion
- Try connecting manually: ssh -i ~/.ssh/id_rsa ubuntu@bastion.example.com
Host Key Verification Failed
Error: Host key verification failed
- Add host to known_hosts: ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
- Or use strict_host_key_checking: "accept-new"
Tunnel Connection Lost
Error: SSH tunnel disconnected
- Enable keepalive settings
- Enable auto-reconnect
- Check bastion stability
- Review SSH server ClientAliveInterval setting
Slow Connection
- Use connection pooling (tunnel is reused)
- Consider running Horizon Epoch closer to the bastion
- Check network latency between client and bastion
Security Best Practices
- Use key-based authentication - Never use passwords
- Protect private keys - Use encrypted keys, proper permissions
- Verify host keys - Enable strict checking in production
- Limit bastion access - Restrict who can SSH to bastion
- Audit connections - Enable logging on bastion host
- Rotate keys regularly - Use short-lived certificates when possible
Next Steps
- mTLS Authentication - Certificate-based database auth
- Vault Integration - SSH key management with Vault
- Configuration Reference - Full options
mTLS Authentication
Configure mutual TLS (client certificate) authentication for database connections.
Overview
mTLS provides:
- Strong authentication via X.509 certificates
- Encrypted connections (TLS)
- No password management required
- Certificate-based identity
Prerequisites
- TLS-enabled database server
- Client certificate and key
- CA certificate (for verification)
Basic Configuration
from horizon_epoch import Config
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
name="production",
host="db.example.com",
database="production",
username="epoch_user",
sslmode="verify-full",
ssl_cert="/path/to/client.crt",
ssl_key="/path/to/client.key",
ssl_rootcert="/path/to/ca.crt"
)
TLS Modes
| Mode | Server Cert | Client Cert | Description |
|---|---|---|---|
disable | No | No | Unencrypted connection |
require | No | No | Encrypted, no verification |
verify-ca | Yes | Optional | Verify server cert against CA |
verify-full | Yes | Optional | Verify cert + hostname |
For mTLS, use verify-full with client certificates.
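The modes and certificate paths above mirror the standard libpq TLS parameters, so the certificate setup can be sanity-checked outside Horizon Epoch with a plain psycopg2 connection (psycopg2 is used here only as an illustrative assumption):
# Verify the client certificate setup with a plain libpq/psycopg2 connection.
import psycopg2

conn = psycopg2.connect(
    host="db.example.com",
    dbname="production",
    user="epoch_user",
    sslmode="verify-full",            # verify server cert and hostname
    sslcert="/path/to/client.crt",    # client certificate (mTLS)
    sslkey="/path/to/client.key",     # client private key
    sslrootcert="/path/to/ca.crt",    # CA used to verify the server
)
print(conn.get_parameter_status("server_version"))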
Certificate Files
Client Certificate
PEM-encoded X.509 certificate:
-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAJC1HiIAZAiU...
-----END CERTIFICATE-----
Client Key
PEM-encoded private key:
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA7Zq7k...
-----END RSA PRIVATE KEY-----
Or encrypted key:
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFHzBJBgkqhkiG9w0BBQ0wPDA...
-----END ENCRYPTED PRIVATE KEY-----
CA Certificate
For verifying server certificate:
-----BEGIN CERTIFICATE-----
MIIDrzCCApegAwIBAgIQCDvg...
-----END CERTIFICATE-----
Configuration File
# epoch.toml
[storage.postgres.production]
host = "db.example.com"
database = "production"
username = "epoch_user"
sslmode = "verify-full"
ssl_cert = "/etc/epoch/certs/client.crt"
ssl_key = "/etc/epoch/certs/client.key"
ssl_rootcert = "/etc/epoch/certs/ca.crt"
Using Vault PKI
Dynamically issue certificates from Vault:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_token="${VAULT_TOKEN}"
).add_postgres(
name="production",
host="db.example.com",
database="production",
username="epoch_user",
sslmode="verify-full",
vault_pki_role="pki/issue/epoch-client",
ssl_rootcert="/etc/epoch/certs/ca.crt"
)
Certificates are automatically renewed before expiry.
PostgreSQL Server Setup
Enable SSL on PostgreSQL server:
# postgresql.conf
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file = '/var/lib/postgresql/server.key'
ssl_ca_file = '/var/lib/postgresql/ca.crt'
Configure client certificate authentication:
# pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
hostssl all epoch_user 0.0.0.0/0 cert clientcert=verify-full
Create user with certificate CN:
CREATE USER epoch_user;
-- The CN in the client cert must match the username
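Because pg_hba.conf with clientcert=verify-full matches the certificate CN against the database username, it is worth confirming the CN before deploying. A minimal check, assuming the cryptography package is installed:
# Confirm the client certificate CN matches the database username.
from cryptography import x509
from cryptography.x509.oid import NameOID

with open("/path/to/client.crt", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

cn = cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)[0].value
print("Certificate CN:", cn)
assert cn == "epoch_user", "CN must match the PostgreSQL username"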
Encrypted Private Keys
If your key is password-protected:
config.add_postgres(
name="production",
host="db.example.com",
ssl_key="/path/to/encrypted.key",
ssl_key_password="${SSL_KEY_PASSWORD}"
)
Certificate Rotation
Manual Rotation
- Generate new certificate
- Update configuration
- Restart/reload application
Automatic with Vault
Certificates are automatically rotated:
config = Config(
vault_pki_role="pki/issue/epoch-client",
cert_renewal_threshold=0.7 # Renew when 70% of TTL elapsed
)
Troubleshooting
Certificate Expired
Error: SSL error: certificate has expired
- Check certificate expiry: openssl x509 -enddate -noout -in client.crt
- Renew certificate
- If using Vault, check renewal is working
Certificate Verification Failed
Error: SSL error: certificate verify failed
- Verify CA certificate is correct
- Check certificate chain is complete
- Verify server hostname matches certificate
Permission Denied on Key File
Error: could not load private key file
- Check file permissions: chmod 600 client.key
- Verify file is readable by process user
Key Doesn’t Match Certificate
Error: key values mismatch
- Verify key matches certificate by comparing modulus hashes:
openssl x509 -noout -modulus -in client.crt | md5sum
openssl rsa -noout -modulus -in client.key | md5sum
Wrong Password for Encrypted Key
Error: bad decrypt
- Verify password is correct
- Check key file format (PKCS#8 vs traditional)
Security Best Practices
- Protect private keys - Use encrypted keys, restrict file permissions
- Use short-lived certificates - Rotate frequently
- Verify server certificates - Always use verify-full
- Audit certificate usage - Log certificate fingerprints
- Use separate certificates - Don’t share between environments
CLI Usage
# Using environment variables
export EPOCH_SSL_CERT="/path/to/client.crt"
export EPOCH_SSL_KEY="/path/to/client.key"
export EPOCH_SSL_ROOTCERT="/path/to/ca.crt"
epoch init my-repo \
--metadata-url "postgresql://user@db.example.com/horizon_epoch?sslmode=verify-full"
Next Steps
- Vault Integration - Dynamic certificate issuance
- SSH Tunnels - Alternative secure connection
- Credential Providers - Overview of auth methods
CLI Reference
Complete reference for the epoch command-line interface.
Global Options
| Option | Description |
|---|---|
-C, --directory <PATH> | Run as if started in PATH |
--format <FORMAT> | Output format: text, json |
-q, --quiet | Suppress non-essential output |
-v, --verbose | Enable verbose output |
--config <FILE> | Path to config file |
--version | Show version |
--help | Show help |
Commands
epoch init
Initialize a new repository.
epoch init <NAME> [OPTIONS]
Arguments:
- <NAME> - Repository name (required)
Options:
- -d, --description <DESC> - Repository description
- -b, --default-branch <BRANCH> - Default branch name (default: main)
- --metadata-url <URL> - Metadata database URL
Examples:
epoch init my-repo
epoch init my-repo --description "Production data"
epoch init my-repo --metadata-url "postgresql://localhost/horizon_epoch"
epoch branch
Manage branches.
epoch branch [SUBCOMMAND]
Subcommands:
epoch branch list
List all branches.
epoch branch list
epoch branch list --format json
epoch branch create
Create a new branch.
epoch branch create <NAME> [--from <SOURCE>]
Options:
- --from <SOURCE> - Source branch (default: current branch)
Examples:
epoch branch create feature-x
epoch branch create hotfix --from production
epoch branch delete
Delete a branch.
epoch branch delete <NAME> [--force]
Options:
- -f, --force - Force delete even if unmerged
epoch checkout
Switch to a branch.
epoch checkout <BRANCH>
Examples:
epoch checkout main
epoch checkout feature/new-schema
epoch commit
Create a new commit.
epoch commit -m <MESSAGE> [OPTIONS]
Options:
- -m, --message <MSG> - Commit message (required)
- -a, --all - Commit all changed tables
- --tables <TABLES> - Comma-separated list of tables to commit
- --author <NAME> - Override author name
- --email <EMAIL> - Override author email
Examples:
epoch commit -m "Add user preferences"
epoch commit -m "Update orders" --tables orders,order_items
epoch merge
Merge a branch into current branch.
epoch merge <BRANCH> [OPTIONS]
Options:
- --strategy <STRATEGY> - Merge strategy: default, ours, theirs
- --no-commit - Merge but don’t commit
- --message <MSG> - Custom merge commit message
- --continue - Continue after conflict resolution
- --abort - Abort in-progress merge
Examples:
epoch merge feature/updates
epoch merge feature/updates --strategy theirs
epoch merge --continue
epoch merge --abort
epoch diff
Show differences between branches or commits.
epoch diff [SOURCE] [TARGET] [OPTIONS]
Arguments:
- <SOURCE> - Source branch/commit (default: HEAD)
- <TARGET> - Target branch/commit
Options:
- --table <TABLE> - Show diff for specific table
- --stat - Show statistics only
- --name-only - Show only changed table names
Examples:
epoch diff # Uncommitted changes
epoch diff main # Current branch vs main
epoch diff main feature/updates # Between two branches
epoch diff abc123f def456g # Between commits
epoch diff --stat # Summary only
epoch log
Show commit history.
epoch log [OPTIONS]
Options:
- -n, --max-count <N> - Limit output to N commits
- --oneline - One line per commit
- --since <DATE> - Show commits since date
- --until <DATE> - Show commits until date
- --author <PATTERN> - Filter by author
- --table <TABLE> - Show commits affecting table
Examples:
epoch log
epoch log --oneline -10
epoch log --since "2024-01-01"
epoch log --table users
epoch status
Show working tree status.
epoch status [OPTIONS]
Options:
- --short - Short format
- --porcelain - Machine-readable format
Examples:
epoch status
epoch status --short
epoch table
Manage tables.
epoch table [SUBCOMMAND]
epoch table add
Register a table for tracking.
epoch table add <NAME> --location <LOCATION> [OPTIONS]
Options:
- --location <URL> - Table location (required)
- --primary-key <COLS> - Primary key columns
- --storage <NAME> - Storage backend name (alternative to location URL)
Location Formats by Backend:
| Backend | Format | Example |
|---|---|---|
| PostgreSQL | postgresql://[user:pass@]host[:port]/database/schema.table | postgresql://localhost/mydb/public.users |
| MySQL | mysql://[user:pass@]host[:port]/database/table | mysql://localhost/analytics/events |
| SQL Server | mssql://[user:pass@]host[:port]/database/schema.table | mssql://sqlserver/warehouse/dbo.orders |
| SQLite | sqlite:///path/to/file.db/table | sqlite:///data/local.db/users |
| S3 | s3://bucket/prefix/path | s3://datalake/delta/events |
| Azure Blob | azure://account/container/prefix | azure://myaccount/data/tables/users |
| GCS | gs://bucket/prefix/path | gs://my-bucket/delta/products |
| Local | file:///path/to/directory | file:///data/parquet/users |
Examples:
# PostgreSQL
epoch table add users --location "postgresql://localhost/mydb/public.users" \
--primary-key id
# MySQL
epoch table add orders --location "mysql://mysql-host/shop/orders" \
--primary-key order_id
# SQL Server
epoch table add inventory --location "mssql://sqlserver/warehouse/dbo.inventory" \
--primary-key sku
# SQLite
epoch table add config --location "sqlite:///data/app.db/config" \
--primary-key key
# S3/Delta Lake
epoch table add events --location "s3://datalake/delta/events" \
--primary-key event_id
# Azure Blob
epoch table add telemetry --location "azure://myaccount/data/telemetry" \
--primary-key id
# GCS
epoch table add products --location "gs://my-bucket/delta/products" \
--primary-key product_id
# Local filesystem
epoch table add local_data --location "file:///data/parquet/sales" \
--primary-key transaction_id
# Using storage backend name (when storage is already registered)
epoch table add users --storage prod-pg --primary-key id
epoch table list
List registered tables.
epoch table list [--format json]
epoch table remove
Unregister a table.
epoch table remove <NAME> [--force]
epoch tag
Manage tags.
epoch tag [SUBCOMMAND]
epoch tag create
Create a tag.
epoch tag <NAME> [COMMIT] [OPTIONS]
Options:
- -m, --message <MSG> - Tag message
Examples:
epoch tag v1.0.0
epoch tag v1.0.0 abc123f -m "Release 1.0.0"
epoch tag list
List all tags.
epoch tag list
epoch tag delete
Delete a tag.
epoch tag delete <NAME>
epoch reset
Reset current branch to a commit.
epoch reset [OPTIONS] <COMMIT>
Options:
- --soft - Keep changes staged
- --hard - Discard all changes
Examples:
epoch reset --soft HEAD~1
epoch reset --hard v1.0.0
epoch revert
Create a commit that undoes a previous commit.
epoch revert <COMMIT> [OPTIONS]
Options:
- --no-commit - Revert but don’t commit
Examples:
epoch revert abc123f
epoch revert HEAD
epoch stash
Stash uncommitted changes.
epoch stash [SUBCOMMAND]
epoch stash push
Save changes to stash.
epoch stash push [-m <MESSAGE>]
epoch stash pop
Apply and remove latest stash.
epoch stash pop [<STASH>]
epoch stash list
List stashes.
epoch stash list
epoch storage
Manage storage backends.
epoch storage [SUBCOMMAND]
epoch storage add
Register a new storage backend.
epoch storage add <NAME> --type <TYPE> [OPTIONS]
Arguments:
- <NAME> - Unique name for the storage backend (required)
Options:
- --type <TYPE> - Backend type (required): postgresql, mysql, mssql, sqlite, s3, azureblob, gcs, local
- --description <DESC> - Optional description
- --set-default - Set as the default storage backend
PostgreSQL/MySQL/MSSQL Options:
- --host <HOST> - Database host (required)
- --port <PORT> - Database port (PostgreSQL: 5432, MySQL: 3306, MSSQL: 1433)
- --database <DB> - Database name (required)
- --schema <SCHEMA> - Schema name (PostgreSQL only)
- --username <USER> - Username
- --ssl-mode <MODE> - SSL mode
SQLite Options:
- --database <PATH> - Database file path (required)
S3 Options:
- --bucket <BUCKET> - Bucket name (required)
- --region <REGION> - AWS region
- --endpoint <URL> - Custom endpoint (for MinIO, etc.)
- --path-prefix <PREFIX> - Key prefix
- --path-style - Use path-style URLs
Azure Blob Options:
- --account <ACCOUNT> - Storage account name (required)
- --container <CONTAINER> - Container name (required)
- --path-prefix <PREFIX> - Blob prefix
GCS Options:
- --bucket <BUCKET> - Bucket name (required)
- --project <PROJECT> - GCP project ID
- --path-prefix <PREFIX> - Object prefix
Local Filesystem Options:
- --base-path <PATH> - Base directory path (required)
Credential Options:
- --credentials <TYPE> - Credential resolution: environment, aws-profile, gcp-service-account, azure-managed-identity, vault
- --profile <NAME> - AWS profile name (with aws-profile)
- --key-file <PATH> - GCP service account key file (with gcp-service-account)
- --client-id <ID> - Azure client ID (with azure-managed-identity)
- --vault-path <PATH> - Vault secret path (with vault)
- --vault-role <ROLE> - Vault role name (with vault)
Examples:
# PostgreSQL
epoch storage add prod-pg --type postgresql \
--host db.example.com --database production
# MySQL
epoch storage add analytics-mysql --type mysql \
--host mysql.example.com --port 3306 --database analytics
# SQL Server
epoch storage add warehouse-mssql --type mssql \
--host sqlserver.example.com --database warehouse
# SQLite
epoch storage add local-sqlite --type sqlite --database ./data/local.db
# S3
epoch storage add datalake --type s3 \
--bucket company-datalake --region us-west-2
# S3-compatible (MinIO)
epoch storage add dev-minio --type s3 \
--bucket local-data --endpoint http://localhost:9000 --path-style
# Azure Blob
epoch storage add azure-store --type azureblob \
--account myaccount --container data
# GCS
epoch storage add gcs-store --type gcs \
--bucket my-bucket --project my-project
# Local Filesystem
epoch storage add local-fs --type local --base-path /data/epoch
# With credentials
epoch storage add prod-s3 --type s3 --bucket prod-data \
--credentials aws-profile --profile production
epoch storage list
List configured storage backends.
epoch storage list [--type <TYPE>] [--format json]
Options:
- --type <TYPE> - Filter by backend type
epoch storage show
Show detailed information about a storage backend.
epoch storage show <NAME>
epoch storage update
Update a storage backend configuration.
epoch storage update <NAME> [OPTIONS]
Options:
- --description <DESC> - Update description
- --disable - Disable the backend
- --enable - Enable the backend
epoch storage remove
Remove a storage backend.
epoch storage remove <NAME> [--force]
epoch storage default
Manage the default storage backend.
epoch storage default set <NAME> # Set default
epoch storage default show # Show current default
epoch storage default clear # Clear default
epoch storage test-credentials
Test credential provider for a storage backend.
epoch storage test-credentials <NAME>
epoch storage test-credentials --all
epoch storage credentials
Show credential status for storage backends.
epoch storage credentials <NAME>
epoch storage credentials --all
epoch storage tunnels
Show SSH tunnel status for storage backends.
epoch storage tunnels [NAME]
Schema Commands
epoch schema show
Display schema for a table at a specific commit or branch.
epoch schema show <TABLE> [OPTIONS]
Arguments:
- <TABLE> - Table name (required)
Options:
- --at <REF> - Show schema at specific commit, branch, or tag
- --live - Show live database schema (not stored snapshot)
- --generate-ddl - Output as PostgreSQL DDL
- --format <FORMAT> - Output format: text, json
Examples:
# Show current schema
epoch schema show users
# Show schema at specific commit
epoch schema show users --at abc123f
# Show schema on a branch
epoch schema show users --at feature/new-fields
# Compare with live database
epoch schema show users --live
# Generate DDL from schema
epoch schema show users --generate-ddl
epoch schema diff
Compare schemas between two refs and optionally generate migration DDL.
epoch schema diff <TABLE> <SOURCE> <TARGET> [OPTIONS]
Arguments:
- <TABLE> - Table name (required)
- <SOURCE> - Source ref (branch, commit, or tag)
- <TARGET> - Target ref (branch, commit, or tag)
Options:
- --show-ddl - Show DDL migration statements
- --ddl-output <FILE> - Write DDL to file
- --format <FORMAT> - Output format: text, json, flyway, liquibase
- --columns-only - Show only column changes
- --constraints-only - Show only constraint changes
- --indexes-only - Show only index changes
Examples:
# Compare schemas between branches
epoch schema diff users main feature/updates
# Generate migration DDL
epoch schema diff users main feature/updates --show-ddl
# Write DDL to file
epoch schema diff users main feature/updates --ddl-output migrations/V2.sql
# Generate Flyway-compatible migration
epoch schema diff users main feature/updates --format flyway --ddl-output db/migration/V2__update_users.sql
epoch schema apply
Apply schema changes from a ref to the live database.
epoch schema apply <TABLE> --from <REF> [OPTIONS]
Arguments:
- <TABLE> - Table name (required)
Options:
- --from <REF> - Source ref with desired schema (required)
- --dry-run - Show what would be executed without applying
- --force - Allow destructive operations
- --online - Use CONCURRENTLY for index operations
- --lock-timeout <DURATION> - Lock timeout (default: 30s)
- --statement-timeout <DURATION> - Statement timeout (default: 5m)
- --retries <N> - Number of retries on failure (default: 3)
- --no-transaction - Run statements individually (required for CONCURRENTLY)
Examples:
# Preview changes (dry-run)
epoch schema apply users --from feature/updates --dry-run
# Apply schema changes
epoch schema apply users --from feature/updates
# Apply with online mode for large tables
epoch schema apply users --from feature/updates --online
# Apply with custom timeouts
epoch schema apply users --from feature/updates \
--lock-timeout 10s \
--statement-timeout 10m
# Force destructive changes (after backup)
epoch schema apply users --from feature/updates --force
epoch schema status
Show schema drift status for tables.
epoch schema status [TABLES...] [OPTIONS]
Arguments:
- [TABLES...] - Specific tables to check (optional, all if omitted)
Options:
- --check-bloat - Also check table bloat
- --check-indexes - Check index validity
- --format <FORMAT> - Output format: text, json
Examples:
# Check all tables
epoch schema status
# Check specific tables
epoch schema status users orders
# Check with bloat analysis
epoch schema status --check-bloat
# JSON output for automation
epoch schema status --format json
Output:
Table | Latest Schema | Historical | Live DB | Status
-----------|---------------|------------|----------|--------
users | Yes | Yes | Synced | OK
orders | Yes | Partial | Synced | OK
products | Yes | No | Drifted | Needs sync
events | No | No | N/A | S3 table
epoch migrate-schemas
Backfill schema data for existing commits.
epoch migrate-schemas [OPTIONS]
Options:
- --backfill - Backfill missing schemas from live database
- --all - Process all tables
- --table <TABLE> - Process specific table
- --since <DATE> - Only backfill commits since date
- --skip-inaccessible - Skip tables that can’t be introspected
- --import-from <TOOL> - Import from external tool (flyway, liquibase)
- --dry-run - Show what would be done
Examples:
# Backfill all tables (dry-run)
epoch migrate-schemas --backfill --all --dry-run
# Backfill specific table
epoch migrate-schemas --backfill --table users
# Backfill since a date
epoch migrate-schemas --backfill --all --since "2024-01-01"
# Import from Flyway migrations
epoch migrate-schemas --import-from flyway ./db/migration
epoch conflicts
Manage merge conflicts.
epoch conflicts [SUBCOMMAND]
epoch conflicts show
Show current conflicts.
epoch conflicts show [--format json]
epoch conflicts resolve
Resolve conflicts.
epoch conflicts resolve [OPTIONS]
Options:
- --interactive - Interactive resolution
- --ours - Accept our version
- --theirs - Accept their version
- --table <TABLE> - Resolve specific table
- --record <ID> - Resolve specific record
- --field <FIELD> - Resolve specific field
- --value <VALUE> - Set specific value
epoch query
Run a query against the repository.
epoch query <SQL> [OPTIONS]
Options:
- --branch <BRANCH> - Query specific branch
- --at <REF> - Query at commit/tag/time
- --format <FMT> - Output format: table, json, csv
Examples:
epoch query "SELECT * FROM users"
epoch query "SELECT COUNT(*) FROM orders" --branch staging
epoch query "SELECT * FROM users" --at v1.0.0 --format json
epoch local
Manage local development environment.
epoch local [SUBCOMMAND]
epoch local start
Start local services.
epoch local start
epoch local stop
Stop local services.
epoch local stop
epoch local status
Show local service status.
epoch local status
Configuration File
~/.epoch/config.toml:
[defaults]
format = "text"
author_name = "Your Name"
author_email = "you@example.com"
[metadata]
url = "postgresql://localhost/horizon_epoch"
Environment Variables
| Variable | Description |
|---|---|
EPOCH_METADATA_URL | Metadata database URL |
EPOCH_CONFIG_FILE | Config file path |
EPOCH_LOG_LEVEL | Log level (debug, info, warn, error) |
EPOCH_FORMAT | Default output format |
Python SDK Reference
Complete reference for the Horizon Epoch Python SDK.
Installation
The Python SDK requires building from source (see Installation Guide).
# From project root
cd python && uv sync
uv run maturin develop --manifest-path ../crates/horizon-epoch-py/Cargo.toml
Quick Start
from horizon_epoch import Client, ClientConfig, Author
# Create client configuration
config = ClientConfig(
metadata_url="postgresql://localhost/horizon_epoch"
)
# Use async context manager for connection management
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Initialize a repository
await client.init("my-data-repo")
# Create a branch
await client.branch("feature-x")
# Make changes and commit
await client.commit("Initial commit")
Classes
ClientConfig
Configuration for the Horizon Epoch client.
from horizon_epoch import ClientConfig, Author
config = ClientConfig(
metadata_url="postgresql://user:pass@localhost/horizon_epoch",
default_branch="main",
author=Author(name="Dev", email="dev@example.com"),
timeout_seconds=30.0,
retry_count=3,
log_level="INFO",
)
| Parameter | Type | Default | Description |
|---|---|---|---|
metadata_url | str | required | Metadata database URL |
default_branch | str | "main" | Default branch name |
author | Author | None | Default commit author |
timeout_seconds | float | 30.0 | Operation timeout |
retry_count | int | 3 | Retry count for transient failures |
log_level | str | "INFO" | Logging level |
Client
Main client for interacting with Horizon Epoch.
Connection Management
from horizon_epoch import Client
# Option 1: Async context manager (recommended)
async with Client.connect("postgresql://...") as client:
await client.init("my-repo")
# Option 2: Manual connection management
client = Client.from_url("postgresql://...")
await client.open()
try:
await client.init("my-repo")
finally:
await client.close()
Repository Operations
# Initialize a new repository
config = await client.init(
name="my-repo",
storage_configs={"primary": {"backend": "postgresql", "url": "..."}},
description="My data repository"
)
# Open an existing repository
config = await client.open_repository("my-repo")
# Get repository status
status = await client.status(branch="main")
Branch Operations
# Create a branch
branch = await client.branch(
name="feature/new-schema",
start_point="main", # Optional, defaults to default branch
checkout=True # Optional, defaults to True
)
# List all branches
branches = await client.branches()
# Get a specific branch
branch = await client.get_branch("main")
# Checkout a branch
status = await client.checkout("feature/new-schema")
# Delete a branch
await client.delete_branch("old-feature", force=False)
# Compare branches
comparison = await client.compare_branches("feature-x", "main")
print(f"Ahead: {comparison.ahead_count}, Behind: {comparison.behind_count}")
Commit Operations
from horizon_epoch import Author
# Create a commit
commit = await client.commit(
message="Add user preferences",
author=Author(name="Dev", email="dev@example.com"),
branch="feature-x", # Optional
tables=["users"] # Optional, defaults to all staged
)
# View commit history
log = await client.log(ref="main", limit=10, skip=0)
for entry in log.entries:
print(f"{entry.commit.short_id}: {entry.commit.message}")
# Get commit details
commit = await client.show("abc123")
Diff Operations
# Diff between branches
diff = await client.diff(
base="main",
target="feature-x",
tables=["users"], # Optional
include_record_diff=True # Optional
)
for table_diff in diff.table_diffs:
print(f"{table_diff.table_name}: {table_diff.total_changes} changes")
Merge Operations
from horizon_epoch import MergeStrategy
# Merge a branch
result = await client.merge(
source="feature-x",
target="main", # Optional, defaults to default branch
strategy=MergeStrategy.THREE_WAY, # Optional
message="Merge feature-x", # Optional
dry_run=False # Optional
)
if result.has_conflicts:
print(f"Conflicts: {result.conflict_count}")
else:
print(f"Merged: {result.result_commit_id}")
# Abort a merge
await client.abort_merge(conflict_id="...")
Storage Operations
Horizon Epoch supports 8 storage backend types across databases, object storage, and filesystems.
from horizon_epoch import StorageBackend
# =============================================================================
# DATABASE BACKENDS
# =============================================================================
# PostgreSQL
backend_id = await client.add_storage(
name="postgres-warehouse",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://user:pass@localhost:5432/mydb"},
description="PostgreSQL data warehouse"
)
# MySQL
backend_id = await client.add_storage(
name="mysql-analytics",
backend=StorageBackend.MYSQL,
config={"url": "mysql://user:pass@localhost:3306/analytics"},
description="MySQL analytics database"
)
# Microsoft SQL Server
backend_id = await client.add_storage(
name="mssql-legacy",
backend=StorageBackend.MSSQL,
config={"url": "mssql://user:pass@localhost:1433/legacy_db"},
description="SQL Server legacy system"
)
# SQLite
backend_id = await client.add_storage(
name="sqlite-local",
backend=StorageBackend.SQLITE,
config={"path": "/data/local.db"},
description="Local SQLite database"
)
# =============================================================================
# OBJECT STORAGE BACKENDS
# =============================================================================
# AWS S3 / S3-compatible storage
backend_id = await client.add_storage(
name="s3-datalake",
backend=StorageBackend.S3,
config={
"bucket": "my-datalake",
"region": "us-west-2",
"endpoint": None, # Use AWS default, or specify for MinIO/LocalStack
},
description="S3 data lake"
)
# Azure Blob Storage
backend_id = await client.add_storage(
name="azure-warehouse",
backend=StorageBackend.AZURE,
config={
"account": "mystorageaccount",
"container": "data-warehouse",
},
description="Azure Blob data warehouse"
)
# Google Cloud Storage
backend_id = await client.add_storage(
name="gcs-analytics",
backend=StorageBackend.GCS,
config={
"bucket": "my-analytics-bucket",
"project": "my-gcp-project",
},
description="GCS analytics storage"
)
# =============================================================================
# FILESYSTEM BACKEND
# =============================================================================
# Local filesystem
backend_id = await client.add_storage(
name="local-dev",
backend=StorageBackend.LOCAL,
config={"path": "/data/epoch-storage"},
description="Local development storage"
)
# =============================================================================
# STORAGE MANAGEMENT
# =============================================================================
# List all storage backends
backends = await client.list_storage()
for name, backend_type in backends.items():
print(f"{name}: {backend_type.value}")
# Get storage details
config = await client.get_storage("s3-datalake")
print(f"Backend: {config.name}, Type: {config.backend_type}")
# Get detailed storage info
backends_info = await client.list_storage_info()
for backend in backends_info:
status = "default" if backend.is_default else "enabled"
print(f"{backend.name} ({backend.backend_type}): {status}")
# Remove storage backend
await client.remove_storage("old-storage")
Table Operations
Tables are tracked using StorageLocation objects that specify where the data lives. The native module provides factory methods for all 8 storage backends.
# Access native module for StorageLocation
from horizon_epoch.client import _native
# =============================================================================
# DATABASE STORAGE LOCATIONS
# =============================================================================
# PostgreSQL: connection_name, schema, table
loc = _native.StorageLocation.postgresql("main", "public", "users")
reg = await client.track_table("users", loc)
# MySQL: connection_name, database, table
loc = _native.StorageLocation.mysql("analytics", "sales_db", "orders")
reg = await client.track_table("orders", loc)
# Microsoft SQL Server: connection_name, database, schema, table
loc = _native.StorageLocation.mssql("legacy", "warehouse", "dbo", "customers")
reg = await client.track_table("customers", loc)
# SQLite: connection_name, table
loc = _native.StorageLocation.sqlite("local-db", "products")
reg = await client.track_table("products", loc)
# =============================================================================
# OBJECT STORAGE LOCATIONS
# =============================================================================
# S3: bucket, prefix, format (parquet, delta, csv)
loc = _native.StorageLocation.s3("my-datalake", "tables/events/", format="parquet")
reg = await client.track_table("events", loc)
# S3 with Delta Lake format
loc = _native.StorageLocation.s3("my-datalake", "delta/transactions/", format="delta")
reg = await client.track_table("transactions", loc)
# Azure Blob Storage: account, container, prefix, format
loc = _native.StorageLocation.azure_blob(
"mystorageaccount", "warehouse", "data/metrics/", format="parquet"
)
reg = await client.track_table("metrics", loc)
# Google Cloud Storage: bucket, prefix, format
loc = _native.StorageLocation.gcs("my-analytics", "tables/sessions/", format="parquet")
reg = await client.track_table("sessions", loc)
# =============================================================================
# FILESYSTEM STORAGE LOCATIONS
# =============================================================================
# Local filesystem: path, format
loc = _native.StorageLocation.local_filesystem("/data/exports/reports/", format="parquet")
reg = await client.track_table("reports", loc)
# =============================================================================
# TABLE MANAGEMENT
# =============================================================================
# View registration result
print(f"Tracked: {reg.table_name}")
print(f"Content hash: {reg.content_hash}")
print(f"Record count: {reg.record_count}")
print(f"Location: {reg.storage_location_uri}")
# List all tracked tables
tables = await client.list_tables()
for table in tables:
print(f" - {table}")
# Untrack a table (data is preserved, only tracking is removed)
await client.untrack_table("old-table")
Staging Operations
# Stage all changes
result = await client.stage_all(branch="main")
print(f"Staged {result.tables_affected} tables")
# Stage specific table
await client.stage("users", branch="main")
# Unstage all
result = await client.unstage_all(branch="main")
# Unstage specific table
await client.unstage("users", branch="main")
Constraint Operations
# Get constraints for a table
constraints = await client.get_constraints("users", branch="main")
print(f"Foreign keys: {len(constraints.foreign_keys)}")
print(f"Unique: {len(constraints.unique_constraints)}")
# Add a constraint (using native types)
from horizon_epoch.client import _native
constraint = _native.UniqueConstraint("uq_email", ["email"])
await client.add_constraint("users", constraint)
# Remove a constraint
await client.remove_constraint("users", "uq_email")
# Diff constraints between branches
diff = await client.diff_constraints("main", "feature-x", table_name="users")
# Add an index
col = _native.IndexColumn("email")
idx = _native.IndexDefinition("idx_email", [col])
await client.add_index("users", idx)
# Remove an index
await client.remove_index("users", "idx_email")
Models
Author
from horizon_epoch import Author
author = Author(name="Jane Developer", email="jane@example.com")
Branch
from horizon_epoch import Branch
# Returned from client.branch(), client.branches(), client.get_branch()
branch.name # str
branch.head_commit_id # str
branch.created_at # datetime
branch.is_default # bool
Commit
from horizon_epoch import Commit
# Returned from client.commit(), client.show()
commit.id # str (full commit ID)
commit.short_id # str (abbreviated)
commit.message # str
commit.author # Author
commit.timestamp # datetime
commit.parent_ids # List[str]
MergeResult
from horizon_epoch import MergeResult
result.source_branch # str
result.target_branch # str
result.is_fast_forward # bool
result.has_conflicts # bool
result.conflict_count # int
result.result_commit_id # Optional[str]
result.conflicts # List[RecordConflict]
Status
from horizon_epoch import Status
status.branch # Optional[str]
status.commit_id # Optional[str]
status.is_detached # bool
status.staged_tables # List[str]
status.unstaged_tables # List[str]
Enums
from horizon_epoch import MergeStrategy, StorageBackend
# Merge strategies
MergeStrategy.THREE_WAY # Standard three-way merge
MergeStrategy.OURS # Keep target branch version
MergeStrategy.THEIRS # Keep source branch version
MergeStrategy.MANUAL # Fail on conflicts
# Storage backends - Database
StorageBackend.POSTGRESQL # PostgreSQL database
StorageBackend.MYSQL # MySQL database
StorageBackend.MSSQL # Microsoft SQL Server
StorageBackend.SQLITE # SQLite database
# Storage backends - Object Storage
StorageBackend.S3 # AWS S3 / S3-compatible storage
StorageBackend.AZURE # Azure Blob Storage
StorageBackend.GCS # Google Cloud Storage
# Storage backends - Filesystem
StorageBackend.LOCAL # Local filesystem
Exceptions
from horizon_epoch import (
HorizonEpochError, # Base exception
RepositoryNotInitializedError,
BranchNotFoundError,
BranchAlreadyExistsError,
CommitNotFoundError,
MergeConflictError,
ConnectionFailedError,
StorageError,
ValidationError,
)
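All of these derive from HorizonEpochError, so you can catch broadly or narrowly. A short sketch of handling merge conflicts and missing branches around the SDK calls shown above:
# Handling SDK exceptions around a merge (uses the classes listed above).
from horizon_epoch import Client, MergeConflictError, BranchNotFoundError, HorizonEpochError

async def merge_feature(client: Client, branch: str) -> None:
    try:
        result = await client.merge(source=branch)
        print(f"Merged {branch}: {result.result_commit_id}")
    except MergeConflictError as exc:
        # Conflicts must be resolved (or the merge aborted) before retrying.
        print(f"Merge of {branch} has conflicts: {exc}")
    except BranchNotFoundError:
        print(f"Branch {branch} does not exist")
    except HorizonEpochError as exc:
        # Fallback for any other SDK error.
        print(f"Horizon Epoch error: {exc}")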
Async Support
All client methods are async and should be awaited:
import asyncio
from horizon_epoch import Client
async def main():
async with Client.connect("postgresql://...") as client:
await client.init("my-repo")
await client.branch("feature")
await client.commit("Initial commit")
asyncio.run(main())
Native Module Access
For advanced operations, access the native Rust module directly:
from horizon_epoch.client import (
_native, # Native module (if available)
is_native_available, # Check if native bindings loaded
get_native_version, # Get native module version
get_native_info, # Get diagnostic info
require_native, # Get native module or raise ImportError
)
if is_native_available():
print(f"Native version: {get_native_version()}")
# StorageLocation factory methods for all 8 backends
# Database backends
pg_loc = _native.StorageLocation.postgresql("conn", "schema", "table")
mysql_loc = _native.StorageLocation.mysql("conn", "database", "table")
mssql_loc = _native.StorageLocation.mssql("conn", "database", "schema", "table")
sqlite_loc = _native.StorageLocation.sqlite("conn", "table")
# Object storage backends
s3_loc = _native.StorageLocation.s3("bucket", "prefix/", format="parquet")
azure_loc = _native.StorageLocation.azure_blob("account", "container", "prefix/", format="parquet")
gcs_loc = _native.StorageLocation.gcs("bucket", "prefix/", format="parquet")
# Filesystem backend
local_loc = _native.StorageLocation.local_filesystem("/path/to/data/", format="parquet")
# Constraint types
unique = _native.UniqueConstraint("uq_email", ["email"])
check = _native.CheckConstraint("ck_age", "age >= 0")
fk = _native.ForeignKeyConstraint("fk_user", ["user_id"], "users", ["id"])
# Index types
col = _native.IndexColumn("email")
idx = _native.IndexDefinition("idx_email", [col])
Example Usage
import asyncio
from horizon_epoch import Client, Author, MergeStrategy
async def main():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Initialize repository
await client.init("my-data-repo")
# Create and checkout feature branch
await client.branch("feature/new-schema")
# Make changes (via your data pipeline)
# ...
# Stage and commit
await client.stage_all()
await client.commit(
message="Add user preferences",
author=Author(name="Dev", email="dev@example.com")
)
# Merge back to main
result = await client.merge(
source="feature/new-schema",
strategy=MergeStrategy.THREE_WAY
)
if result.has_conflicts:
print(f"Resolve {result.conflict_count} conflicts")
else:
print(f"Merged successfully: {result.result_commit_id}")
asyncio.run(main())
REST API Reference
The Horizon Epoch REST API provides HTTP endpoints for all version control operations. This reference documents all available endpoints, their parameters, and response formats.
Overview
Base URL
http://localhost:8000/api/v1
The API prefix is configurable via the EPOCH_API_PREFIX environment variable (default: /api/v1).
Content Type
All requests and responses use JSON format:
Content-Type: application/json
Running the API Server
# Start the server
uv run uvicorn horizon_epoch.api:app --reload --host 0.0.0.0 --port 8000
# With debug mode (enables Swagger UI)
EPOCH_DEBUG=true uv run uvicorn horizon_epoch.api:app --reload
Authentication
Current Status: The API does not currently implement authentication. All endpoints are open.
When authentication is implemented, the following methods will be supported:
Planned: API Key Authentication
curl -H "Authorization: Bearer <api-key>" \
http://localhost:8000/api/v1/repositories
Planned: Basic Auth
curl -u username:password \
http://localhost:8000/api/v1/repositories
For production deployments, we recommend placing the API behind a reverse proxy (nginx, Traefik) that handles authentication.
Health Endpoints
Health check endpoints are available at the root level (not under /api/v1).
Health Check
Check if the API server is running.
GET /health
Response:
{
"status": "healthy",
"version": "0.1.0",
"connected": true
}
curl Example:
curl http://localhost:8000/health
Readiness Check
Check if the API is ready to handle requests.
GET /ready
Response:
{
"ready": true,
"metadata_connected": true,
"repository": "my-repo"
}
curl Example:
curl http://localhost:8000/ready
API Root
Get basic API information.
GET /
Response:
{
"name": "Horizon Epoch API",
"version": "0.1.0",
"api_prefix": "/api/v1",
"docs": "/docs"
}
curl Example:
curl http://localhost:8000/
Repository Endpoints
Manage repositories and their configuration.
Create Repository
Initialize a new Horizon Epoch repository.
POST /api/v1/repositories
Request Body:
{
"name": "my-data-repo",
"storage_configs": {
"primary": {
"backend": "postgresql",
"url": "postgresql://user:pass@localhost/mydb"
}
}
}
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Repository name (min 1 character) |
storage_configs | object | No | Optional storage backend configurations |
Response: 201 Created
{
"name": "my-data-repo",
"default_branch": "main",
"metadata_url": "postgresql://localhost/horizon_epoch"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/repositories \
-H "Content-Type: application/json" \
-d '{"name": "my-data-repo"}'
Open Repository
Open an existing repository.
POST /api/v1/repositories/{name}/open
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
name | string | Repository name |
Response: 200 OK
{
"name": "my-data-repo",
"default_branch": "main",
"metadata_url": "postgresql://localhost/horizon_epoch"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/open
Get Repository Status
Get the current status of the repository.
GET /api/v1/repositories/status
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
repository | string | No | Repository name (uses current if not specified) |
Response:
{
"branch": "main",
"commit_id": "abc123def456789",
"is_detached": false,
"is_clean": true,
"staged_count": 0,
"unstaged_count": 0,
"merge_in_progress": false
}
curl Example:
# Current repository
curl http://localhost:8000/api/v1/repositories/status
# Specific repository
curl "http://localhost:8000/api/v1/repositories/status?repository=my-data-repo"
List Tracked Tables
List all tables being tracked for version control.
GET /api/v1/repositories/tables
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
repository | string | No | Repository name |
Response:
["users", "orders", "products"]
curl Example:
curl http://localhost:8000/api/v1/repositories/tables
Add Storage Backend
Add a new storage backend to the repository.
POST /api/v1/repositories/storage
Request Body:
{
"name": "warehouse",
"backend": "postgresql",
"config": {
"url": "postgresql://user:pass@warehouse-host/analytics"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Storage configuration name |
backend | string | Yes | Backend type (see below) |
config | object | Yes | Backend-specific configuration |
Supported Backend Types:
| Backend | Value | Required Config Fields |
|---|---|---|
| PostgreSQL | postgresql | url or (host, database) |
| MySQL | mysql | url or (host, database) |
| SQL Server | mssql | url or (host, database) |
| SQLite | sqlite | path |
| S3 | s3 | bucket |
| Azure Blob | azure | account, container |
| GCS | gcs | bucket |
| Local Filesystem | local | path |
Backend Configuration Examples:
// PostgreSQL
{
"name": "prod-pg",
"backend": "postgresql",
"config": {
"host": "db.example.com",
"port": 5432,
"database": "production",
"schema": "public"
}
}
// MySQL
{
"name": "analytics-mysql",
"backend": "mysql",
"config": {
"host": "mysql.example.com",
"port": 3306,
"database": "analytics"
}
}
// SQL Server
{
"name": "warehouse-mssql",
"backend": "mssql",
"config": {
"host": "sqlserver.example.com",
"port": 1433,
"database": "warehouse"
}
}
// SQLite
{
"name": "local-sqlite",
"backend": "sqlite",
"config": {
"path": "/data/local.db"
}
}
// S3
{
"name": "datalake",
"backend": "s3",
"config": {
"bucket": "company-datalake",
"region": "us-west-2",
"prefix": "epoch/"
}
}
// Azure Blob
{
"name": "azure-store",
"backend": "azure",
"config": {
"account": "myaccount",
"container": "data",
"prefix": "epoch/"
}
}
// GCS
{
"name": "gcs-store",
"backend": "gcs",
"config": {
"bucket": "my-bucket",
"project": "my-project",
"prefix": "epoch/"
}
}
// Local Filesystem
{
"name": "local-fs",
"backend": "local",
"config": {
"path": "/data/epoch"
}
}
Response: 201 Created
{
"status": "created",
"name": "warehouse"
}
curl Examples:
# PostgreSQL
curl -X POST http://localhost:8000/api/v1/repositories/storage \
-H "Content-Type: application/json" \
-d '{
"name": "warehouse",
"backend": "postgresql",
"config": {"host": "db.example.com", "database": "warehouse"}
}'
# S3
curl -X POST http://localhost:8000/api/v1/repositories/storage \
-H "Content-Type: application/json" \
-d '{
"name": "datalake",
"backend": "s3",
"config": {"bucket": "company-datalake", "region": "us-west-2"}
}'
# Azure Blob
curl -X POST http://localhost:8000/api/v1/repositories/storage \
-H "Content-Type: application/json" \
-d '{
"name": "azure-store",
"backend": "azure",
"config": {"account": "myaccount", "container": "data"}
}'
Track Table
Start tracking a table for version control.
POST /api/v1/repositories/tables/track
Request Body:
{
"table_name": "users",
"storage_name": "primary",
"primary_key": ["id"]
}
| Field | Type | Required | Description |
|---|---|---|---|
table_name | string | Yes | Table name to track |
storage_name | string | Yes | Storage backend containing the table |
primary_key | array | Yes | Primary key column(s) |
Response: 201 Created
{
"status": "tracking",
"table": "users"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/repositories/tables/track \
-H "Content-Type: application/json" \
-d '{
"table_name": "users",
"storage_name": "primary",
"primary_key": ["id"]
}'
Branch Endpoints
Create, list, and manage branches.
Create Branch
Create a new branch.
POST /api/v1/branches
Request Body:
{
"name": "feature/add-preferences",
"start_point": "main",
"checkout": true
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Branch name (min 1 character) |
start_point | string | No | HEAD | Commit ID or branch to start from |
checkout | boolean | No | true | Whether to checkout the new branch |
Response: 201 Created
{
"name": "feature/add-preferences",
"head_commit_id": "abc123def456789",
"is_protected": false,
"created_at": "2024-12-18T10:30:00Z"
}
curl Example:
# Create and checkout a new branch
curl -X POST http://localhost:8000/api/v1/branches \
-H "Content-Type: application/json" \
-d '{"name": "feature/add-preferences"}'
# Create from specific commit without checkout
curl -X POST http://localhost:8000/api/v1/branches \
-H "Content-Type: application/json" \
-d '{"name": "hotfix/urgent", "start_point": "abc123", "checkout": false}'
List Branches
List all branches in the repository.
GET /api/v1/branches
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
repository | string | No | Repository name |
Response:
{
"branches": [
{
"name": "main",
"head_commit_id": "abc123def456789",
"is_protected": true,
"created_at": "2024-12-01T00:00:00Z"
},
{
"name": "feature/add-preferences",
"head_commit_id": "def456abc789012",
"is_protected": false,
"created_at": "2024-12-18T10:30:00Z"
}
],
"current": "main"
}
curl Example:
curl http://localhost:8000/api/v1/branches
Checkout Branch
Switch to a different branch.
POST /api/v1/branches/{name}/checkout
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
name | string | Branch name to checkout |
Response:
{
"status": "checked_out",
"branch": "feature/add-preferences",
"commit_id": "def456abc789012"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/branches/feature%2Fadd-preferences/checkout
Note: URL-encode branch names containing / (e.g., feature/xyz becomes feature%2Fxyz).
Delete Branch
Delete a branch.
DELETE /api/v1/branches/{name}
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
name | string | Branch name to delete |
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
force | boolean | No | false | Force delete even if unmerged |
repository | string | No | - | Repository name |
Response: 204 No Content
curl Example:
# Normal delete (fails if unmerged)
curl -X DELETE http://localhost:8000/api/v1/branches/feature%2Fadd-preferences
# Force delete
curl -X DELETE "http://localhost:8000/api/v1/branches/feature%2Fadd-preferences?force=true"
Compare Branches
Compare two branches to see how they’ve diverged.
GET /api/v1/branches/compare
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
source | string | Yes | Source branch |
target | string | Yes | Target branch |
repository | string | No | Repository name |
Response:
{
"source_branch": "feature/add-preferences",
"target_branch": "main",
"ahead_count": 3,
"behind_count": 1,
"can_fast_forward": false,
"merge_base_id": "abc123def456789"
}
curl Example:
curl "http://localhost:8000/api/v1/branches/compare?source=feature/add-preferences&target=main"
Merge Branch
Merge a branch into the current branch.
POST /api/v1/branches/merge
Request Body:
{
"source": "feature/add-preferences",
"strategy": "THREE_WAY",
"message": "Merge feature/add-preferences into main",
"no_commit": false
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
source | string | Yes | - | Source branch to merge from |
strategy | string | No | THREE_WAY | Merge strategy: THREE_WAY, FAST_FORWARD, OURS, THEIRS |
message | string | No | auto | Custom merge commit message |
no_commit | boolean | No | false | Stage but don’t commit merge |
Response:
{
"source_branch": "feature/add-preferences",
"target_branch": "main",
"is_fast_forward": false,
"has_conflicts": false,
"conflict_count": 0,
"result_commit_id": "ghi789jkl012345"
}
Response (with conflicts): 409 Conflict
{
"error": "MergeConflictError",
"message": "Merge conflict in 1 table(s)",
"details": {
"conflict_tables": ["users"],
"conflict_count": 3
}
}
curl Examples:
# Standard merge
curl -X POST http://localhost:8000/api/v1/branches/merge \
-H "Content-Type: application/json" \
-d '{"source": "feature/add-preferences"}'
# Fast-forward only
curl -X POST http://localhost:8000/api/v1/branches/merge \
-H "Content-Type: application/json" \
-d '{"source": "feature/add-preferences", "strategy": "FAST_FORWARD"}'
# Merge without committing
curl -X POST http://localhost:8000/api/v1/branches/merge \
-H "Content-Type: application/json" \
-d '{"source": "feature/add-preferences", "no_commit": true}'
Abort Merge
Abort an in-progress merge operation.
POST /api/v1/branches/merge/abort
Response:
{
"status": "aborted"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/branches/merge/abort
Commit Endpoints
Create commits and view commit history.
Create Commit
Create a new commit with staged changes.
POST /api/v1/commits
Request Body:
{
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com"
}
| Field | Type | Required | Description |
|---|---|---|---|
message | string | Yes | Commit message (min 1 character) |
author_name | string | No | Author name (uses config default) |
author_email | string | No | Author email (uses config default) |
Response: 201 Created
{
"id": "abc123def456789abcdef0123456789abcdef01",
"short_id": "abc123d",
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com",
"created_at": "2024-12-18T10:45:00Z",
"parent_count": 1
}
curl Example:
curl -X POST http://localhost:8000/api/v1/commits \
-H "Content-Type: application/json" \
-d '{"message": "Add user preferences feature"}'
List Commits
Get commit history.
GET /api/v1/commits
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
ref | string | No | HEAD | Branch or commit reference |
limit | integer | No | 100 | Max commits to return (1-1000) |
offset | integer | No | 0 | Offset for pagination |
repository | string | No | - | Repository name |
Response:
{
"commits": [
{
"id": "abc123def456789abcdef0123456789abcdef01",
"short_id": "abc123d",
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com",
"created_at": "2024-12-18T10:45:00Z",
"parent_count": 1
}
],
"total": 42,
"has_more": true
}
curl Examples:
# Get recent commits
curl http://localhost:8000/api/v1/commits
# Get commits from specific branch
curl "http://localhost:8000/api/v1/commits?ref=feature/add-preferences"
# Pagination
curl "http://localhost:8000/api/v1/commits?limit=10&offset=20"
Get Commit
Get details of a specific commit.
GET /api/v1/commits/{commit_id}
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
commit_id | string | Full or short commit ID |
Response:
{
"id": "abc123def456789abcdef0123456789abcdef01",
"short_id": "abc123d",
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com",
"created_at": "2024-12-18T10:45:00Z",
"parent_count": 1
}
curl Example:
curl http://localhost:8000/api/v1/commits/abc123def456789
Get Diff
Show differences between commits or branches.
GET /api/v1/commits/diff
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
base | string | No | Base commit/branch (defaults to parent) |
target | string | No | Target commit/branch (defaults to HEAD) |
repository | string | No | Repository name |
Response:
{
"base_commit_id": "abc123def456789",
"target_commit_id": "def456ghi789012",
"tables_changed": 2,
"total_inserts": 15,
"total_updates": 8,
"total_deletes": 3
}
curl Examples:
# Diff working state vs HEAD
curl http://localhost:8000/api/v1/commits/diff
# Diff between branches
curl "http://localhost:8000/api/v1/commits/diff?base=main&target=feature/add-preferences"
# Diff between commits
curl "http://localhost:8000/api/v1/commits/diff?base=abc123&target=def456"
Stage All Changes
Stage all pending changes for commit.
POST /api/v1/commits/stage
Response:
{
"status": "staged",
"count": 5
}
curl Example:
curl -X POST http://localhost:8000/api/v1/commits/stage
Unstage All Changes
Remove all changes from the staging area.
POST /api/v1/commits/unstage
Response:
{
"status": "unstaged",
"count": 5
}
curl Example:
curl -X POST http://localhost:8000/api/v1/commits/unstage
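A typical write flow chains the staging and commit endpoints. The sketch below assumes a local server and uses the Python requests library (not bundled with Horizon Epoch):
import requests

BASE = "http://localhost:8000/api/v1"

# Stage everything that is pending, then commit it
staged = requests.post(f"{BASE}/commits/stage").json()
print(f"Staged {staged['count']} changes")

commit = requests.post(
    f"{BASE}/commits",
    json={"message": "Add user preferences feature"},
)
commit.raise_for_status()
print("Created commit", commit.json()["short_id"])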
Error Responses
All errors follow a consistent format:
{
"error": "ErrorClassName",
"message": "Human-readable error description",
"details": {
"additional": "context information"
}
}
HTTP Status Codes
| Code | Meaning | When Used |
|---|---|---|
200 | OK | Successful GET/POST operations |
201 | Created | Resource successfully created |
204 | No Content | Successful DELETE operations |
400 | Bad Request | Invalid request format or parameters |
404 | Not Found | Resource not found |
409 | Conflict | Merge conflicts, duplicate resources |
422 | Unprocessable Entity | Validation errors |
503 | Service Unavailable | Database connection issues |
Error Types
| Error | HTTP Status | Description |
|---|---|---|
BranchNotFoundError | 404 | Specified branch doesn’t exist |
CommitNotFoundError | 404 | Specified commit doesn’t exist |
RepositoryNotInitializedError | 404 | Repository not initialized or opened |
BranchAlreadyExistsError | 409 | Branch name already in use |
MergeConflictError | 409 | Merge has unresolved conflicts |
ConnectionFailedError | 503 | Unable to connect to metadata database |
ValidationError | 422 | Request body validation failed |
Example Error Responses
Branch Not Found:
{
"error": "BranchNotFoundError",
"message": "Branch 'feature/nonexistent' not found",
"details": {
"branch": "feature/nonexistent"
}
}
Validation Error (request validation failures are returned in the framework's native detail format):
{
"detail": [
{
"type": "string_too_short",
"loc": ["body", "name"],
"msg": "String should have at least 1 character",
"input": "",
"ctx": {"min_length": 1}
}
]
}
Connection Error:
{
"error": "ConnectionFailedError",
"message": "Failed to connect to metadata database",
"details": {
"url": "postgresql://localhost/horizon_epoch"
}
}
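A minimal client-side handler for these formats might look like the following sketch (Python with requests; the helper name is hypothetical):
import requests

def raise_for_epoch_error(resp):
    """Hypothetical helper: turn an error payload into a Python exception."""
    if resp.status_code < 400:
        return resp
    body = resp.json()
    # Standard errors carry "error"/"message"; 422 validation errors carry "detail"
    if "error" in body:
        raise RuntimeError(f"{body['error']}: {body['message']}")
    raise RuntimeError(f"Validation failed: {body.get('detail')}")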
Pagination
List endpoints support pagination via limit and offset query parameters.
GET /api/v1/commits?limit=10&offset=20
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
limit | integer | 100 | 1-1000 | Maximum items per page |
offset | integer | 0 | 0+ | Number of items to skip |
Paginated responses include metadata:
{
"items": [...],
"total": 150,
"has_more": true
}
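For example, a client can walk the full commit history by advancing offset until has_more is false. This is a sketch only (Python requests, local server assumed):
import requests

BASE = "http://localhost:8000/api/v1"

offset, limit = 0, 100
while True:
    page = requests.get(f"{BASE}/commits",
                        params={"limit": limit, "offset": offset}).json()
    for commit in page["commits"]:
        print(commit["short_id"], commit["message"])
    if not page["has_more"]:
        break
    offset += limit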
OpenAPI Specification
The API provides auto-generated OpenAPI 3.0 documentation:
| Endpoint | Description |
|---|---|
GET /openapi.json | Raw OpenAPI specification |
GET /docs | Swagger UI (interactive, debug mode only) |
GET /redoc | ReDoc (alternative UI, debug mode only) |
Enable debug mode to access Swagger UI:
EPOCH_DEBUG=true uv run uvicorn horizon_epoch.api:app --reload
Then visit: http://localhost:8000/docs
Export OpenAPI Spec
You can export the OpenAPI specification using the provided script:
# Export as JSON
cd python
uv run python scripts/export_openapi.py --output ../docs/openapi.json
# Export as YAML
uv run python scripts/export_openapi.py --format yaml --output ../docs/openapi.yaml
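If the server is already running, the same specification can also be fetched over HTTP (availability may depend on server settings); the sketch below simply downloads /openapi.json with Python requests:
import json
import requests

spec = requests.get("http://localhost:8000/openapi.json").json()
print(spec["info"]["title"], spec["info"]["version"])
with open("openapi.json", "w") as f:
    json.dump(spec, f, indent=2)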
Environment Configuration
The API can be configured via environment variables:
| Variable | Default | Description |
|---|---|---|
EPOCH_METADATA_URL | postgresql://localhost:5432/horizon_epoch | Metadata database URL |
EPOCH_DEFAULT_BRANCH | main | Default branch name |
EPOCH_API_PREFIX | /api/v1 | API route prefix |
EPOCH_DEBUG | false | Enable debug mode (exposes Swagger UI) |
EPOCH_LOG_LEVEL | INFO | Logging level |
EPOCH_AUTHOR_NAME | - | Default author name for commits |
EPOCH_AUTHOR_EMAIL | - | Default author email for commits |
Example:
export EPOCH_METADATA_URL="postgresql://user:pass@localhost:5432/horizon_epoch"
export EPOCH_DEBUG=true
export EPOCH_AUTHOR_NAME="API User"
export EPOCH_AUTHOR_EMAIL="api@example.com"
uv run uvicorn horizon_epoch.api:app --host 0.0.0.0 --port 8000
Rate Limiting
Current Status: Rate limiting is not yet implemented.
When rate limiting is implemented, limits will be returned in response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1705312200
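Clients that want to be forward-compatible can already read these headers defensively; the sketch below (Python requests) treats them as optional, since they are not sent today:
import requests

resp = requests.get("http://localhost:8000/api/v1/commits")
remaining = resp.headers.get("X-RateLimit-Remaining")
if remaining is not None and int(remaining) == 0:
    print("Rate limited; retry after", resp.headers.get("X-RateLimit-Reset"))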
CORS
Cross-Origin Resource Sharing is enabled for all origins by default. For production, configure allowed origins appropriately.
Current configuration:
- Origins: * (all)
- Methods: All HTTP methods
- Headers: All headers
- Credentials: Allowed
See Also
- CLI Reference - Command-line interface documentation
- Python SDK - Python client library
- Configuration Reference - All configuration options
Configuration Reference
Complete reference for Horizon Epoch configuration options.
Configuration Sources
Configuration is loaded from (in order of precedence):
- Command-line arguments (highest priority)
- Environment variables
- Local repository configuration (.epoch/config)
- Configuration file (epoch.toml)
- Default values (lowest priority)
Configuration File Search Order
The CLI searches for configuration files in this order:
- ./epoch.toml (current directory)
- ./config/epoch.toml
- ./.epoch/config.toml
- ~/.config/epoch/config.toml
- /etc/epoch/config.toml
The first file found is used. Override with --config or EPOCH_CONFIG_FILE.
Configuration File
Default location: ~/.epoch/config.toml or ./epoch.toml
Full Example
# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"
pool_size = 10
connect_timeout = 30
[defaults]
branch = "main"
author_name = "Your Name"
author_email = "you@example.com"
format = "text"
[logging]
level = "info"
format = "pretty" # or "json"
file = "/var/log/epoch/epoch.log"
# PostgreSQL storage backends
[storage.postgres.production]
url = "postgresql://user:pass@prod-db.example.com/production"
pool_size = 20
connect_timeout = 30
[storage.postgres.staging]
host = "staging-db.example.com"
port = 5432
database = "staging"
username = "epoch_user"
password_file = "/run/secrets/db-password"
sslmode = "verify-full"
ssl_rootcert = "/etc/ssl/certs/ca.crt"
[storage.postgres.secure]
host = "secure-db.internal"
database = "secure"
vault_path = "secret/data/secure-db"
[storage.postgres.secure.ssh_tunnel]
host = "bastion.example.com"
user = "ubuntu"
key_file = "~/.ssh/id_rsa"
# S3 storage backends
[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
[storage.s3.partner]
bucket = "partner-shared"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/PartnerAccess"
external_id = "partner-123"
[storage.s3.minio]
bucket = "local-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"
force_path_style = true
# Vault integration
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
ca_cert = "/etc/ssl/certs/vault-ca.crt"
# AWS integration
[aws]
region = "us-east-1"
profile = "production"
# Credential caching
[credentials]
cache_enabled = true
cache_ttl = 300
refresh_before_expiry = 60
Environment Variables
Most configuration can be set via environment variables, typically using the EPOCH_ prefix:
Core Settings
| Variable | Description | Default |
|---|---|---|
EPOCH_METADATA_URL | Metadata database URL | - |
DATABASE_URL | Fallback for metadata URL (SQLx compatible) | - |
EPOCH_CONFIG_FILE | Config file path | ~/.epoch/config.toml |
EPOCH_DEFAULT_BRANCH | Default branch name | main |
EPOCH_FORMAT | Output format | text |
Author Settings
| Variable | Description | Default |
|---|---|---|
EPOCH_AUTHOR_NAME | Default author name for commits | git config user.name |
EPOCH_AUTHOR_EMAIL | Default author email for commits | git config user.email |
Author information is resolved in this priority order:
- Command-line arguments (--author-name, --author-email)
- Environment variables (EPOCH_AUTHOR_NAME, EPOCH_AUTHOR_EMAIL)
- Local repository config (.epoch/config)
- Git configuration (git config user.name, git config user.email)
Editor Settings
| Variable | Description | Default |
|---|---|---|
EPOCH_EDITOR | Editor for interactive operations | - |
EDITOR | Fallback editor | - |
VISUAL | Secondary fallback editor | - |
Editor is selected in priority order: EPOCH_EDITOR > EDITOR > VISUAL > platform default (vi on Unix, notepad.exe on Windows).
Commit Signing
| Variable | Description | Default |
|---|---|---|
EPOCH_SIGNING_KEY | Path to signing key file (ASCII-armored) | - |
EPOCH_SIGNING_KEY_PASSWORD | Password for signing key | - (prompts) |
Logging
| Variable | Description | Default |
|---|---|---|
EPOCH_LOG_LEVEL | Log level (error/warn/info/debug/trace) | info |
EPOCH_LOG_FORMAT | Log format (pretty/json) | pretty |
EPOCH_LOG_FILE | Log file path | - |
RUST_LOG | Rust logging filter (module=level format) | - |
API Server
| Variable | Description | Default |
|---|---|---|
EPOCH_API_HOST | API server bind address | 0.0.0.0 |
EPOCH_API_PORT | API server port | 8000 |
EPOCH_API_PREFIX | API route prefix | /api/v1 |
EPOCH_DEBUG | Enable debug mode (exposes Swagger UI) | false |
EPOCH_SERVER_URL | Remote server URL for CLI | - |
OAuth/OIDC Authentication
| Variable | Description |
|---|---|
EPOCH_OAUTH_CLIENT_ID | Default OAuth client ID |
EPOCH_OAUTH_CLIENT_SECRET | Default OAuth client secret |
EPOCH_OAUTH_{PROVIDER}_CLIENT_ID | Provider-specific client ID |
EPOCH_OAUTH_{PROVIDER}_CLIENT_SECRET | Provider-specific client secret |
EPOCH_OAUTH_OIDC_ISSUER | OIDC issuer URL for custom provider |
Supported providers: GITHUB, GOOGLE, AZURE, OKTA, OIDC (custom).
Example:
# GitHub OAuth
export EPOCH_OAUTH_GITHUB_CLIENT_ID="your-client-id"
export EPOCH_OAUTH_GITHUB_CLIENT_SECRET="your-client-secret"
epoch auth login --method oauth --provider github
# Custom OIDC provider
export EPOCH_OAUTH_CLIENT_ID="your-client-id"
export EPOCH_OAUTH_OIDC_ISSUER="https://your-idp.example.com"
epoch auth login --method oauth --provider oidc
PostgreSQL
| Variable | Description |
|---|---|
EPOCH_POSTGRES_{NAME}_URL | Full connection URL |
EPOCH_POSTGRES_{NAME}_HOST | Database host |
EPOCH_POSTGRES_{NAME}_PORT | Database port |
EPOCH_POSTGRES_{NAME}_DATABASE | Database name |
EPOCH_POSTGRES_{NAME}_USERNAME | Username |
EPOCH_POSTGRES_{NAME}_PASSWORD | Password |
EPOCH_POSTGRES_{NAME}_SSLMODE | SSL mode |
S3
| Variable | Description |
|---|---|
EPOCH_S3_{NAME}_BUCKET | Bucket name |
EPOCH_S3_{NAME}_REGION | AWS region |
EPOCH_S3_{NAME}_ENDPOINT | Custom endpoint |
EPOCH_S3_{NAME}_ACCESS_KEY | Access key ID |
EPOCH_S3_{NAME}_SECRET_KEY | Secret access key |
AWS_ACCESS_KEY_ID | Default AWS access key |
AWS_SECRET_ACCESS_KEY | Default AWS secret key |
AWS_DEFAULT_REGION | Default AWS region |
MySQL
| Variable | Description |
|---|---|
EPOCH_MYSQL_{NAME}_URL | Full connection URL |
EPOCH_MYSQL_{NAME}_HOST | Database host |
EPOCH_MYSQL_{NAME}_PORT | Database port |
EPOCH_MYSQL_{NAME}_DATABASE | Database name |
EPOCH_MYSQL_{NAME}_USERNAME | Username |
EPOCH_MYSQL_{NAME}_PASSWORD | Password |
SQL Server
| Variable | Description |
|---|---|
EPOCH_MSSQL_{NAME}_URL | Full connection URL |
EPOCH_MSSQL_{NAME}_HOST | Database host |
EPOCH_MSSQL_{NAME}_PORT | Database port |
EPOCH_MSSQL_{NAME}_DATABASE | Database name |
EPOCH_MSSQL_{NAME}_USERNAME | Username |
EPOCH_MSSQL_{NAME}_PASSWORD | Password |
SQLite
| Variable | Description |
|---|---|
EPOCH_SQLITE_{NAME}_PATH | Database file path |
Azure Blob Storage
| Variable | Description |
|---|---|
EPOCH_AZURE_{NAME}_ACCOUNT | Storage account name |
EPOCH_AZURE_{NAME}_CONTAINER | Container name |
EPOCH_AZURE_{NAME}_CONNECTION_STRING | Connection string |
EPOCH_AZURE_{NAME}_ACCESS_KEY | Storage access key |
AZURE_TENANT_ID | Azure AD tenant ID |
AZURE_CLIENT_ID | Azure AD client ID |
AZURE_CLIENT_SECRET | Azure AD client secret |
Google Cloud Storage
| Variable | Description |
|---|---|
EPOCH_GCS_{NAME}_BUCKET | Bucket name |
EPOCH_GCS_{NAME}_PROJECT | GCP project ID |
GOOGLE_APPLICATION_CREDENTIALS | Service account key file path |
Local Filesystem
| Variable | Description |
|---|---|
EPOCH_LOCAL_{NAME}_PATH | Base directory path |
Vault
| Variable | Description |
|---|---|
VAULT_ADDR | Vault server address |
VAULT_TOKEN | Vault token |
VAULT_ROLE_ID | AppRole role ID |
VAULT_SECRET_ID | AppRole secret ID |
SSH
| Variable | Description |
|---|---|
EPOCH_SSH_HOST | SSH tunnel host |
EPOCH_SSH_USER | SSH username |
EPOCH_SSH_KEY_FILE | SSH private key path |
SSH_AUTH_SOCK | SSH agent socket |
Configuration Sections
[metadata]
Metadata database configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | PostgreSQL connection URL |
pool_size | int | 10 | Connection pool size |
connect_timeout | int | 30 | Connection timeout (seconds) |
idle_timeout | int | 600 | Idle connection timeout |
max_lifetime | int | 3600 | Max connection lifetime |
[defaults]
Default values for operations.
| Option | Type | Default | Description |
|---|---|---|---|
branch | string | “main” | Default branch |
author_name | string | - | Default author name |
author_email | string | - | Default author email |
format | string | “text” | Output format |
[logging]
Logging configuration.
| Option | Type | Default | Description |
|---|---|---|---|
level | string | “info” | Log level |
format | string | “pretty” | Log format |
file | string | - | Log file path |
json | bool | false | JSON log format |
[storage.postgres.{name}]
PostgreSQL backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | Full connection URL |
host | string | - | Database host |
port | int | 5432 | Database port |
database | string | - | Database name |
username | string | - | Username |
password | string | - | Password |
password_file | string | - | Password file path |
sslmode | string | “prefer” | SSL mode |
ssl_cert | string | - | Client certificate |
ssl_key | string | - | Client key |
ssl_rootcert | string | - | CA certificate |
pool_size | int | 10 | Pool size |
connect_timeout | int | 30 | Timeout |
vault_path | string | - | Vault secret path |
vault_role | string | - | Vault dynamic role |
aws_secret_id | string | - | AWS Secrets Manager ID |
use_iam_auth | bool | false | Use RDS IAM auth |
[storage.postgres.{name}.ssh_tunnel]
SSH tunnel configuration for PostgreSQL.
| Option | Type | Default | Description |
|---|---|---|---|
host | string | - | Bastion host |
port | int | 22 | SSH port |
user | string | - | SSH username |
key_file | string | - | Private key path |
key_passphrase | string | - | Key passphrase |
use_agent | bool | false | Use SSH agent |
known_hosts_file | string | - | Known hosts file |
strict_host_key_checking | bool/string | true | Host key checking |
keepalive_interval | int | 60 | Keepalive interval |
connect_timeout | int | 30 | Connection timeout |
auto_reconnect | bool | true | Auto-reconnect |
[storage.s3.{name}]
S3 backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
bucket | string | - | Bucket name |
region | string | - | AWS region |
endpoint | string | - | Custom endpoint |
prefix | string | “” | Key prefix |
access_key | string | - | Access key ID |
secret_key | string | - | Secret access key |
session_token | string | - | Session token |
force_path_style | bool | false | Path-style URLs |
use_instance_credentials | bool | false | Use instance profile |
assume_role_arn | string | - | Role to assume |
external_id | string | - | External ID |
vault_path | string | - | Vault secret path |
aws_secret_id | string | - | Secrets Manager ID |
[storage.mysql.{name}]
MySQL backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | Full connection URL |
host | string | - | Database host |
port | int | 3306 | Database port |
database | string | - | Database name |
username | string | - | Username |
password | string | - | Password |
password_file | string | - | Password file path |
ssl_mode | string | “preferred” | SSL mode |
ssl_ca | string | - | CA certificate |
pool_size | int | 10 | Pool size |
connect_timeout | int | 30 | Timeout |
vault_path | string | - | Vault secret path |
[storage.mysql.{name}.ssh_tunnel]
SSH tunnel configuration for MySQL (same options as PostgreSQL).
[storage.mssql.{name}]
SQL Server backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | Full connection URL |
host | string | - | Database host |
port | int | 1433 | Database port |
database | string | - | Database name |
username | string | - | Username |
password | string | - | Password |
password_file | string | - | Password file path |
trust_cert | bool | false | Trust server certificate |
encrypt | bool | true | Encrypt connection |
pool_size | int | 10 | Pool size |
connect_timeout | int | 30 | Timeout |
vault_path | string | - | Vault secret path |
use_windows_auth | bool | false | Use Windows auth |
[storage.sqlite.{name}]
SQLite backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
path | string | - | Database file path |
mode | string | “rwc” | Open mode (r/rw/rwc) |
in_memory | bool | false | Use in-memory database |
busy_timeout | int | 5000 | Busy timeout (ms) |
journal_mode | string | “wal” | Journal mode |
[storage.azure.{name}]
Azure Blob Storage backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
account | string | - | Storage account name |
container | string | - | Container name |
prefix | string | “” | Blob prefix |
auth_method | string | “connection_string” | Auth method |
connection_string | string | - | Connection string |
access_key | string | - | Storage access key |
sas_token | string | - | SAS token |
tenant_id | string | - | Azure AD tenant ID |
client_id | string | - | Azure AD client ID |
client_secret | string | - | Azure AD client secret |
use_managed_identity | bool | false | Use managed identity |
vault_path | string | - | Vault secret path |
[storage.gcs.{name}]
Google Cloud Storage backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
bucket | string | - | Bucket name |
project_id | string | - | GCP project ID |
prefix | string | “” | Object prefix |
auth_method | string | “service_account” | Auth method |
credentials_file | string | - | Service account key file |
use_adc | bool | false | Use Application Default Credentials |
vault_path | string | - | Vault secret path |
[storage.local.{name}]
Local filesystem backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
path | string | - | Base directory path |
create_dirs | bool | true | Create directories if missing |
[vault]
HashiCorp Vault configuration.
| Option | Type | Default | Description |
|---|---|---|---|
addr | string | - | Vault address |
token | string | - | Vault token |
auth_method | string | “token” | Auth method |
role_id | string | - | AppRole role ID |
secret_id | string | - | AppRole secret ID |
secret_id_file | string | - | Secret ID file |
role | string | - | K8s/AWS role |
ca_cert | string | - | CA certificate |
skip_verify | bool | false | Skip TLS verify |
namespace | string | - | Vault namespace |
[aws]
AWS configuration.
| Option | Type | Default | Description |
|---|---|---|---|
region | string | - | Default region |
profile | string | - | AWS profile |
access_key_id | string | - | Access key |
secret_access_key | string | - | Secret key |
[credentials]
Credential caching configuration.
| Option | Type | Default | Description |
|---|---|---|---|
cache_enabled | bool | true | Enable caching |
cache_ttl | int | 300 | Cache TTL (seconds) |
cache_max_size | int | 100 | Max cache entries |
refresh_before_expiry | int | 60 | Refresh threshold |
Variable Substitution
Environment variables can be referenced in config:
[storage.postgres.production]
password = "${DB_PASSWORD}"
File contents can be referenced:
[storage.postgres.production]
password = "@file:/run/secrets/db-password"
Profiles
Multiple profiles can be defined:
[profiles.development]
metadata.url = "postgresql://localhost/horizon_epoch_dev"
[profiles.production]
metadata.url = "postgresql://prod-db/horizon_epoch"
Switch profiles:
epoch --profile production status
Local Repository Configuration
When you initialize a repository with epoch init, a .epoch/ directory is created with local configuration.
Directory Structure
.epoch/
├── config # Repository configuration (TOML)
├── HEAD # Current branch reference
└── refs/
├── heads/ # Branch references
└── tags/ # Tag references
Local Config File (.epoch/config)
# Repository identity
repository_name = "my-data-repo"
repository_id = "550e8400-e29b-41d4-a716-446655440000"
# Metadata database connection
metadata_url = "postgresql://localhost/horizon_epoch"
# Current branch (also tracked in HEAD file)
current_branch = "main"
# Author information (optional, overrides global config)
[author]
name = "Your Name"
email = "you@example.com"
# Branching configuration (optional)
[branching]
strategy = "environment" # or "gitflow", "trunk"
[branching.promotion]
paths = [
{ source = "development", target = "staging" },
{ source = "staging", target = "production" }
]
require_approval = true
HEAD File
The HEAD file tracks the current branch:
ref: refs/heads/main
Or in detached HEAD state (direct commit reference):
abc123def456...
DDL Configuration
DDL (Data Definition Language) execution settings for schema changes.
[ddl]
| Option | Type | Default | Description |
|---|---|---|---|
lock_timeout_secs | int | 30 | Lock acquisition timeout |
statement_timeout_secs | int | 300 | Statement execution timeout |
check_table_stats | bool | true | Check table size before DDL |
use_concurrent | bool | false | Use concurrent index operations |
warn_large_table_rows | int | 1000000 | Warn if table exceeds rows |
warn_large_table_bytes | int | 1073741824 | Warn if table exceeds bytes (1GB) |
[ddl.retry]
| Option | Type | Default | Description |
|---|---|---|---|
max_retries | int | 3 | Maximum retry attempts |
initial_delay_ms | int | 500 | Initial retry delay |
max_delay_ms | int | 30000 | Maximum retry delay |
backoff_multiplier | float | 2.0 | Exponential backoff factor |
Example:
[ddl]
lock_timeout_secs = 30
statement_timeout_secs = 300
check_table_stats = true
[ddl.retry]
max_retries = 3
initial_delay_ms = 500
max_delay_ms = 30000
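With the retry settings above, delays grow exponentially and are capped at max_delay_ms. Assuming standard exponential backoff (an assumption about the schedule, not a statement about the exact implementation), the delays can be computed as in this Python sketch:
# Illustrative only: assumes delay = initial * multiplier**attempt, capped at max
initial_delay_ms = 500
max_delay_ms = 30_000
backoff_multiplier = 2.0
max_retries = 3

delays = [min(initial_delay_ms * backoff_multiplier**attempt, max_delay_ms)
          for attempt in range(max_retries)]
print(delays)  # [500.0, 1000.0, 2000.0]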
Branching Configuration
[branching]
| Option | Type | Default | Description |
|---|---|---|---|
strategy | string | “environment” | Branching strategy |
default_branch | string | “main” | Default branch name |
protected_branches | array | [“main”, “production”] | Protected branch patterns |
Branching Strategies
environment (default):
- Branches represent environments (development, staging, production)
- Promotion flows upward through environments
- Good for data pipelines and ETL workflows
gitflow:
- Feature branches, develop, release, hotfix, main
- Traditional Git Flow pattern adapted for data
- Good for teams familiar with Git Flow
trunk:
- Single main branch with short-lived feature branches
- Continuous integration style
- Good for fast-moving projects
[branching.promotion]
| Option | Type | Default | Description |
|---|---|---|---|
paths | array | [] | Allowed promotion paths |
require_approval | bool | false | Require approval for promotions |
auto_merge | bool | true | Auto-merge if no conflicts |
conflict_strategy | string | “fail” | Conflict handling (fail/ours/theirs) |
Example:
[branching]
strategy = "environment"
default_branch = "development"
protected_branches = ["staging", "production"]
[branching.promotion]
paths = [
{ source = "development", target = "staging" },
{ source = "staging", target = "production" }
]
require_approval = true
conflict_strategy = "fail"
Conflict Handling
[conflict]
| Option | Type | Default | Description |
|---|---|---|---|
default_strategy | string | “manual” | Default conflict resolution |
auto_resolve_schema | bool | false | Auto-resolve schema conflicts |
[conflict.table_rules]
Per-table conflict rules:
[conflict.table_rules.users]
strategy = "last_write_wins"
timestamp_field = "updated_at"
[conflict.table_rules.audit_log]
strategy = "append_only"
[conflict.field_rules]
Per-field conflict rules:
[conflict.field_rules."users.email"]
strategy = "source_wins"
[conflict.field_rules."users.login_count"]
strategy = "sum"
Strategies: manual, source_wins, target_wins, last_write_wins, sum, max, min, append_only.
Testing and Development
Development Environment Variables
| Variable | Description |
|---|---|
DATABASE_URL | Database URL for tests/development |
TEST_DATABASE_URL | Alternative test database URL |
SQLX_OFFLINE | Enable SQLx offline mode (true) |
S3/MinIO Testing
| Variable | Description |
|---|---|
S3_ENDPOINT | S3/MinIO endpoint URL |
S3_BUCKET | Test bucket name |
S3_ACCESS_KEY | Access key for testing |
S3_SECRET_KEY | Secret key for testing |
E2E Testing
| Variable | Description | Default |
|---|---|---|
EPOCH_BIN | Path to CLI binary for tests | ./target/debug/epoch |
E2E_KEEP_TEMP | Keep temporary test directories | false |
E2E_VERBOSE | Verbose test output | false |
Example Development Setup
# .env file for development
DATABASE_URL=postgresql://horizon:horizon_dev_password@localhost:5432/horizon_epoch
EPOCH_METADATA_URL=postgresql://horizon:horizon_dev_password@localhost:5432/horizon_epoch
# MinIO for local S3 testing
EPOCH_S3_ENDPOINT=http://localhost:9000
EPOCH_S3_ACCESS_KEY=minioadmin
EPOCH_S3_SECRET_KEY=minioadmin
EPOCH_S3_BUCKET=horizon-epoch-data
# Development settings
EPOCH_DEBUG=true
EPOCH_LOG_LEVEL=debug
RUST_LOG=horizon_epoch=debug,sqlx=warn
# Author info
EPOCH_AUTHOR_NAME="Development User"
EPOCH_AUTHOR_EMAIL="dev@example.com"
Complete Configuration Example
# epoch.toml - Complete configuration example
# =============================================================================
# Metadata Database
# =============================================================================
[metadata_db]
url = "postgresql://localhost:5432/horizon_epoch"
max_connections = 10
min_connections = 2
# =============================================================================
# Logging
# =============================================================================
[logging]
level = "info"
format = "json" # "pretty" or "json"
# =============================================================================
# DDL Execution
# =============================================================================
[ddl]
lock_timeout_secs = 30
statement_timeout_secs = 300
check_table_stats = true
use_concurrent = false
[ddl.retry]
max_retries = 3
initial_delay_ms = 500
max_delay_ms = 30000
# =============================================================================
# Branching Strategy
# =============================================================================
[branching]
strategy = "environment"
default_branch = "development"
protected_branches = ["staging", "production"]
[branching.promotion]
paths = [
{ source = "development", target = "staging" },
{ source = "staging", target = "production" }
]
require_approval = true
# =============================================================================
# Storage Backends
# =============================================================================
# PostgreSQL backends
[[storage.postgresql]]
name = "primary"
url = "postgresql://localhost:5432/data"
[[storage.postgresql]]
name = "analytics"
host = "analytics-db.internal"
port = 5432
database = "analytics"
[storage.postgresql.credentials]
type = "vault"
addr = "https://vault.example.com:8200"
secret_path = "database/creds/analytics"
[storage.postgresql.tls]
mode = "verify_full"
ca_cert = "/etc/ssl/certs/ca.pem"
# S3 backends
[[storage.s3]]
name = "datalake"
endpoint = "https://s3.amazonaws.com"
bucket = "company-datalake"
region = "us-east-1"
prefix = "epoch/"
[[storage.s3]]
name = "archive"
endpoint = "http://minio.internal:9000"
bucket = "archive"
access_key_id = "${MINIO_ACCESS_KEY}"
secret_access_key = "${MINIO_SECRET_KEY}"
force_path_style = true
Credential Provider Reference
Complete reference for the Horizon Epoch credential provider abstraction. This document covers the architecture, all available providers, credential types, caching, and error handling.
Architecture Overview
The credential provider system provides a unified abstraction for managing credentials across all storage backends.
┌─────────────────────────────────────────┐
│ Storage Backend │
│ (PostgreSQL, S3, Vault, etc.) │
└─────────────────────┬───────────────────┘
│
│ get_credentials()
▼
┌─────────────────────────────────────────┐
│ CachedCredentialProvider │
│ (optional caching layer) │
└─────────────────────┬───────────────────┘
│
┌───────────────────────────────┼───────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Static │ │ Environment │ │ File │
│ Provider │ │ Provider │ │ Provider │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Vault │ │ AWS │ │ Web Identity │
│ Provider │ │ Provider │ │ Provider │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Core Trait: CredentialProvider
All credential providers implement this async trait:
#[async_trait]
pub trait CredentialProvider: Send + Sync {
/// Retrieves credentials from this provider.
async fn get_credentials(&self) -> CredentialResult<StorageCredentials>;
/// Returns true if credentials need refresh.
fn requires_refresh(&self) -> bool;
/// Refreshes the credentials.
async fn refresh(&self) -> CredentialResult<StorageCredentials>;
/// Returns the recommended refresh interval.
fn refresh_interval(&self) -> Option<Duration>;
/// Returns the provider name for logging.
fn provider_name(&self) -> &str;
/// Returns true if refresh is supported.
fn supports_refresh(&self) -> bool {
self.refresh_interval().is_some()
}
/// Returns metadata for diagnostics.
fn provider_info(&self) -> ProviderInfo;
}
Extension Trait: CredentialProviderExt
Additional utility methods available on all providers:
| Method | Description |
|---|---|
get_fresh_credentials() | Gets credentials, refreshing first if needed |
validate() | Validates that credentials can be retrieved |
Credential Types
StorageCredentials
The unified credential container with expiration tracking:
pub struct StorageCredentials {
/// The specific type of credential
pub credential_type: StorageCredentialType,
/// When these credentials expire (optional)
pub expires_at: Option<DateTime<Utc>>,
/// Additional metadata
pub metadata: HashMap<String, String>,
}
Factory Methods:
| Method | Description |
|---|---|
username_password(user, pass) | Create username/password credentials |
token(token) | Create token-based credentials |
aws(access_key, secret_key, session_token) | Create AWS credentials |
oauth2(access_token, refresh_token, token_type) | Create OAuth2 credentials |
certificate(cert, key) | Create certificate credentials |
Builder Methods:
| Method | Description |
|---|---|
.with_expiration(datetime) | Set expiration time |
.with_ttl(duration) | Set TTL relative to now |
.with_metadata(key, value) | Add metadata |
Inspection Methods:
| Method | Description |
|---|---|
is_expired() | Check if credentials are expired |
time_to_expiry() | Get time until expiration |
is_username_password() | Type check |
is_token() | Type check |
is_aws() | Type check |
is_certificate() | Type check |
is_oauth2() | Type check |
StorageCredentialType
The underlying credential type enum:
| Variant | Fields | Use Case |
|---|---|---|
UsernamePassword | username, password | Database authentication |
Token | token | API key/token authentication |
AwsCredentials | access_key_id, secret_access_key, session_token? | AWS services |
Certificate | cert, key | mTLS client certificates |
OAuth2 | access_token, refresh_token?, token_type | OAuth2 authentication |
Certificate Types
CertificateData:
pub struct CertificateData {
pub pem: SecretString, // PEM-encoded certificate
pub chain: Option<Vec<SecretString>>, // Optional chain
}
PrivateKeyData:
pub struct PrivateKeyData {
pub pem: SecretString, // PEM-encoded private key
pub passphrase: Option<SecretString>, // Optional passphrase
}
Built-in Providers
Static Provider
Pre-configured credentials for testing and development. Never use in production.
Factory Methods:
StaticCredentialProvider::username_password("user", "pass")
StaticCredentialProvider::token("api-key")
StaticCredentialProvider::aws("AKIA...", "secret", None)
StaticCredentialProvider::oauth2("access", Some("refresh"), "Bearer")
StaticCredentialProvider::certificate(cert_data, key_data)
Builder Pattern:
StaticCredentialProviderBuilder::new()
.username("admin")
.password("secret")
.build()
Configuration (TOML):
[storage.postgres.mydb]
# Static credentials embedded in config (dev only!)
url = "postgresql://user:password@localhost/db"
| Property | Required | Description |
|---|---|---|
url | Yes | Full connection URL with credentials |
username | Alternative | Username (with separate password) |
password | With username | Password |
Refresh Support: No
Environment Provider
Reads credentials from environment variables.
Factory Methods:
// PostgreSQL standard: PGUSER, PGPASSWORD
EnvCredentialProvider::postgres_default()
// AWS standard: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
EnvCredentialProvider::aws_default()
// Custom prefix: MYAPP_DB_USERNAME, MYAPP_DB_PASSWORD
EnvCredentialProvider::with_prefix("MYAPP_DB")
Builder Pattern:
EnvCredentialProviderBuilder::new()
.username_var("DB_USER")
.password_var("DB_PASS")
.credential_type(EnvCredentialType::UsernamePassword)
.build()
Environment Variable Mappings:
| Preset | Username | Password | Access Key | Secret Key | Session Token |
|---|---|---|---|---|---|
postgres_default() | PGUSER | PGPASSWORD | - | - | - |
mysql_default() | MYSQL_USER | MYSQL_PASSWORD | - | - | - |
mssql_default() | MSSQL_USER | MSSQL_PASSWORD | - | - | - |
aws_default() | - | - | AWS_ACCESS_KEY_ID | AWS_SECRET_ACCESS_KEY | AWS_SESSION_TOKEN |
with_prefix("X") | X_USERNAME | X_PASSWORD | X_ACCESS_KEY | X_SECRET_KEY | X_SESSION_TOKEN |
Configuration (TOML):
[storage.postgres.mydb]
url = "${EPOCH_POSTGRES_MYDB_URL}" # Variable substitution
# Or individual fields
host = "db.example.com"
username = "${DB_USER}"
password = "${DB_PASS}"
Refresh Support: Yes (re-reads environment on refresh)
File Provider
Reads credentials from files in multiple formats.
Factory Methods:
FileCredentialProvider::json("/etc/secrets/db.json")
FileCredentialProvider::yaml("/etc/secrets/db.yaml")
FileCredentialProvider::ini("/etc/secrets/db.ini")
FileCredentialProvider::env("/etc/secrets/.env")
FileCredentialProvider::aws_credentials() // ~/.aws/credentials
FileCredentialProvider::certificate("/path/cert.pem", "/path/key.pem")
Builder Pattern:
FileCredentialProviderBuilder::new()
.path("/etc/secrets/creds.json")
.format(CredentialFileFormat::Json)
.profile("production") // For INI files
.build()
Supported Formats:
| Format | Extension | Example |
|---|---|---|
| JSON | .json | {"username": "x", "password": "y"} |
| YAML | .yaml, .yml | username: x, password: y |
| INI | .ini, .cfg | [default] username = x |
| ENV | .env | USERNAME=x, PASSWORD=y |
| PEM | .pem, .crt, .key | Certificate/key files |
Configuration (TOML):
[storage.postgres.mydb]
password_file = "/run/secrets/db-password"
# Or JSON credentials file
credentials_file = "/etc/secrets/db.json"
Refresh Support: Yes (re-reads file on refresh)
AWS Provider
Retrieves credentials from AWS services.
Secrets Manager Provider
SecretsManagerProvider::new("my-secret-id")
.region("us-east-1")
.field_mapping(SecretFieldMapping::default())
.build()
Configuration (TOML):
[storage.postgres.mydb]
aws_secret_id = "prod/db/credentials"
[aws]
region = "us-east-1"
| Property | Required | Description |
|---|---|---|
aws_secret_id | Yes | Secrets Manager secret ID or ARN |
region | No | AWS region (default: from environment) |
RDS IAM Provider
Generates temporary IAM authentication tokens for RDS.
RdsIamProvider::new()
.host("mydb.cluster-xxx.us-east-1.rds.amazonaws.com")
.port(5432)
.username("iam_user")
.region("us-east-1")
.build()
Configuration (TOML):
[storage.postgres.mydb]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
use_iam_auth = true
username = "iam_user"
| Property | Required | Description |
|---|---|---|
use_iam_auth | Yes | Enable RDS IAM authentication |
host | Yes | RDS endpoint |
username | Yes | IAM database user |
Credential Sources
pub enum AwsCredentialSource {
/// Default AWS credential chain
DefaultChain,
/// Explicit credentials
Explicit {
access_key_id: String,
secret_access_key: String,
session_token: Option<String>,
},
/// Assume a role
AssumeRole {
role_arn: String,
external_id: Option<String>,
session_name: Option<String>,
session_tags: Option<HashMap<String, String>>,
},
/// Named AWS profile
Profile { name: String },
/// Web Identity (OIDC)
WebIdentity { /* ... */ },
}
Refresh Support: Yes (automatic token refresh)
Vault Provider
Retrieves credentials from HashiCorp Vault.
VaultCredentialProvider::new(VaultConfig {
addr: "https://vault.example.com:8200".to_string(),
namespace: Some("my-namespace".to_string()),
auth: VaultAuthMethod::AppRole {
role_id: "xxx".to_string(),
secret_id: "yyy".to_string(),
},
tls: Some(VaultTlsConfig {
ca_cert: Some("/etc/ssl/vault-ca.crt".to_string()),
client_cert: None,
client_key: None,
}),
})
.secret_path("secret/data/mydb")
.secret_type(VaultSecretType::KvV2)
.build()
Authentication Methods:
| Method | Configuration |
|---|---|
| Token | auth_method = "token", token = "hvs.xxx" |
| AppRole | auth_method = "approle", role_id, secret_id |
| Kubernetes | auth_method = "kubernetes", role |
| AWS IAM | auth_method = "aws", role |
| TLS | auth_method = "tls", client certificate |
| UserPass | auth_method = "userpass", username, password |
Secret Types:
| Type | Description |
|---|---|
KvV2 | Key-Value v2 secrets engine (static secrets) |
Database | Database secrets engine (dynamic credentials) |
Configuration (TOML):
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
ca_cert = "/etc/ssl/certs/vault-ca.crt"
namespace = "my-namespace"
[storage.postgres.mydb]
vault_path = "secret/data/mydb"
# Or for dynamic credentials:
vault_role = "my-db-role"
| Property | Required | Description |
|---|---|---|
addr | Yes | Vault server address |
auth_method | Yes | Authentication method |
vault_path | For KV | Path to KV secret |
vault_role | For DB | Database role name |
namespace | No | Vault namespace |
Refresh Support: Yes (lease renewal, token refresh)
Vault PKI Provider
Issues X.509 certificates from Vault’s PKI secrets engine.
VaultPkiProvider::new(vault_config)
.pki_path("pki/issue/my-role")
.common_name("myapp.example.com")
.ttl(Duration::from_secs(86400))
.dns_sans(vec!["myapp.example.com", "myapp-alt.example.com"])
.build()
Configuration (TOML):
[storage.postgres.secure]
vault_pki_path = "pki/issue/postgres-client"
vault_pki_common_name = "epoch-client"
vault_pki_ttl = "24h"
| Property | Required | Description |
|---|---|---|
vault_pki_path | Yes | PKI issue path |
vault_pki_common_name | Yes | Certificate CN |
vault_pki_ttl | No | Certificate TTL |
vault_pki_dns_sans | No | DNS Subject Alt Names |
vault_pki_ip_sans | No | IP Subject Alt Names |
Refresh Support: Yes (automatic certificate renewal)
Web Identity Provider
Uses OIDC tokens for AWS authentication.
WebIdentityCredentialProvider::new(WebIdentityConfig {
role_arn: "arn:aws:iam::123456789012:role/MyRole".to_string(),
web_identity_token_file: Some("/var/run/secrets/token".to_string()),
session_name: Some("my-session".to_string()),
provider_url: Some("https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E".to_string()),
duration_seconds: Some(3600),
policy_arns: None,
inline_policy: None,
})
.build()
Supported OIDC Providers:
| Provider | Token Source |
|---|---|
| GitHub Actions | GITHUB_TOKEN or ACTIONS_ID_TOKEN_REQUEST_* |
| GitLab CI/CD | CI_JOB_JWT |
| Azure AD / Entra ID | Azure managed identity |
| Google Cloud | GCP service account |
| Kubernetes (EKS/GKE) | Service account token file |
| Custom | Token file or direct token |
Configuration (TOML):
[aws]
role_arn = "arn:aws:iam::123456789012:role/MyRole"
web_identity_token_file = "/var/run/secrets/token"
Refresh Support: Yes (token refresh on expiration)
SSO Provider
Integrates with AWS IAM Identity Center (formerly AWS SSO).
SsoCredentialProvider::new(SsoConfig {
start_url: "https://my-sso-portal.awsapps.com/start".to_string(),
region: "us-east-1".to_string(),
account_id: "123456789012".to_string(),
role_name: "MyRole".to_string(),
})
.build()
Uses the OAuth 2.0 Device Authorization Grant flow for browser-based authentication.
Configuration (TOML):
[aws.sso]
start_url = "https://my-sso-portal.awsapps.com/start"
region = "us-east-1"
account_id = "123456789012"
role_name = "MyRole"
Refresh Support: Yes (automatic token refresh, cached tokens)
Credential Caching
Wrap any provider with CachedCredentialProvider to reduce external calls.
// Simple caching
let cached = CachedCredentialProvider::new(inner_provider);
// With configuration
let cached = CachedCredentialProvider::new(inner_provider)
.with_refresh_buffer(Duration::from_secs(120)) // Refresh 2 min before expiry
.with_max_cache_age(Duration::from_secs(3600)); // Max 1 hour cache
// Using extension trait
let cached = inner_provider.with_cache();
// Using builder
let cached = inner_provider.cached()
.refresh_buffer(Duration::from_secs(60))
.max_cache_age(Duration::from_secs(3600))
.build();
Cache Configuration:
| Parameter | Default | Description |
|---|---|---|
refresh_buffer | 60s | How long before expiry to refresh |
max_cache_age | None | Max cache age for non-expiring credentials |
Cache Operations:
| Method | Description |
|---|---|
invalidate() | Force cache invalidation |
cache_stats() | Get cache statistics |
CacheStats Fields:
| Field | Type | Description |
|---|---|---|
is_cached | bool | Whether credentials are cached |
cache_age | Option<Duration> | How long cached |
time_to_refresh | Option<Duration> | Time until auto-refresh |
is_expired | bool | Whether cached credentials expired |
Configuration (TOML):
[credentials]
cache_enabled = true
cache_ttl = 300 # 5 minutes
cache_max_size = 100 # Max entries
refresh_before_expiry = 60 # Refresh 60s before expiry
Provider Chains
Chain multiple providers for fallback behavior:
use std::sync::Arc;
let env_provider = Arc::new(EnvCredentialProvider::postgres_default());
let file_provider = Arc::new(FileCredentialProvider::json("/etc/db.json"));
let vault_provider = Arc::new(VaultCredentialProvider::new(config));
let chain = ChainedCredentialProvider::new(vec![
env_provider,
file_provider,
vault_provider,
]).with_name("my-credential-chain");
// Tries each provider in order until one succeeds
let creds = chain.get_credentials().await?;
Chain Behavior:
- Tries providers in order
- Returns first successful result
- Falls through to the next provider on any error
- Returns the last error if all providers fail
- Refresh interval = minimum of all providers
Error Types
CredentialError Variants
| Variant | Description | Retryable |
|---|---|---|
NotFound | Credentials not found | No |
Expired | Credentials expired | No |
ProviderUnavailable | Provider unreachable | Yes |
AuthFailed | Authentication failed | No |
Network | Network error | Yes |
Config | Configuration error | No |
FileRead | File read error | No |
FileParse | File parse error | No |
EnvVarNotSet | Missing env var | No |
EnvVarInvalid | Invalid env var value | No |
MissingField | Required field missing | No |
RefreshNotSupported | Refresh not supported | No |
RefreshFailed | Refresh failed | Maybe |
InvalidFormat | Invalid format | No |
Certificate | Certificate error | No |
PrivateKey | Private key error | No |
Error Helper Methods:
| Method | Description |
|---|---|
is_not_found() | Check if NotFound |
is_expired() | Check if Expired |
is_retryable() | Check if safe to retry |
Implementing Custom Providers
Implement the CredentialProvider trait for custom credential sources:
use async_trait::async_trait;
use horizon_epoch_core::storage::credentials::{
CredentialProvider, StorageCredentials, CredentialError, CredentialResult,
};
use std::time::Duration;
struct MyProvider {
config: MyConfig,
}
#[async_trait]
impl CredentialProvider for MyProvider {
async fn get_credentials(&self) -> CredentialResult<StorageCredentials> {
// Fetch from your credential source
let secret = self.fetch_secret().await?;
Ok(StorageCredentials::username_password(
secret.username,
secret.password,
).with_ttl(Duration::from_secs(3600)))
}
fn requires_refresh(&self) -> bool {
// Check if credentials need refresh
false
}
async fn refresh(&self) -> CredentialResult<StorageCredentials> {
// Refresh credentials
self.get_credentials().await
}
fn refresh_interval(&self) -> Option<Duration> {
// Recommended refresh interval
Some(Duration::from_secs(300))
}
fn provider_name(&self) -> &str {
"my_provider"
}
}
Security Considerations
Secret Protection
All sensitive values use secrecy::SecretString:
- Not printed in Debug output
- Zeroized on drop
- Explicit .expose_secret() call required
Best Practices
- Never log credentials - Use structured logging carefully
- Use environment variables - Avoid hardcoded credentials
- Enable caching - Reduce calls to external services
- Check expiration - Always verify before use
- Rotate regularly - Use providers that support rotation
- Least privilege - Grant minimal required permissions
Secure Configuration Order
- Vault/Secrets Manager (production)
- Environment variables (containers)
- Credential files with proper permissions (Kubernetes secrets)
- Never: hardcoded credentials in config files
Feature Flags
Credential providers are enabled via Cargo features:
| Feature | Providers Enabled |
|---|---|
vault | VaultCredentialProvider, VaultPkiProvider |
aws | SecretsManagerProvider, RdsIamProvider, SsoProvider, WebIdentityProvider |
Core providers (Static, Environment, File) are always available.
Database Backend Credentials
PostgreSQL
PostgreSQL credentials can be provided via:
| Method | Configuration |
|---|---|
| Environment | PGUSER, PGPASSWORD or custom prefix |
| Connection URL | postgresql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| AWS RDS IAM | Token-based authentication |
| AWS Secrets Manager | Stored credentials |
Example (Environment):
export PGUSER="myuser"
export PGPASSWORD="mypassword"
Example (Vault):
[storage.postgres.prod]
vault_path = "secret/data/postgres/prod"
MySQL
MySQL credentials can be provided via:
| Method | Configuration |
|---|---|
| Environment | MYSQL_USER, MYSQL_PASSWORD or custom prefix |
| Connection URL | mysql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| AWS RDS IAM | Token-based authentication (Aurora MySQL) |
| AWS Secrets Manager | Stored credentials |
Example (Environment):
export MYSQL_USER="myuser"
export MYSQL_PASSWORD="mypassword"
Example (TOML):
[storage.mysql.analytics]
host = "mysql.example.com"
database = "analytics"
username = "${MYSQL_USER}"
password = "${MYSQL_PASSWORD}"
SQL Server (MSSQL)
SQL Server credentials can be provided via:
| Method | Configuration |
|---|---|
| Environment | MSSQL_USER, MSSQL_PASSWORD or custom prefix |
| Connection URL | mssql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| Windows Authentication | Integrated security (Windows only) |
| Azure AD | Token-based authentication (Azure SQL) |
Example (Environment):
export MSSQL_USER="sa"
export MSSQL_PASSWORD="YourPassword123!"
Example (TOML):
[storage.mssql.warehouse]
host = "sqlserver.example.com"
database = "warehouse"
username = "${MSSQL_USER}"
password = "${MSSQL_PASSWORD}"
encrypt = true
trust_cert = false
Example (Windows Authentication):
[storage.mssql.warehouse]
host = "sqlserver.internal"
database = "warehouse"
use_windows_auth = true
SQLite
SQLite typically does not require credentials as it uses file-based storage. However, encrypted SQLite databases (SQLCipher) may require a passphrase.
| Method | Configuration |
|---|---|
| No credentials | Default for unencrypted databases |
| Environment | SQLITE_PASSPHRASE for encrypted databases |
| File | Passphrase file for SQLCipher |
Example (Unencrypted):
[storage.sqlite.local]
path = "/data/local.db"
# No credentials needed
Example (Encrypted with SQLCipher):
[storage.sqlite.secure]
path = "/data/secure.db"
passphrase = "${SQLITE_PASSPHRASE}"
See Also
- Credential Providers How-To - Practical usage guide
- Vault Integration - HashiCorp Vault setup
- AWS Secrets - AWS credential management
- mTLS Authentication - Certificate authentication
- Configuration Reference - Full configuration options
Constraint Reference
Horizon Epoch versions database constraints alongside your data, enabling schema evolution with the same Git-like branching, merging, and conflict detection you use for data.
Constraint Types
Horizon Epoch supports seven constraint types that map to PostgreSQL DDL:
| Type | Rust Struct | PostgreSQL DDL |
|---|---|---|
| Foreign Key | ForeignKeyConstraint | ALTER TABLE ADD CONSTRAINT ... FOREIGN KEY ... REFERENCES |
| Unique | UniqueConstraint | ALTER TABLE ADD CONSTRAINT ... UNIQUE |
| Check | CheckConstraint | ALTER TABLE ADD CONSTRAINT ... CHECK (expression) |
| Exclusion | ExclusionConstraint | ALTER TABLE ADD CONSTRAINT ... EXCLUDE USING |
| Index | IndexDefinition | CREATE INDEX |
| Enum Type | EnumType | CREATE TYPE ... AS ENUM |
| Default Value | DefaultValue | ALTER COLUMN ... SET DEFAULT |
Foreign Key Constraints
Foreign keys define referential integrity between tables.
Structure
ForeignKeyConstraint {
name: String, // Constraint name
columns: Vec<String>, // Local columns
references_table: String, // Referenced table
references_columns: Vec<String>, // Referenced columns
on_update: ReferentialAction, // Action on parent update
on_delete: ReferentialAction, // Action on parent delete
deferrable: bool, // Can be deferred
initially_deferred: bool, // Deferred by default
}
Referential Actions
| Action | Description |
|---|---|
NoAction | Raise error if referenced rows exist (default) |
Restrict | Same as NoAction, checked immediately |
Cascade | Delete/update child rows automatically |
SetNull | Set child columns to NULL |
SetDefault | Set child columns to default value |
PostgreSQL Mapping
-- Generated DDL
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers (id)
ON DELETE CASCADE ON UPDATE NO ACTION;
Python Example
from horizon_epoch import Schema, ForeignKeyConstraint
schema = Schema.builder() \
.add_field("customer_id", "integer") \
.add_foreign_key(
name="fk_orders_customer",
columns=["customer_id"],
references_table="customers",
references_columns=["id"],
on_delete="cascade"
) \
.build()
Unique Constraints
Unique constraints ensure no duplicate values exist in the specified columns.
Structure
UniqueConstraint {
name: String, // Constraint name
columns: Vec<String>, // Columns in the constraint
nulls_distinct: bool, // PostgreSQL 15+ NULLS NOT DISTINCT
deferrable: bool, // Can be deferred
initially_deferred: bool, // Deferred by default
index_method: Option<String>, // Index method (btree, hash)
}
PostgreSQL Mapping
-- Simple unique constraint
ALTER TABLE users ADD CONSTRAINT uq_users_email
UNIQUE (email);
-- Composite unique constraint
ALTER TABLE order_items ADD CONSTRAINT uq_order_product
UNIQUE (order_id, product_id);
-- PostgreSQL 15+ with NULLS NOT DISTINCT
ALTER TABLE users ADD CONSTRAINT uq_users_phone
UNIQUE NULLS NOT DISTINCT (phone);
Python Example
schema = Schema.builder() \
.add_field("email", "text") \
.add_unique_constraint(
name="uq_users_email",
columns=["email"]
) \
.build()
Check Constraints
Check constraints validate that column values satisfy a boolean expression.
Structure
CheckConstraint {
name: String, // Constraint name
expression: String, // SQL boolean expression
referenced_columns: Vec<String>, // Columns used in expression
is_not_null: bool, // Is this a NOT NULL constraint?
no_inherit: bool, // PostgreSQL NO INHERIT option
}
PostgreSQL Mapping
-- Value range check
ALTER TABLE products ADD CONSTRAINT chk_products_price
CHECK (price > 0);
-- Pattern check
ALTER TABLE users ADD CONSTRAINT chk_users_email
CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
-- NOT NULL (special case)
ALTER TABLE users ALTER COLUMN name SET NOT NULL;
Python Example
schema = Schema.builder() \
.add_field("price", "numeric") \
.add_check_constraint(
name="chk_products_price",
expression="price > 0"
) \
.build()
Exclusion Constraints
Exclusion constraints prevent overlapping or conflicting values using operators.
Structure
ExclusionConstraint {
name: String, // Constraint name
index_method: String, // Usually "gist" or "btree"
columns: Vec<String>, // Columns for display
elements: Vec<ExclusionElement>, // Column/operator pairs
where_clause: Option<String>, // Partial exclusion
deferrable: bool, // Can be deferred
initially_deferred: bool, // Deferred by default
}
ExclusionElement {
column: String, // Column name or expression
operator: String, // Comparison operator
opclass: Option<String>, // Operator class
}
PostgreSQL Mapping
-- Prevent overlapping time ranges
ALTER TABLE reservations ADD CONSTRAINT excl_reservations_overlap
EXCLUDE USING gist (
room_id WITH =,
tsrange(start_time, end_time) WITH &&
);
-- Prevent overlapping circles (geometric)
ALTER TABLE zones ADD CONSTRAINT excl_zones_overlap
EXCLUDE USING gist (area WITH &&);
Python Example
schema = Schema.builder() \
.add_field("room_id", "integer") \
.add_field("start_time", "timestamp") \
.add_field("end_time", "timestamp") \
.add_exclusion_constraint(
name="excl_reservations_overlap",
index_method="gist",
elements=[
{"column": "room_id", "operator": "="},
{"column": "tsrange(start_time, end_time)", "operator": "&&"}
]
) \
.build()
Index Definitions
Indexes improve query performance and can enforce uniqueness.
Structure
IndexDefinition {
name: String, // Index name
method: IndexMethod, // btree, hash, gist, gin, brin
columns: Vec<IndexColumn>, // Columns with sort options
is_unique: bool, // UNIQUE index
where_clause: Option<String>, // Partial index
include_columns: Vec<String>, // Covering index (INCLUDE)
storage_parameters: HashMap<String, String>, // fillfactor, etc.
concurrent: bool, // CONCURRENTLY option
if_not_exists: bool, // IF NOT EXISTS
}
IndexColumn {
name: String, // Column name or expression
sort_order: Option<SortOrder>, // ASC or DESC
nulls_order: Option<NullsOrder>, // NULLS FIRST or NULLS LAST
opclass: Option<String>, // Operator class
}
Index Methods
| Method | Use Case |
|---|---|
btree | Default; equality and range queries |
hash | Equality comparisons only |
gist | Geometric, full-text, range types |
gin | Arrays, JSONB, full-text search |
brin | Large tables with natural ordering |
PostgreSQL Mapping
-- Basic index
CREATE INDEX idx_users_email ON users (email);
-- Composite index with sort order
CREATE INDEX idx_orders_date_desc ON orders (created_at DESC NULLS LAST);
-- Partial index
CREATE INDEX idx_orders_pending ON orders (status)
WHERE status = 'pending';
-- Covering index (PostgreSQL 11+)
CREATE INDEX idx_users_email_include ON users (email)
INCLUDE (name, created_at);
-- GIN index for JSONB
CREATE INDEX idx_products_attrs ON products USING gin (attributes);
Python Example
schema = Schema.builder() \
.add_field("email", "text") \
.add_index(
name="idx_users_email",
columns=[{"name": "email"}],
method="btree"
) \
.build()
Enum Types
Enum types define a fixed set of allowed values.
Structure
#![allow(unused)]
fn main() {
EnumType {
name: String, // Type name
values: Vec<String>, // Ordered enum values
schema: Option<String>, // Schema name (optional)
}
}
PostgreSQL Mapping
-- Create enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');
-- Add value to enum (PostgreSQL is additive-only for enums)
ALTER TYPE order_status ADD VALUE 'cancelled' AFTER 'delivered';
-- Use in table
ALTER TABLE orders ALTER COLUMN status TYPE order_status
USING status::order_status;
Enum Diffing
When comparing enum types between branches, Horizon Epoch tracks:
- Added values: Safe to add in migrations
- Removed values: Requires data migration (destructive)
- Reordered values: May affect comparisons
# Check if enum change is safe (additive only)
enum_diff = client.diff_enums("main", "feature-branch")
if enum_diff.is_additive():
    print("Safe to merge - only new values added")
else:
    print("Warning: Enum values removed or reordered")
Python Example
schema = Schema.builder() \
.add_enum_type(
name="order_status",
values=["pending", "processing", "shipped", "delivered"]
) \
.add_field("status", "order_status") \
.build()
Default Values
Default values specify what value to use when none is provided.
Structure
#![allow(unused)]
fn main() {
enum DefaultValue {
Literal(String), // Literal value: 'active', 42, true
Function(String), // Function call: gen_random_uuid()
Sequence(String), // Sequence: nextval('users_id_seq')
Expression(String), // Expression: CURRENT_DATE + INTERVAL '30 days'
CurrentTimestamp, // CURRENT_TIMESTAMP / NOW()
CurrentDate, // CURRENT_DATE
CurrentUser, // CURRENT_USER
Null, // NULL (explicit)
}
}
PostgreSQL Mapping
-- Literal default
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
-- Function default
ALTER TABLE users ALTER COLUMN id SET DEFAULT gen_random_uuid();
-- Timestamp default
ALTER TABLE users ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP;
-- Expression default
ALTER TABLE subscriptions ALTER COLUMN expires_at
SET DEFAULT CURRENT_DATE + INTERVAL '30 days';
Determinism
Horizon Epoch tracks whether defaults are deterministic:
| Default Type | Deterministic | Notes |
|---|---|---|
| Literal | Yes | Same value every time |
| Null | Yes | Always NULL |
| Function | No | May return different values |
| Sequence | No | Returns next value |
| CurrentTimestamp | No | Current time |
| CurrentDate | No | Current date |
| CurrentUser | No | Depends on session |
| Expression | Varies | Analyzed at runtime |
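A minimal sketch of how this classification can be used from client code; the is_deterministic helper and the kind names are illustrative (mirroring the DefaultValue variants above), not part of the SDK surface.
# Illustrative helper mirroring the table above; not an SDK API.
DETERMINISTIC_KINDS = {"Literal", "Null"}

def is_deterministic(default_kind: str) -> bool:
    """True only for defaults that yield the same value on every insert."""
    # Expression defaults are analyzed at runtime; treat them conservatively.
    return default_kind in DETERMINISTIC_KINDS

assert is_deterministic("Literal")
assert not is_deterministic("CurrentTimestamp")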
Storage Backend Support
Different storage backends support constraints at different levels:
Constraint Support Matrix
| Backend | Support Level | Primary Key | Foreign Key | Unique | Check | Index |
|---|---|---|---|---|---|---|
| PostgreSQL | Full | Enforced | Enforced | Enforced | Enforced | Enforced |
| MySQL | Full | Enforced | Enforced | Enforced | Enforced | Enforced |
| SQL Server | Full | Enforced | Enforced | Enforced | Enforced | Enforced |
| SQLite | Partial | Enforced | Optional* | Enforced | Enforced | Enforced |
| S3/Delta Lake | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
| Azure Blob | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
| GCS | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
| Local FS | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
* SQLite requires PRAGMA foreign_keys = ON to enforce foreign keys.
PostgreSQL (Full Support)
PostgreSQL provides ConstraintSupportLevel::Full:
| Constraint Type | Support | Enforcement |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Full | Enforced via DDL |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL |
| Exclusion | Full | Enforced via DDL |
| Index | Full | Created via DDL |
| Enum Type | Full | Created via DDL |
| Default Value | Full | Applied via DDL |
MySQL (Full Support)
MySQL provides ConstraintSupportLevel::Full:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Full | Enforced via DDL (InnoDB only) |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL (MySQL 8.0.16+) |
| Exclusion | Not Supported | Use unique constraints as an alternative |
| Index | Full | Created via DDL |
| Enum Type | Full | Native ENUM type support |
| Default Value | Full | Applied via DDL |
SQL Server (Full Support)
SQL Server provides ConstraintSupportLevel::Full:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Full | Enforced via DDL |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL |
| Exclusion | Not Supported | Use unique constraints or triggers |
| Index | Full | Created via DDL |
| Enum Type | Partial | Use CHECK constraints for enum-like behavior |
| Default Value | Full | Applied via DDL |
SQLite (Partial Support)
SQLite provides ConstraintSupportLevel::Partial:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Partial | Requires PRAGMA foreign_keys = ON |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL |
| Exclusion | Not Supported | PostgreSQL-specific; not available in SQLite |
| Index | Full | Created via DDL |
| Enum Type | Metadata | Use CHECK constraints for validation |
| Default Value | Full | Applied via DDL |
S3/Delta Lake (Metadata Only)
S3/Delta Lake provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | PostgreSQL-specific; not enforced |
| Index | Metadata | Delta Lake handles its own indexing |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
Azure Blob Storage (Metadata Only)
Azure Blob Storage provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | Not applicable to blob storage |
| Index | Metadata | Not applicable to blob storage |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
Google Cloud Storage (Metadata Only)
GCS provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | Not applicable to object storage |
| Index | Metadata | Not applicable to object storage |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
Local Filesystem (Metadata Only)
Local filesystem provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | Not applicable to file storage |
| Index | Metadata | Parquet/Delta handles its own indexing |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
ConstraintSupportLevel Enum
#![allow(unused)]
fn main() {
pub enum ConstraintSupportLevel {
Full, // All constraints enforced (PostgreSQL)
MetadataOnly, // Constraints stored but not enforced (S3)
Partial, // Some constraints work, others don't
None, // No constraint support
}
impl ConstraintSupportLevel {
pub fn can_enforce(&self) -> bool; // True only for Full
pub fn can_store(&self) -> bool; // True for Full, MetadataOnly, Partial
pub fn can_execute_ddl(&self) -> bool; // True only for Full
}
}
Cross-Backend Considerations
When working with mixed storage backends:
- Constraints are always stored in metadata - Even if not enforced, constraints are tracked
- DDL generation targets PostgreSQL - Use PostgresDdlGenerator for migrations
- Merge conflicts work everywhere - Constraint conflicts are detected regardless of enforcement
- Validation is backend-specific - Check can_enforce() before relying on constraint enforcement (see the sketch below)
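For example, before relying on a CHECK constraint to reject bad rows on a data-lake backend, ask the adapter what it actually enforces. A minimal sketch, assuming the SDK exposes a constraint_support() helper that mirrors the Rust ConstraintSupportLevel enum above:
# Minimal sketch - constraint_support() is a hypothetical SDK helper that
# mirrors the Rust ConstraintSupportLevel enum above.
rows = [{"sku": "A1", "price": 10.0}, {"sku": "B2", "price": -1.0}]
support = client.constraint_support("s3://analytics")

if not support.can_enforce():
    # Metadata-only backend: the stored CHECK (price > 0) is documentation only,
    # so apply it in the application before writing.
    bad = [r for r in rows if not r["price"] > 0]
    if bad:
        raise ValueError(f"{len(bad)} rows violate chk_products_price")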
TableConstraints Container
All constraints for a table are grouped in TableConstraints:
#![allow(unused)]
fn main() {
pub struct TableConstraints {
pub foreign_keys: Vec<ForeignKeyConstraint>,
pub unique_constraints: Vec<UniqueConstraint>,
pub check_constraints: Vec<CheckConstraint>,
pub exclusion_constraints: Vec<ExclusionConstraint>,
pub indexes: Vec<IndexDefinition>,
pub enum_types: Vec<EnumType>,
}
}
Key Methods
#![allow(unused)]
fn main() {
// Get all constraints as enum variants
let all: Vec<Constraint> = constraints.all_constraints();
// Find constraint by name
let fk: Option<&ForeignKeyConstraint> = constraints.find_foreign_key("fk_orders_customer");
// Get constraints affecting a column
let affecting: Vec<&Constraint> = constraints.constraints_on_column("customer_id");
// Validate all constraints
let errors: Vec<String> = constraints.validate();
}
See Also
- Schema Versioning Tutorial - Step-by-step guide
- Merge Algorithm - How constraint conflicts are resolved
- Configuration Reference - DDL generation options
Architecture
System Overview
┌─────────────────────────────────────────────────────────────┐
│ User Interfaces │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ CLI │ │ Python SDK │ │ REST API │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────┘ │
└─────────┼────────────────┼─────────────────────┼────────────┘
│ │ │
└────────────────┼─────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────┐
│ Rust Core Engine │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Version Control Operations │ │
│ │ branch, commit, merge, diff, promote │ │
│ └────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Metadata Layer │ │
│ │ commit graph, branch pointers, table versions │ │
│ └────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Storage Abstraction │ │
│ │ unified interface for all backends │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────────▼───────────────────────────────┐
│ Storage Backends │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ PostgreSQL │ │ MySQL │ │ SQL Server │ │ SQLite │ │
│ └────────────┘ └────────────┘ └────────────┘ └──────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ AWS S3 │ │Azure Blob │ │ GCS │ │ Local │ │
│ └────────────┘ └────────────┘ └────────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
Design Principles
1. Git Semantics
If you know Git, you know Horizon Epoch. We use the same conceptual model:
- Repositories contain versioned data
- Branches are independent lines of development
- Commits are immutable snapshots
- Merges combine branches with conflict detection
2. Storage Agnostic
Work with data where it lives:
- No data migration required
- Each storage backend has an adapter
- Same operations work across all backends
- Mix and match storage types
3. Zero-Copy Branching
Creating a branch is instant:
- No data is duplicated
- Branch is just a pointer to a commit
- Changes are tracked incrementally
- Storage is only used for actual changes
4. Record-Level Tracking
Granular change detection:
- Track individual record changes (not just files)
- Field-level conflict detection
- Efficient storage of deltas
- Precise merge resolution
5. Metadata Separation
Version control metadata is separate from data:
- Metadata stored in dedicated PostgreSQL database
- Data stays in original storage
- No modification to existing data infrastructure
- Easy to add/remove without affecting data
Component Details
Rust Core Engine
The heart of Horizon Epoch, written in Rust for:
- Performance and memory safety
- Concurrent operations
- Cross-platform compatibility
Key modules:
- operations/ - Branch, commit, merge, diff logic
- metadata/ - Commit graph, branch management
- storage/ - Storage adapter traits and implementations
Metadata Layer
Stores all versioning information:
┌─────────────────────────────────────────┐
│ Metadata Database │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Repositories│ │ Branches │ │
│ └─────────────┘ └─────────────────┘ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Commits │ │ Tables │ │
│ └─────────────┘ └─────────────────┘ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Tags │ │ Change Tracking │ │
│ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────┘
Storage Adapters
Each adapter implements a common trait:
#![allow(unused)]
fn main() {
trait StorageAdapter {
fn read_records(&self, table: &str, keys: &[RecordKey]) -> Result<Vec<Record>>;
fn write_records(&self, table: &str, records: &[Record]) -> Result<()>;
fn delete_records(&self, table: &str, keys: &[RecordKey]) -> Result<()>;
fn get_schema(&self, table: &str) -> Result<Schema>;
fn list_tables(&self) -> Result<Vec<String>>;
// ... more methods
}
}
Python Bindings (PyO3)
Python SDK wraps Rust core via PyO3:
- Zero-copy data exchange where possible
- Async support via tokio
- Pythonic API design
- Type hints for IDE support
REST API (FastAPI)
HTTP API for external integrations:
- OpenAPI documentation
- Authentication/authorization
- Rate limiting
- Webhook support
CLI
Command-line interface for developers:
- Git-like commands
- Interactive prompts
- JSON output option
- Shell completion
Data Flow
Commit Operation
1. User: epoch commit -m "Update users"
│
2. CLI: Parse command, validate
│
3. Core: Identify changed records
│
4. Core: Create commit metadata
│
5. Core: Store change delta
│
6. Meta: Update branch pointer
│
7. CLI: Return success + commit ID
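The same flow from the Python SDK, as a sketch; the shape of the commit() return value (an object carrying the new commit ID) is an assumption here:
# Illustrative sketch of the flow above; the commit() return value
# (an object carrying the new commit ID) is an assumption.
client.execute("UPDATE users SET status = 'active' WHERE id = 42")
result = client.commit(message="Update users")
print(f"Created commit {result.commit_id}")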
Merge Operation
1. User: epoch merge feature/branch
│
2. Core: Find merge base (common ancestor)
│
3. Core: Compute changes from base to source
│
4. Core: Compute changes from base to target
│
5. Core: Identify conflicts
│
┌────┴────┐
│ │
No conflicts Has conflicts
│ │
▼ ▼
6a. Apply 6b. Report
changes conflicts
│ │
7a. Create 7b. Wait for
merge resolution
commit
Query Operation
1. User: SELECT * FROM users WHERE id = 42
│
2. Core: Determine current branch
│
3. Core: Check branch overlay for record
│
┌────┴────┐
│ │
Found in Not in
overlay overlay
│ │
▼ ▼
4a. Return 4b. Query
overlay base data
record │
┌──────┴──────┐
│ │
Found Not found
│ │
▼ ▼
Return Return
record empty
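In pseudocode, the per-record resolution above reduces to an overlay-first lookup. The read_overlay and read_base helpers below are placeholders for the adapter-level reads, not SDK functions:
# Pseudocode for the lookup flow above; read_overlay() and read_base()
# stand in for the adapter reads on the branch overlay and the base table.
def resolve_record(table, key, branch):
    # 3. Check the branch overlay first
    record = read_overlay(table, key, branch)
    if record is not None:
        # 4a. Overlay wins; a _DELETED_ tombstone means the record is gone
        return None if record.is_deleted else record
    # 4b. Fall back to the base data
    return read_base(table, key)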
Scalability Considerations
Horizontal Scaling
- Metadata database can be replicated
- Storage backends scale independently
- API servers are stateless
Performance Optimization
- Connection pooling for databases
- Lazy loading of large datasets
- Caching of frequently accessed data
- Batch operations for bulk changes
Large Repository Support
- Incremental operations
- Streaming for large diffs
- Pagination for history
- Sparse checkout support
Security Model
Authentication
- Multiple auth methods supported
- Credential providers for secrets
- Token-based API access
Authorization
- Repository-level permissions
- Branch protection rules
- Audit logging
Data Protection
- TLS for all connections
- Encryption at rest (via storage backends)
- Credential encryption
Future Architecture
Planned enhancements:
- Distributed metadata for global deployments
- Real-time collaboration features
- Advanced caching layer
- Plugin system for custom adapters
Storage Adapters
Storage adapters enable Horizon Epoch to work with different data storage systems while providing a unified interface for version control operations.
Supported Backends
Horizon Epoch supports 8 storage backends:
| Backend | Status | Constraint Support | Use Case |
|---|---|---|---|
| PostgreSQL | Production | Full | Transactional databases |
| MySQL | Production | Full | Transactional databases |
| Microsoft SQL Server | Production | Full | Enterprise databases |
| SQLite | Production | Partial | Local/embedded databases |
| S3/Delta Lake | Production | Metadata Only | Data lakes, analytics |
| Azure Blob Storage | Production | Metadata Only | Cloud data lakes |
| Google Cloud Storage | Production | Metadata Only | Cloud data lakes |
| Local Filesystem | Production | Metadata Only | Development, testing |
Adapter Architecture
┌─────────────────────────────────────────────────────────────┐
│ Storage Abstraction Layer │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ StorageAdapter Trait │ │
│ │ - read_records() - write_records() │ │
│ │ - delete_records() - get_schema() │ │
│ │ - list_tables() - scan_table() │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ PostgreSQL │ │ MySQL │ │ MSSQL │ │ SQLite │
│ Adapter │ │ Adapter │ │ Adapter │ │ Adapter │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ S3/Delta │ │ Azure │ │ GCS │ │ LocalFS │
│ Adapter │ │ Adapter │ │ Adapter │ │ Adapter │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
Common Interface
All storage adapters implement these core operations:
Record Operations
#![allow(unused)]
fn main() {
// Read specific records by primary key
fn read_records(
&self,
table: &str,
keys: &[RecordKey],
branch_context: &BranchContext
) -> Result<Vec<Record>>;
// Write records (insert or update)
fn write_records(
&self,
table: &str,
records: &[Record],
branch_context: &BranchContext
) -> Result<WriteResult>;
// Delete records by primary key
fn delete_records(
&self,
table: &str,
keys: &[RecordKey],
branch_context: &BranchContext
) -> Result<DeleteResult>;
}
Schema Operations
#![allow(unused)]
fn main() {
// Get table schema
fn get_schema(&self, table: &str) -> Result<Schema>;
// Detect schema changes
fn compare_schema(
&self,
table: &str,
old_schema: &Schema
) -> Result<SchemaChanges>;
}
Table Operations
#![allow(unused)]
fn main() {
// List available tables
fn list_tables(&self) -> Result<Vec<TableInfo>>;
// Scan entire table (for initial import or full sync)
fn scan_table(
&self,
table: &str,
options: ScanOptions
) -> Result<RecordStream>;
// Get table statistics
fn get_table_stats(&self, table: &str) -> Result<TableStats>;
}
Relational Database Adapters
PostgreSQL Adapter
Capabilities:
- Direct SQL execution
- ACID transactions
- Full schema introspection
- Constraint enforcement (foreign keys, unique, check, exclusion)
- Connection pooling
- TLS/SSL secure connections
Configuration:
[storage.postgresql.mydb]
url = "postgresql://user:pass@host/database"
pool_size = 10
connect_timeout = 30
statement_timeout = 60000 # 60 seconds
ssl_mode = "require" # disable, prefer, require
MySQL Adapter
Capabilities:
- Full CRUD operations
- Schema introspection via information_schema
- Full constraint support
- 5 SSL/TLS modes
- SSH tunnel support
- Connection pooling
Configuration:
[storage.mysql.mydb]
url = "mysql://user:pass@host/database"
pool_size = 10
ssl_mode = "required" # disabled, preferred, required, verify_ca, verify_identity
Microsoft SQL Server Adapter
Capabilities:
- Full CRUD operations
- Schema introspection via sys.* catalog views
- Full constraint support
- TLS/SSL encryption (3 levels)
- Windows and SQL Server authentication
Configuration:
[storage.mssql.mydb]
url = "mssql://user:pass@host/database"
pool_size = 10
encrypt = "required" # off, on, required
trust_server_certificate = false
SQLite Adapter
Capabilities:
- File-based and in-memory databases
- Schema introspection via PRAGMA
- Partial constraint support (foreign keys require PRAGMA foreign_keys = ON)
- WAL journal mode for concurrency
- Connection pooling with health checks
Configuration:
[storage.sqlite.local]
path = "/path/to/database.db"
# Or use in-memory:
# path = ":memory:"
journal_mode = "wal"
foreign_keys = true
Object Storage Adapters
All object storage adapters use Delta Lake format for transactional semantics.
S3/Delta Lake Adapter
Capabilities:
- Delta Lake protocol support
- Parquet file format
- Time travel via Delta versioning
- Schema evolution
- Efficient columnar storage
- AWS S3 and S3-compatible (MinIO, etc.)
Configuration:
[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
endpoint = "https://s3.amazonaws.com" # Optional, for S3-compatible
delta_log_retention_days = 30
Azure Blob Storage Adapter
Capabilities:
- Delta Lake format on Azure Blob
- Multiple authentication methods:
- Account key
- SAS token
- Service principal
- Managed identity
- Copy-on-write semantics
Configuration:
[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key" # account_key, sas_token, service_principal, managed_identity
Google Cloud Storage Adapter
Capabilities:
- Delta Lake format on GCS
- Authentication:
- Service account key
- Application Default Credentials (ADC)
- Copy-on-write semantics
Configuration:
[storage.gcs.datalake]
bucket = "my-gcs-bucket"
prefix = "horizon-epoch/"
project_id = "my-project"
auth_method = "service_account" # service_account, adc
Local Filesystem Adapter
Capabilities:
- Delta Lake format on local disk
- SQL query execution via DataFusion
- Ideal for development and testing
- Copy-on-write semantics
Configuration:
[storage.local.dev]
path = "/data/horizon-epoch"
Constraint Support Levels
| Level | Description | Backends |
|---|---|---|
| Full | Constraints enforced via DDL | PostgreSQL, MySQL, MSSQL |
| Partial | Some constraints require config | SQLite |
| Metadata Only | Stored but not enforced | S3, Azure, GCS, LocalFS |
For object storage backends, constraints are versioned in metadata but enforcement happens at the application layer or when promoting to a relational database.
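As a sketch of what application-layer enforcement can look like before promoting lake data into PostgreSQL; get_table_constraints() is an assumed helper for reading the versioned constraint metadata, and the sample batch is made up:
# Sketch only: get_table_constraints() is an assumed helper for reading the
# versioned constraint metadata of a lake-backed table; the batch is made up.
batch = [
    {"event_id": 1, "user_id": 7},
    {"event_id": 1, "user_id": 9},   # duplicate event_id
]
constraints = client.get_table_constraints("events")

def violates_unique(rows, columns):
    """True if any combination of the unique columns repeats in the batch."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return True
        seen.add(key)
    return False

for uc in constraints.unique_constraints:
    if violates_unique(batch, uc.columns):
        raise ValueError(f"unique constraint {uc.name} would fail on promote")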
Adapter Selection
Horizon Epoch selects adapters based on table location URL scheme:
# PostgreSQL table
client.track_table("users", StorageLocation.postgresql("main", "public", "users"))
# MySQL table
client.track_table("orders", StorageLocation.mysql("main", "mydb", "orders"))
# S3 Delta table
client.track_table("events", StorageLocation.s3("bucket", "delta/events"))
# Local filesystem
client.track_table("test_data", StorageLocation.local("/data/test"))
Performance Characteristics
| Operation | PostgreSQL | MySQL | MSSQL | SQLite | S3/Delta | Azure/GCS |
|---|---|---|---|---|---|---|
| Point read | Fast | Fast | Fast | Fast | Medium | Medium |
| Range scan | Fast | Fast | Fast | Fast | Fast | Fast |
| Write single | Fast | Fast | Fast | Fast | Slow | Slow |
| Write batch | Fast | Fast | Fast | Medium | Fast | Fast |
| Branch create | Instant | Instant | Instant | Instant | Instant | Instant |
| Full scan | Depends | Depends | Depends | Fast | Very Fast | Very Fast |
Best Practices
Relational Databases (PostgreSQL, MySQL, MSSQL)
- Ensure proper indexes on primary keys and commonly queried columns
- Use connection pooling for high-throughput workloads
- Configure appropriate timeouts for long-running operations
- Monitor overlay table size and compact periodically
SQLite
- Use WAL mode for better concurrency (journal_mode = "wal")
- Enable foreign keys explicitly if needed (foreign_keys = true)
- Use file-based databases for persistence, in-memory for testing
Object Storage (S3, Azure, GCS, Local)
- Use appropriate file sizes (target 100MB-1GB per file)
- Partition large tables for efficient queries
- Enable predicate pushdown for filtered scans
- Run OPTIMIZE periodically for better read performance
- Configure retention for Delta log cleanup
Troubleshooting
Connection Issues
PostgreSQL/MySQL/MSSQL:
- Verify network connectivity and firewall rules
- Check SSL/TLS configuration matches server requirements
- Ensure connection pool size is appropriate for workload
Object Storage:
- Verify credentials and permissions
- Check bucket/container exists and is accessible
- Ensure network allows access to storage endpoints
Performance Issues
Slow Queries:
- Add indexes for overlay joins (relational)
- Analyze query plans
- Consider materializing branches for read-heavy workloads
Slow Writes:
- Batch writes together (especially for object storage)
- Check network latency to storage
- Use regional endpoints for cloud storage
Copy-on-Write
Copy-on-Write (CoW) is the fundamental mechanism that enables Horizon Epoch to create instant branches and track changes efficiently.
Overview
When you create a branch, no data is copied. Instead:
- The branch is created as a pointer to the current commit
- When you modify a record, only that record is copied
- Queries combine the original data with branch-specific changes
How It Works
Initial State
main branch:
┌─────────────────────────────────────────┐
│ users table │
│ id │ name │ email │
│─────┼─────────┼─────────────────────────│
│ 1 │ Alice │ alice@example.com │
│ 2 │ Bob │ bob@example.com │
│ 3 │ Charlie │ charlie@example.com │
└─────────────────────────────────────────┘
Branch Creation (Zero-Copy)
epoch branch create feature/update-emails
┌─── feature/update-emails
│ (pointer to same data)
▼
main ───────────────►┌─────────────────────────────────────────┐
│ users table │
│ (shared, no copy) │
└─────────────────────────────────────────┘
No data is copied. Both branches point to the same underlying data.
First Modification (Copy-on-Write)
# On feature/update-emails branch
client.execute("UPDATE users SET email = 'alice@company.com' WHERE id = 1")
Now the modified record is stored separately:
main branch: feature/update-emails:
┌───────────────────────┐ ┌─────────────────────────────┐
│ users (base) │ │ users (overlay) │
│ id │ name │ email │ │ id │ name │ email │
│─────┼───────┼─────────│ │─────┼───────┼───────────────│
│ 1 │ Alice │ alice@ │ │ 1 │ Alice │ alice@company │
│ 2 │ Bob │ bob@ │ └─────────────────────────────┘
│ 3 │ Charl │ charlie@│ Only modified records stored
└───────────────────────┘
Querying a Branch
When you query the feature branch:
SELECT * FROM users;
Horizon Epoch:
- Checks the branch overlay for each record
- Returns overlay version if found
- Returns base version otherwise
Result on feature/update-emails:
│ 1 │ Alice │ alice@company.com │ ← from overlay
│ 2 │ Bob │ bob@example.com │ ← from base
│ 3 │ Charlie │ charlie@example.com │ ← from base
Storage Layout: Before and After Modifications
Understanding how data is physically stored helps you reason about performance and storage costs.
PostgreSQL Implementation
Horizon Epoch uses version tables with a naming convention {table}__v_{version_id}:
State 0: Initial Setup (Main Branch Only)
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) │
│ ┌──────┬─────────┬─────────────────────┬──────────┐ │
│ │ id │ name │ email │ status │ │
│ ├──────┼─────────┼─────────────────────┼──────────┤ │
│ │ 1 │ Alice │ alice@example.com │ active │ │
│ │ 2 │ Bob │ bob@example.com │ active │ │
│ │ 3 │ Charlie │ charlie@example.com │ inactive │ │
│ └──────┴─────────┴─────────────────────┴──────────┘ │
│ │
│ Total storage: ~3 records │
└─────────────────────────────────────────────────────────────────────────┘
State 1: Branch Created (Zero-Copy)
epoch branch create feature/update-users
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) users__v_feature-update-users │
│ ┌──────┬─────────┬─────────────────────┐ ┌───────────────────────────┐│
│ │ id │ name │ email │ │ (empty table) ││
│ ├──────┼─────────┼─────────────────────┤ │ ││
│ │ 1 │ Alice │ alice@example.com │ │ Created with same schema ││
│ │ 2 │ Bob │ bob@example.com │ │ using LIKE ... INCLUDING ││
│ │ 3 │ Charlie │ charlie@example.com │ │ ALL (copies constraints) ││
│ └──────┴─────────┴─────────────────────┘ └───────────────────────────┘│
│ │
│ Total storage: ~3 records (0 additional for branch) │
│ Branch creation time: O(1) - just creates empty table │
└─────────────────────────────────────────────────────────────────────────┘
State 2: First Modification (Copy-on-Write Triggers)
# On feature branch: Update Alice's email
client.execute("UPDATE users SET email = 'alice@company.com' WHERE id = 1")
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) users__v_feature-update-users │
│ ┌──────┬─────────┬─────────────────────┐ ┌──────┬───────┬────────────┐│
│ │ id │ name │ email │ │ id │ name │ email ││
│ ├──────┼─────────┼─────────────────────┤ ├──────┼───────┼────────────┤│
│ │ 1 │ Alice │ alice@example.com │ │ 1 │ Alice │ alice@co.. ││
│ │ 2 │ Bob │ bob@example.com │ └──────┴───────┴────────────┘│
│ │ 3 │ Charlie │ charlie@example.com │ │
│ └──────┴─────────┴─────────────────────┘ Only 1 record copied! │
│ │
│ Total storage: ~4 records (1 additional for branch) │
│ Base table: UNCHANGED │
└─────────────────────────────────────────────────────────────────────────┘
State 3: Multiple Modifications
# More changes on feature branch
client.execute("UPDATE users SET status = 'vip' WHERE id = 1") # Updates overlay
client.execute("INSERT INTO users VALUES (4, 'Dana', 'dana@company.com', 'active')")
client.execute("DELETE FROM users WHERE id = 2") # Marks as deleted
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) users__v_feature-update-users │
│ ┌──────┬─────────┬─────────────────────┐ ┌──────┬───────┬────────────┐│
│ │ id │ name │ email │ │ id │ name │ status ││
│ ├──────┼─────────┼─────────────────────┤ ├──────┼───────┼────────────┤│
│ │ 1 │ Alice │ alice@example.com │ │ 1 │ Alice │ vip (mod) ││
│ │ 2 │ Bob │ bob@example.com │ │ 4 │ Dana │ active(new)││
│ │ 3 │ Charlie │ charlie@example.com │ │ 2 │ NULL │ _DELETED_ ││
│ └──────┴─────────┴─────────────────────┘ └──────┴───────┴────────────┘│
│ │
│ Total storage: ~6 records (3 additional for branch) │
│ Main branch queries: See only original 3 records │
│ Feature branch queries: See Alice(mod) + Charlie + Dana = 3 records │
└─────────────────────────────────────────────────────────────────────────┘
S3/Delta Lake Implementation
Delta Lake uses a similar pattern with separate directories:
State 0: Initial Table
s3://horizon-data/
└── repositories/my-repo/
└── tables/
└── users/ # Base table (main branch)
├── _delta_log/
│ └── 00000000000000000000.json # Transaction log
├── part-00000-abc123.parquet # Data file 1
└── part-00001-def456.parquet # Data file 2
Total: ~100MB (all records)
State 1: Branch Created
s3://horizon-data/
└── repositories/my-repo/
└── tables/
├── users/ # Base table (unchanged)
│ ├── _delta_log/
│ └── *.parquet
│
└── users__v_feature-xyz/ # Version table (empty)
└── _delta_log/
└── 00000000000000000000.json # Schema only, no data files
Additional storage: ~1KB (just metadata)
State 2: After Modifications
s3://horizon-data/
└── repositories/my-repo/
└── tables/
├── users/ # Base table (unchanged: 100MB)
│ ├── _delta_log/
│ └── *.parquet
│
└── users__v_feature-xyz/ # Version table
├── _delta_log/
│ ├── 00000000000000000000.json # Initial schema
│ └── 00000000000000000001.json # Insert/update transaction
└── part-00000-xyz789.parquet # Only modified records (~1MB)
Additional storage: ~1MB (only changed records)
Query Overlay Mechanism
The query overlay mechanism is the heart of CoW - it makes branched data appear seamless to applications.
The UNION ALL + NOT EXISTS Pattern
Horizon Epoch uses a SQL pattern that PostgreSQL’s query planner handles efficiently:
-- Overlay query for feature branch
SELECT * FROM schema.users__v_feature_branch
UNION ALL
SELECT b.* FROM schema.users b
WHERE NOT EXISTS (
SELECT 1 FROM schema.users__v_feature_branch v
WHERE b.id = v.id -- Primary key match
)
How it works:
- First SELECT: Returns ALL records from the version table (modified/new records)
- UNION ALL: Combines without deduplication (faster than UNION)
- Second SELECT: Returns base records WHERE they DON’T exist in version table
- NOT EXISTS: Anti-join pattern - highly optimized in PostgreSQL
Query Flow Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ SELECT * FROM users │
│ (on feature branch) │
└─────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Horizon Epoch Query Rewriter │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ 1. Detect current branch context │ │
│ │ 2. Identify version table: users__v_feature_branch │ │
│ │ 3. Rewrite query with UNION ALL pattern │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬───────────────────────────────────────┘
│
┌─────────────┴─────────────┐
│ │
▼ ▼
┌──────────────────────────────┐ ┌──────────────────────────────────────┐
│ Version Table Scan │ │ Base Table Scan with Anti-Join │
│ users__v_feature_branch │ │ users WHERE NOT EXISTS (version) │
│ │ │ │
│ Returns: Modified records │ │ Returns: Unmodified records │
│ (Alice with new email) │ │ (Bob, Charlie - unchanged) │
└──────────────┬───────────────┘ └──────────────────┬───────────────────┘
│ │
└──────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ UNION ALL Result │
│ ┌──────┬─────────┬─────────────────────┐ │
│ │ id │ name │ email │ │
│ ├──────┼─────────┼─────────────────────┤ │
│ │ 1 │ Alice │ alice@company.com │ ← from version table │
│ │ 2 │ Bob │ bob@example.com │ ← from base (not in version) │
│ │ 3 │ Charlie │ charlie@example.com │ ← from base (not in version) │
│ └──────┴─────────┴─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Composite Primary Key Handling
For tables with composite primary keys, the pattern adapts:
-- Table with composite PK (order_id, line_item_id)
SELECT * FROM schema.order_lines__v_feature_branch
UNION ALL
SELECT b.* FROM schema.order_lines b
WHERE NOT EXISTS (
SELECT 1 FROM schema.order_lines__v_feature_branch v
WHERE b.order_id = v.order_id
AND b.line_item_id = v.line_item_id
)
Filtered Queries
When you add a WHERE clause, it’s applied to both parts:
-- Original query
SELECT * FROM users WHERE status = 'active'
-- Rewritten overlay query
SELECT * FROM schema.users__v_feature_branch
WHERE status = 'active'
UNION ALL
SELECT b.* FROM schema.users b
WHERE b.status = 'active'
AND NOT EXISTS (
SELECT 1 FROM schema.users__v_feature_branch v
WHERE b.id = v.id
)
Query Optimization Details
Point Lookups (O(1))
SELECT * FROM users WHERE id = 1
Optimized path:
- Check version table first (indexed lookup)
- If found → return immediately
- If not found → check base table
- Return result
-- Horizon Epoch generates:
SELECT * FROM schema.users__v_feature_branch WHERE id = 1
UNION ALL
SELECT * FROM schema.users
WHERE id = 1
AND NOT EXISTS (
SELECT 1 FROM schema.users__v_feature_branch WHERE id = 1
)
With proper indexing, this is two index lookups - very fast.
Full Table Scans
SELECT * FROM users
Performance depends on overlay size:
- Empty overlay: ~0% overhead (NOT EXISTS returns quickly)
- Small overlay (< 1% of base): ~5-10% overhead
- Large overlay (> 10% of base): Consider materialization
Aggregations
SELECT status, COUNT(*) FROM users GROUP BY status
Aggregations work correctly across the union:
SELECT status, COUNT(*) FROM (
SELECT status FROM schema.users__v_feature_branch
UNION ALL
SELECT b.status FROM schema.users b
WHERE NOT EXISTS (...)
) combined
GROUP BY status
S3/Delta Lake Query Pattern
For Delta Lake tables, Horizon Epoch uses DataFusion SQL engine:
-- Register tables in DataFusion context
CREATE EXTERNAL TABLE base STORED AS DELTA LOCATION 's3://bucket/users/'
CREATE EXTERNAL TABLE branch STORED AS DELTA LOCATION 's3://bucket/users__v_feature/'
-- Query with overlay
SELECT * FROM branch
UNION ALL
SELECT * FROM base
WHERE id NOT IN (SELECT id FROM branch)
Performance Benchmarks
| Scenario | Base Query | Overlay Query | Overhead |
|---|---|---|---|
| Point lookup (indexed) | 0.5ms | 0.6ms | +20% |
| Range scan (1000 rows) | 5ms | 5.5ms | +10% |
| Full scan (1M rows, 1K overlay) | 500ms | 520ms | +4% |
| Full scan (1M rows, 100K overlay) | 500ms | 600ms | +20% |
| Aggregation (1M rows) | 200ms | 220ms | +10% |
Key insight: Overhead is proportional to overlay size, not base table size.
Index Recommendations
For optimal overlay query performance:
-- Version tables automatically inherit indexes via LIKE ... INCLUDING ALL
-- But you may want to ensure these exist:
-- Primary key index (automatic)
CREATE INDEX ON users__v_feature_branch (id);
-- Covering indexes for common queries
CREATE INDEX ON users__v_feature_branch (status) INCLUDE (name, email);
Benefits
1. Instant Branch Creation
Creating a branch takes constant time regardless of data size:
- 1 row table: ~1ms
- 1 billion row table: ~1ms
2. Storage Efficiency
Only store what changes:
- Base: 1TB
- Branch with 1000 changes: ~1MB additional
3. Fast Merging
Merge only examines changed records:
- Base: 1 billion rows
- Changes: 1000 rows
- Merge compares: 1000 rows (not 1 billion)
4. Time Travel
Every commit preserves the overlay state:
- Query any historical version
- No additional storage for old versions
- Instant access to history
Considerations
Query Performance
Overlaid queries have slight overhead:
- Must check overlay before/after base
- Overhead is proportional to overlay size
- Negligible for typical branch sizes
For large overlays:
# Materialize long-lived branches
client.materialize_branch("long-running-branch")
Garbage Collection
Old overlays must be cleaned up:
# Merged branches can have overlays removed
client.gc_branch("merged-feature")
# Or automatic cleanup
config = Config(
gc_merged_branches=True,
gc_retention_days=7
)
Deep Branch Hierarchies
main → dev → feature → sub-feature → experiment
Each level adds overlay resolution. Keep hierarchies shallow.
Comparison with Full Copy
| Aspect | Copy-on-Write | Full Copy |
|---|---|---|
| Branch creation | O(1) | O(n) |
| Storage per branch | O(changes) | O(data) |
| Query (point) | O(1) | O(1) |
| Query (scan) | O(n + overlay) | O(n) |
| Merge | O(changes) | O(n) |
Best Practices
- Keep branches short-lived - Smaller overlays are faster
- Merge frequently - Reduces overlay accumulation
- Monitor overlay size - Large overlays may need materialization
- Use appropriate indexes - Index overlay tables for performance
- Clean up merged branches - Run garbage collection regularly
Merge Algorithm
Horizon Epoch uses a three-way merge algorithm similar to Git, extended for structured data with field-level conflict detection.
Overview
Three-way merge compares:
- Base - The common ancestor (merge base)
- Ours - The target branch (where we’re merging into)
- Theirs - The source branch (what we’re merging)
Visual Overview
┌─────────────────────────────────────────────────────────┐
│ THREE-WAY MERGE │
└─────────────────────────────────────────────────────────┘
Source Branch Target Branch
(feature) (main)
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│Commit E │ │Commit G │
│ users: │ │ users: │
│ Alice→ │ │ Alice→ │
│ Smith │ │ alice@ │
└────┬────┘ │ company │
│ └────┬────┘
│ │
│ ┌──────────────────┐ │
└──────► MERGE BASE ◄────────┘
│ Commit B │
│ users: │
│ Alice │
│ alice@example │
└────────┬─────────┘
│
▼
┌────────────────────┐
│ MERGE RESULT │
│ users: │
│ Alice Smith │ ← from source (name changed)
│ alice@company.com │ ← from target (email changed)
└────────────────────┘
Finding the Merge Base
Commit Graph
C---D---E (feature)
/
A---B---F---G (main)
The merge base for merging feature into main is commit B - the most recent common ancestor.
Algorithm
def find_merge_base(source_commit, target_commit):
    """Find the lowest common ancestor in the commit DAG."""
    source_ancestors = get_all_ancestors(source_commit)
    # BFS from target to find first intersection
    queue = [target_commit]
    while queue:
        commit = queue.pop(0)
        if commit in source_ancestors:
            return commit
        queue.extend(commit.parents)
    return None  # No common ancestor (shouldn't happen)
Content Hash Optimization
Before performing record-level comparison, Horizon Epoch uses content hashes for quick diff detection:
def should_merge_table(base_version, source_version, target_version):
    """Use content hashes to skip unchanged tables."""
    base_hash = base_version.content_hash
    source_hash = source_version.content_hash
    target_hash = target_version.content_hash
    if source_hash == target_hash:
        # Both branches have identical data - no merge needed
        return MergeAction.SKIP
    if source_hash == base_hash:
        # Only target changed - keep target
        return MergeAction.KEEP_TARGET
    if target_hash == base_hash:
        # Only source changed - take source (fast-forward)
        return MergeAction.TAKE_SOURCE
    # Both changed differently - full record merge required
    return MergeAction.RECORD_LEVEL_MERGE
Three-Way Merge Process
Step 1: Compute Changes
For each table, compute:
- Changes from base to ours (ours_changes)
- Changes from base to theirs (theirs_changes)
def compute_changes(base_state, current_state):
    """base_state and current_state map primary key -> record."""
    changes = {}
    for key, record in current_state.items():
        base_record = base_state.get(key)
        if base_record is None:
            changes[key] = Change(type=INSERT, record=record)
        elif base_record != record:
            changes[key] = Change(type=UPDATE,
                                  base=base_record,
                                  current=record)
    for key in base_state:
        if key not in current_state:
            changes[key] = Change(type=DELETE, base=base_state[key])
    return changes
Step 2: Classify Changes
For each changed record:
| Ours | Theirs | Result |
|---|---|---|
| No change | Changed | Accept theirs |
| Changed | No change | Keep ours |
| Same change | Same change | Keep either (identical) |
| Different changes | Different changes | CONFLICT |
| Deleted | Changed | CONFLICT |
| Changed | Deleted | CONFLICT |
| Deleted | Deleted | Keep deleted |
Step 3: Apply Non-Conflicting Changes
def apply_changes(base, ours_changes, theirs_changes):
    """Start from the merge base and fold in both sides' non-conflicting changes."""
    result = base.copy()
    conflicts = []
    for key in set(ours_changes.keys()) | set(theirs_changes.keys()):
        ours = ours_changes.get(key)
        theirs = theirs_changes.get(key)
        if ours is None:
            # Only theirs changed
            result.apply(theirs)
        elif theirs is None:
            # Only ours changed
            result.apply(ours)
        elif ours == theirs:
            # Same change on both sides - apply once
            result.apply(ours)
        else:
            # Potential conflict - check field level
            conflict = check_field_conflict(key, ours, theirs)
            if conflict:
                conflicts.append(conflict)
            else:
                result.apply(merge_changes(ours, theirs))
    return result, conflicts
Field-Level Conflict Detection
Unlike file-based version control, Horizon Epoch detects conflicts at the field level. This is one of the most powerful features - it allows more changes to merge automatically.
How Field-Level Merge Works
┌─────────────────────────────────────────────────────────────────────────────┐
│ FIELD-LEVEL MERGE ALGORITHM │
└─────────────────────────────────────────────────────────────────────────────┘
For each field in a record where both branches made changes:
┌──────────────────┬──────────────────┬──────────────────┬────────────────────┐
│ Ancestor (A) │ Source (S) │ Target (T) │ Result │
├──────────────────┼──────────────────┼──────────────────┼────────────────────┤
│ value_a │ value_a │ value_a │ value_a (same) │
│ value_a │ value_s │ value_a │ value_s (source) │
│ value_a │ value_a │ value_t │ value_t (target) │
│ value_a │ value_s │ value_s │ value_s (same) │
│ value_a │ value_s │ value_t │ ⚠️ CONFLICT │
│ NULL │ value_s │ NULL │ value_s (added) │
│ NULL │ NULL │ value_t │ value_t (added) │
│ NULL │ value_s │ value_t │ ⚠️ CONFLICT │
│ value_a │ NULL │ value_a │ NULL (deleted) │
│ value_a │ value_a │ NULL │ NULL (deleted) │
│ value_a │ NULL │ value_t │ ⚠️ CONFLICT │
└──────────────────┴──────────────────┴──────────────────┴────────────────────┘
Implementation
def field_level_merge(ancestor, source, target, strategy):
    """
    Merge two records field-by-field using three-way merge.
    Returns merged record or conflict information.
    """
    result = {}
    conflicting_fields = []
    # Get all fields across all three versions
    all_fields = set(ancestor.keys()) | set(source.keys()) | set(target.keys())
    for field in all_fields:
        ancestor_val = ancestor.get(field)
        source_val = source.get(field)
        target_val = target.get(field)
        # Case 1: Same in both source and target → no conflict
        if source_val == target_val:
            result[field] = source_val
            continue
        # Case 2: Only source changed from ancestor
        if ancestor_val == target_val and ancestor_val != source_val:
            result[field] = source_val
            continue
        # Case 3: Only target changed from ancestor
        if ancestor_val == source_val and ancestor_val != target_val:
            result[field] = target_val
            continue
        # Case 4: Field added in source only
        if ancestor_val is None and target_val is None and source_val is not None:
            result[field] = source_val
            continue
        # Case 5: Field added in target only
        if ancestor_val is None and source_val is None and target_val is not None:
            result[field] = target_val
            continue
        # Case 6: CONFLICT - both changed differently
        conflicting_fields.append(FieldConflict(
            field_name=field,
            ancestor_value=ancestor_val,
            source_value=source_val,
            target_value=target_val
        ))
    if conflicting_fields:
        return MergeRecordResult.Conflict(
            merged_fields=result,
            conflicting_fields=conflicting_fields
        )
    return MergeRecordResult.Resolved(result)
Example: Successful Auto-Merge
Base record:
{"id": 1, "name": "Alice", "email": "alice@example.com", "status": "active"}
Ours (main):
{"id": 1, "name": "Alice", "email": "alice@company.com", "status": "active"}
Theirs (feature):
{"id": 1, "name": "Alice Smith", "email": "alice@example.com", "status": "inactive"}
Field-by-Field Analysis
┌─────────┬───────────────┬───────────────┬───────────────┬────────────────────┐
│ Field │ Base │ Ours (main) │ Theirs (feat) │ Result │
├─────────┼───────────────┼───────────────┼───────────────┼────────────────────┤
│ id │ 1 │ 1 │ 1 │ 1 (unchanged) │
│ name │ Alice │ Alice │ Alice Smith │ Alice Smith ✓ │
│ │ │ (unchanged) │ (changed) │ (take theirs) │
│ email │ alice@example │ alice@company │ alice@example │ alice@company ✓ │
│ │ │ (changed) │ (unchanged) │ (keep ours) │
│ status │ active │ active │ inactive │ inactive ✓ │
│ │ │ (unchanged) │ (changed) │ (take theirs) │
└─────────┴───────────────┴───────────────┴───────────────┴────────────────────┘
Result (auto-merged):
{"id": 1, "name": "Alice Smith", "email": "alice@company.com", "status": "inactive"}
Example: Field-Level Conflict
If both branches changed the same field to different values:
┌─────────┬───────────────┬────────────────────┬────────────────────┬──────────┐
│ Field │ Base │ Ours (main) │ Theirs (feature) │ Result │
├─────────┼───────────────┼────────────────────┼────────────────────┼──────────┤
│ email │ alice@example │ alice@company.com │ a.smith@example │ CONFLICT │
│ │ │ (changed) │ (changed) │ ⚠️ │
└─────────┴───────────────┴────────────────────┴────────────────────┴──────────┘
Conflict Types
Horizon Epoch distinguishes several conflict types:
#![allow(unused)]
fn main() {
pub enum RecordConflictType {
/// Both sides modified the same record differently
ModifyModify,
/// Source deleted, target modified
DeleteModify,
/// Source modified, target deleted
ModifyDelete,
/// Both sides added a record with the same primary key
AddAdd,
/// Other edge cases
Other,
}
}
Each type may require different resolution strategies.
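As an illustration, the conflict type can drive automated resolution under the AcceptSource strategy described below; the conflict object shape used here is assumed, not the published SDK API:
# Illustrative mapping only; the conflict object shape is assumed.
def resolve_accept_source(conflict):
    if conflict.type == "ModifyModify":
        return conflict.source_record   # source's field values win
    if conflict.type == "DeleteModify":
        return None                     # source deleted the record - keep the delete
    if conflict.type == "ModifyDelete":
        return conflict.source_record   # source modified it - restore the record
    if conflict.type == "AddAdd":
        return conflict.source_record   # source's new record wins
    raise ValueError(f"manual resolution required for {conflict.type}")  # Other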
Merge Strategies
Horizon Epoch supports four merge strategies, each suited for different workflows:
┌─────────────────────────────────────────────────────────────────────────────┐
│ MERGE STRATEGIES │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────┬────────────────────────────────────────────────────────────┐
│ Strategy │ Behavior │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ ThreeWay │ Standard 3-way merge, reports conflicts for resolution │
│ (default) │ Use when: You want to review and manually resolve conflicts│
├─────────────────┼────────────────────────────────────────────────────────────┤
│ AcceptSource │ Auto-resolve conflicts by taking source branch values │
│ (--strategy │ Use when: Source branch should "win" (e.g., hotfixes) │
│ theirs) │ │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ AcceptTarget │ Auto-resolve conflicts by keeping target branch values │
│ (--strategy │ Use when: Target branch should "win" (preserve main) │
│ ours) │ │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ FailOnConflict │ Abort immediately if any conflict is detected │
│ (--fail-on- │ Use when: CI/CD pipelines, automated processes │
│ conflict) │ │
└─────────────────┴────────────────────────────────────────────────────────────┘
Default Strategy (ThreeWay)
Field-level three-way merge with conflict detection. Non-conflicting changes merge automatically; conflicts are reported for manual resolution.
epoch merge feature/branch
Accept Source (Theirs) Strategy
On conflicts, automatically choose the source branch (the branch being merged in):
epoch merge feature/branch --strategy theirs
Use cases:
- Hotfix branches that must override any conflicting changes
- Feature branches developed by domain experts
- Emergency deployments
Accept Target (Ours) Strategy
On conflicts, automatically keep the target branch values (the branch you’re merging into):
epoch merge feature/branch --strategy ours
Use cases:
- Integrating experimental branches where existing data should be preserved
- Rebasing feature branches onto main
- Pulling updates while preserving local modifications
Fail on Conflict Strategy
Fail immediately if any conflict is detected. Useful for automated pipelines.
epoch merge feature/branch --fail-on-conflict
Use cases:
- CI/CD pipelines where conflicts indicate process failures
- Automated data sync jobs
- Pre-merge validation checks
Strategy Comparison
| Scenario | ThreeWay | AcceptSource | AcceptTarget | FailOnConflict |
|---|---|---|---|---|
| Non-conflicting changes | ✓ Merge | ✓ Merge | ✓ Merge | ✓ Merge |
| Same field, different values | ⚠️ Report | Take source | Keep target | ❌ Abort |
| Delete vs modify | ⚠️ Report | Delete | Keep modified | ❌ Abort |
| Both add same key | ⚠️ Report | Take source | Keep target | ❌ Abort |
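From the Python SDK the strategy would be a per-merge option; this sketch assumes merge() accepts a strategy keyword mirroring the CLI flags:
# Sketch: assumes merge() accepts a strategy keyword mirroring the CLI flags.
result = client.merge("feature/branch", strategy="three_way")   # default behaviour
if result.has_conflicts:
    for conflict in result.conflicts:
        print(conflict.table_name, conflict.record_id, conflict.type)
else:
    print(f"Merged as commit {result.commit_id}")

# Automated pipelines can fail fast instead of reporting conflicts:
client.merge("feature/branch", strategy="fail_on_conflict")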
Conflict Resolution
Interactive Resolution
epoch conflicts resolve --interactive
Walks through each conflict:
Conflict in users.email (record id=1):
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Choose:
[o] Keep ours
[t] Keep theirs
[b] Keep base
[c] Custom value
>
Programmatic Resolution
conflicts = client.get_conflicts()
for conflict in conflicts:
    for field in conflict.fields:
        # Custom resolution logic
        if field.name == "email":
            # Business rule: prefer company domain
            if "@company.com" in field.ours_value:
                resolution = field.ours_value
            else:
                resolution = field.theirs_value
            client.resolve_conflict(
                table=conflict.table_name,
                record_id=conflict.record_id,
                field=field.name,
                value=resolution
            )
client.merge_continue()
Schema Merge
Schema changes are also merged:
Compatible Changes
These merge automatically:
- Adding a column (either branch)
- Adding an index
- Changing a default value (non-conflicting)
Conflicting Changes
These require resolution:
- Same column changed to different types
- Same column renamed differently
- Conflicting constraints
Schema conflict in users:
Column 'status':
OURS: VARCHAR(20)
THEIRS: INTEGER
Choose resolution:
[o] Keep VARCHAR(20)
[t] Change to INTEGER
[c] Custom type
>
Merge Commits
After successful merge, a merge commit is created:
Commit: ghi789
Parents: abc123 (main), def456 (feature)
Message: Merge feature/branch into main
Changes:
users: 5 records merged, 2 conflicts resolved
Performance
Complexity
- Finding merge base: O(commit history depth)
- Computing changes: O(changed records)
- Conflict detection: O(changed records)
Optimization
- Change tracking - Only examine records marked as changed
- Parallel comparison - Compare multiple tables concurrently
- Early termination - Stop at first conflict if not auto-resolving
Best Practices
- Merge frequently - Smaller merges have fewer conflicts
- Keep branches focused - One feature per branch reduces conflicts
- Communicate on shared data - Coordinate changes to same records
- Test after merge - Verify data integrity post-merge
- Document conflict resolution - Include context in merge commit
Multi-Backend Architecture
Horizon Epoch is designed to manage data across multiple, heterogeneous storage backends through a unified interface.
Design Philosophy
Separation of Concerns
┌─────────────────────────────────────────────────────────────┐
│ Horizon Epoch │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Metadata Layer │ │
│ │ - Commits, branches, tags │ │
│ │ - Table registrations │ │
│ │ - Change tracking indices │ │
│ │ - Version graph │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ │ (references, not data) │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │PostgreSQL│ │ MySQL │ │SQL Server│ │ SQLite │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ AWS S3 │ │ Azure │ │ GCS │ │ Local │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Key insight: The metadata layer stores what changed and when, while storage adapters handle where the data physically lives.
Benefits
- No Data Migration - Keep data where it is
- Best Tool for Job - Use PostgreSQL for transactional, S3 for analytics
- Gradual Adoption - Add version control to existing infrastructure
- Unified Operations - Same commands work across all backends
Architecture Components
Storage Registry
Central registry of all configured backends:
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
"prod_users", # Logical name
"postgresql://prod/users"
).add_postgres(
"prod_orders",
"postgresql://prod/orders"
).add_s3(
"analytics",
bucket="company-analytics"
)
Storage Location
Each table has a storage location that identifies:
- Protocol - Which adapter to use
- Backend Name - Which configured backend
- Path - Table identifier within the backend
postgresql://prod_users/public.users
│ │ │
protocol backend table path
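The three parts can be pulled apart with a standard URL parse; this is only an illustration of the format, not the resolver Horizon Epoch uses internally:
# Illustration of the location format; not the internal resolver.
from urllib.parse import urlparse

location = "postgresql://prod_users/public.users"
parsed = urlparse(location)
protocol = parsed.scheme               # "postgresql" -> which adapter
backend = parsed.netloc                # "prod_users" -> which configured backend
table_path = parsed.path.lstrip("/")   # "public.users" -> table within the backend
print(protocol, backend, table_path)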
Table Registration
Tables are registered with their location:
# PostgreSQL table
client.register_table("users", "postgresql://prod_users/public.users")
# S3 Delta table
client.register_table("events", "s3://analytics/delta/events")
Metadata References
Metadata stores references to data, not the data itself:
-- In metadata database
SELECT * FROM epoch_tables;
┌──────────┬────────────────────────────────────────┐
│ name │ location │
├──────────┼────────────────────────────────────────┤
│ users │ postgresql://prod_users/public.users │
│ orders │ postgresql://prod_orders/public.orders │
│ events │ s3://analytics/delta/events │
└──────────┴────────────────────────────────────────┘
Cross-Backend Operations
Branching
Branches span all registered tables:
# Creates a branch that covers:
# - users (PostgreSQL)
# - orders (PostgreSQL)
# - events (S3)
client.create_branch("feature/new-reporting")
Each backend maintains its own overlay:
- PostgreSQL: Overlay tables
- S3: Separate Delta log
Committing
Commits can include changes from multiple backends:
# Changes to users (PostgreSQL) and events (S3)
# are captured in a single commit
client.commit(message="Update user events schema")
The commit metadata tracks which backends have changes:
{
"commit_id": "abc123",
"message": "Update user events schema",
"changes": {
"postgresql://prod_users": ["users"],
"s3://analytics": ["events"]
}
}
Diffing
Diff operations aggregate across backends:
diff = client.diff("main", "feature/branch")
# Returns changes from all backends
for table_diff in diff.table_diffs:
print(f"{table_diff.location}: {table_diff.status}")
Merging
Merges are coordinated across backends:
- Compute changes per backend
- Detect conflicts per backend
- Apply changes per backend (in transaction if supported)
- Create unified merge commit
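As a sketch of what this looks like from the Python SDK, using the async Client surface shown elsewhere in these docs (the result attribute name is an assumption):
import asyncio
from horizon_epoch import Client

async def merge_reporting_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # A single merge call computes, checks, and applies changes per backend
        result = await client.merge("feature/new-reporting")
        # "commit_id" is an assumed attribute name for the unified merge commit
        print(f"Merge commit: {result.commit_id}")

asyncio.run(merge_reporting_branch())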
Consistency Model
Within a Backend
Operations within a single backend use that backend’s consistency guarantees:
- PostgreSQL: ACID transactions
- S3/Delta: Serializable via Delta protocol
Across Backends
Cross-backend operations provide best-effort consistency:
┌─────────────┐ ┌─────────────┐
│ PostgreSQL │ │ S3 │
│ commit │ │ commit │
└──────┬──────┘ └──────┬──────┘
│ │
└────────┬───────────┘
│
┌──────▼──────┐
│ Metadata │
│ Commit │
└─────────────┘
If one backend fails:
- The operation is marked as partial
- Rollback is attempted where possible
- User is notified of partial state
try:
client.commit(message="Multi-backend update")
except PartialCommitError as e:
print(f"Committed to: {e.successful_backends}")
print(f"Failed on: {e.failed_backends}")
# Manual intervention needed
Configuration Patterns
Separate Environments
# Development
[storage.postgres.dev_db]
url = "postgresql://localhost/dev"
# Staging
[storage.postgres.staging_db]
url = "postgresql://staging-db.internal/staging"
# Production
[storage.postgres.prod_db]
url = "postgresql://prod-db.internal/production"
aws_secret_id = "prod-db-credentials"
Mixed Workloads
# Transactional data in PostgreSQL
[storage.postgres.transactional]
url = "postgresql://oltp-db/production"
# Analytics in S3
[storage.s3.analytics]
bucket = "company-analytics"
region = "us-east-1"
# Archive in Glacier
[storage.s3.archive]
bucket = "company-archive"
region = "us-east-1"
storage_class = "GLACIER_IR"
Cross-Region
[storage.s3.us_data]
bucket = "data-us"
region = "us-east-1"
[storage.s3.eu_data]
bucket = "data-eu"
region = "eu-west-1"
Routing and Discovery
Explicit Routing
Specify backend when registering tables:
client.register_table("users", "postgresql://prod_users/public.users")
Pattern-Based Routing
Configure default routing patterns:
[routing]
# Tables starting with "raw_" go to S3
"raw_*" = "s3://analytics"
# Everything else goes to PostgreSQL
"*" = "postgresql://default"
Auto-Discovery
Discover tables from backends:
# List tables in a backend
tables = client.discover_tables("postgresql://prod_db")
# Register discovered tables
for table in tables:
client.register_table(table.name, table.location)
Performance Considerations
Query Routing
Queries are routed to the appropriate backend:
# Routed to PostgreSQL
client.query("SELECT * FROM users")
# Routed to S3
client.query("SELECT * FROM events")
Cross-Backend Queries
Currently, joins across backends are not supported in a single query. Use application-level joining:
# Query each backend
users = client.query("SELECT * FROM users")
events = client.query("SELECT * FROM events WHERE user_id IN (...)")
# Join in the application (e.g., a pandas-style merge if results are DataFrames)
result = users.merge(events, on="user_id")
Caching
Per-backend connection pooling and caching:
[storage.postgres.prod_db]
pool_size = 20
cache_schema = true
[storage.s3.analytics]
cache_metadata = true
cache_ttl = 300
Limitations
- No cross-backend transactions - ACID only within single backend
- No cross-backend joins - Query each backend separately
- Eventual consistency - Cross-backend commits may be partially applied
- Network latency - Operations touch multiple backends
Best Practices
- Group related tables - Tables that are often queried together should be in the same backend
- Consider latency - Place backends close to where they’re accessed
- Plan for failures - Have recovery procedures for partial commits
- Monitor all backends - Track health and performance per backend
- Document routing - Make it clear which tables are where
Schema Storage Across Backends
Horizon Epoch stores schemas consistently regardless of the underlying storage backend, but the level of enforcement varies. This document explains how schemas are stored, introspected, and enforced across different backends.
Storage Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│ Metadata Database │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ table_versions │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ id | table_id | commit_id | schema (JSONB) │ │ │
│ │ │ ---------------------------------------------------------- │ │ │
│ │ │ 1 | users | abc123 | {"fields": [...], ...} │ │ │
│ │ │ 2 | users | def456 | {"fields": [...], ...} │ │ │
│ │ │ 3 | orders | abc123 | {"fields": [...], ...} │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│
┌─────────────┬─────────────┬─────┴─────┬─────────────┬─────────────┐
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌────────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│   SQL    │ │   SQL    │ │   Object   │ │  File  │ │  File  │ │  File  │
│  (Full)  │ │ (Part.)  │ │  Storage   │ │Storage │ │Storage │ │Storage │
├──────────┤ ├──────────┤ │ (Metadata) │ │(Meta.) │ │(Meta.) │ │(Meta.) │
│PostgreSQL│ │  SQLite  │ ├────────────┤ ├────────┤ ├────────┤ ├────────┤
│  MySQL   │ │          │ │     S3     │ │ Azure  │ │  GCS   │ │ Local  │
│  MSSQL   │ │          │ │            │ │  Blob  │ │        │ │   FS   │
└──────────┘ └──────────┘ └────────────┘ └────────┘ └────────┘ └────────┘
Key Principle: Schema is always stored in the metadata database as JSONB. The storage backend may or may not enforce the schema.
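As a sketch of what that means in practice, the stored schema can be read straight out of the metadata database. This assumes direct read access and the table_versions layout shown in the diagram above; psycopg and the field names inside the JSONB document are illustrative assumptions:
import json
import psycopg

with psycopg.connect("postgresql://localhost/horizon_epoch") as conn:
    row = conn.execute(
        "SELECT schema FROM table_versions "
        "WHERE table_id = %s ORDER BY id DESC LIMIT 1",
        ("users",),
    ).fetchone()
    # psycopg usually decodes JSONB to a dict already; fall back to json.loads
    schema = row[0] if isinstance(row[0], dict) else json.loads(row[0])
    # Assumes each entry in "fields" carries a "name" key
    print([field["name"] for field in schema.get("fields", [])])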
PostgreSQL: Full Schema Support
PostgreSQL provides ConstraintSupportLevel::Full - all schema elements are enforced by the database engine.
Schema Storage
Schemas are stored in two places:
- Metadata database: JSONB in table_versions.schema
- Live database: Actual DDL (tables, constraints, indexes)
Introspection
Horizon Epoch introspects PostgreSQL schemas using system catalogs:
-- Columns from information_schema
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'users';
-- Constraints from pg_catalog
SELECT conname, contype, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'users'::regclass;
-- Indexes from pg_indexes
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Enum types from pg_type
SELECT typname, enumlabel
FROM pg_type t
JOIN pg_enum e ON t.oid = e.enumtypid
WHERE typname = 'order_status';
What’s Captured
| Element | Captured | Enforced | Notes |
|---|---|---|---|
| Column names | Yes | Yes | |
| Column types | Yes | Yes | Full PostgreSQL type mapping |
| Nullability | Yes | Yes | NOT NULL constraints |
| Defaults | Yes | Yes | Literals, functions, sequences |
| Primary keys | Yes | Yes | |
| Foreign keys | Yes | Yes | With referential actions |
| Unique constraints | Yes | Yes | Including NULLS NOT DISTINCT |
| Check constraints | Yes | Yes | Full expression capture |
| Exclusion constraints | Yes | Yes | GiST-based |
| Indexes | Yes | Yes | B-tree, GIN, GiST, BRIN |
| Enum types | Yes | Yes | With value order |
CLI Examples
# Show schema from PostgreSQL table
epoch schema show users
# Output:
# Table: users (PostgreSQL)
# Source: Live database introspection
#
# Columns:
# id SERIAL PRIMARY KEY
# email VARCHAR(255) NOT NULL
# name VARCHAR(255) NOT NULL
# status user_status DEFAULT 'active'
# created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
#
# Constraints:
# pk_users PRIMARY KEY (id)
# uq_users_email UNIQUE (email)
# chk_users_email CHECK (email ~ '^[^@]+@[^@]+$')
# fk_users_org FOREIGN KEY (org_id) REFERENCES orgs (id)
#
# Indexes:
# idx_users_email btree (email)
# idx_users_status btree (status) WHERE status = 'active'
#
# Enum Types:
# user_status (active, inactive, suspended)
Python Example
import asyncio
from horizon_epoch import Client
async def inspect_schema():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraints for a table
constraints = await client.get_constraints("users", branch="main")
print(f"Unique Constraints: {len(constraints.unique_constraints)}")
print(f"Foreign Keys: {len(constraints.foreign_keys)}")
print(f"Check Constraints: {len(constraints.check_constraints)}")
asyncio.run(inspect_schema())
S3/Delta Lake: Metadata-Only Support
S3/Delta Lake provides ConstraintSupportLevel::MetadataOnly - schemas are stored for documentation and merge operations but not enforced at the storage level.
Schema Storage
Schemas are stored in:
- Metadata database: JSONB in table_versions.schema
- Delta Lake: Parquet schema (columns and types only)
- Delta Lake table properties: Constraint metadata
What’s Stored vs Enforced
| Element | Stored | Enforced | Notes |
|---|---|---|---|
| Column names | Yes | Yes | Via Arrow/Parquet schema |
| Column types | Yes | Yes | Mapped to Arrow types |
| Nullability | Yes | Partial | Arrow schema nullability |
| Defaults | Yes | No | Stored as metadata only |
| Primary keys | Yes | No | Used as merge key, not unique |
| Foreign keys | Yes | No | Stored for documentation |
| Unique constraints | Yes | No | Duplicates allowed |
| Check constraints | Yes | No | Expressions not evaluated |
| Exclusion constraints | Yes | No | PostgreSQL-specific |
| Indexes | Yes | No | Delta Lake handles indexing |
| Enum types | Yes | Partial | Arrow dictionary encoding |
Schema Conversion
Horizon Epoch converts schemas to/from Arrow format:
Horizon Epoch Schema ←→ Apache Arrow Schema ←→ Parquet Files
Type Mapping:
| Horizon Epoch | Arrow Type | Notes |
|---|---|---|
| int8 | Int8 | |
| int16 | Int16 | |
| int32 | Int32 | |
| int64 | Int64 | |
| float32 | Float32 | |
| float64 | Float64 | |
| decimal(p,s) | Decimal128 | Precision preserved |
| varchar(n) | Utf8 | Length not enforced |
| text | LargeUtf8 | |
| boolean | Boolean | |
| timestamp | Timestamp | With timezone |
| date | Date32 | |
| time | Time64 | Nanoseconds |
| uuid | FixedSizeBinary(16) | |
| jsonb | LargeUtf8 | Stored as JSON string |
| bytea | LargeBinary | |
| enum | Dictionary | Values preserved |
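For intuition, the Arrow side of this table corresponds to types you can construct with pyarrow. This is an illustrative sketch of the mapping, not the converter Horizon Epoch ships:
import pyarrow as pa

# Illustrative equivalents for a few rows of the mapping table above
arrow_schema = pa.schema([
    pa.field("id", pa.binary(16), nullable=False),             # uuid -> FixedSizeBinary(16)
    pa.field("user_id", pa.int64()),                           # int64 -> Int64
    pa.field("event_type", pa.utf8()),                         # varchar(100); length not enforced
    pa.field("payload", pa.large_utf8()),                      # jsonb stored as a JSON string
    pa.field("amount", pa.decimal128(10, 2)),                  # decimal(10,2) -> Decimal128
    pa.field("created_at", pa.timestamp("us", tz="UTC")),      # timestamp with timezone
    pa.field("status", pa.dictionary(pa.int32(), pa.utf8())),  # enum -> Dictionary
])
print(arrow_schema)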
CLI Examples
# Show schema from S3/Delta table
epoch schema show events
# Output:
# Table: events (S3/Delta Lake)
# Source: Stored metadata (constraints not enforced)
#
# Columns:
# id UUID
# user_id INT64
# event_type VARCHAR(100)
# payload JSONB
# created_at TIMESTAMP
#
# Constraints (metadata only - not enforced):
# pk_events PRIMARY KEY (id)
# fk_events_user FOREIGN KEY (user_id) REFERENCES users (id)
#
# Indexes (not created - Delta Lake manages indexing):
# idx_events_user btree (user_id)
# idx_events_type btree (event_type)
#
# Note: S3/Delta Lake does not enforce constraints at the storage level.
# Constraints are stored for documentation and merge conflict detection.
Python Example
import asyncio
from horizon_epoch import Client
async def check_delta_constraints():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraints from Delta table
constraints = await client.get_constraints("events", branch="main")
# Constraints are stored but not enforced by S3/Delta
print(f"Foreign Keys: {len(constraints.foreign_keys)}")
print(f"Unique Constraints: {len(constraints.unique_constraints)}")
# For S3/Delta backends, constraints are metadata-only
# Validation happens at write time (application level)
asyncio.run(check_delta_constraints())
Constraint Storage in Delta Lake
Constraints are stored as Delta Lake table properties:
{
"horizon_epoch.constraints": {
"primary_key": ["id"],
"foreign_keys": [
{
"name": "fk_events_user",
"columns": ["user_id"],
"references_table": "users",
"references_columns": ["id"]
}
],
"unique_constraints": [],
"check_constraints": [
{
"name": "chk_events_type",
"expression": "event_type IN ('click', 'view', 'purchase')"
}
]
}
}
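If you want to inspect those properties outside Horizon Epoch, a hedged sketch with the open-source deltalake package looks like this. The property key comes from the example above; the concrete S3 path and available AWS credentials are assumptions:
import json
from deltalake import DeltaTable

# Read the Delta table's configuration properties directly
dt = DeltaTable("s3://company-analytics/delta/events")
props = dt.metadata().configuration
constraints = json.loads(props.get("horizon_epoch.constraints", "{}"))
for fk in constraints.get("foreign_keys", []):
    print(f"{fk['name']}: {fk['columns']} -> "
          f"{fk['references_table']}{fk['references_columns']}")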
Cross-Backend Operations
Merge Behavior
Constraint conflicts are detected regardless of enforcement level:
# Branch A (PostgreSQL) modifies constraint
epoch checkout main
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 18)"
epoch commit -m "Add age constraint"
# Branch B (same table) modifies same constraint
epoch checkout feature
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 21)"
epoch commit -m "Different age constraint"
# Merge detects conflict
epoch merge feature
# CONFLICT (constraint): chk_age modified in both branches
Even for S3/Delta tables where constraints aren’t enforced, merge conflicts are detected:
# Both branches modify the same constraint metadata
# Merge requires resolution, even though neither is enforced
epoch merge feature
# CONFLICT (constraint): chk_events_type modified in both branches
Generating DDL for S3 Schemas
You can generate PostgreSQL DDL from schemas stored with S3/Delta tables:
# Generate DDL from S3 table schema
epoch schema show events --generate-ddl
# Output:
# -- DDL for events (from S3/Delta metadata)
# CREATE TABLE events (
# id UUID PRIMARY KEY,
# user_id INT64 NOT NULL,
# event_type VARCHAR(100),
# payload JSONB,
# created_at TIMESTAMP
# );
#
# ALTER TABLE events ADD CONSTRAINT fk_events_user
# FOREIGN KEY (user_id) REFERENCES users (id);
#
# CREATE INDEX idx_events_user ON events (user_id);
This is useful for:
- Creating PostgreSQL replicas of S3 data
- Validating schema compatibility
- Documentation
Constraint Support Level API
from horizon_epoch import ConstraintSupportLevel
# Check what a backend supports
level = client.get_constraint_support("postgresql://...")
# Methods available
level.can_enforce() # True for Full only
level.can_store() # True for Full, MetadataOnly, Partial
level.can_execute_ddl() # True for Full only
level.validation_mode() # "database", "application", or "none"
# Use in application logic
if level.can_enforce():
# Let database handle constraint enforcement
client.insert(data)
else:
# Validate in application before insert
violations = client.validate_constraints(data)
if violations:
raise ValueError(f"Constraint violations: {violations}")
client.insert(data)
Best Practices
1. Know Your Enforcement Level
support = client.get_constraint_support(table_location)
if support != ConstraintSupportLevel.Full:
logger.warning(f"Constraints not enforced for {table_location}")
2. Validate Before Writing to S3
# For S3/Delta tables, validate in the application before writing
def insert_to_s3_table(client, table, data):
    schema = client.get_schema(table)
    # Validate primary key uniqueness (assumes query() returns rows as dicts)
    existing_keys = {row["id"] for row in client.query(f"SELECT id FROM {table}")}
    if data["id"] in existing_keys:
        raise ValueError("Duplicate primary key")
    # Validate that referenced rows exist for each foreign key
    for fk in schema.foreign_keys:
        ref_col = fk.references_columns[0]
        ref_rows = client.query(f"SELECT {ref_col} FROM {fk.references_table}")
        if data[fk.columns[0]] not in {row[ref_col] for row in ref_rows}:
            raise ValueError(f"Foreign key violation: {fk.name}")
    # Then insert
    client.insert(table, data)
3. Use PostgreSQL for Critical Constraints
If constraint enforcement is critical, use PostgreSQL tables:
# Analytics data - S3 is fine (no strict constraints needed)
client.register_table("events", "s3://bucket/events")
# User data - PostgreSQL for enforcement
client.register_table("users", "postgresql://mydb/public.users")
4. Document Constraint Semantics
In mixed-backend environments, document which constraints are enforced:
## Data Model
| Table | Backend | Constraints Enforced |
|-------|---------|---------------------|
| users | PostgreSQL | All (FK, UK, CHECK) |
| events | S3/Delta | None (metadata only) |
| orders | PostgreSQL | All |
| analytics | S3/Delta | None |
See Also
- Storage Adapters - Backend implementation details
- Constraint Reference - Constraint types
- Multi-Backend Setup - Configuring multiple backends
Common Issues
Solutions to frequently encountered problems with Horizon Epoch.
Connection Problems
Could not connect to metadata database
Symptoms:
Error: Database connection failed: connection refused
Causes:
- PostgreSQL not running
- Wrong connection string
- Firewall blocking port
- Authentication failure
Solutions:
- Check PostgreSQL is running:
  pg_isready -h localhost -p 5432
- Verify connection string format:
  postgresql://user:password@host:port/database
- Check firewall settings:
  # Check if port is accessible
  nc -zv localhost 5432
- Test connection directly:
  psql "postgresql://user:pass@localhost/horizon_epoch"
S3 bucket not accessible
Symptoms:
Error: S3 error: Access Denied
Causes:
- Invalid credentials
- Bucket doesn’t exist
- Incorrect permissions
- Wrong region
Solutions:
- Verify AWS credentials:
  aws sts get-caller-identity
- Check bucket exists:
  aws s3 ls s3://bucket-name
- Review IAM permissions (need s3:GetObject, s3:PutObject, s3:ListBucket)
- Verify region matches bucket location:
  aws s3api get-bucket-location --bucket bucket-name
SSH tunnel connection failed
Symptoms:
Error: SSH tunnel failed: Connection refused
Solutions:
- Test SSH connection manually:
  ssh -i ~/.ssh/key user@bastion
- Check SSH key permissions:
  chmod 600 ~/.ssh/id_rsa
- Verify bastion host is accessible
- Check known_hosts:
  ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
Merge Issues
Merge conflict in table X
Symptoms:
CONFLICT (content): Merge conflict in users
Automatic merge failed; fix conflicts and then commit.
Understanding the conflict:
- Same record modified in both branches
- Same field changed to different values
Resolution:
- View conflicts:
  epoch conflicts show
- Resolve interactively:
  epoch conflicts resolve --interactive
- Or accept one side:
  epoch conflicts resolve --ours    # Keep target branch
  epoch conflicts resolve --theirs  # Keep source branch
- Complete merge:
  epoch merge --continue
- Or abort:
  epoch merge --abort
Merge base not found
Symptoms:
Error: Could not find common ancestor for branches
Causes:
- Branches have no common history
- Orphaned branch
- Corrupted commit graph
Solutions:
- Check branch history:
  epoch log main --oneline
  epoch log feature/branch --oneline
- Find common commits:
  epoch log --all --graph
- If no common ancestor, use --allow-unrelated:
  epoch merge feature/branch --allow-unrelated
Branch Issues
Branch not found
Symptoms:
Error: Branch 'feature/xyz' not found
Solutions:
- List available branches:
  epoch branch list
- Check for typos in branch name
- Verify branch exists in repository:
  epoch branch list --all
Cannot delete branch
Symptoms:
Error: Cannot delete branch 'feature/xyz': not fully merged
Solutions:
- Merge the branch first:
  epoch merge feature/xyz
  epoch branch delete feature/xyz
- Force delete (loses unmerged changes):
  epoch branch delete feature/xyz --force
Table Issues
Table not found
Symptoms:
Error: Table 'users' not found
Solutions:
- List registered tables:
  epoch table list
- Register the table:
  epoch table add users --location "postgresql://mydb/public.users"
- Check table exists in storage backend
Schema mismatch
Symptoms:
Error: Schema mismatch for table 'users': expected 5 columns, found 6
Causes:
- Table schema changed outside Horizon Epoch
- Uncommitted schema changes
Solutions:
- View current schema:
  epoch table show users --schema
- Refresh schema:
  epoch table refresh users
- Commit schema changes:
  epoch commit -m "Update schema for users table"
Performance Issues
Slow queries on branches
Symptoms:
- Queries on branches significantly slower than main
- Query times increase with branch age
Causes:
- Large overlay size
- Deep branch hierarchy
- Missing indexes
Solutions:
- Check overlay size:
  epoch branch info feature/branch --stats
- Materialize long-lived branches:
  epoch branch materialize feature/branch
- Merge to reduce hierarchy depth
- Add indexes to overlay tables
Slow commit operations
Symptoms:
- Commits take longer than expected
- Timeout during commit
Causes:
- Large number of changes
- Network latency
- Lock contention
Solutions:
- Commit in smaller batches:
  epoch commit --tables users -m "Part 1"
  epoch commit --tables orders -m "Part 2"
- Check for locks:
  SELECT * FROM pg_locks WHERE relation = 'epoch_commits'::regclass;
- Increase timeout:
  epoch commit -m "Large update" --timeout 600
Authentication Issues
Vault authentication failed
Symptoms:
Error: Vault authentication failed: permission denied
Solutions:
- Check Vault connectivity:
  vault status
- Verify token/credentials:
  vault token lookup
- Check policy permissions:
  vault policy read horizon-epoch
- For AppRole, verify role_id and secret_id are correct
AWS credentials expired
Symptoms:
Error: ExpiredTokenException: The security token included in the request is expired
Solutions:
- For IAM roles, credentials refresh automatically
- For SSO:
  aws sso login
- For access keys, rotate them:
  aws iam create-access-key
Data Integrity Issues
Corrupted commit
Symptoms:
Error: Commit 'abc123' is corrupted or missing
Solutions:
- Check commit exists:
  epoch show abc123
- Verify metadata database integrity:
  SELECT * FROM epoch_commits WHERE id = 'abc123';
- If commit is referenced but missing, contact support
Orphaned records
Symptoms:
- Records in overlay not connected to commits
- Disk usage growing unexpectedly
Solutions:
- Run garbage collection:
  epoch gc
- Check for orphaned overlays:
  epoch gc --dry-run
- Manually clean up if needed:
  epoch gc --force
Certificate/TLS Issues
Certificate expired
Symptoms:
Error: SSL error: certificate has expired
Causes:
- Client certificate expired
- Server certificate expired
- CA certificate expired
Solutions:
- Check certificate expiry:
  # Client certificate
  openssl x509 -enddate -noout -in /path/to/client.crt
  # Server certificate (remote)
  openssl s_client -connect db.example.com:5432 -starttls postgres 2>/dev/null | \
    openssl x509 -noout -enddate
- Renew the certificate:
  # If using Vault PKI
  vault write pki/issue/my-role common_name="client"
  # Manual renewal - contact your CA
- If using Vault dynamic certificates, check renewal is working:
  epoch doctor --check vault
Certificate verification failed
Symptoms:
Error: SSL error: certificate verify failed
Error: CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
Causes:
- Wrong CA certificate
- Incomplete certificate chain
- Self-signed certificate not trusted
- Server hostname mismatch
Solutions:
- Verify CA certificate is correct:
  # Check certificate chain
  openssl verify -CAfile /path/to/ca.crt /path/to/client.crt
- Check certificate chain is complete:
  # View full chain
  openssl crl2pkcs7 -nocrl -certfile /path/to/cert.pem | \
    openssl pkcs7 -print_certs -noout
- For hostname mismatch, check server certificate SANs:
  openssl x509 -noout -text -in server.crt | grep -A1 "Subject Alternative Name"
- Verify you’re connecting to the correct hostname matching the certificate
Permission denied reading certificate files
Symptoms:
Error: could not load private key file: Permission denied
Error: could not load certificate file: No such file or directory
Solutions:
- Check file permissions:
  ls -la /path/to/client.key
  # Should show -rw------- (600) or -r-------- (400)
  chmod 600 /path/to/client.key
  chmod 644 /path/to/client.crt
- Verify file exists and path is correct:
  ls -la /path/to/client.crt /path/to/client.key /path/to/ca.crt
- Check process user has access:
  # Run as same user
  sudo -u epoch_user cat /path/to/client.crt
Key doesn’t match certificate
Symptoms:
Error: key values mismatch
Error: SSL error: private key does not match certificate
Solutions:
- Verify key matches certificate:
  # Compare modulus hashes - they should match
  openssl x509 -noout -modulus -in client.crt | openssl md5
  openssl rsa -noout -modulus -in client.key | openssl md5
- If mismatched, regenerate key/cert pair or locate correct files
RDS IAM Authentication Issues
Failed to generate IAM auth token
Symptoms:
Error: Failed to generate RDS IAM authentication token
Error: The security token included in the request is invalid
Causes:
- IAM role doesn’t have rds-db:connect permission
- Wrong AWS region
- Invalid instance endpoint
Solutions:
- Verify IAM policy allows rds-db:connect:
  {
    "Effect": "Allow",
    "Action": "rds-db:connect",
    "Resource": "arn:aws:rds-db:REGION:ACCOUNT:dbuser:DBI_RESOURCE_ID/DB_USER"
  }
- Check AWS region matches RDS instance:
  aws rds describe-db-instances --db-instance-identifier mydb \
    --query 'DBInstances[0].DBInstanceArn'
- Verify you’re using the correct endpoint:
  # epoch.toml - use the actual endpoint, not a custom DNS
  [storage.postgres.mydb]
  host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
  use_iam_auth = true
RDS IAM token expired immediately
Symptoms:
Error: PAM authentication failed
Error: password authentication failed for user "iam_user"
Causes:
- System clock skew
- Token generated too far in advance
- Wrong database user
Solutions:
- Check system time is accurate:
  date -u  # Compare with actual UTC time
  # Sync time if needed
  sudo ntpdate pool.ntp.org
- Verify database user is configured for IAM:
  -- In RDS
  CREATE USER iam_user WITH LOGIN;
  GRANT rds_iam TO iam_user;
- Check the token is being generated correctly:
  # Generate token manually for testing
  aws rds generate-db-auth-token \
    --hostname mydb.cluster-xxx.us-east-1.rds.amazonaws.com \
    --port 5432 \
    --username iam_user
Cannot assume role for RDS IAM
Symptoms:
Error: User is not authorized to perform: sts:AssumeRole
Error: Access denied when assuming role for RDS authentication
Solutions:
- Check assume role trust policy:
  aws iam get-role --role-name MyRdsRole \
    --query 'Role.AssumeRolePolicyDocument'
- Verify trust policy allows your principal:
  {
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::ACCOUNT:role/YourExecutionRole"
    },
    "Action": "sts:AssumeRole"
  }
- Check for external_id requirement:
  [storage.s3.mydb]
  assume_role_arn = "arn:aws:iam::123456789012:role/RdsRole"
  external_id = "required-external-id"  # If configured
Credential Refresh Issues
Credential refresh failed
Symptoms:
Error: Credential refresh failed: provider returned error
Error: Failed to refresh credentials: connection refused
Causes:
- Credential provider unavailable
- Network connectivity issues
- Token/secret expired
Solutions:
- Check provider connectivity:
  # For Vault
  vault status
  # For AWS
  aws sts get-caller-identity
- Verify credentials haven’t expired beyond refresh:
  # Check Vault token
  vault token lookup
  # Check current credentials are valid
  epoch doctor --check credentials
- Force credential refresh:
  # Clear cached credentials
  epoch config cache clear
  # Re-authenticate
  epoch auth login
Credentials expired and refresh not supported
Symptoms:
Error: Credentials expired
Error: Refresh not supported for static credentials
Causes:
- Using static credentials without refresh capability
- Credential cache disabled
- Refresh interval too long
Solutions:
- Use a provider that supports refresh:
  # Instead of a static password
  [storage.postgres.mydb]
  vault_path = "secret/data/mydb"  # Dynamic refresh
  # Or environment variables (refresh on re-read)
  [storage.postgres.mydb]
  url = "${DB_URL}"
- Enable credential caching with refresh:
  [credentials]
  cache_enabled = true
  cache_ttl = 300
  refresh_before_expiry = 60  # Refresh 60s before expiry
Token lease expired (Vault)
Symptoms:
Error: Vault token expired
Error: permission denied (token expired)
Solutions:
- Check token status:
  vault token lookup
- Renew token if renewable:
  vault token renew
- For non-renewable tokens, re-authenticate:
  # AppRole
  vault write auth/approle/login \
    role_id=$ROLE_ID \
    secret_id=$SECRET_ID
- Configure automatic renewal:
  [vault]
  auto_renew_token = true
  renew_threshold = 0.7  # Renew when 70% of TTL elapsed
Multi-Backend Routing Issues
Storage backend not found
Symptoms:
Error: Storage backend 'mydb' not found
Error: Unknown storage location: postgresql://mydb/...
Causes:
- Backend not configured
- Typo in backend name
- Configuration not loaded
Solutions:
- List configured backends:
  epoch config show storage
- Check configuration file:
  # epoch.toml
  [storage.postgres.mydb]  # Backend name is 'mydb'
  url = "postgresql://..."
- Verify configuration is loaded:
  epoch doctor --check config
Wrong backend selected for table
Symptoms:
Error: Table 'users' not found in storage 'datalake'
Error: Cannot access PostgreSQL table through S3 backend
Causes:
- Table registered with wrong backend
- Backend mismatch in storage location
Solutions:
- Check table registration:
  epoch table show users
- Re-register with correct backend:
  epoch table remove users
  epoch table add users \
    --location "postgresql://correct_backend/public.users"
- Verify storage location format:
  PostgreSQL: postgresql://backend_name/schema.table
  S3:         s3://backend_name/path/to/table
Cannot connect to multiple backends simultaneously
Symptoms:
Error: Connection pool exhausted
Error: Too many connections
Causes:
- Pool size too small for multi-backend operations
- Connection leak
- Long-running transactions
Solutions:
- Increase pool sizes:
  [storage.postgres.backend1]
  pool_size = 20
  [storage.postgres.backend2]
  pool_size = 20
- Check for connection leaks:
  -- PostgreSQL
  SELECT * FROM pg_stat_activity WHERE application_name LIKE '%epoch%';
- Set idle connection timeouts:
  [storage.postgres.mydb]
  idle_timeout = 300  # Close idle connections after 5 min
Cross-backend operations failing
Symptoms:
Error: Cannot merge tables from different storage backends
Error: Cross-storage operation not supported
Causes:
- Tables on different backends can’t be joined directly
- Merge requires compatible storage types
Solutions:
- Understand cross-backend limitations:
  - Merges work within same storage type
  - Queries are executed per-backend
  - Results merged in memory
- For cross-backend data access, use the Python SDK:
  # Read from both backends
  users = client.query_table("users", backend="postgres1")
  orders = client.query_table("orders", backend="postgres2")
  # Join in Python/Pandas
  merged = users.merge(orders, on="user_id")
- Consider consolidating frequently-joined tables on same backend
Backend credentials conflict
Symptoms:
Error: Credential 'default' conflicts with existing credential
Warning: Multiple credentials found for backend 'mydb'
Solutions:
- Use unique credential names per backend:
  [storage.postgres.backend1]
  vault_path = "secret/data/db1"
  [storage.postgres.backend2]
  vault_path = "secret/data/db2"  # Different path
- Clear credential cache if switching configurations:
  epoch config cache clear
Getting Help
If these solutions don’t resolve your issue:
- Check logs:
  epoch --verbose command
- Enable debug logging:
  RUST_LOG=debug epoch command
- Report issues:
  - Contact Horizon Analytic Studios for support
  - Include: version (epoch --version), error message, steps to reproduce
FAQ
Frequently asked questions about Horizon Epoch.
General Questions
What is Horizon Epoch?
Horizon Epoch is a Git-like version control system for data. It provides branching, merging, commits, and history tracking for data stored in PostgreSQL, MySQL, SQL Server, SQLite, S3/Delta Lake, Azure Blob, GCS, and local filesystems.
How is it different from Git?
Git tracks changes to files (text). Horizon Epoch tracks changes to structured data (rows/records). Key differences:
- Record-level tracking instead of line-level
- Field-level conflict detection instead of text diff
- Works with live databases instead of files
- Storage-agnostic - supports multiple backends
Do I need to migrate my data?
No. Horizon Epoch works with data where it lives. You register existing tables and start versioning. Your data stays in its original location.
Is my data copied when I create a branch?
No. Branching is zero-copy. Only modified records are stored separately. Creating a branch is instant regardless of data size.
Architecture
Where is the metadata stored?
Metadata (commits, branches, change tracking) is stored in a dedicated PostgreSQL database. This is separate from your data storage.
Can I use different databases for metadata and data?
Yes. The metadata database and data storage are completely separate. For example:
- Metadata: PostgreSQL on localhost
- Data: PostgreSQL on AWS RDS + S3 for analytics
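A sketch of such a split using the Config builder shown earlier in these docs (the import path and the RDS endpoint are assumptions/placeholders):
from horizon_epoch import Config  # import path assumed

config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"    # metadata on localhost
).add_postgres(
    "app_db",                                              # logical name
    "postgresql://myapp.abc123.us-east-1.rds.amazonaws.com/production"
).add_s3(
    "analytics",
    bucket="company-analytics"
)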
Does Horizon Epoch modify my existing tables?
Horizon Epoch creates overlay tables alongside your existing tables to track changes. Your original table structure is not modified.
What happens if I modify data directly without using Epoch?
Changes made outside of Horizon Epoch’s context won’t be tracked. For full version control, all changes should go through Epoch or you should periodically sync.
Performance
How fast is branch creation?
Branch creation is O(1) - constant time regardless of data size. Creating a branch from a 1TB dataset takes the same time as from a 1KB dataset (~milliseconds).
What’s the storage overhead?
Storage overhead is proportional to changes, not data size:
- Base data: Original size (no overhead)
- Per branch: Only modified records are stored
- Typical overhead: 1-10% for active branches
Are queries slower on branches?
There’s a small overhead for branch queries due to overlay resolution. For typical branch sizes (< 1% of data modified), overhead is negligible (<10%). For large overlays, consider materializing the branch.
Can it handle large datasets?
Yes. Horizon Epoch is designed for large-scale data:
- Zero-copy branching works at any scale
- Only changed records are processed
- Streaming for large operations
- Pagination for history
Operations
Can I undo a merge?
Yes. Use epoch revert HEAD to create a new commit that undoes the merge. The original merge commit remains in history.
How do I see what changed?
# Changes since last commit
epoch status
# Changes between branches
epoch diff main feature/branch
# Commit history
epoch log
Can I query historical data?
Yes. Use time-travel queries:
# At specific commit
epoch query "SELECT * FROM users" --at abc123
# At specific time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"
# At tag
epoch query "SELECT * FROM users" --at v1.0.0
How do I resolve merge conflicts?
# See conflicts
epoch conflicts show
# Interactive resolution
epoch conflicts resolve --interactive
# Accept one side
epoch conflicts resolve --ours # or --theirs
# Complete merge
epoch merge --continue
Storage Backends
Which storage systems are supported?
Horizon Epoch supports 8 storage backends:
Relational Databases (Full Constraint Support):
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite (partial constraint support)
Object Storage (Delta Lake Format):
- AWS S3 (and S3-compatible like MinIO)
- Azure Blob Storage
- Google Cloud Storage
- Local Filesystem
Can I use multiple storage backends?
Yes. You can register tables from different backends and manage them in a single repository. Operations like branching and merging work across backends.
Does it work with AWS RDS?
Yes. Use the PostgreSQL or MySQL adapter with your RDS connection string. IAM authentication is supported for enhanced security.
What about cloud data warehouses?
Snowflake, BigQuery, and Redshift are not currently supported. These have unique architectures that would require specialized adapters.
Security
Are credentials stored securely?
Credentials can be sourced from:
- Environment variables
- Encrypted files
- HashiCorp Vault
- AWS Secrets Manager
Never store credentials in plain text configuration files.
Does it support SSO?
AWS SSO/Identity Center is supported for S3 access. Database SSO depends on your database’s support.
What access controls are available?
Horizon Epoch includes comprehensive security features:
- Branch protection rules - Prevent direct commits to protected branches
- Role-based access control (RBAC) - Define roles with specific permissions
- Row-level security (RLS) - Fine-grained data access policies
- Audit logging - Track all operations for compliance
- Commit signing - PGP/OpenPGP signature verification
- Secret scanning - Detect accidentally committed secrets
- Field-level masking - Redact sensitive data
Integration
Does it work with dbt?
You can use Horizon Epoch to version the data that dbt transforms. Create a branch, run dbt, validate results, then merge.
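A hedged sketch of that flow, assuming the async SDK methods shown in other examples and invoking dbt as a plain subprocess (pointing dbt at the branch's connection or schema is project-specific):
import asyncio
import subprocess
from horizon_epoch import Client

async def dbt_on_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.create_branch("dbt/refactor-models")
        # Run dbt against the branch (profile/target wiring is up to your project)
        subprocess.run(["dbt", "run"], check=True)
        await client.commit(message="dbt run on refactor-models branch")
        await client.merge("dbt/refactor-models")

asyncio.run(dbt_on_branch())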
Can I use it with Airflow/Dagster?
Yes. Use the Python SDK in your DAG tasks:
from horizon_epoch import Client
async def branch_and_transform():
async with Client.connect("postgresql://...") as client:
await client.branch("etl/daily-run")
# ... run transformations ...
await client.commit("Daily ETL run")
await client.merge("etl/daily-run")
Is there a REST API?
Yes. The FastAPI server provides a REST API for all operations. See the REST API Reference.
Does it have a web UI?
A web UI is planned for future releases. Currently, use the CLI or SDK.
Comparison
vs. dbt snapshots?
dbt snapshots capture point-in-time state. Horizon Epoch provides:
- Full branching and merging
- Efficient storage (only changes)
- Any storage backend
vs. Delta Lake time travel?
Delta Lake provides time travel within a single table. Horizon Epoch adds:
- Cross-table consistency
- Branching and merging
- Multi-backend support
- Git-like workflow
vs. lakeFS?
Both provide Git-like versioning for data. Key differences:
- Horizon Epoch: Works with databases (PostgreSQL, MySQL, etc.) and object storage
- lakeFS: Focused on object storage
vs. Nessie/Project Nessie?
Nessie provides Git-like versioning for Iceberg/Delta tables. Horizon Epoch:
- Supports more storage backends (including relational databases)
- Provides record-level tracking
- Works with transactional databases
Troubleshooting
Where are logs?
# Verbose output
epoch --verbose command
# Debug logging
RUST_LOG=debug epoch command
How do I check system health?
# Run diagnostics
epoch doctor
# Check local services
epoch local status
Where can I get help?
- Documentation: This site
- Run epoch tips for getting started help
- Run epoch --help for command reference
Licensing
Licensing terms to be announced. Please contact Horizon Analytic Studios for licensing inquiries.