Horizon Epoch
Git-like version control for data across heterogeneous storage systems.
Horizon Epoch brings the power of Git-style version control to your data infrastructure. Whether your data lives in relational databases, cloud object storage, or local filesystems, Horizon Epoch provides unified branching, merging, and history tracking.
Supported Storage Backends
- Relational Databases - PostgreSQL, MySQL, SQL Server, SQLite
- Cloud Object Storage - AWS S3, Azure Blob Storage, Google Cloud Storage
- Local Storage - Local filesystem with Delta Lake format
Key Features
- Multi-Storage Support - Work with data across 8 different storage backends
- Zero-Copy Branching - Create instant branches without duplicating data
- True Three-Way Merge - Smart conflict detection and resolution at the field level
- Copy-on-Write Semantics - Efficient storage with minimal overhead
- Git-Like Commands - Familiar CLI interface for data engineers
Who Is This For?
Horizon Epoch is designed for:
- Data Engineers who need to safely test ETL changes before production
- Analytics Teams who want to experiment with transformations in isolation
- DevOps Engineers managing data across multiple environments
- Anyone who has wished for Git-like workflows for their data
Quick Example
# Initialize a repository
epoch init my-data-repo --metadata-url "postgresql://localhost/horizon_epoch"
# Register a table to track
epoch table add users --location "postgresql://localhost/mydb/public.users"
# Create a feature branch
epoch branch create feature/new-schema
# Make changes, then commit
epoch commit -m "Add user preferences column"
# Merge back to main
epoch merge feature/new-schema
How It Works
Horizon Epoch tracks data versions through a metadata layer that sits alongside your existing storage:
- Metadata Database - Stores commit history, branch pointers, and version indices
- Storage Adapters - Connect to your data where it lives (PostgreSQL, MySQL, SQL Server, SQLite, S3, Azure, GCS, local filesystem)
- Copy-on-Write - Only modified records are stored separately on branches
This architecture means:
- No data migration required
- Works with your existing infrastructure
- Minimal storage overhead for branching
Getting Started
Ready to try Horizon Epoch? Start with the Installation Guide and then follow the Quickstart tutorial.
Documentation Structure
This documentation is organized into several sections:
- Getting Started - Installation and your first steps
- Tutorials - Step-by-step guided walkthroughs
- How-To Guides - Task-oriented recipes for specific goals
- Reference - Complete API and CLI documentation
- Concepts - Deep dives into how Horizon Epoch works
- Troubleshooting - Solutions to common problems
Installation
Horizon Epoch is currently in private development.
Installation packages and instructions will be available upon public release. For early access inquiries, please contact Horizon Analytic Studios.
Requirements
When released, Horizon Epoch will require:
- PostgreSQL 14+ (for metadata storage)
- One or more supported storage backends
Supported Storage Backends
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite
- AWS S3 / S3-compatible (MinIO, etc.)
- Azure Blob Storage
- Google Cloud Storage
- Local filesystem
Next Steps
- Core Concepts - Understand how Horizon Epoch works
- FAQ - Common questions
Quickstart
Installation instructions coming soon.
For early access inquiries, please contact Horizon Analytic Studios.
What is Horizon Epoch?
Horizon Epoch is a Git-like version control system for data. It provides:
- Branching - Create isolated environments for testing changes
- Commits - Track changes with full history
- Merging - Combine changes with conflict detection
- Multi-backend support - Works with PostgreSQL, MySQL, SQL Server, SQLite, S3, Azure, GCS, local filesystem
Next Steps
- Core Concepts - Understand how Horizon Epoch works
- FAQ - Common questions
Core Concepts
Repositories
A repository is the top-level container for versioned data. It contains:
- Multiple tables (data sources)
- Multiple branches (parallel versions)
- A history of commits (snapshots in time)
Branches
Branches are independent lines of development. Like Git:
- main is the default branch
- Create branches for features, experiments, or environments
- Branches are zero-copy - creating one is instant
Commits
A commit is a snapshot of one or more tables at a point in time:
- Contains a message describing the changes
- Links to parent commit(s) forming a history graph
- Tracks which records changed
Tables
Tables are data sources that Horizon Epoch tracks:
- Can be database tables (PostgreSQL, MySQL, SQL Server, SQLite) or object storage (S3, Azure, GCS, local)
- Each table has a schema and primary key
- Changes are tracked at the record level
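To make "record level" concrete, a single tracked change can be pictured roughly like the structure below. This is an illustrative sketch only, not Horizon Epoch's internal representation.
# Hypothetical shape of one tracked change (illustrative only)
change = {
    "table": "users",
    "primary_key": 42,
    "op": "update",                 # insert / update / delete
    "fields": {
        "email": {"old": "alice@example.com", "new": "alice@company.com"},
    },
}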
Copy-on-Write
Horizon Epoch uses copy-on-write (CoW) semantics:
- When you create a branch, no data is copied
- Only modified records are stored separately
- Queries automatically overlay changes on base data
This makes operations fast and storage efficient.
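As a rough mental model, the read path behaves like a dictionary overlay: the branch's own changes win, and everything else falls through to the shared base data. The snippet below is plain illustrative Python, not the Horizon Epoch API; base_data, branch_changes, and read_record are invented for this example.
base_data = {
    # records shared by all branches, keyed by primary key
    1: {"name": "Alice", "email": "alice@example.com"},
    2: {"name": "Bob", "email": "bob@example.com"},
}
branch_changes = {
    # only records touched on this branch
    2: {"name": "Bob", "email": "bob@company.com"},       # modified on the branch
    3: {"name": "Carol", "email": "carol@example.com"},   # added on the branch
}

def read_record(record_id):
    """The branch's view of a record: branch overlay first, then shared base."""
    return branch_changes.get(record_id, base_data.get(record_id))

read_record(1)  # unchanged record -> served from the shared base
read_record(2)  # modified record  -> served from the branch overlay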
Three-Way Merge
When merging branches, Horizon Epoch uses three-way merge:
- Find the common ancestor
- Identify changes from ancestor to each branch
- Combine non-conflicting changes
- Report conflicts for overlapping changes
Conflicts can be resolved at the field level.
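The field-level rule can be sketched in a few lines of plain Python (illustrative only, not the actual merge engine): a field changed on only one side takes that side's value; a field changed to different values on both sides becomes a conflict.
def merge_record(base, ours, theirs):
    """Field-level three-way merge of one record (illustrative sketch)."""
    merged, conflicts = {}, {}
    for field in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:                  # both sides agree (or neither changed)
            merged[field] = o
        elif o == b:                # only theirs changed this field
            merged[field] = t
        elif t == b:                # only ours changed this field
            merged[field] = o
        else:                       # changed differently on both sides
            conflicts[field] = {"base": b, "ours": o, "theirs": t}
    return merged, conflicts

base   = {"email": "alice@example.com", "status": "active"}
ours   = {"email": "alice@company.com", "status": "active"}
theirs = {"email": "alice.smith@example.com", "status": "inactive"}

merged, conflicts = merge_record(base, ours, theirs)
# status merges cleanly to "inactive"; email is reported as a conflict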
Storage Adapters
Horizon Epoch works with your existing storage through adapters:
Relational Databases:
- PostgreSQL - Full constraint support, TLS/SSL
- MySQL - Full constraint support, SSH tunnels
- SQL Server - Full constraint support, Windows auth
- SQLite - File-based or in-memory, partial constraint support
Object Storage (Delta Lake format):
- AWS S3 - And S3-compatible (MinIO, etc.)
- Azure Blob Storage - Multiple auth methods
- Google Cloud Storage - Service account or ADC
- Local Filesystem - For development and edge deployments
Each adapter implements the same interface, so version control operations work identically across storage types.
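The adapter interface itself is not spelled out in these docs, but conceptually it is a small, uniform contract that every backend fulfills. The sketch below is a hypothetical Python protocol (names and signatures invented for illustration) showing the kind of operations an adapter would expose.
from typing import Any, Protocol

class StorageAdapter(Protocol):
    """Hypothetical sketch of a common adapter interface (illustrative only)."""

    async def read_records(self, table: str, version: str) -> list[dict[str, Any]]:
        """Read a table's records as seen at a given version."""
        ...

    async def write_records(self, table: str, records: list[dict[str, Any]]) -> None:
        """Persist branch-local record changes."""
        ...

    async def introspect_schema(self, table: str) -> dict[str, Any]:
        """Return column and constraint metadata for a table."""
        ...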
Metadata Layer
All versioning information is stored in a metadata database (PostgreSQL):
- Commit graph (directed acyclic graph of commits)
- Branch pointers (references to commits)
- Table registrations and schema versions
- Change tracking indices
The metadata layer is separate from your data storage, meaning Horizon Epoch doesn’t modify how your data is stored.
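The dataclasses below are a hypothetical sketch, not the real metadata schema, but they capture the essentials: commits link to their parents and so form a DAG, and a branch is just a named pointer to a commit, which is why creating one copies no data.
from dataclasses import dataclass

@dataclass
class Commit:
    id: str
    parent_ids: list[str]        # merge commits have two parents -> a DAG
    message: str
    changed_tables: list[str]

@dataclass
class Branch:
    name: str
    head_commit_id: str          # a movable pointer into the commit graph

main = Branch(name="main", head_commit_id="abc123f")
# Creating a branch copies a pointer, not data:
feature = Branch(name="feature/new-schema", head_commit_id=main.head_commit_id)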
Tutorial: Your First Repository
In this tutorial, you’ll:
- Create a Horizon Epoch repository
- Register a PostgreSQL table
- Make your first commit
- Explore the history
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL running with a sample database
Step 1: Create Sample Data
First, let’s create a sample table in PostgreSQL:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO users (name, email) VALUES
('Alice', 'alice@example.com'),
('Bob', 'bob@example.com'),
('Charlie', 'charlie@example.com');
Step 2: Initialize Repository
epoch init my-first-repo \
--metadata-url "postgresql://localhost/horizon_epoch" \
--description "My first Horizon Epoch repository"
Output:
Initialized repository 'my-first-repo' on branch 'main'
Step 3: Register the Table
epoch table add users \
--location "postgresql://localhost/mydb/public.users"
Output:
Registered table 'users' with 3 records
Schema: id (int), name (text), email (text), created_at (timestamp)
Primary key: id
Step 4: Create Initial Commit
epoch commit -m "Initial import of users table"
Output:
Created commit abc123f
1 table, 3 records
Step 5: View the Log
epoch log
Output:
commit abc123f
Author: Your Name <you@example.com>
Date: Thu Dec 12 10:30:00 2025
Initial import of users table
Tables: users
Stats: 3 records added
Step 6: Check Status
epoch status
Output:
On branch main
nothing to commit, working tree clean
Congratulations!
You’ve created your first Horizon Epoch repository. Next steps:
- Branching Workflow - Learn about branches
- Merging Data - Combine changes
Tutorial: Branching Workflow
Learn how to use branches effectively in Horizon Epoch for parallel development and safe experimentation.
Prerequisites
- Completed Your First Repository tutorial
- A repository with at least one registered table
Understanding Branches
Branches in Horizon Epoch work like Git branches:
- Each branch is an independent line of development
- Branches are zero-copy - creating one doesn’t duplicate data
- Changes on one branch don’t affect others until merged
How Zero-Copy Branching Works
When you create a branch in Horizon Epoch, no data is actually copied. Instead, the new branch simply points to the same data as its parent. Only when you make changes does Horizon Epoch store the modified records separately.
flowchart LR
subgraph "Before Branch Creation"
M1[("main branch")] --> D1[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
end
flowchart LR
subgraph "After Branch Creation (Zero-Copy)"
M2[("main")] --> D2[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
F2[("feature")] --> D2
end
flowchart LR
subgraph "After Modifications on Feature Branch"
M3[("main")] --> D3[("Shared Data<br/>users: 1000 rows<br/>orders: 5000 rows")]
F3[("feature")] --> D3
F3 --> C3[("Branch Changes<br/>users: 5 modified<br/>users: 2 added")]
end
This architecture means:
- Instant branch creation - No waiting for data to copy
- Efficient storage - Only changed records are stored a second time (in the branch overlay)
- Fast queries - Branch-specific changes are overlaid on base data
Common Branching Patterns
Pattern 1: Feature Branch
Create a branch for a new feature or experiment:
gitGraph
commit id: "Initial"
commit id: "Add users table"
branch feature/user-preferences
checkout feature/user-preferences
commit id: "Add preferences column"
commit id: "Populate default values"
checkout main
merge feature/user-preferences id: "Merge feature"
commit id: "Continue development"
epoch branch create feature/user-preferences
epoch checkout feature/user-preferences
Pattern 2: Environment Branches
Maintain branches for different environments:
gitGraph
commit id: "v1.0"
branch staging
branch development
checkout development
commit id: "New feature A"
commit id: "New feature B"
checkout staging
merge development id: "Promote to staging"
checkout main
commit id: "v1.1 Hotfix"
checkout staging
merge main id: "Sync hotfix"
checkout main
merge staging id: "Release v1.2"
epoch branch create development
epoch branch create staging
epoch branch create production
Pattern 3: Hotfix Branch
Quick fixes that need to go directly to production:
gitGraph
commit id: "v1.0 Release"
commit id: "v1.1 Release"
branch hotfix/fix-email
checkout hotfix/fix-email
commit id: "Fix email format"
checkout main
merge hotfix/fix-email id: "v1.1.1 Hotfix"
epoch branch create hotfix/fix-email-format --from production
Pattern 4: Multiple Feature Branches
Working on multiple features in parallel:
gitGraph
commit id: "Base"
branch feature/auth
branch feature/reporting
checkout feature/auth
commit id: "Add login"
commit id: "Add logout"
checkout feature/reporting
commit id: "Add daily report"
commit id: "Add weekly report"
checkout main
merge feature/auth id: "Merge auth"
merge feature/reporting id: "Merge reporting"
Step-by-Step: Feature Branch Workflow
1. Create a Feature Branch
epoch branch create feature/add-status-field
2. Switch to the Branch
epoch checkout feature/add-status-field
3. Make Changes to Your Data
Modify the underlying data (e.g., add records, update schema):
-- In your PostgreSQL database
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
UPDATE users SET status = 'active';
4. Review Changes
epoch diff
Output:
Modified: users
Schema changes:
+ status (varchar)
Record changes:
~ 3 records modified (new field values)
5. Commit Your Changes
epoch commit -m "Add status field to users table"
6. View Branch History
epoch log --oneline
Output:
def456g Add status field to users table
abc123f Initial import of users table
7. Compare with Main
epoch diff main..feature/add-status-field
8. Merge to Main
epoch checkout main
epoch merge feature/add-status-field
9. Clean Up
epoch branch delete feature/add-status-field
Branch Lifecycle Visualization
The complete lifecycle of a feature branch:
stateDiagram-v2
[*] --> Created: epoch branch create
Created --> Active: epoch checkout
Active --> Modified: Make data changes
Modified --> Committed: epoch commit
Committed --> Modified: More changes
Committed --> Reviewed: epoch diff main..branch
Reviewed --> Merged: epoch merge
Merged --> Deleted: epoch branch delete
Deleted --> [*]
Environment Promotion Flow
A typical flow for promoting changes through environments:
flowchart TD
subgraph Development
D1[Feature Branch] --> D2[Development Branch]
D2 --> D3{Tests Pass?}
D3 -->|No| D1
end
subgraph Staging
D3 -->|Yes| S1[Promote to Staging]
S1 --> S2{QA Approved?}
S2 -->|No| D1
end
subgraph Production
S2 -->|Yes| P1[Promote to Production]
P1 --> P2[Tag Release]
end
# Promote development to staging
epoch checkout staging
epoch merge development
# After QA approval, promote to production
epoch checkout production
epoch merge staging
epoch tag v1.2.0
Branch Comparison Summary
| Pattern | Use Case | Lifetime | Merge Target |
|---|---|---|---|
| Feature | New functionality | Days to weeks | development or main |
| Environment | Deployment stages | Permanent | Next environment |
| Hotfix | Critical fixes | Hours to days | production + main |
| Release | Version preparation | Days | main |
Best Practices
- Keep branches short-lived - Merge frequently to avoid conflicts
- Use descriptive names - feature/user-auth is better than branch1
- One feature per branch - Makes merging and rollback easier
- Test before merging - Validate changes on the branch first
- Delete merged branches - Keep your branch list clean
Visualizing Your Repository
You can visualize your repository’s branch structure:
# List all branches with their relationships
epoch branch list --graph
# Show commit history across branches
epoch log --graph --all
Next Steps
- Merging Data - Learn about merge strategies and conflict resolution
- Environment Promotion - Deploy changes safely across environments
Tutorial: Merging Data
Learn how to merge branches, handle conflicts, and understand merge strategies in Horizon Epoch.
Prerequisites
- Completed Branching Workflow tutorial
- Understanding of basic branch operations
Merge Types
Fast-Forward Merge
When the target branch hasn’t changed since the source branch was created:
Before:
main:    A---B
              \
feature:       C---D

After (fast-forward):

main:    A---B---C---D
epoch merge feature/simple-change
# Fast-forward merge: main -> def456g
Three-Way Merge
When both branches have diverged:
Before:
main:    A---B---C
              \
feature:       D---E

After (three-way merge):

main:    A---B---C---F
              \     /
feature:       D---E
epoch merge feature/complex-change
# Merged feature/complex-change into main (commit: ghi789h)
Handling Conflicts
Conflicts occur when both branches modify the same record’s fields.
Detecting Conflicts
epoch merge feature/conflicting-changes
# CONFLICT: Merge conflict in users
# Automatic merge failed; fix conflicts and then commit.
Viewing Conflicts
epoch status
Output:
On branch main
You have unmerged paths.
(fix conflicts and run "epoch merge --continue")
(use "epoch merge --abort" to abort the merge)
Unmerged paths:
both modified: users (3 conflicts)
Inspecting Conflict Details
epoch conflicts show
Output:
Table: users
Record ID: 42
Field: email
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Field: status
BASE: active
OURS: active
THEIRS: inactive
Resolving Conflicts
Interactive Resolution
epoch conflicts resolve --interactive
Accept One Side
# Accept all changes from target branch (ours)
epoch conflicts resolve --ours
# Accept all changes from source branch (theirs)
epoch conflicts resolve --theirs
Field-Level Resolution
epoch conflicts resolve users --record 42 --field email --value "alice.smith@company.com"
Completing the Merge
epoch merge --continue
Aborting the Merge
If you need to start over:
epoch merge --abort
Merge Strategies
Default Strategy
The default strategy uses three-way merge with field-level conflict detection:
epoch merge feature/my-branch
Ours Strategy
Keep all conflicting values from the target branch:
epoch merge feature/my-branch --strategy ours
Theirs Strategy
Accept all conflicting values from the source branch:
epoch merge feature/my-branch --strategy theirs
Best Practices
- Merge frequently - Small, frequent merges have fewer conflicts
- Review diffs first - Use epoch diff before merging
- Test after merge - Validate data integrity post-merge
- Document conflicts - Add context in merge commit messages
Example: Complete Merge Workflow
# 1. Ensure you're on the target branch
epoch checkout main
# 2. Pull latest changes (if using remote)
epoch pull
# 3. Preview what will be merged
epoch diff main..feature/my-feature
# 4. Perform the merge
epoch merge feature/my-feature
# 5. If conflicts, resolve them
epoch conflicts show
epoch conflicts resolve --interactive
# 6. Complete the merge
epoch merge --continue
# 7. Verify the result
epoch log --oneline -5
epoch status
Next Steps
- Environment Promotion - Promote changes across environments
- Resolve Conflicts - Detailed conflict resolution guide
Tutorial: Environment Promotion
Learn how to safely promote data changes from development through staging to production using Horizon Epoch’s branching model.
Prerequisites
- Completed Merging Data tutorial
- Understanding of environment-based workflows
The Environment Branch Model
A typical setup uses long-lived branches for each environment:
production    A---B-----------F---G
                   \         /
staging             C---D---E
                         \
development               H---I---J
Setting Up Environment Branches
1. Create Environment Branches
# Start from main (which represents production)
epoch branch create staging --from main
epoch branch create development --from staging
2. Protect Production
Configure branch protection to prevent direct commits:
epoch branch protect production --require-review --no-force-push
Development Workflow
1. Work on Development
All feature work starts on development or feature branches:
epoch checkout development
epoch branch create feature/new-report --from development
2. Develop and Test
Make changes and commit:
# Make data changes
epoch commit -m "Add sales report aggregations"
3. Merge to Development
epoch checkout development
epoch merge feature/new-report
Promoting to Staging
1. Review Changes
epoch diff staging..development
2. Create a Release Branch (Optional)
For complex releases:
epoch branch create release/v1.2 --from development
3. Promote to Staging
epoch checkout staging
epoch merge development # or release/v1.2
4. Run Staging Tests
Validate the data in staging environment before proceeding.
Promoting to Production
1. Final Review
epoch diff production..staging
2. Promote to Production
epoch checkout production
epoch merge staging
3. Tag the Release
epoch tag v1.2.0 --message "Release 1.2.0: Sales report aggregations"
Rollback Procedures
Quick Rollback
If issues are found immediately after promotion:
# Revert the last merge
epoch revert HEAD
Rollback to Specific Version
# Find the commit to rollback to
epoch log --oneline -10
# Reset to that commit
epoch reset --hard abc123f
Emergency Production Rollback
# Create a hotfix from the last known good state
epoch checkout production
epoch reset --hard v1.1.0
epoch tag v1.1.1-rollback --message "Emergency rollback from v1.2.0"
Hotfix Workflow
For urgent production fixes:
# 1. Create hotfix from production
epoch branch create hotfix/critical-fix --from production
# 2. Make the fix
epoch checkout hotfix/critical-fix
# ... apply fix ...
epoch commit -m "Fix critical data issue"
# 3. Merge directly to production
epoch checkout production
epoch merge hotfix/critical-fix
# 4. Backport to other environments
epoch checkout staging
epoch merge hotfix/critical-fix
epoch checkout development
epoch merge hotfix/critical-fix
Automation Tips
CI/CD Integration
# Example GitHub Actions workflow
name: Promote to Staging
on:
workflow_dispatch:
inputs:
source_branch:
description: 'Branch to promote'
default: 'development'
jobs:
promote:
runs-on: ubuntu-latest
steps:
- name: Promote to staging
run: |
epoch checkout staging
epoch merge ${{ inputs.source_branch }}
epoch push
Scheduled Promotions
# Promote development to staging every night
0 2 * * * epoch checkout staging && epoch merge development && epoch push
Best Practices
- Never commit directly to production - Always promote through the pipeline
- Tag production releases - Makes rollback and auditing easier
- Keep environments in sync - Regularly merge production back to development
- Automate where possible - Reduces human error
- Document promotions - Keep a changelog of what was promoted and when
Next Steps
- ETL Branch Testing - Test ETL changes safely
- Configuration Reference - Environment configuration options
Tutorial: ETL Branch Testing
Learn how to safely test ETL (Extract, Transform, Load) changes using Horizon Epoch branches before deploying to production.
Prerequisites
- Completed Environment Promotion tutorial
- Understanding of ETL workflows
- Python environment with Horizon Epoch SDK
The Problem
ETL changes are risky:
- Transformations might produce incorrect results
- Schema changes might break downstream systems
- Volume changes might cause performance issues
Traditional approaches:
- Test on copied data (expensive, often stale)
- Test in staging (still risky, shared environment)
- Hope for the best (not recommended)
The Solution: Branch-Based ETL Testing
With Horizon Epoch, you can:
- Create a branch from production data
- Run your ETL changes against the branch
- Validate the results
- Merge only when confident
Step-by-Step Guide
1. Set Up Your Repository
import asyncio
from horizon_epoch import Client, StorageBackend
async def setup():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
await client.init("my-repo")
# Add warehouse storage
await client.add_storage(
name="warehouse",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/data_warehouse"}
)
return client
# Note: In practice, you'd keep the client connection open throughout
2. Create a Test Branch
# Create branch from production
test_branch = await client.branch(
name="etl-test/sales-aggregation-v2",
start_point="main" # production
)
print(f"Created test branch: {test_branch.name}")
3. Run Your ETL Transformation
import pandas as pd
from horizon_epoch import Author
# Checkout the test branch
await client.checkout("etl-test/sales-aggregation-v2")
# Your ETL logic here
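# NOTE: `connection` below is assumed to be an already-open connection or engine
# for the warehouse database (for example via SQLAlchemy); creating it is not shown.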
df = pd.read_sql("""
SELECT
date_trunc('month', sale_date) as month,
product_category,
SUM(amount) as total_sales
FROM sales
GROUP BY 1, 2
""", connection)
# Write transformed data
df.to_sql(
"monthly_sales_summary",
connection,
if_exists="replace",
index=False
)
# Stage and commit the transformation
await client.stage_all()
await client.commit(
message="Apply new sales aggregation logic v2",
author=Author(name="ETL Pipeline", email="etl@example.com")
)
4. Validate the Results
# Compare with expected results
async def validate_transformation():
# Already on the test branch from checkout
# Check row counts
result = connection.execute(
"SELECT COUNT(*) FROM monthly_sales_summary"
).scalar()
assert result > 0, "No rows in summary table"
# Check for nulls in required fields
nulls = connection.execute("""
SELECT COUNT(*) FROM monthly_sales_summary
WHERE month IS NULL OR total_sales IS NULL
""").scalar()
assert nulls == 0, f"Found {nulls} rows with null values"
# Verify totals match
branch_total = connection.execute(
"SELECT SUM(total_sales) FROM monthly_sales_summary"
).scalar()
# Compare with production
prod_total = connection.execute(
"SELECT SUM(amount) FROM sales"
).scalar()
assert abs(branch_total - prod_total) < 0.01, \
f"Total mismatch: {branch_total} vs {prod_total}"
print("All validations passed!")
await validate_transformation()
5. Compare with Production
# Get detailed diff
diff = await client.diff(
source="main",
target="etl-test/sales-aggregation-v2"
)
for table_diff in diff.table_diffs:
print(f"\nTable: {table_diff.table_name}")
print(f" Added: {table_diff.added_count}")
print(f" Modified: {table_diff.modified_count}")
print(f" Deleted: {table_diff.deleted_count}")
6. Review and Merge
If validations pass:
# Merge to production
result = await client.merge(
    source="etl-test/sales-aggregation-v2",
    target="main"
)
if result.status == "success":
    print("ETL changes merged to production!")
    # Clean up test branch
    await client.delete_branch("etl-test/sales-aggregation-v2")
else:
    print(f"Merge failed: {result.conflicts}")
Complete Example Script
#!/usr/bin/env python3
"""
ETL Branch Testing Example
Usage:
python etl_branch_test.py
"""
import asyncio
from horizon_epoch import Client, Author, StorageBackend
from datetime import datetime
async def main():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Add warehouse storage if needed
await client.add_storage(
name="warehouse",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/data_warehouse"}
)
branch_name = f"etl-test/transform-{datetime.now():%Y%m%d-%H%M%S}"
try:
# 1. Create test branch
print(f"Creating branch: {branch_name}")
await client.branch(branch_name, start_point="main")
# 2. Run transformation
print("Running ETL transformation...")
await client.checkout(branch_name)
await run_transformation()
await client.stage_all()
await client.commit(
message="Apply transformation",
author=Author(name="ETL", email="etl@example.com")
)
# 3. Validate
print("Validating results...")
await validate_results()
# 4. Review diff
print("\nChanges summary:")
diff = await client.diff(base="main", target=branch_name)
for td in diff.table_diffs:
print(f" {td.table_name}: {td.total_changes} changes")
# 5. Prompt for merge
if input("\nMerge to production? (y/n): ").lower() == 'y':
result = await client.merge(source=branch_name)
if result.has_conflicts:
print(f"Conflicts: {result.conflict_count}")
else:
print(f"Merged: {result.result_commit_id}")
finally:
# Clean up
branches = await client.branches()
if any(b.name == branch_name for b in branches):
await client.delete_branch(branch_name)
print(f"Cleaned up branch: {branch_name}")
async def run_transformation():
    # Your ETL logic here
    pass

async def validate_results():
    # Your validation logic here
    pass

if __name__ == "__main__":
    asyncio.run(main())
Best Practices
- Name branches descriptively - Include ETL name and timestamp
- Automate validation - Write comprehensive checks
- Keep transformations atomic - One logical change per branch
- Document expected changes - Record what the ETL should do
- Time-box testing - Don’t let test branches linger
Next Steps
- Branch-Aware Queries - Query across branches
- Multi-Backend Setup - ETL across storage types
Tutorial: Schema and Constraint Versioning
This tutorial demonstrates how to version database constraints alongside your data, including schema introspection, constraint diffing, merge conflict resolution, and DDL migration generation.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL running with a sample database
- Basic familiarity with Git-like version control concepts
Part 1: Schema Introspection with Constraints
Step 1: Create a Table with Constraints
First, let’s create a table with various constraints in PostgreSQL:
-- Create enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');
-- Create customers table
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT uq_customers_email UNIQUE (email),
CONSTRAINT chk_customers_email CHECK (email ~ '@')
);
-- Create orders table with foreign key
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
status order_status DEFAULT 'pending',
total NUMERIC(10,2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
REFERENCES customers (id) ON DELETE CASCADE,
CONSTRAINT chk_orders_total CHECK (total >= 0)
);
-- Create index
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status) WHERE status != 'delivered';
Step 2: Register Tables with Horizon Epoch
# Initialize repository
epoch init ecommerce-repo \
--metadata-url "postgresql://localhost/horizon_epoch"
# Register tables (constraints are automatically detected)
epoch table add customers \
--location "postgresql://localhost/mydb/public.customers"
epoch table add orders \
--location "postgresql://localhost/mydb/public.orders"
# Initial commit captures schema with constraints
epoch commit -m "Initial schema with constraints"
Step 3: View Introspected Schema
# Show schema including constraints
epoch schema show customers
# Output:
# Table: customers
#
# Columns:
# id SERIAL PRIMARY KEY
# email VARCHAR(255) NOT NULL
# name VARCHAR(255) NOT NULL
# created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
#
# Constraints:
# uq_customers_email UNIQUE (email)
# chk_customers_email CHECK (email ~ '@')
Using Python:
import asyncio
from horizon_epoch import Client
async def view_constraints():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraints for a table
constraints = await client.get_constraints("customers", branch="main")
print(f"\nUnique Constraints: {len(constraints.unique_constraints)}")
print(f"Check Constraints: {len(constraints.check_constraints)}")
print(f"Foreign Keys: {len(constraints.foreign_keys)}")
asyncio.run(view_constraints())
Part 2: Constraint Diffing Between Branches
Step 1: Create a Feature Branch
epoch branch create feature/add-phone
Step 2: Modify Constraints on the Branch
Add a phone number column with constraints:
-- On feature/add-phone branch
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);
ALTER TABLE customers ADD CONSTRAINT uq_customers_phone UNIQUE (phone);
ALTER TABLE customers ADD CONSTRAINT chk_customers_phone
CHECK (phone ~ '^\+?[0-9]{10,15}$');
epoch commit -m "Add phone column with constraints"
Step 3: View the Constraint Diff
# Compare branches
epoch diff main feature/add-phone --schema
# Output:
# Schema changes for 'customers':
#
# Added columns:
# + phone VARCHAR(20)
#
# Added constraints:
# + uq_customers_phone UNIQUE (phone)
# + chk_customers_phone CHECK (phone ~ '^\+?[0-9]{10,15}$')
Using Python:
# Get constraint diff
diff = client.diff_schema("main", "feature/add-phone", table="customers")
print("Added constraints:")
for c in diff.constraints.added:
print(f" + {c.constraint_type}: {c.name}")
print("\nRemoved constraints:")
for c in diff.constraints.removed:
print(f" - {c.constraint_type}: {c.name}")
print("\nModified constraints:")
for m in diff.constraints.modified:
print(f" ~ {m.name}")
for change in m.changes:
print(f" {change}")
Part 3: Constraint Merge Conflict Resolution
Constraint conflicts occur when the same constraint is modified differently on two branches.
Step 1: Create Conflicting Changes
# On main branch
epoch checkout main
-- Modify the email check constraint on main
ALTER TABLE customers DROP CONSTRAINT chk_customers_email;
ALTER TABLE customers ADD CONSTRAINT chk_customers_email
CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
epoch commit -m "Stricter email validation"
# On feature branch (already has different changes)
epoch checkout feature/add-phone
-- Also modify the email check constraint differently
ALTER TABLE customers DROP CONSTRAINT chk_customers_email;
ALTER TABLE customers ADD CONSTRAINT chk_customers_email
CHECK (email LIKE '%@%.%');
epoch commit -m "Simpler email validation"
Step 2: Attempt Merge
epoch checkout main
epoch merge feature/add-phone
# Output:
# CONFLICT (constraint): chk_customers_email modified in both branches
#
# On main:
# CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
#
# On feature/add-phone:
# CHECK (email LIKE '%@%.%')
#
# Automatic merge failed. Fix conflicts and commit.
Step 3: View Conflicts
epoch status
# Output:
# On branch main
# You have unmerged paths.
#
# Constraint conflicts:
# (use "epoch resolve <constraint>" to mark resolution)
#
# both modified: customers.chk_customers_email
Using Python:
# Get merge conflicts
merge_result = client.merge("feature/add-phone", "main", dry_run=True)
for conflict in merge_result.constraint_conflicts:
print(f"Conflict: {conflict.constraint_name}")
print(f" Type: {conflict.constraint_type}")
print(f" Base: {conflict.base}")
print(f" Ours: {conflict.ours}")
print(f" Theirs: {conflict.theirs}")
print(f" Differences: {conflict.differences}")
Step 4: Resolve Conflicts
Option A: Use ours (main branch version)
epoch resolve customers.chk_customers_email --ours
epoch commit -m "Merge feature/add-phone with stricter email validation"
Option B: Use theirs (feature branch version)
epoch resolve customers.chk_customers_email --theirs
epoch commit -m "Merge feature/add-phone with simpler email validation"
Option C: Use custom resolution
# Create custom constraint
epoch resolve customers.chk_customers_email --custom \
"CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$')"
epoch commit -m "Merge feature/add-phone with balanced email validation"
Using Python:
from horizon_epoch import ConstraintResolution
# Resolve using ours
client.resolve_constraint(
"customers",
"chk_customers_email",
resolution=ConstraintResolution.UseOurs
)
# Or use theirs
client.resolve_constraint(
"customers",
"chk_customers_email",
resolution=ConstraintResolution.UseTheirs
)
# Or provide custom
from horizon_epoch import CheckConstraint
custom_constraint = CheckConstraint(
name="chk_customers_email",
expression="email ~ '^[^@]+@[^@]+\\.[^@]+$'"
)
client.resolve_constraint(
"customers",
"chk_customers_email",
resolution=ConstraintResolution.UseCustom(custom_constraint)
)
# Complete the merge
client.commit(message="Merge with resolved constraint conflict")
Part 4: DDL Migration Generation
Horizon Epoch can generate migration scripts for schema changes.
Step 1: View Pending Changes
# Show what migrations would be generated
epoch schema diff main feature/new-indexes --ddl
# Output:
# -- Migration: main -> feature/new-indexes
#
# -- UP
# CREATE INDEX CONCURRENTLY idx_customers_name ON customers (name);
# ALTER TABLE orders ADD CONSTRAINT chk_orders_status
# CHECK (status IN ('pending', 'processing', 'shipped', 'delivered'));
#
# -- DOWN
# DROP INDEX CONCURRENTLY IF EXISTS idx_customers_name;
# ALTER TABLE orders DROP CONSTRAINT IF EXISTS chk_orders_status;
Step 2: Generate Migration Script
# Generate migration file
epoch schema migrate main feature/new-indexes \
--output migrations/V2__add_indexes.sql
# Output:
# Generated migration: migrations/V2__add_indexes.sql
# UP: 2 statements
# DOWN: 2 statements
# Warnings: 0
Using Python:
from horizon_epoch import PostgresDdlGenerator
# Get the schema diff
diff = client.diff_schema("main", "feature/new-indexes")
# Generate migration
generator = PostgresDdlGenerator()
migration = generator.generate_migration("customers", diff.constraints)
print("UP statements:")
for stmt in migration.up:
print(f" {stmt}")
print("\nDOWN statements:")
for stmt in migration.down:
print(f" {stmt}")
if migration.warnings:
print("\nWarnings:")
for warn in migration.warnings:
print(f" ! {warn}")
# Write to file
with open("migrations/V2__add_indexes.sql", "w") as f:
f.write("-- UP\n")
for stmt in migration.up:
f.write(f"{stmt};\n")
f.write("\n-- DOWN\n")
for stmt in migration.down:
f.write(f"{stmt};\n")
Step 3: Understanding Migration Ordering
The DDL generator follows a specific order to handle dependencies:
UP (Apply) Order:
- Create new enum types (before columns using them)
- Add values to modified enums
- Drop constraints being modified/removed (FK first)
- Drop indexes being modified/removed
- Handle primary key changes
- Add new/modified constraints (check, unique, then FK last)
- Create new indexes
DOWN (Reverse) Order:
- Drop new indexes
- Drop new constraints
- Re-add dropped indexes
- Re-add dropped constraints
- Re-add dropped enum types (with warnings)
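As a rough illustration of why this ordering matters, an UP script can be assembled phase by phase in the order listed above. The helper below is a hypothetical sketch, not the generator's internals; the statement lists passed in are example inputs.
# Hypothetical sketch: assemble UP statements in the dependency-safe order above.
UP_PHASES = [
    "create_enums",        # new enum types before columns/constraints that use them
    "alter_enums",         # additive enum value changes
    "drop_constraints",    # modified/removed constraints (foreign keys first)
    "drop_indexes",
    "primary_key_changes",
    "add_constraints",     # check, unique, then foreign keys last
    "create_indexes",
]

def build_up_script(changes: dict[str, list[str]]) -> list[str]:
    statements = []
    for phase in UP_PHASES:
        statements.extend(changes.get(phase, []))
    return statements

up = build_up_script({
    "add_constraints": ["ALTER TABLE orders ADD CONSTRAINT chk_orders_total CHECK (total >= 0)"],
    "create_indexes": ["CREATE INDEX idx_customers_name ON customers (name)"],
})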
Step 4: Check for Destructive Changes
# Check if migration has destructive changes
migration = generator.generate_migration("orders", diff.constraints)
if migration.has_destructive_changes():
print("WARNING: Migration contains destructive changes!")
for warn in migration.warnings:
print(f" {warn}")
Destructive changes include:
- Dropping enum values (data may reference them)
- Dropping foreign keys to tables being dropped
- Removing NOT NULL on columns with NULL values
Part 5: Handling Enum Type Changes
Enum types require special handling because PostgreSQL only allows additive changes.
Step 1: Add Enum Values
epoch checkout main
-- Add new enum value (safe operation)
ALTER TYPE order_status ADD VALUE 'cancelled' AFTER 'delivered';
epoch commit -m "Add cancelled status"
Step 2: Check Enum Safety
# Get enum diff
diff = client.diff_schema("main", "feature-branch")
for enum_diff in diff.enum_types:
if enum_diff.is_additive():
print(f"Enum {enum_diff.name}: Safe to merge (additive only)")
else:
print(f"Enum {enum_diff.name}: UNSAFE - values removed or reordered")
print(f" Removed: {enum_diff.removed}")
print(f" Reordered: {enum_diff.reordered}")
Step 3: Handle Non-Additive Enum Changes
If you need to remove or reorder enum values, you must:
- Create a new enum type
- Migrate data to the new type
- Drop the old type
- Rename the new type
-- Create new enum
CREATE TYPE order_status_new AS ENUM ('pending', 'processing', 'shipped');
-- Migrate data (handle 'delivered' and 'cancelled' values)
UPDATE orders SET status = 'shipped'
WHERE status IN ('delivered', 'cancelled');
-- Change column type
ALTER TABLE orders ALTER COLUMN status TYPE order_status_new
USING status::text::order_status_new;
-- Drop old, rename new
DROP TYPE order_status;
ALTER TYPE order_status_new RENAME TO order_status;
Best Practices
1. Always Review Constraint Diffs Before Merge
# Check what constraints changed
epoch diff main feature-branch --schema --constraints-only
2. Use Dry-Run for Merges
# Preview merge without applying
result = client.merge("feature", "main", dry_run=True)
if result.has_conflicts:
print("Resolve these conflicts before merging:")
for c in result.conflicts:
print(f" {c}")
3. Generate and Review Migrations
# Always review generated DDL
epoch schema migrate main feature --output migration.sql --dry-run
cat migration.sql
4. Test Migrations on Branch Data
# Create test branch
epoch branch create test/migration
# Apply migration
psql -f migration.sql
# Verify data integrity
epoch validate test/migration
5. Use Semantic Constraint Names
Follow a consistent naming convention:
- Foreign keys: fk_{table}_{column} or fk_{table}_{referenced_table}
- Unique: uq_{table}_{columns}
- Check: chk_{table}_{description}
- Index: idx_{table}_{columns}
See Also
- Constraint Reference - Complete constraint type documentation
- Merge Algorithm - How three-way merge works
- Resolve Conflicts - Conflict resolution strategies
How to Connect PostgreSQL
Basic Connection
import asyncio
from horizon_epoch import Client
async def main():
async with Client.connect("postgresql://user:pass@localhost/horizon_epoch") as client:
# Client is now connected to metadata database
await client.init("my-repo")
asyncio.run(main())
CLI Connection
# Set connection in environment
export EPOCH_METADATA_URL="postgresql://user:pass@localhost/horizon_epoch"
# Or use connection string directly
epoch init my-repo --metadata-url "postgresql://localhost/horizon_epoch"
# Register PostgreSQL table
epoch table add users \
--location "postgresql://localhost/mydb/public.users"
Register a PostgreSQL Table
from horizon_epoch.client import _native
# Track a PostgreSQL table
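# (positional arguments appear to be: registered storage name, schema, table name)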
loc = _native.StorageLocation.postgresql("main", "public", "users")
await client.track_table("users", loc)
# From a specific schema
loc = _native.StorageLocation.postgresql("main", "sales", "orders")
await client.track_table("orders", loc)
Add PostgreSQL as Storage Backend
# CLI - add storage backend
epoch storage add mydb \
--type postgresql \
--url "postgresql://user:pass@localhost/production"
from horizon_epoch import StorageBackend
await client.add_storage(
name="mydb",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://user:pass@localhost/production"}
)
With SSL
# CLI with SSL
epoch storage add mydb \
--type postgresql \
--url "postgresql://user:pass@host/db?sslmode=require"
For stricter SSL verification:
epoch storage add mydb \
--type postgresql \
--url "postgresql://user:pass@host/db?sslmode=verify-full&sslrootcert=/path/to/ca.crt"
Connection String Format
postgresql://[user[:password]@][host][:port][/database][?param=value&...]
Common parameters:
- sslmode - disable, require, verify-ca, verify-full
- sslrootcert - Path to root certificate
- sslcert - Path to client certificate
- sslkey - Path to client key
- connect_timeout - Connection timeout in seconds
- application_name - Application name for logging
Advanced Authentication
For enterprise environments requiring additional security:
- SSH Tunnels: See SSH Tunnel Guide
- Client Certificates (mTLS): See mTLS Authentication
- AWS RDS IAM Auth: See AWS Secrets Guide
- HashiCorp Vault: See Vault Integration
Troubleshooting
Connection Refused
Error: connection refused (os error 111)
- Verify PostgreSQL is running: pg_isready -h localhost
- Check host and port in connection string
- Verify firewall rules
Authentication Failed
Error: password authentication failed
- Verify username and password
- Check pg_hba.conf for allowed authentication methods
- Ensure user has access to the database
SSL Errors
Error: SSL connection required
Add ?sslmode=require to the connection string, or for self-signed certificates:
?sslmode=verify-ca&sslrootcert=/path/to/ca.crt
Connect MySQL
Connect Horizon Epoch to MySQL databases.
CLI Configuration
# Add MySQL storage backend
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@localhost:3306/database"
Connection String Format
mysql://[user[:password]@][host][:port]/database[?param=value&...]
Common parameters:
- ssl-mode - disabled, preferred, required, verify_ca, verify_identity
- ssl-ca - Path to CA certificate
- connect_timeout - Connection timeout in seconds
SSL/TLS Modes
| Mode | Description |
|---|---|
| disabled | No SSL |
| preferred | Use SSL if available (default) |
| required | Require SSL, don’t verify certificate |
| verify_ca | Require SSL, verify CA |
| verify_identity | Require SSL, verify CA and hostname |
With SSL
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@host/db?ssl-mode=required"
With Certificate Verification
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@host/db?ssl-mode=verify_ca&ssl-ca=/path/to/ca.pem"
Register Tables
epoch table add customers \
--location "mysql://mydb/customers"
SSH Tunnel Support
MySQL connections support SSH tunneling for secure access:
epoch storage add mydb \
--type mysql \
--url "mysql://user:pass@localhost:3306/db" \
--ssh-host "bastion.example.com" \
--ssh-user "ec2-user" \
--ssh-key "~/.ssh/id_rsa"
See SSH Tunnels for more details.
AWS RDS MySQL
# Standard connection
epoch storage add rds \
--type mysql \
--url "mysql://admin:pass@mydb.cluster.us-east-1.rds.amazonaws.com:3306/mydb?ssl-mode=required"
For IAM authentication, see AWS Secrets.
Troubleshooting
Connection Refused
- Verify MySQL is running and accepting connections
- Check host, port, and firewall rules
- Verify user has remote access permissions
SSL Errors
- Ensure SSL mode matches server configuration
- For self-signed certs, use verify_ca with the ssl-ca parameter
Authentication Failed
- Verify username and password
- Check the user has access from your host ('user'@'%' vs 'user'@'localhost')
Connect Microsoft SQL Server
Connect Horizon Epoch to Microsoft SQL Server databases.
CLI Configuration
# Add SQL Server storage backend
epoch storage add mydb \
--type mssql \
--url "mssql://user:pass@localhost:1433/database"
Connection String Format
mssql://[user[:password]@][host][:port]/database[?param=value&...]
Common parameters:
- encrypt - off, on, required
- trust_server_certificate - true/false
- application_name - Application name for logging
Encryption Modes
| Mode | Description |
|---|---|
| off | No encryption |
| on | Encrypt if server supports it |
| required | Require encryption (recommended) |
With Encryption
epoch storage add mydb \
--type mssql \
--url "mssql://user:pass@host/db?encrypt=required"
Trust Self-Signed Certificate
epoch storage add mydb \
--type mssql \
--url "mssql://user:pass@host/db?encrypt=required&trust_server_certificate=true"
Authentication Methods
SQL Server Authentication
epoch storage add mydb \
--type mssql \
--url "mssql://sa:password@localhost/database"
Windows Authentication
epoch storage add mydb \
--type mssql \
--url "mssql://localhost/database?integrated_security=true"
Register Tables
epoch table add customers \
--location "mssql://mydb/dbo.customers"
Azure SQL Database
epoch storage add azure \
--type mssql \
--url "mssql://user@server:pass@server.database.windows.net:1433/database?encrypt=required"
Troubleshooting
Connection Refused
- Verify SQL Server is running and TCP/IP is enabled
- Check SQL Server Configuration Manager for network settings
- Verify firewall allows port 1433
Login Failed
- Verify SQL Server authentication is enabled (mixed mode)
- Check username and password
- Verify user has access to the database
Encryption Errors
- Ensure the encrypt setting matches server configuration
- For self-signed certs, use trust_server_certificate=true
Connect SQLite
Connect Horizon Epoch to SQLite databases.
CLI Configuration
# File-based database
epoch storage add local \
--type sqlite \
--path "/path/to/database.db"
# In-memory database (for testing)
epoch storage add test \
--type sqlite \
--path ":memory:"
Configuration Options
| Option | Description | Default |
|---|---|---|
| journal_mode | WAL, DELETE, TRUNCATE, etc. | WAL |
| foreign_keys | Enable foreign key enforcement | true |
| busy_timeout | Lock wait timeout (ms) | 5000 |
With WAL Mode (Recommended)
epoch storage add local \
--type sqlite \
--path "/data/mydb.db" \
--journal-mode wal
Enable Foreign Keys
epoch storage add local \
--type sqlite \
--path "/data/mydb.db" \
--foreign-keys true
Register Tables
epoch table add users \
--location "sqlite://local/users"
Use Cases
Development/Testing
SQLite is ideal for local development:
# Create a local test database
epoch storage add dev \
--type sqlite \
--path "./test.db"
epoch init my-repo --storage dev
Embedded Applications
For applications that need embedded data versioning without external databases.
Edge/Offline
SQLite works well for edge deployments or offline-first applications.
Constraint Support
SQLite has partial constraint support:
| Constraint | Support |
|---|---|
| Primary Key | Full |
| Unique | Full |
| Foreign Key | Requires foreign_keys=true |
| Check | Full |
| Index | Full |
Foreign keys must be explicitly enabled:
epoch storage add local \
--type sqlite \
--path "/data/mydb.db" \
--foreign-keys true
Troubleshooting
Database Locked
- Increase busy_timeout for high-concurrency scenarios
- Use WAL mode for better concurrent access
- Ensure only one process writes at a time
Foreign Key Violations Not Detected
- Ensure foreign_keys=true is set
- Foreign key enforcement is off by default in SQLite
Permission Denied
- Check file and directory permissions
- Ensure the directory exists
How to Connect S3
Connect Horizon Epoch to S3-compatible storage for Delta Lake tables.
CLI Configuration
# Environment variables
export EPOCH_S3_ENDPOINT="http://localhost:9000"
export EPOCH_S3_BUCKET="horizon-data"
export EPOCH_S3_ACCESS_KEY="minioadmin"
export EPOCH_S3_SECRET_KEY="minioadmin"
# Add S3 storage backend
epoch storage add datalake \
--type s3 \
--bucket my-data-bucket \
--region us-east-1
# Register a Delta table
epoch table add sales \
--location "s3://my-bucket/delta/sales"
Python SDK
import asyncio
from horizon_epoch import Client, StorageBackend
async def main():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
await client.init("my-repo")
# Add S3 storage backend
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={
"bucket": "my-data-bucket",
"region": "us-east-1"
}
)
asyncio.run(main())
Using Environment Variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
The SDK will use AWS environment variables automatically.
MinIO / S3-Compatible Endpoints
# CLI
epoch storage add minio \
--type s3 \
--bucket horizon-data \
--endpoint "http://localhost:9000" \
--force-path-style
await client.add_storage(
name="minio",
backend=StorageBackend.S3,
config={
"bucket": "horizon-data",
"endpoint": "http://localhost:9000",
"access_key": "minioadmin",
"secret_key": "minioadmin",
"force_path_style": True
}
)
IAM Roles (EC2/ECS/EKS)
When running on AWS infrastructure with IAM roles:
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={
"bucket": "my-data-bucket",
"use_instance_credentials": True
}
)
Assume Role
For cross-account access:
await client.add_storage(
name="cross-account",
backend=StorageBackend.S3,
config={
"bucket": "other-account-bucket",
"assume_role_arn": "arn:aws:iam::123456789012:role/HorizonEpochAccess",
"external_id": "optional-external-id"
}
)
Registering Delta Tables
# CLI
epoch table add sales \
--location "s3://my-bucket/delta/sales"
from horizon_epoch.client import _native
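# (positional arguments appear to be: bucket name, table path within the bucket)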
loc = _native.StorageLocation.s3("my-bucket", "delta/sales")
await client.track_table("sales", loc)
Table Paths
Tables are stored as Delta Lake tables:
s3://my-bucket/
delta/
sales/
_delta_log/ # Delta transaction log
part-00000.parquet
part-00001.parquet
users/
_delta_log/
part-00000.parquet
Configuration File
# ~/.epoch/config.toml
[storage.s3.default]
bucket = "horizon-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"
Advanced Authentication
For enterprise environments:
- AWS Secrets Manager: See AWS Secrets Guide
- HashiCorp Vault: See Vault Integration
- IAM Identity Center (SSO): See AWS Secrets Guide
Required Bucket Permissions
Minimum IAM permissions needed:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
}
]
}
Troubleshooting
Access Denied
- Verify IAM permissions include all required actions
- Check bucket policy allows access from your account/role
- Ensure bucket region matches configuration
Endpoint Not Found
- For MinIO, ensure force_path_style=True is set
- Verify the endpoint URL is correct
- Check network connectivity to the endpoint
Slow Operations
- Consider enabling transfer acceleration for large files
- Use regional endpoints when possible
- Check for network latency to the S3 region
Connect Azure Blob Storage
Connect Horizon Epoch to Azure Blob Storage for Delta Lake tables.
CLI Configuration
# Add Azure storage backend
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data"
Authentication Methods
Account Key
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--account-key "your-account-key"
Or via environment variable:
export AZURE_STORAGE_ACCOUNT_KEY="your-account-key"
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data"
SAS Token
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--sas-token "?sv=2021-06-08&ss=b&srt=sco..."
Service Principal
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
export AZURE_TENANT_ID="your-tenant-id"
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--auth-method service_principal
Managed Identity
For Azure VMs, AKS, or App Service with managed identity:
epoch storage add datalake \
--type azure \
--account "mystorageaccount" \
--container "data" \
--auth-method managed_identity
Register Delta Tables
epoch table add events \
--location "azure://datalake/delta/events"
Table Storage Structure
Tables are stored in Delta Lake format:
container/
delta/
events/
_delta_log/
part-00000.parquet
part-00001.parquet
Configuration File
# ~/.epoch/config.toml
[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key"
Required Permissions
Minimum permissions for the storage account:
- Storage Blob Data Reader (read operations)
- Storage Blob Data Contributor (read/write operations)
Troubleshooting
AuthorizationFailure
- Verify account name and credentials
- Check container exists
- Ensure identity has proper RBAC roles
Container Not Found
- Verify container name is correct
- Check storage account name
Slow Operations
- Use regional endpoints when possible
- Consider Azure Private Link for private network access
Connect Google Cloud Storage
Connect Horizon Epoch to Google Cloud Storage for Delta Lake tables.
CLI Configuration
# Add GCS storage backend
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id"
Authentication Methods
Service Account Key
# Set credentials file path
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id"
Or specify directly:
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id" \
--credentials-file "/path/to/service-account.json"
Application Default Credentials (ADC)
For GCE, GKE, Cloud Run, or local development with gcloud auth:
# Local development: authenticate with gcloud
gcloud auth application-default login
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id" \
--auth-method adc
Workload Identity (GKE)
For GKE with Workload Identity configured:
epoch storage add datalake \
--type gcs \
--bucket "my-gcs-bucket" \
--project "my-project-id" \
--auth-method adc
Register Delta Tables
epoch table add events \
--location "gcs://datalake/delta/events"
Table Storage Structure
Tables are stored in Delta Lake format:
bucket/
delta/
events/
_delta_log/
part-00000.parquet
part-00001.parquet
Configuration File
# ~/.epoch/config.toml
[storage.gcs.datalake]
bucket = "my-gcs-bucket"
project_id = "my-project-id"
prefix = "horizon-epoch/"
auth_method = "service_account"
credentials_file = "/path/to/service-account.json"
Required Permissions
Minimum IAM permissions for the service account:
roles/storage.objectViewer # Read operations
roles/storage.objectCreator # Write operations
roles/storage.objectAdmin # Full access (recommended)
Or custom role with:
- storage.objects.get
- storage.objects.list
- storage.objects.create
- storage.objects.delete
Troubleshooting
Permission Denied
- Verify service account has required IAM roles
- Check bucket-level permissions
- Ensure project ID is correct
Bucket Not Found
- Verify bucket name is correct
- Check bucket is in the correct project
Credentials Not Found
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable
- Or use the --credentials-file parameter
- For ADC, run gcloud auth application-default login
Connect Local Filesystem
Connect Horizon Epoch to local filesystem storage for Delta Lake tables.
CLI Configuration
# Add local filesystem storage backend
epoch storage add local \
--type local \
--path "/data/horizon-epoch"
Use Cases
Development and Testing
Local storage is ideal for development:
epoch storage add dev \
--type local \
--path "./data"
epoch init my-repo --storage dev
Single-Machine Deployments
For applications that don’t need distributed storage:
epoch storage add data \
--type local \
--path "/var/lib/horizon-epoch"
Air-Gapped Environments
For environments without cloud access:
epoch storage add offline \
--type local \
--path "/mnt/data/epoch"
Register Delta Tables
epoch table add events \
--location "local://local/events"
Table Storage Structure
Tables are stored in Delta Lake format:
/data/horizon-epoch/
events/
_delta_log/
00000000000000000000.json
part-00000-*.parquet
part-00001-*.parquet
users/
_delta_log/
part-00000-*.parquet
Configuration File
# ~/.epoch/config.toml
[storage.local.dev]
path = "/data/horizon-epoch"
Features
- Delta Lake format - Same format as S3/Azure/GCS
- SQL queries - Query via DataFusion
- Copy-on-write - Efficient branching
- Time travel - Access historical versions
Permissions
Ensure the Epoch process has read/write access:
# Create directory with proper permissions
mkdir -p /data/horizon-epoch
chown -R epoch:epoch /data/horizon-epoch
chmod 750 /data/horizon-epoch
Troubleshooting
Permission Denied
- Check directory exists and is writable
- Verify process user has access
Disk Full
- Monitor disk space
- Use epoch gc to clean up old versions
- Configure retention policies
Path Not Found
- Ensure parent directory exists
- Use absolute paths to avoid ambiguity
How to Resolve Conflicts
When merging branches, conflicts occur when both branches modify the same record’s fields. This guide explains how to identify, understand, and resolve conflicts.
Understanding Conflicts
A conflict happens when:
- Both branches modify the same record (identified by primary key)
- Both branches change the same field to different values
Example:
- Branch A changes user 42’s email to alice@company.com
- Branch B changes user 42’s email to alice.smith@example.com
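To make the rule concrete, here is a minimal sketch of the three-way field check. It illustrates the definition above with the sample values from the example; it is not Horizon Epoch’s merge code.
def field_conflicts(base, ours, theirs):
    """True when both sides changed the same field to different values."""
    ours_changed = ours != base
    theirs_changed = theirs != base
    return ours_changed and theirs_changed and ours != theirs

# Only one side changed the email: no conflict, the single change wins.
print(field_conflicts("alice@example.com", "alice@company.com", "alice@example.com"))        # False
# Both sides changed it, to different values: conflict.
print(field_conflicts("alice@example.com", "alice@company.com", "alice.smith@example.com"))  # True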
Detecting Conflicts
When you run a merge:
epoch merge feature/branch
If conflicts exist:
CONFLICT: Merge conflict in users
Automatic merge failed; fix conflicts and then commit.
Viewing Conflicts
List Conflicting Tables
epoch status
Output:
On branch main
You have unmerged paths.
Unmerged paths:
both modified: users (3 conflicts)
both modified: orders (1 conflict)
View Conflict Details
epoch conflicts show
Output:
Table: users
------------------------------------------------------------
Record ID: 42
Field: email
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Field: phone
BASE: 555-0100
OURS: 555-0100
THEIRS: 555-0199
Record ID: 87
Field: status
BASE: active
OURS: inactive
THEIRS: suspended
------------------------------------------------------------
Export Conflicts to File
epoch conflicts show --format json > conflicts.json
Resolution Methods
1. Interactive Resolution (Recommended)
epoch conflicts resolve --interactive
This walks you through each conflict:
Conflict 1 of 4: users.email (record 42)
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Choose resolution:
[o] Keep ours (alice@company.com)
[t] Keep theirs (alice.smith@example.com)
[b] Keep base (alice@example.com)
[c] Enter custom value
[s] Skip (resolve later)
>
2. Accept All from One Side
Accept all changes from your branch (target):
epoch conflicts resolve --ours
Accept all changes from the source branch:
epoch conflicts resolve --theirs
3. Per-Table Resolution
# Accept theirs for users table
epoch conflicts resolve users --theirs
# Accept ours for orders table
epoch conflicts resolve orders --ours
4. Per-Record Resolution
# Keep our version of record 42
epoch conflicts resolve users --record 42 --ours
# Keep their version of record 87
epoch conflicts resolve users --record 87 --theirs
5. Per-Field Resolution
# Set specific value for a field
epoch conflicts resolve users \
--record 42 \
--field email \
--value "alice.final@company.com"
6. Python SDK Resolution
# Get conflicts
conflicts = client.get_conflicts()
for conflict in conflicts:
print(f"Table: {conflict.table_name}")
print(f"Record: {conflict.record_id}")
for field in conflict.fields:
print(f" {field.name}:")
print(f" base={field.base_value}")
print(f" ours={field.ours_value}")
print(f" theirs={field.theirs_value}")
# Resolve programmatically
client.resolve_conflict(
table=conflict.table_name,
record_id=conflict.record_id,
field=field.name,
value=determine_winner(field) # Your logic
)
# Continue merge
client.merge_continue()
Completing the Merge
After resolving all conflicts:
epoch merge --continue
Or if you want to abandon the merge:
epoch merge --abort
Conflict Prevention Strategies
- Merge frequently - Smaller, more frequent merges have fewer conflicts
- Communicate - Coordinate with team on who owns which data
- Use feature flags - Test changes without modifying shared records
- Partition work - Different team members work on different record sets
Best Practices
- Understand the context - Read the commit messages from both branches
- Check business rules - Some conflicts have a “correct” answer based on business logic
- Document decisions - Add context in your merge commit message
- Verify after merge - Run validation queries after resolving conflicts
Example Workflow
# 1. Start merge
epoch merge feature/user-updates
# 2. See what conflicted
epoch conflicts show
# 3. Resolve interactively
epoch conflicts resolve --interactive
# 4. Verify resolutions
epoch status
epoch diff HEAD
# 5. Complete merge with descriptive message
epoch merge --continue -m "Merge user-updates: resolved email conflicts using company domain policy"
How to Migrate from Snapshots
If you’re currently using database snapshots or manual backups for versioning, this guide helps you migrate to Horizon Epoch.
Current Snapshot-Based Workflow
Many teams use approaches like:
- Daily database dumps
- PostgreSQL pg_dump snapshots
- AWS RDS snapshots
- Manual CREATE TABLE ... AS SELECT copies
These approaches have limitations:
- Full data copies are expensive
- No granular change tracking
- Difficult to compare versions
- Merging changes is manual
Migration Overview
- Initialize a Horizon Epoch repository
- Import your existing data as the initial commit
- Set up tables for tracking
- Train your team on the new workflow
Step-by-Step Migration
1. Prepare Your Environment
# Start services
docker compose -f docker/docker-compose.yml up -d
# Build Horizon Epoch from source (see Installation guide)
cargo build --release
2. Initialize Repository
epoch init production-data \
--metadata-url "postgresql://localhost/horizon_epoch" \
--description "Migrated from daily snapshots"
3. Register Existing Tables
# Register each table you want to version
epoch table add customers \
--location "postgresql://localhost/mydb/public.customers"
epoch table add orders \
--location "postgresql://localhost/mydb/public.orders"
epoch table add products \
--location "postgresql://localhost/mydb/public.products"
4. Create Initial Commit
epoch commit -m "Initial import from production snapshot 2024-01-15"
5. Tag Important Snapshots
If you have historical snapshots you want to reference:
# Tag the current state
epoch tag snapshot-2024-01-15 \
--message "Migration baseline from daily snapshot"
6. Import Historical Snapshots (Optional)
If you have historical data you want to preserve:
import asyncio
from horizon_epoch import Client, Author
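# NOTE: historical_dates, restore_snapshot, and temp_db below are placeholders for
# your own backup/restore tooling; substitute your snapshot helpers before running.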
async def import_snapshots():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# For each historical snapshot
for snapshot_date in historical_dates:
# Restore snapshot to a temporary database
restore_snapshot(snapshot_date, temp_db)
# Create a branch for this point in time
branch_name = f"history/{snapshot_date}"
await client.branch(branch_name)
# Register tables from restored snapshot
# (pointing to temp database)
# Commit
await client.commit(
message=f"Historical snapshot: {snapshot_date}",
author=Author(name="Migration", email="ops@example.com")
)
# Tag for easy reference
await client.tag_create(
name=f"snapshot-{snapshot_date}",
message="Imported from backup"
)
asyncio.run(import_snapshots())
Mapping Snapshot Workflows
Daily Backup Replacement
Before:
# Nightly cron job
pg_dump mydb > /backups/mydb-$(date +%Y%m%d).sql
After:
# Nightly cron job
epoch commit -m "Daily snapshot $(date +%Y-%m-%d)"
epoch tag daily-$(date +%Y%m%d)
Pre-Change Backup
Before:
# Before making changes
pg_dump mydb > /backups/pre-migration.sql
# Make changes
# If something goes wrong, restore from pre-migration.sql
After:
# Before making changes
epoch tag pre-migration-$(date +%Y%m%d)
# Make changes
epoch commit -m "Applied migration X"
# If something goes wrong
epoch reset --hard pre-migration-$(date +%Y%m%d)
Environment Copies
Before:
# Create staging from production
pg_dump prod_db | psql staging_db
After:
# Create staging branch from production
epoch branch create staging --from main
# Staging now has zero-copy access to production data
Handling Large Datasets
For very large databases:
1. Incremental Registration
Register tables in batches:
# Critical tables first
epoch table add customers orders products
epoch commit -m "Core business tables"
# Then secondary tables
epoch table add logs analytics events
epoch commit -m "Add operational tables"
2. Exclude Large/Non-Critical Tables
Some tables might not need versioning:
# Configure exclusions via CLI
# epoch config set exclude_patterns "audit_logs_*,*_archive,temp_*"
# Or skip these tables when registering
# Only register the tables you want to version
3. Use Partitioned Commits
For very large initial imports:
# Commit in chunks
epoch commit --tables customers,orders -m "Batch 1: customer data"
epoch commit --tables products,inventory -m "Batch 2: product data"
Validation
After migration, verify everything works:
# Check repository status
epoch status
# View commit history
epoch log
# Verify tables are registered
epoch table list
# Test branching
epoch branch create test-migration
epoch checkout test-migration
# Make a small change, commit, merge back
epoch checkout main
epoch merge test-migration
epoch branch delete test-migration
Team Training
Key concepts to communicate:
- Branches replace copies - No more copying entire databases
- Commits are lightweight - Only changes are stored
- Tags mark important points - Like naming a backup
- Merging combines changes - No more manual comparison
Rollback Plan
If migration doesn’t go smoothly:
- Your existing snapshot workflow still works
- Horizon Epoch metadata is separate from your data
- Remove Horizon Epoch without affecting production:
# Remove metadata database
dropdb horizon_epoch
Next Steps
- Branching Workflow - Learn the new workflow
- Environment Promotion - Replace environment copies
Multi-Backend Setup
Configure Horizon Epoch to work with multiple storage backends simultaneously.
Overview
Horizon Epoch can manage data across different storage systems:
- Multiple PostgreSQL databases
- Multiple S3 buckets
- Mixed PostgreSQL + S3 environments
- Cross-region or cross-account configurations
Architecture
┌─────────────────────────────────────────────────┐
│ Horizon Epoch Metadata │
│ (Single PostgreSQL Database) │
└─────────────────────┬───────────────────────────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ PostgreSQL│ │ PostgreSQL│ │ S3 │
│ (users) │ │ (orders) │ │ (events) │
└───────────┘ └───────────┘ └───────────┘
Configuration
Multiple PostgreSQL Databases
import asyncio
from horizon_epoch import Client, StorageBackend
async def setup_multi_backend():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
await client.init("multi-backend-repo")
# Add multiple PostgreSQL backends
await client.add_storage(
name="users_db",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/users"}
)
await client.add_storage(
name="orders_db",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"}
)
await client.add_storage(
name="analytics_db",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://analytics.internal/warehouse"}
)
asyncio.run(setup_multi_backend())
Multiple S3 Buckets
# Add multiple S3 backends
await client.add_storage(
name="raw_data",
backend=StorageBackend.S3,
config={"bucket": "company-raw-data", "region": "us-east-1"}
)
await client.add_storage(
name="processed_data",
backend=StorageBackend.S3,
config={"bucket": "company-processed", "region": "us-west-2"}
)
await client.add_storage(
name="archive",
backend=StorageBackend.S3,
config={"bucket": "company-archive", "region": "eu-west-1"}
)
Mixed PostgreSQL + S3
# Add both PostgreSQL and S3 backends
await client.add_storage(
name="transactional",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://localhost/production"}
)
await client.add_storage(
name="datalake",
backend=StorageBackend.S3,
config={"bucket": "company-datalake"}
)
Registering Tables
Specify the storage backend when registering tables:
from horizon_epoch.client import _native
# PostgreSQL table
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)
# Different PostgreSQL database
loc = _native.StorageLocation.postgresql("orders_db", "public", "orders")
await client.track_table("orders", loc)
# S3 Delta table
loc = _native.StorageLocation.s3("datalake", "delta/events")
await client.track_table("events", loc)
CLI Registration
# PostgreSQL
epoch table add users --location "postgresql://users_db/public.users"
# S3
epoch table add events --location "s3://datalake/delta/events"
Configuration File
# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"
# PostgreSQL backends
[storage.postgres.users_db]
url = "postgresql://localhost/users"
[storage.postgres.orders_db]
url = "postgresql://prod-orders.cluster.rds.amazonaws.com/orders"
aws_secret_id = "horizon-epoch/orders-db" # Credentials from Secrets Manager
[storage.postgres.analytics]
host = "analytics.internal"
database = "warehouse"
vault_path = "secret/data/analytics-db" # Credentials from Vault
# S3 backends
[storage.s3.raw]
bucket = "company-raw-data"
region = "us-east-1"
[storage.s3.processed]
bucket = "company-processed"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/DataAccess"
[storage.s3.archive]
bucket = "company-archive"
region = "eu-west-1"
endpoint = "https://s3.eu-west-1.amazonaws.com"
Cross-Storage Operations
Branching
Branches span all registered tables:
# Creates branch affecting all tables
epoch branch create feature/new-schema
Commits
Commits can include changes across backends:
# Commit changes from any backend
epoch commit -m "Update user preferences and events schema"
Diff
Compare changes across storage types:
epoch diff main feature/new-schema
Output:
PostgreSQL (users_db):
users: 5 modified, 2 added
S3 (datalake):
events: schema changed, 1000 added
Merge
Merges coordinate across all backends:
epoch merge feature/new-schema
Storage Routing
Default Backend
Set a default storage backend via configuration:
epoch config set default_storage users_db
Explicit Routing
Always specify backend in table location:
# Explicit backend reference via StorageLocation
loc = _native.StorageLocation.postgresql("users_db", "public", "users")
await client.track_table("users", loc)
Cross-Account S3
await client.add_storage(
name="partner_data",
backend=StorageBackend.S3,
config={
"bucket": "partner-shared-bucket",
"assume_role_arn": "arn:aws:iam::999888777666:role/HorizonEpochAccess",
"external_id": "partner-integration-id"
}
)
Best Practices
1. Separate Metadata from Data
The metadata database should be separate from data storage:
# Connect to dedicated metadata database
async with Client.connect("postgresql://metadata.internal/horizon_epoch") as client:
# Add production data storage (different server)
await client.add_storage(
name="production",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://data.internal/production"}
)
2. Use Consistent Naming
# Good: Clear, consistent names
await client.add_storage("prod_users", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_orders", StorageBackend.POSTGRESQL, {...})
await client.add_storage("prod_events", StorageBackend.S3, {...})
# Avoid: Inconsistent or unclear names like "db1" or "bucket"
3. Document Backend Purpose
# epoch.toml
# Users database - primary transactional storage
[storage.postgres.users]
url = "postgresql://..."
# Orders database - order processing system
[storage.postgres.orders]
url = "postgresql://..."
# Event lake - analytics events from all services
[storage.s3.events]
bucket = "company-events"
4. Handle Partial Failures
Operations across multiple backends may partially fail:
try:
result = client.commit(message="Multi-backend update")
except PartialCommitError as e:
print(f"Committed to: {e.successful_backends}")
print(f"Failed on: {e.failed_backends}")
# Handle recovery
Troubleshooting
Backend Not Found
Error: Storage backend 'mydb' not found
- Verify the backend is defined in your configuration file
- Check backend name spelling
Cross-Backend Transaction
Warning: Cross-backend operations are not atomic
- Horizon Epoch provides best-effort consistency
- Use single backend for strict atomicity requirements
Permission Mismatch
Error: Access denied on backend 'orders_db'
- Verify credentials for each backend
- Check each backend has required permissions
Next Steps
- Branch-Aware Queries - Query across backends
- Configuration Reference - Full options
- Architecture - Design details
Branch-Aware Queries
Query data from specific branches, compare between branches, and understand how branch context affects data access.
Overview
When you query data through Horizon Epoch:
- Each branch shows its own version of the data
- Changes on a branch don’t affect other branches
- You can query multiple branches simultaneously
Setting Branch Context
CLI
# Set current branch
epoch checkout feature/new-schema
# All subsequent operations use this branch
epoch query "SELECT * FROM users"
Python SDK
import asyncio
from horizon_epoch import Client
async def query_branch():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Checkout the branch
await client.checkout("feature/new-schema")
# Get status to see current branch
status = await client.status()
print(f"On branch: {status.branch}")
# Get data diff between branches
diff = await client.diff(
base="main",
target="feature/new-schema"
)
for table_diff in diff.table_diffs:
print(f"{table_diff.table_name}: {table_diff.total_changes}")
asyncio.run(query_branch())
How Branch Queries Work
Copy-on-Write Overlay
When you query a branch:
- Base data comes from the parent branch
- Branch-specific changes overlay the base
- You see the combined result
main: users = [Alice, Bob, Charlie]
│
feature: overlay = [Charlie → Charles]
│
query: result = [Alice, Bob, Charles]
Query Flow
-- On branch 'feature/update-names'
SELECT * FROM users WHERE id = 3;
-- Horizon Epoch executes:
-- 1. Check branch overlay for id=3
-- 2. If found, return overlay version
-- 3. If not found, return base version
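The same lookup can be pictured in plain Python. This is a conceptual sketch of the overlay idea, not Horizon Epoch’s internals; the dictionaries stand in for the base table and the branch overlay.
# Base data as seen on the parent branch, keyed by primary key.
base = {
    1: {"id": 1, "name": "Alice"},
    2: {"id": 2, "name": "Bob"},
    3: {"id": 3, "name": "Charlie"},
}

# Copy-on-write overlay: only the records this branch modified.
overlay = {3: {"id": 3, "name": "Charles"}}

def read_record(record_id):
    """Return the overlay version if the branch changed the record, else the base version."""
    if record_id in overlay:
        return overlay[record_id]
    return base.get(record_id)

print(read_record(1))  # {'id': 1, 'name': 'Alice'}   (unchanged, served from base)
print(read_record(3))  # {'id': 3, 'name': 'Charles'} (branch overlay wins)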
Comparing Branches
Diff Query
Compare data between two branches:
# Get records that differ between branches
diff = client.diff_query(
table="users",
source="main",
target="feature/updates"
)
for change in diff:
print(f"{change.type}: {change.record}")
Side-by-Side Comparison
# Query both branches
main_data = client.query("SELECT * FROM users", branch="main")
feature_data = client.query("SELECT * FROM users", branch="feature/updates")
# Compare programmatically
import pandas as pd
main_df = pd.DataFrame(main_data)
feature_df = pd.DataFrame(feature_data)
comparison = main_df.compare(feature_df)
CLI Comparison
# Show diff
epoch diff main feature/updates --table users
# Show records only in one branch
epoch diff main feature/updates --only-in-target
Querying Multiple Branches
Union Across Branches
# Get all unique records across branches
results = client.query_branches(
query="SELECT * FROM users",
branches=["main", "feature/a", "feature/b"],
mode="union"
)
Compare Aggregate Results
# Run same query on multiple branches
for branch in ["main", "staging", "production"]:
result = client.query(
"SELECT COUNT(*) as user_count FROM users",
branch=branch
)
print(f"{branch}: {result[0]['user_count']} users")
Time-Travel Queries
Query data at a specific point in time:
By Commit ID
# Query at specific commit
client.query(
"SELECT * FROM users",
at_commit="abc123f"
)
By Tag
# Query at tagged version
client.query(
"SELECT * FROM users",
at_tag="v1.0.0"
)
By Timestamp
from datetime import datetime, timedelta
# Query data as of yesterday
yesterday = datetime.now() - timedelta(days=1)
client.query(
"SELECT * FROM users",
at_time=yesterday
)
CLI Time-Travel
# Query at commit
epoch query "SELECT * FROM users" --at abc123f
# Query at tag
epoch query "SELECT * FROM users" --at v1.0.0
# Query at time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"
Performance Considerations
Branch Depth
Deeply nested branches may be slower:
main → dev → feature → sub-feature → experiment
Each level adds overlay resolution. Keep branches shallow when possible.
Indexing
Branch overlays use primary keys for efficient lookup:
# Fast: uses primary key
client.query("SELECT * FROM users WHERE id = 123", branch="feature")
# Slower: scans overlay + base
client.query("SELECT * FROM users WHERE name = 'Alice'", branch="feature")
Caching
Query caching can be configured via the CLI:
epoch config set query_cache_enabled true
epoch config set query_cache_ttl 300
Branch Isolation Guarantees
Read Isolation
Each branch sees consistent data:
- Changes on other branches are invisible
- Commits create point-in-time snapshots
- Queries always return consistent results
Write Isolation
Changes on a branch don’t affect others until merged:
# On feature branch
await client.checkout("feature")
# Make changes via your database connection...
await client.stage_all()
await client.commit(message="Deactivate user 1")
# Main branch is unchanged - checkout and diff to compare
await client.checkout("main")
# Main still has the original data
Common Patterns
A/B Testing
# Set up test branches
await client.branch("experiment/new-algo", start_point="main")
# Run algorithm on each branch
for branch in ["main", "experiment/new-algo"]:
await client.checkout(branch)
run_algorithm()
results = measure_metrics()
print(f"{branch}: {results}")
Data Validation
# Validate changes before merge
await client.checkout("feature/migration")
# Use diff to check for issues
diff = await client.diff(base="main", target="feature/migration")
for table_diff in diff.table_diffs:
if table_diff.total_changes > 1000:
raise ValidationError(f"Too many changes in {table_diff.table_name}")
Environment Comparison
# Compare data across environments
for env in ["development", "staging", "production"]:
result = client.query(
"SELECT COUNT(*) as count, MAX(updated_at) as latest FROM orders",
branch=env
)
print(f"{env}: {result}")
Troubleshooting
Wrong Branch Data
Expected data from 'feature' but got 'main' data
- Verify branch context is set correctly
- Check if query specifies explicit branch
- Verify commit was made on correct branch
Missing Changes
Can't see changes I just made
- Changes must be committed to be visible in queries
- Uncommitted changes are in working state only
- Use epoch status to see uncommitted changes
Slow Queries
- Reduce branch nesting depth
- Ensure queries use primary keys when possible
- Consider materializing long-lived branches
Next Steps
- Copy-on-Write - How branch isolation works
- Merge Algorithm - Combining branch changes
- Time-Travel Queries - More on historical queries
How to: Schema Versioning
This guide covers schema versioning workflows including upgrading existing repositories, migrating schemas, handling rollbacks, and working with schemas across multiple branches.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL running with a repository configured
- Familiarity with basic version control concepts
Understanding Schema Versioning
Horizon Epoch versions database schemas alongside your data. Every commit captures:
- Column definitions: Name, type, nullability, defaults
- Constraints: Foreign keys, unique, check, exclusion
- Indexes: B-tree, GIN, GiST, BRIN, hash
- Enum types: Custom PostgreSQL enums
Schema snapshots are stored as JSONB in the table_versions table, enabling full schema history and three-way merge for schema changes.
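As a rough illustration only (the exact JSONB layout is not shown here and may differ), a snapshot for a table carries information along these lines; the field names below are illustrative, not the stored format.
# Hypothetical shape of a schema snapshot; for illustration only.
users_schema_snapshot = {
    "columns": [
        {"name": "id", "type": "bigint", "nullable": False, "default": None},
        {"name": "email", "type": "text", "nullable": False, "default": None},
    ],
    "constraints": [
        {"name": "users_pkey", "kind": "primary_key", "columns": ["id"]},
        {"name": "chk_users_email", "kind": "check", "expression": "email ~ '^[^@]+@[^@]+$'"},
    ],
    "indexes": [
        {"name": "idx_users_email", "method": "btree", "columns": ["email"]},
    ],
    "enums": {
        "status": ["active", "inactive"],
    },
}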
Upgrading Existing Repositories
If you have an existing Horizon Epoch repository created before schema versioning was added, you can backfill schema information for historical commits.
Step 1: Check Current Status
# See which tables have schema data
epoch schema status
# Output:
# Table | Latest Schema | Historical | Live DB | Status
# -----------|---------------|------------|----------|--------
# users | Yes | Partial | Synced | OK
# orders | Yes | No | Synced | Needs backfill
# products | No | No | Drifted | Needs migration
Step 2: Backfill Historical Schemas
For tables where the live database is accessible, backfill historical commits:
# Backfill schema for specific table
epoch migrate-schemas users --backfill
# Backfill all tables with missing schemas
epoch migrate-schemas --backfill --all
# Preview what would be backfilled (dry-run)
epoch migrate-schemas --backfill --all --dry-run
What happens during backfill (a sketch follows this list):
- For each commit without schema data, Epoch checks if the table location is accessible
- If accessible, it introspects the current schema and stores it
- If not accessible, the commit is marked as “schema unavailable”
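The backfill pass can be pictured as the loop below. This is a conceptual sketch with stub helpers, not the epoch migrate-schemas implementation; the CLI does this work for you.
# Conceptual sketch of the backfill pass; the helper functions are stand-ins.
def list_commits_without_schema(table):
    return ["abc123f", "def456a"]          # stub: commits missing schema data

def location_is_accessible(table):
    return True                             # stub: pretend the live table is reachable

def introspect_schema(table):
    return {"columns": [{"name": "id", "type": "bigint"}]}   # stub: pretend introspection result

def backfill_schemas(table, dry_run=False):
    for commit in list_commits_without_schema(table):
        if not location_is_accessible(table):
            print(f"{table}@{commit}: marked schema unavailable")
            continue
        if dry_run:
            print(f"would backfill schema for {table} at commit {commit}")
            continue
        schema = introspect_schema(table)
        print(f"stored {len(schema['columns'])} columns for {table} at commit {commit}")

backfill_schemas("users", dry_run=True)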
Step 3: Verify Migration
# Verify all tables have schema data
epoch schema status
# View schema at a specific commit
epoch schema show users --at abc123f
Migration Guide for Existing Repositories
Scenario: Adding Schema Versioning to Production
If you have a production repository and want to enable full schema versioning:
1. Create a Baseline
# Create a new commit that captures current schema
epoch commit -m "Capture schema baseline" --include-schema
This creates a commit with full schema snapshots for all registered tables.
2. Update Your Workflow
From this point forward, all commits automatically include schema snapshots.
# Normal workflow - schema is captured automatically
epoch branch create feature/add-column
# ... make schema changes ...
epoch commit -m "Add email_verified column"
3. Handle Historical Data (Optional)
If you need schema history for older commits:
# Option A: Backfill from live database (assumes schema hasn't changed)
epoch migrate-schemas --backfill --since "2024-01-01"
# Option B: Import from external migrations
epoch migrate-schemas --import-from flyway ./migrations
epoch migrate-schemas --import-from liquibase ./changelog.xml
Rollback Procedures
Rolling Back Schema Changes
If a schema change causes issues, you can roll back using several methods:
Method 1: Revert the Commit
Create a new commit that reverses the schema change:
# Revert the problematic commit
epoch revert abc123f
# This creates a new commit with:
# - Data changes reversed
# - Schema changes reversed
# - DDL generated to undo schema modifications
Method 2: Reset to Previous Commit
Reset the branch to a known-good state:
# Soft reset (keep changes for review)
epoch reset --soft HEAD~1
# Hard reset (discard changes)
epoch reset --hard HEAD~1
# Reset to specific commit
epoch reset --hard v1.0.0
After reset, apply the DDL to synchronize the live database:
# Generate rollback DDL
epoch schema diff HEAD v1.0.0 --show-ddl > rollback.sql
# Review and apply
psql -f rollback.sql
Method 3: Checkout and Apply
Create a new branch from a known-good state:
# Create branch from good state
epoch branch create hotfix/rollback --from v1.0.0
# Generate DDL to bring live DB to that state
epoch schema apply users --from v1.0.0 --dry-run
# Apply after review
epoch schema apply users --from v1.0.0
Handling Rollback Failures
Some schema changes cannot be automatically rolled back:
| Change Type | Rollback Difficulty | Notes |
|---|---|---|
| Add column | Easy | DROP COLUMN |
| Add index | Easy | DROP INDEX |
| Add constraint | Easy | DROP CONSTRAINT |
| Drop column | Data Loss | Column data is gone |
| Change column type | May fail | Data may not fit old type |
| Remove enum value | May fail | Rows may use removed value |
For destructive changes, consider:
# 1. Check for data dependencies
epoch schema diff HEAD~1 HEAD --check-data-impact
# 2. Create data backup before proceeding
epoch export users --at HEAD~1 > users_backup.parquet
# 3. Apply rollback with force flag
epoch schema apply users --from HEAD~1 --force
Multi-Branch Schema Workflows
Feature Branch Schema Changes
When developing schema changes on a feature branch:
# 1. Create feature branch
epoch branch create feature/add-preferences
# 2. Make schema changes
psql -c "ALTER TABLE users ADD COLUMN preferences JSONB"
# 3. Commit (schema captured automatically)
epoch commit -m "Add user preferences column"
# 4. Preview merge
epoch merge main --dry-run
# Shows: Schema changes to apply on main
# 5. Merge when ready
epoch merge main
Parallel Schema Development
When multiple branches modify the same table:
# Branch A adds column
epoch checkout feature-a
psql -c "ALTER TABLE users ADD COLUMN theme VARCHAR(20)"
epoch commit -m "Add theme column"
# Branch B adds different column
epoch checkout feature-b
psql -c "ALTER TABLE users ADD COLUMN locale VARCHAR(10)"
epoch commit -m "Add locale column"
# Merge A into main
epoch checkout main
epoch merge feature-a # Success
# Merge B into main
epoch merge feature-b # Also success - no conflict
Both columns are added because they don’t conflict.
Schema Conflict Resolution
When branches modify the same constraint:
# Branch A modifies check constraint
epoch checkout feature-a
psql -c "ALTER TABLE users DROP CONSTRAINT chk_age"
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 18)"
epoch commit -m "Require adult users"
# Branch B modifies same constraint differently
epoch checkout feature-b
psql -c "ALTER TABLE users DROP CONSTRAINT chk_age"
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 13)"
epoch commit -m "Allow teen users"
# Attempt merge
epoch checkout main
epoch merge feature-a # Success
epoch merge feature-b # CONFLICT
# Resolve conflict
epoch conflicts show
# CONFLICT (constraint): chk_age modified in both branches
# Choose resolution
epoch conflicts resolve --constraint chk_age --custom "CHECK (age >= 16)"
epoch commit -m "Merge feature-b with compromise age check"
Environment Promotion with Schema
When promoting schema changes between environments:
# 1. Develop on dev branch
epoch checkout dev
# ... make changes ...
epoch commit -m "Schema updates for v2.0"
# 2. Generate migration for staging
epoch schema diff staging dev --show-ddl > staging_migration.sql
# 3. Review migration
cat staging_migration.sql
# 4. Apply to staging environment
PGHOST=staging-db psql -f staging_migration.sql
# 5. Merge to staging branch
epoch checkout staging
epoch merge dev
# 6. Repeat for production
epoch schema diff production staging --show-ddl > prod_migration.sql
Best Practices
1. Always Review Schema Diffs
# Before merging, check schema changes
epoch diff main feature-branch --schema
# Generate DDL preview
epoch schema diff main feature-branch --show-ddl
2. Use Descriptive Commit Messages
# Bad
epoch commit -m "Update schema"
# Good
epoch commit -m "Add email_verified column with default false for GDPR compliance"
3. Test Migrations on Branch Data
# Create test branch
epoch branch create test/migration-validation
# Apply migration
epoch schema apply users --from feature-branch --dry-run
# Validate
epoch validate --table users
4. Handle Large Tables Carefully
# For large tables, use online mode
epoch schema apply users --from feature-branch --online
# Monitor progress
epoch schema apply --status
5. Version Your Enum Changes Carefully
# Adding values is safe
psql -c "ALTER TYPE status ADD VALUE 'archived'"
# Removing values requires migration
# See: docs/how-to/ddl-migrations.md
Troubleshooting
Schema Not Captured
If schema isn’t being captured in commits:
# Check table is registered
epoch table list
# Re-register with schema introspection
epoch table add users --location "postgresql://mydb/public.users" --introspect-schema
Schema Drift Detected
If the live database differs from the stored schema:
# See differences
epoch schema diff users --live
# Capture current state
epoch commit -m "Sync schema from live database" --rescan-schema
# Or apply stored schema to database
epoch schema apply users --from HEAD --force
Backfill Fails
If schema backfill fails:
# Check connectivity
epoch doctor
# Try specific table
epoch migrate-schemas users --backfill --verbose
# Skip inaccessible tables
epoch migrate-schemas --backfill --skip-inaccessible
See Also
- DDL Migrations - Generating and applying DDL
- Constraint Versioning Tutorial - Step-by-step guide
- CLI Schema Commands - Command reference
- Constraint Reference - Constraint type documentation
How to: DDL Migrations
This guide covers generating DDL migration scripts, integrating with external migration tools, and safely applying schema changes to production databases.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL database with schema versioning enabled
- Understanding of DDL and database migrations
Generating Migrations with PostgresDdlGenerator
Horizon Epoch includes a DDL generator that produces properly ordered PostgreSQL migration scripts.
Basic Migration Generation
# Generate migration between branches
epoch schema diff main feature-branch --show-ddl
# Output migration to file
epoch schema diff main feature-branch --ddl-output migrations/V002__add_preferences.sql
Migration Script Structure
Generated migrations include both UP and DOWN sections:
-- Migration: main -> feature/add-preferences
-- Generated: 2024-12-18T10:30:00Z
-- ============================================================
-- UP: Apply changes
-- ============================================================
-- Create new enum types first
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');
-- Drop constraints that will be modified
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;
-- Add new columns
ALTER TABLE users ADD COLUMN preferences JSONB;
ALTER TABLE users ADD COLUMN notification_pref notification_type DEFAULT 'email';
-- Add new constraints
ALTER TABLE users ADD CONSTRAINT chk_users_email
CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');
-- Create indexes
CREATE INDEX CONCURRENTLY idx_users_preferences
ON users USING gin (preferences);
-- ============================================================
-- DOWN: Reverse changes
-- ============================================================
DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;
ALTER TABLE users DROP COLUMN IF EXISTS notification_pref;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;
-- WARNING: Cannot remove enum values in PostgreSQL
-- Enum 'notification_type' preserved (manual cleanup required)
Statement Ordering
The DDL generator follows a specific order to handle dependencies; a simplified sketch of the UP ordering follows the two lists below:
UP (Apply) Order:
- Create new enum types (before columns using them)
- Add values to modified enums
- Drop constraints being modified/removed (FK first, then others)
- Drop indexes being modified/removed
- Handle primary key changes
- Add/modify columns
- Add new constraints (check, unique, exclusion, then FK last)
- Create new indexes
DOWN (Reverse) Order:
- Drop new indexes
- Drop new constraints (FK first)
- Drop new columns
- Re-add dropped columns (if data preserved)
- Re-add dropped constraints
- Re-add dropped indexes
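A simplified way to picture the UP ordering is to bucket statements by kind and emit the buckets in the fixed sequence above. The sketch below is a toy illustration only; PostgresDdlGenerator handles dependencies, primary key changes, and FK-before-other-constraint ordering that this version ignores.
# Toy illustration of ordering UP statements by kind; not the real generator.
UP_ORDER = [
    "create_enum",       # 1. new enum types before columns that use them
    "alter_enum_add",    # 2. add values to existing enums
    "drop_constraint",   # 3. drop constraints being modified/removed
    "drop_index",        # 4. drop indexes being modified/removed
    "alter_column",      # 5. add/modify columns
    "add_constraint",    # 6. new constraints
    "create_index",      # 7. new indexes
]

def order_up_statements(changes):
    """changes: list of (kind, sql) tuples; returns SQL in dependency-safe order."""
    rank = {kind: i for i, kind in enumerate(UP_ORDER)}
    return [sql for kind, sql in sorted(changes, key=lambda c: rank[c[0]])]

changes = [
    ("create_index", "CREATE INDEX idx_users_preferences ON users USING gin (preferences)"),
    ("alter_column", "ALTER TABLE users ADD COLUMN preferences JSONB"),
    ("create_enum", "CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push')"),
]
for stmt in order_up_statements(changes):
    print(stmt + ";")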
DDL Safety Features
Dry-Run Mode
Preview changes without applying them:
# Show what would be executed
epoch schema apply users --from feature-branch --dry-run
# Output:
# Dry-run mode - no changes will be applied
#
# Statements to execute:
# 1. ALTER TABLE users ADD COLUMN preferences JSONB
# 2. CREATE INDEX CONCURRENTLY idx_users_preferences ON users USING gin (preferences)
#
# Estimated impact:
# Table size: 2.3 GB
# Row count: 15,000,000
# Recommended: Use --online mode for large table
Destructive Operation Warnings
The DDL generator identifies and warns about destructive operations:
epoch schema diff main feature-branch --show-ddl
# WARNING: Destructive operations detected:
# - DROP COLUMN: users.legacy_field
# - ALTER COLUMN TYPE: orders.amount (numeric -> integer, may lose precision)
# - DROP INDEX: idx_users_legacy
#
# Use --force to generate DDL anyway
Destructive Operation Types:
| Operation | Risk Level | Description |
|---|---|---|
| DROP COLUMN | High | Data permanently lost |
| DROP TABLE | Critical | Table and all data lost |
| ALTER COLUMN TYPE | Medium | May fail or lose precision |
| DROP CONSTRAINT | Low | Constraint removed |
| DROP INDEX | Low | Index removed (can rebuild) |
| DROP TYPE | Medium | Enum type removed |
| TRUNCATE TABLE | Critical | All data removed |
Lock Timeout Configuration
Prevent long-running locks on production tables:
# Set lock timeout (default: 30 seconds)
epoch schema apply users --from feature-branch --lock-timeout 10s
# Set statement timeout (default: 5 minutes)
epoch schema apply users --from feature-branch --statement-timeout 2m
Retry Logic
Configure automatic retries for transient failures:
# Retry up to 5 times with exponential backoff
epoch schema apply users --from feature-branch --retries 5
# Configure backoff
epoch schema apply users --from feature-branch \
--retries 5 \
--initial-delay 500ms \
--max-delay 30s
Integration with External Tools
pt-online-schema-change (Percona Toolkit for MySQL)
While Horizon Epoch primarily targets PostgreSQL, you can generate compatible scripts for MySQL migrations:
# Generate MySQL-compatible DDL
epoch schema diff main feature-branch --dialect mysql --ddl-output migration.sql
# Use with pt-online-schema-change
pt-online-schema-change \
--execute \
--alter "ADD COLUMN preferences JSON" \
D=mydb,t=users
Recommended workflow:
- Generate migration with Epoch
- Extract ALTER TABLE statements
- Apply using pt-online-schema-change for large tables
- Update Epoch commit after successful application
# After applying via pt-osc
epoch commit -m "Applied preferences column via pt-osc" --rescan-schema
pg_repack (PostgreSQL)
For table reorganization and bloat reduction:
# Identify tables needing repack
epoch schema status --check-bloat
# Output:
# Table | Size | Dead Tuples | Bloat % | Recommendation
# -----------|---------|-------------|---------|----------------
# users | 2.3 GB | 150,000 | 15% | Consider pg_repack
# orders | 8.1 GB | 2,500,000 | 28% | Recommended
# Run pg_repack
pg_repack -d mydb -t users
# Rescan schema after repack
epoch commit -m "Reorganized users table" --rescan-schema
Using pg_repack with index changes:
# Generate index DDL
epoch schema diff main feature-branch --indexes-only --ddl-output indexes.sql
# Create indexes with pg_repack (minimal locking)
pg_repack -d mydb -t users --only-indexes
# Or use CONCURRENTLY option in Epoch
epoch schema apply users --from feature-branch --online
Flyway Integration
Export migrations for Flyway:
# Generate Flyway-compatible migration
epoch schema diff main feature-branch \
--format flyway \
--ddl-output db/migration/V2__add_preferences.sql
# Migration file follows Flyway naming convention
cat db/migration/V2__add_preferences.sql
Liquibase Integration
Export migrations for Liquibase:
# Generate Liquibase changeset
epoch schema diff main feature-branch \
--format liquibase \
--ddl-output db/changelog/changes/002-add-preferences.xml
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
<changeSet id="002-add-preferences" author="epoch">
<addColumn tableName="users">
<column name="preferences" type="JSONB"/>
</addColumn>
<createIndex indexName="idx_users_preferences" tableName="users">
<column name="preferences" type="gin"/>
</createIndex>
<rollback>
<dropIndex indexName="idx_users_preferences" tableName="users"/>
<dropColumn tableName="users" columnName="preferences"/>
</rollback>
</changeSet>
</databaseChangeLog>
Online Mode for Production
Using CONCURRENTLY
For indexes on large tables, use online mode to avoid blocking:
# Create indexes without blocking writes
epoch schema apply users --from feature-branch --online
# This generates:
# CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
#
# Instead of:
# CREATE INDEX idx_users_email ON users (email);
Limitations of CONCURRENTLY
- Cannot run inside a transaction
- May take longer than regular index creation
- Requires more resources (CPU, I/O)
- Index must be validated after creation
# Check index validity after CONCURRENTLY creation
epoch schema status users --check-indexes
# Output:
# Index | Valid | Size | Last Used
# -------------------------|-------|--------|----------
# idx_users_email | Yes | 45 MB | 2 min ago
# idx_users_preferences | No | 0 MB | Never
If an index is invalid:
# Reindex with CONCURRENTLY
epoch schema apply users --reindex idx_users_preferences --online
Transactional DDL
PostgreSQL supports transactional DDL, allowing atomic migrations:
# Apply migration in a transaction (default)
epoch schema apply users --from feature-branch
# Disable transaction (required for CONCURRENTLY)
epoch schema apply users --from feature-branch --no-transaction --online
What happens on failure:
# With transaction (default):
# - All statements run in one transaction
# - On failure, all changes rolled back
# - Database remains unchanged
# Without transaction (--no-transaction):
# - Statements run individually
# - On failure, partial changes may remain
# - Manual cleanup may be needed
Table Size Considerations
Small Tables (< 100MB)
Standard migration is safe:
epoch schema apply users --from feature-branch
Medium Tables (100MB - 1GB)
Use lock timeouts:
epoch schema apply users --from feature-branch \
--lock-timeout 10s \
--retries 3
Large Tables (1GB - 10GB)
Use online mode:
epoch schema apply users --from feature-branch \
--online \
--statement-timeout 30m
Very Large Tables (> 10GB)
Consider specialized tools:
# 1. Generate DDL
epoch schema diff main feature-branch --ddl-output migration.sql
# 2. Review for large table operations
epoch schema diff main feature-branch --analyze-impact
# 3. Use pg_repack or manual process
pg_repack -d mydb -t users --no-superuser-check
Python SDK Usage
import asyncio
from horizon_epoch import Client
async def generate_migration():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraint diff between branches
diff = await client.diff_constraints("main", "feature-branch", table_name="users")
# Generate migration
generator = PostgresDdlGenerator()
migration = generator.generate_migration("users", diff.constraints)
# Print statements
print("UP statements:")
for stmt in migration.up:
print(f" {stmt};")
print("\nDOWN statements:")
for stmt in migration.down:
print(f" {stmt};")
# Check for warnings
if migration.warnings:
print("\nWARNINGS:")
for warn in migration.warnings:
print(f" - {warn}")
# Write to file
migration.write_to_file("migrations/V002__update_users.sql")
# Execute with safety features
from horizon_epoch.ddl import DdlExecutor, ExecutionConfig
executor = DdlExecutor(client.connection)
config = ExecutionConfig(
dry_run=True,
lock_timeout_seconds=10,
statement_timeout_seconds=300,
online=True,
)
result = executor.execute(migration.up, config)
print(f"Executed {result.statements_run} statements")
Best Practices
1. Always Use Dry-Run First
epoch schema apply users --from feature-branch --dry-run
2. Backup Before Destructive Changes
# Export table data
epoch export users --format parquet > backup/users_$(date +%Y%m%d).parquet
# Then apply
epoch schema apply users --from feature-branch --force
3. Use Online Mode for Production
epoch schema apply users --from feature-branch --online
4. Set Appropriate Timeouts
epoch schema apply users --from feature-branch \
--lock-timeout 10s \
--statement-timeout 5m
5. Monitor Long-Running Migrations
# In another terminal
epoch schema apply --status
# Or query PostgreSQL directly
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'ALTER%' OR query LIKE 'CREATE INDEX%';
See Also
- Schema Versioning - Schema versioning workflows
- Constraint Reference - Constraint types
- CLI Reference - Schema commands
Credential Providers
Horizon Epoch supports multiple methods for providing credentials to storage backends. This guide covers when to use each provider and how to configure them.
Overview
Credential providers abstract how secrets are retrieved:
- Static - Hardcoded credentials (dev/testing only)
- Environment - From environment variables
- File - From files on disk
- Vault - From HashiCorp Vault
- AWS - From AWS Secrets Manager or IAM
Static Credentials
Use for development and testing only.
from horizon_epoch import Config
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
name="mydb",
url="postgresql://user:password@localhost/db"
)
# epoch.toml
[storage.postgres.mydb]
url = "postgresql://user:password@localhost/db"
Environment Variables
Recommended for containers and CI/CD.
export EPOCH_POSTGRES_MYDB_URL="postgresql://user:pass@host/db"
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
config.add_postgres(
name="mydb",
url="${EPOCH_POSTGRES_MYDB_URL}" # Variable substitution
)
# epoch.toml - variables are substituted at runtime
[storage.postgres.mydb]
url = "${EPOCH_POSTGRES_MYDB_URL}"
File-Based Credentials
For credentials managed by external tools (e.g., Kubernetes secrets).
config.add_postgres(
name="mydb",
credentials_file="/run/secrets/db-password"
)
# epoch.toml
[storage.postgres.mydb]
host = "db.example.com"
database = "production"
username = "app_user"
password_file = "/run/secrets/db-password"
File format (JSON):
{
"password": "secret-value"
}
Or plain text:
secret-value
Credential Caching
Credentials are cached to reduce external calls:
config = Config(
credential_cache_ttl=300, # 5 minutes
credential_cache_max_size=100
)
For frequently rotating credentials:
config = Config(
credential_cache_ttl=60, # 1 minute
credential_refresh_before_expiry=30 # Refresh 30s before expiry
)
Credential Refresh
Some providers support automatic refresh:
# AWS credentials auto-refresh when using IAM roles
config.add_s3(
name="datalake",
bucket="my-bucket",
use_instance_credentials=True # Auto-refresh enabled
)
Provider Priority
When multiple credential sources are available, they are tried in the following priority order (see the sketch after this list):
- Explicit configuration (URL, password)
- Environment variables
- Credential files
- External providers (Vault, AWS)
- Instance credentials (IAM roles)
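A minimal sketch of that resolution order, assuming hypothetical parameter names and a hypothetical EPOCH_POSTGRES_MYDB_PASSWORD variable; the real provider chain in Horizon Epoch may differ in detail.
import os

def resolve_password(explicit=None, env_var=None, password_file=None):
    """Toy resolver mirroring the priority list above (explicit > environment > file)."""
    if explicit:                                          # 1. explicit configuration wins
        return explicit
    if env_var and os.environ.get(env_var):               # 2. environment variable
        return os.environ[env_var]
    if password_file and os.path.exists(password_file):   # 3. credential file on disk
        with open(password_file) as f:
            return f.read().strip()
    # 4./5. external providers (Vault, AWS) and instance credentials would be tried here
    raise LookupError("no credential source available")

# Example: nothing explicit, so the environment variable is used if set.
os.environ["EPOCH_POSTGRES_MYDB_PASSWORD"] = "example-only"
print(resolve_password(env_var="EPOCH_POSTGRES_MYDB_PASSWORD"))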
Security Best Practices
1. Never Commit Credentials
# .gitignore
epoch.toml
.env
*.pem
*.key
2. Use Separate Credentials Per Environment
# epoch.dev.toml
[storage.postgres.mydb]
url = "${DEV_DATABASE_URL}"
# epoch.prod.toml
[storage.postgres.mydb]
url = "${PROD_DATABASE_URL}"
3. Rotate Credentials Regularly
Use providers that support rotation:
- AWS Secrets Manager with rotation
- Vault dynamic database credentials
- Short-lived IAM credentials
4. Audit Credential Access
Enable logging for credential operations:
config = Config(
log_credential_access=True, # Log when credentials are accessed
log_level="info"
)
5. Use Least Privilege
Create dedicated credentials for Horizon Epoch with only required permissions.
Troubleshooting
Credential Not Found
Error: Credential 'EPOCH_POSTGRES_MYDB_URL' not found
- Check environment variable is set
- Verify file path exists and is readable
- Check Vault/AWS connectivity
Credential Expired
Error: Credential expired and refresh failed
- Check IAM role permissions
- Verify Vault token is valid
- Check system clock sync (for time-based auth)
Permission Denied
Error: Permission denied reading credential file
- Check file permissions: chmod 600 /path/to/credential
- Verify process user has read access
Next Steps
- Vault Integration - HashiCorp Vault setup
- AWS Secrets - AWS credential management
- Configuration Reference - Full configuration options
HashiCorp Vault Integration
Integrate Horizon Epoch with HashiCorp Vault for secure credential management.
Overview
Vault integration supports:
- Static secrets (KV v2)
- Dynamic database credentials
- PKI certificates for mTLS
- Multiple authentication methods
Prerequisites
- HashiCorp Vault server (1.x or later)
- Vault CLI (for setup)
- Network access from Horizon Epoch to Vault
Authentication Methods
Token Authentication
Simplest method, suitable for development:
from horizon_epoch import Config
config = Config(
vault_addr="https://vault.example.com:8200",
vault_token="hvs.xxx..."
)
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="hvs.xxx..."
AppRole Authentication
Recommended for applications:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_auth_method="approle",
vault_role_id="xxx-xxx-xxx",
vault_secret_id="${VAULT_SECRET_ID}" # From env or file
)
Setup in Vault:
# Enable AppRole
vault auth enable approle
# Create policy
vault policy write horizon-epoch - <<EOF
path "secret/data/horizon-epoch/*" {
capabilities = ["read"]
}
path "database/creds/horizon-epoch-*" {
capabilities = ["read"]
}
EOF
# Create role
vault write auth/approle/role/horizon-epoch \
token_policies="horizon-epoch" \
token_ttl=1h \
token_max_ttl=4h
Kubernetes Authentication
For Kubernetes deployments:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_auth_method="kubernetes",
vault_role="horizon-epoch"
)
AWS IAM Authentication
For AWS environments:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_auth_method="aws",
vault_role="horizon-epoch-ec2"
)
KV Secrets (Static)
Store database credentials in KV v2:
# Store secret
vault kv put secret/horizon-epoch/postgres \
username=epoch_user \
password=secret123
config.add_postgres(
name="mydb",
host="db.example.com",
database="production",
vault_path="secret/data/horizon-epoch/postgres"
)
Dynamic Database Credentials
Vault can generate short-lived database credentials:
# Configure database secrets engine
vault secrets enable database
vault write database/config/postgres \
plugin_name=postgresql-database-plugin \
allowed_roles="horizon-epoch-*" \
connection_url="postgresql://{{username}}:{{password}}@db:5432/production" \
username="vault" \
password="vault-password"
vault write database/roles/horizon-epoch-readonly \
db_name=postgres \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
default_ttl="1h" \
max_ttl="24h"
config.add_postgres(
name="mydb",
host="db.example.com",
database="production",
vault_role="database/creds/horizon-epoch-readonly"
)
PKI Certificates
For mTLS authentication:
# Enable PKI
vault secrets enable pki
# Configure PKI (simplified)
vault write pki/root/generate/internal \
common_name="Horizon Epoch CA"
vault write pki/roles/horizon-epoch-client \
allowed_domains="epoch.internal" \
allow_subdomains=true \
max_ttl="720h"
config.add_postgres(
name="mydb",
host="db.example.com",
vault_pki_role="pki/issue/horizon-epoch-client",
sslmode="verify-full"
)
Configuration File
# epoch.toml
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
[storage.postgres.production]
host = "db.example.com"
database = "production"
vault_path = "secret/data/horizon-epoch/postgres"
[storage.postgres.readonly]
host = "db.example.com"
database = "production"
vault_role = "database/creds/horizon-epoch-readonly"
Troubleshooting
Permission Denied
Error: permission denied on path "secret/data/..."
- Check policy is attached to token/role
- Verify the path is correct (KV v2 uses the secret/data/ prefix)
- Check the Vault audit log for details
Token Expired
Error: Vault token expired
- Token TTL may be too short
- Enable token renewal in config
- Use AppRole for automatic re-authentication
Connection Failed
Error: failed to connect to Vault
- Verify VAULT_ADDR is correct
- Check network connectivity
- Verify TLS certificates if using HTTPS
Certificate Issues
Error: x509: certificate signed by unknown authority
config = Config(
vault_addr="https://vault.example.com:8200",
vault_ca_cert="/path/to/ca.crt"
)
Best Practices
- Use AppRole or Kubernetes auth in production
- Enable audit logging in Vault
- Use dynamic credentials when possible
- Set appropriate TTLs - short enough for security, long enough for operations
- Rotate static secrets regularly
Next Steps
- AWS Secrets - Alternative cloud-native solution
- mTLS Authentication - Certificate-based auth
- Credential Providers - Overview of all providers
AWS Secrets Integration
Integrate Horizon Epoch with AWS Secrets Manager, IAM authentication, and other AWS credential services.
AWS Secrets Manager
Basic Configuration
from horizon_epoch import Config
config = Config(
aws_region="us-east-1"
).add_postgres(
name="production",
host="mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
database="production",
aws_secret_id="horizon-epoch/production/postgres"
)
Secret Format
Store credentials in Secrets Manager:
{
"username": "epoch_user",
"password": "secret-password",
"host": "mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
"port": 5432,
"database": "production"
}
aws secretsmanager create-secret \
--name horizon-epoch/production/postgres \
--secret-string '{"username":"epoch_user","password":"secret-password"}'
Secret Rotation
Enable automatic rotation:
aws secretsmanager rotate-secret \
--secret-id horizon-epoch/production/postgres \
--rotation-lambda-arn arn:aws:lambda:us-east-1:xxx:function:SecretsRotation \
--rotation-rules AutomaticallyAfterDays=30
Horizon Epoch automatically uses refreshed credentials.
RDS IAM Authentication
Connect to RDS without passwords using IAM:
Prerequisites
- Enable IAM authentication on RDS:
aws rds modify-db-instance \
--db-instance-identifier mydb \
--enable-iam-database-authentication
- Create IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "rds-db:connect",
"Resource": "arn:aws:rds-db:us-east-1:123456789012:dbuser:mydb/epoch_user"
}
]
}
- Create database user:
CREATE USER epoch_user WITH LOGIN;
GRANT rds_iam TO epoch_user;
Configuration
config.add_postgres(
name="production",
host="mydb.cluster-xxx.us-east-1.rds.amazonaws.com",
database="production",
username="epoch_user",
use_iam_auth=True
)
IAM Roles for S3
Instance Credentials (EC2/ECS/Lambda)
config.add_s3(
name="datalake",
bucket="my-data-bucket",
use_instance_credentials=True # Uses instance profile
)
Assume Role
For cross-account access:
config.add_s3(
name="partner-data",
bucket="partner-bucket",
assume_role_arn="arn:aws:iam::999888777666:role/HorizonEpochAccess",
external_id="partner-external-id" # If required
)
Role Chaining
For complex permission structures:
config.add_s3(
name="restricted",
bucket="restricted-bucket",
assume_roles=[
"arn:aws:iam::111222333444:role/JumpRole",
"arn:aws:iam::555666777888:role/TargetRole"
]
)
AWS SSO / Identity Center
For development with SSO:
# Configure SSO profile
aws configure sso
# Set profile
export AWS_PROFILE=my-sso-profile
# Or in config
config = Config(
aws_profile="my-sso-profile"
)
Configuration File
# epoch.toml
[aws]
region = "us-east-1"
profile = "production" # Optional: use specific profile
[storage.postgres.production]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
database = "production"
aws_secret_id = "horizon-epoch/production/postgres"
[storage.postgres.analytics]
host = "analytics.cluster-xxx.us-east-1.rds.amazonaws.com"
database = "analytics"
username = "epoch_user"
use_iam_auth = true
[storage.s3.datalake]
bucket = "company-datalake"
assume_role_arn = "arn:aws:iam::123456789012:role/DataLakeAccess"
Required IAM Permissions
Secrets Manager Access
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret"
],
"Resource": "arn:aws:secretsmanager:us-east-1:*:secret:horizon-epoch/*"
}
]
}
S3 Access
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
}
]
}
Assume Role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::*:role/HorizonEpoch*"
}
]
}
Troubleshooting
Access Denied (Secrets Manager)
Error: AccessDeniedException when calling GetSecretValue
- Check the IAM policy allows secretsmanager:GetSecretValue
- Verify the secret ARN matches the policy
- Check if secret has resource-based policy
IAM Auth Token Failed
Error: PAM authentication failed for user
- Verify IAM authentication is enabled on RDS
- Check the user has the rds_iam role granted
- Verify the IAM policy has the correct resource ARN
Credentials Expired
Error: ExpiredTokenException
- Instance credentials refresh automatically
- For assumed roles, check session duration
- For SSO, run aws sso login
Best Practices
- Use IAM authentication over passwords when possible
- Enable secret rotation in Secrets Manager
- Use instance roles instead of access keys
- Scope permissions narrowly
- Enable CloudTrail for audit logging
Next Steps
- Vault Integration - Alternative secrets management
- Multi-Backend Setup - Multiple AWS accounts
- Configuration Reference - Full options
SSH Tunnels
Connect to databases through SSH tunnels for secure access to private networks.
When to Use SSH Tunnels
- Database is in a private network (no direct access)
- Connecting through a bastion/jump host
- Security policy requires encrypted connections
- Accessing on-premises databases from cloud
Basic SSH Tunnel
from horizon_epoch import Config
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
name="production",
host="internal-db.private", # Internal hostname
database="production",
username="epoch_user",
password="secret",
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa"
}
)
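The ssh_tunnel option opens a local port forward to the private database through the bastion. The manual equivalent is sketched below with the third-party sshtunnel and psycopg2 packages (both are assumptions for illustration; Horizon Epoch manages the tunnel for you):
# Manual equivalent of the ssh_tunnel option, for illustration only.
import psycopg2
from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
    ("bastion.example.com", 22),
    ssh_username="ubuntu",
    ssh_pkey="~/.ssh/id_rsa",
    remote_bind_address=("internal-db.private", 5432),  # database behind the bastion
    local_bind_address=("127.0.0.1", 0),                # pick any free local port
) as tunnel:
    # Connect to the forwarded local port instead of the private hostname.
    conn = psycopg2.connect(
        host="127.0.0.1", port=tunnel.local_bind_port,
        dbname="production", user="epoch_user", password="secret",
    )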
Configuration Options
SSH Key Authentication
ssh_tunnel={
"host": "bastion.example.com",
"port": 22, # Default
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"key_passphrase": "${SSH_KEY_PASSPHRASE}" # If key is encrypted
}
SSH Agent
Use keys loaded in SSH agent:
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"use_agent": True
}
SSH Certificate Authentication
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"cert_file": "~/.ssh/id_rsa-cert.pub"
}
Jump Hosts (ProxyJump)
For multi-hop connections:
ssh_tunnel={
"host": "internal-bastion.private",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"jump_host": {
"host": "external-bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa"
}
}
CLI Configuration
# Environment variables
export EPOCH_SSH_HOST="bastion.example.com"
export EPOCH_SSH_USER="ubuntu"
export EPOCH_SSH_KEY_FILE="~/.ssh/id_rsa"
# epoch.toml
[storage.postgres.production]
host = "internal-db.private"
database = "production"
username = "epoch_user"
password_file = "/run/secrets/db-password"
[storage.postgres.production.ssh_tunnel]
host = "bastion.example.com"
user = "ubuntu"
key_file = "~/.ssh/id_rsa"
Host Key Verification
Strict (Recommended)
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"known_hosts_file": "~/.ssh/known_hosts",
"strict_host_key_checking": True
}
Accept on First Use
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"strict_host_key_checking": "accept-new"
}
Disable (Not Recommended)
# Only for testing/development
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"strict_host_key_checking": False
}
Connection Management
Keep-Alive
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"keepalive_interval": 60, # Send keepalive every 60 seconds
"keepalive_count_max": 3 # Disconnect after 3 missed keepalives
}
Auto-Reconnection
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"auto_reconnect": True,
"reconnect_delay": 5, # Wait 5 seconds between attempts
"max_reconnect_attempts": 3
}
Connection Timeout
ssh_tunnel={
"host": "bastion.example.com",
"user": "ubuntu",
"key_file": "~/.ssh/id_rsa",
"connect_timeout": 30 # 30 second timeout
}
Multiple Tunnels
For accessing multiple databases:
# Shared tunnel settings, reused for each connection below
bastion_config = {
    "host": "bastion.example.com",
    "user": "ubuntu",
    "key_file": "~/.ssh/id_rsa"
}
config.add_postgres(
name="prod_users",
host="users-db.private",
database="users",
ssh_tunnel=bastion_config
).add_postgres(
name="prod_orders",
host="orders-db.private",
database="orders",
ssh_tunnel=bastion_config # Same tunnel reused
)
Troubleshooting
Connection Refused
Error: ssh: connect to host bastion.example.com port 22: Connection refused
- Verify bastion host is running
- Check security groups/firewall allow port 22
- Verify DNS resolution
Authentication Failed
Error: ssh: handshake failed: ssh: unable to authenticate
- Verify SSH key is correct
- Check key has correct permissions: chmod 600 ~/.ssh/id_rsa
- Verify user exists on bastion
- Try connecting manually: ssh -i ~/.ssh/id_rsa ubuntu@bastion.example.com
Host Key Verification Failed
Error: Host key verification failed
- Add host to known_hosts: ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
- Or use strict_host_key_checking: "accept-new"
Tunnel Connection Lost
Error: SSH tunnel disconnected
- Enable keepalive settings
- Enable auto-reconnect
- Check bastion stability
- Review SSH server ClientAliveInterval setting
Slow Connection
- Use connection pooling (tunnel is reused)
- Consider running Horizon Epoch closer to the bastion
- Check network latency between client and bastion
Security Best Practices
- Use key-based authentication - Never use passwords
- Protect private keys - Use encrypted keys, proper permissions
- Verify host keys - Enable strict checking in production
- Limit bastion access - Restrict who can SSH to bastion
- Audit connections - Enable logging on bastion host
- Rotate keys regularly - Use short-lived certificates when possible
Next Steps
- mTLS Authentication - Certificate-based database auth
- Vault Integration - SSH key management with Vault
- Configuration Reference - Full options
mTLS Authentication
Configure mutual TLS (client certificate) authentication for database connections.
Overview
mTLS provides:
- Strong authentication via X.509 certificates
- Encrypted connections (TLS)
- No password management required
- Certificate-based identity
Prerequisites
- TLS-enabled database server
- Client certificate and key
- CA certificate (for verification)
Basic Configuration
from horizon_epoch import Config
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
name="production",
host="db.example.com",
database="production",
username="epoch_user",
sslmode="verify-full",
ssl_cert="/path/to/client.crt",
ssl_key="/path/to/client.key",
ssl_rootcert="/path/to/ca.crt"
)
TLS Modes
| Mode | Server Cert | Client Cert | Description |
|---|---|---|---|
disable | No | No | Unencrypted connection |
require | No | No | Encrypted, no verification |
verify-ca | Yes | Optional | Verify server cert against CA |
verify-full | Yes | Optional | Verify cert + hostname |
For mTLS, use verify-full with client certificates.
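The modes and certificate paths above mirror the standard libpq TLS parameters, so the certificate setup can be sanity-checked outside Horizon Epoch with a plain psycopg2 connection (psycopg2 is used here only as an illustrative assumption):
# Verify the client certificate setup with a plain libpq/psycopg2 connection.
import psycopg2

conn = psycopg2.connect(
    host="db.example.com",
    dbname="production",
    user="epoch_user",
    sslmode="verify-full",            # verify server cert and hostname
    sslcert="/path/to/client.crt",    # client certificate (mTLS)
    sslkey="/path/to/client.key",     # client private key
    sslrootcert="/path/to/ca.crt",    # CA used to verify the server
)
print(conn.get_parameter_status("server_version"))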
Certificate Files
Client Certificate
PEM-encoded X.509 certificate:
-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAJC1HiIAZAiU...
-----END CERTIFICATE-----
Client Key
PEM-encoded private key:
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA7Zq7k...
-----END RSA PRIVATE KEY-----
Or encrypted key:
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFHzBJBgkqhkiG9w0BBQ0wPDA...
-----END ENCRYPTED PRIVATE KEY-----
CA Certificate
For verifying server certificate:
-----BEGIN CERTIFICATE-----
MIIDrzCCApegAwIBAgIQCDvg...
-----END CERTIFICATE-----
Configuration File
# epoch.toml
[storage.postgres.production]
host = "db.example.com"
database = "production"
username = "epoch_user"
sslmode = "verify-full"
ssl_cert = "/etc/epoch/certs/client.crt"
ssl_key = "/etc/epoch/certs/client.key"
ssl_rootcert = "/etc/epoch/certs/ca.crt"
Using Vault PKI
Dynamically issue certificates from Vault:
config = Config(
vault_addr="https://vault.example.com:8200",
vault_token="${VAULT_TOKEN}"
).add_postgres(
name="production",
host="db.example.com",
database="production",
username="epoch_user",
sslmode="verify-full",
vault_pki_role="pki/issue/epoch-client",
ssl_rootcert="/etc/epoch/certs/ca.crt"
)
Certificates are automatically renewed before expiry.
PostgreSQL Server Setup
Enable SSL on PostgreSQL server:
# postgresql.conf
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file = '/var/lib/postgresql/server.key'
ssl_ca_file = '/var/lib/postgresql/ca.crt'
Configure client certificate authentication:
# pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
hostssl all epoch_user 0.0.0.0/0 cert clientcert=verify-full
Create user with certificate CN:
CREATE USER epoch_user;
-- The CN in the client cert must match the username
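Because pg_hba.conf with clientcert=verify-full matches the certificate CN against the database username, it is worth confirming the CN before deploying. A minimal check, assuming the cryptography package is installed:
# Confirm the client certificate CN matches the database username.
from cryptography import x509
from cryptography.x509.oid import NameOID

with open("/path/to/client.crt", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

cn = cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)[0].value
print("Certificate CN:", cn)
assert cn == "epoch_user", "CN must match the PostgreSQL username"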
Encrypted Private Keys
If your key is password-protected:
config.add_postgres(
name="production",
host="db.example.com",
ssl_key="/path/to/encrypted.key",
ssl_key_password="${SSL_KEY_PASSWORD}"
)
Certificate Rotation
Manual Rotation
- Generate new certificate
- Update configuration
- Restart/reload application
Automatic with Vault
Certificates are automatically rotated:
config = Config(
vault_pki_role="pki/issue/epoch-client",
cert_renewal_threshold=0.7 # Renew when 70% of TTL elapsed
)
Troubleshooting
Certificate Expired
Error: SSL error: certificate has expired
- Check certificate expiry: openssl x509 -enddate -noout -in client.crt
- Renew certificate
- If using Vault, check renewal is working
Certificate Verification Failed
Error: SSL error: certificate verify failed
- Verify CA certificate is correct
- Check certificate chain is complete
- Verify server hostname matches certificate
Permission Denied on Key File
Error: could not load private key file
- Check file permissions: chmod 600 client.key
- Verify file is readable by process user
Key Doesn’t Match Certificate
Error: key values mismatch
- Verify key matches certificate by comparing modulus hashes:
openssl x509 -noout -modulus -in client.crt | md5sum
openssl rsa -noout -modulus -in client.key | md5sum
Wrong Password for Encrypted Key
Error: bad decrypt
- Verify password is correct
- Check key file format (PKCS#8 vs traditional)
Security Best Practices
- Protect private keys - Use encrypted keys, restrict file permissions
- Use short-lived certificates - Rotate frequently
- Verify server certificates - Always use verify-full
- Audit certificate usage - Log certificate fingerprints
- Use separate certificates - Don’t share between environments
CLI Usage
# Using environment variables
export EPOCH_SSL_CERT="/path/to/client.crt"
export EPOCH_SSL_KEY="/path/to/client.key"
export EPOCH_SSL_ROOTCERT="/path/to/ca.crt"
epoch init my-repo \
--metadata-url "postgresql://user@db.example.com/horizon_epoch?sslmode=verify-full"
Next Steps
- Vault Integration - Dynamic certificate issuance
- SSH Tunnels - Alternative secure connection
- Credential Providers - Overview of auth methods
CLI Reference
Complete reference for the epoch command-line interface.
Global Options
| Option | Description |
|---|---|
-C, --directory <PATH> | Run as if started in PATH |
--format <FORMAT> | Output format: text, json |
-q, --quiet | Suppress non-essential output |
-v, --verbose | Enable verbose output |
--config <FILE> | Path to config file |
--version | Show version |
--help | Show help |
Commands
epoch init
Initialize a new repository.
epoch init <NAME> [OPTIONS]
Arguments:
- <NAME> - Repository name (required)
Options:
- -d, --description <DESC> - Repository description
- -b, --default-branch <BRANCH> - Default branch name (default: main)
- --metadata-url <URL> - Metadata database URL
Examples:
epoch init my-repo
epoch init my-repo --description "Production data"
epoch init my-repo --metadata-url "postgresql://localhost/horizon_epoch"
epoch branch
Manage branches.
epoch branch [SUBCOMMAND]
Subcommands:
epoch branch list
List all branches.
epoch branch list
epoch branch list --format json
epoch branch create
Create a new branch.
epoch branch create <NAME> [--from <SOURCE>]
Options:
- --from <SOURCE> - Source branch (default: current branch)
Examples:
epoch branch create feature-x
epoch branch create hotfix --from production
epoch branch delete
Delete a branch.
epoch branch delete <NAME> [--force]
Options:
- -f, --force - Force delete even if unmerged
epoch checkout
Switch to a branch.
epoch checkout <BRANCH>
Examples:
epoch checkout main
epoch checkout feature/new-schema
epoch commit
Create a new commit.
epoch commit -m <MESSAGE> [OPTIONS]
Options:
- -m, --message <MSG> - Commit message (required)
- -a, --all - Commit all changed tables
- --tables <TABLES> - Comma-separated list of tables to commit
- --author <NAME> - Override author name
- --email <EMAIL> - Override author email
Examples:
epoch commit -m "Add user preferences"
epoch commit -m "Update orders" --tables orders,order_items
epoch merge
Merge a branch into current branch.
epoch merge <BRANCH> [OPTIONS]
Options:
- --strategy <STRATEGY> - Merge strategy: default, ours, theirs
- --no-commit - Merge but don’t commit
- --message <MSG> - Custom merge commit message
- --continue - Continue after conflict resolution
- --abort - Abort in-progress merge
Examples:
epoch merge feature/updates
epoch merge feature/updates --strategy theirs
epoch merge --continue
epoch merge --abort
epoch diff
Show differences between branches or commits.
epoch diff [SOURCE] [TARGET] [OPTIONS]
Arguments:
- <SOURCE> - Source branch/commit (default: HEAD)
- <TARGET> - Target branch/commit
Options:
- --table <TABLE> - Show diff for specific table
- --stat - Show statistics only
- --name-only - Show only changed table names
Examples:
epoch diff # Uncommitted changes
epoch diff main # Current branch vs main
epoch diff main feature/updates # Between two branches
epoch diff abc123f def456g # Between commits
epoch diff --stat # Summary only
epoch log
Show commit history.
epoch log [OPTIONS]
Options:
- -n, --max-count <N> - Limit output to N commits
- --oneline - One line per commit
- --since <DATE> - Show commits since date
- --until <DATE> - Show commits until date
- --author <PATTERN> - Filter by author
- --table <TABLE> - Show commits affecting table
Examples:
epoch log
epoch log --oneline -10
epoch log --since "2024-01-01"
epoch log --table users
epoch status
Show working tree status.
epoch status [OPTIONS]
Options:
- --short - Short format
- --porcelain - Machine-readable format
Examples:
epoch status
epoch status --short
epoch table
Manage tables.
epoch table [SUBCOMMAND]
epoch table add
Register a table for tracking.
epoch table add <NAME> --location <LOCATION> [OPTIONS]
Options:
- --location <URL> - Table location (required)
- --primary-key <COLS> - Primary key columns
- --storage <NAME> - Storage backend name (alternative to location URL)
Location Formats by Backend:
| Backend | Format | Example |
|---|---|---|
| PostgreSQL | postgresql://[user:pass@]host[:port]/database/schema.table | postgresql://localhost/mydb/public.users |
| MySQL | mysql://[user:pass@]host[:port]/database/table | mysql://localhost/analytics/events |
| SQL Server | mssql://[user:pass@]host[:port]/database/schema.table | mssql://sqlserver/warehouse/dbo.orders |
| SQLite | sqlite:///path/to/file.db/table | sqlite:///data/local.db/users |
| S3 | s3://bucket/prefix/path | s3://datalake/delta/events |
| Azure Blob | azure://account/container/prefix | azure://myaccount/data/tables/users |
| GCS | gs://bucket/prefix/path | gs://my-bucket/delta/products |
| Local | file:///path/to/directory | file:///data/parquet/users |
Examples:
# PostgreSQL
epoch table add users --location "postgresql://localhost/mydb/public.users" \
--primary-key id
# MySQL
epoch table add orders --location "mysql://mysql-host/shop/orders" \
--primary-key order_id
# SQL Server
epoch table add inventory --location "mssql://sqlserver/warehouse/dbo.inventory" \
--primary-key sku
# SQLite
epoch table add config --location "sqlite:///data/app.db/config" \
--primary-key key
# S3/Delta Lake
epoch table add events --location "s3://datalake/delta/events" \
--primary-key event_id
# Azure Blob
epoch table add telemetry --location "azure://myaccount/data/telemetry" \
--primary-key id
# GCS
epoch table add products --location "gs://my-bucket/delta/products" \
--primary-key product_id
# Local filesystem
epoch table add local_data --location "file:///data/parquet/sales" \
--primary-key transaction_id
# Using storage backend name (when storage is already registered)
epoch table add users --storage prod-pg --primary-key id
epoch table list
List registered tables.
epoch table list [--format json]
epoch table remove
Unregister a table.
epoch table remove <NAME> [--force]
epoch tag
Manage tags.
epoch tag [SUBCOMMAND]
epoch tag create
Create a tag.
epoch tag <NAME> [COMMIT] [OPTIONS]
Options:
- -m, --message <MSG> - Tag message
Examples:
epoch tag v1.0.0
epoch tag v1.0.0 abc123f -m "Release 1.0.0"
epoch tag list
List all tags.
epoch tag list
epoch tag delete
Delete a tag.
epoch tag delete <NAME>
epoch reset
Reset current branch to a commit.
epoch reset [OPTIONS] <COMMIT>
Options:
- --soft - Keep changes staged
- --hard - Discard all changes
Examples:
epoch reset --soft HEAD~1
epoch reset --hard v1.0.0
epoch revert
Create a commit that undoes a previous commit.
epoch revert <COMMIT> [OPTIONS]
Options:
- --no-commit - Revert but don’t commit
Examples:
epoch revert abc123f
epoch revert HEAD
epoch stash
Stash uncommitted changes.
epoch stash [SUBCOMMAND]
epoch stash push
Save changes to stash.
epoch stash push [-m <MESSAGE>]
epoch stash pop
Apply and remove latest stash.
epoch stash pop [<STASH>]
epoch stash list
List stashes.
epoch stash list
epoch storage
Manage storage backends.
epoch storage [SUBCOMMAND]
epoch storage add
Register a new storage backend.
epoch storage add <NAME> --type <TYPE> [OPTIONS]
Arguments:
- <NAME> - Unique name for the storage backend (required)
Options:
- --type <TYPE> - Backend type (required): postgresql, mysql, mssql, sqlite, s3, azureblob, gcs, local
- --description <DESC> - Optional description
- --set-default - Set as the default storage backend
PostgreSQL/MySQL/MSSQL Options:
- --host <HOST> - Database host (required)
- --port <PORT> - Database port (PostgreSQL: 5432, MySQL: 3306, MSSQL: 1433)
- --database <DB> - Database name (required)
- --schema <SCHEMA> - Schema name (PostgreSQL only)
- --username <USER> - Username
- --ssl-mode <MODE> - SSL mode
SQLite Options:
- --database <PATH> - Database file path (required)
S3 Options:
- --bucket <BUCKET> - Bucket name (required)
- --region <REGION> - AWS region
- --endpoint <URL> - Custom endpoint (for MinIO, etc.)
- --path-prefix <PREFIX> - Key prefix
- --path-style - Use path-style URLs
Azure Blob Options:
- --account <ACCOUNT> - Storage account name (required)
- --container <CONTAINER> - Container name (required)
- --path-prefix <PREFIX> - Blob prefix
GCS Options:
- --bucket <BUCKET> - Bucket name (required)
- --project <PROJECT> - GCP project ID
- --path-prefix <PREFIX> - Object prefix
Local Filesystem Options:
- --base-path <PATH> - Base directory path (required)
Credential Options:
- --credentials <TYPE> - Credential resolution: environment, aws-profile, gcp-service-account, azure-managed-identity, vault
- --profile <NAME> - AWS profile name (with aws-profile)
- --key-file <PATH> - GCP service account key file (with gcp-service-account)
- --client-id <ID> - Azure client ID (with azure-managed-identity)
- --vault-path <PATH> - Vault secret path (with vault)
- --vault-role <ROLE> - Vault role name (with vault)
Examples:
# PostgreSQL
epoch storage add prod-pg --type postgresql \
--host db.example.com --database production
# MySQL
epoch storage add analytics-mysql --type mysql \
--host mysql.example.com --port 3306 --database analytics
# SQL Server
epoch storage add warehouse-mssql --type mssql \
--host sqlserver.example.com --database warehouse
# SQLite
epoch storage add local-sqlite --type sqlite --database ./data/local.db
# S3
epoch storage add datalake --type s3 \
--bucket company-datalake --region us-west-2
# S3-compatible (MinIO)
epoch storage add dev-minio --type s3 \
--bucket local-data --endpoint http://localhost:9000 --path-style
# Azure Blob
epoch storage add azure-store --type azureblob \
--account myaccount --container data
# GCS
epoch storage add gcs-store --type gcs \
--bucket my-bucket --project my-project
# Local Filesystem
epoch storage add local-fs --type local --base-path /data/epoch
# With credentials
epoch storage add prod-s3 --type s3 --bucket prod-data \
--credentials aws-profile --profile production
epoch storage list
List configured storage backends.
epoch storage list [--type <TYPE>] [--format json]
Options:
- --type <TYPE> - Filter by backend type
epoch storage show
Show detailed information about a storage backend.
epoch storage show <NAME>
epoch storage update
Update a storage backend configuration.
epoch storage update <NAME> [OPTIONS]
Options:
- --description <DESC> - Update description
- --disable - Disable the backend
- --enable - Enable the backend
epoch storage remove
Remove a storage backend.
epoch storage remove <NAME> [--force]
epoch storage default
Manage the default storage backend.
epoch storage default set <NAME> # Set default
epoch storage default show # Show current default
epoch storage default clear # Clear default
epoch storage test-credentials
Test credential provider for a storage backend.
epoch storage test-credentials <NAME>
epoch storage test-credentials --all
epoch storage credentials
Show credential status for storage backends.
epoch storage credentials <NAME>
epoch storage credentials --all
epoch storage tunnels
Show SSH tunnel status for storage backends.
epoch storage tunnels [NAME]
Schema Commands
epoch schema show
Display schema for a table at a specific commit or branch.
epoch schema show <TABLE> [OPTIONS]
Arguments:
- <TABLE> - Table name (required)
Options:
- --at <REF> - Show schema at specific commit, branch, or tag
- --live - Show live database schema (not stored snapshot)
- --generate-ddl - Output as PostgreSQL DDL
- --format <FORMAT> - Output format: text, json
Examples:
# Show current schema
epoch schema show users
# Show schema at specific commit
epoch schema show users --at abc123f
# Show schema on a branch
epoch schema show users --at feature/new-fields
# Compare with live database
epoch schema show users --live
# Generate DDL from schema
epoch schema show users --generate-ddl
epoch schema diff
Compare schemas between two refs and optionally generate migration DDL.
epoch schema diff <TABLE> <SOURCE> <TARGET> [OPTIONS]
Arguments:
- <TABLE> - Table name (required)
- <SOURCE> - Source ref (branch, commit, or tag)
- <TARGET> - Target ref (branch, commit, or tag)
Options:
- --show-ddl - Show DDL migration statements
- --ddl-output <FILE> - Write DDL to file
- --format <FORMAT> - Output format: text, json, flyway, liquibase
- --columns-only - Show only column changes
- --constraints-only - Show only constraint changes
- --indexes-only - Show only index changes
Examples:
# Compare schemas between branches
epoch schema diff users main feature/updates
# Generate migration DDL
epoch schema diff users main feature/updates --show-ddl
# Write DDL to file
epoch schema diff users main feature/updates --ddl-output migrations/V2.sql
# Generate Flyway-compatible migration
epoch schema diff users main feature/updates --format flyway --ddl-output db/migration/V2__update_users.sql
epoch schema apply
Apply schema changes from a ref to the live database.
epoch schema apply <TABLE> --from <REF> [OPTIONS]
Arguments:
- <TABLE> - Table name (required)
Options:
- --from <REF> - Source ref with desired schema (required)
- --dry-run - Show what would be executed without applying
- --force - Allow destructive operations
- --online - Use CONCURRENTLY for index operations
- --lock-timeout <DURATION> - Lock timeout (default: 30s)
- --statement-timeout <DURATION> - Statement timeout (default: 5m)
- --retries <N> - Number of retries on failure (default: 3)
- --no-transaction - Run statements individually (required for CONCURRENTLY)
Examples:
# Preview changes (dry-run)
epoch schema apply users --from feature/updates --dry-run
# Apply schema changes
epoch schema apply users --from feature/updates
# Apply with online mode for large tables
epoch schema apply users --from feature/updates --online
# Apply with custom timeouts
epoch schema apply users --from feature/updates \
--lock-timeout 10s \
--statement-timeout 10m
# Force destructive changes (after backup)
epoch schema apply users --from feature/updates --force
epoch schema status
Show schema drift status for tables.
epoch schema status [TABLES...] [OPTIONS]
Arguments:
- [TABLES...] - Specific tables to check (optional, all if omitted)
Options:
- --check-bloat - Also check table bloat
- --check-indexes - Check index validity
- --format <FORMAT> - Output format: text, json
Examples:
# Check all tables
epoch schema status
# Check specific tables
epoch schema status users orders
# Check with bloat analysis
epoch schema status --check-bloat
# JSON output for automation
epoch schema status --format json
Output:
Table | Latest Schema | Historical | Live DB | Status
-----------|---------------|------------|----------|--------
users | Yes | Yes | Synced | OK
orders | Yes | Partial | Synced | OK
products | Yes | No | Drifted | Needs sync
events | No | No | N/A | S3 table
epoch migrate-schemas
Backfill schema data for existing commits.
epoch migrate-schemas [OPTIONS]
Options:
- --backfill - Backfill missing schemas from live database
- --all - Process all tables
- --table <TABLE> - Process specific table
- --since <DATE> - Only backfill commits since date
- --skip-inaccessible - Skip tables that can’t be introspected
- --import-from <TOOL> - Import from external tool (flyway, liquibase)
- --dry-run - Show what would be done
Examples:
# Backfill all tables (dry-run)
epoch migrate-schemas --backfill --all --dry-run
# Backfill specific table
epoch migrate-schemas --backfill --table users
# Backfill since a date
epoch migrate-schemas --backfill --all --since "2024-01-01"
# Import from Flyway migrations
epoch migrate-schemas --import-from flyway ./db/migration
epoch conflicts
Manage merge conflicts.
epoch conflicts [SUBCOMMAND]
epoch conflicts show
Show current conflicts.
epoch conflicts show [--format json]
epoch conflicts resolve
Resolve conflicts.
epoch conflicts resolve [OPTIONS]
Options:
- --interactive - Interactive resolution
- --ours - Accept our version
- --theirs - Accept their version
- --table <TABLE> - Resolve specific table
- --record <ID> - Resolve specific record
- --field <FIELD> - Resolve specific field
- --value <VALUE> - Set specific value
epoch query
Run a query against the repository.
epoch query <SQL> [OPTIONS]
Options:
- --branch <BRANCH> - Query specific branch
- --at <REF> - Query at commit/tag/time
- --format <FMT> - Output format: table, json, csv
Examples:
epoch query "SELECT * FROM users"
epoch query "SELECT COUNT(*) FROM orders" --branch staging
epoch query "SELECT * FROM users" --at v1.0.0 --format json
epoch local
Manage local development environment.
epoch local [SUBCOMMAND]
epoch local start
Start local services.
epoch local start
epoch local stop
Stop local services.
epoch local stop
epoch local status
Show local service status.
epoch local status
Configuration File
~/.epoch/config.toml:
[defaults]
format = "text"
author_name = "Your Name"
author_email = "you@example.com"
[metadata]
url = "postgresql://localhost/horizon_epoch"
Environment Variables
| Variable | Description |
|---|---|
EPOCH_METADATA_URL | Metadata database URL |
EPOCH_CONFIG_FILE | Config file path |
EPOCH_LOG_LEVEL | Log level (debug, info, warn, error) |
EPOCH_FORMAT | Default output format |
Python SDK Reference
Complete reference for the Horizon Epoch Python SDK.
Installation
The Python SDK requires building from source (see Installation Guide).
# From project root
cd python && uv sync
uv run maturin develop --manifest-path ../crates/horizon-epoch-py/Cargo.toml
Quick Start
from horizon_epoch import Client, ClientConfig, Author
# Create client configuration
config = ClientConfig(
metadata_url="postgresql://localhost/horizon_epoch"
)
# Use async context manager for connection management
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Initialize a repository
await client.init("my-data-repo")
# Create a branch
await client.branch("feature-x")
# Make changes and commit
await client.commit("Initial commit")
Classes
ClientConfig
Configuration for the Horizon Epoch client.
from horizon_epoch import ClientConfig, Author
config = ClientConfig(
metadata_url="postgresql://user:pass@localhost/horizon_epoch",
default_branch="main",
author=Author(name="Dev", email="dev@example.com"),
timeout_seconds=30.0,
retry_count=3,
log_level="INFO",
)
| Parameter | Type | Default | Description |
|---|---|---|---|
metadata_url | str | required | Metadata database URL |
default_branch | str | "main" | Default branch name |
author | Author | None | Default commit author |
timeout_seconds | float | 30.0 | Operation timeout |
retry_count | int | 3 | Retry count for transient failures |
log_level | str | "INFO" | Logging level |
Client
Main client for interacting with Horizon Epoch.
Connection Management
from horizon_epoch import Client
# Option 1: Async context manager (recommended)
async with Client.connect("postgresql://...") as client:
await client.init("my-repo")
# Option 2: Manual connection management
client = Client.from_url("postgresql://...")
await client.open()
try:
await client.init("my-repo")
finally:
await client.close()
Repository Operations
# Initialize a new repository
config = await client.init(
name="my-repo",
storage_configs={"primary": {"backend": "postgresql", "url": "..."}},
description="My data repository"
)
# Open an existing repository
config = await client.open_repository("my-repo")
# Get repository status
status = await client.status(branch="main")
Branch Operations
# Create a branch
branch = await client.branch(
name="feature/new-schema",
start_point="main", # Optional, defaults to default branch
checkout=True # Optional, defaults to True
)
# List all branches
branches = await client.branches()
# Get a specific branch
branch = await client.get_branch("main")
# Checkout a branch
status = await client.checkout("feature/new-schema")
# Delete a branch
await client.delete_branch("old-feature", force=False)
# Compare branches
comparison = await client.compare_branches("feature-x", "main")
print(f"Ahead: {comparison.ahead_count}, Behind: {comparison.behind_count}")
Commit Operations
from horizon_epoch import Author
# Create a commit
commit = await client.commit(
message="Add user preferences",
author=Author(name="Dev", email="dev@example.com"),
branch="feature-x", # Optional
tables=["users"] # Optional, defaults to all staged
)
# View commit history
log = await client.log(ref="main", limit=10, skip=0)
for entry in log.entries:
print(f"{entry.commit.short_id}: {entry.commit.message}")
# Get commit details
commit = await client.show("abc123")
Diff Operations
# Diff between branches
diff = await client.diff(
base="main",
target="feature-x",
tables=["users"], # Optional
include_record_diff=True # Optional
)
for table_diff in diff.table_diffs:
print(f"{table_diff.table_name}: {table_diff.total_changes} changes")
Merge Operations
from horizon_epoch import MergeStrategy
# Merge a branch
result = await client.merge(
source="feature-x",
target="main", # Optional, defaults to default branch
strategy=MergeStrategy.THREE_WAY, # Optional
message="Merge feature-x", # Optional
dry_run=False # Optional
)
if result.has_conflicts:
print(f"Conflicts: {result.conflict_count}")
else:
print(f"Merged: {result.result_commit_id}")
# Abort a merge
await client.abort_merge(conflict_id="...")
Storage Operations
Horizon Epoch supports 8 storage backend types across databases, object storage, and filesystems.
from horizon_epoch import StorageBackend
# =============================================================================
# DATABASE BACKENDS
# =============================================================================
# PostgreSQL
backend_id = await client.add_storage(
name="postgres-warehouse",
backend=StorageBackend.POSTGRESQL,
config={"url": "postgresql://user:pass@localhost:5432/mydb"},
description="PostgreSQL data warehouse"
)
# MySQL
backend_id = await client.add_storage(
name="mysql-analytics",
backend=StorageBackend.MYSQL,
config={"url": "mysql://user:pass@localhost:3306/analytics"},
description="MySQL analytics database"
)
# Microsoft SQL Server
backend_id = await client.add_storage(
name="mssql-legacy",
backend=StorageBackend.MSSQL,
config={"url": "mssql://user:pass@localhost:1433/legacy_db"},
description="SQL Server legacy system"
)
# SQLite
backend_id = await client.add_storage(
name="sqlite-local",
backend=StorageBackend.SQLITE,
config={"path": "/data/local.db"},
description="Local SQLite database"
)
# =============================================================================
# OBJECT STORAGE BACKENDS
# =============================================================================
# AWS S3 / S3-compatible storage
backend_id = await client.add_storage(
name="s3-datalake",
backend=StorageBackend.S3,
config={
"bucket": "my-datalake",
"region": "us-west-2",
"endpoint": None, # Use AWS default, or specify for MinIO/LocalStack
},
description="S3 data lake"
)
# Azure Blob Storage
backend_id = await client.add_storage(
name="azure-warehouse",
backend=StorageBackend.AZURE,
config={
"account": "mystorageaccount",
"container": "data-warehouse",
},
description="Azure Blob data warehouse"
)
# Google Cloud Storage
backend_id = await client.add_storage(
name="gcs-analytics",
backend=StorageBackend.GCS,
config={
"bucket": "my-analytics-bucket",
"project": "my-gcp-project",
},
description="GCS analytics storage"
)
# =============================================================================
# FILESYSTEM BACKEND
# =============================================================================
# Local filesystem
backend_id = await client.add_storage(
name="local-dev",
backend=StorageBackend.LOCAL,
config={"path": "/data/epoch-storage"},
description="Local development storage"
)
# =============================================================================
# STORAGE MANAGEMENT
# =============================================================================
# List all storage backends
backends = await client.list_storage()
for name, backend_type in backends.items():
print(f"{name}: {backend_type.value}")
# Get storage details
config = await client.get_storage("s3-datalake")
print(f"Backend: {config.name}, Type: {config.backend_type}")
# Get detailed storage info
backends_info = await client.list_storage_info()
for backend in backends_info:
status = "default" if backend.is_default else "enabled"
print(f"{backend.name} ({backend.backend_type}): {status}")
# Remove storage backend
await client.remove_storage("old-storage")
Table Operations
Tables are tracked using StorageLocation objects that specify where the data lives. The native module provides factory methods for all 8 storage backends.
# Access native module for StorageLocation
from horizon_epoch.client import _native
# =============================================================================
# DATABASE STORAGE LOCATIONS
# =============================================================================
# PostgreSQL: connection_name, schema, table
loc = _native.StorageLocation.postgresql("main", "public", "users")
reg = await client.track_table("users", loc)
# MySQL: connection_name, database, table
loc = _native.StorageLocation.mysql("analytics", "sales_db", "orders")
reg = await client.track_table("orders", loc)
# Microsoft SQL Server: connection_name, database, schema, table
loc = _native.StorageLocation.mssql("legacy", "warehouse", "dbo", "customers")
reg = await client.track_table("customers", loc)
# SQLite: connection_name, table
loc = _native.StorageLocation.sqlite("local-db", "products")
reg = await client.track_table("products", loc)
# =============================================================================
# OBJECT STORAGE LOCATIONS
# =============================================================================
# S3: bucket, prefix, format (parquet, delta, csv)
loc = _native.StorageLocation.s3("my-datalake", "tables/events/", format="parquet")
reg = await client.track_table("events", loc)
# S3 with Delta Lake format
loc = _native.StorageLocation.s3("my-datalake", "delta/transactions/", format="delta")
reg = await client.track_table("transactions", loc)
# Azure Blob Storage: account, container, prefix, format
loc = _native.StorageLocation.azure_blob(
"mystorageaccount", "warehouse", "data/metrics/", format="parquet"
)
reg = await client.track_table("metrics", loc)
# Google Cloud Storage: bucket, prefix, format
loc = _native.StorageLocation.gcs("my-analytics", "tables/sessions/", format="parquet")
reg = await client.track_table("sessions", loc)
# =============================================================================
# FILESYSTEM STORAGE LOCATIONS
# =============================================================================
# Local filesystem: path, format
loc = _native.StorageLocation.local_filesystem("/data/exports/reports/", format="parquet")
reg = await client.track_table("reports", loc)
# =============================================================================
# TABLE MANAGEMENT
# =============================================================================
# View registration result
print(f"Tracked: {reg.table_name}")
print(f"Content hash: {reg.content_hash}")
print(f"Record count: {reg.record_count}")
print(f"Location: {reg.storage_location_uri}")
# List all tracked tables
tables = await client.list_tables()
for table in tables:
print(f" - {table}")
# Untrack a table (data is preserved, only tracking is removed)
await client.untrack_table("old-table")
Staging Operations
# Stage all changes
result = await client.stage_all(branch="main")
print(f"Staged {result.tables_affected} tables")
# Stage specific table
await client.stage("users", branch="main")
# Unstage all
result = await client.unstage_all(branch="main")
# Unstage specific table
await client.unstage("users", branch="main")
Constraint Operations
# Get constraints for a table
constraints = await client.get_constraints("users", branch="main")
print(f"Foreign keys: {len(constraints.foreign_keys)}")
print(f"Unique: {len(constraints.unique_constraints)}")
# Add a constraint (using native types)
from horizon_epoch.client import _native
constraint = _native.UniqueConstraint("uq_email", ["email"])
await client.add_constraint("users", constraint)
# Remove a constraint
await client.remove_constraint("users", "uq_email")
# Diff constraints between branches
diff = await client.diff_constraints("main", "feature-x", table_name="users")
# Add an index
col = _native.IndexColumn("email")
idx = _native.IndexDefinition("idx_email", [col])
await client.add_index("users", idx)
# Remove an index
await client.remove_index("users", "idx_email")
Models
Author
from horizon_epoch import Author
author = Author(name="Jane Developer", email="jane@example.com")
Branch
from horizon_epoch import Branch
# Returned from client.branch(), client.branches(), client.get_branch()
branch.name # str
branch.head_commit_id # str
branch.created_at # datetime
branch.is_default # bool
Commit
from horizon_epoch import Commit
# Returned from client.commit(), client.show()
commit.id # str (full commit ID)
commit.short_id # str (abbreviated)
commit.message # str
commit.author # Author
commit.timestamp # datetime
commit.parent_ids # List[str]
MergeResult
from horizon_epoch import MergeResult
result.source_branch # str
result.target_branch # str
result.is_fast_forward # bool
result.has_conflicts # bool
result.conflict_count # int
result.result_commit_id # Optional[str]
result.conflicts # List[RecordConflict]
Status
from horizon_epoch import Status
status.branch # Optional[str]
status.commit_id # Optional[str]
status.is_detached # bool
status.staged_tables # List[str]
status.unstaged_tables # List[str]
Enums
from horizon_epoch import MergeStrategy, StorageBackend
# Merge strategies
MergeStrategy.THREE_WAY # Standard three-way merge
MergeStrategy.OURS # Keep target branch version
MergeStrategy.THEIRS # Keep source branch version
MergeStrategy.MANUAL # Fail on conflicts
# Storage backends - Database
StorageBackend.POSTGRESQL # PostgreSQL database
StorageBackend.MYSQL # MySQL database
StorageBackend.MSSQL # Microsoft SQL Server
StorageBackend.SQLITE # SQLite database
# Storage backends - Object Storage
StorageBackend.S3 # AWS S3 / S3-compatible storage
StorageBackend.AZURE # Azure Blob Storage
StorageBackend.GCS # Google Cloud Storage
# Storage backends - Filesystem
StorageBackend.LOCAL # Local filesystem
Exceptions
from horizon_epoch import (
HorizonEpochError, # Base exception
RepositoryNotInitializedError,
BranchNotFoundError,
BranchAlreadyExistsError,
CommitNotFoundError,
MergeConflictError,
ConnectionFailedError,
StorageError,
ValidationError,
)
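All of these derive from HorizonEpochError, so you can catch broadly or narrowly. A short sketch of handling merge conflicts and missing branches around the SDK calls shown above:
# Handling SDK exceptions around a merge (uses the classes listed above).
from horizon_epoch import Client, MergeConflictError, BranchNotFoundError, HorizonEpochError

async def merge_feature(client: Client, branch: str) -> None:
    try:
        result = await client.merge(source=branch)
        print(f"Merged {branch}: {result.result_commit_id}")
    except MergeConflictError as exc:
        # Conflicts must be resolved (or the merge aborted) before retrying.
        print(f"Merge of {branch} has conflicts: {exc}")
    except BranchNotFoundError:
        print(f"Branch {branch} does not exist")
    except HorizonEpochError as exc:
        # Fallback for any other SDK error.
        print(f"Horizon Epoch error: {exc}")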
Async Support
All client methods are async and should be awaited:
import asyncio
from horizon_epoch import Client
async def main():
async with Client.connect("postgresql://...") as client:
await client.init("my-repo")
await client.branch("feature")
await client.commit("Initial commit")
asyncio.run(main())
Native Module Access
For advanced operations, access the native Rust module directly:
from horizon_epoch.client import (
_native, # Native module (if available)
is_native_available, # Check if native bindings loaded
get_native_version, # Get native module version
get_native_info, # Get diagnostic info
require_native, # Get native module or raise ImportError
)
if is_native_available():
print(f"Native version: {get_native_version()}")
# StorageLocation factory methods for all 8 backends
# Database backends
pg_loc = _native.StorageLocation.postgresql("conn", "schema", "table")
mysql_loc = _native.StorageLocation.mysql("conn", "database", "table")
mssql_loc = _native.StorageLocation.mssql("conn", "database", "schema", "table")
sqlite_loc = _native.StorageLocation.sqlite("conn", "table")
# Object storage backends
s3_loc = _native.StorageLocation.s3("bucket", "prefix/", format="parquet")
azure_loc = _native.StorageLocation.azure_blob("account", "container", "prefix/", format="parquet")
gcs_loc = _native.StorageLocation.gcs("bucket", "prefix/", format="parquet")
# Filesystem backend
local_loc = _native.StorageLocation.local_filesystem("/path/to/data/", format="parquet")
# Constraint types
unique = _native.UniqueConstraint("uq_email", ["email"])
check = _native.CheckConstraint("ck_age", "age >= 0")
fk = _native.ForeignKeyConstraint("fk_user", ["user_id"], "users", ["id"])
# Index types
col = _native.IndexColumn("email")
idx = _native.IndexDefinition("idx_email", [col])
Example Usage
import asyncio
from horizon_epoch import Client, Author, MergeStrategy
async def main():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Initialize repository
await client.init("my-data-repo")
# Create and checkout feature branch
await client.branch("feature/new-schema")
# Make changes (via your data pipeline)
# ...
# Stage and commit
await client.stage_all()
await client.commit(
message="Add user preferences",
author=Author(name="Dev", email="dev@example.com")
)
# Merge back to main
result = await client.merge(
source="feature/new-schema",
strategy=MergeStrategy.THREE_WAY
)
if result.has_conflicts:
print(f"Resolve {result.conflict_count} conflicts")
else:
print(f"Merged successfully: {result.result_commit_id}")
asyncio.run(main())
REST API Reference
The Horizon Epoch REST API provides HTTP endpoints for all version control operations. This reference documents all available endpoints, their parameters, and response formats.
Overview
Base URL
http://localhost:8000/api/v1
The API prefix is configurable via the EPOCH_API_PREFIX environment variable (default: /api/v1).
Content Type
All requests and responses use JSON format:
Content-Type: application/json
Running the API Server
# Start the server
uv run uvicorn horizon_epoch.api:app --reload --host 0.0.0.0 --port 8000
# With debug mode (enables Swagger UI)
EPOCH_DEBUG=true uv run uvicorn horizon_epoch.api:app --reload
Authentication
Current Status: The API does not currently implement authentication. All endpoints are open.
When authentication is implemented, the following methods will be supported:
Planned: API Key Authentication
curl -H "Authorization: Bearer <api-key>" \
http://localhost:8000/api/v1/repositories
Planned: Basic Auth
curl -u username:password \
http://localhost:8000/api/v1/repositories
For production deployments, we recommend placing the API behind a reverse proxy (nginx, Traefik) that handles authentication.
Health Endpoints
Health check endpoints are available at the root level (not under /api/v1).
Health Check
Check if the API server is running.
GET /health
Response:
{
"status": "healthy",
"version": "0.1.0",
"connected": true
}
curl Example:
curl http://localhost:8000/health
Readiness Check
Check if the API is ready to handle requests.
GET /ready
Response:
{
"ready": true,
"metadata_connected": true,
"repository": "my-repo"
}
curl Example:
curl http://localhost:8000/ready
API Root
Get basic API information.
GET /
Response:
{
"name": "Horizon Epoch API",
"version": "0.1.0",
"api_prefix": "/api/v1",
"docs": "/docs"
}
curl Example:
curl http://localhost:8000/
Repository Endpoints
Manage repositories and their configuration.
Create Repository
Initialize a new Horizon Epoch repository.
POST /api/v1/repositories
Request Body:
{
"name": "my-data-repo",
"storage_configs": {
"primary": {
"backend": "postgresql",
"url": "postgresql://user:pass@localhost/mydb"
}
}
}
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Repository name (min 1 character) |
storage_configs | object | No | Optional storage backend configurations |
Response: 201 Created
{
"name": "my-data-repo",
"default_branch": "main",
"metadata_url": "postgresql://localhost/horizon_epoch"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/repositories \
-H "Content-Type: application/json" \
-d '{"name": "my-data-repo"}'
Open Repository
Open an existing repository.
POST /api/v1/repositories/{name}/open
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
name | string | Repository name |
Response: 200 OK
{
"name": "my-data-repo",
"default_branch": "main",
"metadata_url": "postgresql://localhost/horizon_epoch"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/open
Get Repository Status
Get the current status of the repository.
GET /api/v1/repositories/status
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
repository | string | No | Repository name (uses current if not specified) |
Response:
{
"branch": "main",
"commit_id": "abc123def456789",
"is_detached": false,
"is_clean": true,
"staged_count": 0,
"unstaged_count": 0,
"merge_in_progress": false
}
curl Example:
# Current repository
curl http://localhost:8000/api/v1/repositories/status
# Specific repository
curl "http://localhost:8000/api/v1/repositories/status?repository=my-data-repo"
List Tracked Tables
List all tables being tracked for version control.
GET /api/v1/repositories/tables
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
repository | string | No | Repository name |
Response:
["users", "orders", "products"]
curl Example:
curl http://localhost:8000/api/v1/repositories/tables
Add Storage Backend
Add a new storage backend to the repository.
POST /api/v1/repositories/storage
Request Body:
{
"name": "warehouse",
"backend": "postgresql",
"config": {
"url": "postgresql://user:pass@warehouse-host/analytics"
}
}
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Storage configuration name |
backend | string | Yes | Backend type (see below) |
config | object | Yes | Backend-specific configuration |
Supported Backend Types:
| Backend | Value | Required Config Fields |
|---|---|---|
| PostgreSQL | postgresql | url or (host, database) |
| MySQL | mysql | url or (host, database) |
| SQL Server | mssql | url or (host, database) |
| SQLite | sqlite | path |
| S3 | s3 | bucket |
| Azure Blob | azure | account, container |
| GCS | gcs | bucket |
| Local Filesystem | local | path |
Backend Configuration Examples:
// PostgreSQL
{
"name": "prod-pg",
"backend": "postgresql",
"config": {
"host": "db.example.com",
"port": 5432,
"database": "production",
"schema": "public"
}
}
// MySQL
{
"name": "analytics-mysql",
"backend": "mysql",
"config": {
"host": "mysql.example.com",
"port": 3306,
"database": "analytics"
}
}
// SQL Server
{
"name": "warehouse-mssql",
"backend": "mssql",
"config": {
"host": "sqlserver.example.com",
"port": 1433,
"database": "warehouse"
}
}
// SQLite
{
"name": "local-sqlite",
"backend": "sqlite",
"config": {
"path": "/data/local.db"
}
}
// S3
{
"name": "datalake",
"backend": "s3",
"config": {
"bucket": "company-datalake",
"region": "us-west-2",
"prefix": "epoch/"
}
}
// Azure Blob
{
"name": "azure-store",
"backend": "azure",
"config": {
"account": "myaccount",
"container": "data",
"prefix": "epoch/"
}
}
// GCS
{
"name": "gcs-store",
"backend": "gcs",
"config": {
"bucket": "my-bucket",
"project": "my-project",
"prefix": "epoch/"
}
}
// Local Filesystem
{
"name": "local-fs",
"backend": "local",
"config": {
"path": "/data/epoch"
}
}
Response: 201 Created
{
"status": "created",
"name": "warehouse"
}
curl Examples:
# PostgreSQL
curl -X POST http://localhost:8000/api/v1/repositories/storage \
-H "Content-Type: application/json" \
-d '{
"name": "warehouse",
"backend": "postgresql",
"config": {"host": "db.example.com", "database": "warehouse"}
}'
# S3
curl -X POST http://localhost:8000/api/v1/repositories/storage \
-H "Content-Type: application/json" \
-d '{
"name": "datalake",
"backend": "s3",
"config": {"bucket": "company-datalake", "region": "us-west-2"}
}'
# Azure Blob
curl -X POST http://localhost:8000/api/v1/repositories/storage \
-H "Content-Type: application/json" \
-d '{
"name": "azure-store",
"backend": "azure",
"config": {"account": "myaccount", "container": "data"}
}'
Track Table
Start tracking a table for version control.
POST /api/v1/repositories/tables/track
Request Body:
{
"table_name": "users",
"storage_name": "primary",
"primary_key": ["id"]
}
| Field | Type | Required | Description |
|---|---|---|---|
table_name | string | Yes | Table name to track |
storage_name | string | Yes | Storage backend containing the table |
primary_key | array | Yes | Primary key column(s) |
Response: 201 Created
{
"status": "tracking",
"table": "users"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/repositories/tables/track \
-H "Content-Type: application/json" \
-d '{
"table_name": "users",
"storage_name": "primary",
"primary_key": ["id"]
}'
Branch Endpoints
Create, list, and manage branches.
Create Branch
Create a new branch.
POST /api/v1/branches
Request Body:
{
"name": "feature/add-preferences",
"start_point": "main",
"checkout": true
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Branch name (min 1 character) |
start_point | string | No | HEAD | Commit ID or branch to start from |
checkout | boolean | No | true | Whether to checkout the new branch |
Response: 201 Created
{
"name": "feature/add-preferences",
"head_commit_id": "abc123def456789",
"is_protected": false,
"created_at": "2024-12-18T10:30:00Z"
}
curl Example:
# Create and checkout a new branch
curl -X POST http://localhost:8000/api/v1/branches \
-H "Content-Type: application/json" \
-d '{"name": "feature/add-preferences"}'
# Create from specific commit without checkout
curl -X POST http://localhost:8000/api/v1/branches \
-H "Content-Type: application/json" \
-d '{"name": "hotfix/urgent", "start_point": "abc123", "checkout": false}'
List Branches
List all branches in the repository.
GET /api/v1/branches
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
repository | string | No | Repository name |
Response:
{
"branches": [
{
"name": "main",
"head_commit_id": "abc123def456789",
"is_protected": true,
"created_at": "2024-12-01T00:00:00Z"
},
{
"name": "feature/add-preferences",
"head_commit_id": "def456abc789012",
"is_protected": false,
"created_at": "2024-12-18T10:30:00Z"
}
],
"current": "main"
}
curl Example:
curl http://localhost:8000/api/v1/branches
Checkout Branch
Switch to a different branch.
POST /api/v1/branches/{name}/checkout
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
name | string | Branch name to checkout |
Response:
{
"status": "checked_out",
"branch": "feature/add-preferences",
"commit_id": "def456abc789012"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/branches/feature%2Fadd-preferences/checkout
Note: URL-encode branch names containing / (e.g., feature/xyz becomes feature%2Fxyz).
Delete Branch
Delete a branch.
DELETE /api/v1/branches/{name}
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
name | string | Branch name to delete |
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
force | boolean | No | false | Force delete even if unmerged |
repository | string | No | - | Repository name |
Response: 204 No Content
curl Example:
# Normal delete (fails if unmerged)
curl -X DELETE http://localhost:8000/api/v1/branches/feature%2Fadd-preferences
# Force delete
curl -X DELETE "http://localhost:8000/api/v1/branches/feature%2Fadd-preferences?force=true"
Compare Branches
Compare two branches to see how they’ve diverged.
GET /api/v1/branches/compare
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
source | string | Yes | Source branch |
target | string | Yes | Target branch |
repository | string | No | Repository name |
Response:
{
"source_branch": "feature/add-preferences",
"target_branch": "main",
"ahead_count": 3,
"behind_count": 1,
"can_fast_forward": false,
"merge_base_id": "abc123def456789"
}
curl Example:
curl "http://localhost:8000/api/v1/branches/compare?source=feature/add-preferences&target=main"
Merge Branch
Merge a branch into the current branch.
POST /api/v1/branches/merge
Request Body:
{
"source": "feature/add-preferences",
"strategy": "THREE_WAY",
"message": "Merge feature/add-preferences into main",
"no_commit": false
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
source | string | Yes | - | Source branch to merge from |
strategy | string | No | THREE_WAY | Merge strategy: THREE_WAY, FAST_FORWARD, OURS, THEIRS |
message | string | No | auto | Custom merge commit message |
no_commit | boolean | No | false | Stage but don’t commit merge |
Response:
{
"source_branch": "feature/add-preferences",
"target_branch": "main",
"is_fast_forward": false,
"has_conflicts": false,
"conflict_count": 0,
"result_commit_id": "ghi789jkl012345"
}
Response (with conflicts): 409 Conflict
{
"error": "MergeConflictError",
"message": "Merge conflict in 1 table(s)",
"details": {
"conflict_tables": ["users"],
"conflict_count": 3
}
}
curl Examples:
# Standard merge
curl -X POST http://localhost:8000/api/v1/branches/merge \
-H "Content-Type: application/json" \
-d '{"source": "feature/add-preferences"}'
# Fast-forward only
curl -X POST http://localhost:8000/api/v1/branches/merge \
-H "Content-Type: application/json" \
-d '{"source": "feature/add-preferences", "strategy": "FAST_FORWARD"}'
# Merge without committing
curl -X POST http://localhost:8000/api/v1/branches/merge \
-H "Content-Type: application/json" \
-d '{"source": "feature/add-preferences", "no_commit": true}'
Abort Merge
Abort an in-progress merge operation.
POST /api/v1/branches/merge/abort
Response:
{
"status": "aborted"
}
curl Example:
curl -X POST http://localhost:8000/api/v1/branches/merge/abort
Commit Endpoints
Create commits and view commit history.
Create Commit
Create a new commit with staged changes.
POST /api/v1/commits
Request Body:
{
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com"
}
| Field | Type | Required | Description |
|---|---|---|---|
message | string | Yes | Commit message (min 1 character) |
author_name | string | No | Author name (uses config default) |
author_email | string | No | Author email (uses config default) |
Response: 201 Created
{
"id": "abc123def456789abcdef0123456789abcdef01",
"short_id": "abc123d",
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com",
"created_at": "2024-12-18T10:45:00Z",
"parent_count": 1
}
curl Example:
curl -X POST http://localhost:8000/api/v1/commits \
-H "Content-Type: application/json" \
-d '{"message": "Add user preferences feature"}'
List Commits
Get commit history.
GET /api/v1/commits
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
ref | string | No | HEAD | Branch or commit reference |
limit | integer | No | 100 | Max commits to return (1-1000) |
offset | integer | No | 0 | Offset for pagination |
repository | string | No | - | Repository name |
Response:
{
"commits": [
{
"id": "abc123def456789abcdef0123456789abcdef01",
"short_id": "abc123d",
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com",
"created_at": "2024-12-18T10:45:00Z",
"parent_count": 1
}
],
"total": 42,
"has_more": true
}
curl Examples:
# Get recent commits
curl http://localhost:8000/api/v1/commits
# Get commits from specific branch
curl "http://localhost:8000/api/v1/commits?ref=feature/add-preferences"
# Pagination
curl "http://localhost:8000/api/v1/commits?limit=10&offset=20"
Get Commit
Get details of a specific commit.
GET /api/v1/commits/{commit_id}
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
commit_id | string | Full or short commit ID |
Response:
{
"id": "abc123def456789abcdef0123456789abcdef01",
"short_id": "abc123d",
"message": "Add user preferences feature",
"author_name": "Jane Developer",
"author_email": "jane@example.com",
"created_at": "2024-12-18T10:45:00Z",
"parent_count": 1
}
curl Example:
curl http://localhost:8000/api/v1/commits/abc123def456789
Get Diff
Show differences between commits or branches.
GET /api/v1/commits/diff
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
base | string | No | Base commit/branch (defaults to parent) |
target | string | No | Target commit/branch (defaults to HEAD) |
repository | string | No | Repository name |
Response:
{
"base_commit_id": "abc123def456789",
"target_commit_id": "def456ghi789012",
"tables_changed": 2,
"total_inserts": 15,
"total_updates": 8,
"total_deletes": 3
}
curl Examples:
# Diff working state vs HEAD
curl http://localhost:8000/api/v1/commits/diff
# Diff between branches
curl "http://localhost:8000/api/v1/commits/diff?base=main&target=feature/add-preferences"
# Diff between commits
curl "http://localhost:8000/api/v1/commits/diff?base=abc123&target=def456"
Stage All Changes
Stage all pending changes for commit.
POST /api/v1/commits/stage
Response:
{
"status": "staged",
"count": 5
}
curl Example:
curl -X POST http://localhost:8000/api/v1/commits/stage
Unstage All Changes
Remove all changes from the staging area.
POST /api/v1/commits/unstage
Response:
{
"status": "unstaged",
"count": 5
}
curl Example:
curl -X POST http://localhost:8000/api/v1/commits/unstage
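A typical write flow chains the staging and commit endpoints. The sketch below assumes a local server and uses the Python requests library (not bundled with Horizon Epoch):
import requests

BASE = "http://localhost:8000/api/v1"

# Stage everything that is pending, then commit it
staged = requests.post(f"{BASE}/commits/stage").json()
print(f"Staged {staged['count']} changes")

commit = requests.post(
    f"{BASE}/commits",
    json={"message": "Add user preferences feature"},
)
commit.raise_for_status()
print("Created commit", commit.json()["short_id"])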
Error Responses
All errors follow a consistent format:
{
"error": "ErrorClassName",
"message": "Human-readable error description",
"details": {
"additional": "context information"
}
}
HTTP Status Codes
| Code | Meaning | When Used |
|---|---|---|
200 | OK | Successful GET/POST operations |
201 | Created | Resource successfully created |
204 | No Content | Successful DELETE operations |
400 | Bad Request | Invalid request format or parameters |
404 | Not Found | Resource not found |
409 | Conflict | Merge conflicts, duplicate resources |
422 | Unprocessable Entity | Validation errors |
503 | Service Unavailable | Database connection issues |
Error Types
| Error | HTTP Status | Description |
|---|---|---|
BranchNotFoundError | 404 | Specified branch doesn’t exist |
CommitNotFoundError | 404 | Specified commit doesn’t exist |
RepositoryNotInitializedError | 404 | Repository not initialized or opened |
BranchAlreadyExistsError | 409 | Branch name already in use |
MergeConflictError | 409 | Merge has unresolved conflicts |
ConnectionFailedError | 503 | Unable to connect to metadata database |
ValidationError | 422 | Request body validation failed |
Example Error Responses
Branch Not Found:
{
"error": "BranchNotFoundError",
"message": "Branch 'feature/nonexistent' not found",
"details": {
"branch": "feature/nonexistent"
}
}
Validation Error (request validation failures are returned in the framework's native detail format):
{
"detail": [
{
"type": "string_too_short",
"loc": ["body", "name"],
"msg": "String should have at least 1 character",
"input": "",
"ctx": {"min_length": 1}
}
]
}
Connection Error:
{
"error": "ConnectionFailedError",
"message": "Failed to connect to metadata database",
"details": {
"url": "postgresql://localhost/horizon_epoch"
}
}
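A minimal client-side handler for these formats might look like the following sketch (Python with requests; the helper name is hypothetical):
import requests

def raise_for_epoch_error(resp):
    """Hypothetical helper: turn an error payload into a Python exception."""
    if resp.status_code < 400:
        return resp
    body = resp.json()
    # Standard errors carry "error"/"message"; 422 validation errors carry "detail"
    if "error" in body:
        raise RuntimeError(f"{body['error']}: {body['message']}")
    raise RuntimeError(f"Validation failed: {body.get('detail')}")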
Pagination
List endpoints support pagination via limit and offset query parameters.
GET /api/v1/commits?limit=10&offset=20
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
limit | integer | 100 | 1-1000 | Maximum items per page |
offset | integer | 0 | 0+ | Number of items to skip |
Paginated responses include metadata:
{
"items": [...],
"total": 150,
"has_more": true
}
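For example, a client can walk the full commit history by advancing offset until has_more is false. This is a sketch only (Python requests, local server assumed):
import requests

BASE = "http://localhost:8000/api/v1"

offset, limit = 0, 100
while True:
    page = requests.get(f"{BASE}/commits",
                        params={"limit": limit, "offset": offset}).json()
    for commit in page["commits"]:
        print(commit["short_id"], commit["message"])
    if not page["has_more"]:
        break
    offset += limit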
OpenAPI Specification
The API provides auto-generated OpenAPI 3.0 documentation:
| Endpoint | Description |
|---|---|
GET /openapi.json | Raw OpenAPI specification |
GET /docs | Swagger UI (interactive, debug mode only) |
GET /redoc | ReDoc (alternative UI, debug mode only) |
Enable debug mode to access Swagger UI:
EPOCH_DEBUG=true uv run uvicorn horizon_epoch.api:app --reload
Then visit: http://localhost:8000/docs
Export OpenAPI Spec
You can export the OpenAPI specification using the provided script:
# Export as JSON
cd python
uv run python scripts/export_openapi.py --output ../docs/openapi.json
# Export as YAML
uv run python scripts/export_openapi.py --format yaml --output ../docs/openapi.yaml
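If the server is already running, the same specification can also be fetched over HTTP (availability may depend on server settings); the sketch below simply downloads /openapi.json with Python requests:
import json
import requests

spec = requests.get("http://localhost:8000/openapi.json").json()
print(spec["info"]["title"], spec["info"]["version"])
with open("openapi.json", "w") as f:
    json.dump(spec, f, indent=2)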
Environment Configuration
The API can be configured via environment variables:
| Variable | Default | Description |
|---|---|---|
EPOCH_METADATA_URL | postgresql://localhost:5432/horizon_epoch | Metadata database URL |
EPOCH_DEFAULT_BRANCH | main | Default branch name |
EPOCH_API_PREFIX | /api/v1 | API route prefix |
EPOCH_DEBUG | false | Enable debug mode (exposes Swagger UI) |
EPOCH_LOG_LEVEL | INFO | Logging level |
EPOCH_AUTHOR_NAME | - | Default author name for commits |
EPOCH_AUTHOR_EMAIL | - | Default author email for commits |
Example:
export EPOCH_METADATA_URL="postgresql://user:pass@localhost:5432/horizon_epoch"
export EPOCH_DEBUG=true
export EPOCH_AUTHOR_NAME="API User"
export EPOCH_AUTHOR_EMAIL="api@example.com"
uv run uvicorn horizon_epoch.api:app --host 0.0.0.0 --port 8000
Rate Limiting
Current Status: Rate limiting is not yet implemented.
When rate limiting is implemented, limits will be returned in response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1705312200
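Clients that want to be forward-compatible can already read these headers defensively; the sketch below (Python requests) treats them as optional, since they are not sent today:
import requests

resp = requests.get("http://localhost:8000/api/v1/commits")
remaining = resp.headers.get("X-RateLimit-Remaining")
if remaining is not None and int(remaining) == 0:
    print("Rate limited; retry after", resp.headers.get("X-RateLimit-Reset"))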
CORS
Cross-Origin Resource Sharing is enabled for all origins by default. For production, configure allowed origins appropriately.
Current configuration:
- Origins: * (all)
- Methods: All HTTP methods
- Headers: All headers
- Credentials: Allowed
See Also
- CLI Reference - Command-line interface documentation
- Python SDK - Python client library
- Configuration Reference - All configuration options
Configuration Reference
Complete reference for Horizon Epoch configuration options.
Configuration Sources
Configuration is loaded from (in order of precedence):
- Command-line arguments (highest priority)
- Environment variables
- Local repository configuration (.epoch/config)
- Configuration file (epoch.toml)
- Default values (lowest priority)
Configuration File Search Order
The CLI searches for configuration files in this order:
- ./epoch.toml (current directory)
- ./config/epoch.toml
- ./.epoch/config.toml
- ~/.config/epoch/config.toml
- /etc/epoch/config.toml
The first file found is used. Override with --config or EPOCH_CONFIG_FILE.
Configuration File
Default location: ~/.epoch/config.toml or ./epoch.toml
Full Example
# epoch.toml
[metadata]
url = "postgresql://localhost/horizon_epoch"
pool_size = 10
connect_timeout = 30
[defaults]
branch = "main"
author_name = "Your Name"
author_email = "you@example.com"
format = "text"
[logging]
level = "info"
format = "pretty" # or "json"
file = "/var/log/epoch/epoch.log"
# PostgreSQL storage backends
[storage.postgres.production]
url = "postgresql://user:pass@prod-db.example.com/production"
pool_size = 20
connect_timeout = 30
[storage.postgres.staging]
host = "staging-db.example.com"
port = 5432
database = "staging"
username = "epoch_user"
password_file = "/run/secrets/db-password"
sslmode = "verify-full"
ssl_rootcert = "/etc/ssl/certs/ca.crt"
[storage.postgres.secure]
host = "secure-db.internal"
database = "secure"
vault_path = "secret/data/secure-db"
[storage.postgres.secure.ssh_tunnel]
host = "bastion.example.com"
user = "ubuntu"
key_file = "~/.ssh/id_rsa"
# S3 storage backends
[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
[storage.s3.partner]
bucket = "partner-shared"
region = "us-west-2"
assume_role_arn = "arn:aws:iam::123456789012:role/PartnerAccess"
external_id = "partner-123"
[storage.s3.minio]
bucket = "local-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"
force_path_style = true
# Vault integration
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
ca_cert = "/etc/ssl/certs/vault-ca.crt"
# AWS integration
[aws]
region = "us-east-1"
profile = "production"
# Credential caching
[credentials]
cache_enabled = true
cache_ttl = 300
refresh_before_expiry = 60
Environment Variables
Most configuration can be set via environment variables, typically using the EPOCH_ prefix:
Core Settings
| Variable | Description | Default |
|---|---|---|
EPOCH_METADATA_URL | Metadata database URL | - |
DATABASE_URL | Fallback for metadata URL (SQLx compatible) | - |
EPOCH_CONFIG_FILE | Config file path | ~/.epoch/config.toml |
EPOCH_DEFAULT_BRANCH | Default branch name | main |
EPOCH_FORMAT | Output format | text |
Author Settings
| Variable | Description | Default |
|---|---|---|
EPOCH_AUTHOR_NAME | Default author name for commits | git config user.name |
EPOCH_AUTHOR_EMAIL | Default author email for commits | git config user.email |
Author information is resolved in this priority order:
- Command-line arguments (--author-name, --author-email)
- Environment variables (EPOCH_AUTHOR_NAME, EPOCH_AUTHOR_EMAIL)
- Local repository config (.epoch/config)
- Git configuration (git config user.name, git config user.email)
Editor Settings
| Variable | Description | Default |
|---|---|---|
EPOCH_EDITOR | Editor for interactive operations | - |
EDITOR | Fallback editor | - |
VISUAL | Secondary fallback editor | - |
Editor is selected in priority order: EPOCH_EDITOR > EDITOR > VISUAL > platform default (vi on Unix, notepad.exe on Windows).
Commit Signing
| Variable | Description | Default |
|---|---|---|
EPOCH_SIGNING_KEY | Path to signing key file (ASCII-armored) | - |
EPOCH_SIGNING_KEY_PASSWORD | Password for signing key | - (prompts) |
Logging
| Variable | Description | Default |
|---|---|---|
EPOCH_LOG_LEVEL | Log level (error/warn/info/debug/trace) | info |
EPOCH_LOG_FORMAT | Log format (pretty/json) | pretty |
EPOCH_LOG_FILE | Log file path | - |
RUST_LOG | Rust logging filter (module=level format) | - |
API Server
| Variable | Description | Default |
|---|---|---|
EPOCH_API_HOST | API server bind address | 0.0.0.0 |
EPOCH_API_PORT | API server port | 8000 |
EPOCH_API_PREFIX | API route prefix | /api/v1 |
EPOCH_DEBUG | Enable debug mode (exposes Swagger UI) | false |
EPOCH_SERVER_URL | Remote server URL for CLI | - |
OAuth/OIDC Authentication
| Variable | Description |
|---|---|
EPOCH_OAUTH_CLIENT_ID | Default OAuth client ID |
EPOCH_OAUTH_CLIENT_SECRET | Default OAuth client secret |
EPOCH_OAUTH_{PROVIDER}_CLIENT_ID | Provider-specific client ID |
EPOCH_OAUTH_{PROVIDER}_CLIENT_SECRET | Provider-specific client secret |
EPOCH_OAUTH_OIDC_ISSUER | OIDC issuer URL for custom provider |
Supported providers: GITHUB, GOOGLE, AZURE, OKTA, OIDC (custom).
Example:
# GitHub OAuth
export EPOCH_OAUTH_GITHUB_CLIENT_ID="your-client-id"
export EPOCH_OAUTH_GITHUB_CLIENT_SECRET="your-client-secret"
epoch auth login --method oauth --provider github
# Custom OIDC provider
export EPOCH_OAUTH_CLIENT_ID="your-client-id"
export EPOCH_OAUTH_OIDC_ISSUER="https://your-idp.example.com"
epoch auth login --method oauth --provider oidc
PostgreSQL
| Variable | Description |
|---|---|
EPOCH_POSTGRES_{NAME}_URL | Full connection URL |
EPOCH_POSTGRES_{NAME}_HOST | Database host |
EPOCH_POSTGRES_{NAME}_PORT | Database port |
EPOCH_POSTGRES_{NAME}_DATABASE | Database name |
EPOCH_POSTGRES_{NAME}_USERNAME | Username |
EPOCH_POSTGRES_{NAME}_PASSWORD | Password |
EPOCH_POSTGRES_{NAME}_SSLMODE | SSL mode |
S3
| Variable | Description |
|---|---|
EPOCH_S3_{NAME}_BUCKET | Bucket name |
EPOCH_S3_{NAME}_REGION | AWS region |
EPOCH_S3_{NAME}_ENDPOINT | Custom endpoint |
EPOCH_S3_{NAME}_ACCESS_KEY | Access key ID |
EPOCH_S3_{NAME}_SECRET_KEY | Secret access key |
AWS_ACCESS_KEY_ID | Default AWS access key |
AWS_SECRET_ACCESS_KEY | Default AWS secret key |
AWS_DEFAULT_REGION | Default AWS region |
MySQL
| Variable | Description |
|---|---|
EPOCH_MYSQL_{NAME}_URL | Full connection URL |
EPOCH_MYSQL_{NAME}_HOST | Database host |
EPOCH_MYSQL_{NAME}_PORT | Database port |
EPOCH_MYSQL_{NAME}_DATABASE | Database name |
EPOCH_MYSQL_{NAME}_USERNAME | Username |
EPOCH_MYSQL_{NAME}_PASSWORD | Password |
SQL Server
| Variable | Description |
|---|---|
EPOCH_MSSQL_{NAME}_URL | Full connection URL |
EPOCH_MSSQL_{NAME}_HOST | Database host |
EPOCH_MSSQL_{NAME}_PORT | Database port |
EPOCH_MSSQL_{NAME}_DATABASE | Database name |
EPOCH_MSSQL_{NAME}_USERNAME | Username |
EPOCH_MSSQL_{NAME}_PASSWORD | Password |
SQLite
| Variable | Description |
|---|---|
EPOCH_SQLITE_{NAME}_PATH | Database file path |
Azure Blob Storage
| Variable | Description |
|---|---|
EPOCH_AZURE_{NAME}_ACCOUNT | Storage account name |
EPOCH_AZURE_{NAME}_CONTAINER | Container name |
EPOCH_AZURE_{NAME}_CONNECTION_STRING | Connection string |
EPOCH_AZURE_{NAME}_ACCESS_KEY | Storage access key |
AZURE_TENANT_ID | Azure AD tenant ID |
AZURE_CLIENT_ID | Azure AD client ID |
AZURE_CLIENT_SECRET | Azure AD client secret |
Google Cloud Storage
| Variable | Description |
|---|---|
EPOCH_GCS_{NAME}_BUCKET | Bucket name |
EPOCH_GCS_{NAME}_PROJECT | GCP project ID |
GOOGLE_APPLICATION_CREDENTIALS | Service account key file path |
Local Filesystem
| Variable | Description |
|---|---|
EPOCH_LOCAL_{NAME}_PATH | Base directory path |
Vault
| Variable | Description |
|---|---|
VAULT_ADDR | Vault server address |
VAULT_TOKEN | Vault token |
VAULT_ROLE_ID | AppRole role ID |
VAULT_SECRET_ID | AppRole secret ID |
SSH
| Variable | Description |
|---|---|
EPOCH_SSH_HOST | SSH tunnel host |
EPOCH_SSH_USER | SSH username |
EPOCH_SSH_KEY_FILE | SSH private key path |
SSH_AUTH_SOCK | SSH agent socket |
Configuration Sections
[metadata]
Metadata database configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | PostgreSQL connection URL |
pool_size | int | 10 | Connection pool size |
connect_timeout | int | 30 | Connection timeout (seconds) |
idle_timeout | int | 600 | Idle connection timeout |
max_lifetime | int | 3600 | Max connection lifetime |
[defaults]
Default values for operations.
| Option | Type | Default | Description |
|---|---|---|---|
branch | string | “main” | Default branch |
author_name | string | - | Default author name |
author_email | string | - | Default author email |
format | string | “text” | Output format |
[logging]
Logging configuration.
| Option | Type | Default | Description |
|---|---|---|---|
level | string | “info” | Log level |
format | string | “pretty” | Log format |
file | string | - | Log file path |
json | bool | false | JSON log format |
[storage.postgres.{name}]
PostgreSQL backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | Full connection URL |
host | string | - | Database host |
port | int | 5432 | Database port |
database | string | - | Database name |
username | string | - | Username |
password | string | - | Password |
password_file | string | - | Password file path |
sslmode | string | “prefer” | SSL mode |
ssl_cert | string | - | Client certificate |
ssl_key | string | - | Client key |
ssl_rootcert | string | - | CA certificate |
pool_size | int | 10 | Pool size |
connect_timeout | int | 30 | Timeout |
vault_path | string | - | Vault secret path |
vault_role | string | - | Vault dynamic role |
aws_secret_id | string | - | AWS Secrets Manager ID |
use_iam_auth | bool | false | Use RDS IAM auth |
[storage.postgres.{name}.ssh_tunnel]
SSH tunnel configuration for PostgreSQL.
| Option | Type | Default | Description |
|---|---|---|---|
host | string | - | Bastion host |
port | int | 22 | SSH port |
user | string | - | SSH username |
key_file | string | - | Private key path |
key_passphrase | string | - | Key passphrase |
use_agent | bool | false | Use SSH agent |
known_hosts_file | string | - | Known hosts file |
strict_host_key_checking | bool/string | true | Host key checking |
keepalive_interval | int | 60 | Keepalive interval |
connect_timeout | int | 30 | Connection timeout |
auto_reconnect | bool | true | Auto-reconnect |
[storage.s3.{name}]
S3 backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
bucket | string | - | Bucket name |
region | string | - | AWS region |
endpoint | string | - | Custom endpoint |
prefix | string | “” | Key prefix |
access_key | string | - | Access key ID |
secret_key | string | - | Secret access key |
session_token | string | - | Session token |
force_path_style | bool | false | Path-style URLs |
use_instance_credentials | bool | false | Use instance profile |
assume_role_arn | string | - | Role to assume |
external_id | string | - | External ID |
vault_path | string | - | Vault secret path |
aws_secret_id | string | - | Secrets Manager ID |
[storage.mysql.{name}]
MySQL backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | Full connection URL |
host | string | - | Database host |
port | int | 3306 | Database port |
database | string | - | Database name |
username | string | - | Username |
password | string | - | Password |
password_file | string | - | Password file path |
ssl_mode | string | “preferred” | SSL mode |
ssl_ca | string | - | CA certificate |
pool_size | int | 10 | Pool size |
connect_timeout | int | 30 | Timeout |
vault_path | string | - | Vault secret path |
[storage.mysql.{name}.ssh_tunnel]
SSH tunnel configuration for MySQL (same options as PostgreSQL).
[storage.mssql.{name}]
SQL Server backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
url | string | - | Full connection URL |
host | string | - | Database host |
port | int | 1433 | Database port |
database | string | - | Database name |
username | string | - | Username |
password | string | - | Password |
password_file | string | - | Password file path |
trust_cert | bool | false | Trust server certificate |
encrypt | bool | true | Encrypt connection |
pool_size | int | 10 | Pool size |
connect_timeout | int | 30 | Timeout |
vault_path | string | - | Vault secret path |
use_windows_auth | bool | false | Use Windows auth |
[storage.sqlite.{name}]
SQLite backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
path | string | - | Database file path |
mode | string | “rwc” | Open mode (r/rw/rwc) |
in_memory | bool | false | Use in-memory database |
busy_timeout | int | 5000 | Busy timeout (ms) |
journal_mode | string | “wal” | Journal mode |
[storage.azure.{name}]
Azure Blob Storage backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
account | string | - | Storage account name |
container | string | - | Container name |
prefix | string | “” | Blob prefix |
auth_method | string | “connection_string” | Auth method |
connection_string | string | - | Connection string |
access_key | string | - | Storage access key |
sas_token | string | - | SAS token |
tenant_id | string | - | Azure AD tenant ID |
client_id | string | - | Azure AD client ID |
client_secret | string | - | Azure AD client secret |
use_managed_identity | bool | false | Use managed identity |
vault_path | string | - | Vault secret path |
[storage.gcs.{name}]
Google Cloud Storage backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
bucket | string | - | Bucket name |
project_id | string | - | GCP project ID |
prefix | string | “” | Object prefix |
auth_method | string | “service_account” | Auth method |
credentials_file | string | - | Service account key file |
use_adc | bool | false | Use Application Default Credentials |
vault_path | string | - | Vault secret path |
[storage.local.{name}]
Local filesystem backend configuration.
| Option | Type | Default | Description |
|---|---|---|---|
path | string | - | Base directory path |
create_dirs | bool | true | Create directories if missing |
[vault]
HashiCorp Vault configuration.
| Option | Type | Default | Description |
|---|---|---|---|
addr | string | - | Vault address |
token | string | - | Vault token |
auth_method | string | “token” | Auth method |
role_id | string | - | AppRole role ID |
secret_id | string | - | AppRole secret ID |
secret_id_file | string | - | Secret ID file |
role | string | - | K8s/AWS role |
ca_cert | string | - | CA certificate |
skip_verify | bool | false | Skip TLS verify |
namespace | string | - | Vault namespace |
[aws]
AWS configuration.
| Option | Type | Default | Description |
|---|---|---|---|
region | string | - | Default region |
profile | string | - | AWS profile |
access_key_id | string | - | Access key |
secret_access_key | string | - | Secret key |
[credentials]
Credential caching configuration.
| Option | Type | Default | Description |
|---|---|---|---|
cache_enabled | bool | true | Enable caching |
cache_ttl | int | 300 | Cache TTL (seconds) |
cache_max_size | int | 100 | Max cache entries |
refresh_before_expiry | int | 60 | Refresh threshold |
Variable Substitution
Environment variables can be referenced in config:
[storage.postgres.production]
password = "${DB_PASSWORD}"
File contents can be referenced:
[storage.postgres.production]
password = "@file:/run/secrets/db-password"
Profiles
Multiple profiles can be defined:
[profiles.development]
metadata.url = "postgresql://localhost/horizon_epoch_dev"
[profiles.production]
metadata.url = "postgresql://prod-db/horizon_epoch"
Switch profiles:
epoch --profile production status
Local Repository Configuration
When you initialize a repository with epoch init, a .epoch/ directory is created with local configuration.
Directory Structure
.epoch/
├── config # Repository configuration (TOML)
├── HEAD # Current branch reference
└── refs/
├── heads/ # Branch references
└── tags/ # Tag references
Local Config File (.epoch/config)
# Repository identity
repository_name = "my-data-repo"
repository_id = "550e8400-e29b-41d4-a716-446655440000"
# Metadata database connection
metadata_url = "postgresql://localhost/horizon_epoch"
# Current branch (also tracked in HEAD file)
current_branch = "main"
# Author information (optional, overrides global config)
[author]
name = "Your Name"
email = "you@example.com"
# Branching configuration (optional)
[branching]
strategy = "environment" # or "gitflow", "trunk"
[branching.promotion]
paths = [
{ source = "development", target = "staging" },
{ source = "staging", target = "production" }
]
require_approval = true
HEAD File
The HEAD file tracks the current branch:
ref: refs/heads/main
Or in detached HEAD state (direct commit reference):
abc123def456...
DDL Configuration
DDL (Data Definition Language) execution settings for schema changes.
[ddl]
| Option | Type | Default | Description |
|---|---|---|---|
lock_timeout_secs | int | 30 | Lock acquisition timeout |
statement_timeout_secs | int | 300 | Statement execution timeout |
check_table_stats | bool | true | Check table size before DDL |
use_concurrent | bool | false | Use concurrent index operations |
warn_large_table_rows | int | 1000000 | Warn if table exceeds rows |
warn_large_table_bytes | int | 1073741824 | Warn if table exceeds bytes (1GB) |
[ddl.retry]
| Option | Type | Default | Description |
|---|---|---|---|
max_retries | int | 3 | Maximum retry attempts |
initial_delay_ms | int | 500 | Initial retry delay |
max_delay_ms | int | 30000 | Maximum retry delay |
backoff_multiplier | float | 2.0 | Exponential backoff factor |
Example:
[ddl]
lock_timeout_secs = 30
statement_timeout_secs = 300
check_table_stats = true
[ddl.retry]
max_retries = 3
initial_delay_ms = 500
max_delay_ms = 30000
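With the retry settings above, delays grow exponentially and are capped at max_delay_ms. Assuming standard exponential backoff (an assumption about the schedule, not a statement about the exact implementation), the delays can be computed as in this Python sketch:
# Illustrative only: assumes delay = initial * multiplier**attempt, capped at max
initial_delay_ms = 500
max_delay_ms = 30_000
backoff_multiplier = 2.0
max_retries = 3

delays = [min(initial_delay_ms * backoff_multiplier**attempt, max_delay_ms)
          for attempt in range(max_retries)]
print(delays)  # [500.0, 1000.0, 2000.0]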
Branching Configuration
[branching]
| Option | Type | Default | Description |
|---|---|---|---|
strategy | string | “environment” | Branching strategy |
default_branch | string | “main” | Default branch name |
protected_branches | array | [“main”, “production”] | Protected branch patterns |
Branching Strategies
environment (default):
- Branches represent environments (development, staging, production)
- Promotion flows upward through environments
- Good for data pipelines and ETL workflows
gitflow:
- Feature branches, develop, release, hotfix, main
- Traditional Git Flow pattern adapted for data
- Good for teams familiar with Git Flow
trunk:
- Single main branch with short-lived feature branches
- Continuous integration style
- Good for fast-moving projects
[branching.promotion]
| Option | Type | Default | Description |
|---|---|---|---|
paths | array | [] | Allowed promotion paths |
require_approval | bool | false | Require approval for promotions |
auto_merge | bool | true | Auto-merge if no conflicts |
conflict_strategy | string | “fail” | Conflict handling (fail/ours/theirs) |
Example:
[branching]
strategy = "environment"
default_branch = "development"
protected_branches = ["staging", "production"]
[branching.promotion]
paths = [
{ source = "development", target = "staging" },
{ source = "staging", target = "production" }
]
require_approval = true
conflict_strategy = "fail"
Conflict Handling
[conflict]
| Option | Type | Default | Description |
|---|---|---|---|
default_strategy | string | “manual” | Default conflict resolution |
auto_resolve_schema | bool | false | Auto-resolve schema conflicts |
[conflict.table_rules]
Per-table conflict rules:
[conflict.table_rules.users]
strategy = "last_write_wins"
timestamp_field = "updated_at"
[conflict.table_rules.audit_log]
strategy = "append_only"
[conflict.field_rules]
Per-field conflict rules:
[conflict.field_rules."users.email"]
strategy = "source_wins"
[conflict.field_rules."users.login_count"]
strategy = "sum"
Strategies: manual, source_wins, target_wins, last_write_wins, sum, max, min, append_only.
Testing and Development
Development Environment Variables
| Variable | Description |
|---|---|
DATABASE_URL | Database URL for tests/development |
TEST_DATABASE_URL | Alternative test database URL |
SQLX_OFFLINE | Enable SQLx offline mode (true) |
S3/MinIO Testing
| Variable | Description |
|---|---|
S3_ENDPOINT | S3/MinIO endpoint URL |
S3_BUCKET | Test bucket name |
S3_ACCESS_KEY | Access key for testing |
S3_SECRET_KEY | Secret key for testing |
E2E Testing
| Variable | Description | Default |
|---|---|---|
EPOCH_BIN | Path to CLI binary for tests | ./target/debug/epoch |
E2E_KEEP_TEMP | Keep temporary test directories | false |
E2E_VERBOSE | Verbose test output | false |
Example Development Setup
# .env file for development
DATABASE_URL=postgresql://horizon:horizon_dev_password@localhost:5432/horizon_epoch
EPOCH_METADATA_URL=postgresql://horizon:horizon_dev_password@localhost:5432/horizon_epoch
# MinIO for local S3 testing
EPOCH_S3_ENDPOINT=http://localhost:9000
EPOCH_S3_ACCESS_KEY=minioadmin
EPOCH_S3_SECRET_KEY=minioadmin
EPOCH_S3_BUCKET=horizon-epoch-data
# Development settings
EPOCH_DEBUG=true
EPOCH_LOG_LEVEL=debug
RUST_LOG=horizon_epoch=debug,sqlx=warn
# Author info
EPOCH_AUTHOR_NAME="Development User"
EPOCH_AUTHOR_EMAIL="dev@example.com"
Complete Configuration Example
# epoch.toml - Complete configuration example
# =============================================================================
# Metadata Database
# =============================================================================
[metadata_db]
url = "postgresql://localhost:5432/horizon_epoch"
max_connections = 10
min_connections = 2
# =============================================================================
# Logging
# =============================================================================
[logging]
level = "info"
format = "json" # "pretty" or "json"
# =============================================================================
# DDL Execution
# =============================================================================
[ddl]
lock_timeout_secs = 30
statement_timeout_secs = 300
check_table_stats = true
use_concurrent = false
[ddl.retry]
max_retries = 3
initial_delay_ms = 500
max_delay_ms = 30000
# =============================================================================
# Branching Strategy
# =============================================================================
[branching]
strategy = "environment"
default_branch = "development"
protected_branches = ["staging", "production"]
[branching.promotion]
paths = [
{ source = "development", target = "staging" },
{ source = "staging", target = "production" }
]
require_approval = true
# =============================================================================
# Storage Backends
# =============================================================================
# PostgreSQL backends
[[storage.postgresql]]
name = "primary"
url = "postgresql://localhost:5432/data"
[[storage.postgresql]]
name = "analytics"
host = "analytics-db.internal"
port = 5432
database = "analytics"
[storage.postgresql.credentials]
type = "vault"
addr = "https://vault.example.com:8200"
secret_path = "database/creds/analytics"
[storage.postgresql.tls]
mode = "verify_full"
ca_cert = "/etc/ssl/certs/ca.pem"
# S3 backends
[[storage.s3]]
name = "datalake"
endpoint = "https://s3.amazonaws.com"
bucket = "company-datalake"
region = "us-east-1"
prefix = "epoch/"
[[storage.s3]]
name = "archive"
endpoint = "http://minio.internal:9000"
bucket = "archive"
access_key_id = "${MINIO_ACCESS_KEY}"
secret_access_key = "${MINIO_SECRET_KEY}"
force_path_style = true
Credential Provider Reference
Complete reference for the Horizon Epoch credential provider abstraction. This document covers the architecture, all available providers, credential types, caching, and error handling.
Architecture Overview
The credential provider system provides a unified abstraction for managing credentials across all storage backends.
┌─────────────────────────────────────────┐
│ Storage Backend │
│ (PostgreSQL, S3, Vault, etc.) │
└─────────────────────┬───────────────────┘
│
│ get_credentials()
▼
┌─────────────────────────────────────────┐
│ CachedCredentialProvider │
│ (optional caching layer) │
└─────────────────────┬───────────────────┘
│
┌───────────────────────────────┼───────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Static │ │ Environment │ │ File │
│ Provider │ │ Provider │ │ Provider │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Vault │ │ AWS │ │ Web Identity │
│ Provider │ │ Provider │ │ Provider │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Core Trait: CredentialProvider
All credential providers implement this async trait:
#[async_trait]
pub trait CredentialProvider: Send + Sync {
/// Retrieves credentials from this provider.
async fn get_credentials(&self) -> CredentialResult<StorageCredentials>;
/// Returns true if credentials need refresh.
fn requires_refresh(&self) -> bool;
/// Refreshes the credentials.
async fn refresh(&self) -> CredentialResult<StorageCredentials>;
/// Returns the recommended refresh interval.
fn refresh_interval(&self) -> Option<Duration>;
/// Returns the provider name for logging.
fn provider_name(&self) -> &str;
/// Returns true if refresh is supported.
fn supports_refresh(&self) -> bool {
self.refresh_interval().is_some()
}
/// Returns metadata for diagnostics.
fn provider_info(&self) -> ProviderInfo;
}
Extension Trait: CredentialProviderExt
Additional utility methods available on all providers:
| Method | Description |
|---|---|
get_fresh_credentials() | Gets credentials, refreshing first if needed |
validate() | Validates that credentials can be retrieved |
Credential Types
StorageCredentials
The unified credential container with expiration tracking:
pub struct StorageCredentials {
/// The specific type of credential
pub credential_type: StorageCredentialType,
/// When these credentials expire (optional)
pub expires_at: Option<DateTime<Utc>>,
/// Additional metadata
pub metadata: HashMap<String, String>,
}
Factory Methods:
| Method | Description |
|---|---|
username_password(user, pass) | Create username/password credentials |
token(token) | Create token-based credentials |
aws(access_key, secret_key, session_token) | Create AWS credentials |
oauth2(access_token, refresh_token, token_type) | Create OAuth2 credentials |
certificate(cert, key) | Create certificate credentials |
Builder Methods:
| Method | Description |
|---|---|
.with_expiration(datetime) | Set expiration time |
.with_ttl(duration) | Set TTL relative to now |
.with_metadata(key, value) | Add metadata |
Inspection Methods:
| Method | Description |
|---|---|
is_expired() | Check if credentials are expired |
time_to_expiry() | Get time until expiration |
is_username_password() | Type check |
is_token() | Type check |
is_aws() | Type check |
is_certificate() | Type check |
is_oauth2() | Type check |
StorageCredentialType
The underlying credential type enum:
| Variant | Fields | Use Case |
|---|---|---|
UsernamePassword | username, password | Database authentication |
Token | token | API key/token authentication |
AwsCredentials | access_key_id, secret_access_key, session_token? | AWS services |
Certificate | cert, key | mTLS client certificates |
OAuth2 | access_token, refresh_token?, token_type | OAuth2 authentication |
Certificate Types
CertificateData:
pub struct CertificateData {
pub pem: SecretString, // PEM-encoded certificate
pub chain: Option<Vec<SecretString>>, // Optional chain
}
PrivateKeyData:
pub struct PrivateKeyData {
pub pem: SecretString, // PEM-encoded private key
pub passphrase: Option<SecretString>, // Optional passphrase
}
Built-in Providers
Static Provider
Pre-configured credentials for testing and development. Never use in production.
Factory Methods:
StaticCredentialProvider::username_password("user", "pass")
StaticCredentialProvider::token("api-key")
StaticCredentialProvider::aws("AKIA...", "secret", None)
StaticCredentialProvider::oauth2("access", Some("refresh"), "Bearer")
StaticCredentialProvider::certificate(cert_data, key_data)
Builder Pattern:
StaticCredentialProviderBuilder::new()
.username("admin")
.password("secret")
.build()
Configuration (TOML):
[storage.postgres.mydb]
# Static credentials embedded in config (dev only!)
url = "postgresql://user:password@localhost/db"
| Property | Required | Description |
|---|---|---|
url | Yes | Full connection URL with credentials |
username | Alternative | Username (with separate password) |
password | With username | Password |
Refresh Support: No
Environment Provider
Reads credentials from environment variables.
Factory Methods:
// PostgreSQL standard: PGUSER, PGPASSWORD
EnvCredentialProvider::postgres_default()
// AWS standard: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
EnvCredentialProvider::aws_default()
// Custom prefix: MYAPP_DB_USERNAME, MYAPP_DB_PASSWORD
EnvCredentialProvider::with_prefix("MYAPP_DB")
Builder Pattern:
EnvCredentialProviderBuilder::new()
.username_var("DB_USER")
.password_var("DB_PASS")
.credential_type(EnvCredentialType::UsernamePassword)
.build()
Environment Variable Mappings:
| Preset | Username | Password | Access Key | Secret Key | Session Token |
|---|---|---|---|---|---|
postgres_default() | PGUSER | PGPASSWORD | - | - | - |
mysql_default() | MYSQL_USER | MYSQL_PASSWORD | - | - | - |
mssql_default() | MSSQL_USER | MSSQL_PASSWORD | - | - | - |
aws_default() | - | - | AWS_ACCESS_KEY_ID | AWS_SECRET_ACCESS_KEY | AWS_SESSION_TOKEN |
with_prefix("X") | X_USERNAME | X_PASSWORD | X_ACCESS_KEY | X_SECRET_KEY | X_SESSION_TOKEN |
Configuration (TOML):
[storage.postgres.mydb]
url = "${EPOCH_POSTGRES_MYDB_URL}" # Variable substitution
# Or individual fields
host = "db.example.com"
username = "${DB_USER}"
password = "${DB_PASS}"
Refresh Support: Yes (re-reads environment on refresh)
File Provider
Reads credentials from files in multiple formats.
Factory Methods:
FileCredentialProvider::json("/etc/secrets/db.json")
FileCredentialProvider::yaml("/etc/secrets/db.yaml")
FileCredentialProvider::ini("/etc/secrets/db.ini")
FileCredentialProvider::env("/etc/secrets/.env")
FileCredentialProvider::aws_credentials() // ~/.aws/credentials
FileCredentialProvider::certificate("/path/cert.pem", "/path/key.pem")
Builder Pattern:
FileCredentialProviderBuilder::new()
.path("/etc/secrets/creds.json")
.format(CredentialFileFormat::Json)
.profile("production") // For INI files
.build()
Supported Formats:
| Format | Extension | Example |
|---|---|---|
| JSON | .json | {"username": "x", "password": "y"} |
| YAML | .yaml, .yml | username: x, password: y |
| INI | .ini, .cfg | [default] username = x |
| ENV | .env | USERNAME=x, PASSWORD=y |
| PEM | .pem, .crt, .key | Certificate/key files |
Configuration (TOML):
[storage.postgres.mydb]
password_file = "/run/secrets/db-password"
# Or JSON credentials file
credentials_file = "/etc/secrets/db.json"
Refresh Support: Yes (re-reads file on refresh)
AWS Provider
Retrieves credentials from AWS services.
Secrets Manager Provider
SecretsManagerProvider::new("my-secret-id")
.region("us-east-1")
.field_mapping(SecretFieldMapping::default())
.build()
Configuration (TOML):
[storage.postgres.mydb]
aws_secret_id = "prod/db/credentials"
[aws]
region = "us-east-1"
| Property | Required | Description |
|---|---|---|
aws_secret_id | Yes | Secrets Manager secret ID or ARN |
region | No | AWS region (default: from environment) |
RDS IAM Provider
Generates temporary IAM authentication tokens for RDS.
RdsIamProvider::new()
.host("mydb.cluster-xxx.us-east-1.rds.amazonaws.com")
.port(5432)
.username("iam_user")
.region("us-east-1")
.build()
Configuration (TOML):
[storage.postgres.mydb]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
use_iam_auth = true
username = "iam_user"
| Property | Required | Description |
|---|---|---|
use_iam_auth | Yes | Enable RDS IAM authentication |
host | Yes | RDS endpoint |
username | Yes | IAM database user |
Credential Sources
pub enum AwsCredentialSource {
/// Default AWS credential chain
DefaultChain,
/// Explicit credentials
Explicit {
access_key_id: String,
secret_access_key: String,
session_token: Option<String>,
},
/// Assume a role
AssumeRole {
role_arn: String,
external_id: Option<String>,
session_name: Option<String>,
session_tags: Option<HashMap<String, String>>,
},
/// Named AWS profile
Profile { name: String },
/// Web Identity (OIDC)
WebIdentity { /* ... */ },
}
Refresh Support: Yes (automatic token refresh)
Vault Provider
Retrieves credentials from HashiCorp Vault.
VaultCredentialProvider::new(VaultConfig {
addr: "https://vault.example.com:8200".to_string(),
namespace: Some("my-namespace".to_string()),
auth: VaultAuthMethod::AppRole {
role_id: "xxx".to_string(),
secret_id: "yyy".to_string(),
},
tls: Some(VaultTlsConfig {
ca_cert: Some("/etc/ssl/vault-ca.crt".to_string()),
client_cert: None,
client_key: None,
}),
})
.secret_path("secret/data/mydb")
.secret_type(VaultSecretType::KvV2)
.build()
Authentication Methods:
| Method | Configuration |
|---|---|
| Token | auth_method = "token", token = "hvs.xxx" |
| AppRole | auth_method = "approle", role_id, secret_id |
| Kubernetes | auth_method = "kubernetes", role |
| AWS IAM | auth_method = "aws", role |
| TLS | auth_method = "tls", client certificate |
| UserPass | auth_method = "userpass", username, password |
Secret Types:
| Type | Description |
|---|---|
KvV2 | Key-Value v2 secrets engine (static secrets) |
Database | Database secrets engine (dynamic credentials) |
Configuration (TOML):
[vault]
addr = "https://vault.example.com:8200"
auth_method = "approle"
role_id = "xxx-xxx-xxx"
secret_id_file = "/run/secrets/vault-secret-id"
ca_cert = "/etc/ssl/certs/vault-ca.crt"
namespace = "my-namespace"
[storage.postgres.mydb]
vault_path = "secret/data/mydb"
# Or for dynamic credentials:
vault_role = "my-db-role"
| Property | Required | Description |
|---|---|---|
addr | Yes | Vault server address |
auth_method | Yes | Authentication method |
vault_path | For KV | Path to KV secret |
vault_role | For DB | Database role name |
namespace | No | Vault namespace |
Refresh Support: Yes (lease renewal, token refresh)
Vault PKI Provider
Issues X.509 certificates from Vault’s PKI secrets engine.
VaultPkiProvider::new(vault_config)
.pki_path("pki/issue/my-role")
.common_name("myapp.example.com")
.ttl(Duration::from_secs(86400))
.dns_sans(vec!["myapp.example.com", "myapp-alt.example.com"])
.build()
Configuration (TOML):
[storage.postgres.secure]
vault_pki_path = "pki/issue/postgres-client"
vault_pki_common_name = "epoch-client"
vault_pki_ttl = "24h"
| Property | Required | Description |
|---|---|---|
vault_pki_path | Yes | PKI issue path |
vault_pki_common_name | Yes | Certificate CN |
vault_pki_ttl | No | Certificate TTL |
vault_pki_dns_sans | No | DNS Subject Alt Names |
vault_pki_ip_sans | No | IP Subject Alt Names |
Refresh Support: Yes (automatic certificate renewal)
Web Identity Provider
Uses OIDC tokens for AWS authentication.
WebIdentityCredentialProvider::new(WebIdentityConfig {
role_arn: "arn:aws:iam::123456789012:role/MyRole".to_string(),
web_identity_token_file: Some("/var/run/secrets/token".to_string()),
session_name: Some("my-session".to_string()),
provider_url: Some("https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E".to_string()),
duration_seconds: Some(3600),
policy_arns: None,
inline_policy: None,
})
.build()
Supported OIDC Providers:
| Provider | Token Source |
|---|---|
| GitHub Actions | GITHUB_TOKEN or ACTIONS_ID_TOKEN_REQUEST_* |
| GitLab CI/CD | CI_JOB_JWT |
| Azure AD / Entra ID | Azure managed identity |
| Google Cloud | GCP service account |
| Kubernetes (EKS/GKE) | Service account token file |
| Custom | Token file or direct token |
Configuration (TOML):
[aws]
role_arn = "arn:aws:iam::123456789012:role/MyRole"
web_identity_token_file = "/var/run/secrets/token"
Refresh Support: Yes (token refresh on expiration)
SSO Provider
Integrates with AWS IAM Identity Center (formerly AWS SSO).
SsoCredentialProvider::new(SsoConfig {
start_url: "https://my-sso-portal.awsapps.com/start".to_string(),
region: "us-east-1".to_string(),
account_id: "123456789012".to_string(),
role_name: "MyRole".to_string(),
})
.build()
Uses the OAuth 2.0 Device Authorization Grant flow for browser-based authentication.
Configuration (TOML):
[aws.sso]
start_url = "https://my-sso-portal.awsapps.com/start"
region = "us-east-1"
account_id = "123456789012"
role_name = "MyRole"
Refresh Support: Yes (automatic token refresh, cached tokens)
Credential Caching
Wrap any provider with CachedCredentialProvider to reduce external calls.
// Simple caching
let cached = CachedCredentialProvider::new(inner_provider);
// With configuration
let cached = CachedCredentialProvider::new(inner_provider)
.with_refresh_buffer(Duration::from_secs(120)) // Refresh 2 min before expiry
.with_max_cache_age(Duration::from_secs(3600)); // Max 1 hour cache
// Using extension trait
let cached = inner_provider.with_cache();
// Using builder
let cached = inner_provider.cached()
.refresh_buffer(Duration::from_secs(60))
.max_cache_age(Duration::from_secs(3600))
.build();
Cache Configuration:
| Parameter | Default | Description |
|---|---|---|
refresh_buffer | 60s | How long before expiry to refresh |
max_cache_age | None | Max cache age for non-expiring credentials |
Cache Operations:
| Method | Description |
|---|---|
invalidate() | Force cache invalidation |
cache_stats() | Get cache statistics |
CacheStats Fields:
| Field | Type | Description |
|---|---|---|
is_cached | bool | Whether credentials are cached |
cache_age | Option<Duration> | How long cached |
time_to_refresh | Option<Duration> | Time until auto-refresh |
is_expired | bool | Whether cached credentials expired |
Configuration (TOML):
[credentials]
cache_enabled = true
cache_ttl = 300 # 5 minutes
cache_max_size = 100 # Max entries
refresh_before_expiry = 60 # Refresh 60s before expiry
Provider Chains
Chain multiple providers for fallback behavior:
use std::sync::Arc;
let env_provider = Arc::new(EnvCredentialProvider::postgres_default());
let file_provider = Arc::new(FileCredentialProvider::json("/etc/db.json"));
let vault_provider = Arc::new(VaultCredentialProvider::new(config));
let chain = ChainedCredentialProvider::new(vec![
env_provider,
file_provider,
vault_provider,
]).with_name("my-credential-chain");
// Tries each provider in order until one succeeds
let creds = chain.get_credentials().await?;
Chain Behavior:
- Tries providers in order
- Returns first successful result
- Falls through to the next provider on any error
- Returns the last error if all providers fail
- Refresh interval = minimum of all providers
Error Types
CredentialError Variants
| Variant | Description | Retryable |
|---|---|---|
NotFound | Credentials not found | No |
Expired | Credentials expired | No |
ProviderUnavailable | Provider unreachable | Yes |
AuthFailed | Authentication failed | No |
Network | Network error | Yes |
Config | Configuration error | No |
FileRead | File read error | No |
FileParse | File parse error | No |
EnvVarNotSet | Missing env var | No |
EnvVarInvalid | Invalid env var value | No |
MissingField | Required field missing | No |
RefreshNotSupported | Refresh not supported | No |
RefreshFailed | Refresh failed | Maybe |
InvalidFormat | Invalid format | No |
Certificate | Certificate error | No |
PrivateKey | Private key error | No |
Error Helper Methods:
| Method | Description |
|---|---|
is_not_found() | Check if NotFound |
is_expired() | Check if Expired |
is_retryable() | Check if safe to retry |
Implementing Custom Providers
Implement the CredentialProvider trait for custom credential sources:
use async_trait::async_trait;
use horizon_epoch_core::storage::credentials::{
CredentialProvider, StorageCredentials, CredentialError, CredentialResult,
};
use std::time::Duration;
struct MyProvider {
config: MyConfig,
}
#[async_trait]
impl CredentialProvider for MyProvider {
async fn get_credentials(&self) -> CredentialResult<StorageCredentials> {
// Fetch from your credential source
let secret = self.fetch_secret().await?;
Ok(StorageCredentials::username_password(
secret.username,
secret.password,
).with_ttl(Duration::from_secs(3600)))
}
fn requires_refresh(&self) -> bool {
// Check if credentials need refresh
false
}
async fn refresh(&self) -> CredentialResult<StorageCredentials> {
// Refresh credentials
self.get_credentials().await
}
fn refresh_interval(&self) -> Option<Duration> {
// Recommended refresh interval
Some(Duration::from_secs(300))
}
fn provider_name(&self) -> &str {
"my_provider"
}
}
Security Considerations
Secret Protection
All sensitive values use secrecy::SecretString:
- Not printed in Debug output
- Zeroized on drop
- Explicit .expose_secret() call required
Best Practices
- Never log credentials - Use structured logging carefully
- Use environment variables - Avoid hardcoded credentials
- Enable caching - Reduce calls to external services
- Check expiration - Always verify before use
- Rotate regularly - Use providers that support rotation
- Least privilege - Grant minimal required permissions
Secure Configuration Order
- Vault/Secrets Manager (production)
- Environment variables (containers)
- Credential files with proper permissions (Kubernetes secrets)
- Never: hardcoded credentials in config files
Feature Flags
Credential providers are enabled via Cargo features:
| Feature | Providers Enabled |
|---|---|
vault | VaultCredentialProvider, VaultPkiProvider |
aws | SecretsManagerProvider, RdsIamProvider, SsoProvider, WebIdentityProvider |
Core providers (Static, Environment, File) are always available.
Database Backend Credentials
PostgreSQL
PostgreSQL credentials can be provided via:
| Method | Configuration |
|---|---|
| Environment | PGUSER, PGPASSWORD or custom prefix |
| Connection URL | postgresql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| AWS RDS IAM | Token-based authentication |
| AWS Secrets Manager | Stored credentials |
Example (Environment):
export PGUSER="myuser"
export PGPASSWORD="mypassword"
Example (Vault):
[storage.postgres.prod]
vault_path = "secret/data/postgres/prod"
MySQL
MySQL credentials can be provided via:
| Method | Configuration |
|---|---|
| Environment | MYSQL_USER, MYSQL_PASSWORD or custom prefix |
| Connection URL | mysql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| AWS RDS IAM | Token-based authentication (Aurora MySQL) |
| AWS Secrets Manager | Stored credentials |
Example (Environment):
export MYSQL_USER="myuser"
export MYSQL_PASSWORD="mypassword"
Example (TOML):
[storage.mysql.analytics]
host = "mysql.example.com"
database = "analytics"
username = "${MYSQL_USER}"
password = "${MYSQL_PASSWORD}"
SQL Server (MSSQL)
SQL Server credentials can be provided via:
| Method | Configuration |
|---|---|
| Environment | MSSQL_USER, MSSQL_PASSWORD or custom prefix |
| Connection URL | mssql://user:pass@host/database |
| Vault | KV secret or database secrets engine |
| Windows Authentication | Integrated security (Windows only) |
| Azure AD | Token-based authentication (Azure SQL) |
Example (Environment):
export MSSQL_USER="sa"
export MSSQL_PASSWORD="YourPassword123!"
Example (TOML):
[storage.mssql.warehouse]
host = "sqlserver.example.com"
database = "warehouse"
username = "${MSSQL_USER}"
password = "${MSSQL_PASSWORD}"
encrypt = true
trust_cert = false
Example (Windows Authentication):
[storage.mssql.warehouse]
host = "sqlserver.internal"
database = "warehouse"
use_windows_auth = true
SQLite
SQLite typically does not require credentials as it uses file-based storage. However, encrypted SQLite databases (SQLCipher) may require a passphrase.
| Method | Configuration |
|---|---|
| No credentials | Default for unencrypted databases |
| Environment | SQLITE_PASSPHRASE for encrypted databases |
| File | Passphrase file for SQLCipher |
Example (Unencrypted):
[storage.sqlite.local]
path = "/data/local.db"
# No credentials needed
Example (Encrypted with SQLCipher):
[storage.sqlite.secure]
path = "/data/secure.db"
passphrase = "${SQLITE_PASSPHRASE}"
See Also
- Credential Providers How-To - Practical usage guide
- Vault Integration - HashiCorp Vault setup
- AWS Secrets - AWS credential management
- mTLS Authentication - Certificate authentication
- Configuration Reference - Full configuration options
Constraint Reference
Horizon Epoch versions database constraints alongside your data, enabling schema evolution with the same Git-like branching, merging, and conflict detection you use for data.
Constraint Types
Horizon Epoch supports seven constraint types that map to PostgreSQL DDL:
| Type | Rust Struct | PostgreSQL DDL |
|---|---|---|
| Foreign Key | ForeignKeyConstraint | ALTER TABLE ADD CONSTRAINT ... FOREIGN KEY ... REFERENCES |
| Unique | UniqueConstraint | ALTER TABLE ADD CONSTRAINT ... UNIQUE |
| Check | CheckConstraint | ALTER TABLE ADD CONSTRAINT ... CHECK (expression) |
| Exclusion | ExclusionConstraint | ALTER TABLE ADD CONSTRAINT ... EXCLUDE USING |
| Index | IndexDefinition | CREATE INDEX |
| Enum Type | EnumType | CREATE TYPE ... AS ENUM |
| Default Value | DefaultValue | ALTER COLUMN ... SET DEFAULT |
Foreign Key Constraints
Foreign keys define referential integrity between tables.
Structure
ForeignKeyConstraint {
name: String, // Constraint name
columns: Vec<String>, // Local columns
references_table: String, // Referenced table
references_columns: Vec<String>, // Referenced columns
on_update: ReferentialAction, // Action on parent update
on_delete: ReferentialAction, // Action on parent delete
deferrable: bool, // Can be deferred
initially_deferred: bool, // Deferred by default
}
Referential Actions
| Action | Description |
|---|---|
NoAction | Raise error if referenced rows exist (default) |
Restrict | Same as NoAction, checked immediately |
Cascade | Delete/update child rows automatically |
SetNull | Set child columns to NULL |
SetDefault | Set child columns to default value |
PostgreSQL Mapping
-- Generated DDL
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers (id)
ON DELETE CASCADE ON UPDATE NO ACTION;
Python Example
from horizon_epoch import Schema, ForeignKeyConstraint
schema = Schema.builder() \
.add_field("customer_id", "integer") \
.add_foreign_key(
name="fk_orders_customer",
columns=["customer_id"],
references_table="customers",
references_columns=["id"],
on_delete="cascade"
) \
.build()
Unique Constraints
Unique constraints ensure no duplicate values exist in the specified columns.
Structure
UniqueConstraint {
name: String, // Constraint name
columns: Vec<String>, // Columns in the constraint
nulls_distinct: bool, // PostgreSQL 15+ NULLS NOT DISTINCT
deferrable: bool, // Can be deferred
initially_deferred: bool, // Deferred by default
index_method: Option<String>, // Index method (btree, hash)
}
PostgreSQL Mapping
-- Simple unique constraint
ALTER TABLE users ADD CONSTRAINT uq_users_email
UNIQUE (email);
-- Composite unique constraint
ALTER TABLE order_items ADD CONSTRAINT uq_order_product
UNIQUE (order_id, product_id);
-- PostgreSQL 15+ with NULLS NOT DISTINCT
ALTER TABLE users ADD CONSTRAINT uq_users_phone
UNIQUE NULLS NOT DISTINCT (phone);
Python Example
schema = Schema.builder() \
.add_field("email", "text") \
.add_unique_constraint(
name="uq_users_email",
columns=["email"]
) \
.build()
Check Constraints
Check constraints validate that column values satisfy a boolean expression.
Structure
CheckConstraint {
name: String, // Constraint name
expression: String, // SQL boolean expression
referenced_columns: Vec<String>, // Columns used in expression
is_not_null: bool, // Is this a NOT NULL constraint?
no_inherit: bool, // PostgreSQL NO INHERIT option
}
PostgreSQL Mapping
-- Value range check
ALTER TABLE products ADD CONSTRAINT chk_products_price
CHECK (price > 0);
-- Pattern check
ALTER TABLE users ADD CONSTRAINT chk_users_email
CHECK (email ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
-- NOT NULL (special case)
ALTER TABLE users ALTER COLUMN name SET NOT NULL;
Python Example
schema = Schema.builder() \
.add_field("price", "numeric") \
.add_check_constraint(
name="chk_products_price",
expression="price > 0"
) \
.build()
Exclusion Constraints
Exclusion constraints prevent overlapping or conflicting values using operators.
Structure
ExclusionConstraint {
name: String, // Constraint name
index_method: String, // Usually "gist" or "btree"
columns: Vec<String>, // Columns for display
elements: Vec<ExclusionElement>, // Column/operator pairs
where_clause: Option<String>, // Partial exclusion
deferrable: bool, // Can be deferred
initially_deferred: bool, // Deferred by default
}
ExclusionElement {
column: String, // Column name or expression
operator: String, // Comparison operator
opclass: Option<String>, // Operator class
}
PostgreSQL Mapping
-- Prevent overlapping time ranges
ALTER TABLE reservations ADD CONSTRAINT excl_reservations_overlap
EXCLUDE USING gist (
room_id WITH =,
tsrange(start_time, end_time) WITH &&
);
-- Prevent overlapping circles (geometric)
ALTER TABLE zones ADD CONSTRAINT excl_zones_overlap
EXCLUDE USING gist (area WITH &&);
Python Example
schema = Schema.builder() \
.add_field("room_id", "integer") \
.add_field("start_time", "timestamp") \
.add_field("end_time", "timestamp") \
.add_exclusion_constraint(
name="excl_reservations_overlap",
index_method="gist",
elements=[
{"column": "room_id", "operator": "="},
{"column": "tsrange(start_time, end_time)", "operator": "&&"}
]
) \
.build()
Index Definitions
Indexes improve query performance and can enforce uniqueness.
Structure
IndexDefinition {
name: String, // Index name
method: IndexMethod, // btree, hash, gist, gin, brin
columns: Vec<IndexColumn>, // Columns with sort options
is_unique: bool, // UNIQUE index
where_clause: Option<String>, // Partial index
include_columns: Vec<String>, // Covering index (INCLUDE)
storage_parameters: HashMap<String, String>, // fillfactor, etc.
concurrent: bool, // CONCURRENTLY option
if_not_exists: bool, // IF NOT EXISTS
}
IndexColumn {
name: String, // Column name or expression
sort_order: Option<SortOrder>, // ASC or DESC
nulls_order: Option<NullsOrder>, // NULLS FIRST or NULLS LAST
opclass: Option<String>, // Operator class
}
Index Methods
| Method | Use Case |
|---|---|
btree | Default; equality and range queries |
hash | Equality comparisons only |
gist | Geometric, full-text, range types |
gin | Arrays, JSONB, full-text search |
brin | Large tables with natural ordering |
PostgreSQL Mapping
-- Basic index
CREATE INDEX idx_users_email ON users (email);
-- Composite index with sort order
CREATE INDEX idx_orders_date_desc ON orders (created_at DESC NULLS LAST);
-- Partial index
CREATE INDEX idx_orders_pending ON orders (status)
WHERE status = 'pending';
-- Covering index (PostgreSQL 11+)
CREATE INDEX idx_users_email_include ON users (email)
INCLUDE (name, created_at);
-- GIN index for JSONB
CREATE INDEX idx_products_attrs ON products USING gin (attributes);
Python Example
schema = Schema.builder() \
.add_field("email", "text") \
.add_index(
name="idx_users_email",
columns=[{"name": "email"}],
method="btree"
) \
.build()
Enum Types
Enum types define a fixed set of allowed values.
Structure
#![allow(unused)]
fn main() {
EnumType {
name: String, // Type name
values: Vec<String>, // Ordered enum values
schema: Option<String>, // Schema name (optional)
}
}
PostgreSQL Mapping
-- Create enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered');
-- Add value to enum (PostgreSQL is additive-only for enums)
ALTER TYPE order_status ADD VALUE 'cancelled' AFTER 'delivered';
-- Use in table
ALTER TABLE orders ALTER COLUMN status TYPE order_status
USING status::order_status;
Enum Diffing
When comparing enum types between branches, Horizon Epoch tracks:
- Added values: Safe to add in migrations
- Removed values: Requires data migration (destructive)
- Reordered values: May affect comparisons
# Check if enum change is safe (additive only)
enum_diff = client.diff_enums("main", "feature-branch")
if enum_diff.is_additive():
    print("Safe to merge - only new values added")
else:
    print("Warning: Enum values removed or reordered")
Python Example
schema = Schema.builder() \
.add_enum_type(
name="order_status",
values=["pending", "processing", "shipped", "delivered"]
) \
.add_field("status", "order_status") \
.build()
Default Values
Default values specify what value to use when none is provided.
Structure
#![allow(unused)]
fn main() {
enum DefaultValue {
Literal(String), // Literal value: 'active', 42, true
Function(String), // Function call: gen_random_uuid()
Sequence(String), // Sequence: nextval('users_id_seq')
Expression(String), // Expression: CURRENT_DATE + INTERVAL '30 days'
CurrentTimestamp, // CURRENT_TIMESTAMP / NOW()
CurrentDate, // CURRENT_DATE
CurrentUser, // CURRENT_USER
Null, // NULL (explicit)
}
}
PostgreSQL Mapping
-- Literal default
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
-- Function default
ALTER TABLE users ALTER COLUMN id SET DEFAULT gen_random_uuid();
-- Timestamp default
ALTER TABLE users ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP;
-- Expression default
ALTER TABLE subscriptions ALTER COLUMN expires_at
SET DEFAULT CURRENT_DATE + INTERVAL '30 days';
Determinism
Horizon Epoch tracks whether defaults are deterministic:
| Default Type | Deterministic | Notes |
|---|---|---|
| Literal | Yes | Same value every time |
| Null | Yes | Always NULL |
| Function | No | May return different values |
| Sequence | No | Returns next value |
| CurrentTimestamp | No | Current time |
| CurrentDate | No | Current date |
| CurrentUser | No | Depends on session |
| Expression | Varies | Analyzed at runtime |
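A minimal sketch of how this classification can be used from client code; the is_deterministic helper and the kind names are illustrative (mirroring the DefaultValue variants above), not part of the SDK surface.
# Illustrative helper mirroring the table above; not an SDK API.
DETERMINISTIC_KINDS = {"Literal", "Null"}

def is_deterministic(default_kind: str) -> bool:
    """True only for defaults that yield the same value on every insert."""
    # Expression defaults are analyzed at runtime; treat them conservatively.
    return default_kind in DETERMINISTIC_KINDS

assert is_deterministic("Literal")
assert not is_deterministic("CurrentTimestamp")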
Storage Backend Support
Different storage backends support constraints at different levels:
Constraint Support Matrix
| Backend | Support Level | Primary Key | Foreign Key | Unique | Check | Index |
|---|---|---|---|---|---|---|
| PostgreSQL | Full | Enforced | Enforced | Enforced | Enforced | Enforced |
| MySQL | Full | Enforced | Enforced | Enforced | Enforced | Enforced |
| SQL Server | Full | Enforced | Enforced | Enforced | Enforced | Enforced |
| SQLite | Partial | Enforced | Optional* | Enforced | Enforced | Enforced |
| S3/Delta Lake | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
| Azure Blob | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
| GCS | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
| Local FS | Metadata | Merge key | Metadata | Metadata | Metadata | N/A |
* SQLite requires PRAGMA foreign_keys = ON to enforce foreign keys.
PostgreSQL (Full Support)
PostgreSQL provides ConstraintSupportLevel::Full:
| Constraint Type | Support | Enforcement |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Full | Enforced via DDL |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL |
| Exclusion | Full | Enforced via DDL |
| Index | Full | Created via DDL |
| Enum Type | Full | Created via DDL |
| Default Value | Full | Applied via DDL |
MySQL (Full Support)
MySQL provides ConstraintSupportLevel::Full:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Full | Enforced via DDL (InnoDB only) |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL (MySQL 8.0.16+) |
| Exclusion | Not Supported | Use unique constraints as an alternative |
| Index | Full | Created via DDL |
| Enum Type | Full | Native ENUM type support |
| Default Value | Full | Applied via DDL |
SQL Server (Full Support)
SQL Server provides ConstraintSupportLevel::Full:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Full | Enforced via DDL |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL |
| Exclusion | Not Supported | Use unique constraints or triggers |
| Index | Full | Created via DDL |
| Enum Type | Partial | Use CHECK constraints for enum-like behavior |
| Default Value | Full | Applied via DDL |
SQLite (Partial Support)
SQLite provides ConstraintSupportLevel::Partial:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Full | Enforced via DDL |
| Foreign Key | Partial | Requires PRAGMA foreign_keys = ON |
| Unique | Full | Enforced via DDL |
| Check | Full | Enforced via DDL |
| Exclusion | Not Supported | PostgreSQL-specific; not available in SQLite |
| Index | Full | Created via DDL |
| Enum Type | Metadata | Use CHECK constraints for validation |
| Default Value | Full | Applied via DDL |
S3/Delta Lake (Metadata Only)
S3/Delta Lake provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | PostgreSQL-specific; not enforced |
| Index | Metadata | Delta Lake handles its own indexing |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
Azure Blob Storage (Metadata Only)
Azure Blob Storage provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | Not applicable to blob storage |
| Index | Metadata | Not applicable to blob storage |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
Google Cloud Storage (Metadata Only)
GCS provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | Not applicable to object storage |
| Index | Metadata | Not applicable to object storage |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
Local Filesystem (Metadata Only)
Local filesystem provides ConstraintSupportLevel::MetadataOnly:
| Constraint Type | Support | Notes |
|---|---|---|
| Primary Key | Partial | Used as merge key; uniqueness not enforced |
| NOT NULL | Enforced | Via Arrow schema nullability |
| Foreign Key | Metadata | Stored for documentation; no enforcement |
| Unique | Metadata | Stored for documentation; duplicates allowed |
| Check | Metadata | Stored for documentation; expressions not evaluated |
| Exclusion | Metadata | Not applicable to file storage |
| Index | Metadata | Parquet/Delta handles its own indexing |
| Enum Type | Partial | Can use Arrow dictionary encoding |
| Default Value | Metadata | Stored for documentation |
ConstraintSupportLevel Enum
#![allow(unused)]
fn main() {
pub enum ConstraintSupportLevel {
Full, // All constraints enforced (PostgreSQL)
MetadataOnly, // Constraints stored but not enforced (S3)
Partial, // Some constraints work, others don't
None, // No constraint support
}
impl ConstraintSupportLevel {
pub fn can_enforce(&self) -> bool; // True only for Full
pub fn can_store(&self) -> bool; // True for Full, MetadataOnly, Partial
pub fn can_execute_ddl(&self) -> bool; // True only for Full
}
}
Cross-Backend Considerations
When working with mixed storage backends:
- Constraints are always stored in metadata - Even if not enforced, constraints are tracked
- DDL generation targets PostgreSQL - Use PostgresDdlGenerator for migrations
- Merge conflicts work everywhere - Constraint conflicts are detected regardless of enforcement
- Validation is backend-specific - Check can_enforce() before relying on constraint enforcement (see the sketch below)
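For example, before relying on a CHECK constraint to reject bad rows on a data-lake backend, ask the adapter what it actually enforces. A minimal sketch, assuming the SDK exposes a constraint_support() helper that mirrors the Rust ConstraintSupportLevel enum above:
# Minimal sketch - constraint_support() is a hypothetical SDK helper that
# mirrors the Rust ConstraintSupportLevel enum above.
rows = [{"sku": "A1", "price": 10.0}, {"sku": "B2", "price": -1.0}]
support = client.constraint_support("s3://analytics")

if not support.can_enforce():
    # Metadata-only backend: the stored CHECK (price > 0) is documentation only,
    # so apply it in the application before writing.
    bad = [r for r in rows if not r["price"] > 0]
    if bad:
        raise ValueError(f"{len(bad)} rows violate chk_products_price")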
TableConstraints Container
All constraints for a table are grouped in TableConstraints:
#![allow(unused)]
fn main() {
pub struct TableConstraints {
pub foreign_keys: Vec<ForeignKeyConstraint>,
pub unique_constraints: Vec<UniqueConstraint>,
pub check_constraints: Vec<CheckConstraint>,
pub exclusion_constraints: Vec<ExclusionConstraint>,
pub indexes: Vec<IndexDefinition>,
pub enum_types: Vec<EnumType>,
}
}
Key Methods
#![allow(unused)]
fn main() {
// Get all constraints as enum variants
let all: Vec<Constraint> = constraints.all_constraints();
// Find constraint by name
let fk: Option<&ForeignKeyConstraint> = constraints.find_foreign_key("fk_orders_customer");
// Get constraints affecting a column
let affecting: Vec<&Constraint> = constraints.constraints_on_column("customer_id");
// Validate all constraints
let errors: Vec<String> = constraints.validate();
}
See Also
- Schema Versioning Tutorial - Step-by-step guide
- Merge Algorithm - How constraint conflicts are resolved
- Configuration Reference - DDL generation options
Architecture
System Overview
┌─────────────────────────────────────────────────────────────┐
│ User Interfaces │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ CLI │ │ Python SDK │ │ REST API │ │
│ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────┘ │
└─────────┼────────────────┼─────────────────────┼────────────┘
│ │ │
└────────────────┼─────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────┐
│ Rust Core Engine │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Version Control Operations │ │
│ │ branch, commit, merge, diff, promote │ │
│ └────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Metadata Layer │ │
│ │ commit graph, branch pointers, table versions │ │
│ └────────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Storage Abstraction │ │
│ │ unified interface for all backends │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
│
┌─────────────────────────────▼───────────────────────────────┐
│ Storage Backends │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ PostgreSQL │ │ MySQL │ │ SQL Server │ │ SQLite │ │
│ └────────────┘ └────────────┘ └────────────┘ └──────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌──────────┐ │
│ │ AWS S3 │ │Azure Blob │ │ GCS │ │ Local │ │
│ └────────────┘ └────────────┘ └────────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
Design Principles
1. Git Semantics
If you know Git, you know Horizon Epoch. We use the same conceptual model:
- Repositories contain versioned data
- Branches are independent lines of development
- Commits are immutable snapshots
- Merges combine branches with conflict detection
2. Storage Agnostic
Work with data where it lives:
- No data migration required
- Each storage backend has an adapter
- Same operations work across all backends
- Mix and match storage types
3. Zero-Copy Branching
Creating a branch is instant:
- No data is duplicated
- Branch is just a pointer to a commit
- Changes are tracked incrementally
- Storage is only used for actual changes
4. Record-Level Tracking
Granular change detection:
- Track individual record changes (not just files)
- Field-level conflict detection
- Efficient storage of deltas
- Precise merge resolution
5. Metadata Separation
Version control metadata is separate from data:
- Metadata stored in dedicated PostgreSQL database
- Data stays in original storage
- No modification to existing data infrastructure
- Easy to add/remove without affecting data
Component Details
Rust Core Engine
The heart of Horizon Epoch, written in Rust for:
- Performance and memory safety
- Concurrent operations
- Cross-platform compatibility
Key modules:
- operations/ - Branch, commit, merge, diff logic
- metadata/ - Commit graph, branch management
- storage/ - Storage adapter traits and implementations
Metadata Layer
Stores all versioning information:
┌─────────────────────────────────────────┐
│ Metadata Database │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Repositories│ │ Branches │ │
│ └─────────────┘ └─────────────────┘ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Commits │ │ Tables │ │
│ └─────────────┘ └─────────────────┘ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ Tags │ │ Change Tracking │ │
│ └─────────────┘ └─────────────────┘ │
└─────────────────────────────────────────┘
Storage Adapters
Each adapter implements a common trait:
#![allow(unused)]
fn main() {
trait StorageAdapter {
fn read_records(&self, table: &str, keys: &[RecordKey]) -> Result<Vec<Record>>;
fn write_records(&self, table: &str, records: &[Record]) -> Result<()>;
fn delete_records(&self, table: &str, keys: &[RecordKey]) -> Result<()>;
fn get_schema(&self, table: &str) -> Result<Schema>;
fn list_tables(&self) -> Result<Vec<String>>;
// ... more methods
}
}
Python Bindings (PyO3)
Python SDK wraps Rust core via PyO3:
- Zero-copy data exchange where possible
- Async support via tokio
- Pythonic API design
- Type hints for IDE support
REST API (FastAPI)
HTTP API for external integrations:
- OpenAPI documentation
- Authentication/authorization
- Rate limiting
- Webhook support
CLI
Command-line interface for developers:
- Git-like commands
- Interactive prompts
- JSON output option
- Shell completion
Data Flow
Commit Operation
1. User: epoch commit -m "Update users"
│
2. CLI: Parse command, validate
│
3. Core: Identify changed records
│
4. Core: Create commit metadata
│
5. Core: Store change delta
│
6. Meta: Update branch pointer
│
7. CLI: Return success + commit ID
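The same flow from the Python SDK, as a sketch; the shape of the commit() return value (an object carrying the new commit ID) is an assumption here:
# Illustrative sketch of the flow above; the commit() return value
# (an object carrying the new commit ID) is an assumption.
client.execute("UPDATE users SET status = 'active' WHERE id = 42")
result = client.commit(message="Update users")
print(f"Created commit {result.commit_id}")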
Merge Operation
1. User: epoch merge feature/branch
│
2. Core: Find merge base (common ancestor)
│
3. Core: Compute changes from base to source
│
4. Core: Compute changes from base to target
│
5. Core: Identify conflicts
│
┌────┴────┐
│ │
No conflicts Has conflicts
│ │
▼ ▼
6a. Apply 6b. Report
changes conflicts
│ │
7a. Create 7b. Wait for
merge resolution
commit
Query Operation
1. User: SELECT * FROM users WHERE id = 42
│
2. Core: Determine current branch
│
3. Core: Check branch overlay for record
│
┌────┴────┐
│ │
Found in Not in
overlay overlay
│ │
▼ ▼
4a. Return 4b. Query
overlay base data
record │
┌──────┴──────┐
│ │
Found Not found
│ │
▼ ▼
Return Return
record empty
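In pseudocode, the per-record resolution above reduces to an overlay-first lookup. The read_overlay and read_base helpers below are placeholders for the adapter-level reads, not SDK functions:
# Pseudocode for the lookup flow above; read_overlay() and read_base()
# stand in for the adapter reads on the branch overlay and the base table.
def resolve_record(table, key, branch):
    # 3. Check the branch overlay first
    record = read_overlay(table, key, branch)
    if record is not None:
        # 4a. Overlay wins; a _DELETED_ tombstone means the record is gone
        return None if record.is_deleted else record
    # 4b. Fall back to the base data
    return read_base(table, key)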
Scalability Considerations
Horizontal Scaling
- Metadata database can be replicated
- Storage backends scale independently
- API servers are stateless
Performance Optimization
- Connection pooling for databases
- Lazy loading of large datasets
- Caching of frequently accessed data
- Batch operations for bulk changes
Large Repository Support
- Incremental operations
- Streaming for large diffs
- Pagination for history
- Sparse checkout support
Security Model
Authentication
- Multiple auth methods supported
- Credential providers for secrets
- Token-based API access
Authorization
- Repository-level permissions
- Branch protection rules
- Audit logging
Data Protection
- TLS for all connections
- Encryption at rest (via storage backends)
- Credential encryption
Future Architecture
Planned enhancements:
- Distributed metadata for global deployments
- Real-time collaboration features
- Advanced caching layer
- Plugin system for custom adapters
Storage Adapters
Storage adapters enable Horizon Epoch to work with different data storage systems while providing a unified interface for version control operations.
Supported Backends
Horizon Epoch supports 8 storage backends:
| Backend | Status | Constraint Support | Use Case |
|---|---|---|---|
| PostgreSQL | Production | Full | Transactional databases |
| MySQL | Production | Full | Transactional databases |
| Microsoft SQL Server | Production | Full | Enterprise databases |
| SQLite | Production | Partial | Local/embedded databases |
| S3/Delta Lake | Production | Metadata Only | Data lakes, analytics |
| Azure Blob Storage | Production | Metadata Only | Cloud data lakes |
| Google Cloud Storage | Production | Metadata Only | Cloud data lakes |
| Local Filesystem | Production | Metadata Only | Development, testing |
Adapter Architecture
┌─────────────────────────────────────────────────────────────┐
│ Storage Abstraction Layer │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ StorageAdapter Trait │ │
│ │ - read_records() - write_records() │ │
│ │ - delete_records() - get_schema() │ │
│ │ - list_tables() - scan_table() │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ PostgreSQL │ │ MySQL │ │ MSSQL │ │ SQLite │
│ Adapter │ │ Adapter │ │ Adapter │ │ Adapter │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ S3/Delta │ │ Azure │ │ GCS │ │ LocalFS │
│ Adapter │ │ Adapter │ │ Adapter │ │ Adapter │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
Common Interface
All storage adapters implement these core operations:
Record Operations
#![allow(unused)]
fn main() {
// Read specific records by primary key
fn read_records(
&self,
table: &str,
keys: &[RecordKey],
branch_context: &BranchContext
) -> Result<Vec<Record>>;
// Write records (insert or update)
fn write_records(
&self,
table: &str,
records: &[Record],
branch_context: &BranchContext
) -> Result<WriteResult>;
// Delete records by primary key
fn delete_records(
&self,
table: &str,
keys: &[RecordKey],
branch_context: &BranchContext
) -> Result<DeleteResult>;
}
Schema Operations
#![allow(unused)]
fn main() {
// Get table schema
fn get_schema(&self, table: &str) -> Result<Schema>;
// Detect schema changes
fn compare_schema(
&self,
table: &str,
old_schema: &Schema
) -> Result<SchemaChanges>;
}
Table Operations
#![allow(unused)]
fn main() {
// List available tables
fn list_tables(&self) -> Result<Vec<TableInfo>>;
// Scan entire table (for initial import or full sync)
fn scan_table(
&self,
table: &str,
options: ScanOptions
) -> Result<RecordStream>;
// Get table statistics
fn get_table_stats(&self, table: &str) -> Result<TableStats>;
}
Relational Database Adapters
PostgreSQL Adapter
Capabilities:
- Direct SQL execution
- ACID transactions
- Full schema introspection
- Constraint enforcement (foreign keys, unique, check, exclusion)
- Connection pooling
- TLS/SSL secure connections
Configuration:
[storage.postgresql.mydb]
url = "postgresql://user:pass@host/database"
pool_size = 10
connect_timeout = 30
statement_timeout = 60000 # 60 seconds
ssl_mode = "require" # disable, prefer, require
MySQL Adapter
Capabilities:
- Full CRUD operations
- Schema introspection via information_schema
- Full constraint support
- 5 SSL/TLS modes
- SSH tunnel support
- Connection pooling
Configuration:
[storage.mysql.mydb]
url = "mysql://user:pass@host/database"
pool_size = 10
ssl_mode = "required" # disabled, preferred, required, verify_ca, verify_identity
Microsoft SQL Server Adapter
Capabilities:
- Full CRUD operations
- Schema introspection via sys.* catalog views
- Full constraint support
- TLS/SSL encryption (3 levels)
- Windows and SQL Server authentication
Configuration:
[storage.mssql.mydb]
url = "mssql://user:pass@host/database"
pool_size = 10
encrypt = "required" # off, on, required
trust_server_certificate = false
SQLite Adapter
Capabilities:
- File-based and in-memory databases
- Schema introspection via PRAGMA
- Partial constraint support (foreign keys require PRAGMA foreign_keys = ON)
- WAL journal mode for concurrency
- Connection pooling with health checks
Configuration:
[storage.sqlite.local]
path = "/path/to/database.db"
# Or use in-memory:
# path = ":memory:"
journal_mode = "wal"
foreign_keys = true
Object Storage Adapters
All object storage adapters use Delta Lake format for transactional semantics.
S3/Delta Lake Adapter
Capabilities:
- Delta Lake protocol support
- Parquet file format
- Time travel via Delta versioning
- Schema evolution
- Efficient columnar storage
- AWS S3 and S3-compatible (MinIO, etc.)
Configuration:
[storage.s3.datalake]
bucket = "company-datalake"
region = "us-east-1"
prefix = "horizon-epoch/"
endpoint = "https://s3.amazonaws.com" # Optional, for S3-compatible
delta_log_retention_days = 30
Azure Blob Storage Adapter
Capabilities:
- Delta Lake format on Azure Blob
- Multiple authentication methods:
- Account key
- SAS token
- Service principal
- Managed identity
- Copy-on-write semantics
Configuration:
[storage.azure.datalake]
account = "mystorageaccount"
container = "data"
prefix = "horizon-epoch/"
auth_method = "account_key" # account_key, sas_token, service_principal, managed_identity
Google Cloud Storage Adapter
Capabilities:
- Delta Lake format on GCS
- Authentication:
- Service account key
- Application Default Credentials (ADC)
- Copy-on-write semantics
Configuration:
[storage.gcs.datalake]
bucket = "my-gcs-bucket"
prefix = "horizon-epoch/"
project_id = "my-project"
auth_method = "service_account" # service_account, adc
Local Filesystem Adapter
Capabilities:
- Delta Lake format on local disk
- SQL query execution via DataFusion
- Ideal for development and testing
- Copy-on-write semantics
Configuration:
[storage.local.dev]
path = "/data/horizon-epoch"
Constraint Support Levels
| Level | Description | Backends |
|---|---|---|
| Full | Constraints enforced via DDL | PostgreSQL, MySQL, MSSQL |
| Partial | Some constraints require config | SQLite |
| Metadata Only | Stored but not enforced | S3, Azure, GCS, LocalFS |
For object storage backends, constraints are versioned in metadata but enforcement happens at the application layer or when promoting to a relational database.
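As a sketch of what application-layer enforcement can look like before promoting lake data into PostgreSQL; get_table_constraints() is an assumed helper for reading the versioned constraint metadata, and the sample batch is made up:
# Sketch only: get_table_constraints() is an assumed helper for reading the
# versioned constraint metadata of a lake-backed table; the batch is made up.
batch = [
    {"event_id": 1, "user_id": 7},
    {"event_id": 1, "user_id": 9},   # duplicate event_id
]
constraints = client.get_table_constraints("events")

def violates_unique(rows, columns):
    """True if any combination of the unique columns repeats in the batch."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return True
        seen.add(key)
    return False

for uc in constraints.unique_constraints:
    if violates_unique(batch, uc.columns):
        raise ValueError(f"unique constraint {uc.name} would fail on promote")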
Adapter Selection
Horizon Epoch selects adapters based on table location URL scheme:
# PostgreSQL table
client.track_table("users", StorageLocation.postgresql("main", "public", "users"))
# MySQL table
client.track_table("orders", StorageLocation.mysql("main", "mydb", "orders"))
# S3 Delta table
client.track_table("events", StorageLocation.s3("bucket", "delta/events"))
# Local filesystem
client.track_table("test_data", StorageLocation.local("/data/test"))
Performance Characteristics
| Operation | PostgreSQL | MySQL | MSSQL | SQLite | S3/Delta | Azure/GCS |
|---|---|---|---|---|---|---|
| Point read | Fast | Fast | Fast | Fast | Medium | Medium |
| Range scan | Fast | Fast | Fast | Fast | Fast | Fast |
| Write single | Fast | Fast | Fast | Fast | Slow | Slow |
| Write batch | Fast | Fast | Fast | Medium | Fast | Fast |
| Branch create | Instant | Instant | Instant | Instant | Instant | Instant |
| Full scan | Depends | Depends | Depends | Fast | Very Fast | Very Fast |
Best Practices
Relational Databases (PostgreSQL, MySQL, MSSQL)
- Ensure proper indexes on primary keys and commonly queried columns
- Use connection pooling for high-throughput workloads
- Configure appropriate timeouts for long-running operations
- Monitor overlay table size and compact periodically
SQLite
- Use WAL mode for better concurrency (journal_mode = "wal")
- Enable foreign keys explicitly if needed (foreign_keys = true)
- Use file-based databases for persistence, in-memory for testing
Object Storage (S3, Azure, GCS, Local)
- Use appropriate file sizes (target 100MB-1GB per file)
- Partition large tables for efficient queries
- Enable predicate pushdown for filtered scans
- Run OPTIMIZE periodically for better read performance
- Configure retention for Delta log cleanup
Troubleshooting
Connection Issues
PostgreSQL/MySQL/MSSQL:
- Verify network connectivity and firewall rules
- Check SSL/TLS configuration matches server requirements
- Ensure connection pool size is appropriate for workload
Object Storage:
- Verify credentials and permissions
- Check bucket/container exists and is accessible
- Ensure network allows access to storage endpoints
Performance Issues
Slow Queries:
- Add indexes for overlay joins (relational)
- Analyze query plans
- Consider materializing branches for read-heavy workloads
Slow Writes:
- Batch writes together (especially for object storage)
- Check network latency to storage
- Use regional endpoints for cloud storage
Copy-on-Write
Copy-on-Write (CoW) is the fundamental mechanism that enables Horizon Epoch to create instant branches and track changes efficiently.
Overview
When you create a branch, no data is copied. Instead:
- The branch is created as a pointer to the current commit
- When you modify a record, only that record is copied
- Queries combine the original data with branch-specific changes
How It Works
Initial State
main branch:
┌─────────────────────────────────────────┐
│ users table │
│ id │ name │ email │
│─────┼─────────┼─────────────────────────│
│ 1 │ Alice │ alice@example.com │
│ 2 │ Bob │ bob@example.com │
│ 3 │ Charlie │ charlie@example.com │
└─────────────────────────────────────────┘
Branch Creation (Zero-Copy)
epoch branch create feature/update-emails
┌─── feature/update-emails
│ (pointer to same data)
▼
main ───────────────►┌─────────────────────────────────────────┐
│ users table │
│ (shared, no copy) │
└─────────────────────────────────────────┘
No data is copied. Both branches point to the same underlying data.
First Modification (Copy-on-Write)
# On feature/update-emails branch
client.execute("UPDATE users SET email = 'alice@company.com' WHERE id = 1")
Now the modified record is stored separately:
main branch: feature/update-emails:
┌───────────────────────┐ ┌─────────────────────────────┐
│ users (base) │ │ users (overlay) │
│ id │ name │ email │ │ id │ name │ email │
│─────┼───────┼─────────│ │─────┼───────┼───────────────│
│ 1 │ Alice │ alice@ │ │ 1 │ Alice │ alice@company │
│ 2 │ Bob │ bob@ │ └─────────────────────────────┘
│ 3 │ Charl │ charlie@│ Only modified records stored
└───────────────────────┘
Querying a Branch
When you query the feature branch:
SELECT * FROM users;
Horizon Epoch:
- Checks the branch overlay for each record
- Returns overlay version if found
- Returns base version otherwise
Result on feature/update-emails:
│ 1 │ Alice │ alice@company.com │ ← from overlay
│ 2 │ Bob │ bob@example.com │ ← from base
│ 3 │ Charlie │ charlie@example.com │ ← from base
Storage Layout: Before and After Modifications
Understanding how data is physically stored helps you reason about performance and storage costs.
PostgreSQL Implementation
Horizon Epoch uses version tables with a naming convention {table}__v_{version_id}:
State 0: Initial Setup (Main Branch Only)
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) │
│ ┌──────┬─────────┬─────────────────────┬──────────┐ │
│ │ id │ name │ email │ status │ │
│ ├──────┼─────────┼─────────────────────┼──────────┤ │
│ │ 1 │ Alice │ alice@example.com │ active │ │
│ │ 2 │ Bob │ bob@example.com │ active │ │
│ │ 3 │ Charlie │ charlie@example.com │ inactive │ │
│ └──────┴─────────┴─────────────────────┴──────────┘ │
│ │
│ Total storage: ~3 records │
└─────────────────────────────────────────────────────────────────────────┘
State 1: Branch Created (Zero-Copy)
epoch branch create feature/update-users
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) users__v_feature-update-users │
│ ┌──────┬─────────┬─────────────────────┐ ┌───────────────────────────┐│
│ │ id │ name │ email │ │ (empty table) ││
│ ├──────┼─────────┼─────────────────────┤ │ ││
│ │ 1 │ Alice │ alice@example.com │ │ Created with same schema ││
│ │ 2 │ Bob │ bob@example.com │ │ using LIKE ... INCLUDING ││
│ │ 3 │ Charlie │ charlie@example.com │ │ ALL (copies constraints) ││
│ └──────┴─────────┴─────────────────────┘ └───────────────────────────┘│
│ │
│ Total storage: ~3 records (0 additional for branch) │
│ Branch creation time: O(1) - just creates empty table │
└─────────────────────────────────────────────────────────────────────────┘
State 2: First Modification (Copy-on-Write Triggers)
# On feature branch: Update Alice's email
client.execute("UPDATE users SET email = 'alice@company.com' WHERE id = 1")
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) users__v_feature-update-users │
│ ┌──────┬─────────┬─────────────────────┐ ┌──────┬───────┬────────────┐│
│ │ id │ name │ email │ │ id │ name │ email ││
│ ├──────┼─────────┼─────────────────────┤ ├──────┼───────┼────────────┤│
│ │ 1 │ Alice │ alice@example.com │ │ 1 │ Alice │ alice@co.. ││
│ │ 2 │ Bob │ bob@example.com │ └──────┴───────┴────────────┘│
│ │ 3 │ Charlie │ charlie@example.com │ │
│ └──────┴─────────┴─────────────────────┘ Only 1 record copied! │
│ │
│ Total storage: ~4 records (1 additional for branch) │
│ Base table: UNCHANGED │
└─────────────────────────────────────────────────────────────────────────┘
State 3: Multiple Modifications
# More changes on feature branch
client.execute("UPDATE users SET status = 'vip' WHERE id = 1") # Updates overlay
client.execute("INSERT INTO users VALUES (4, 'Dana', 'dana@company.com', 'active')")
client.execute("DELETE FROM users WHERE id = 2") # Marks as deleted
┌─────────────────────────────────────────────────────────────────────────┐
│ Database: my_database │
│ │
│ users (base table) users__v_feature-update-users │
│ ┌──────┬─────────┬─────────────────────┐ ┌──────┬───────┬────────────┐│
│ │ id │ name │ email │ │ id │ name │ status ││
│ ├──────┼─────────┼─────────────────────┤ ├──────┼───────┼────────────┤│
│ │ 1 │ Alice │ alice@example.com │ │ 1 │ Alice │ vip (mod) ││
│ │ 2 │ Bob │ bob@example.com │ │ 4 │ Dana │ active(new)││
│ │ 3 │ Charlie │ charlie@example.com │ │ 2 │ NULL │ _DELETED_ ││
│ └──────┴─────────┴─────────────────────┘ └──────┴───────┴────────────┘│
│ │
│ Total storage: ~6 records (3 additional for branch) │
│ Main branch queries: See only original 3 records │
│ Feature branch queries: See Alice(mod) + Charlie + Dana = 3 records │
└─────────────────────────────────────────────────────────────────────────┘
S3/Delta Lake Implementation
Delta Lake uses a similar pattern with separate directories:
State 0: Initial Table
s3://horizon-data/
└── repositories/my-repo/
└── tables/
└── users/ # Base table (main branch)
├── _delta_log/
│ └── 00000000000000000000.json # Transaction log
├── part-00000-abc123.parquet # Data file 1
└── part-00001-def456.parquet # Data file 2
Total: ~100MB (all records)
State 1: Branch Created
s3://horizon-data/
└── repositories/my-repo/
└── tables/
├── users/ # Base table (unchanged)
│ ├── _delta_log/
│ └── *.parquet
│
└── users__v_feature-xyz/ # Version table (empty)
└── _delta_log/
└── 00000000000000000000.json # Schema only, no data files
Additional storage: ~1KB (just metadata)
State 2: After Modifications
s3://horizon-data/
└── repositories/my-repo/
└── tables/
├── users/ # Base table (unchanged: 100MB)
│ ├── _delta_log/
│ └── *.parquet
│
└── users__v_feature-xyz/ # Version table
├── _delta_log/
│ ├── 00000000000000000000.json # Initial schema
│ └── 00000000000000000001.json # Insert/update transaction
└── part-00000-xyz789.parquet # Only modified records (~1MB)
Additional storage: ~1MB (only changed records)
Query Overlay Mechanism
The query overlay mechanism is the heart of CoW - it makes branched data appear seamless to applications.
The UNION ALL + NOT EXISTS Pattern
Horizon Epoch uses a SQL pattern that PostgreSQL’s query planner handles efficiently:
-- Overlay query for feature branch
SELECT * FROM schema.users__v_feature_branch
UNION ALL
SELECT b.* FROM schema.users b
WHERE NOT EXISTS (
SELECT 1 FROM schema.users__v_feature_branch v
WHERE b.id = v.id -- Primary key match
)
How it works:
- First SELECT: Returns ALL records from the version table (modified/new records)
- UNION ALL: Combines without deduplication (faster than UNION)
- Second SELECT: Returns base records WHERE they DON’T exist in version table
- NOT EXISTS: Anti-join pattern - highly optimized in PostgreSQL
Query Flow Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ SELECT * FROM users │
│ (on feature branch) │
└─────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Horizon Epoch Query Rewriter │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ 1. Detect current branch context │ │
│ │ 2. Identify version table: users__v_feature_branch │ │
│ │ 3. Rewrite query with UNION ALL pattern │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬───────────────────────────────────────┘
│
┌─────────────┴─────────────┐
│ │
▼ ▼
┌──────────────────────────────┐ ┌──────────────────────────────────────┐
│ Version Table Scan │ │ Base Table Scan with Anti-Join │
│ users__v_feature_branch │ │ users WHERE NOT EXISTS (version) │
│ │ │ │
│ Returns: Modified records │ │ Returns: Unmodified records │
│ (Alice with new email) │ │ (Bob, Charlie - unchanged) │
└──────────────┬───────────────┘ └──────────────────┬───────────────────┘
│ │
└──────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ UNION ALL Result │
│ ┌──────┬─────────┬─────────────────────┐ │
│ │ id │ name │ email │ │
│ ├──────┼─────────┼─────────────────────┤ │
│ │ 1 │ Alice │ alice@company.com │ ← from version table │
│ │ 2 │ Bob │ bob@example.com │ ← from base (not in version) │
│ │ 3 │ Charlie │ charlie@example.com │ ← from base (not in version) │
│ └──────┴─────────┴─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Composite Primary Key Handling
For tables with composite primary keys, the pattern adapts:
-- Table with composite PK (order_id, line_item_id)
SELECT * FROM schema.order_lines__v_feature_branch
UNION ALL
SELECT b.* FROM schema.order_lines b
WHERE NOT EXISTS (
SELECT 1 FROM schema.order_lines__v_feature_branch v
WHERE b.order_id = v.order_id
AND b.line_item_id = v.line_item_id
)
Filtered Queries
When you add a WHERE clause, it’s applied to both parts:
-- Original query
SELECT * FROM users WHERE status = 'active'
-- Rewritten overlay query
SELECT * FROM schema.users__v_feature_branch
WHERE status = 'active'
UNION ALL
SELECT b.* FROM schema.users b
WHERE b.status = 'active'
AND NOT EXISTS (
SELECT 1 FROM schema.users__v_feature_branch v
WHERE b.id = v.id
)
Query Optimization Details
Point Lookups (O(1))
SELECT * FROM users WHERE id = 1
Optimized path:
- Check version table first (indexed lookup)
- If found → return immediately
- If not found → check base table
- Return result
-- Horizon Epoch generates:
SELECT * FROM schema.users__v_feature_branch WHERE id = 1
UNION ALL
SELECT * FROM schema.users
WHERE id = 1
AND NOT EXISTS (
SELECT 1 FROM schema.users__v_feature_branch WHERE id = 1
)
With proper indexing, this is two index lookups - very fast.
Full Table Scans
SELECT * FROM users
Performance depends on overlay size:
- Empty overlay: ~0% overhead (NOT EXISTS returns quickly)
- Small overlay (< 1% of base): ~5-10% overhead
- Large overlay (> 10% of base): Consider materialization
Aggregations
SELECT status, COUNT(*) FROM users GROUP BY status
Aggregations work correctly across the union:
SELECT status, COUNT(*) FROM (
SELECT status FROM schema.users__v_feature_branch
UNION ALL
SELECT b.status FROM schema.users b
WHERE NOT EXISTS (...)
) combined
GROUP BY status
S3/Delta Lake Query Pattern
For Delta Lake tables, Horizon Epoch uses DataFusion SQL engine:
-- Register tables in DataFusion context
CREATE EXTERNAL TABLE base STORED AS DELTA LOCATION 's3://bucket/users/'
CREATE EXTERNAL TABLE branch STORED AS DELTA LOCATION 's3://bucket/users__v_feature/'
-- Query with overlay
SELECT * FROM branch
UNION ALL
SELECT * FROM base
WHERE id NOT IN (SELECT id FROM branch)
Performance Benchmarks
| Scenario | Base Query | Overlay Query | Overhead |
|---|---|---|---|
| Point lookup (indexed) | 0.5ms | 0.6ms | +20% |
| Range scan (1000 rows) | 5ms | 5.5ms | +10% |
| Full scan (1M rows, 1K overlay) | 500ms | 520ms | +4% |
| Full scan (1M rows, 100K overlay) | 500ms | 600ms | +20% |
| Aggregation (1M rows) | 200ms | 220ms | +10% |
Key insight: Overhead is proportional to overlay size, not base table size.
Index Recommendations
For optimal overlay query performance:
-- Version tables automatically inherit indexes via LIKE ... INCLUDING ALL
-- But you may want to ensure these exist:
-- Primary key index (automatic)
CREATE INDEX ON users__v_feature_branch (id);
-- Covering indexes for common queries
CREATE INDEX ON users__v_feature_branch (status) INCLUDE (name, email);
Benefits
1. Instant Branch Creation
Creating a branch takes constant time regardless of data size:
- 1 row table: ~1ms
- 1 billion row table: ~1ms
2. Storage Efficiency
Only store what changes:
- Base: 1TB
- Branch with 1000 changes: ~1MB additional
3. Fast Merging
Merge only examines changed records:
- Base: 1 billion rows
- Changes: 1000 rows
- Merge compares: 1000 rows (not 1 billion)
4. Time Travel
Every commit preserves the overlay state:
- Query any historical version
- No additional storage for old versions
- Instant access to history
Considerations
Query Performance
Overlaid queries have slight overhead:
- Must check overlay before/after base
- Overhead is proportional to overlay size
- Negligible for typical branch sizes
For large overlays:
# Materialize long-lived branches
client.materialize_branch("long-running-branch")
Garbage Collection
Old overlays must be cleaned up:
# Merged branches can have overlays removed
client.gc_branch("merged-feature")
# Or automatic cleanup
config = Config(
gc_merged_branches=True,
gc_retention_days=7
)
Deep Branch Hierarchies
main → dev → feature → sub-feature → experiment
Each level adds overlay resolution. Keep hierarchies shallow.
Comparison with Full Copy
| Aspect | Copy-on-Write | Full Copy |
|---|---|---|
| Branch creation | O(1) | O(n) |
| Storage per branch | O(changes) | O(data) |
| Query (point) | O(1) | O(1) |
| Query (scan) | O(n + overlay) | O(n) |
| Merge | O(changes) | O(n) |
Best Practices
- Keep branches short-lived - Smaller overlays are faster
- Merge frequently - Reduces overlay accumulation
- Monitor overlay size - Large overlays may need materialization
- Use appropriate indexes - Index overlay tables for performance
- Clean up merged branches - Run garbage collection regularly
Merge Algorithm
Horizon Epoch uses a three-way merge algorithm similar to Git, extended for structured data with field-level conflict detection.
Overview
Three-way merge compares:
- Base - The common ancestor (merge base)
- Ours - The target branch (where we’re merging into)
- Theirs - The source branch (what we’re merging)
Visual Overview
┌─────────────────────────────────────────────────────────┐
│ THREE-WAY MERGE │
└─────────────────────────────────────────────────────────┘
Source Branch Target Branch
(feature) (main)
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│Commit E │ │Commit G │
│ users: │ │ users: │
│ Alice→ │ │ Alice→ │
│ Smith │ │ alice@ │
└────┬────┘ │ company │
│ └────┬────┘
│ │
│ ┌──────────────────┐ │
└──────► MERGE BASE ◄────────┘
│ Commit B │
│ users: │
│ Alice │
│ alice@example │
└────────┬─────────┘
│
▼
┌────────────────────┐
│ MERGE RESULT │
│ users: │
│ Alice Smith │ ← from source (name changed)
│ alice@company.com │ ← from target (email changed)
└────────────────────┘
Finding the Merge Base
Commit Graph
C---D---E (feature)
/
A---B---F---G (main)
The merge base for merging feature into main is commit B - the most recent common ancestor.
Algorithm
def find_merge_base(source_commit, target_commit):
    """Find the lowest common ancestor in the commit DAG."""
    source_ancestors = get_all_ancestors(source_commit)
    # BFS from target to find first intersection
    queue = [target_commit]
    while queue:
        commit = queue.pop(0)
        if commit in source_ancestors:
            return commit
        queue.extend(commit.parents)
    return None  # No common ancestor (shouldn't happen)
Content Hash Optimization
Before performing record-level comparison, Horizon Epoch uses content hashes for quick diff detection:
def should_merge_table(base_version, source_version, target_version):
    """Use content hashes to skip unchanged tables."""
    base_hash = base_version.content_hash
    source_hash = source_version.content_hash
    target_hash = target_version.content_hash
    if source_hash == target_hash:
        # Both branches have identical data - no merge needed
        return MergeAction.SKIP
    if source_hash == base_hash:
        # Only target changed - keep target
        return MergeAction.KEEP_TARGET
    if target_hash == base_hash:
        # Only source changed - take source (fast-forward)
        return MergeAction.TAKE_SOURCE
    # Both changed differently - full record merge required
    return MergeAction.RECORD_LEVEL_MERGE
Three-Way Merge Process
Step 1: Compute Changes
For each table, compute:
- Changes from base to ours (ours_changes)
- Changes from base to theirs (theirs_changes)
def compute_changes(base_state, current_state):
    """base_state and current_state map primary key -> record."""
    changes = {}
    for key, record in current_state.items():
        base_record = base_state.get(key)
        if base_record is None:
            changes[key] = Change(type=INSERT, record=record)
        elif base_record != record:
            changes[key] = Change(type=UPDATE,
                                  base=base_record,
                                  current=record)
    for key in base_state:
        if key not in current_state:
            changes[key] = Change(type=DELETE, base=base_state[key])
    return changes
Step 2: Classify Changes
For each changed record:
| Ours | Theirs | Result |
|---|---|---|
| No change | Changed | Accept theirs |
| Changed | No change | Keep ours |
| Same change | Same change | Keep either (identical) |
| Different changes | Different changes | CONFLICT |
| Deleted | Changed | CONFLICT |
| Changed | Deleted | CONFLICT |
| Deleted | Deleted | Keep deleted |
Step 3: Apply Non-Conflicting Changes
def apply_changes(base, ours_changes, theirs_changes):
    """Start from the merge base and fold in both sides' non-conflicting changes."""
    result = base.copy()
    conflicts = []
    for key in set(ours_changes.keys()) | set(theirs_changes.keys()):
        ours = ours_changes.get(key)
        theirs = theirs_changes.get(key)
        if ours is None:
            # Only theirs changed
            result.apply(theirs)
        elif theirs is None:
            # Only ours changed
            result.apply(ours)
        elif ours == theirs:
            # Same change on both sides - apply once
            result.apply(ours)
        else:
            # Potential conflict - check field level
            conflict = check_field_conflict(key, ours, theirs)
            if conflict:
                conflicts.append(conflict)
            else:
                result.apply(merge_changes(ours, theirs))
    return result, conflicts
Field-Level Conflict Detection
Unlike file-based version control, Horizon Epoch detects conflicts at the field level. This is one of the most powerful features - it allows more changes to merge automatically.
How Field-Level Merge Works
┌─────────────────────────────────────────────────────────────────────────────┐
│ FIELD-LEVEL MERGE ALGORITHM │
└─────────────────────────────────────────────────────────────────────────────┘
For each field in a record where both branches made changes:
┌──────────────────┬──────────────────┬──────────────────┬────────────────────┐
│ Ancestor (A) │ Source (S) │ Target (T) │ Result │
├──────────────────┼──────────────────┼──────────────────┼────────────────────┤
│ value_a │ value_a │ value_a │ value_a (same) │
│ value_a │ value_s │ value_a │ value_s (source) │
│ value_a │ value_a │ value_t │ value_t (target) │
│ value_a │ value_s │ value_s │ value_s (same) │
│ value_a │ value_s │ value_t │ ⚠️ CONFLICT │
│ NULL │ value_s │ NULL │ value_s (added) │
│ NULL │ NULL │ value_t │ value_t (added) │
│ NULL │ value_s │ value_t │ ⚠️ CONFLICT │
│ value_a │ NULL │ value_a │ NULL (deleted) │
│ value_a │ value_a │ NULL │ NULL (deleted) │
│ value_a │ NULL │ value_t │ ⚠️ CONFLICT │
└──────────────────┴──────────────────┴──────────────────┴────────────────────┘
Implementation
def field_level_merge(ancestor, source, target, strategy):
    """
    Merge two records field-by-field using three-way merge.
    Returns merged record or conflict information.
    """
    result = {}
    conflicting_fields = []
    # Get all fields across all three versions
    all_fields = set(ancestor.keys()) | set(source.keys()) | set(target.keys())
    for field in all_fields:
        ancestor_val = ancestor.get(field)
        source_val = source.get(field)
        target_val = target.get(field)
        # Case 1: Same in both source and target → no conflict
        if source_val == target_val:
            result[field] = source_val
            continue
        # Case 2: Only source changed from ancestor
        if ancestor_val == target_val and ancestor_val != source_val:
            result[field] = source_val
            continue
        # Case 3: Only target changed from ancestor
        if ancestor_val == source_val and ancestor_val != target_val:
            result[field] = target_val
            continue
        # Case 4: Field added in source only
        if ancestor_val is None and target_val is None and source_val is not None:
            result[field] = source_val
            continue
        # Case 5: Field added in target only
        if ancestor_val is None and source_val is None and target_val is not None:
            result[field] = target_val
            continue
        # Case 6: CONFLICT - both changed differently
        conflicting_fields.append(FieldConflict(
            field_name=field,
            ancestor_value=ancestor_val,
            source_value=source_val,
            target_value=target_val
        ))
    if conflicting_fields:
        return MergeRecordResult.Conflict(
            merged_fields=result,
            conflicting_fields=conflicting_fields
        )
    return MergeRecordResult.Resolved(result)
Example: Successful Auto-Merge
Base record:
{"id": 1, "name": "Alice", "email": "alice@example.com", "status": "active"}
Ours (main):
{"id": 1, "name": "Alice", "email": "alice@company.com", "status": "active"}
Theirs (feature):
{"id": 1, "name": "Alice Smith", "email": "alice@example.com", "status": "inactive"}
Field-by-Field Analysis
┌─────────┬───────────────┬───────────────┬───────────────┬────────────────────┐
│ Field │ Base │ Ours (main) │ Theirs (feat) │ Result │
├─────────┼───────────────┼───────────────┼───────────────┼────────────────────┤
│ id │ 1 │ 1 │ 1 │ 1 (unchanged) │
│ name │ Alice │ Alice │ Alice Smith │ Alice Smith ✓ │
│ │ │ (unchanged) │ (changed) │ (take theirs) │
│ email │ alice@example │ alice@company │ alice@example │ alice@company ✓ │
│ │ │ (changed) │ (unchanged) │ (keep ours) │
│ status │ active │ active │ inactive │ inactive ✓ │
│ │ │ (unchanged) │ (changed) │ (take theirs) │
└─────────┴───────────────┴───────────────┴───────────────┴────────────────────┘
Result (auto-merged):
{"id": 1, "name": "Alice Smith", "email": "alice@company.com", "status": "inactive"}
Example: Field-Level Conflict
If both branches changed the same field to different values:
┌─────────┬───────────────┬────────────────────┬────────────────────┬──────────┐
│ Field │ Base │ Ours (main) │ Theirs (feature) │ Result │
├─────────┼───────────────┼────────────────────┼────────────────────┼──────────┤
│ email │ alice@example │ alice@company.com │ a.smith@example │ CONFLICT │
│ │ │ (changed) │ (changed) │ ⚠️ │
└─────────┴───────────────┴────────────────────┴────────────────────┴──────────┘
Conflict Types
Horizon Epoch distinguishes several conflict types:
#![allow(unused)]
fn main() {
pub enum RecordConflictType {
/// Both sides modified the same record differently
ModifyModify,
/// Source deleted, target modified
DeleteModify,
/// Source modified, target deleted
ModifyDelete,
/// Both sides added a record with the same primary key
AddAdd,
/// Other edge cases
Other,
}
}
Each type may require different resolution strategies.
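As an illustration, the conflict type can drive automated resolution under the AcceptSource strategy described below; the conflict object shape used here is assumed, not the published SDK API:
# Illustrative mapping only; the conflict object shape is assumed.
def resolve_accept_source(conflict):
    if conflict.type == "ModifyModify":
        return conflict.source_record   # source's field values win
    if conflict.type == "DeleteModify":
        return None                     # source deleted the record - keep the delete
    if conflict.type == "ModifyDelete":
        return conflict.source_record   # source modified it - restore the record
    if conflict.type == "AddAdd":
        return conflict.source_record   # source's new record wins
    raise ValueError(f"manual resolution required for {conflict.type}")  # Other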
Merge Strategies
Horizon Epoch supports four merge strategies, each suited for different workflows:
┌─────────────────────────────────────────────────────────────────────────────┐
│ MERGE STRATEGIES │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────┬────────────────────────────────────────────────────────────┐
│ Strategy │ Behavior │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ ThreeWay │ Standard 3-way merge, reports conflicts for resolution │
│ (default) │ Use when: You want to review and manually resolve conflicts│
├─────────────────┼────────────────────────────────────────────────────────────┤
│ AcceptSource │ Auto-resolve conflicts by taking source branch values │
│ (--strategy │ Use when: Source branch should "win" (e.g., hotfixes) │
│ theirs) │ │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ AcceptTarget │ Auto-resolve conflicts by keeping target branch values │
│ (--strategy │ Use when: Target branch should "win" (preserve main) │
│ ours) │ │
├─────────────────┼────────────────────────────────────────────────────────────┤
│ FailOnConflict │ Abort immediately if any conflict is detected │
│ (--fail-on- │ Use when: CI/CD pipelines, automated processes │
│ conflict) │ │
└─────────────────┴────────────────────────────────────────────────────────────┘
Default Strategy (ThreeWay)
Field-level three-way merge with conflict detection. Non-conflicting changes merge automatically; conflicts are reported for manual resolution.
epoch merge feature/branch
Accept Source (Theirs) Strategy
On conflicts, automatically choose the source branch (the branch being merged in):
epoch merge feature/branch --strategy theirs
Use cases:
- Hotfix branches that must override any conflicting changes
- Feature branches developed by domain experts
- Emergency deployments
Accept Target (Ours) Strategy
On conflicts, automatically keep the target branch values (the branch you’re merging into):
epoch merge feature/branch --strategy ours
Use cases:
- Integrating experimental branches where existing data should be preserved
- Rebasing feature branches onto main
- Pulling updates while preserving local modifications
Fail on Conflict Strategy
Fail immediately if any conflict is detected. Useful for automated pipelines.
epoch merge feature/branch --fail-on-conflict
Use cases:
- CI/CD pipelines where conflicts indicate process failures
- Automated data sync jobs
- Pre-merge validation checks
Strategy Comparison
| Scenario | ThreeWay | AcceptSource | AcceptTarget | FailOnConflict |
|---|---|---|---|---|
| Non-conflicting changes | ✓ Merge | ✓ Merge | ✓ Merge | ✓ Merge |
| Same field, different values | ⚠️ Report | Take source | Keep target | ❌ Abort |
| Delete vs modify | ⚠️ Report | Delete | Keep modified | ❌ Abort |
| Both add same key | ⚠️ Report | Take source | Keep target | ❌ Abort |
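From the Python SDK the strategy would be a per-merge option; this sketch assumes merge() accepts a strategy keyword mirroring the CLI flags:
# Sketch: assumes merge() accepts a strategy keyword mirroring the CLI flags.
result = client.merge("feature/branch", strategy="three_way")   # default behaviour
if result.has_conflicts:
    for conflict in result.conflicts:
        print(conflict.table_name, conflict.record_id, conflict.type)
else:
    print(f"Merged as commit {result.commit_id}")

# Automated pipelines can fail fast instead of reporting conflicts:
client.merge("feature/branch", strategy="fail_on_conflict")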
Conflict Resolution
Interactive Resolution
epoch conflicts resolve --interactive
Walks through each conflict:
Conflict in users.email (record id=1):
BASE: alice@example.com
OURS: alice@company.com
THEIRS: alice.smith@example.com
Choose:
[o] Keep ours
[t] Keep theirs
[b] Keep base
[c] Custom value
>
Programmatic Resolution
conflicts = client.get_conflicts()
for conflict in conflicts:
    for field in conflict.fields:
        # Custom resolution logic
        if field.name == "email":
            # Business rule: prefer company domain
            if "@company.com" in field.ours_value:
                resolution = field.ours_value
            else:
                resolution = field.theirs_value
            client.resolve_conflict(
                table=conflict.table_name,
                record_id=conflict.record_id,
                field=field.name,
                value=resolution
            )
client.merge_continue()
Schema Merge
Schema changes are also merged:
Compatible Changes
These merge automatically:
- Adding a column (either branch)
- Adding an index
- Changing a default value (non-conflicting)
Conflicting Changes
These require resolution:
- Same column changed to different types
- Same column renamed differently
- Conflicting constraints
Schema conflict in users:
Column 'status':
OURS: VARCHAR(20)
THEIRS: INTEGER
Choose resolution:
[o] Keep VARCHAR(20)
[t] Change to INTEGER
[c] Custom type
>
Merge Commits
After successful merge, a merge commit is created:
Commit: ghi789
Parents: abc123 (main), def456 (feature)
Message: Merge feature/branch into main
Changes:
users: 5 records merged, 2 conflicts resolved
Performance
Complexity
- Finding merge base: O(commit history depth)
- Computing changes: O(changed records)
- Conflict detection: O(changed records)
Optimization
- Change tracking - Only examine records marked as changed
- Parallel comparison - Compare multiple tables concurrently
- Early termination - Stop at first conflict if not auto-resolving
Best Practices
- Merge frequently - Smaller merges have fewer conflicts
- Keep branches focused - One feature per branch reduces conflicts
- Communicate on shared data - Coordinate changes to same records
- Test after merge - Verify data integrity post-merge
- Document conflict resolution - Include context in merge commit
Multi-Backend Architecture
Horizon Epoch is designed to manage data across multiple, heterogeneous storage backends through a unified interface.
Design Philosophy
Separation of Concerns
┌─────────────────────────────────────────────────────────────┐
│ Horizon Epoch │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Metadata Layer │ │
│ │ - Commits, branches, tags │ │
│ │ - Table registrations │ │
│ │ - Change tracking indices │ │
│ │ - Version graph │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ │ (references, not data) │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │PostgreSQL│ │ MySQL │ │SQL Server│ │ SQLite │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ AWS S3 │ │ Azure │ │ GCS │ │ Local │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Key insight: The metadata layer stores what changed and when, while storage adapters handle where the data physically lives.
Benefits
- No Data Migration - Keep data where it is
- Best Tool for Job - Use PostgreSQL for transactional, S3 for analytics
- Gradual Adoption - Add version control to existing infrastructure
- Unified Operations - Same commands work across all backends
Architecture Components
Storage Registry
Central registry of all configured backends:
config = Config(
metadata_url="postgresql://localhost/horizon_epoch"
).add_postgres(
"prod_users", # Logical name
"postgresql://prod/users"
).add_postgres(
"prod_orders",
"postgresql://prod/orders"
).add_s3(
"analytics",
bucket="company-analytics"
)
Storage Location
Each table has a storage location that identifies:
- Protocol - Which adapter to use
- Backend Name - Which configured backend
- Path - Table identifier within the backend
postgresql://prod_users/public.users
│ │ │
protocol backend table path
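The three parts can be pulled apart with a standard URL parse; this is only an illustration of the format, not the resolver Horizon Epoch uses internally:
# Illustration of the location format; not the internal resolver.
from urllib.parse import urlparse

location = "postgresql://prod_users/public.users"
parsed = urlparse(location)
protocol = parsed.scheme               # "postgresql" -> which adapter
backend = parsed.netloc                # "prod_users" -> which configured backend
table_path = parsed.path.lstrip("/")   # "public.users" -> table within the backend
print(protocol, backend, table_path)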
Table Registration
Tables are registered with their location:
# PostgreSQL table
client.register_table("users", "postgresql://prod_users/public.users")
# S3 Delta table
client.register_table("events", "s3://analytics/delta/events")
Metadata References
Metadata stores references to data, not the data itself:
-- In metadata database
SELECT * FROM epoch_tables;
┌──────────┬────────────────────────────────────────┐
│ name │ location │
├──────────┼────────────────────────────────────────┤
│ users │ postgresql://prod_users/public.users │
│ orders │ postgresql://prod_orders/public.orders │
│ events │ s3://analytics/delta/events │
└──────────┴────────────────────────────────────────┘
Cross-Backend Operations
Branching
Branches span all registered tables:
# Creates a branch that covers:
# - users (PostgreSQL)
# - orders (PostgreSQL)
# - events (S3)
client.create_branch("feature/new-reporting")
Each backend maintains its own overlay:
- PostgreSQL: Overlay tables
- S3: Separate Delta log
Committing
Commits can include changes from multiple backends:
# Changes to users (PostgreSQL) and events (S3)
# are captured in a single commit
client.commit(message="Update user events schema")
The commit metadata tracks which backends have changes:
{
"commit_id": "abc123",
"message": "Update user events schema",
"changes": {
"postgresql://prod_users": ["users"],
"s3://analytics": ["events"]
}
}
Diffing
Diff operations aggregate across backends:
diff = client.diff("main", "feature/branch")
# Returns changes from all backends
for table_diff in diff.table_diffs:
print(f"{table_diff.location}: {table_diff.status}")
Merging
Merges are coordinated across backends:
- Compute changes per backend
- Detect conflicts per backend
- Apply changes per backend (in transaction if supported)
- Create unified merge commit
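As a sketch of what this looks like from the Python SDK, using the async Client surface shown elsewhere in these docs (the result attribute name is an assumption):
import asyncio
from horizon_epoch import Client

async def merge_reporting_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # A single merge call computes, checks, and applies changes per backend
        result = await client.merge("feature/new-reporting")
        # "commit_id" is an assumed attribute name for the unified merge commit
        print(f"Merge commit: {result.commit_id}")

asyncio.run(merge_reporting_branch())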
Consistency Model
Within a Backend
Operations within a single backend use that backend’s consistency guarantees:
- PostgreSQL: ACID transactions
- S3/Delta: Serializable via Delta protocol
Across Backends
Cross-backend operations provide best-effort consistency:
┌─────────────┐ ┌─────────────┐
│ PostgreSQL │ │ S3 │
│ commit │ │ commit │
└──────┬──────┘ └──────┬──────┘
│ │
└────────┬───────────┘
│
┌──────▼──────┐
│ Metadata │
│ Commit │
└─────────────┘
If one backend fails:
- The operation is marked as partial
- Rollback is attempted where possible
- User is notified of partial state
try:
client.commit(message="Multi-backend update")
except PartialCommitError as e:
print(f"Committed to: {e.successful_backends}")
print(f"Failed on: {e.failed_backends}")
# Manual intervention needed
Configuration Patterns
Separate Environments
# Development
[storage.postgres.dev_db]
url = "postgresql://localhost/dev"
# Staging
[storage.postgres.staging_db]
url = "postgresql://staging-db.internal/staging"
# Production
[storage.postgres.prod_db]
url = "postgresql://prod-db.internal/production"
aws_secret_id = "prod-db-credentials"
Mixed Workloads
# Transactional data in PostgreSQL
[storage.postgres.transactional]
url = "postgresql://oltp-db/production"
# Analytics in S3
[storage.s3.analytics]
bucket = "company-analytics"
region = "us-east-1"
# Archive in Glacier
[storage.s3.archive]
bucket = "company-archive"
region = "us-east-1"
storage_class = "GLACIER_IR"
Cross-Region
[storage.s3.us_data]
bucket = "data-us"
region = "us-east-1"
[storage.s3.eu_data]
bucket = "data-eu"
region = "eu-west-1"
Routing and Discovery
Explicit Routing
Specify backend when registering tables:
client.register_table("users", "postgresql://prod_users/public.users")
Pattern-Based Routing
Configure default routing patterns:
[routing]
# Tables starting with "raw_" go to S3
"raw_*" = "s3://analytics"
# Everything else goes to PostgreSQL
"*" = "postgresql://default"
Auto-Discovery
Discover tables from backends:
# List tables in a backend
tables = client.discover_tables("postgresql://prod_db")
# Register discovered tables
for table in tables:
client.register_table(table.name, table.location)
Performance Considerations
Query Routing
Queries are routed to the appropriate backend:
# Routed to PostgreSQL
client.query("SELECT * FROM users")
# Routed to S3
client.query("SELECT * FROM events")
Cross-Backend Queries
Currently, joins across backends are not supported in a single query. Use application-level joining:
# Query each backend
users = client.query("SELECT * FROM users")
events = client.query("SELECT * FROM events WHERE user_id IN (...)")
# Join in the application (e.g., a pandas-style merge if results are DataFrames)
result = users.merge(events, on="user_id")
Caching
Per-backend connection pooling and caching:
[storage.postgres.prod_db]
pool_size = 20
cache_schema = true
[storage.s3.analytics]
cache_metadata = true
cache_ttl = 300
Limitations
- No cross-backend transactions - ACID only within single backend
- No cross-backend joins - Query each backend separately
- Eventual consistency - Cross-backend commits may be partially applied
- Network latency - Operations touch multiple backends
Best Practices
- Group related tables - Tables that are often queried together should be in the same backend
- Consider latency - Place backends close to where they’re accessed
- Plan for failures - Have recovery procedures for partial commits
- Monitor all backends - Track health and performance per backend
- Document routing - Make it clear which tables are where
Schema Storage Across Backends
Horizon Epoch stores schemas consistently regardless of the underlying storage backend, but the level of enforcement varies. This document explains how schemas are stored, introspected, and enforced across different backends.
Storage Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│ Metadata Database │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ table_versions │ │
│ │ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ │ id | table_id | commit_id | schema (JSONB) │ │ │
│ │ │ ---------------------------------------------------------- │ │ │
│ │ │ 1 | users | abc123 | {"fields": [...], ...} │ │ │
│ │ │ 2 | users | def456 | {"fields": [...], ...} │ │ │
│ │ │ 3 | orders | abc123 | {"fields": [...], ...} │ │ │
│ │ └──────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
│
┌─────────────┬─────────────┬─────┴─────┬─────────────┬─────────────┐
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌────────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│   SQL    │ │   SQL    │ │   Object   │ │  File  │ │  File  │ │  File  │
│  (Full)  │ │ (Part.)  │ │  Storage   │ │Storage │ │Storage │ │Storage │
├──────────┤ ├──────────┤ │ (Metadata) │ │(Meta.) │ │(Meta.) │ │(Meta.) │
│PostgreSQL│ │  SQLite  │ ├────────────┤ ├────────┤ ├────────┤ ├────────┤
│  MySQL   │ │          │ │     S3     │ │ Azure  │ │  GCS   │ │ Local  │
│  MSSQL   │ │          │ │            │ │  Blob  │ │        │ │   FS   │
└──────────┘ └──────────┘ └────────────┘ └────────┘ └────────┘ └────────┘
Key Principle: Schema is always stored in the metadata database as JSONB. The storage backend may or may not enforce the schema.
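As a sketch of what that means in practice, the stored schema can be read straight out of the metadata database. This assumes direct read access and the table_versions layout shown in the diagram above; psycopg and the field names inside the JSONB document are illustrative assumptions:
import json
import psycopg

with psycopg.connect("postgresql://localhost/horizon_epoch") as conn:
    row = conn.execute(
        "SELECT schema FROM table_versions "
        "WHERE table_id = %s ORDER BY id DESC LIMIT 1",
        ("users",),
    ).fetchone()
    # psycopg usually decodes JSONB to a dict already; fall back to json.loads
    schema = row[0] if isinstance(row[0], dict) else json.loads(row[0])
    # Assumes each entry in "fields" carries a "name" key
    print([field["name"] for field in schema.get("fields", [])])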
PostgreSQL: Full Schema Support
PostgreSQL provides ConstraintSupportLevel::Full - all schema elements are enforced by the database engine.
Schema Storage
Schemas are stored in two places:
- Metadata database: JSONB in table_versions.schema
- Live database: Actual DDL (tables, constraints, indexes)
Introspection
Horizon Epoch introspects PostgreSQL schemas using system catalogs:
-- Columns from information_schema
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'users';
-- Constraints from pg_catalog
SELECT conname, contype, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'users'::regclass;
-- Indexes from pg_indexes
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
-- Enum types from pg_type
SELECT typname, enumlabel
FROM pg_type t
JOIN pg_enum e ON t.oid = e.enumtypid
WHERE typname = 'order_status';
What’s Captured
| Element | Captured | Enforced | Notes |
|---|---|---|---|
| Column names | Yes | Yes | |
| Column types | Yes | Yes | Full PostgreSQL type mapping |
| Nullability | Yes | Yes | NOT NULL constraints |
| Defaults | Yes | Yes | Literals, functions, sequences |
| Primary keys | Yes | Yes | |
| Foreign keys | Yes | Yes | With referential actions |
| Unique constraints | Yes | Yes | Including NULLS NOT DISTINCT |
| Check constraints | Yes | Yes | Full expression capture |
| Exclusion constraints | Yes | Yes | GiST-based |
| Indexes | Yes | Yes | B-tree, GIN, GiST, BRIN |
| Enum types | Yes | Yes | With value order |
CLI Examples
# Show schema from PostgreSQL table
epoch schema show users
# Output:
# Table: users (PostgreSQL)
# Source: Live database introspection
#
# Columns:
# id SERIAL PRIMARY KEY
# email VARCHAR(255) NOT NULL
# name VARCHAR(255) NOT NULL
# status user_status DEFAULT 'active'
# created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
#
# Constraints:
# pk_users PRIMARY KEY (id)
# uq_users_email UNIQUE (email)
# chk_users_email CHECK (email ~ '^[^@]+@[^@]+$')
# fk_users_org FOREIGN KEY (org_id) REFERENCES orgs (id)
#
# Indexes:
# idx_users_email btree (email)
# idx_users_status btree (status) WHERE status = 'active'
#
# Enum Types:
# user_status (active, inactive, suspended)
Python Example
import asyncio
from horizon_epoch import Client
async def inspect_schema():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraints for a table
constraints = await client.get_constraints("users", branch="main")
print(f"Unique Constraints: {len(constraints.unique_constraints)}")
print(f"Foreign Keys: {len(constraints.foreign_keys)}")
print(f"Check Constraints: {len(constraints.check_constraints)}")
asyncio.run(inspect_schema())
S3/Delta Lake: Metadata-Only Support
S3/Delta Lake provides ConstraintSupportLevel::MetadataOnly - schemas are stored for documentation and merge operations but not enforced at the storage level.
Schema Storage
Schemas are stored in:
- Metadata database: JSONB in table_versions.schema
- Delta Lake: Parquet schema (columns and types only)
- Delta Lake table properties: Constraint metadata
What’s Stored vs Enforced
| Element | Stored | Enforced | Notes |
|---|---|---|---|
| Column names | Yes | Yes | Via Arrow/Parquet schema |
| Column types | Yes | Yes | Mapped to Arrow types |
| Nullability | Yes | Partial | Arrow schema nullability |
| Defaults | Yes | No | Stored as metadata only |
| Primary keys | Yes | No | Used as merge key, not unique |
| Foreign keys | Yes | No | Stored for documentation |
| Unique constraints | Yes | No | Duplicates allowed |
| Check constraints | Yes | No | Expressions not evaluated |
| Exclusion constraints | Yes | No | PostgreSQL-specific |
| Indexes | Yes | No | Delta Lake handles indexing |
| Enum types | Yes | Partial | Arrow dictionary encoding |
Schema Conversion
Horizon Epoch converts schemas to/from Arrow format:
Horizon Epoch Schema ←→ Apache Arrow Schema ←→ Parquet Files
Type Mapping:
| Horizon Epoch | Arrow Type | Notes |
|---|---|---|
| int8 | Int8 | |
| int16 | Int16 | |
| int32 | Int32 | |
| int64 | Int64 | |
| float32 | Float32 | |
| float64 | Float64 | |
| decimal(p,s) | Decimal128 | Precision preserved |
| varchar(n) | Utf8 | Length not enforced |
| text | LargeUtf8 | |
| boolean | Boolean | |
| timestamp | Timestamp | With timezone |
| date | Date32 | |
| time | Time64 | Nanoseconds |
| uuid | FixedSizeBinary(16) | |
| jsonb | LargeUtf8 | Stored as JSON string |
| bytea | LargeBinary | |
| enum | Dictionary | Values preserved |
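For intuition, the Arrow side of this table corresponds to types you can construct with pyarrow. This is an illustrative sketch of the mapping, not the converter Horizon Epoch ships:
import pyarrow as pa

# Illustrative equivalents for a few rows of the mapping table above
arrow_schema = pa.schema([
    pa.field("id", pa.binary(16), nullable=False),             # uuid -> FixedSizeBinary(16)
    pa.field("user_id", pa.int64()),                           # int64 -> Int64
    pa.field("event_type", pa.utf8()),                         # varchar(100); length not enforced
    pa.field("payload", pa.large_utf8()),                      # jsonb stored as a JSON string
    pa.field("amount", pa.decimal128(10, 2)),                  # decimal(10,2) -> Decimal128
    pa.field("created_at", pa.timestamp("us", tz="UTC")),      # timestamp with timezone
    pa.field("status", pa.dictionary(pa.int32(), pa.utf8())),  # enum -> Dictionary
])
print(arrow_schema)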
CLI Examples
# Show schema from S3/Delta table
epoch schema show events
# Output:
# Table: events (S3/Delta Lake)
# Source: Stored metadata (constraints not enforced)
#
# Columns:
# id UUID
# user_id INT64
# event_type VARCHAR(100)
# payload JSONB
# created_at TIMESTAMP
#
# Constraints (metadata only - not enforced):
# pk_events PRIMARY KEY (id)
# fk_events_user FOREIGN KEY (user_id) REFERENCES users (id)
#
# Indexes (not created - Delta Lake manages indexing):
# idx_events_user btree (user_id)
# idx_events_type btree (event_type)
#
# Note: S3/Delta Lake does not enforce constraints at the storage level.
# Constraints are stored for documentation and merge conflict detection.
Python Example
import asyncio
from horizon_epoch import Client
async def check_delta_constraints():
async with Client.connect("postgresql://localhost/horizon_epoch") as client:
# Get constraints from Delta table
constraints = await client.get_constraints("events", branch="main")
# Constraints are stored but not enforced by S3/Delta
print(f"Foreign Keys: {len(constraints.foreign_keys)}")
print(f"Unique Constraints: {len(constraints.unique_constraints)}")
# For S3/Delta backends, constraints are metadata-only
# Validation happens at write time (application level)
asyncio.run(check_delta_constraints())
Constraint Storage in Delta Lake
Constraints are stored as Delta Lake table properties:
{
"horizon_epoch.constraints": {
"primary_key": ["id"],
"foreign_keys": [
{
"name": "fk_events_user",
"columns": ["user_id"],
"references_table": "users",
"references_columns": ["id"]
}
],
"unique_constraints": [],
"check_constraints": [
{
"name": "chk_events_type",
"expression": "event_type IN ('click', 'view', 'purchase')"
}
]
}
}
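If you want to inspect those properties outside Horizon Epoch, a hedged sketch with the open-source deltalake package looks like this. The property key comes from the example above; the concrete S3 path and available AWS credentials are assumptions:
import json
from deltalake import DeltaTable

# Read the Delta table's configuration properties directly
dt = DeltaTable("s3://company-analytics/delta/events")
props = dt.metadata().configuration
constraints = json.loads(props.get("horizon_epoch.constraints", "{}"))
for fk in constraints.get("foreign_keys", []):
    print(f"{fk['name']}: {fk['columns']} -> "
          f"{fk['references_table']}{fk['references_columns']}")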
Cross-Backend Operations
Merge Behavior
Constraint conflicts are detected regardless of enforcement level:
# Branch A (PostgreSQL) modifies constraint
epoch checkout main
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 18)"
epoch commit -m "Add age constraint"
# Branch B (same table) modifies same constraint
epoch checkout feature
psql -c "ALTER TABLE users ADD CONSTRAINT chk_age CHECK (age >= 21)"
epoch commit -m "Different age constraint"
# Merge detects conflict
epoch merge feature
# CONFLICT (constraint): chk_age modified in both branches
Even for S3/Delta tables where constraints aren’t enforced, merge conflicts are detected:
# Both branches modify the same constraint metadata
# Merge requires resolution, even though neither is enforced
epoch merge feature
# CONFLICT (constraint): chk_events_type modified in both branches
Generating DDL for S3 Schemas
You can generate PostgreSQL DDL from schemas stored with S3/Delta tables:
# Generate DDL from S3 table schema
epoch schema show events --generate-ddl
# Output:
# -- DDL for events (from S3/Delta metadata)
# CREATE TABLE events (
# id UUID PRIMARY KEY,
# user_id INT64 NOT NULL,
# event_type VARCHAR(100),
# payload JSONB,
# created_at TIMESTAMP
# );
#
# ALTER TABLE events ADD CONSTRAINT fk_events_user
# FOREIGN KEY (user_id) REFERENCES users (id);
#
# CREATE INDEX idx_events_user ON events (user_id);
This is useful for:
- Creating PostgreSQL replicas of S3 data
- Validating schema compatibility
- Documentation
Constraint Support Level API
from horizon_epoch import ConstraintSupportLevel
# Check what a backend supports
level = client.get_constraint_support("postgresql://...")
# Methods available
level.can_enforce() # True for Full only
level.can_store() # True for Full, MetadataOnly, Partial
level.can_execute_ddl() # True for Full only
level.validation_mode() # "database", "application", or "none"
# Use in application logic
if level.can_enforce():
# Let database handle constraint enforcement
client.insert(data)
else:
# Validate in application before insert
violations = client.validate_constraints(data)
if violations:
raise ValueError(f"Constraint violations: {violations}")
client.insert(data)
Best Practices
1. Know Your Enforcement Level
support = client.get_constraint_support(table_location)
if support != ConstraintSupportLevel.Full:
logger.warning(f"Constraints not enforced for {table_location}")
2. Validate Before Writing to S3
# For S3/Delta tables, validate in the application before writing
def insert_to_s3_table(client, table, data):
    schema = client.get_schema(table)
    # Validate primary key uniqueness (assumes query() returns rows as dicts)
    existing_keys = {row["id"] for row in client.query(f"SELECT id FROM {table}")}
    if data["id"] in existing_keys:
        raise ValueError("Duplicate primary key")
    # Validate that referenced rows exist for each foreign key
    for fk in schema.foreign_keys:
        ref_col = fk.references_columns[0]
        ref_rows = client.query(f"SELECT {ref_col} FROM {fk.references_table}")
        if data[fk.columns[0]] not in {row[ref_col] for row in ref_rows}:
            raise ValueError(f"Foreign key violation: {fk.name}")
    # Then insert
    client.insert(table, data)
3. Use PostgreSQL for Critical Constraints
If constraint enforcement is critical, use PostgreSQL tables:
# Analytics data - S3 is fine (no strict constraints needed)
client.register_table("events", "s3://bucket/events")
# User data - PostgreSQL for enforcement
client.register_table("users", "postgresql://mydb/public.users")
4. Document Constraint Semantics
In mixed-backend environments, document which constraints are enforced:
## Data Model
| Table | Backend | Constraints Enforced |
|-------|---------|---------------------|
| users | PostgreSQL | All (FK, UK, CHECK) |
| events | S3/Delta | None (metadata only) |
| orders | PostgreSQL | All |
| analytics | S3/Delta | None |
See Also
- Storage Adapters - Backend implementation details
- Constraint Reference - Constraint types
- Multi-Backend Setup - Configuring multiple backends
Common Issues
Solutions to frequently encountered problems with Horizon Epoch.
Connection Problems
Could not connect to metadata database
Symptoms:
Error: Database connection failed: connection refused
Causes:
- PostgreSQL not running
- Wrong connection string
- Firewall blocking port
- Authentication failure
Solutions:
- Check PostgreSQL is running:
  pg_isready -h localhost -p 5432
- Verify connection string format:
  postgresql://user:password@host:port/database
- Check firewall settings:
  # Check if port is accessible
  nc -zv localhost 5432
- Test connection directly:
  psql "postgresql://user:pass@localhost/horizon_epoch"
S3 bucket not accessible
Symptoms:
Error: S3 error: Access Denied
Causes:
- Invalid credentials
- Bucket doesn’t exist
- Incorrect permissions
- Wrong region
Solutions:
- Verify AWS credentials:
  aws sts get-caller-identity
- Check bucket exists:
  aws s3 ls s3://bucket-name
- Review IAM permissions (need s3:GetObject, s3:PutObject, s3:ListBucket)
- Verify region matches bucket location:
  aws s3api get-bucket-location --bucket bucket-name
SSH tunnel connection failed
Symptoms:
Error: SSH tunnel failed: Connection refused
Solutions:
- Test SSH connection manually:
  ssh -i ~/.ssh/key user@bastion
- Check SSH key permissions:
  chmod 600 ~/.ssh/id_rsa
- Verify bastion host is accessible
- Check known_hosts:
  ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
Merge Issues
Merge conflict in table X
Symptoms:
CONFLICT (content): Merge conflict in users
Automatic merge failed; fix conflicts and then commit.
Understanding the conflict:
- Same record modified in both branches
- Same field changed to different values
Resolution:
- View conflicts:
  epoch conflicts show
- Resolve interactively:
  epoch conflicts resolve --interactive
- Or accept one side:
  epoch conflicts resolve --ours    # Keep target branch
  epoch conflicts resolve --theirs  # Keep source branch
- Complete merge:
  epoch merge --continue
- Or abort:
  epoch merge --abort
Merge base not found
Symptoms:
Error: Could not find common ancestor for branches
Causes:
- Branches have no common history
- Orphaned branch
- Corrupted commit graph
Solutions:
- Check branch history:
  epoch log main --oneline
  epoch log feature/branch --oneline
- Find common commits:
  epoch log --all --graph
- If no common ancestor, use --allow-unrelated:
  epoch merge feature/branch --allow-unrelated
Branch Issues
Branch not found
Symptoms:
Error: Branch 'feature/xyz' not found
Solutions:
- List available branches:
  epoch branch list
- Check for typos in branch name
- Verify branch exists in repository:
  epoch branch list --all
Cannot delete branch
Symptoms:
Error: Cannot delete branch 'feature/xyz': not fully merged
Solutions:
- Merge the branch first:
  epoch merge feature/xyz
  epoch branch delete feature/xyz
- Force delete (loses unmerged changes):
  epoch branch delete feature/xyz --force
Table Issues
Table not found
Symptoms:
Error: Table 'users' not found
Solutions:
- List registered tables:
  epoch table list
- Register the table:
  epoch table add users --location "postgresql://mydb/public.users"
- Check table exists in storage backend
Schema mismatch
Symptoms:
Error: Schema mismatch for table 'users': expected 5 columns, found 6
Causes:
- Table schema changed outside Horizon Epoch
- Uncommitted schema changes
Solutions:
- View current schema:
  epoch table show users --schema
- Refresh schema:
  epoch table refresh users
- Commit schema changes:
  epoch commit -m "Update schema for users table"
Performance Issues
Slow queries on branches
Symptoms:
- Queries on branches significantly slower than main
- Query times increase with branch age
Causes:
- Large overlay size
- Deep branch hierarchy
- Missing indexes
Solutions:
- Check overlay size:
  epoch branch info feature/branch --stats
- Materialize long-lived branches:
  epoch branch materialize feature/branch
- Merge to reduce hierarchy depth
- Add indexes to overlay tables
Slow commit operations
Symptoms:
- Commits take longer than expected
- Timeout during commit
Causes:
- Large number of changes
- Network latency
- Lock contention
Solutions:
- Commit in smaller batches:
  epoch commit --tables users -m "Part 1"
  epoch commit --tables orders -m "Part 2"
- Check for locks:
  SELECT * FROM pg_locks WHERE relation = 'epoch_commits'::regclass;
- Increase timeout:
  epoch commit -m "Large update" --timeout 600
Authentication Issues
Vault authentication failed
Symptoms:
Error: Vault authentication failed: permission denied
Solutions:
- Check Vault connectivity:
  vault status
- Verify token/credentials:
  vault token lookup
- Check policy permissions:
  vault policy read horizon-epoch
- For AppRole, verify role_id and secret_id are correct
AWS credentials expired
Symptoms:
Error: ExpiredTokenException: The security token included in the request is expired
Solutions:
- For IAM roles, credentials refresh automatically
- For SSO:
  aws sso login
- For access keys, rotate them:
  aws iam create-access-key
Data Integrity Issues
Corrupted commit
Symptoms:
Error: Commit 'abc123' is corrupted or missing
Solutions:
- Check commit exists:
  epoch show abc123
- Verify metadata database integrity:
  SELECT * FROM epoch_commits WHERE id = 'abc123';
- If commit is referenced but missing, contact support
Orphaned records
Symptoms:
- Records in overlay not connected to commits
- Disk usage growing unexpectedly
Solutions:
- Run garbage collection:
  epoch gc
- Check for orphaned overlays:
  epoch gc --dry-run
- Manually clean up if needed:
  epoch gc --force
Certificate/TLS Issues
Certificate expired
Symptoms:
Error: SSL error: certificate has expired
Causes:
- Client certificate expired
- Server certificate expired
- CA certificate expired
Solutions:
- Check certificate expiry:
  # Client certificate
  openssl x509 -enddate -noout -in /path/to/client.crt
  # Server certificate (remote)
  openssl s_client -connect db.example.com:5432 -starttls postgres 2>/dev/null | \
    openssl x509 -noout -enddate
- Renew the certificate:
  # If using Vault PKI
  vault write pki/issue/my-role common_name="client"
  # Manual renewal - contact your CA
- If using Vault dynamic certificates, check renewal is working:
  epoch doctor --check vault
Certificate verification failed
Symptoms:
Error: SSL error: certificate verify failed
Error: CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
Causes:
- Wrong CA certificate
- Incomplete certificate chain
- Self-signed certificate not trusted
- Server hostname mismatch
Solutions:
- Verify CA certificate is correct:
  # Check certificate chain
  openssl verify -CAfile /path/to/ca.crt /path/to/client.crt
- Check certificate chain is complete:
  # View full chain
  openssl crl2pkcs7 -nocrl -certfile /path/to/cert.pem | \
    openssl pkcs7 -print_certs -noout
- For hostname mismatch, check server certificate SANs:
  openssl x509 -noout -text -in server.crt | grep -A1 "Subject Alternative Name"
- Verify you’re connecting to the correct hostname matching the certificate
Permission denied reading certificate files
Symptoms:
Error: could not load private key file: Permission denied
Error: could not load certificate file: No such file or directory
Solutions:
- Check file permissions:
  ls -la /path/to/client.key
  # Should show -rw------- (600) or -r-------- (400)
  chmod 600 /path/to/client.key
  chmod 644 /path/to/client.crt
- Verify file exists and path is correct:
  ls -la /path/to/client.crt /path/to/client.key /path/to/ca.crt
- Check process user has access:
  # Run as same user
  sudo -u epoch_user cat /path/to/client.crt
Key doesn’t match certificate
Symptoms:
Error: key values mismatch
Error: SSL error: private key does not match certificate
Solutions:
- Verify key matches certificate:
  # Compare modulus hashes - they should match
  openssl x509 -noout -modulus -in client.crt | openssl md5
  openssl rsa -noout -modulus -in client.key | openssl md5
- If mismatched, regenerate key/cert pair or locate correct files
RDS IAM Authentication Issues
Failed to generate IAM auth token
Symptoms:
Error: Failed to generate RDS IAM authentication token
Error: The security token included in the request is invalid
Causes:
- IAM role doesn’t have rds-db:connect permission
- Wrong AWS region
- Invalid instance endpoint
Solutions:
- Verify IAM policy allows rds-db:connect:
  {
    "Effect": "Allow",
    "Action": "rds-db:connect",
    "Resource": "arn:aws:rds-db:REGION:ACCOUNT:dbuser:DBI_RESOURCE_ID/DB_USER"
  }
- Check AWS region matches RDS instance:
  aws rds describe-db-instances --db-instance-identifier mydb \
    --query 'DBInstances[0].DBInstanceArn'
- Verify you’re using the correct endpoint:
  # epoch.toml - use the actual endpoint, not a custom DNS
  [storage.postgres.mydb]
  host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
  use_iam_auth = true
RDS IAM token expired immediately
Symptoms:
Error: PAM authentication failed
Error: password authentication failed for user "iam_user"
Causes:
- System clock skew
- Token generated too far in advance
- Wrong database user
Solutions:
- Check system time is accurate:
  date -u  # Compare with actual UTC time
  # Sync time if needed
  sudo ntpdate pool.ntp.org
- Verify database user is configured for IAM:
  -- In RDS
  CREATE USER iam_user WITH LOGIN;
  GRANT rds_iam TO iam_user;
- Check the token is being generated correctly:
  # Generate token manually for testing
  aws rds generate-db-auth-token \
    --hostname mydb.cluster-xxx.us-east-1.rds.amazonaws.com \
    --port 5432 \
    --username iam_user
Cannot assume role for RDS IAM
Symptoms:
Error: User is not authorized to perform: sts:AssumeRole
Error: Access denied when assuming role for RDS authentication
Solutions:
- Check assume role trust policy:
  aws iam get-role --role-name MyRdsRole \
    --query 'Role.AssumeRolePolicyDocument'
- Verify trust policy allows your principal:
  {
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::ACCOUNT:role/YourExecutionRole"
    },
    "Action": "sts:AssumeRole"
  }
- Check for external_id requirement:
  [storage.s3.mydb]
  assume_role_arn = "arn:aws:iam::123456789012:role/RdsRole"
  external_id = "required-external-id"  # If configured
Credential Refresh Issues
Credential refresh failed
Symptoms:
Error: Credential refresh failed: provider returned error
Error: Failed to refresh credentials: connection refused
Causes:
- Credential provider unavailable
- Network connectivity issues
- Token/secret expired
Solutions:
- Check provider connectivity:
  # For Vault
  vault status
  # For AWS
  aws sts get-caller-identity
- Verify credentials haven’t expired beyond refresh:
  # Check Vault token
  vault token lookup
  # Check current credentials are valid
  epoch doctor --check credentials
- Force credential refresh:
  # Clear cached credentials
  epoch config cache clear
  # Re-authenticate
  epoch auth login
Credentials expired and refresh not supported
Symptoms:
Error: Credentials expired
Error: Refresh not supported for static credentials
Causes:
- Using static credentials without refresh capability
- Credential cache disabled
- Refresh interval too long
Solutions:
- Use a provider that supports refresh:
  # Instead of a static password
  [storage.postgres.mydb]
  vault_path = "secret/data/mydb"  # Dynamic refresh
  # Or environment variables (refresh on re-read)
  [storage.postgres.mydb]
  url = "${DB_URL}"
- Enable credential caching with refresh:
  [credentials]
  cache_enabled = true
  cache_ttl = 300
  refresh_before_expiry = 60  # Refresh 60s before expiry
Token lease expired (Vault)
Symptoms:
Error: Vault token expired
Error: permission denied (token expired)
Solutions:
- Check token status:
  vault token lookup
- Renew token if renewable:
  vault token renew
- For non-renewable tokens, re-authenticate:
  # AppRole
  vault write auth/approle/login \
    role_id=$ROLE_ID \
    secret_id=$SECRET_ID
- Configure automatic renewal:
  [vault]
  auto_renew_token = true
  renew_threshold = 0.7  # Renew when 70% of TTL elapsed
Multi-Backend Routing Issues
Storage backend not found
Symptoms:
Error: Storage backend 'mydb' not found
Error: Unknown storage location: postgresql://mydb/...
Causes:
- Backend not configured
- Typo in backend name
- Configuration not loaded
Solutions:
- List configured backends:
  epoch config show storage
- Check configuration file:
  # epoch.toml
  [storage.postgres.mydb]  # Backend name is 'mydb'
  url = "postgresql://..."
- Verify configuration is loaded:
  epoch doctor --check config
Wrong backend selected for table
Symptoms:
Error: Table 'users' not found in storage 'datalake'
Error: Cannot access PostgreSQL table through S3 backend
Causes:
- Table registered with wrong backend
- Backend mismatch in storage location
Solutions:
- Check table registration:
  epoch table show users
- Re-register with correct backend:
  epoch table remove users
  epoch table add users \
    --location "postgresql://correct_backend/public.users"
- Verify storage location format:
  PostgreSQL: postgresql://backend_name/schema.table
  S3:         s3://backend_name/path/to/table
Cannot connect to multiple backends simultaneously
Symptoms:
Error: Connection pool exhausted
Error: Too many connections
Causes:
- Pool size too small for multi-backend operations
- Connection leak
- Long-running transactions
Solutions:
- Increase pool sizes:
  [storage.postgres.backend1]
  pool_size = 20
  [storage.postgres.backend2]
  pool_size = 20
- Check for connection leaks:
  -- PostgreSQL
  SELECT * FROM pg_stat_activity WHERE application_name LIKE '%epoch%';
- Set idle connection timeouts:
  [storage.postgres.mydb]
  idle_timeout = 300  # Close idle connections after 5 min
Cross-backend operations failing
Symptoms:
Error: Cannot merge tables from different storage backends
Error: Cross-storage operation not supported
Causes:
- Tables on different backends can’t be joined directly
- Merge requires compatible storage types
Solutions:
- Understand cross-backend limitations:
  - Merges work within same storage type
  - Queries are executed per-backend
  - Results merged in memory
- For cross-backend data access, use the Python SDK:
  # Read from both backends
  users = client.query_table("users", backend="postgres1")
  orders = client.query_table("orders", backend="postgres2")
  # Join in Python/Pandas
  merged = users.merge(orders, on="user_id")
- Consider consolidating frequently-joined tables on same backend
Backend credentials conflict
Symptoms:
Error: Credential 'default' conflicts with existing credential
Warning: Multiple credentials found for backend 'mydb'
Solutions:
- Use unique credential names per backend:
  [storage.postgres.backend1]
  vault_path = "secret/data/db1"
  [storage.postgres.backend2]
  vault_path = "secret/data/db2"  # Different path
- Clear credential cache if switching configurations:
  epoch config cache clear
Getting Help
If these solutions don’t resolve your issue:
- Check logs:
  epoch --verbose command
- Enable debug logging:
  RUST_LOG=debug epoch command
- Report issues:
  - Contact Horizon Analytic Studios for support
  - Include: version (epoch --version), error message, steps to reproduce
FAQ
Frequently asked questions about Horizon Epoch.
General Questions
What is Horizon Epoch?
Horizon Epoch is a Git-like version control system for data. It provides branching, merging, commits, and history tracking for data stored in PostgreSQL, MySQL, SQL Server, SQLite, S3/Delta Lake, Azure Blob, GCS, and local filesystems.
How is it different from Git?
Git tracks changes to files (text). Horizon Epoch tracks changes to structured data (rows/records). Key differences:
- Record-level tracking instead of line-level
- Field-level conflict detection instead of text diff
- Works with live databases instead of files
- Storage-agnostic - supports multiple backends
Do I need to migrate my data?
No. Horizon Epoch works with data where it lives. You register existing tables and start versioning. Your data stays in its original location.
Is my data copied when I create a branch?
No. Branching is zero-copy. Only modified records are stored separately. Creating a branch is instant regardless of data size.
Architecture
Where is the metadata stored?
Metadata (commits, branches, change tracking) is stored in a dedicated PostgreSQL database. This is separate from your data storage.
Can I use different databases for metadata and data?
Yes. The metadata database and data storage are completely separate. For example:
- Metadata: PostgreSQL on localhost
- Data: PostgreSQL on AWS RDS + S3 for analytics
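A sketch of such a split using the Config builder shown earlier in these docs (the import path and the RDS endpoint are assumptions/placeholders):
from horizon_epoch import Config  # import path assumed

config = Config(
    metadata_url="postgresql://localhost/horizon_epoch"    # metadata on localhost
).add_postgres(
    "app_db",                                              # logical name
    "postgresql://myapp.abc123.us-east-1.rds.amazonaws.com/production"
).add_s3(
    "analytics",
    bucket="company-analytics"
)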
Does Horizon Epoch modify my existing tables?
Horizon Epoch creates overlay tables alongside your existing tables to track changes. Your original table structure is not modified.
What happens if I modify data directly without using Epoch?
Changes made outside of Horizon Epoch’s context won’t be tracked. For full version control, all changes should go through Epoch or you should periodically sync.
Performance
How fast is branch creation?
Branch creation is O(1) - constant time regardless of data size. Creating a branch from a 1TB dataset takes the same time as from a 1KB dataset (~milliseconds).
What’s the storage overhead?
Storage overhead is proportional to changes, not data size:
- Base data: Original size (no overhead)
- Per branch: Only modified records are stored
- Typical overhead: 1-10% for active branches
Are queries slower on branches?
There’s a small overhead for branch queries due to overlay resolution. For typical branch sizes (< 1% of data modified), overhead is negligible (<10%). For large overlays, consider materializing the branch.
Can it handle large datasets?
Yes. Horizon Epoch is designed for large-scale data:
- Zero-copy branching works at any scale
- Only changed records are processed
- Streaming for large operations
- Pagination for history
Operations
Can I undo a merge?
Yes. Use epoch revert HEAD to create a new commit that undoes the merge. The original merge commit remains in history.
How do I see what changed?
# Changes since last commit
epoch status
# Changes between branches
epoch diff main feature/branch
# Commit history
epoch log
Can I query historical data?
Yes. Use time-travel queries:
# At specific commit
epoch query "SELECT * FROM users" --at abc123
# At specific time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"
# At tag
epoch query "SELECT * FROM users" --at v1.0.0
How do I resolve merge conflicts?
# See conflicts
epoch conflicts show
# Interactive resolution
epoch conflicts resolve --interactive
# Accept one side
epoch conflicts resolve --ours # or --theirs
# Complete merge
epoch merge --continue
Storage Backends
Which storage systems are supported?
Horizon Epoch supports 8 storage backends:
Relational Databases (Full Constraint Support):
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite (partial constraint support)
Object Storage (Delta Lake Format):
- AWS S3 (and S3-compatible like MinIO)
- Azure Blob Storage
- Google Cloud Storage
- Local Filesystem
Can I use multiple storage backends?
Yes. You can register tables from different backends and manage them in a single repository. Operations like branching and merging work across backends.
Does it work with AWS RDS?
Yes. Use the PostgreSQL or MySQL adapter with your RDS connection string. IAM authentication is supported for enhanced security.
What about cloud data warehouses?
Snowflake, BigQuery, and Redshift are not currently supported. These have unique architectures that would require specialized adapters.
Security
Are credentials stored securely?
Credentials can be sourced from:
- Environment variables
- Encrypted files
- HashiCorp Vault
- AWS Secrets Manager
Never store credentials in plain text configuration files.
Does it support SSO?
AWS SSO/Identity Center is supported for S3 access. Database SSO depends on your database’s support.
What access controls are available?
Horizon Epoch includes comprehensive security features:
- Branch protection rules - Prevent direct commits to protected branches
- Role-based access control (RBAC) - Define roles with specific permissions
- Row-level security (RLS) - Fine-grained data access policies
- Audit logging - Track all operations for compliance
- Commit signing - PGP/OpenPGP signature verification
- Secret scanning - Detect accidentally committed secrets
- Field-level masking - Redact sensitive data
Integration
Does it work with dbt?
You can use Horizon Epoch to version the data that dbt transforms. Create a branch, run dbt, validate results, then merge.
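A hedged sketch of that flow, assuming the async SDK methods shown in other examples and invoking dbt as a plain subprocess (pointing dbt at the branch's connection or schema is project-specific):
import asyncio
import subprocess
from horizon_epoch import Client

async def dbt_on_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.create_branch("dbt/refactor-models")
        # Run dbt against the branch (profile/target wiring is up to your project)
        subprocess.run(["dbt", "run"], check=True)
        await client.commit(message="dbt run on refactor-models branch")
        await client.merge("dbt/refactor-models")

asyncio.run(dbt_on_branch())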
Can I use it with Airflow/Dagster?
Yes. Use the Python SDK in your DAG tasks:
from horizon_epoch import Client
async def branch_and_transform():
async with Client.connect("postgresql://...") as client:
await client.branch("etl/daily-run")
# ... run transformations ...
await client.commit("Daily ETL run")
await client.merge("etl/daily-run")
Is there a REST API?
Yes. The FastAPI server provides a REST API for all operations. See the REST API Reference.
Does it have a web UI?
A web UI is planned for future releases. Currently, use the CLI or SDK.
Comparison
vs. dbt snapshots?
dbt snapshots capture point-in-time state. Horizon Epoch provides:
- Full branching and merging
- Efficient storage (only changes)
- Any storage backend
vs. Delta Lake time travel?
Delta Lake provides time travel within a single table. Horizon Epoch adds:
- Cross-table consistency
- Branching and merging
- Multi-backend support
- Git-like workflow
vs. lakeFS?
Both provide Git-like versioning for data. Key differences:
- Horizon Epoch: Works with databases (PostgreSQL, MySQL, etc.) and object storage
- lakeFS: Focused on object storage
vs. Nessie/Project Nessie?
Nessie provides Git-like versioning for Iceberg/Delta tables. Horizon Epoch:
- Supports more storage backends (including relational databases)
- Provides record-level tracking
- Works with transactional databases
Troubleshooting
Where are logs?
# Verbose output
epoch --verbose command
# Debug logging
RUST_LOG=debug epoch command
How do I check system health?
# Run diagnostics
epoch doctor
# Check local services
epoch local status
Where can I get help?
- Documentation: This site
- Run epoch tips for getting started help
- Run epoch --help for command reference
Licensing
Licensing terms to be announced. Please contact Horizon Analytic Studios for licensing inquiries.