FAQ

Frequently asked questions about Horizon Epoch.

General Questions

What is Horizon Epoch?

Horizon Epoch is a Git-like version control system for data. It provides branching, merging, commits, and history tracking for data stored in PostgreSQL, MySQL, SQL Server, SQLite, S3/Delta Lake, Azure Blob, GCS, and local filesystems.

How is it different from Git?

Git tracks changes to files (text). Horizon Epoch tracks changes to structured data (rows/records). Key differences:

  • Record-level tracking instead of line-level
  • Field-level conflict detection instead of text diff
  • Works with live databases instead of files
  • Storage-agnostic - supports multiple backends

Do I need to migrate my data?

No. Horizon Epoch works with data where it lives. You register existing tables and start versioning. Your data stays in its original location.

Is my data copied when I create a branch?

No. Branching is zero-copy. Only modified records are stored separately. Creating a branch is instant regardless of data size.
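
One way to picture zero-copy branching is as a chain of overlays: a branch starts as an empty layer on top of its parent, and reads fall through to the parent until the branch writes its own version of a record. The sketch below is a conceptual model only, assuming one in-memory dictionary per layer. The `Branch` class and its methods are hypothetical and are not the Horizon Epoch API. A real system would also pin the parent at a specific commit, which this sketch omits.

```python
_DELETED = object()  # tombstone marking a record deleted on a branch

class Branch:
    """Toy copy-on-write branch: stores only its own changes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.overlay = {}  # primary key -> record, or _DELETED

    def fork(self):
        # O(1): no records are copied, the child just points at its parent.
        return Branch(parent=self)

    def put(self, key, record):
        self.overlay[key] = record

    def delete(self, key):
        self.overlay[key] = _DELETED

    def get(self, key):
        # Walk from this branch up through its ancestors; the first hit wins.
        node = self
        while node is not None:
            if key in node.overlay:
                value = node.overlay[key]
                return None if value is _DELETED else value
            node = node.parent
        return None

main = Branch()
main.put(1, {"name": "Ada"})
main.put(2, {"name": "Bob"})

feature = main.fork()               # instant, regardless of main's size
feature.put(1, {"name": "Ada L."})  # only this record is stored on the branch
feature.delete(2)
```

After these calls, `feature` holds two overlay entries while `main` is unchanged, which is the sense in which only modified records are stored separately.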


Architecture

Where is the metadata stored?

Metadata (commits, branches, change tracking) is stored in a dedicated PostgreSQL database. This is separate from your data storage.

Can I use different databases for metadata and data?

Yes. The metadata database and data storage are completely separate. For example:

  • Metadata: PostgreSQL on localhost
  • Data: PostgreSQL on AWS RDS + S3 for analytics

Does Horizon Epoch modify my existing tables?

Horizon Epoch creates overlay tables alongside your existing tables to track changes. Your original table structure is not modified.

What happens if I modify data directly without using Epoch?

Changes made outside Horizon Epoch are not tracked. For full version control, route all changes through Epoch or sync periodically.


Performance

How fast is branch creation?

Branch creation is O(1): it takes constant time regardless of data size. Creating a branch from a 1 TB dataset takes the same time as creating one from a 1 KB dataset (on the order of milliseconds).

What’s the storage overhead?

Storage overhead is proportional to changes, not data size:

  • Base data: Original size (no overhead)
  • Per branch: Only modified records are stored
  • Typical overhead: 1-10% for active branches
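
As a rough worked example of that proportionality, the calculation below estimates overhead as stored branch records divided by base records. It assumes overlay records are about the same size as base records and that each branch stores its own modifications. The row counts are made up for illustration.

```python
def branch_overhead(base_rows: int, modified_rows_per_branch: list[int]) -> float:
    """Extra storage used by branch overlays, as a fraction of the base data."""
    if base_rows <= 0:
        raise ValueError("base_rows must be positive")
    return sum(modified_rows_per_branch) / base_rows

# Hypothetical numbers: a 10M-row table with three active branches
overhead = branch_overhead(10_000_000, [50_000, 200_000, 150_000])
print(f"{overhead:.1%}")  # prints "4.0%"
```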

Are queries slower on branches?

Branch queries carry a small overhead because each read must resolve the overlay against the base data. For typical branches (under 1% of records modified), the slowdown is under 10%. For large overlays, consider materializing the branch.
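
Overlay resolution can be pictured as merging the base rows with a branch's changes at read time. Materializing then means writing that merged result out once, so later reads skip the merge. This is a sketch under the assumption that an overlay maps primary keys to either a new row or a deletion marker. `scan`, `materialize`, and `DELETED` are illustrative names, not Horizon Epoch internals.

```python
DELETED = object()  # marks a row removed on the branch

def scan(base: dict, overlay: dict):
    """Yield the (key, row) pairs visible on the branch."""
    for key, row in base.items():
        if key in overlay:
            continue  # changed or deleted on the branch; handled below
        yield key, row
    for key, row in overlay.items():
        if row is not DELETED:
            yield key, row

def materialize(base: dict, overlay: dict) -> dict:
    """Flatten base + overlay into a standalone table (no overlay needed to read it)."""
    return dict(scan(base, overlay))

base = {1: "a", 2: "b", 3: "c"}
overlay = {2: "B", 3: DELETED, 4: "d"}  # one update, one delete, one insert
flat = materialize(base, overlay)
```

Every scan pays one membership check per base row, so the per-read cost grows with the overlay, while a materialized branch reads like an ordinary table.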

Can it handle large datasets?

Yes. Horizon Epoch is designed for large-scale data:

  • Zero-copy branching works at any scale
  • Only changed records are processed
  • Streaming for large operations
  • Pagination for history

Operations

Can I undo a merge?

Yes. Use epoch revert HEAD to create a new commit that undoes the merge. The original merge commit remains in history.
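
Conceptually, a revert appends a commit whose changes are the inverse of the commit being undone, which is why the merge commit stays in history. The sketch below models that with a toy commit log. `Commit`, `revert`, and `replay` are hypothetical names, and the before/after change format is an assumption, not Horizon Epoch's storage format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    message: str
    changes: dict  # key -> (value_before, value_after); None means "absent"

def revert(history: list) -> list:
    """Return a new history with a commit that undoes the latest commit."""
    head = history[-1]
    inverse = {key: (after, before) for key, (before, after) in head.changes.items()}
    return history + [Commit(f'Revert "{head.message}"', inverse)]

def replay(history: list) -> dict:
    """Rebuild the current state by applying every commit in order."""
    state = {}
    for commit in history:
        for key, (_, after) in commit.changes.items():
            if after is None:
                state.pop(key, None)
            else:
                state[key] = after
    return state

history = [
    Commit("Initial load", {1: (None, "a")}),
    Commit("Merge feature", {1: ("a", "b"), 2: (None, "c")}),
]
reverted = revert(history)
```

The reverted history has three commits, and replaying it yields the state from before the merge, while the merge commit itself is still present.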

How do I see what changed?

# Changes since last commit
epoch status

# Changes between branches
epoch diff main feature/branch

# Commit history
epoch log

Can I query historical data?

Yes. Use time-travel queries:

# At specific commit
epoch query "SELECT * FROM users" --at abc123

# At specific time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"

# At tag
epoch query "SELECT * FROM users" --at v1.0.0
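
For the timestamp form, a system has to map the given time to the latest commit at or before it. The sketch below shows one way to do that lookup with a binary search over a sorted commit history. The `commit_at` helper and the history entries are hypothetical (only the ID `abc123` is borrowed from the example above, and the commit times are invented). It does not describe how Horizon Epoch resolves `--at` internally.

```python
from bisect import bisect_right
from datetime import datetime

def commit_at(history, when):
    """Return the ID of the latest commit at or before `when`, or None.

    `history` is a list of (timestamp, commit_id) tuples sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    index = bisect_right(timestamps, when)  # first commit strictly after `when`
    if index == 0:
        return None  # `when` predates the first commit
    return history[index - 1][1]

history = [
    (datetime(2024, 1, 10, 9, 0), "aaa111"),
    (datetime(2024, 1, 15, 9, 30), "abc123"),
    (datetime(2024, 1, 20, 12, 0), "def456"),
]
resolved = commit_at(history, datetime(2024, 1, 15, 10, 0))
```

With this invented history, a query at 2024-01-15 10:00:00 resolves to the commit made at 09:30 that day, and a time before the first commit resolves to nothing.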

How do I resolve merge conflicts?

# See conflicts
epoch conflicts show

# Interactive resolution
epoch conflicts resolve --interactive

# Accept one side
epoch conflicts resolve --ours   # or --theirs

# Complete merge
epoch merge --continue

Storage Backends

Which storage systems are supported?

Horizon Epoch supports 8 storage backends:

Relational Databases (Full Constraint Support):

  • PostgreSQL
  • MySQL
  • Microsoft SQL Server
  • SQLite (partial constraint support)

Object Storage (Delta Lake Format):

  • AWS S3 (and S3-compatible like MinIO)
  • Azure Blob Storage
  • Google Cloud Storage
  • Local Filesystem

Can I use multiple storage backends?

Yes. You can register tables from different backends and manage them in a single repository. Operations like branching and merging work across backends.

Does it work with AWS RDS?

Yes. Use the PostgreSQL or MySQL adapter with your RDS connection string. IAM authentication is supported for enhanced security.

What about cloud data warehouses?

Snowflake, BigQuery, and Redshift are not currently supported. These have unique architectures that would require specialized adapters.


Security

Are credentials stored securely?

Credentials can be sourced from:

  • Environment variables
  • Encrypted files
  • HashiCorp Vault
  • AWS Secrets Manager

Never store credentials in plain text configuration files.
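
As a minimal sketch of the environment-variable option, the helper below reads a connection string at runtime and fails loudly when it is missing. The variable name `EPOCH_DATABASE_URL` is a hypothetical choice for illustration, not a name defined by Horizon Epoch.

```python
import os

def database_url(var: str = "EPOCH_DATABASE_URL") -> str:
    """Read a connection string from the environment instead of a config file.

    The default variable name is hypothetical; use whatever your deployment defines.
    """
    url = os.environ.get(var)
    if not url:
        raise RuntimeError(f"{var} is not set; export it rather than hardcoding credentials")
    return url
```

The returned string can then be passed wherever a connection string is expected, so the secret never appears in source control.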

Does it support SSO?

AWS SSO/Identity Center is supported for S3 access. Database SSO depends on your database’s support.

What access controls are available?

Horizon Epoch includes comprehensive security features:

  • Branch protection rules - Prevent direct commits to protected branches
  • Role-based access control (RBAC) - Define roles with specific permissions
  • Row-level security (RLS) - Fine-grained data access policies
  • Audit logging - Track all operations for compliance
  • Commit signing - PGP/OpenPGP signature verification
  • Secret scanning - Detect accidentally committed secrets
  • Field-level masking - Redact sensitive data

Integration

Does it work with dbt?

Yes. Use Horizon Epoch to version the data that dbt transforms: create a branch, run dbt, validate the results, then merge.

Can I use it with Airflow/Dagster?

Yes. Use the Python SDK in your DAG tasks:

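from horizon_epoch import Client

async def branch_and_transform():
    # Open a client connection; it is closed when the block exits
    async with Client.connect("postgresql://...") as client:
        # Isolate this run's changes on a dedicated branch
        await client.branch("etl/daily-run")
        # ... run transformations ...
        # Record the run's changes, then merge the branch
        await client.commit("Daily ETL run")
        await client.merge("etl/daily-run")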

Is there a REST API?

Yes. The FastAPI server provides a REST API for all operations. See the REST API Reference.

Does it have a web UI?

A web UI is planned for future releases. Currently, use the CLI or SDK.


Comparison

vs. dbt snapshots?

dbt snapshots capture point-in-time state. Horizon Epoch provides:

  • Full branching and merging
  • Efficient storage (only changes)
  • Any storage backend

vs. Delta Lake time travel?

Delta Lake provides time travel within a single table. Horizon Epoch adds:

  • Cross-table consistency
  • Branching and merging
  • Multi-backend support
  • Git-like workflow

vs. lakeFS?

Both provide Git-like versioning for data. Key differences:

  • Horizon Epoch: Works with databases (PostgreSQL, MySQL, etc.) and object storage
  • lakeFS: Focused on object storage

vs. Nessie/Project Nessie?

Nessie provides Git-like versioning for Iceberg/Delta tables. Horizon Epoch:

  • Supports more storage backends (including relational databases)
  • Provides record-level tracking
  • Works with transactional databases

Troubleshooting

Where are logs?

# Verbose output
epoch --verbose command

# Debug logging
RUST_LOG=debug epoch command

How do I check system health?

# Run diagnostics
epoch doctor

# Check local services
epoch local status

Where can I get help?

  • Documentation: This site
  • Run epoch tips for getting-started help
  • Run epoch --help for command reference

Licensing

Licensing terms to be announced. Please contact Horizon Analytic Studios for licensing inquiries.