FAQ
Frequently asked questions about Horizon Epoch.
General Questions
What is Horizon Epoch?
Horizon Epoch is a Git-like version control system for data. It provides branching, merging, commits, and history tracking for data stored in PostgreSQL, MySQL, SQL Server, SQLite, S3/Delta Lake, Azure Blob, GCS, and local filesystems.
How is it different from Git?
Git tracks changes to files (text). Horizon Epoch tracks changes to structured data (rows/records). Key differences:
- Record-level tracking instead of line-level
- Field-level conflict detection instead of text diff
- Works with live databases instead of files
- Storage-agnostic: supports multiple backends
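To illustrate the difference between line-level and field-level conflict detection, here is a minimal Python sketch of a three-way, per-field comparison of one record. It is not Horizon Epoch's implementation. The `field_conflicts` function and the dict-per-record model are assumptions made for illustration.

```python
from typing import Any

Record = dict[str, Any]  # hypothetical model: one row as {column: value}

def field_conflicts(base: Record, ours: Record, theirs: Record) -> list[str]:
    """Return the fields that both sides changed, to different values."""
    conflicts = []
    for field in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        # A conflict needs both sides to diverge from the base AND from each other
        if o != b and t != b and o != t:
            conflicts.append(field)
    return conflicts

base = {"id": 7, "email": "a@example.com", "plan": "free"}
ours = {"id": 7, "email": "a@example.com", "plan": "pro"}
theirs = {"id": 7, "email": "b@example.com", "plan": "team"}
print(field_conflicts(base, ours, theirs))  # ['plan']
```

Because `email` changed on only one side, it merges cleanly. A line-level text diff of the same row would flag the whole line as conflicting.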
Do I need to migrate my data?
No. Horizon Epoch works with data where it lives. You register existing tables and start versioning. Your data stays in its original location.
Is my data copied when I create a branch?
No. Branching is zero-copy. Only modified records are stored separately. Creating a branch is instant regardless of data size.
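Zero-copy branching can be pictured as copy-on-write over shared data: a new branch holds a reference to the parent's rows plus an initially empty overlay of changes. The in-memory sketch below assumes that model. The `Branch` class, its methods, and the tombstone marker are illustrative, not Horizon Epoch's storage format.

```python
from typing import Any, Optional

_DELETED = object()  # tombstone: marks a row deleted on this branch only

class Branch:
    """Hypothetical copy-on-write branch: shares base rows, stores only changes."""

    def __init__(self, base: dict[int, dict[str, Any]]):
        self.base = base                    # shared reference; no rows are copied
        self.overlay: dict[int, Any] = {}   # primary key -> modified row or tombstone

    def get(self, key: int) -> Optional[dict[str, Any]]:
        # Overlay resolution: branch changes win, otherwise fall back to the base
        if key in self.overlay:
            row = self.overlay[key]
            return None if row is _DELETED else row
        return self.base.get(key)

    def put(self, key: int, row: dict[str, Any]) -> None:
        self.overlay[key] = row

    def delete(self, key: int) -> None:
        self.overlay[key] = _DELETED

main = {i: {"id": i, "score": 0} for i in range(100_000)}
feature = Branch(main)                     # constant time: nothing is copied
feature.put(42, {"id": 42, "score": 10})
feature.delete(7)
print(len(feature.overlay))                # 2 rows stored for the branch
print(feature.get(42), main[42])           # branch sees the update; main is unchanged
```

Creating the branch costs the same whether `main` has ten rows or ten million, and the branch's storage grows only with the rows it modifies.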
Architecture
Where is the metadata stored?
Metadata (commits, branches, change tracking) is stored in a dedicated PostgreSQL database. This is separate from your data storage.
Can I use different databases for metadata and data?
Yes. The metadata database and data storage are completely separate. For example:
- Metadata: PostgreSQL on localhost
- Data: PostgreSQL on AWS RDS + S3 for analytics
Does Horizon Epoch modify my existing tables?
Horizon Epoch creates overlay tables alongside your existing tables to track changes. Your original table structure is not modified.
What happens if I modify data directly without using Epoch?
Changes made outside Horizon Epoch are not tracked. For full version control, either route all changes through Epoch or sync periodically.
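One way to picture a periodic sync is to fingerprint each row when it was last tracked and compare those fingerprints against the live table. The sketch below is an assumption about how such a check could work, not a description of Epoch's sync mechanism. `fingerprint` and `untracked_changes` are hypothetical names.

```python
import hashlib
import json
from typing import Any

def fingerprint(row: dict[str, Any]) -> str:
    """Stable hash of a row; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()

def untracked_changes(known: dict[int, str],
                      current: dict[int, dict[str, Any]]) -> dict[str, set[int]]:
    """Compare fingerprints recorded at the last sync with the live rows."""
    live = {pk: fingerprint(row) for pk, row in current.items()}
    return {
        "inserted": live.keys() - known.keys(),
        "deleted": known.keys() - live.keys(),
        "updated": {pk for pk in live.keys() & known.keys() if live[pk] != known[pk]},
    }

# Fingerprints captured when the table was last tracked
snapshot = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Bob"}}
known = {pk: fingerprint(row) for pk, row in snapshot.items()}

# Someone edits the table directly, bypassing Epoch
current = {1: {"id": 1, "name": "Ada L."}, 3: {"id": 3, "name": "Cy"}}
print(untracked_changes(known, current))
```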
Performance
How fast is branch creation?
Branch creation is O(1): it takes constant time regardless of data size. Creating a branch from a 1 TB dataset takes the same time as from a 1 KB dataset, typically milliseconds.
What’s the storage overhead?
Storage overhead is proportional to changes, not data size:
- Base data: Original size (no overhead)
- Per branch: Only modified records are stored
- Typical overhead: 1-10% for active branches
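Because a branch stores only modified records, its overhead is roughly the fraction of rows it touches. The back-of-the-envelope helper below assumes modified rows are about the same size as an average row and ignores any per-record tracking metadata. The row counts are made up.

```python
def branch_overhead_pct(modified_rows: int, total_rows: int) -> float:
    """Approximate per-branch storage overhead as a percentage of the base data."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return 100.0 * modified_rows / total_rows

# Hypothetical table: 50 million rows, 750,000 of them modified on a branch
print(f"{branch_overhead_pct(750_000, 50_000_000):.1f}%")  # 1.5%
```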
Are queries slower on branches?
Branch queries carry a small overhead because each read must resolve the branch overlay against the base data. For typical branches (less than 1% of data modified), the overhead is small, usually under 10%. For branches with large overlays, consider materializing the branch.
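Materializing folds the overlay into a standalone copy once, trading extra storage for reads that no longer need overlay resolution. A minimal in-memory sketch follows. The `materialize` name and the use of `None` as a deletion marker are assumptions for illustration.

```python
from typing import Any

DELETED = None  # in this sketch, a None overlay value marks a deleted row

def materialize(base: dict[int, dict[str, Any]],
                overlay: dict[int, Any]) -> dict[int, dict[str, Any]]:
    """Flatten base + overlay into one table so reads skip overlay resolution."""
    merged = dict(base)              # shallow copy; the base itself is untouched
    for pk, row in overlay.items():
        if row is DELETED:
            merged.pop(pk, None)     # deletion on the branch
        else:
            merged[pk] = row         # insert or update on the branch
    return merged

base = {1: {"qty": 5}, 2: {"qty": 3}, 3: {"qty": 9}}
overlay = {2: {"qty": 4}, 3: None, 4: {"qty": 1}}
print(materialize(base, overlay))  # {1: {'qty': 5}, 2: {'qty': 4}, 4: {'qty': 1}}
```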
Can it handle large datasets?
Yes. Horizon Epoch is designed for large-scale data:
- Zero-copy branching works at any scale
- Only changed records are processed
- Streaming for large operations
- Pagination for history
Operations
Can I undo a merge?
Yes. Use epoch revert HEAD to create a new commit that undoes the merge. The original merge commit remains in history.
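Conceptually, a revert commit records the inverse of the target commit's changes, so history stays append-only. The sketch below models a commit as a map from primary key to a `(before, after)` pair of rows. That representation, and the `invert` and `apply_changes` helpers, are assumptions for illustration, not Epoch's commit format.

```python
from typing import Any, Optional

Row = Optional[dict[str, Any]]   # None means "row absent"
Change = tuple[Row, Row]         # (before, after) for one primary key

def invert(changes: dict[int, Change]) -> dict[int, Change]:
    """Build the changeset that undoes `changes` by swapping before and after."""
    return {pk: (after, before) for pk, (before, after) in changes.items()}

def apply_changes(table: dict[int, dict[str, Any]],
                  changes: dict[int, Change]) -> dict[int, dict[str, Any]]:
    """Return a new table with each change's `after` state applied."""
    result = dict(table)
    for pk, (_, after) in changes.items():
        if after is None:
            result.pop(pk, None)
        else:
            result[pk] = after
    return result

before = {1: {"status": "draft"}, 2: {"status": "live"}}
merge_changes = {1: ({"status": "draft"}, {"status": "live"}), 3: (None, {"status": "new"})}
after_merge = apply_changes(before, merge_changes)
reverted = apply_changes(after_merge, invert(merge_changes))
print(reverted == before)  # True
```

If later commits touched the same rows, a real revert would also need to check for conflicts before applying the inverse.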
How do I see what changed?
```bash
# Changes since last commit
epoch status

# Changes between branches
epoch diff main feature/branch

# Commit history
epoch log
```
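A record-level diff between two branches matches rows by primary key and then compares them field by field. The Python sketch below shows that shape of result. `diff_tables` and the dict-based table model are illustrative assumptions; the actual `epoch diff` output format may differ.

```python
from typing import Any

def diff_tables(left: dict[int, dict[str, Any]],
                right: dict[int, dict[str, Any]]) -> dict[str, Any]:
    """Record-level diff keyed by primary key, with per-field detail for updates."""
    added = sorted(right.keys() - left.keys())
    removed = sorted(left.keys() - right.keys())
    modified = {}
    for pk in sorted(left.keys() & right.keys()):
        # Keep only the fields whose values differ, as (left, right) pairs
        fields = {
            f: (left[pk].get(f), right[pk].get(f))
            for f in sorted(left[pk].keys() | right[pk].keys())
            if left[pk].get(f) != right[pk].get(f)
        }
        if fields:
            modified[pk] = fields
    return {"added": added, "removed": removed, "modified": modified}

main = {1: {"name": "Ada", "tier": "free"}, 2: {"name": "Bob", "tier": "pro"}}
feature = {1: {"name": "Ada", "tier": "pro"}, 3: {"name": "Cy", "tier": "free"}}
print(diff_tables(main, feature))
# {'added': [3], 'removed': [2], 'modified': {1: {'tier': ('free', 'pro')}}}
```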
Can I query historical data?
Yes. Use time-travel queries:
```bash
# At specific commit
epoch query "SELECT * FROM users" --at abc123

# At specific time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"

# At tag
epoch query "SELECT * FROM users" --at v1.0.0
```
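A timestamp passed to `--at` has to be mapped to a commit. A natural approach, sketched here as an assumption rather than Epoch's documented behavior, is to pick the newest commit made at or before that time using a binary search over the commit log. The commit IDs and timestamps below are invented.

```python
import bisect
from datetime import datetime

# Hypothetical commit log for one branch, oldest first: (timestamp, commit id)
history = [
    (datetime(2024, 1, 10, 9, 0), "a1b2c3"),
    (datetime(2024, 1, 15, 8, 30), "abc123"),
    (datetime(2024, 1, 16, 14, 0), "def456"),
]

def commit_at(when: datetime) -> str:
    """Return the newest commit made at or before `when`."""
    times = [ts for ts, _ in history]
    idx = bisect.bisect_right(times, when)   # first commit strictly after `when`
    if idx == 0:
        raise LookupError(f"no commit at or before {when}")
    return history[idx - 1][1]

print(commit_at(datetime.fromisoformat("2024-01-15 10:00:00")))  # abc123
```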
How do I resolve merge conflicts?
```bash
# See conflicts
epoch conflicts show

# Interactive resolution
epoch conflicts resolve --interactive

# Accept one side
epoch conflicts resolve --ours   # or --theirs

# Complete merge
epoch merge --continue
```
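One plausible reading of "accept one side" at field granularity is that the chosen side decides only the fields that truly conflict, while non-overlapping changes from both sides are still combined. This FAQ does not spell that behavior out, so treat the sketch as an assumption. `resolve` is a hypothetical helper.

```python
from typing import Any, Literal

def resolve(base: dict[str, Any], ours: dict[str, Any], theirs: dict[str, Any],
            strategy: Literal["ours", "theirs"]) -> dict[str, Any]:
    """Three-way merge of one record; `strategy` decides fields both sides changed."""
    merged = {}
    for field in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t or t == b:
            merged[field] = o   # both sides agree, or only ours changed
        elif o == b:
            merged[field] = t   # only theirs changed
        else:
            merged[field] = o if strategy == "ours" else t  # true conflict
    return merged

base = {"email": "a@x.io", "plan": "free"}
ours = {"email": "a@x.io", "plan": "pro"}
theirs = {"email": "b@x.io", "plan": "team"}
print(resolve(base, ours, theirs, "ours"))    # {'email': 'b@x.io', 'plan': 'pro'}
print(resolve(base, ours, theirs, "theirs"))  # {'email': 'b@x.io', 'plan': 'team'}
```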
Storage Backends
Which storage systems are supported?
Horizon Epoch supports 8 storage backends:
Relational Databases (Full Constraint Support):
- PostgreSQL
- MySQL
- Microsoft SQL Server
- SQLite (partial constraint support)
Object Storage (Delta Lake Format):
- AWS S3 (and S3-compatible stores such as MinIO)
- Azure Blob Storage
- Google Cloud Storage
- Local Filesystem
Can I use multiple storage backends?
Yes. You can register tables from different backends and manage them in a single repository. Operations like branching and merging work across backends.
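Managing tables from several backends in one repository suggests a common adapter interface that each backend implements. The sketch below uses `typing.Protocol` with in-memory stand-ins. `StorageBackend`, `InMemoryBackend`, and `snapshot` are hypothetical names, not Horizon Epoch's adapter API.

```python
from typing import Any, Protocol

Rows = dict[int, dict[str, Any]]

class StorageBackend(Protocol):
    """Hypothetical minimal interface each storage adapter would implement."""
    def read(self, table: str) -> Rows: ...
    def write(self, table: str, rows: Rows) -> None: ...

class InMemoryBackend:
    """Stand-in for a real adapter such as PostgreSQL or S3/Delta Lake."""
    def __init__(self, name: str):
        self.name = name
        self.tables: dict[str, Rows] = {}

    def read(self, table: str) -> Rows:
        return dict(self.tables.get(table, {}))

    def write(self, table: str, rows: Rows) -> None:
        self.tables[table] = dict(rows)

# One repository maps each registered table to whichever backend holds it
registry: dict[str, StorageBackend] = {
    "users": InMemoryBackend("postgresql"),
    "events": InMemoryBackend("s3"),
}

def snapshot(registry: dict[str, StorageBackend]) -> dict[str, int]:
    """Row counts per table, read through the common interface."""
    return {table: len(backend.read(table)) for table, backend in registry.items()}

registry["users"].write("users", {1: {"name": "Ada"}})
registry["events"].write("events", {10: {"type": "login"}, 11: {"type": "logout"}})
print(snapshot(registry))  # {'users': 1, 'events': 2}
```

Code written against the interface, such as `snapshot`, does not need to know which backend a table lives in.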
Does it work with AWS RDS?
Yes. Use the PostgreSQL or MySQL adapter with your RDS connection string. IAM authentication is supported for enhanced security.
What about cloud data warehouses?
Snowflake, BigQuery, and Redshift are not currently supported. These have unique architectures that would require specialized adapters.
Security
Are credentials stored securely?
Credentials can be sourced from:
- Environment variables
- Encrypted files
- HashiCorp Vault
- AWS Secrets Manager
Never store credentials in plain-text configuration files.
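A minimal sketch of the environment-variable approach: the application reads its connection string at startup and fails loudly if it is missing, so nothing sensitive lands in a config file. The variable name `EPOCH_METADATA_URL` is a hypothetical example, not a documented Epoch setting.

```python
import os

def metadata_dsn(var: str = "EPOCH_METADATA_URL") -> str:
    """Read a connection string from the environment instead of a config file.

    The default variable name is hypothetical; use whatever your deployment defines.
    """
    dsn = os.environ.get(var)
    if not dsn:
        raise RuntimeError(f"set {var} (for example, from your secrets manager) before starting")
    return dsn
```

In practice the variable would be exported by the shell, a container runtime, or a secrets-manager integration rather than set from Python.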
Does it support SSO?
AWS SSO/Identity Center is supported for S3 access. For databases, SSO availability depends on what your database itself supports.
What access controls are available?
Horizon Epoch includes comprehensive security features:
- Branch protection rules - Prevent direct commits to protected branches
- Role-based access control (RBAC) - Define roles with specific permissions
- Row-level security (RLS) - Fine-grained data access policies
- Audit logging - Track all operations for compliance
- Commit signing - PGP/OpenPGP signature verification
- Secret scanning - Detect accidentally committed secrets
- Field-level masking - Redact sensitive data
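Branch protection and RBAC fit together naturally: a rule names the protected branches and the roles allowed to commit to them directly. The rule format and `can_commit` below are assumptions for illustration, not Epoch's configuration schema. `fnmatchcase` is used so that pattern matching is case-sensitive on every OS.

```python
from fnmatch import fnmatchcase

# Hypothetical protection rules: branch-name glob -> roles allowed to commit directly
PROTECTED = {
    "main": {"admin"},
    "release/*": {"admin", "release-manager"},
}

def can_commit(branch: str, roles: set[str]) -> bool:
    """Deny a direct commit if any matching rule shares no role with the user."""
    for pattern, allowed in PROTECTED.items():
        if fnmatchcase(branch, pattern) and not (roles & allowed):
            return False
    return True

print(can_commit("feature/login", {"analyst"}))        # True: branch is unprotected
print(can_commit("main", {"analyst"}))                 # False
print(can_commit("release/2.0", {"release-manager"}))  # True
```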
Integration
Does it work with dbt?
You can use Horizon Epoch to version the data that dbt transforms. Create a branch, run dbt, validate results, then merge.
Can I use it with Airflow/Dagster?
Yes. Use the Python SDK in your DAG tasks:
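```python
import asyncio

from horizon_epoch import Client

async def branch_and_transform():
    async with Client.connect("postgresql://...") as client:
        await client.branch("etl/daily-run")   # isolate this run on its own branch
        # ... run transformations ...
        await client.commit("Daily ETL run")   # record the run's changes
        await client.merge("etl/daily-run")    # merge the run's branch

# From a synchronous task callable, drive the coroutine with asyncio.run
asyncio.run(branch_and_transform())
```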
Is there a REST API?
Yes. The FastAPI server provides a REST API for all operations. See the REST API Reference.
Does it have a web UI?
A web UI is planned for future releases. Currently, use the CLI or SDK.
Comparison
vs. dbt snapshots?
dbt snapshots capture point-in-time state. Horizon Epoch provides:
- Full branching and merging
- Efficient storage (only changes)
- Any storage backend
vs. Delta Lake time travel?
Delta Lake provides time travel within a single table. Horizon Epoch adds:
- Cross-table consistency
- Branching and merging
- Multi-backend support
- Git-like workflow
vs. lakeFS?
Both provide Git-like versioning for data. Key differences:
- Horizon Epoch: Works with databases (PostgreSQL, MySQL, etc.) and object storage
- lakeFS: Focused on object storage
vs. Nessie/Project Nessie?
Nessie provides Git-like versioning for Iceberg/Delta tables. Horizon Epoch:
- Supports more storage backends (including relational databases)
- Provides record-level tracking
- Works with transactional databases
Troubleshooting
Where are logs?
```bash
# Verbose output
epoch --verbose <command>

# Debug logging
RUST_LOG=debug epoch <command>
```
How do I check system health?
```bash
# Run diagnostics
epoch doctor

# Check local services
epoch local status
```
Where can I get help?
- Documentation: This site
- Run `epoch tips` for getting-started help
- Run `epoch --help` for the command reference
Licensing
Licensing terms to be announced. Please contact Horizon Analytic Studios for licensing inquiries.