Architecture
System Overview
┌─────────────────────────────────────────────────────────────┐
│ User Interfaces                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │     CLI     │  │ Python SDK  │  │      REST API       │  │
│  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘  │
└─────────┼────────────────┼────────────────────┼─────────────┘
          │                │                    │
          └────────────────┼────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│ Rust Core Engine                                            │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Version Control Operations                              │ │
│ │   branch, commit, merge, diff, promote                  │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Metadata Layer                                          │ │
│ │   commit graph, branch pointers, table versions         │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Storage Abstraction                                     │ │
│ │   unified interface for all backends                    │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────┐
│ Storage Backends                                            │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ PostgreSQL │ │   MySQL    │ │ SQL Server │ │   SQLite   │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │   AWS S3   │ │ Azure Blob │ │    GCS     │ │   Local    │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
└─────────────────────────────────────────────────────────────┘
Design Principles
1. Git Semantics
If you know Git, you know Horizon Epoch. We use the same conceptual model:
- Repositories contain versioned data
- Branches are independent lines of development
- Commits are immutable snapshots
- Merges combine branches with conflict detection
2. Storage Agnostic
Work with data where it lives:
- No data migration required
- Each storage backend has an adapter
- Same operations work across all backends
- Mix and match storage types
3. Zero-Copy Branching
Creating a branch is instant:
- No data is duplicated
- Branch is just a pointer to a commit
- Changes are tracked incrementally
- Storage is only used for actual changes
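The Rust sketch below shows roughly why branching is cheap. It models branches as a map from branch name to commit ID. `BranchTable`, `CommitId`, `create_branch`, and `advance` are hypothetical names chosen for this example and are not Horizon Epoch's actual API. The only point is that creating a branch writes a single pointer and reads no table data.

```rust
use std::collections::HashMap;

/// Hypothetical commit identifier (illustration only).
type CommitId = u64;

/// Branch bookkeeping: a branch is nothing more than a name mapped to a commit ID.
#[derive(Default)]
struct BranchTable {
    branches: HashMap<String, CommitId>,
}

impl BranchTable {
    /// Creates `name` pointing at the same commit as `from`.
    /// Exactly one map entry is written; no table data is read or copied.
    fn create_branch(&mut self, name: &str, from: &str) -> Result<CommitId, String> {
        if self.branches.contains_key(name) {
            return Err(format!("branch '{name}' already exists"));
        }
        let head = *self
            .branches
            .get(from)
            .ok_or_else(|| format!("unknown branch '{from}'"))?;
        self.branches.insert(name.to_string(), head);
        Ok(head)
    }

    /// Moves a branch to a new commit; other branches keep their pointers.
    fn advance(&mut self, name: &str, commit: CommitId) {
        self.branches.insert(name.to_string(), commit);
    }
}
```

`create_branch` touches one map entry, so its cost does not depend on table size. Later commits on the new branch move only that branch's pointer.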
4. Record-Level Tracking
Granular change detection:
- Track individual record changes (not just files)
- Field-level conflict detection
- Efficient storage of deltas
- Precise merge resolution
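Field-level conflict detection can be sketched as a three-way merge over a single record. The sketch assumes a record is a map from field name to value, and that a missing field means the field was deleted. `Fields`, `FieldMerge`, and `merge_record` are illustrative names, not the engine's types. Under this rule, a field conflicts only when both sides changed it relative to the merge base and disagree on the result.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A record modelled as field name -> value (a simplification for this sketch).
type Fields = BTreeMap<String, String>;

/// Outcome of merging one record.
#[derive(Debug, PartialEq)]
enum FieldMerge {
    Merged(Fields),
    Conflicts(Vec<String>), // names of the conflicting fields
}

/// Three-way, field-level merge of a single record against its merge base.
fn merge_record(base: &Fields, ours: &Fields, theirs: &Fields) -> FieldMerge {
    let names: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Fields::new();
    let mut conflicts = Vec::new();
    for name in names {
        let (b, o, t) = (base.get(name), ours.get(name), theirs.get(name));
        let value = if o == t {
            o // both sides agree (or neither changed the field)
        } else if o == b {
            t // only "theirs" changed the field
        } else if t == b {
            o // only "ours" changed the field
        } else {
            conflicts.push(name.clone()); // both changed it, differently
            continue;
        };
        // `None` means the field was deleted, so it is left out of the result.
        if let Some(v) = value {
            merged.insert(name.clone(), v.clone());
        }
    }
    if conflicts.is_empty() {
        FieldMerge::Merged(merged)
    } else {
        FieldMerge::Conflicts(conflicts)
    }
}
```

With this rule, one branch can edit `email` while another edits `name` in the same record, and the merge produces no conflict.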
5. Metadata Separation
Version control metadata is kept separate from the data:
- Metadata is stored in a dedicated PostgreSQL database
- Data stays in its original storage
- No modification to existing data infrastructure is required
- Horizon Epoch can be added or removed without affecting the data
Component Details
Rust Core Engine
The heart of Horizon Epoch, written in Rust for:
- Performance and memory safety
- Concurrent operations
- Cross-platform compatibility
Key modules:
- operations/: branch, commit, merge, and diff logic
- metadata/: commit graph and branch management
- storage/: storage adapter traits and implementations
Metadata Layer
Stores all versioning information:
┌─────────────────────────────────────────┐
│ Metadata Database                       │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │ Repositories│   │    Branches     │  │
│  └─────────────┘   └─────────────────┘  │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │   Commits   │   │     Tables      │  │
│  └─────────────┘   └─────────────────┘  │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │    Tags     │   │ Change Tracking │  │
│  └─────────────┘   └─────────────────┘  │
└─────────────────────────────────────────┘
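The relationships between these tables can be sketched in memory. Commits point to their parents, while branches and tags are named pointers to commits. The structs below are a hypothetical simplification, since the actual metadata schema is not described here. The `log` method walks first parents from a branch head, similar to `git log --first-parent`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified commit row.
#[derive(Debug, Clone)]
struct Commit {
    id: u64,
    parents: Vec<u64>, // one parent normally, two for a merge commit
    message: String,
}

/// Hypothetical in-memory stand-in for the metadata database.
#[derive(Default)]
struct MetadataStore {
    commits: HashMap<u64, Commit>,
    branches: HashMap<String, u64>, // branch name -> current head commit (moves)
    tags: HashMap<String, u64>,     // tag name -> fixed commit (never moves)
}

impl MetadataStore {
    /// Records a commit on `branch` whose parent is the current head, then moves the branch.
    fn commit(&mut self, branch: &str, id: u64, message: &str) {
        let parents = self.branches.get(branch).copied().into_iter().collect();
        self.commits.insert(id, Commit { id, parents, message: message.to_string() });
        self.branches.insert(branch.to_string(), id);
    }

    /// Gives `commit` a permanent name.
    fn tag(&mut self, name: &str, commit: u64) {
        self.tags.insert(name.to_string(), commit);
    }

    /// Walks first parents from the branch head, newest commit first.
    fn log(&self, branch: &str) -> Vec<&Commit> {
        let mut out = Vec::new();
        let mut next = self.branches.get(branch).copied();
        while let Some(id) = next {
            let c = &self.commits[&id];
            out.push(c);
            next = c.parents.first().copied();
        }
        out
    }
}
```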
Storage Adapters
Each adapter implements a common trait:
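// Record, RecordKey, Schema, and Result are defined elsewhere in the core crate (not shown here).
trait StorageAdapter {
    /// Read the records identified by `keys` from `table`.
    fn read_records(&self, table: &str, keys: &[RecordKey]) -> Result<Vec<Record>>;
    /// Write `records` to `table`.
    fn write_records(&self, table: &str, records: &[Record]) -> Result<()>;
    /// Delete the records identified by `keys` from `table`.
    fn delete_records(&self, table: &str, keys: &[RecordKey]) -> Result<()>;
    /// Return the schema of `table`.
    fn get_schema(&self, table: &str) -> Result<Schema>;
    /// List the tables available through this adapter.
    fn list_tables(&self) -> Result<Vec<String>>;
    // ... more methods
}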
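Below is a hypothetical in-memory adapter of the kind one might use in tests. It shows what implementing the trait involves. The real `Record`, `RecordKey`, and `Result` types are not shown, so the sketch defines simple stand-ins and omits `get_schema` and the elided methods. `MemoryAdapter` is an illustrative name. An `RwLock` provides interior mutability because the trait's write methods take `&self`.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::RwLock;

// Stand-in types (assumptions for this sketch, not the crate's definitions).
type RecordKey = String;
#[derive(Debug, Clone, PartialEq)]
struct Record {
    key: RecordKey,
    value: String,
}
type Result<T> = std::result::Result<T, String>;

trait StorageAdapter {
    fn read_records(&self, table: &str, keys: &[RecordKey]) -> Result<Vec<Record>>;
    fn write_records(&self, table: &str, records: &[Record]) -> Result<()>;
    fn delete_records(&self, table: &str, keys: &[RecordKey]) -> Result<()>;
    fn list_tables(&self) -> Result<Vec<String>>;
}

/// Hypothetical backend keeping each table as a key-ordered map.
#[derive(Default)]
struct MemoryAdapter {
    tables: RwLock<HashMap<String, BTreeMap<RecordKey, Record>>>,
}

impl StorageAdapter for MemoryAdapter {
    fn read_records(&self, table: &str, keys: &[RecordKey]) -> Result<Vec<Record>> {
        let tables = self.tables.read().map_err(|e| e.to_string())?;
        let rows = tables.get(table).ok_or_else(|| format!("no such table: {table}"))?;
        // Keys that are not present are skipped rather than reported as errors.
        Ok(keys.iter().filter_map(|k| rows.get(k).cloned()).collect())
    }

    fn write_records(&self, table: &str, records: &[Record]) -> Result<()> {
        let mut tables = self.tables.write().map_err(|e| e.to_string())?;
        let rows = tables.entry(table.to_string()).or_default();
        for r in records {
            rows.insert(r.key.clone(), r.clone()); // upsert by key
        }
        Ok(())
    }

    fn delete_records(&self, table: &str, keys: &[RecordKey]) -> Result<()> {
        let mut tables = self.tables.write().map_err(|e| e.to_string())?;
        if let Some(rows) = tables.get_mut(table) {
            for k in keys {
                rows.remove(k);
            }
        }
        Ok(())
    }

    fn list_tables(&self) -> Result<Vec<String>> {
        let tables = self.tables.read().map_err(|e| e.to_string())?;
        let mut names: Vec<String> = tables.keys().cloned().collect();
        names.sort();
        Ok(names)
    }
}
```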
Python Bindings (PyO3)
The Python SDK wraps the Rust core via PyO3:
- Zero-copy data exchange where possible
- Async support via tokio
- Pythonic API design
- Type hints for IDE support
REST API (FastAPI)
HTTP API for external integrations:
- OpenAPI documentation
- Authentication/authorization
- Rate limiting
- Webhook support
CLI
Command-line interface for developers:
- Git-like commands
- Interactive prompts
- JSON output option
- Shell completion
Data Flow
Commit Operation
1. User: epoch commit -m "Update users"
   │
2. CLI: Parse command, validate
   │
3. Core: Identify changed records
   │
4. Core: Create commit metadata
   │
5. Core: Store change delta
   │
6. Meta: Update branch pointer
   │
7. CLI: Return success + commit ID
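Steps 3 to 7 can be sketched as a single function on a hypothetical `Repo` type. The sketch assumes records are key/value string pairs. It also assumes the caller supplies both the committed state (`base`) and the state to commit (`working`). The names `Repo`, `CommitMeta`, and `Change` are illustrative. Only changed keys end up in the stored delta, in line with the principle that storage is used only for actual changes.

```rust
use std::collections::{BTreeMap, HashMap};

/// One entry in a commit's delta (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
enum Change {
    Upsert(String), // new value of the record
    Delete,
}

#[derive(Debug)]
struct CommitMeta {
    id: u64,
    parent: Option<u64>,
    message: String,
    delta: BTreeMap<String, Change>, // record key -> change
}

#[derive(Default)]
struct Repo {
    next_id: u64,
    commits: HashMap<u64, CommitMeta>,
    branches: HashMap<String, u64>,
}

impl Repo {
    fn commit(
        &mut self,
        branch: &str,
        message: &str,
        base: &BTreeMap<String, String>,
        working: &BTreeMap<String, String>,
    ) -> Result<u64, String> {
        // Step 3: identify changed records (inserts, updates, then deletes).
        let mut delta = BTreeMap::new();
        for (k, v) in working {
            if base.get(k) != Some(v) {
                delta.insert(k.clone(), Change::Upsert(v.clone()));
            }
        }
        for k in base.keys().filter(|k| !working.contains_key(*k)) {
            delta.insert(k.clone(), Change::Delete);
        }
        if delta.is_empty() {
            return Err("nothing to commit".into());
        }
        // Steps 4-5: create the commit metadata and store the delta with it.
        self.next_id += 1;
        let id = self.next_id;
        let parent = self.branches.get(branch).copied();
        self.commits.insert(id, CommitMeta { id, parent, message: message.into(), delta });
        // Step 6: advance the branch pointer. Step 7: return the new commit ID.
        self.branches.insert(branch.to_string(), id);
        Ok(id)
    }
}
```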
Merge Operation
1. User: epoch merge feature/branch
   │
2. Core: Find merge base (common ancestor)
   │
3. Core: Compute changes from base to source
   │
4. Core: Compute changes from base to target
   │
5. Core: Identify conflicts
   │
   ├──────────────┐
   │              │
No conflicts   Has conflicts
   │              │
   ▼              ▼
6a. Apply      6b. Report
    changes        conflicts
   │              │
7a. Create     7b. Wait for
    merge          resolution
    commit
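Step 2 needs the merge base. Git defines "best common ancestors" as commits that are ancestors of both heads and are not an ancestor of any other such commit. In criss-cross histories there can be more than one. The sketch below computes that set on a hypothetical in-memory graph, where `Graph` maps a commit ID to its parent IDs. It is quadratic and only pins down the definition. It does not reflect how the engine implements the step.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical commit graph: commit ID -> parent IDs.
type Graph = HashMap<u64, Vec<u64>>;

/// All ancestors of `start`, including `start` itself (breadth-first walk).
fn ancestors(graph: &Graph, start: u64) -> HashSet<u64> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(c) = queue.pop_front() {
        if seen.insert(c) {
            queue.extend(graph.get(&c).into_iter().flatten().copied());
        }
    }
    seen
}

/// Best common ancestors of `a` and `b`: common ancestors that are not
/// themselves ancestors of another common ancestor. Sorted for stable output.
fn merge_bases(graph: &Graph, a: u64, b: u64) -> Vec<u64> {
    let common: HashSet<u64> = ancestors(graph, a)
        .intersection(&ancestors(graph, b))
        .copied()
        .collect();
    let mut best: Vec<u64> = common
        .iter()
        .copied()
        .filter(|&c| !common.iter().any(|&d| d != c && ancestors(graph, d).contains(&c)))
        .collect();
    best.sort();
    best
}
```

Simply returning the first common ancestor found by a breadth-first search from one head can pick a commit that is older than necessary when a merge commit has multiple parents. The explicit "not an ancestor of another candidate" filter avoids that problem.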
Query Operation
1. User: SELECT * FROM users WHERE id = 42
   │
2. Core: Determine current branch
   │
3. Core: Check branch overlay for record
   │
   ├──────────────┐
   │              │
Found in       Not in
overlay        overlay
   │              │
   ▼              ▼
4a. Return     4b. Query
    overlay        base data
    record            │
                      ├─────────────┐
                      │             │
                    Found       Not found
                      │             │
                      ▼             ▼
                    Return        Return
                    record        empty
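The read path above can be sketched as a two-level lookup. The flowchart does not say how a deletion on a branch is recorded. This sketch assumes a tombstone entry (`OverlayEntry::Deleted`) that hides the base record. `OverlayEntry` and `read_record` are hypothetical names, and the overlay and base data are plain in-memory maps for illustration.

```rust
use std::collections::HashMap;

/// What a branch overlay holds for a key (hypothetical representation).
enum OverlayEntry {
    Upserted(String), // record inserted or updated on this branch
    Deleted,          // assumed tombstone: record deleted on this branch
}

/// Steps 3-4 of the query flow: consult the branch overlay first, then base data.
fn read_record(
    overlay: &HashMap<u64, OverlayEntry>,
    base: &HashMap<u64, String>,
    id: u64,
) -> Option<String> {
    match overlay.get(&id) {
        Some(OverlayEntry::Upserted(v)) => Some(v.clone()), // 4a: found in overlay
        Some(OverlayEntry::Deleted) => None,                // hidden on this branch
        None => base.get(&id).cloned(),                     // 4b: fall back to base data
    }
}
```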
Scalability Considerations
Horizontal Scaling
- Metadata database can be replicated
- Storage backends scale independently
- API servers are stateless
Performance Optimization
- Connection pooling for databases
- Lazy loading of large datasets
- Caching of frequently accessed data
- Batch operations for bulk changes
Large Repository Support
- Incremental operations
- Streaming for large diffs
- Pagination for history
- Sparse checkout support
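One way to stream a large diff is a merge join over two sorted streams, assuming both versions of a table can be read in key order. The diff is produced one change at a time, and neither version has to fit in memory. The `Diff`, `DiffIter`, and `diff` names below are hypothetical. This sketches the technique and is not Horizon Epoch's diff implementation.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

/// One difference between two versions of a table.
#[derive(Debug, PartialEq)]
enum Diff<K, V> {
    Added(K, V),
    Removed(K, V),
    Modified(K, V, V), // key, old value, new value
}

/// Lazy merge join over two key-sorted (key, value) streams.
struct DiffIter<A: Iterator, B: Iterator> {
    old: Peekable<A>,
    new: Peekable<B>,
}

impl<K: Ord, V: PartialEq, A, B> Iterator for DiffIter<A, B>
where
    A: Iterator<Item = (K, V)>,
    B: Iterator<Item = (K, V)>,
{
    type Item = Diff<K, V>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Compare the next key on each side without consuming anything yet.
            let ord = match (self.old.peek(), self.new.peek()) {
                (None, None) => return None,
                (Some(_), None) => Ordering::Less,
                (None, Some(_)) => Ordering::Greater,
                (Some((ko, _)), Some((kn, _))) => ko.cmp(kn),
            };
            match ord {
                Ordering::Less => {
                    let (k, v) = self.old.next()?;
                    return Some(Diff::Removed(k, v));
                }
                Ordering::Greater => {
                    let (k, v) = self.new.next()?;
                    return Some(Diff::Added(k, v));
                }
                Ordering::Equal => {
                    let (k, old) = self.old.next()?;
                    let (_, new) = self.new.next()?;
                    if old != new {
                        return Some(Diff::Modified(k, old, new));
                    }
                    // Unchanged record: skip it and keep scanning.
                }
            }
        }
    }
}

/// Builds the lazy diff; both inputs must already be sorted by key.
fn diff<K: Ord, V: PartialEq, A, B>(old: A, new: B) -> DiffIter<A::IntoIter, B::IntoIter>
where
    A: IntoIterator<Item = (K, V)>,
    B: IntoIterator<Item = (K, V)>,
{
    DiffIter { old: old.into_iter().peekable(), new: new.into_iter().peekable() }
}
```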
Security Model
Authentication
- Multiple auth methods supported
- Credential providers for secrets
- Token-based API access
Authorization
- Repository-level permissions
- Branch protection rules
- Audit logging
Data Protection
- TLS for all connections
- Encryption at rest (via storage backends)
- Credential encryption
Future Architecture
Planned enhancements:
- Distributed metadata for global deployments
- Real-time collaboration features
- Advanced caching layer
- Plugin system for custom adapters