Branch-Aware Queries
Query data from specific branches, compare between branches, and understand how branch context affects data access.
Overview
When you query data through Horizon Epoch:
- Each branch shows its own version of the data
- Changes on a branch don’t affect other branches
- You can query multiple branches simultaneously
Setting Branch Context
CLI
# Set current branch
epoch checkout feature/new-schema
# All subsequent operations use this branch
epoch query "SELECT * FROM users"
Python SDK
import asyncio
from horizon_epoch import Client

async def query_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Checkout the branch
        await client.checkout("feature/new-schema")
        # Get status to see current branch
        status = await client.status()
        print(f"On branch: {status.branch}")
        # Get data diff between branches
        diff = await client.diff(
            base="main",
            target="feature/new-schema"
        )
        for table_diff in diff.table_diffs:
            print(f"{table_diff.table_name}: {table_diff.total_changes}")

asyncio.run(query_branch())
How Branch Queries Work
Copy-on-Write Overlay
When you query a branch:
- Base data comes from the parent branch
- Branch-specific changes overlay the base
- You see the combined result
main: users = [Alice, Bob, Charlie]
│
feature: overlay = [Charlie → Charles]
│
query: result = [Alice, Bob, Charles]
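The overlay picture above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not Horizon Epoch's storage code: dictionaries keyed by primary key stand in for tables, and the `DELETED` tombstone and the `merged_view` helper are hypothetical names introduced here.

```python
# Illustrative model only: plain dicts keyed by primary key stand in for tables.
DELETED = object()  # hypothetical tombstone for a row removed on the branch

def merged_view(base, overlay):
    """Return the rows a branch query would see: base rows with the overlay applied."""
    view = dict(base)              # copy, so the parent branch stays untouched
    for key, row in overlay.items():
        if row is DELETED:
            view.pop(key, None)    # row was deleted on the branch
        else:
            view[key] = row        # row was added or changed on the branch
    return view

main_users = {1: "Alice", 2: "Bob", 3: "Charlie"}
feature_overlay = {3: "Charles"}

print(list(merged_view(main_users, feature_overlay).values()))
# ['Alice', 'Bob', 'Charles']
```

Because the overlay stores only the rows that differ, creating a branch in this model costs nothing until the first change is made.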
Query Flow
-- On branch 'feature/update-names'
SELECT * FROM users WHERE id = 3;
-- Horizon Epoch executes:
-- 1. Check branch overlay for id=3
-- 2. If found, return overlay version
-- 3. If not found, return base version
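As a rough sketch of the three steps above, a single-row lookup might behave like the following. The `lookup` function and the sample rows are hypothetical; the real resolution happens inside Horizon Epoch and may differ in detail.

```python
from typing import Optional

def lookup(row_id: int, overlay: dict, base: dict) -> Optional[dict]:
    """Resolve one row by primary key: the overlay wins, otherwise use the base."""
    if row_id in overlay:        # steps 1 and 2: found in the branch overlay
        return overlay[row_id]
    return base.get(row_id)      # step 3: fall back to the base (None if absent)

# Hypothetical sample data
base = {3: {"id": 3, "name": "Charlie"}, 4: {"id": 4, "name": "Dana"}}
overlay = {3: {"id": 3, "name": "Charles"}}

print(lookup(3, overlay, base))  # {'id': 3, 'name': 'Charles'}
print(lookup(4, overlay, base))  # {'id': 4, 'name': 'Dana'}
```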
Comparing Branches
Diff Query
Compare data between two branches:
# Get records that differ between branches
diff = client.diff_query(
    table="users",
    source="main",
    target="feature/updates"
)
for change in diff:
    print(f"{change.type}: {change.record}")
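To make the shape of a record-level diff concrete, here is a minimal sketch that compares two snapshots keyed by primary key. The `Change` class and `diff_rows` function are hypothetical stand-ins, and the change type names ("added", "removed", "modified") are assumptions rather than values confirmed by the SDK.

```python
from dataclasses import dataclass
from typing import Any, Iterator

@dataclass
class Change:
    type: str     # assumed labels: "added", "removed", or "modified"
    key: Any      # primary key of the affected row
    record: Any   # target-side row, or the source row for removals

def diff_rows(source: dict, target: dict) -> Iterator[Change]:
    """Yield one Change per primary key whose row differs between snapshots."""
    for key in sorted(source.keys() | target.keys()):
        if key not in source:
            yield Change("added", key, target[key])
        elif key not in target:
            yield Change("removed", key, source[key])
        elif source[key] != target[key]:
            yield Change("modified", key, target[key])

main = {1: "Alice", 2: "Bob", 3: "Charlie"}
feature = {1: "Alice", 3: "Charles", 4: "Dana"}

for change in diff_rows(main, feature):
    print(f"{change.type}: {change.record}")
# removed: Bob
# modified: Charles
# added: Dana
```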
Side-by-Side Comparison
import pandas as pd

# Query both branches
main_data = client.query("SELECT * FROM users", branch="main")
feature_data = client.query("SELECT * FROM users", branch="feature/updates")
# Compare programmatically
main_df = pd.DataFrame(main_data)
feature_df = pd.DataFrame(feature_data)
# DataFrame.compare requires identical row and column labels in both frames;
# it raises ValueError if the branches differ in row count or columns
comparison = main_df.compare(feature_df)
CLI Comparison
# Show diff
epoch diff main feature/updates --table users
# Show records only in one branch
epoch diff main feature/updates --only-in-target
Querying Multiple Branches
Union Across Branches
# Get all unique records across branches
results = client.query_branches(
    query="SELECT * FROM users",
    branches=["main", "feature/a", "feature/b"],
    mode="union"
)
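One plausible reading of "all unique records" is a row-level union that drops exact duplicates. The sketch below illustrates that reading on in-memory results; `union_rows` is a hypothetical helper, and the actual semantics of `mode="union"` may differ.

```python
def union_rows(results_by_branch):
    """Combine per-branch result lists, keeping each distinct row once."""
    seen = set()
    combined = []
    for rows in results_by_branch.values():
        for row in rows:
            # Dicts are unhashable, so fingerprint each row by its sorted items
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                combined.append(row)
    return combined

# Hypothetical per-branch results
results = {
    "main": [{"id": 1, "name": "Alice"}, {"id": 3, "name": "Charlie"}],
    "feature/a": [{"id": 1, "name": "Alice"}, {"id": 3, "name": "Charles"}],
}

for row in union_rows(results):
    print(row)
# {'id': 1, 'name': 'Alice'}
# {'id': 3, 'name': 'Charlie'}
# {'id': 3, 'name': 'Charles'}
```

Under this reading, a row that was changed on one branch appears once per distinct version, so the same primary key can occur more than once in the output.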
Compare Aggregate Results
# Run same query on multiple branches
for branch in ["main", "staging", "production"]:
    result = client.query(
        "SELECT COUNT(*) as user_count FROM users",
        branch=branch
    )
    print(f"{branch}: {result[0]['user_count']} users")
Time-Travel Queries
Query data at a specific point in time:
By Commit ID
# Query at specific commit
client.query(
    "SELECT * FROM users",
    at_commit="abc123f"
)
By Tag
# Query at tagged version
client.query(
    "SELECT * FROM users",
    at_tag="v1.0.0"
)
By Timestamp
from datetime import datetime, timedelta

# Query data as of yesterday
yesterday = datetime.now() - timedelta(days=1)
client.query(
    "SELECT * FROM users",
    at_time=yesterday
)
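A timestamp query has to be mapped to some commit. A natural interpretation, assumed here rather than confirmed by the source, is "the latest commit made at or before the requested time". The sketch below shows that lookup with a binary search; the commit history, the commit IDs, and `commit_at` are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical commit history as (commit time, commit ID), oldest first.
history = [
    (datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc), "c1"),
    (datetime(2024, 1, 14, 16, 30, tzinfo=timezone.utc), "c2"),
    (datetime(2024, 1, 20, 8, 15, tzinfo=timezone.utc), "c3"),
]

def commit_at(history, when):
    """Return the ID of the latest commit made at or before `when`, else None."""
    times = [committed for committed, _ in history]
    index = bisect_right(times, when)   # number of commits with time <= when
    return history[index - 1][1] if index else None

print(commit_at(history, datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)))  # c2
print(commit_at(history, datetime(2024, 1, 1, tzinfo=timezone.utc)))          # None
```

The sketch uses timezone-aware timestamps throughout, because Python raises TypeError when a naive datetime is compared with an aware one.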
CLI Time-Travel
# Query at commit
epoch query "SELECT * FROM users" --at abc123f
# Query at tag
epoch query "SELECT * FROM users" --at v1.0.0
# Query at time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"
Performance Considerations
Branch Depth
Queries on deeply nested branches may be slower:
main → dev → feature → sub-feature → experiment
Each level adds another overlay to resolve. Keep branch hierarchies shallow when possible.
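To see why depth matters, the following sketch models the chain above as a list of overlays and counts how many layers a lookup has to probe. It assumes lookups walk the chain one layer at a time, which is a simplification and not a description of Horizon Epoch's internals; `resolve` and `flatten` are hypothetical helpers.

```python
def resolve(row_id, layers):
    """Look up a row through overlays ordered from the innermost branch to the root.

    Returns (row, layers_probed).
    """
    for probed, layer in enumerate(layers, start=1):
        if row_id in layer:
            return layer[row_id], probed
    return None, len(layers)

def flatten(layers):
    """Collapse the chain into one mapping; inner layers override outer ones."""
    flat = {}
    for layer in reversed(layers):   # apply root first, innermost last
        flat.update(layer)
    return flat

# experiment, sub-feature, feature, dev, main (innermost first)
layers = [{}, {}, {2: "Bobby"}, {}, {1: "Alice", 2: "Bob"}]

print(resolve(1, layers))              # ('Alice', 5)
print(resolve(2, layers))              # ('Bobby', 3)
print(resolve(1, [flatten(layers)]))   # ('Alice', 1)
```

A row that was never changed on any branch is the worst case, since every overlay is checked before the root is reached. Flattening the chain trades extra storage for a single probe.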
Indexing
Branch overlays use primary keys for efficient lookup:
# Fast: uses primary key
client.query("SELECT * FROM users WHERE id = 123", branch="feature")
# Slower: scans overlay + base
client.query("SELECT * FROM users WHERE name = 'Alice'", branch="feature")
Caching
Query caching can be configured via the CLI:
epoch config set query_cache_enabled true
epoch config set query_cache_ttl 300
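The effect of a time-to-live setting can be illustrated with a small in-memory cache. This sketch assumes that `query_cache_ttl` is expressed in seconds and that cached results are kept per branch; neither is stated above, and `QueryCache` is a hypothetical class, not part of the SDK.

```python
import time

class QueryCache:
    """Tiny TTL cache keyed by (branch, sql). Illustrative only."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so expiry can be tested
        self._entries = {}       # (branch, sql) -> (stored_at, result)

    def get_or_run(self, branch, sql, run):
        key = (branch, sql)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]        # still fresh: skip the query
        result = run()           # missing or expired: run and store
        self._entries[key] = (now, result)
        return result

calls = []

def run_query():
    calls.append(1)
    return [{"user_count": 3}]

cache = QueryCache(ttl_seconds=300)
sql = "SELECT COUNT(*) AS user_count FROM users"
cache.get_or_run("main", sql, run_query)      # runs the query
cache.get_or_run("main", sql, run_query)      # served from the cache
cache.get_or_run("feature", sql, run_query)   # different branch, runs again
print(len(calls))  # 2
```

Including the branch in the cache key matters in this model: the same SQL text can return different rows on different branches.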
Branch Isolation Guarantees
Read Isolation
Each branch sees consistent data:
- Changes on other branches are invisible
- Commits create point-in-time snapshots
- Queries always return consistent results
Write Isolation
Changes on a branch don’t affect others until merged:
# On feature branch
await client.checkout("feature")
# Make changes via your database connection...
await client.stage_all()
await client.commit(message="Deactivate user 1")
# Main branch is unchanged - checkout and diff to compare
await client.checkout("main")
# Main still has the original data
Common Patterns
A/B Testing
# Set up test branches
await client.branch("experiment/new-algo", start_point="main")
# Run algorithm on each branch
# (run_algorithm and measure_metrics are your own application functions)
for branch in ["main", "experiment/new-algo"]:
    await client.checkout(branch)
    run_algorithm()
    results = measure_metrics()
    print(f"{branch}: {results}")
Data Validation
# Validate changes before merge
await client.checkout("feature/migration")
# Use diff to check for issues
diff = await client.diff(base="main", target="feature/migration")
for table_diff in diff.table_diffs:
    if table_diff.total_changes > 1000:
        # ValidationError is an exception class defined by your application
        raise ValidationError(f"Too many changes in {table_diff.table_name}")
Environment Comparison
# Compare data across environments
for env in ["development", "staging", "production"]:
    result = client.query(
        "SELECT COUNT(*) as count, MAX(updated_at) as latest FROM orders",
        branch=env
    )
    print(f"{env}: {result}")
Troubleshooting
Wrong Branch Data
Expected data from 'feature' but got 'main' data
- Verify that the branch context is set correctly
- Check whether the query specifies an explicit branch
- Verify that the commit was made on the correct branch
Missing Changes
Can't see changes I just made
- Changes must be committed to be visible in queries
- Uncommitted changes are in working state only
- Use epoch status to see uncommitted changes
Slow Queries
- Reduce branch nesting depth
- Ensure queries use primary keys when possible
- Consider materializing long-lived branches
Next Steps
- Copy-on-Write - How branch isolation works
- Merge Algorithm - Combining branch changes
- Time-Travel Queries - More on historical queries