Branch-Aware Queries

Query data from specific branches, compare between branches, and understand how branch context affects data access.

Overview

When you query data through Horizon Epoch:

  • Each branch shows its own version of the data
  • Changes on a branch don’t affect other branches
  • You can query multiple branches simultaneously

Setting Branch Context

CLI

# Set current branch
epoch checkout feature/new-schema

# All subsequent operations use this branch
epoch query "SELECT * FROM users"

Python SDK

import asyncio
from horizon_epoch import Client

async def query_branch():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Checkout the branch
        await client.checkout("feature/new-schema")

        # Get status to see current branch
        status = await client.status()
        print(f"On branch: {status.branch}")

        # Get data diff between branches
        diff = await client.diff(
            base="main",
            target="feature/new-schema"
        )
        for table_diff in diff.table_diffs:
            print(f"{table_diff.table_name}: {table_diff.total_changes}")

asyncio.run(query_branch())

How Branch Queries Work

Copy-on-Write Overlay

When you query a branch:

  1. Base data comes from the parent branch
  2. Branch-specific changes overlay the base
  3. You see the combined result
main:     users = [Alice, Bob, Charlie]
                            │
feature:  overlay = [Charlie → Charles]
                            │
query:    result = [Alice, Bob, Charles]
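
The overlay diagram above can be modelled in a few lines of plain Python. This is only a conceptual sketch, not Horizon Epoch's implementation: the `resolve_branch` helper and the dict-based row storage are assumptions made for illustration.

```python
def resolve_branch(base: dict[int, str], overlay: dict[int, str]) -> dict[int, str]:
    """Return the rows a branch query would see: base rows with overlay rows on top."""
    merged = dict(base)     # start from a copy of the parent branch's rows
    merged.update(overlay)  # branch-specific rows replace base rows with the same key
    return merged


# Rows keyed by a hypothetical primary key
main_users = {1: "Alice", 2: "Bob", 3: "Charlie"}
feature_overlay = {3: "Charles"}

result = resolve_branch(main_users, feature_overlay)
print(list(result.values()))  # ['Alice', 'Bob', 'Charles']
```

Because the base is copied rather than modified, `main_users` still holds `Charlie` afterwards, which mirrors the isolation between branches.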

Query Flow

-- On branch 'feature/update-names'
SELECT * FROM users WHERE id = 3;

-- Horizon Epoch executes:
-- 1. Check branch overlay for id=3
-- 2. If found, return overlay version
-- 3. If not found, return base version
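
The three steps above amount to a keyed lookup with a fallback. The sketch below assumes rows are stored in dicts keyed by primary key; `lookup_by_id` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional

Row = dict[str, object]


def lookup_by_id(row_id: int, overlay: dict[int, Row], base: dict[int, Row]) -> Optional[Row]:
    # 1. Check the branch overlay for the id
    if row_id in overlay:
        # 2. Found: return the overlay version
        return overlay[row_id]
    # 3. Not found: return the base version (None if the row does not exist at all)
    return base.get(row_id)


base_users = {3: {"id": 3, "name": "Charlie"}, 4: {"id": 4, "name": "Dana"}}
branch_overlay = {3: {"id": 3, "name": "Charles"}}

print(lookup_by_id(3, branch_overlay, base_users))  # overlay version
print(lookup_by_id(4, branch_overlay, base_users))  # base version
```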

Comparing Branches

Diff Query

Compare data between two branches:

# Get records that differ between branches
diff = client.diff_query(
    table="users",
    source="main",
    target="feature/updates"
)

for change in diff:
    print(f"{change.type}: {change.record}")
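
To make the shape of a record-level diff concrete, here is a minimal sketch that compares two keyed snapshots in plain Python. The change labels (`added`, `removed`, `modified`) and the `diff_rows` helper are assumptions for illustration; the values the SDK actually reports in `change.type` may differ.

```python
def diff_rows(source: dict[int, dict], target: dict[int, dict]) -> list[tuple[str, dict]]:
    """Compare two snapshots keyed by primary key and list what changed in target."""
    changes = []
    for key in sorted(source.keys() | target.keys()):
        if key not in source:
            changes.append(("added", target[key]))      # only in target
        elif key not in target:
            changes.append(("removed", source[key]))    # only in source
        elif source[key] != target[key]:
            changes.append(("modified", target[key]))   # in both, but different
    return changes


main_rows = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Charlie"}}
feature_rows = {1: {"name": "Alice"}, 3: {"name": "Charles"}, 4: {"name": "Dana"}}

for change_type, record in diff_rows(main_rows, feature_rows):
    print(f"{change_type}: {record}")
```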

Side-by-Side Comparison

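import pandas as pd

# Query both branches
main_data = client.query("SELECT * FROM users", branch="main")
feature_data = client.query("SELECT * FROM users", branch="feature/updates")

# Compare programmatically.
# Note: DataFrame.compare requires both frames to have identical row and
# column labels, so it raises ValueError if the branches return a
# different number of rows.
main_df = pd.DataFrame(main_data)
feature_df = pd.DataFrame(feature_data)
comparison = main_df.compare(feature_df)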

CLI Comparison

# Show diff
epoch diff main feature/updates --table users

# Show records only in one branch
epoch diff main feature/updates --only-in-target

Querying Multiple Branches

Union Across Branches

# Get all unique records across branches
results = client.query_branches(
    query="SELECT * FROM users",
    branches=["main", "feature/a", "feature/b"],
    mode="union"
)
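
The source does not spell out how `mode="union"` decides that two records are the same. As one possible reading, the sketch below treats records as duplicates only when every column matches; `union_rows` is a hypothetical helper working on already-fetched rows.

```python
def union_rows(branch_rows: dict[str, list[dict]]) -> list[dict]:
    """Merge rows from several branches, keeping the first copy of each distinct row."""
    seen = set()
    unique = []
    for rows in branch_rows.values():
        for row in rows:
            # A hashable fingerprint of the whole row (assumes hashable column values)
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique.append(row)
    return unique


rows_by_branch = {
    "main": [{"id": 1, "name": "Alice"}, {"id": 3, "name": "Charlie"}],
    "feature/a": [{"id": 1, "name": "Alice"}, {"id": 3, "name": "Charles"}],
}
print(union_rows(rows_by_branch))
```

Under this reading, a row that was edited on one branch appears twice in the result (once per version), while an untouched row appears once.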

Compare Aggregate Results

# Run same query on multiple branches
for branch in ["main", "staging", "production"]:
    result = client.query(
        "SELECT COUNT(*) as user_count FROM users",
        branch=branch
    )
    print(f"{branch}: {result[0]['user_count']} users")

Time-Travel Queries

Query data at a specific point in time:

By Commit ID

# Query at specific commit
client.query(
    "SELECT * FROM users",
    at_commit="abc123f"
)

By Tag

# Query at tagged version
client.query(
    "SELECT * FROM users",
    at_tag="v1.0.0"
)

By Timestamp

from datetime import datetime, timedelta

# Query data as of yesterday
yesterday = datetime.now() - timedelta(days=1)
client.query(
    "SELECT * FROM users",
    at_time=yesterday
)
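
A timestamp query has to be mapped to a concrete commit at some point. One plausible way to do that, shown here purely as a sketch, is to pick the latest commit at or before the requested time. The commit IDs and the `commit_at` helper are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def commit_at(history: list[tuple[datetime, str]], when: datetime) -> Optional[str]:
    """Return the ID of the latest commit made at or before `when`.

    `history` must be sorted by timestamp, oldest first.
    """
    timestamps = [ts for ts, _ in history]
    index = bisect_right(timestamps, when)  # number of commits with ts <= when
    return history[index - 1][1] if index else None


history = [
    (datetime(2024, 1, 10, 9, 0), "aaa1111"),
    (datetime(2024, 1, 14, 16, 30), "bbb2222"),
    (datetime(2024, 1, 16, 8, 0), "ccc3333"),
]
print(commit_at(history, datetime(2024, 1, 15, 10, 0)))  # bbb2222
```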

CLI Time-Travel

# Query at commit
epoch query "SELECT * FROM users" --at abc123f

# Query at tag
epoch query "SELECT * FROM users" --at v1.0.0

# Query at time
epoch query "SELECT * FROM users" --at "2024-01-15 10:00:00"

Performance Considerations

Branch Depth

Queries on deeply nested branches may be slower:

main → dev → feature → sub-feature → experiment

Each level adds an overlay that must be checked before the base data is reached. Keep branch hierarchies shallow when possible.
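
To see why depth matters, the sketch below walks a chain of branches and counts how many levels are checked before a row is found. The `Branch` class and `lookup_with_depth` function are illustrative assumptions, not the engine's real data structures; the root branch simply stores its rows in `overlay`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    name: str
    parent: Optional["Branch"] = None
    overlay: dict = field(default_factory=dict)


def lookup_with_depth(branch: Branch, row_id: int) -> tuple[Optional[str], int]:
    """Return (row, levels_checked), walking from the branch up to the root."""
    levels = 0
    current: Optional[Branch] = branch
    while current is not None:
        levels += 1
        if row_id in current.overlay:
            return current.overlay[row_id], levels
        current = current.parent
    return None, levels


main = Branch("main", overlay={1: "Alice"})
dev = Branch("dev", parent=main)
feature = Branch("feature", parent=dev)
sub_feature = Branch("sub-feature", parent=feature)
experiment = Branch("experiment", parent=sub_feature, overlay={2: "Bob"})

print(lookup_with_depth(experiment, 2))  # ('Bob', 1): found in the branch's own overlay
print(lookup_with_depth(experiment, 1))  # ('Alice', 5): every level was checked
```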

Indexing

Branch overlays use primary keys for efficient lookup:

# Fast: uses primary key
client.query("SELECT * FROM users WHERE id = 123", branch="feature")

# Slower: scans overlay + base
client.query("SELECT * FROM users WHERE name = 'Alice'", branch="feature")
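
The difference between the two queries can be illustrated with plain dicts. A primary-key lookup can stop at the first hit, whereas a filter on another column has to combine overlay and base first, because the overlay may have changed the very column being filtered. The helpers below are hypothetical and only model the idea.

```python
def find_by_id(row_id: int, overlay: dict[int, dict], base: dict[int, dict]):
    """Keyed lookup: at most two dict probes, regardless of table size."""
    if row_id in overlay:
        return overlay[row_id]
    return base.get(row_id)


def find_by_name(name: str, overlay: dict[int, dict], base: dict[int, dict]) -> list[dict]:
    """Non-key filter: every row of the merged view has to be examined."""
    merged = {**base, **overlay}  # overlay rows replace base rows with the same key
    return [row for row in merged.values() if row["name"] == name]


base = {1: {"id": 1, "name": "Alice"}, 3: {"id": 3, "name": "Charlie"}}
overlay = {3: {"id": 3, "name": "Charles"}}

print(find_by_id(3, overlay, base))            # {'id': 3, 'name': 'Charles'}
print(find_by_name("Charlie", overlay, base))  # []: the base row is hidden by the overlay
```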

Caching

Query caching can be configured via the CLI:

epoch config set query_cache_enabled true
epoch config set query_cache_ttl 300
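
For intuition about what a TTL-based query cache does, here is a small standalone sketch. It assumes the TTL is measured in seconds and that entries are keyed by branch and query text; neither detail is confirmed by the settings above, and `QueryCache` is not part of the SDK.

```python
import time


class QueryCache:
    """Cache query results per (branch, sql) and expire them after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._entries = {}     # (branch, sql) -> (stored_at, rows)

    def get(self, branch: str, sql: str):
        key = (branch, sql)
        entry = self._entries.get(key)
        if entry is None:
            return None                    # miss: never cached
        stored_at, rows = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]         # miss: entry expired
            return None
        return rows

    def put(self, branch: str, sql: str, rows) -> None:
        self._entries[(branch, sql)] = (self._clock(), rows)


cache = QueryCache(ttl_seconds=300)
cache.put("main", "SELECT * FROM users", [{"id": 1}])
print(cache.get("main", "SELECT * FROM users"))     # [{'id': 1}]
print(cache.get("feature", "SELECT * FROM users"))  # None: other branches do not share entries
```

Keying on the branch as well as the query text keeps a cached result from one branch from being served to another.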

Branch Isolation Guarantees

Read Isolation

Each branch sees consistent data:

  • Changes on other branches are invisible
  • Commits create point-in-time snapshots
  • Queries always return consistent results

Write Isolation

Changes on a branch don’t affect others until merged:

# On feature branch
await client.checkout("feature")
# Make changes via your database connection...
await client.stage_all()
await client.commit(message="Deactivate user 1")

# Main branch is unchanged - checkout and diff to compare
await client.checkout("main")
# Main still has the original data

Common Patterns

A/B Testing

# Set up test branches
await client.branch("experiment/new-algo", start_point="main")

# Run algorithm on each branch
# (run_algorithm and measure_metrics are placeholders for your own code)
for branch in ["main", "experiment/new-algo"]:
    await client.checkout(branch)
    run_algorithm()
    results = measure_metrics()
    print(f"{branch}: {results}")

Data Validation

# Validate changes before merge
await client.checkout("feature/migration")

# Use diff to check for issues
diff = await client.diff(base="main", target="feature/migration")
for table_diff in diff.table_diffs:
    if table_diff.total_changes > 1000:
        raise ValidationError(f"Too many changes in {table_diff.table_name}")
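
The snippet above raises `ValidationError` without defining it. A self-contained version of the same check might look like the sketch below, where `ValidationError`, the `TableDiff` stand-in, and the `validate_diff` helper are all assumptions; only the `table_name` and `total_changes` attribute names come from the example above.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Raised when a branch fails a pre-merge check (hypothetical, user-defined)."""


@dataclass
class TableDiff:
    # Stand-in for the objects in diff.table_diffs
    table_name: str
    total_changes: int


def validate_diff(table_diffs: list[TableDiff], max_changes: int = 1000) -> None:
    """Raise ValidationError naming every table that exceeds max_changes."""
    offenders = [d for d in table_diffs if d.total_changes > max_changes]
    if offenders:
        details = ", ".join(f"{d.table_name} ({d.total_changes})" for d in offenders)
        raise ValidationError(f"Too many changes: {details}")


diffs = [TableDiff("users", 12), TableDiff("orders", 4500)]
try:
    validate_diff(diffs)
except ValidationError as exc:
    print(exc)  # Too many changes: orders (4500)
```

Collecting all offending tables before raising reports every problem in one pass instead of stopping at the first.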

Environment Comparison

# Compare data across environments
for env in ["development", "staging", "production"]:
    result = client.query(
        "SELECT COUNT(*) as count, MAX(updated_at) as latest FROM orders",
        branch=env
    )
    print(f"{env}: {result}")

Troubleshooting

Wrong Branch Data

Expected data from 'feature' but got 'main' data
  • Verify branch context is set correctly
  • Check if query specifies explicit branch
  • Verify commit was made on correct branch

Missing Changes

Can't see changes I just made
  • Changes must be committed to be visible in queries
  • Uncommitted changes are in working state only
  • Use epoch status to see uncommitted changes

Slow Queries

  • Reduce branch nesting depth
  • Ensure queries use primary keys when possible
  • Consider materializing long-lived branches
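
Materializing is not defined further here. One way to picture it, as a sketch only, is flattening a chain of overlays into a single table so later lookups no longer walk the chain. The `materialize` helper and the dict-based storage are assumptions for illustration.

```python
from collections import ChainMap


def materialize(overlays_child_first: list[dict]) -> dict:
    """Flatten a chain of overlays into one dict.

    The list is ordered from the branch itself up to the root, so the
    entry nearest the branch wins whenever a key appears at several levels.
    """
    return dict(ChainMap(*overlays_child_first))


main_rows = {1: "Alice", 2: "Bob", 3: "Charlie"}
dev_overlay = {2: "Robert"}
feature_overlay = {3: "Charles"}

flat = materialize([feature_overlay, dev_overlay, main_rows])
print(flat[2], flat[3])  # Robert Charles
```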

Next Steps