Common Issues

Solutions to frequently encountered problems with Horizon Epoch.

Connection Problems

Could not connect to metadata database

Symptoms:

Error: Database connection failed: connection refused

Causes:

  1. PostgreSQL not running
  2. Wrong connection string
  3. Firewall blocking port
  4. Authentication failure

Solutions:

  1. Check PostgreSQL is running:

    pg_isready -h localhost -p 5432
    
  2. Verify connection string format:

    postgresql://user:password@host:port/database
    
  3. Check firewall settings:

    # Check if port is accessible
    nc -zv localhost 5432
    
  4. Test connection directly:

    psql "postgresql://user:pass@localhost/horizon_epoch"
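
The connection-string and port checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `parse_pg_url` and `port_is_open` are illustrative and are not part of Horizon Epoch.

```python
import socket
from urllib.parse import urlsplit


def parse_pg_url(url):
    """Split a postgresql:// URL into its parts, defaulting host and port."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname or "localhost",
        "port": parts.port or 5432,  # PostgreSQL's default port
        "database": parts.path.lstrip("/") or None,
    }


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `parse_pg_url(...)` on the configured URL and passing the resulting host and port to `port_is_open` separates a malformed connection string from a network or firewall problem. Authentication failures still need the `psql` test above.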
    

S3 bucket not accessible

Symptoms:

Error: S3 error: Access Denied

Causes:

  1. Invalid credentials
  2. Bucket doesn’t exist
  3. Incorrect permissions
  4. Wrong region

Solutions:

  1. Verify AWS credentials:

    aws sts get-caller-identity
    
  2. Check bucket exists:

    aws s3 ls s3://bucket-name
    
  3. Review IAM permissions (need s3:GetObject, s3:PutObject, s3:ListBucket)

  4. Verify region matches bucket location:

    aws s3api get-bucket-location --bucket bucket-name
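
When reviewing IAM permissions, note that `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` and `s3:PutObject` apply to object ARNs (`bucket/*`). Attaching all three to only one of those resources is a frequent source of `Access Denied`. As a sketch, the following builds a policy document covering the three actions listed above; `bucket-name` is a placeholder and the helper name `s3_policy` is illustrative.

```python
import json


def s3_policy(bucket):
    """Build a minimal IAM policy granting the three actions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing is a bucket-level action
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Reads and writes are object-level actions
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(s3_policy("bucket-name"), indent=2))
```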
    

SSH tunnel connection failed

Symptoms:

Error: SSH tunnel failed: Connection refused

Solutions:

  1. Test SSH connection manually:

    ssh -i ~/.ssh/key user@bastion
    
  2. Check SSH key permissions (the private key must be readable only by its owner):

    chmod 600 ~/.ssh/key
    
  3. Verify bastion host is accessible

  4. Check known_hosts:

    ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
    

Merge Issues

Merge conflict in table X

Symptoms:

CONFLICT (content): Merge conflict in users
Automatic merge failed; fix conflicts and then commit.

Understanding the conflict:

  • Same record modified in both branches
  • Same field changed to different values

Resolution:

  1. View conflicts:

    epoch conflicts show
    
  2. Resolve interactively:

    epoch conflicts resolve --interactive
    
  3. Or accept one side:

    epoch conflicts resolve --ours    # Keep target branch
    epoch conflicts resolve --theirs  # Keep source branch
    
  4. Complete merge:

    epoch merge --continue
    
  5. Or abort:

    epoch merge --abort
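
To illustrate when a conflict arises, here is a minimal field-level three-way merge sketch. It is a conceptual illustration only, not Horizon Epoch's merge implementation; the record fields are hypothetical, "ours" stands for the target branch and "theirs" for the source branch, and a missing field is treated as `None`.

```python
def three_way_merge(base, ours, theirs):
    """Merge two versions of one record against their common ancestor.

    Returns (merged, conflicts), where conflicts maps each conflicting
    field to its (ours, theirs) values.
    """
    merged, conflicts = {}, {}
    for field in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:        # both sides agree (or neither changed the field)
            merged[field] = o
        elif o == b:      # only theirs changed the field
            merged[field] = t
        elif t == b:      # only ours changed the field
            merged[field] = o
        else:             # both changed the field, to different values
            conflicts[field] = (o, t)
    return merged, conflicts


base = {"id": 1, "email": "a@example.com", "plan": "free"}
ours = {"id": 1, "email": "a@example.com", "plan": "pro"}
theirs = {"id": 1, "email": "b@example.com", "plan": "team"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'email': 'b@example.com', 'id': 1}
print(conflicts)  # {'plan': ('pro', 'team')}
```

Only `plan` conflicts here: `email` changed on one side only, so it merges cleanly.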
    

Merge base not found

Symptoms:

Error: Could not find common ancestor for branches

Causes:

  • Branches have no common history
  • Orphaned branch
  • Corrupted commit graph

Solutions:

  1. Check branch history:

    epoch log main --oneline
    epoch log feature/branch --oneline
    
  2. Find common commits:

    epoch log --all --graph
    
  3. If no common ancestor, use --allow-unrelated:

    epoch merge feature/branch --allow-unrelated
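
Conceptually, a merge base is a commit reachable from both branch heads. The sketch below shows that idea on a small hypothetical commit graph (a mapping from commit to parent commits); it does not reflect how Horizon Epoch stores or traverses its history.

```python
from collections import deque


def ancestors(parents, commit):
    """Return the set of commits reachable from `commit`, including itself."""
    seen, queue = set(), deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def common_ancestors(parents, a, b):
    """Commits reachable from both `a` and `b`; empty if the histories are unrelated."""
    return ancestors(parents, a) & ancestors(parents, b)


# commit -> list of parent commits (hypothetical IDs)
parents = {
    "c3": ["c2"], "c2": ["c1"], "c1": [],   # main
    "f2": ["f1"], "f1": ["c1"],             # feature branched from c1
    "o1": [],                               # orphaned root
}
print(common_ancestors(parents, "c3", "f2"))  # {'c1'}
print(common_ancestors(parents, "c3", "o1"))  # set()
```

An empty result corresponds to the "no common history" and "orphaned branch" causes listed above.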
    

Branch Issues

Branch not found

Symptoms:

Error: Branch 'feature/xyz' not found

Solutions:

  1. List available branches:

    epoch branch list
    
  2. Check for typos in branch name

  3. Verify branch exists in repository:

    epoch branch list --all
    

Cannot delete branch

Symptoms:

Error: Cannot delete branch 'feature/xyz': not fully merged

Solutions:

  1. Merge the branch first:

    epoch merge feature/xyz
    epoch branch delete feature/xyz
    
  2. Force delete (loses unmerged changes):

    epoch branch delete feature/xyz --force
    

Table Issues

Table not found

Symptoms:

Error: Table 'users' not found

Solutions:

  1. List registered tables:

    epoch table list
    
  2. Register the table:

    epoch table add users --location "postgresql://mydb/public.users"
    
  3. Check table exists in storage backend


Schema mismatch

Symptoms:

Error: Schema mismatch for table 'users': expected 5 columns, found 6

Causes:

  • Table schema changed outside Horizon Epoch
  • Uncommitted schema changes

Solutions:

  1. View current schema:

    epoch table show users --schema
    
  2. Refresh schema:

    epoch table refresh users
    
  3. Commit schema changes:

    epoch commit -m "Update schema for users table"
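
Before refreshing, it can help to see exactly which columns differ. The sketch below compares two column lists; the column names are hypothetical, and it assumes you have already obtained the registered columns (for example from the schema output above) and the live columns from the storage backend.

```python
def diff_columns(expected, found):
    """Return (added, removed) column names, preserving the input order."""
    added = [c for c in found if c not in expected]
    removed = [c for c in expected if c not in found]
    return added, removed


expected = ["id", "name", "email", "created_at", "updated_at"]
found = ["id", "name", "email", "created_at", "updated_at", "last_login"]
print(diff_columns(expected, found))  # (['last_login'], [])
```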
    

Performance Issues

Slow queries on branches

Symptoms:

  • Queries on branches significantly slower than main
  • Query times increase with branch age

Causes:

  • Large overlay size
  • Deep branch hierarchy
  • Missing indexes

Solutions:

  1. Check overlay size:

    epoch branch info feature/branch --stats
    
  2. Materialize long-lived branches:

    epoch branch materialize feature/branch
    
  3. Merge to reduce hierarchy depth

  4. Add indexes to overlay tables


Slow commit operations

Symptoms:

  • Commits take longer than expected
  • Timeout during commit

Causes:

  • Large number of changes
  • Network latency
  • Lock contention

Solutions:

  1. Commit in smaller batches:

    epoch commit --tables users -m "Part 1"
    epoch commit --tables orders -m "Part 2"
    
  2. Check for locks:

    SELECT * FROM pg_locks WHERE relation = 'epoch_commits'::regclass;
    
  3. Increase timeout:

    epoch commit -m "Large update" --timeout 600
    

Authentication Issues

Vault authentication failed

Symptoms:

Error: Vault authentication failed: permission denied

Solutions:

  1. Check Vault connectivity:

    vault status
    
  2. Verify token/credentials:

    vault token lookup
    
  3. Check policy permissions:

    vault policy read horizon-epoch
    
  4. For AppRole, verify role_id and secret_id are correct


AWS credentials expired

Symptoms:

Error: ExpiredTokenException: The security token included in the request is expired

Solutions:

  1. For IAM roles, credentials refresh automatically

  2. For SSO:

    aws sso login
    
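  3. For access keys, rotate them (long-term access keys do not expire by themselves, so this error normally points to temporary session credentials that must be reissued):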

    aws iam create-access-key
    

Data Integrity Issues

Corrupted commit

Symptoms:

Error: Commit 'abc123' is corrupted or missing

Solutions:

  1. Check commit exists:

    epoch show abc123
    
  2. Verify metadata database integrity:

    SELECT * FROM epoch_commits WHERE id = 'abc123';
    
  3. If commit is referenced but missing, contact support


Orphaned records

Symptoms:

  • Records in overlay not connected to commits
  • Disk usage growing unexpectedly

Solutions:

  1. Preview what would be removed, including orphaned overlays:

    epoch gc --dry-run
    
  2. Run garbage collection:

    epoch gc
    
  3. Force cleanup if needed:

    epoch gc --force
    

Certificate/TLS Issues

Certificate expired

Symptoms:

Error: SSL error: certificate has expired

Causes:

  1. Client certificate expired
  2. Server certificate expired
  3. CA certificate expired

Solutions:

  1. Check certificate expiry:

    # Client certificate
    openssl x509 -enddate -noout -in /path/to/client.crt
    
    # Server certificate (remote)
    openssl s_client -connect db.example.com:5432 -starttls postgres </dev/null 2>/dev/null | \
      openssl x509 -noout -enddate
    
  2. Renew the certificate:

    # If using Vault PKI
    vault write pki/issue/my-role common_name="client"
    
    # Manual renewal - contact your CA
    
  3. If using Vault dynamic certificates, check renewal is working:

    epoch doctor --check vault
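
For monitoring, the `notAfter=` line printed by `openssl x509 -enddate` can be turned into a number of days remaining. This is a minimal sketch using the standard library's `ssl.cert_time_to_seconds`; the helper name `days_until_expiry` is illustrative.

```python
import ssl
import time


def days_until_expiry(enddate_line, now=None):
    """Days remaining for a line such as 'notAfter=Jan  5 09:34:43 2018 GMT'."""
    cert_time = enddate_line.strip().split("=", 1)[1]
    expires_at = ssl.cert_time_to_seconds(cert_time)  # seconds since the epoch
    now = time.time() if now is None else now
    return (expires_at - now) / 86400
```

A negative result means the certificate has already expired.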
    

Certificate verification failed

Symptoms:

Error: SSL error: certificate verify failed
Error: CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate

Causes:

  1. Wrong CA certificate
  2. Incomplete certificate chain
  3. Self-signed certificate not trusted
  4. Server hostname mismatch

Solutions:

  1. Verify CA certificate is correct:

    # Check certificate chain
    openssl verify -CAfile /path/to/ca.crt /path/to/client.crt
    
  2. Check certificate chain is complete:

    # View full chain
    openssl crl2pkcs7 -nocrl -certfile /path/to/cert.pem | \
      openssl pkcs7 -print_certs -noout
    
  3. For hostname mismatch, check server certificate SANs:

    openssl x509 -noout -text -in server.crt | grep -A1 "Subject Alternative Name"
    
  4. Verify you’re connecting to the correct hostname matching the certificate


Permission denied reading certificate files

Symptoms:

Error: could not load private key file: Permission denied
Error: could not load certificate file: No such file or directory

Solutions:

  1. Check file permissions:

    ls -la /path/to/client.key
    # Should show -rw------- (600) or -r-------- (400)
    
    chmod 600 /path/to/client.key
    chmod 644 /path/to/client.crt
    
  2. Verify file exists and path is correct:

    ls -la /path/to/client.crt /path/to/client.key /path/to/ca.crt
    
  3. Check process user has access:

    # Run as same user
    sudo -u epoch_user cat /path/to/client.crt
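
The permission check in step 1 can be automated, for example in a deployment script. This sketch uses only the standard library; the helper name `key_permissions_ok` is illustrative.

```python
import os
import stat


def key_permissions_ok(path):
    """True if the file grants no access to group or others (e.g. 600 or 400)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `key_permissions_ok("/path/to/client.key")` returns `False` for a key left at mode 644.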
    

Key doesn’t match certificate

Symptoms:

Error: key values mismatch
Error: SSL error: private key does not match certificate

Solutions:

  1. Verify key matches certificate:

    # Compare modulus hashes (RSA keys only) - they should match
    openssl x509 -noout -modulus -in client.crt | openssl md5
    openssl rsa -noout -modulus -in client.key | openssl md5
    
  2. If mismatched, regenerate key/cert pair or locate correct files


RDS IAM Authentication Issues

Failed to generate IAM auth token

Symptoms:

Error: Failed to generate RDS IAM authentication token
Error: The security token included in the request is invalid

Causes:

  1. IAM role doesn’t have rds-db:connect permission
  2. Wrong AWS region
  3. Invalid instance endpoint

Solutions:

  1. Verify IAM policy allows rds-db:connect:

    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": "arn:aws:rds-db:REGION:ACCOUNT:dbuser:DBI_RESOURCE_ID/DB_USER"
    }
    
  2. Check AWS region matches RDS instance:

    aws rds describe-db-instances --db-instance-identifier mydb \
      --query 'DBInstances[0].DBInstanceArn'
    
  3. Verify you’re using the correct endpoint:

    # epoch.toml - use the actual endpoint, not a custom DNS
    [storage.postgres.mydb]
    host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
    use_iam_auth = true
    

RDS IAM token expired immediately

Symptoms:

Error: PAM authentication failed
Error: password authentication failed for user "iam_user"

Causes:

  1. System clock skew
  2. Token generated too far in advance
  3. Wrong database user

Solutions:

  1. Check system time is accurate:

    date -u
    # Compare with actual UTC time
    
    # Sync time if needed (on systems without ntpdate, use chrony or systemd-timesyncd instead)
    sudo ntpdate pool.ntp.org
    
  2. Verify database user is configured for IAM:

    -- In RDS
    CREATE USER iam_user WITH LOGIN;
    GRANT rds_iam TO iam_user;
    
  3. Check the token is being generated correctly:

    # Generate token manually for testing
    aws rds generate-db-auth-token \
      --hostname mydb.cluster-xxx.us-east-1.rds.amazonaws.com \
      --port 5432 \
      --username iam_user
    

Cannot assume role for RDS IAM

Symptoms:

Error: User is not authorized to perform: sts:AssumeRole
Error: Access denied when assuming role for RDS authentication

Solutions:

  1. Check assume role trust policy:

    aws iam get-role --role-name MyRdsRole \
      --query 'Role.AssumeRolePolicyDocument'
    
  2. Verify trust policy allows your principal:

    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/YourExecutionRole"
      },
      "Action": "sts:AssumeRole"
    }
    
  3. Check for external_id requirement:

    [storage.postgres.mydb]
    assume_role_arn = "arn:aws:iam::123456789012:role/RdsRole"
    external_id = "required-external-id"  # If configured
    

Credential Refresh Issues

Credential refresh failed

Symptoms:

Error: Credential refresh failed: provider returned error
Error: Failed to refresh credentials: connection refused

Causes:

  1. Credential provider unavailable
  2. Network connectivity issues
  3. Token/secret expired

Solutions:

  1. Check provider connectivity:

    # For Vault
    vault status
    
    # For AWS
    aws sts get-caller-identity
    
  2. Verify credentials haven’t expired beyond refresh:

    # Check Vault token
    vault token lookup
    
    # Check current credentials are valid
    epoch doctor --check credentials
    
  3. Force credential refresh:

    # Clear cached credentials
    epoch config cache clear
    
    # Re-authenticate
    epoch auth login
    

Credentials expired and refresh not supported

Symptoms:

Error: Credentials expired
Error: Refresh not supported for static credentials

Causes:

  1. Using static credentials without refresh capability
  2. Credential cache disabled
  3. Refresh interval too long

Solutions:

  1. Use a provider that supports refresh:

    # Instead of static password
    [storage.postgres.mydb]
    vault_path = "secret/data/mydb"  # Dynamic refresh
    
    # Or environment variables (refresh on re-read)
    [storage.postgres.mydb]
    url = "${DB_URL}"
    
  2. Enable credential caching with refresh:

    [credentials]
    cache_enabled = true
    cache_ttl = 300
    refresh_before_expiry = 60  # Refresh 60s before expiry
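
To make the timing concrete, the sketch below shows one plausible reading of `refresh_before_expiry`: a refresh is due once the current time is within that many seconds of the credential's expiry. This interpretation is an assumption based on the config comment, not a description of Horizon Epoch's internals.

```python
def needs_refresh(expires_at, now, refresh_before_expiry=60):
    """True once `now` is within `refresh_before_expiry` seconds of expiry."""
    return now >= expires_at - refresh_before_expiry


issued_at = 1_000             # arbitrary timestamps, in seconds
expires_at = issued_at + 300  # credential valid for 300 s
print(needs_refresh(expires_at, now=issued_at + 200))  # False
print(needs_refresh(expires_at, now=issued_at + 240))  # True
```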
    

Token lease expired (Vault)

Symptoms:

Error: Vault token expired
Error: permission denied (token expired)

Solutions:

  1. Check token status:

    vault token lookup
    
  2. Renew token if renewable:

    vault token renew
    
  3. For non-renewable tokens, re-authenticate:

    # AppRole
    vault write auth/approle/login \
      role_id=$ROLE_ID \
      secret_id=$SECRET_ID
    
  4. Configure automatic renewal:

    [vault]
    auto_renew_token = true
    renew_threshold = 0.7  # Renew when 70% of TTL elapsed
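
As a worked example of the threshold, assuming `renew_threshold` is the fraction of the token's TTL that must elapse before renewal (as the config comment states), the decision can be sketched as follows. The function name is illustrative.

```python
def should_renew(issued_at, ttl, now, renew_threshold=0.7):
    """True once at least `renew_threshold` of the token's TTL has elapsed."""
    return (now - issued_at) >= renew_threshold * ttl


# A token with a 1-hour TTL reaches the 70% mark after 2520 seconds.
print(should_renew(issued_at=0, ttl=3600, now=2500))  # False
print(should_renew(issued_at=0, ttl=3600, now=2520))  # True
```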
    

Multi-Backend Routing Issues

Storage backend not found

Symptoms:

Error: Storage backend 'mydb' not found
Error: Unknown storage location: postgresql://mydb/...

Causes:

  1. Backend not configured
  2. Typo in backend name
  3. Configuration not loaded

Solutions:

  1. List configured backends:

    epoch config show storage
    
  2. Check configuration file:

    # epoch.toml
    [storage.postgres.mydb]  # Backend name is 'mydb'
    url = "postgresql://..."
    
  3. Verify configuration is loaded:

    epoch doctor --check config
    

Wrong backend selected for table

Symptoms:

Error: Table 'users' not found in storage 'datalake'
Error: Cannot access PostgreSQL table through S3 backend

Causes:

  1. Table registered with wrong backend
  2. Backend mismatch in storage location

Solutions:

  1. Check table registration:

    epoch table show users
    
  2. Re-register with correct backend:

    epoch table remove users
    epoch table add users \
      --location "postgresql://correct_backend/public.users"
    
  3. Verify storage location format:

    PostgreSQL: postgresql://backend_name/schema.table
    S3:         s3://backend_name/path/to/table
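
A quick way to validate a location string against the two formats above is to split it into storage type, backend name, and target. This is a sketch with the standard library; `parse_location` is an illustrative helper, not an Horizon Epoch API.

```python
from urllib.parse import urlsplit


def parse_location(location):
    """Split a storage location into (storage_type, backend_name, target)."""
    parts = urlsplit(location)
    if parts.scheme not in ("postgresql", "s3"):
        raise ValueError(f"unsupported storage type: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("missing backend name")
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


print(parse_location("postgresql://mydb/public.users"))
# ('postgresql', 'mydb', 'public.users')
print(parse_location("s3://datalake/path/to/table"))
# ('s3', 'datalake', 'path/to/table')
```

The backend name returned here should match one of the names shown by `epoch config show storage`.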
    

Cannot connect to multiple backends simultaneously

Symptoms:

Error: Connection pool exhausted
Error: Too many connections

Causes:

  1. Pool size too small for multi-backend operations
  2. Connection leak
  3. Long-running transactions

Solutions:

  1. Increase pool sizes:

    [storage.postgres.backend1]
    pool_size = 20
    
    [storage.postgres.backend2]
    pool_size = 20
    
  2. Check for connection leaks:

    -- PostgreSQL
    SELECT * FROM pg_stat_activity
    WHERE application_name LIKE '%epoch%';
    
  3. Set idle connection timeouts:

    [storage.postgres.mydb]
    idle_timeout = 300  # Close idle connections after 5 min
    

Cross-backend operations failing

Symptoms:

Error: Cannot merge tables from different storage backends
Error: Cross-storage operation not supported

Causes:

  1. Tables on different backends can’t be joined directly
  2. Merge requires compatible storage types

Solutions:

  1. Understand cross-backend limitations:

    • Merges work within same storage type
    • Queries are executed per-backend
    • Results merged in memory
  2. For cross-backend data access, use Python SDK:

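    # `client` is an already-initialized Horizon Epoch SDK client
    # Read from both backends (results are assumed to be pandas DataFrames)
    users = client.query_table("users", backend="postgres1")
    orders = client.query_table("orders", backend="postgres2")
    
    # Join in Python/Pandas
    merged = users.merge(orders, on="user_id")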
    
  3. Consider consolidating frequently-joined tables on same backend


Backend credentials conflict

Symptoms:

Error: Credential 'default' conflicts with existing credential
Warning: Multiple credentials found for backend 'mydb'

Solutions:

  1. Use unique credential names per backend:

    [storage.postgres.backend1]
    vault_path = "secret/data/db1"
    
    [storage.postgres.backend2]
    vault_path = "secret/data/db2"  # Different path
    
  2. Clear credential cache if switching configurations:

    epoch config cache clear
    

Getting Help

If these solutions don’t resolve your issue:

  1. Re-run the failing command with verbose output:

    epoch --verbose <command>
    
  2. Enable debug logging:

    RUST_LOG=debug epoch <command>
    
  3. Report issues:

    • Contact Horizon Analytic Studios for support
    • Include: version (epoch --version), error message, steps to reproduce