Common Issues
Solutions to frequently encountered problems with Horizon Epoch.
Connection Problems
Could not connect to metadata database
Symptoms:
Error: Database connection failed: connection refused
Causes:
- PostgreSQL not running
- Wrong connection string
- Firewall blocking port
- Authentication failure
Solutions:
- Check PostgreSQL is running:
    pg_isready -h localhost -p 5432
- Verify connection string format:
    postgresql://user:password@host:port/database
- Check firewall settings:
    # Check if port is accessible
    nc -zv localhost 5432
- Test connection directly:
    psql "postgresql://user:pass@localhost/horizon_epoch"
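The format and port checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `parse_pg_url` and `port_is_open` are hypothetical and are not part of Horizon Epoch.

```python
import socket
from urllib.parse import urlsplit


def parse_pg_url(url: str) -> dict:
    """Split a postgresql:// URL into its parts; the port defaults to 5432."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
    }


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds (roughly what `nc -zv` checks)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage with the example URL from this section.
info = parse_pg_url("postgresql://user:pass@localhost/horizon_epoch")
print(info["host"], info["port"], info["database"])
```

A missing host or database in the parsed result usually points to a malformed connection string rather than a network problem.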
S3 bucket not accessible
Symptoms:
Error: S3 error: Access Denied
Causes:
- Invalid credentials
- Bucket doesn’t exist
- Incorrect permissions
- Wrong region
Solutions:
- Verify AWS credentials:
    aws sts get-caller-identity
- Check bucket exists:
    aws s3 ls s3://bucket-name
- Review IAM permissions (need s3:GetObject, s3:PutObject, s3:ListBucket)
- Verify region matches bucket location:
    aws s3api get-bucket-location --bucket bucket-name
SSH tunnel connection failed
Symptoms:
Error: SSH tunnel failed: Connection refused
Solutions:
- Test SSH connection manually:
    ssh -i ~/.ssh/key user@bastion
- Check SSH key permissions:
    chmod 600 ~/.ssh/id_rsa
- Verify bastion host is accessible
- Check known_hosts:
    ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
Merge Issues
Merge conflict in table X
Symptoms:
CONFLICT (content): Merge conflict in users
Automatic merge failed; fix conflicts and then commit.
Understanding the conflict:
- Same record modified in both branches
- Same field changed to different values
Resolution:
- View conflicts:
    epoch conflicts show
- Resolve interactively:
    epoch conflicts resolve --interactive
- Or accept one side:
    epoch conflicts resolve --ours    # Keep target branch
    epoch conflicts resolve --theirs  # Keep source branch
- Complete merge:
    epoch merge --continue
- Or abort:
    epoch merge --abort
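To make the conflict condition concrete, here is a generic three-way comparison of a single record. It is an illustrative sketch only: the source does not describe how Horizon Epoch detects conflicts internally, and `find_conflicts` is a hypothetical helper.

```python
def find_conflicts(base: dict, ours: dict, theirs: dict) -> dict:
    """Return {field: (ours_value, theirs_value)} for conflicting fields.

    A field conflicts when both sides changed it relative to the common
    ancestor (base) and ended up with different values.
    """
    conflicts = {}
    for field in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o != b and t != b and o != t:
            conflicts[field] = (o, t)
    return conflicts


# The same record as seen in the common ancestor and in each branch.
base = {"id": 1, "email": "a@example.com", "plan": "free"}
ours = {"id": 1, "email": "a@example.com", "plan": "pro"}
theirs = {"id": 1, "email": "b@example.com", "plan": "team"}

print(find_conflicts(base, ours, theirs))  # {'plan': ('pro', 'team')}
```

Only `plan` is reported: `email` changed on one side only, so it can be merged without a decision.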
Merge base not found
Symptoms:
Error: Could not find common ancestor for branches
Causes:
- Branches have no common history
- Orphaned branch
- Corrupted commit graph
Solutions:
- Check branch history:
    epoch log main --oneline
    epoch log feature/branch --oneline
- Find common commits:
    epoch log --all --graph
- If no common ancestor, use --allow-unrelated:
    epoch merge feature/branch --allow-unrelated
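The idea of a common ancestor can be sketched as a walk over a commit graph. The `parents` mapping and both function names below are hypothetical and only illustrate why an orphaned branch has no merge base; they are not Horizon Epoch's implementation.

```python
from collections import deque


def ancestors(parents: dict, start: str) -> set:
    """All commits reachable from start via parent links (start included)."""
    seen, queue = {start}, deque([start])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_base(parents: dict, a: str, b: str):
    """Nearest commit, walking back from b, that is also an ancestor of a."""
    reachable_from_a = ancestors(parents, a)
    seen, queue = {b}, deque([b])
    while queue:
        commit = queue.popleft()
        if commit in reachable_from_a:
            return commit
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# c1 <- c2 <- c3 (main), c2 <- f1 (feature), x1 is an orphaned root.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "f1": ["c2"], "x1": []}
print(merge_base(parents, "c3", "f1"))  # c2
print(merge_base(parents, "c3", "x1"))  # None
```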
Branch Issues
Branch not found
Symptoms:
Error: Branch 'feature/xyz' not found
Solutions:
- List available branches:
    epoch branch list
- Check for typos in branch name
- Verify branch exists in repository:
    epoch branch list --all
Cannot delete branch
Symptoms:
Error: Cannot delete branch 'feature/xyz': not fully merged
Solutions:
- Merge the branch first:
    epoch merge feature/xyz
    epoch branch delete feature/xyz
- Force delete (loses unmerged changes):
    epoch branch delete feature/xyz --force
Table Issues
Table not found
Symptoms:
Error: Table 'users' not found
Solutions:
- List registered tables:
    epoch table list
- Register the table:
    epoch table add users --location "postgresql://mydb/public.users"
- Check table exists in storage backend
Schema mismatch
Symptoms:
Error: Schema mismatch for table 'users': expected 5 columns, found 6
Causes:
- Table schema changed outside Horizon Epoch
- Uncommitted schema changes
Solutions:
- View current schema:
    epoch table show users --schema
- Refresh schema:
    epoch table refresh users
- Commit schema changes:
    epoch commit -m "Update schema for users table"
Performance Issues
Slow queries on branches
Symptoms:
- Queries on branches significantly slower than main
- Query times increase with branch age
Causes:
- Large overlay size
- Deep branch hierarchy
- Missing indexes
Solutions:
- Check overlay size:
    epoch branch info feature/branch --stats
- Materialize long-lived branches:
    epoch branch materialize feature/branch
- Merge to reduce hierarchy depth
- Add indexes to overlay tables
Slow commit operations
Symptoms:
- Commits take longer than expected
- Timeout during commit
Causes:
- Large number of changes
- Network latency
- Lock contention
Solutions:
- Commit in smaller batches:
    epoch commit --tables users -m "Part 1"
    epoch commit --tables orders -m "Part 2"
- Check for locks:
    SELECT * FROM pg_locks WHERE relation = 'epoch_commits'::regclass;
- Increase timeout:
    epoch commit -m "Large update" --timeout 600
Authentication Issues
Vault authentication failed
Symptoms:
Error: Vault authentication failed: permission denied
Solutions:
- Check Vault connectivity:
    vault status
- Verify token/credentials:
    vault token lookup
- Check policy permissions:
    vault policy read horizon-epoch
- For AppRole, verify role_id and secret_id are correct
AWS credentials expired
Symptoms:
Error: ExpiredTokenException: The security token included in the request is expired
Solutions:
- For IAM roles, credentials refresh automatically
- For SSO:
    aws sso login
- For access keys, rotate them:
    aws iam create-access-key
Data Integrity Issues
Corrupted commit
Symptoms:
Error: Commit 'abc123' is corrupted or missing
Solutions:
- Check commit exists:
    epoch show abc123
- Verify metadata database integrity:
    SELECT * FROM epoch_commits WHERE id = 'abc123';
- If commit is referenced but missing, contact support
Orphaned records
Symptoms:
- Records in overlay not connected to commits
- Disk usage growing unexpectedly
Solutions:
- Run garbage collection:
    epoch gc
- Check for orphaned overlays:
    epoch gc --dry-run
- Manually clean up if needed:
    epoch gc --force
Certificate/TLS Issues
Certificate expired
Symptoms:
Error: SSL error: certificate has expired
Causes:
- Client certificate expired
- Server certificate expired
- CA certificate expired
Solutions:
- Check certificate expiry:
    # Client certificate
    openssl x509 -enddate -noout -in /path/to/client.crt
    # Server certificate (remote)
    openssl s_client -connect db.example.com:5432 -starttls postgres 2>/dev/null | \
      openssl x509 -noout -enddate
- Renew the certificate:
    # If using Vault PKI
    vault write pki/issue/my-role common_name="client"
    # Manual renewal - contact your CA
- If using Vault dynamic certificates, check renewal is working:
    epoch doctor --check vault
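For monitoring, the `notAfter=...` line printed by `openssl x509 -enddate` can be turned into a remaining lifetime. This is a small sketch using the standard library's `ssl.cert_time_to_seconds`; the function name `seconds_until_expiry` is hypothetical.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Parse a line such as 'notAfter=Jan  5 09:34:43 2018 GMT'.

    Returns the seconds remaining until expiry (negative if already expired).
    `now` is a Unix timestamp and defaults to the current time.
    """
    _, _, value = enddate_line.strip().partition("=")
    expires_at = ssl.cert_time_to_seconds(value)
    return expires_at - (time.time() if now is None else now)


# Hypothetical usage: warn when fewer than 30 days remain.
remaining = seconds_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT")
if remaining < 30 * 86400:
    print("certificate expires soon or has already expired")
```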
Certificate verification failed
Symptoms:
Error: SSL error: certificate verify failed
Error: CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate
Causes:
- Wrong CA certificate
- Incomplete certificate chain
- Self-signed certificate not trusted
- Server hostname mismatch
Solutions:
- Verify CA certificate is correct:
    # Check certificate chain
    openssl verify -CAfile /path/to/ca.crt /path/to/client.crt
- Check certificate chain is complete:
    # View full chain
    openssl crl2pkcs7 -nocrl -certfile /path/to/cert.pem | \
      openssl pkcs7 -print_certs -noout
- For hostname mismatch, check server certificate SANs:
    openssl x509 -noout -text -in server.crt | grep -A1 "Subject Alternative Name"
- Verify you’re connecting to the correct hostname matching the certificate
Permission denied reading certificate files
Symptoms:
Error: could not load private key file: Permission denied
Error: could not load certificate file: No such file or directory
Solutions:
- Check file permissions:
    ls -la /path/to/client.key
    # Should show -rw------- (600) or -r-------- (400)
    chmod 600 /path/to/client.key
    chmod 644 /path/to/client.crt
- Verify file exists and path is correct:
    ls -la /path/to/client.crt /path/to/client.key /path/to/ca.crt
- Check process user has access:
    # Run as same user
    sudo -u epoch_user cat /path/to/client.crt
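The permission check above can be automated, for example in a deployment script. The sketch below assumes a POSIX system; `key_permissions_ok` is a hypothetical helper that accepts modes such as 600 or 400 and rejects anything accessible to group or other.

```python
import os
import stat
import tempfile


def key_permissions_ok(path: str) -> bool:
    """True if the owner can read the file and group/other have no access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    owner_can_read = bool(mode & stat.S_IRUSR)
    others_have_access = bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
    return owner_can_read and not others_have_access


# Demonstration on a throwaway file instead of a real private key.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o644)
print(key_permissions_ok(demo_path))  # False: group and other can read
os.chmod(demo_path, 0o600)
print(key_permissions_ok(demo_path))  # True
os.remove(demo_path)
```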
Key doesn’t match certificate
Symptoms:
Error: key values mismatch
Error: SSL error: private key does not match certificate
Solutions:
- Verify key matches certificate:
    # Compare modulus hashes (RSA keys) - they should match
    openssl x509 -noout -modulus -in client.crt | openssl md5
    openssl rsa -noout -modulus -in client.key | openssl md5
- If mismatched, regenerate key/cert pair or locate correct files
RDS IAM Authentication Issues
Failed to generate IAM auth token
Symptoms:
Error: Failed to generate RDS IAM authentication token
Error: The security token included in the request is invalid
Causes:
- IAM role doesn’t have rds-db:connect permission
- Wrong AWS region
- Invalid instance endpoint
Solutions:
- Verify IAM policy allows rds-db:connect:
    {
      "Effect": "Allow",
      "Action": "rds-db:connect",
      "Resource": "arn:aws:rds-db:REGION:ACCOUNT:dbuser:DBI_RESOURCE_ID/DB_USER"
    }
- Check AWS region matches RDS instance:
    aws rds describe-db-instances --db-instance-identifier mydb \
      --query 'DBInstances[0].DBInstanceArn'
- Verify you’re using the correct endpoint:
    # epoch.toml - use the actual endpoint, not a custom DNS
    [storage.postgres.mydb]
    host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
    use_iam_auth = true
RDS IAM token expired immediately
Symptoms:
Error: PAM authentication failed
Error: password authentication failed for user "iam_user"
Causes:
- System clock skew
- Token generated too far in advance
- Wrong database user
Solutions:
- Check system time is accurate:
    date -u  # Compare with actual UTC time
    # Sync time if needed
    sudo ntpdate pool.ntp.org
- Verify database user is configured for IAM:
    -- In RDS
    CREATE USER iam_user WITH LOGIN;
    GRANT rds_iam TO iam_user;
- Check the token is being generated correctly:
    # Generate token manually for testing
    aws rds generate-db-auth-token \
      --hostname mydb.cluster-xxx.us-east-1.rds.amazonaws.com \
      --port 5432 \
      --username iam_user
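To see why clock skew matters, it can help to inspect the validity window encoded in a generated token. The sketch below assumes the token has the usual presigned-request shape with `X-Amz-Date` and `X-Amz-Expires` query parameters; the sample token is shortened and made up, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs


def token_window(token: str):
    """Return (issued_at, expires_at) in UTC from the token's query string."""
    query = parse_qs(token.split("?", 1)[1])
    issued = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    expires = issued + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return issued, expires


def inside_window(token: str, now: datetime) -> bool:
    """True if `now` falls inside the token's validity window."""
    issued, expires = token_window(token)
    return issued <= now < expires


# Made-up, shortened token for illustration (signature fields omitted).
sample = (
    "mydb.example:5432/?Action=connect&DBUser=iam_user"
    "&X-Amz-Date=20240101T120000Z&X-Amz-Expires=900"
)
issued_at, expires_at = token_window(sample)
print(issued_at.isoformat(), expires_at.isoformat())
```

If the machine generating the token is minutes off from real UTC, the window it encodes may not contain the server's current time, which can look like a token that expired immediately.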
Cannot assume role for RDS IAM
Symptoms:
Error: User is not authorized to perform: sts:AssumeRole
Error: Access denied when assuming role for RDS authentication
Solutions:
- Check assume role trust policy:
    aws iam get-role --role-name MyRdsRole \
      --query 'Role.AssumeRolePolicyDocument'
- Verify trust policy allows your principal:
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/YourExecutionRole"
      },
      "Action": "sts:AssumeRole"
    }
- Check for external_id requirement:
    [storage.s3.mydb]
    assume_role_arn = "arn:aws:iam::123456789012:role/RdsRole"
    external_id = "required-external-id"  # If configured
Credential Refresh Issues
Credential refresh failed
Symptoms:
Error: Credential refresh failed: provider returned error
Error: Failed to refresh credentials: connection refused
Causes:
- Credential provider unavailable
- Network connectivity issues
- Token/secret expired
Solutions:
- Check provider connectivity:
    # For Vault
    vault status
    # For AWS
    aws sts get-caller-identity
- Verify credentials haven’t expired beyond refresh:
    # Check Vault token
    vault token lookup
    # Check current credentials are valid
    epoch doctor --check credentials
- Force credential refresh:
    # Clear cached credentials
    epoch config cache clear
    # Re-authenticate
    epoch auth login
Credentials expired and refresh not supported
Symptoms:
Error: Credentials expired
Error: Refresh not supported for static credentials
Causes:
- Using static credentials without refresh capability
- Credential cache disabled
- Refresh interval too long
Solutions:
- Use a provider that supports refresh:
    # Instead of static password
    [storage.postgres.mydb]
    vault_path = "secret/data/mydb"  # Dynamic refresh
    # Or environment variables (refresh on re-read)
    [storage.postgres.mydb]
    url = "${DB_URL}"
- Enable credential caching with refresh:
    [credentials]
    cache_enabled = true
    cache_ttl = 300
    refresh_before_expiry = 60  # Refresh 60s before expiry
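The interplay of a cache TTL and a refresh margin can be sketched in a few lines. This is an illustration of the general technique under assumed semantics, not Horizon Epoch's implementation; `RefreshingCredential` and its `fetch` callback are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingCredential:
    """Caches a secret and re-fetches it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # returns (secret, ttl_seconds)
        refresh_before_expiry: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_before_expiry
        self._clock = clock
        self._secret: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached secret, refreshing once inside the margin."""
        now = self._clock()
        if self._secret is None or now >= self._expires_at - self._margin:
            self._secret, ttl = self._fetch()
            self._expires_at = now + ttl
        return self._secret


# Hypothetical usage: a provider that hands out a secret valid for 300 s.
cred = RefreshingCredential(lambda: ("example-secret", 300.0))
print(cred.get())
```

With a 300 s TTL and a 60 s margin, the secret is reused for the first 240 s and replaced on the first call after that, so callers never receive a secret that is about to expire.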
Token lease expired (Vault)
Symptoms:
Error: Vault token expired
Error: permission denied (token expired)
Solutions:
- Check token status:
    vault token lookup
- Renew token if renewable:
    vault token renew
- For non-renewable tokens, re-authenticate:
    # AppRole
    vault write auth/approle/login \
      role_id=$ROLE_ID \
      secret_id=$SECRET_ID
- Configure automatic renewal:
    [vault]
    auto_renew_token = true
    renew_threshold = 0.7  # Renew when 70% of TTL elapsed
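As a worked example of the threshold, a token with a one-hour TTL and a threshold of 0.7 would be renewed after roughly 42 minutes. The helper below is a hypothetical illustration of that arithmetic, assuming the threshold is a fraction of the token's TTL.

```python
def seconds_until_renewal(
    ttl_seconds: float, elapsed_seconds: float, renew_threshold: float = 0.7
) -> float:
    """Seconds left before renewal is due; 0.0 if it is already due."""
    renew_at = ttl_seconds * renew_threshold
    return max(0.0, renew_at - elapsed_seconds)


# One-hour token, freshly issued: renewal is due after about 2520 s (42 min).
print(round(seconds_until_renewal(3600, 0)))
```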
Multi-Backend Routing Issues
Storage backend not found
Symptoms:
Error: Storage backend 'mydb' not found
Error: Unknown storage location: postgresql://mydb/...
Causes:
- Backend not configured
- Typo in backend name
- Configuration not loaded
Solutions:
- List configured backends:
    epoch config show storage
- Check configuration file:
    # epoch.toml
    [storage.postgres.mydb]  # Backend name is 'mydb'
    url = "postgresql://..."
- Verify configuration is loaded:
    epoch doctor --check config
Wrong backend selected for table
Symptoms:
Error: Table 'users' not found in storage 'datalake'
Error: Cannot access PostgreSQL table through S3 backend
Causes:
- Table registered with wrong backend
- Backend mismatch in storage location
Solutions:
- Check table registration:
    epoch table show users
- Re-register with correct backend:
    epoch table remove users
    epoch table add users \
      --location "postgresql://correct_backend/public.users"
- Verify storage location format:
    PostgreSQL: postgresql://backend_name/schema.table
    S3: s3://backend_name/path/to/table
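The two location formats can be checked mechanically before registering a table. The parser below is a sketch based only on the formats listed above; `parse_location` is a hypothetical helper, not part of the Horizon Epoch CLI or SDK.

```python
from urllib.parse import urlsplit


def parse_location(location: str) -> dict:
    """Split a storage location into backend name and table reference."""
    parts = urlsplit(location)
    path = parts.path.lstrip("/")
    if not parts.netloc or not path:
        raise ValueError(f"missing backend name or table path: {location!r}")
    if parts.scheme == "postgresql":
        schema, sep, table = path.partition(".")
        if not sep or not table:
            raise ValueError(f"expected schema.table, got {path!r}")
        return {"type": "postgresql", "backend": parts.netloc,
                "schema": schema, "table": table}
    if parts.scheme == "s3":
        return {"type": "s3", "backend": parts.netloc, "path": path}
    raise ValueError(f"unsupported storage type: {parts.scheme!r}")


print(parse_location("postgresql://correct_backend/public.users"))
print(parse_location("s3://backend_name/path/to/table"))
```

The `backend` value is the name to compare against the configured backends when a table appears to be routed to the wrong storage.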
Cannot connect to multiple backends simultaneously
Symptoms:
Error: Connection pool exhausted
Error: Too many connections
Causes:
- Pool size too small for multi-backend operations
- Connection leak
- Long-running transactions
Solutions:
- Increase pool sizes:
    [storage.postgres.backend1]
    pool_size = 20
    [storage.postgres.backend2]
    pool_size = 20
- Check for connection leaks:
    -- PostgreSQL
    SELECT * FROM pg_stat_activity WHERE application_name LIKE '%epoch%';
- Set idle connection timeouts:
    [storage.postgres.mydb]
    idle_timeout = 300  # Close idle connections after 5 min
Cross-backend operations failing
Symptoms:
Error: Cannot merge tables from different storage backends
Error: Cross-storage operation not supported
Causes:
- Tables on different backends can’t be joined directly
- Merge requires compatible storage types
Solutions:
- Understand cross-backend limitations:
    - Merges work within same storage type
    - Queries are executed per-backend
    - Results merged in memory
- For cross-backend data access, use Python SDK:
    # Read from both backends
    users = client.query_table("users", backend="postgres1")
    orders = client.query_table("orders", backend="postgres2")
    # Join in Python/Pandas
    merged = users.merge(orders, on="user_id")
- Consider consolidating frequently-joined tables on same backend
Backend credentials conflict
Symptoms:
Error: Credential 'default' conflicts with existing credential
Warning: Multiple credentials found for backend 'mydb'
Solutions:
- Use unique credential names per backend:
    [storage.postgres.backend1]
    vault_path = "secret/data/db1"
    [storage.postgres.backend2]
    vault_path = "secret/data/db2"  # Different path
- Clear credential cache if switching configurations:
    epoch config cache clear
Getting Help
If these solutions don’t resolve your issue:
- Check logs:
    epoch --verbose command
- Enable debug logging:
    RUST_LOG=debug epoch command
- Report issues:
    - Contact Horizon Analytic Studios for support
    - Include: version (epoch --version), error message, steps to reproduce