Common Issues

Solutions to frequently encountered problems with Horizon Epoch.

Connection Problems

Could not connect to metadata database

Symptoms:

Error: Database connection failed: connection refused

Causes:

PostgreSQL not running
Wrong connection string
Firewall blocking port
Authentication failure

Solutions:

Check PostgreSQL is running:
```
pg_isready -h localhost -p 5432
```

Verify connection string format:

```
postgresql://user:password@host:port/database
```

Check firewall settings:

```
# Check if port is accessible
nc -zv localhost 5432
```

Test connection directly:

```
psql "postgresql://user:pass@localhost/horizon_epoch"
```
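
The connection-string and port checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the function names `parse_pg_url` and `port_is_open` are illustrative and are not part of Horizon Epoch.

```python
import socket
from urllib.parse import urlsplit


def parse_pg_url(url: str) -> dict:
    """Split a postgresql:// URL into its parts and flag obvious mistakes."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing host")
    database = parts.path.lstrip("/")
    if not database:
        raise ValueError("missing database name")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is PostgreSQL's default port
        "database": database,
    }


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds (similar to `nc -zv`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(parse_pg_url("postgresql://user:pass@localhost:5432/horizon_epoch"))
```

Calling `port_is_open(info["host"], info["port"])` on the parsed result separates "wrong connection string" from "port unreachable" before involving `psql`.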

S3 bucket not accessible

Symptoms:

Error: S3 error: Access Denied

Causes:

Invalid credentials
Bucket doesn’t exist
Incorrect permissions
Wrong region

Solutions:

Verify AWS credentials:
```
aws sts get-caller-identity
```
Check bucket exists:
```
aws s3 ls s3://bucket-name
```
Review IAM permissions (need s3:GetObject, s3:PutObject, s3:ListBucket)

Verify region matches bucket location:

```
aws s3api get-bucket-location --bucket bucket-name
```

SSH tunnel connection failed

Symptoms:

Error: SSH tunnel failed: Connection refused

Solutions:

Test SSH connection manually:
```
ssh -i ~/.ssh/key user@bastion
```
Check SSH key permissions:
```
chmod 600 ~/.ssh/id_rsa
```
Verify bastion host is accessible

Check known_hosts:

```
ssh-keyscan bastion.example.com >> ~/.ssh/known_hosts
```

Merge Issues

Merge conflict in table X

Symptoms:

CONFLICT (content): Merge conflict in users
Automatic merge failed; fix conflicts and then commit.

Understanding the conflict:

Same record modified in both branches
Same field changed to different values
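
To make the two conditions concrete, here is a minimal sketch of three-way conflict detection over records keyed by ID. It illustrates the general idea only; it is not Horizon Epoch's actual merge algorithm, and `find_conflicts` is a hypothetical name.

```python
def find_conflicts(base: dict, ours: dict, theirs: dict) -> dict:
    """Return {record_key: [conflicting fields]} for a three-way comparison.

    Each argument maps a record key to a dict of field values:
    `base` is the common ancestor, `ours` the target, `theirs` the source.
    """
    conflicts = {}
    for key in ours.keys() & theirs.keys():
        base_row = base.get(key, {})
        fields = []
        for field in sorted(ours[key].keys() | theirs[key].keys()):
            b = base_row.get(field)
            o = ours[key].get(field)
            t = theirs[key].get(field)
            # Conflict: both sides changed the field, and to different values.
            if o != b and t != b and o != t:
                fields.append(field)
        if fields:
            conflicts[key] = fields
    return conflicts


base = {1: {"name": "Ann", "email": "ann@old.example"}}
ours = {1: {"name": "Ann", "email": "ann@new.example"}}
theirs = {1: {"name": "Anna", "email": "ann@work.example"}}
print(find_conflicts(base, ours, theirs))
```

Here only `email` conflicts: `name` changed on one side only, so it can be merged automatically.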

Resolution:

View conflicts:
```
epoch conflicts show
```
Resolve interactively:
```
epoch conflicts resolve --interactive
```

Or accept one side:

```
epoch conflicts resolve --ours    # Keep target branch
epoch conflicts resolve --theirs  # Keep source branch
```

Complete merge:
```
epoch merge --continue
```
Or abort:
```
epoch merge --abort
```

Merge base not found

Symptoms:

Error: Could not find common ancestor for branches

Causes:

Branches have no common history
Orphaned branch
Corrupted commit graph
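
The idea of a common ancestor can be sketched as a set intersection over each branch's reachable commits. This is an illustrative model with made-up commit IDs, not the way Horizon Epoch stores or walks its commit graph.

```python
from collections import deque


def ancestors(commit: str, parents: dict) -> set:
    """Return every commit reachable from `commit`, including itself."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def common_ancestors(a: str, b: str, parents: dict) -> set:
    """An empty result corresponds to 'Could not find common ancestor'."""
    return ancestors(a, parents) & ancestors(b, parents)


# Hypothetical graph: c1 <- c2 <- c3 (main), c1 <- f1 (feature), o1 (orphan)
parents = {"c3": ["c2"], "c2": ["c1"], "f1": ["c1"], "c1": [], "o1": []}
print(common_ancestors("c3", "f1", parents))
print(common_ancestors("c3", "o1", parents))
```

The orphaned branch `o1` shares no commits with `c3`, which is the situation that produces the error above.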

Solutions:

Check branch history:

```
epoch log main --oneline
epoch log feature/branch --oneline
```

Find common commits:
```
epoch log --all --graph
```

If no common ancestor, use --allow-unrelated:

```
epoch merge feature/branch --allow-unrelated
```

Branch Issues

Branch not found

Symptoms:

Error: Branch 'feature/xyz' not found

Solutions:

List available branches:
```
epoch branch list
```
Check for typos in branch name
Verify branch exists in repository:
```
epoch branch list --all
```

Cannot delete branch

Symptoms:

Error: Cannot delete branch 'feature/xyz': not fully merged

Solutions:

Merge the branch first:

```
epoch merge feature/xyz
epoch branch delete feature/xyz
```

Force delete (loses unmerged changes):

```
epoch branch delete feature/xyz --force
```

Table Issues

Table not found

Symptoms:

Error: Table 'users' not found

Solutions:

List registered tables:
```
epoch table list
```

Register the table if it is missing:
```
epoch table add users --location "postgresql://mydb/public.users"
```

Check table exists in storage backend

Schema mismatch

Symptoms:

Error: Schema mismatch for table 'users': expected 5 columns, found 6

Causes:

Table schema changed outside Horizon Epoch
Uncommitted schema changes

Solutions:

View current schema:
```
epoch table show users --schema
```
Refresh schema:
```
epoch table refresh users
```

Commit schema changes:

```
epoch commit -m "Update schema for users table"
```

Performance Issues

Slow queries on branches

Symptoms:

Queries on branches significantly slower than main
Query times increase with branch age

Causes:

Large overlay size
Deep branch hierarchy
Missing indexes

Solutions:

Check overlay size:

```
epoch branch info feature/branch --stats
```

Materialize long-lived branches:

```
epoch branch materialize feature/branch
```

Merge to reduce hierarchy depth
Add indexes to overlay tables

Slow commit operations

Symptoms:

Commits take longer than expected
Timeout during commit

Causes:

Large number of changes
Network latency
Lock contention

Solutions:

Commit in smaller batches:

```
epoch commit --tables users -m "Part 1"
epoch commit --tables orders -m "Part 2"
```

Check for locks:

```
SELECT * FROM pg_locks WHERE relation = 'epoch_commits'::regclass;
```

Increase timeout:

```
epoch commit -m "Large update" --timeout 600
```

Authentication Issues

Vault authentication failed

Symptoms:

Error: Vault authentication failed: permission denied

Solutions:

Check Vault connectivity:
```
vault status
```
Verify token/credentials:
```
vault token lookup
```
Check policy permissions:
```
vault policy read horizon-epoch
```
For AppRole, verify role_id and secret_id are correct

AWS credentials expired

Symptoms:

Error: ExpiredTokenException: The security token included in the request is expired

Solutions:

For IAM roles, credentials refresh automatically
For SSO:
```
aws sso login
```
For access keys, rotate them:
```
aws iam create-access-key
```

Data Integrity Issues

Corrupted commit

Symptoms:

Error: Commit 'abc123' is corrupted or missing

Solutions:

Check commit exists:
```
epoch show abc123
```

Verify metadata database integrity:

```
SELECT * FROM epoch_commits WHERE id = 'abc123';
```

If commit is referenced but missing, contact support

Orphaned records

Symptoms:

Records in overlay not connected to commits
Disk usage growing unexpectedly

Solutions:

Run garbage collection:
```
epoch gc
```
Check for orphaned overlays:
```
epoch gc --dry-run
```
Manually clean up if needed:
```
epoch gc --force
```

Certificate/TLS Issues

Certificate expired

Symptoms:

Error: SSL error: certificate has expired

Causes:

Client certificate expired
Server certificate expired
CA certificate expired

Solutions:

Check certificate expiry:

```
# Client certificate
openssl x509 -enddate -noout -in /path/to/client.crt

# Server certificate (remote); </dev/null stops s_client waiting for input
openssl s_client -connect db.example.com:5432 -starttls postgres </dev/null 2>/dev/null | \
  openssl x509 -noout -enddate
```

Renew the certificate:

```
# If using Vault PKI
vault write pki/issue/my-role common_name="client"

# Manual renewal - contact your CA
```

If using Vault dynamic certificates, check renewal is working:
```
epoch doctor --check vault
```
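
The `notAfter=...` line printed by `openssl x509 -enddate` can be turned into a days-remaining figure for monitoring. This is a minimal sketch using the standard library's `ssl.cert_time_to_seconds`; the function name `days_until_expiry` is illustrative.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Parse a `notAfter=...` line and return the days left (negative if expired)."""
    _, _, value = enddate_line.strip().partition("=")
    expires = ssl.cert_time_to_seconds(value)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires - current) / 86400


if __name__ == "__main__":
    # Example line in the format printed by `openssl x509 -enddate -noout`
    print(days_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT"))
```

A negative result means the certificate has already expired; alerting when the value drops below a chosen number of days leaves time to renew.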

Certificate verification failed

Symptoms:

Error: SSL error: certificate verify failed
Error: CERTIFICATE_VERIFY_FAILED: unable to get local issuer certificate

Causes:

Wrong CA certificate
Incomplete certificate chain
Self-signed certificate not trusted
Server hostname mismatch

Solutions:

Verify CA certificate is correct:

```
# Check certificate chain
openssl verify -CAfile /path/to/ca.crt /path/to/client.crt
```

Check certificate chain is complete:

```
# View full chain
openssl crl2pkcs7 -nocrl -certfile /path/to/cert.pem | \
  openssl pkcs7 -print_certs -noout
```

For hostname mismatch, check server certificate SANs:

```
openssl x509 -noout -text -in server.crt | grep -A1 "Subject Alternative Name"
```

Verify you’re connecting to the correct hostname matching the certificate

Permission denied reading certificate files

Symptoms:

Error: could not load private key file: Permission denied
Error: could not load certificate file: No such file or directory

Solutions:

Check file permissions:

```
ls -la /path/to/client.key
# Should show -rw------- (600) or -r-------- (400)

chmod 600 /path/to/client.key
chmod 644 /path/to/client.crt
```

Verify file exists and path is correct:

```
ls -la /path/to/client.crt /path/to/client.key /path/to/ca.crt
```

Check process user has access:

```
# Run as same user
sudo -u epoch_user cat /path/to/client.crt
```
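
The permission check can be automated, for example in a deployment script. The sketch below treats a key file as acceptable when group and others have no access, which covers both 600 and 400; `key_permissions_ok` is a hypothetical helper name.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """True if the file grants no access to group or others (e.g. 600 or 400)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A missing file raises `FileNotFoundError`, which distinguishes the "No such file or directory" case from a permissions problem.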

Key doesn’t match certificate

Symptoms:

Error: key values mismatch
Error: SSL error: private key does not match certificate

Solutions:

Verify key matches certificate:

```
# Compare modulus hashes - they should match (RSA keys only)
openssl x509 -noout -modulus -in client.crt | openssl md5
openssl rsa -noout -modulus -in client.key | openssl md5
```

If mismatched, regenerate key/cert pair or locate correct files

RDS IAM Authentication Issues

Failed to generate IAM auth token

Symptoms:

Error: Failed to generate RDS IAM authentication token
Error: The security token included in the request is invalid

Causes:

IAM role doesn’t have rds-db:connect permission
Wrong AWS region
Invalid instance endpoint

Solutions:

Verify IAM policy allows rds-db:connect:

```
{
  "Effect": "Allow",
  "Action": "rds-db:connect",
  "Resource": "arn:aws:rds-db:REGION:ACCOUNT:dbuser:DBI_RESOURCE_ID/DB_USER"
}
```

Check AWS region matches RDS instance:

```
aws rds describe-db-instances --db-instance-identifier mydb \
  --query 'DBInstances[0].DBInstanceArn'
```

Verify you’re using the correct endpoint:

```
# epoch.toml - use the actual endpoint, not a custom DNS
[storage.postgres.mydb]
host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
use_iam_auth = true
```
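
Region and endpoint mistakes can be caught before a token is requested. The sketch below assumes the standard `<name>.<id>.<region>.rds.amazonaws.com` endpoint layout (other AWS partitions use different suffixes) and fills in the policy resource ARN template shown above. Both function names are illustrative.

```python
def region_from_endpoint(host: str) -> str:
    """Extract the region from a standard RDS endpoint hostname."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 5 or labels[-3:] != ["rds", "amazonaws", "com"]:
        # Custom DNS names land here; IAM auth needs the real endpoint.
        raise ValueError(f"not a standard RDS endpoint: {host}")
    return labels[-4]


def rds_connect_resource(region: str, account: str,
                         dbi_resource_id: str, db_user: str) -> str:
    """Fill in the rds-db:connect resource ARN template."""
    return f"arn:aws:rds-db:{region}:{account}:dbuser:{dbi_resource_id}/{db_user}"


if __name__ == "__main__":
    host = "mydb.cluster-xxx.us-east-1.rds.amazonaws.com"
    print(region_from_endpoint(host))
```

Comparing the extracted region with the region in the IAM policy resource and in the AWS client configuration covers the "Wrong AWS region" cause.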

RDS IAM token expired immediately

Symptoms:

Error: PAM authentication failed
Error: password authentication failed for user "iam_user"

Causes:

System clock skew
Token generated too far in advance
Wrong database user

Solutions:

Check system time is accurate:

```
date -u
# Compare with actual UTC time

# Sync time if needed
sudo ntpdate pool.ntp.org
```

Verify database user is configured for IAM:

```
-- In RDS
CREATE USER iam_user WITH LOGIN;
GRANT rds_iam TO iam_user;
```

Check the token is being generated correctly:

```
# Generate token manually for testing
aws rds generate-db-auth-token \
  --hostname mydb.cluster-xxx.us-east-1.rds.amazonaws.com \
  --port 5432 \
  --username iam_user
```

Cannot assume role for RDS IAM

Symptoms:

Error: User is not authorized to perform: sts:AssumeRole
Error: Access denied when assuming role for RDS authentication

Solutions:

Check assume role trust policy:

```
aws iam get-role --role-name MyRdsRole \
  --query 'Role.AssumeRolePolicyDocument'
```

Verify trust policy allows your principal:

```
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::ACCOUNT:role/YourExecutionRole"
  },
  "Action": "sts:AssumeRole"
}
```

Check for external_id requirement:

```
[storage.postgres.mydb]
assume_role_arn = "arn:aws:iam::123456789012:role/RdsRole"
external_id = "required-external-id"  # If configured
```

Credential Refresh Issues

Credential refresh failed

Symptoms:

Error: Credential refresh failed: provider returned error
Error: Failed to refresh credentials: connection refused

Causes:

Credential provider unavailable
Network connectivity issues
Token/secret expired

Solutions:

Check provider connectivity:

```
# For Vault
vault status

# For AWS
aws sts get-caller-identity
```

Verify credentials haven’t expired beyond refresh:

```
# Check Vault token
vault token lookup

# Check current credentials are valid
epoch doctor --check credentials
```

Force credential refresh:

```
# Clear cached credentials
epoch config cache clear

# Re-authenticate
epoch auth login
```

Credentials expired and refresh not supported

Symptoms:

Error: Credentials expired
Error: Refresh not supported for static credentials

Causes:

Using static credentials without refresh capability
Credential cache disabled
Refresh interval too long

Solutions:

Use a provider that supports refresh:

```
# Instead of static password
[storage.postgres.mydb]
vault_path = "secret/data/mydb"  # Dynamic refresh

# Or environment variables (refresh on re-read)
[storage.postgres.mydb]
url = "${DB_URL}"
```

Enable credential caching with refresh:

```
[credentials]
cache_enabled = true
cache_ttl = 300
refresh_before_expiry = 60  # Refresh 60s before expiry
```
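
The effect of `refresh_before_expiry` can be pictured as a simple time comparison. This sketch assumes the semantics stated in the config comment (refresh once fewer than that many seconds remain); it is not taken from Horizon Epoch's implementation, and `needs_refresh` is a hypothetical name.

```python
def needs_refresh(expires_at: float, now: float,
                  refresh_before_expiry: float = 60) -> bool:
    """True once the credential is within the refresh window or already expired."""
    return now >= expires_at - refresh_before_expiry


# A credential that expires at t=1000 with a 60-second window:
print(needs_refresh(expires_at=1000, now=900))  # outside the window
print(needs_refresh(expires_at=1000, now=950))  # inside the window
```

With static credentials there is nothing to refresh from, which is why the window only helps when the provider can issue new credentials.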

Token lease expired (Vault)

Symptoms:

Error: Vault token expired
Error: permission denied (token expired)

Solutions:

Check token status:
```
vault token lookup
```
Renew token if renewable:
```
vault token renew
```

For non-renewable tokens, re-authenticate:

```
# AppRole
vault write auth/approle/login \
  role_id=$ROLE_ID \
  secret_id=$SECRET_ID
```

Configure automatic renewal:

```
[vault]
auto_renew_token = true
renew_threshold = 0.7  # Renew when 70% of TTL elapsed
```
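
As a worked example of the threshold, the sketch below computes when renewal would trigger, assuming the semantics in the config comment (renew once that fraction of the TTL has elapsed). The function name `renew_at` is illustrative.

```python
def renew_at(issued_at: float, ttl: float, renew_threshold: float = 0.7) -> float:
    """Return the time at which renewal should start."""
    if not 0 < renew_threshold < 1:
        raise ValueError("renew_threshold must be between 0 and 1")
    return issued_at + ttl * renew_threshold


# A token issued at t=0 with a one-hour TTL
print(renew_at(issued_at=0, ttl=3600))
```

For a 3600-second TTL and a threshold of 0.7, renewal starts after roughly 2520 seconds (42 minutes), leaving about 18 minutes of margin if the first attempt fails.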

Multi-Backend Routing Issues

Storage backend not found

Symptoms:

Error: Storage backend 'mydb' not found
Error: Unknown storage location: postgresql://mydb/...

Causes:

Backend not configured
Typo in backend name
Configuration not loaded

Solutions:

List configured backends:
```
epoch config show storage
```

Check configuration file:

```
# epoch.toml
[storage.postgres.mydb]  # Backend name is 'mydb'
url = "postgresql://..."
```

Verify configuration is loaded:
```
epoch doctor --check config
```

Wrong backend selected for table

Symptoms:

Error: Table 'users' not found in storage 'datalake'
Error: Cannot access PostgreSQL table through S3 backend

Causes:

Table registered with wrong backend
Backend mismatch in storage location

Solutions:

Check table registration:
```
epoch table show users
```

Re-register with correct backend:

```
epoch table remove users
epoch table add users \
  --location "postgresql://correct_backend/public.users"
```

Verify storage location format:

```
PostgreSQL: postgresql://backend_name/schema.table
S3:         s3://backend_name/path/to/table
```
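
The two location formats can be validated before registering a table. The parser below is a minimal sketch based only on the formats listed above; `parse_location` is a hypothetical helper, not part of the Horizon Epoch SDK.

```python
from urllib.parse import urlsplit


def parse_location(location: str) -> dict:
    """Split a storage location into backend name and table reference."""
    parts = urlsplit(location)
    backend = parts.netloc
    path = parts.path.lstrip("/")
    if parts.scheme == "postgresql":
        schema, sep, table = path.partition(".")
        if not (backend and sep and schema and table):
            raise ValueError("expected postgresql://backend_name/schema.table")
        return {"type": "postgresql", "backend": backend,
                "schema": schema, "table": table}
    if parts.scheme == "s3":
        if not (backend and path):
            raise ValueError("expected s3://backend_name/path/to/table")
        return {"type": "s3", "backend": backend, "path": path}
    raise ValueError(f"unknown storage type: {parts.scheme!r}")


if __name__ == "__main__":
    print(parse_location("postgresql://mydb/public.users"))
    print(parse_location("s3://datalake/path/to/table"))
```

Comparing the parsed `backend` with the names from `epoch config show storage` quickly shows whether a table points at the wrong backend.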

Cannot connect to multiple backends simultaneously

Symptoms:

Error: Connection pool exhausted
Error: Too many connections

Causes:

Pool size too small for multi-backend operations
Connection leak
Long-running transactions

Solutions:

Increase pool sizes:

```
[storage.postgres.backend1]
pool_size = 20

[storage.postgres.backend2]
pool_size = 20
```

Check for connection leaks:

```
-- PostgreSQL
SELECT * FROM pg_stat_activity
WHERE application_name LIKE '%epoch%';
```

Set idle connection timeouts:

```
[storage.postgres.mydb]
idle_timeout = 300  # Close idle connections after 5 min
```

Cross-backend operations failing

Symptoms:

Error: Cannot merge tables from different storage backends
Error: Cross-storage operation not supported

Causes:

Tables on different backends can’t be joined directly
Merge requires compatible storage types

Solutions:

Understand cross-backend limitations:
- Merges work within same storage type
- Queries are executed per-backend
- Results merged in memory

For cross-backend data access, use Python SDK:

```
# Read from both backends
users = client.query_table("users", backend="postgres1")
orders = client.query_table("orders", backend="postgres2")

# Join in Python/Pandas
merged = users.merge(orders, on="user_id")
```

Consider consolidating frequently-joined tables on same backend
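
For readers unfamiliar with what the in-memory join does, the sketch below shows the same inner-join semantics on plain Python rows, without the SDK or pandas. The sample rows and the `inner_join` helper are made up for illustration.

```python
from collections import defaultdict


def inner_join(left: list, right: list, on: str) -> list:
    """Hash join: index the right rows by key, then match each left row."""
    index = defaultdict(list)
    for row in right:
        index[row[on]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]


users = [{"user_id": 1, "name": "Ann"}, {"user_id": 2, "name": "Bob"}]
orders = [
    {"user_id": 1, "order_id": 10},
    {"user_id": 1, "order_id": 11},
    {"user_id": 3, "order_id": 12},
]
for row in inner_join(users, orders, on="user_id"):
    print(row)
```

Both result sets must fit in memory at once, which is one reason to consolidate frequently joined tables on a single backend.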

Backend credentials conflict

Symptoms:

Error: Credential 'default' conflicts with existing credential
Warning: Multiple credentials found for backend 'mydb'

Solutions:

Use unique credential names per backend:

```
[storage.postgres.backend1]
vault_path = "secret/data/db1"

[storage.postgres.backend2]
vault_path = "secret/data/db2"  # Different path
```

Clear credential cache if switching configurations:
```
epoch config cache clear
```

Getting Help

If these solutions don’t resolve your issue:

Check logs:
```
epoch --verbose command
```
Enable debug logging:
```
RUST_LOG=debug epoch command
```
Report issues:
- Contact Horizon Analytic Studios for support
- Include: version (epoch --version), error message, steps to reproduce
