How to Connect S3

Connect Horizon Epoch to S3-compatible storage for Delta Lake tables.

CLI Configuration

# Environment variables
export EPOCH_S3_ENDPOINT="http://localhost:9000"
export EPOCH_S3_BUCKET="horizon-data"
export EPOCH_S3_ACCESS_KEY="minioadmin"
export EPOCH_S3_SECRET_KEY="minioadmin"

# Add S3 storage backend
epoch storage add datalake \
    --type s3 \
    --bucket my-data-bucket \
    --region us-east-1

# Register a Delta table
epoch table add sales \
    --location "s3://my-bucket/delta/sales"

Python SDK

import asyncio
from horizon_epoch import Client, StorageBackend

async def main():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        await client.init("my-repo")

        # Add S3 storage backend
        await client.add_storage(
            name="datalake",
            backend=StorageBackend.S3,
            config={
                "bucket": "my-data-bucket",
                "region": "us-east-1"
            }
        )

asyncio.run(main())

Using Environment Variables

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

The SDK picks up these standard AWS environment variables automatically.
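Because the SDK relies on the variables above, a quick pre-flight check before connecting can save a confusing authentication error. The following is a minimal sketch using only the standard library; `missing_aws_vars` is a hypothetical helper, not part of the Horizon Epoch SDK.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the required AWS variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS credentials:", ", ".join(missing))
else:
    print("AWS credentials found in the environment")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests.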

MinIO / S3-Compatible Endpoints

# CLI
epoch storage add minio \
    --type s3 \
    --bucket horizon-data \
    --endpoint "http://localhost:9000" \
    --force-path-style

# Python SDK
await client.add_storage(
    name="minio",
    backend=StorageBackend.S3,
    config={
        "bucket": "horizon-data",
        "endpoint": "http://localhost:9000",
        "access_key": "minioadmin",
        "secret_key": "minioadmin",
        "force_path_style": True
    }
)
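The path-style setting matters because S3 clients can address a bucket in two ways, and MinIO on a bare host such as localhost generally needs the first. The sketch below illustrates the two general S3 URL layouts; it does not show how Horizon Epoch builds requests internally, and `object_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def object_url(endpoint, bucket, key, force_path_style):
    """Build the URL an S3 client would request for one object."""
    parts = urlsplit(endpoint)
    key = key.lstrip("/")
    if force_path_style:
        # Path style: the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"


key = "delta/sales/part-00000.parquet"
print(object_url("http://localhost:9000", "horizon-data", key, True))
print(object_url("http://localhost:9000", "horizon-data", key, False))
```

With virtual-hosted addressing the client would try to resolve `horizon-data.localhost`, which usually fails on a local MinIO setup.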

IAM Roles (EC2/ECS/EKS)

When running on AWS infrastructure (EC2, ECS, or EKS) with an attached IAM role, use the instance credentials instead of static keys:

await client.add_storage(
    name="datalake",
    backend=StorageBackend.S3,
    config={
        "bucket": "my-data-bucket",
        "use_instance_credentials": True
    }
)

Assume Role

For cross-account access, pass the ARN of the role to assume and, optionally, an external ID:

await client.add_storage(
    name="cross-account",
    backend=StorageBackend.S3,
    config={
        "bucket": "other-account-bucket",
        "assume_role_arn": "arn:aws:iam::123456789012:role/HorizonEpochAccess",
        "external_id": "optional-external-id"
    }
)
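A malformed role ARN is a common source of cross-account failures, so it can help to validate the string before passing it as `assume_role_arn`. This is a minimal sketch based on the general IAM role ARN layout; `parse_role_arn` is a hypothetical helper and is not part of the SDK.

```python
import re

# arn:<partition>:iam::<12-digit account id>:role/<role name, optionally with a path>
ROLE_ARN = re.compile(r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/(.+)$")


def parse_role_arn(arn):
    """Split an IAM role ARN into partition, account ID, and role name."""
    match = ROLE_ARN.match(arn)
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, role = match.groups()
    return {"partition": partition, "account_id": account_id, "role": role}


info = parse_role_arn("arn:aws:iam::123456789012:role/HorizonEpochAccess")
print(info["account_id"], info["role"])
```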

Registering Delta Tables

# CLI
epoch table add sales \
    --location "s3://my-bucket/delta/sales"

# Python SDK
from horizon_epoch.client import _native

loc = _native.StorageLocation.s3("my-bucket", "delta/sales")
await client.track_table("sales", loc)
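The CLI takes a single `s3://` URI while the SDK call above takes the bucket and key prefix as separate arguments. A small helper can convert one form into the other. The following is a sketch using the standard library; `split_s3_uri` is a hypothetical name, not an SDK function.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Strip surrounding slashes so the prefix is relative to the bucket root.
    return parts.netloc, parts.path.strip("/")


bucket, prefix = split_s3_uri("s3://my-bucket/delta/sales")
print(bucket, prefix)
```

The resulting pair matches the two arguments passed to `StorageLocation.s3` in the example above.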

Table Paths

Each table is stored in Delta Lake format: a directory of Parquet data files alongside a _delta_log/ transaction log:

s3://my-bucket/
  delta/
    sales/
      _delta_log/        # Delta transaction log
      part-00000.parquet
      part-00001.parquet
    users/
      _delta_log/
      part-00000.parquet
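Given this layout, a table root can be recognized by the `_delta_log/` directory beneath it. The sketch below finds table roots in a flat list of object keys, such as the output of a bucket listing. `delta_table_roots` is a hypothetical helper and the sample keys are illustrative.

```python
def delta_table_roots(keys):
    """Return the sorted prefixes that contain a _delta_log/ directory."""
    roots = set()
    for key in keys:
        segments = key.split("/")
        # Only count _delta_log when it is a directory, not the final file name.
        if "_delta_log" in segments[:-1]:
            index = segments.index("_delta_log")
            roots.add("/".join(segments[:index]))
    return sorted(roots)


keys = [
    "delta/sales/_delta_log/00000000000000000000.json",
    "delta/sales/part-00000.parquet",
    "delta/sales/part-00001.parquet",
    "delta/users/_delta_log/00000000000000000000.json",
    "delta/users/part-00000.parquet",
]
print(delta_table_roots(keys))
```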

Configuration File

# ~/.epoch/config.toml
[storage.s3.default]
bucket = "horizon-data"
endpoint = "http://localhost:9000"
access_key = "minioadmin"
secret_key = "minioadmin"

Advanced Authentication

For enterprise environments:

Required Bucket Permissions

Minimum IAM permissions needed:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}
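The policy names the bucket twice: once as the bucket ARN, which `s3:ListBucket` applies to, and once with `/*` for the object-level actions. To avoid editing both by hand, the document can be generated per bucket. This is a minimal sketch, and `bucket_policy` is a hypothetical helper.

```python
import json

ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]


def bucket_policy(bucket):
    """Build the minimum policy shown above for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(ACTIONS),
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions (ListBucket)
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions
                ],
            }
        ],
    }


print(json.dumps(bucket_policy("my-bucket"), indent=4))
```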

Troubleshooting

Access Denied

  • Verify that the IAM policy includes all the actions listed under Required Bucket Permissions
  • Check that the bucket policy allows access from your account or role
  • Ensure the bucket's region matches the region in your configuration
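The first check can be partly automated by comparing a policy document against the four required actions. The sketch below is deliberately simple: it ignores `Deny` statements, conditions, resource scoping, and partial wildcards such as `s3:Get*`, so treat it as a first pass only. `missing_actions` is a hypothetical helper.

```python
REQUIRED_ACTIONS = {"s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"}


def missing_actions(policy):
    """Return the required S3 actions that no Allow statement grants."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        granted.update(actions)
    if "*" in granted or "s3:*" in granted:
        return []
    return sorted(REQUIRED_ACTIONS - granted)


read_only = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject", "s3:ListBucket"]}],
}
print(missing_actions(read_only))
```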

Endpoint Not Found

  • For MinIO, ensure force_path_style=True is set
  • Verify the endpoint URL is correct
  • Check network connectivity to the endpoint
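The last two checks can be scripted: parse the endpoint URL to confirm it has a scheme and host, then attempt a plain TCP connection. This is a minimal sketch using the standard library; both function names are hypothetical, and a successful connection only shows the port is reachable, not that an S3 service is answering.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint):
    """Extract (host, port) from an endpoint URL, defaulting the port by scheme."""
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"endpoint must be an http(s) URL: {endpoint!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_connect(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(endpoint_host_port("http://localhost:9000"))
```

Calling `can_connect("http://localhost:9000")` against a running MinIO instance should return `True`. An endpoint written without a scheme, such as `localhost:9000`, is rejected with `ValueError`.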

Slow Operations

  • Consider enabling S3 Transfer Acceleration for large files
  • Use regional endpoints when possible
  • Check for network latency to the S3 region