How to: DDL Migrations

This guide covers generating DDL migration scripts, integrating with external migration tools, and safely applying schema changes to production databases.

Prerequisites

  • Horizon Epoch installed (Installation Guide)
  • PostgreSQL database with schema versioning enabled
  • Understanding of DDL and database migrations

Generating Migrations with PostgresDdlGenerator

Horizon Epoch includes a DDL generator that produces properly ordered PostgreSQL migration scripts.

Basic Migration Generation

# Generate migration between branches
epoch schema diff main feature-branch --show-ddl

# Output migration to file
epoch schema diff main feature-branch --ddl-output migrations/V002__add_preferences.sql

Migration Script Structure

Generated migrations include both UP and DOWN sections:

-- Migration: main -> feature/add-preferences
-- Generated: 2024-12-18T10:30:00Z

-- ============================================================
-- UP: Apply changes
-- ============================================================

-- Create new enum types first
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');

-- Drop constraints that will be modified
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;

-- Add new columns
ALTER TABLE users ADD COLUMN preferences JSONB;
ALTER TABLE users ADD COLUMN notification_pref notification_type DEFAULT 'email';

-- Add new constraints
ALTER TABLE users ADD CONSTRAINT chk_users_email
    CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');

-- Create indexes
CREATE INDEX CONCURRENTLY idx_users_preferences
    ON users USING gin (preferences);

-- ============================================================
-- DOWN: Reverse changes
-- ============================================================

DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;

ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;

ALTER TABLE users DROP COLUMN IF EXISTS notification_pref;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;

-- WARNING: Cannot remove enum values in PostgreSQL
-- Enum 'notification_type' preserved (manual cleanup required)

Statement Ordering

The DDL generator follows a specific order to handle dependencies:

UP (Apply) Order:

  1. Create new enum types (before columns using them)
  2. Add values to modified enums
  3. Drop constraints being modified/removed (FK first, then others)
  4. Drop indexes being modified/removed
  5. Handle primary key changes
  6. Add/modify columns
  7. Add new constraints (check, unique, exclusion, then FK last)
  8. Create new indexes

DOWN (Reverse) Order:

  1. Drop new indexes
  2. Drop new constraints (FK first)
  3. Drop new columns
  4. Re-add dropped columns (if data preserved)
  5. Re-add dropped constraints
  6. Re-add dropped indexes
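
This ordering matters because PostgreSQL rejects statements whose dependencies do not yet exist. A minimal illustration using the enum from the example migration above:

-- Fails if run before the type exists:
-- ERROR:  type "notification_type" does not exist
ALTER TABLE users ADD COLUMN notification_pref notification_type;

-- Correct order: create the type first, then the column that uses it
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');
ALTER TABLE users ADD COLUMN notification_pref notification_type;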

DDL Safety Features

Dry-Run Mode

Preview changes without applying them:

# Show what would be executed
epoch schema apply users --from feature-branch --dry-run

# Output:
# Dry-run mode - no changes will be applied
#
# Statements to execute:
#   1. ALTER TABLE users ADD COLUMN preferences JSONB
#   2. CREATE INDEX CONCURRENTLY idx_users_preferences ON users USING gin (preferences)
#
# Estimated impact:
#   Table size: 2.3 GB
#   Row count: 15,000,000
#   Recommended: Use --online mode for large tables

Destructive Operation Warnings

The DDL generator identifies and warns about destructive operations:

epoch schema diff main feature-branch --show-ddl

# WARNING: Destructive operations detected:
#   - DROP COLUMN: users.legacy_field
#   - ALTER COLUMN TYPE: orders.amount (numeric -> integer, may lose precision)
#   - DROP INDEX: idx_users_legacy
#
# Use --force to generate DDL anyway

Destructive Operation Types:

Operation         | Risk Level | Description
------------------|------------|-----------------------------
DROP COLUMN       | High       | Data permanently lost
DROP TABLE        | Critical   | Table and all data lost
ALTER COLUMN TYPE | Medium     | May fail or lose precision
DROP CONSTRAINT   | Low        | Constraint removed
DROP INDEX        | Low        | Index removed (can rebuild)
DROP TYPE         | Medium     | Enum type removed
TRUNCATE TABLE    | Critical   | All data removed
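
For type changes flagged as risky, an explicit USING clause makes the conversion reviewable instead of relying on an implicit cast. A hand-written alternative (not Epoch output) for the orders.amount warning above:

-- Round explicitly rather than depending on the implicit numeric -> integer cast
ALTER TABLE orders ALTER COLUMN amount TYPE integer USING round(amount)::integer;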

Lock Timeout Configuration

Prevent long-running locks on production tables:

# Set lock timeout (default: 30 seconds)
epoch schema apply users --from feature-branch --lock-timeout 10s

# Set statement timeout (default: 5 minutes)
epoch schema apply users --from feature-branch --statement-timeout 2m
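
Assuming these flags map onto the standard PostgreSQL session settings, the applied migration is equivalent to wrapping the DDL like this:

-- Give up quickly instead of queueing behind long-running transactions
SET lock_timeout = '10s';
-- Abort any single statement that exceeds its time budget
SET statement_timeout = '2min';

ALTER TABLE users ADD COLUMN preferences JSONB;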

Retry Logic

Configure automatic retries for transient failures:

# Retry up to 5 times with exponential backoff
epoch schema apply users --from feature-branch --retries 5

# Configure backoff
epoch schema apply users --from feature-branch \
    --retries 5 \
    --initial-delay 500ms \
    --max-delay 30s
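
Assuming a standard doubling backoff, these settings would retry after roughly 500ms, 1s, 2s, 4s, and 8s; the --max-delay 30s cap only comes into play for longer retry sequences.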

Integration with External Tools

pt-online-schema-change (Percona Toolkit for MySQL)

While Horizon Epoch primarily targets PostgreSQL, you can generate compatible scripts for MySQL migrations:

# Generate MySQL-compatible DDL
epoch schema diff main feature-branch --dialect mysql --ddl-output migration.sql

# Use with pt-online-schema-change
pt-online-schema-change \
    --execute \
    --alter "ADD COLUMN preferences JSON" \
    D=mydb,t=users

Recommended workflow:

  1. Generate migration with Epoch
  2. Extract ALTER TABLE statements
  3. Apply using pt-online-schema-change for large tables
  4. Update Epoch commit after successful application:

# After applying via pt-osc
epoch commit -m "Applied preferences column via pt-osc" --rescan-schema

pg_repack (PostgreSQL)

For table reorganization and bloat reduction:

# Identify tables needing repack
epoch schema status --check-bloat

# Output:
# Table      | Size    | Dead Tuples | Bloat % | Recommendation
# -----------|---------|-------------|---------|----------------
# users      | 2.3 GB  | 150,000     | 15%     | Consider pg_repack
# orders     | 8.1 GB  | 2,500,000   | 28%     | Recommended

# Run pg_repack
pg_repack -d mydb -t users

# Rescan schema after repack
epoch commit -m "Reorganized users table" --rescan-schema

Using pg_repack with index changes:

# Generate index DDL
epoch schema diff main feature-branch --indexes-only --ddl-output indexes.sql

# Create indexes with pg_repack (minimal locking)
pg_repack -d mydb -t users --only-indexes

# Or use CONCURRENTLY option in Epoch
epoch schema apply users --from feature-branch --online

Flyway Integration

Export migrations for Flyway:

# Generate Flyway-compatible migration
epoch schema diff main feature-branch \
    --format flyway \
    --ddl-output db/migration/V2__add_preferences.sql

# Migration file follows Flyway naming convention
cat db/migration/V2__add_preferences.sql
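
The CLI above does not show the file contents; presumably they mirror the UP section of the generated migration, for example:

-- V2__add_preferences.sql (illustrative contents)
ALTER TABLE users ADD COLUMN preferences JSONB;

CREATE INDEX idx_users_preferences ON users USING gin (preferences);

Note that Flyway runs each migration inside a transaction on PostgreSQL by default, so any CREATE INDEX CONCURRENTLY statement needs a non-transactional migration (for example via Flyway's executeInTransaction=false script option).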

Liquibase Integration

Export migrations for Liquibase:

# Generate Liquibase changeset
epoch schema diff main feature-branch \
    --format liquibase \
    --ddl-output db/changelog/changes/002-add-preferences.xml

Example output:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
    <changeSet id="002-add-preferences" author="epoch">
        <addColumn tableName="users">
            <column name="preferences" type="JSONB"/>
        </addColumn>
        <createIndex indexName="idx_users_preferences" tableName="users">
            <column name="preferences"/>
        </createIndex>
        <rollback>
            <dropIndex indexName="idx_users_preferences" tableName="users"/>
            <dropColumn tableName="users" columnName="preferences"/>
        </rollback>
    </changeSet>
</databaseChangeLog>

Online Mode for Production

Using CONCURRENTLY

For indexes on large tables, use online mode to avoid blocking:

# Create indexes without blocking writes
epoch schema apply users --from feature-branch --online

# This generates:
# CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
#
# Instead of:
# CREATE INDEX idx_users_email ON users (email);

Limitations of CONCURRENTLY

  • Cannot run inside a transaction
  • May take longer than regular index creation
  • Requires more resources (CPU, I/O)
  • Index must be validated after creation

# Check index validity after CONCURRENTLY creation
epoch schema status users --check-indexes

# Output:
# Index                    | Valid | Size   | Last Used
# -------------------------|-------|--------|----------
# idx_users_email          | Yes   | 45 MB  | 2 min ago
# idx_users_preferences    | No    | 0 MB   | Never

If an index is invalid:

# Reindex with CONCURRENTLY
epoch schema apply users --reindex idx_users_preferences --online
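
You can also spot invalid indexes directly in the PostgreSQL catalogs, independently of Epoch:

-- Indexes left INVALID by a failed CREATE INDEX CONCURRENTLY
SELECT c.relname AS index_name
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE NOT i.indisvalid;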

Transactional DDL

PostgreSQL supports transactional DDL, allowing atomic migrations:

# Apply migration in a transaction (default)
epoch schema apply users --from feature-branch

# Disable transaction (required for CONCURRENTLY)
epoch schema apply users --from feature-branch --no-transaction --online

What happens on failure:

# With transaction (default):
# - All statements run in one transaction
# - On failure, all changes rolled back
# - Database remains unchanged

# Without transaction (--no-transaction):
# - Statements run individually
# - On failure, partial changes may remain
# - Manual cleanup may be needed
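
Because PostgreSQL DDL is transactional, the default mode is equivalent to wrapping the script yourself:

BEGIN;
ALTER TABLE users ADD COLUMN preferences JSONB;
-- If this fails, the ADD COLUMN above is rolled back as well
CREATE INDEX idx_users_preferences ON users USING gin (preferences);
COMMIT;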

Table Size Considerations

Small Tables (< 100MB)

Standard migration is safe:

epoch schema apply users --from feature-branch

Medium Tables (100MB - 1GB)

Use lock timeouts:

epoch schema apply users --from feature-branch \
    --lock-timeout 10s \
    --retries 3

Large Tables (1GB - 10GB)

Use online mode:

epoch schema apply users --from feature-branch \
    --online \
    --statement-timeout 30m

Very Large Tables (> 10GB)

Consider specialized tools:

# 1. Generate DDL
epoch schema diff main feature-branch --ddl-output migration.sql

# 2. Review for large table operations
epoch schema diff main feature-branch --analyze-impact

# 3. Use pg_repack or manual process
pg_repack -d mydb -t users --no-superuser-check

Python SDK Usage

import asyncio

from horizon_epoch import Client
# PostgresDdlGenerator is assumed to live in horizon_epoch.ddl alongside the executor
from horizon_epoch.ddl import DdlExecutor, ExecutionConfig, PostgresDdlGenerator

async def generate_migration():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraint diff between branches
        diff = await client.diff_constraints("main", "feature-branch", table_name="users")

        # Generate migration
        generator = PostgresDdlGenerator()
        migration = generator.generate_migration("users", diff.constraints)

        # Print statements
        print("UP statements:")
        for stmt in migration.up:
            print(f"  {stmt};")

        print("\nDOWN statements:")
        for stmt in migration.down:
            print(f"  {stmt};")

        # Check for warnings
        if migration.warnings:
            print("\nWARNINGS:")
            for warn in migration.warnings:
                print(f"  - {warn}")

        # Write to file
        migration.write_to_file("migrations/V002__update_users.sql")

        # Execute with safety features
        executor = DdlExecutor(client.connection)
        config = ExecutionConfig(
            dry_run=True,
            lock_timeout_seconds=10,
            statement_timeout_seconds=300,
            online=True,
        )

        result = executor.execute(migration.up, config)
        print(f"Executed {result.statements_run} statements")

asyncio.run(generate_migration())

Best Practices

1. Always Use Dry-Run First

epoch schema apply users --from feature-branch --dry-run

2. Backup Before Destructive Changes

# Export table data
epoch export users --format parquet > backup/users_$(date +%Y%m%d).parquet

# Then apply
epoch schema apply users --from feature-branch --force

3. Use Online Mode for Production

epoch schema apply users --from feature-branch --online

4. Set Appropriate Timeouts

epoch schema apply users --from feature-branch \
    --lock-timeout 10s \
    --statement-timeout 5m

5. Monitor Long-Running Migrations

# In another terminal
epoch schema apply --status

# Or query PostgreSQL directly
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'ALTER%' OR query LIKE 'CREATE INDEX%';
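
To check whether a migration is stuck waiting on a lock, and which session holds it, pg_blocking_pids is useful:

-- Sessions currently blocked, with the PIDs blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;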

See Also