How to: DDL Migrations

This guide covers generating DDL migration scripts, integrating with external migration tools, and safely applying schema changes to production databases.

Prerequisites

  • Horizon Epoch installed (Installation Guide)
  • PostgreSQL database with schema versioning enabled
  • Understanding of DDL and database migrations

Generating Migrations with PostgresDdlGenerator

Horizon Epoch includes a DDL generator that produces properly ordered PostgreSQL migration scripts.

Basic Migration Generation

# Generate migration between branches
epoch schema diff main feature-branch --show-ddl

# Output migration to file
epoch schema diff main feature-branch --ddl-output migrations/V002__add_preferences.sql

Migration Script Structure

Generated migrations include both UP and DOWN sections:

-- Migration: main -> feature/add-preferences
-- Generated: 2024-12-18T10:30:00Z

-- ============================================================
-- UP: Apply changes
-- ============================================================

-- Create new enum types first
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');

-- Drop constraints that will be modified
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;

-- Add new columns
ALTER TABLE users ADD COLUMN preferences JSONB;
ALTER TABLE users ADD COLUMN notification_pref notification_type DEFAULT 'email';

-- Add new constraints
ALTER TABLE users ADD CONSTRAINT chk_users_email
    CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');

-- Create indexes
CREATE INDEX CONCURRENTLY idx_users_preferences
    ON users USING gin (preferences);

-- ============================================================
-- DOWN: Reverse changes
-- ============================================================

DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;

ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;

ALTER TABLE users DROP COLUMN IF EXISTS notification_pref;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;

-- WARNING: Cannot remove enum values in PostgreSQL
-- Enum 'notification_type' preserved (manual cleanup required)

Statement Ordering

The DDL generator follows a specific order to handle dependencies:

UP (Apply) Order:

  1. Create new enum types (before columns using them)
  2. Add values to modified enums
  3. Drop constraints being modified/removed (FK first, then others)
  4. Drop indexes being modified/removed
  5. Handle primary key changes
  6. Add/modify columns
  7. Add new constraints (check, unique, exclusion, then FK last)
  8. Create new indexes

DOWN (Reverse) Order:

  1. Drop new indexes
  2. Drop new constraints (FK first)
  3. Drop new columns
  4. Re-add dropped columns (if data preserved)
  5. Re-add dropped constraints
  6. Re-add dropped indexes
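
This ordering matters because PostgreSQL rejects statements whose dependencies do not yet exist. A minimal illustration using the enum from the example migration above:

-- Fails if run before the type exists:
-- ERROR:  type "notification_type" does not exist
ALTER TABLE users ADD COLUMN notification_pref notification_type;

-- Correct order: create the type first, then the column that uses it
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');
ALTER TABLE users ADD COLUMN notification_pref notification_type;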

DDL Safety Features

Dry-Run Mode

Preview changes without applying them:

# Show what would be executed
epoch schema apply users --from feature-branch --dry-run

# Output:
# Dry-run mode - no changes will be applied
#
# Statements to execute:
#   1. ALTER TABLE users ADD COLUMN preferences JSONB
#   2. CREATE INDEX CONCURRENTLY idx_users_preferences ON users USING gin (preferences)
#
# Estimated impact:
#   Table size: 2.3 GB
#   Row count: 15,000,000
#   Recommended: Use --online mode for large tables

Destructive Operation Warnings

The DDL generator identifies and warns about destructive operations:

epoch schema diff main feature-branch --show-ddl

# WARNING: Destructive operations detected:
#   - DROP COLUMN: users.legacy_field
#   - ALTER COLUMN TYPE: orders.amount (numeric -> integer, may lose precision)
#   - DROP INDEX: idx_users_legacy
#
# Use --force to generate DDL anyway

Destructive Operation Types:

Operation         | Risk Level | Description
------------------|------------|-----------------------------
DROP COLUMN       | High       | Data permanently lost
DROP TABLE        | Critical   | Table and all data lost
ALTER COLUMN TYPE | Medium     | May fail or lose precision
DROP CONSTRAINT   | Low        | Constraint removed
DROP INDEX        | Low        | Index removed (can rebuild)
DROP TYPE         | Medium     | Enum type removed
TRUNCATE TABLE    | Critical   | All data removed
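
For type changes flagged as risky, an explicit USING clause makes the conversion reviewable instead of relying on an implicit cast. A hand-written alternative (not Epoch output) for the orders.amount warning above:

-- Round explicitly rather than depending on the implicit numeric -> integer cast
ALTER TABLE orders ALTER COLUMN amount TYPE integer USING round(amount)::integer;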

Lock Timeout Configuration

Prevent long-running locks on production tables:

# Set lock timeout (default: 30 seconds)
epoch schema apply users --from feature-branch --lock-timeout 10s

# Set statement timeout (default: 5 minutes)
epoch schema apply users --from feature-branch --statement-timeout 2m
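
Assuming these flags map onto the standard PostgreSQL session settings, the applied migration is equivalent to wrapping the DDL like this:

-- Give up quickly instead of queueing behind long-running transactions
SET lock_timeout = '10s';
-- Abort any single statement that exceeds its time budget
SET statement_timeout = '2min';

ALTER TABLE users ADD COLUMN preferences JSONB;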

Retry Logic

Configure automatic retries for transient failures:

# Retry up to 5 times with exponential backoff
epoch schema apply users --from feature-branch --retries 5

# Configure backoff
epoch schema apply users --from feature-branch \
    --retries 5 \
    --initial-delay 500ms \
    --max-delay 30s
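
Assuming a standard doubling backoff, these settings would retry after roughly 500ms, 1s, 2s, 4s, and 8s; the --max-delay 30s cap only comes into play for longer retry sequences.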

Integration with External Tools

pt-online-schema-change (Percona Toolkit for MySQL)

While Horizon Epoch primarily targets PostgreSQL, you can generate compatible scripts for MySQL migrations:

# Generate MySQL-compatible DDL
epoch schema diff main feature-branch --dialect mysql --ddl-output migration.sql

# Use with pt-online-schema-change
pt-online-schema-change \
    --execute \
    --alter "ADD COLUMN preferences JSON" \
    D=mydb,t=users

Recommended workflow:

  1. Generate migration with Epoch
  2. Extract ALTER TABLE statements
  3. Apply using pt-online-schema-change for large tables
  4. Update Epoch commit after successful application:

# After applying via pt-osc
epoch commit -m "Applied preferences column via pt-osc" --rescan-schema

pg_repack (PostgreSQL)

For table reorganization and bloat reduction:

# Identify tables needing repack
epoch schema status --check-bloat

# Output:
# Table      | Size    | Dead Tuples | Bloat % | Recommendation
# -----------|---------|-------------|---------|----------------
# users      | 2.3 GB  | 150,000     | 15%     | Consider pg_repack
# orders     | 8.1 GB  | 2,500,000   | 28%     | Recommended

# Run pg_repack
pg_repack -d mydb -t users

# Rescan schema after repack
epoch commit -m "Reorganized users table" --rescan-schema

Using pg_repack with index changes:

# Generate index DDL
epoch schema diff main feature-branch --indexes-only --ddl-output indexes.sql

# Create indexes with pg_repack (minimal locking)
pg_repack -d mydb -t users --only-indexes

# Or use CONCURRENTLY option in Epoch
epoch schema apply users --from feature-branch --online

Flyway Integration

Export migrations for Flyway:

# Generate Flyway-compatible migration
epoch schema diff main feature-branch \
    --format flyway \
    --ddl-output db/migration/V2__add_preferences.sql

# Migration file follows Flyway naming convention
cat db/migration/V2__add_preferences.sql
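
The CLI above does not show the file contents; presumably they mirror the UP section of the generated migration, for example:

-- V2__add_preferences.sql (illustrative contents)
ALTER TABLE users ADD COLUMN preferences JSONB;

CREATE INDEX idx_users_preferences ON users USING gin (preferences);

Note that Flyway runs each migration inside a transaction on PostgreSQL by default, so any CREATE INDEX CONCURRENTLY statement needs a non-transactional migration (for example via Flyway's executeInTransaction=false script option).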

Liquibase Integration

Export migrations for Liquibase:

# Generate Liquibase changeset
epoch schema diff main feature-branch \
    --format liquibase \
    --ddl-output db/changelog/changes/002-add-preferences.xml

Example output:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
    <changeSet id="002-add-preferences" author="epoch">
        <addColumn tableName="users">
            <column name="preferences" type="JSONB"/>
        </addColumn>
        <createIndex indexName="idx_users_preferences" tableName="users">
            <column name="preferences"/>
        </createIndex>
        <rollback>
            <dropIndex indexName="idx_users_preferences" tableName="users"/>
            <dropColumn tableName="users" columnName="preferences"/>
        </rollback>
    </changeSet>
</databaseChangeLog>

Online Mode for Production

Using CONCURRENTLY

For indexes on large tables, use online mode to avoid blocking:

# Create indexes without blocking writes
epoch schema apply users --from feature-branch --online

# This generates:
# CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
#
# Instead of:
# CREATE INDEX idx_users_email ON users (email);

Limitations of CONCURRENTLY

  • Cannot run inside a transaction
  • May take longer than regular index creation
  • Requires more resources (CPU, I/O)
  • Index must be validated after creation

# Check index validity after CONCURRENTLY creation
epoch schema status users --check-indexes

# Output:
# Index                    | Valid | Size   | Last Used
# -------------------------|-------|--------|----------
# idx_users_email          | Yes   | 45 MB  | 2 min ago
# idx_users_preferences    | No    | 0 MB   | Never

If an index is invalid:

# Reindex with CONCURRENTLY
epoch schema apply users --reindex idx_users_preferences --online
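
You can also spot invalid indexes directly in the PostgreSQL catalogs, independently of Epoch:

-- Indexes left INVALID by a failed CREATE INDEX CONCURRENTLY
SELECT c.relname AS index_name
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE NOT i.indisvalid;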

Transactional DDL

PostgreSQL supports transactional DDL, allowing atomic migrations:

# Apply migration in a transaction (default)
epoch schema apply users --from feature-branch

# Disable transaction (required for CONCURRENTLY)
epoch schema apply users --from feature-branch --no-transaction --online

What happens on failure:

# With transaction (default):
# - All statements run in one transaction
# - On failure, all changes rolled back
# - Database remains unchanged

# Without transaction (--no-transaction):
# - Statements run individually
# - On failure, partial changes may remain
# - Manual cleanup may be needed
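
Because PostgreSQL DDL is transactional, the default mode is equivalent to wrapping the script yourself:

BEGIN;
ALTER TABLE users ADD COLUMN preferences JSONB;
-- If this fails, the ADD COLUMN above is rolled back as well
CREATE INDEX idx_users_preferences ON users USING gin (preferences);
COMMIT;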

Table Size Considerations

Small Tables (< 100MB)

Standard migration is safe:

epoch schema apply users --from feature-branch

Medium Tables (100MB - 1GB)

Use lock timeouts:

epoch schema apply users --from feature-branch \
    --lock-timeout 10s \
    --retries 3

Large Tables (1GB - 10GB)

Use online mode:

epoch schema apply users --from feature-branch \
    --online \
    --statement-timeout 30m

Very Large Tables (> 10GB)

Consider specialized tools:

# 1. Generate DDL
epoch schema diff main feature-branch --ddl-output migration.sql

# 2. Review for large table operations
epoch schema diff main feature-branch --analyze-impact

# 3. Use pg_repack or manual process
pg_repack -d mydb -t users --no-superuser-check

Python SDK Usage

import asyncio

from horizon_epoch import Client
# PostgresDdlGenerator is assumed to live in horizon_epoch.ddl alongside the executor
from horizon_epoch.ddl import DdlExecutor, ExecutionConfig, PostgresDdlGenerator

async def generate_migration():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraint diff between branches
        diff = await client.diff_constraints("main", "feature-branch", table_name="users")

        # Generate migration
        generator = PostgresDdlGenerator()
        migration = generator.generate_migration("users", diff.constraints)

        # Print statements
        print("UP statements:")
        for stmt in migration.up:
            print(f"  {stmt};")

        print("\nDOWN statements:")
        for stmt in migration.down:
            print(f"  {stmt};")

        # Check for warnings
        if migration.warnings:
            print("\nWARNINGS:")
            for warn in migration.warnings:
                print(f"  - {warn}")

        # Write to file
        migration.write_to_file("migrations/V002__update_users.sql")

        # Execute with safety features
        executor = DdlExecutor(client.connection)
        config = ExecutionConfig(
            dry_run=True,
            lock_timeout_seconds=10,
            statement_timeout_seconds=300,
            online=True,
        )

        result = executor.execute(migration.up, config)
        print(f"Executed {result.statements_run} statements")

asyncio.run(generate_migration())

Best Practices

1. Always Use Dry-Run First

epoch schema apply users --from feature-branch --dry-run

2. Backup Before Destructive Changes

# Export table data
epoch export users --format parquet > backup/users_$(date +%Y%m%d).parquet

# Then apply
epoch schema apply users --from feature-branch --force

3. Use Online Mode for Production

epoch schema apply users --from feature-branch --online

4. Set Appropriate Timeouts

epoch schema apply users --from feature-branch \
    --lock-timeout 10s \
    --statement-timeout 5m

5. Monitor Long-Running Migrations

# In another terminal
epoch schema apply --status

# Or query PostgreSQL directly
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'ALTER%' OR query LIKE 'CREATE INDEX%';
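
To check whether a migration is stuck waiting on a lock, and which session holds it, pg_blocking_pids is useful:

-- Sessions currently blocked, with the PIDs blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;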

See Also