How to: DDL Migrations
This guide covers generating DDL migration scripts, integrating with external migration tools, and safely applying schema changes to production databases.
Prerequisites
- Horizon Epoch installed (Installation Guide)
- PostgreSQL database with schema versioning enabled
- Understanding of DDL and database migrations
Generating Migrations with PostgresDdlGenerator
Horizon Epoch includes a DDL generator that produces properly ordered PostgreSQL migration scripts.
Basic Migration Generation
# Generate migration between branches
epoch schema diff main feature-branch --show-ddl
# Output migration to file
epoch schema diff main feature-branch --ddl-output migrations/V002__add_preferences.sql
Migration Script Structure
Generated migrations include both UP and DOWN sections:
-- Migration: main -> feature/add-preferences
-- Generated: 2024-12-18T10:30:00Z
-- ============================================================
-- UP: Apply changes
-- ============================================================
-- Create new enum types first
CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push');
-- Drop constraints that will be modified
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;
-- Add new columns
ALTER TABLE users ADD COLUMN preferences JSONB;
ALTER TABLE users ADD COLUMN notification_pref notification_type DEFAULT 'email';
-- Add new constraints
ALTER TABLE users ADD CONSTRAINT chk_users_email
CHECK (email ~ '^[^@]+@[^@]+\.[^@]+$');
-- Create indexes
CREATE INDEX CONCURRENTLY idx_users_preferences
ON users USING gin (preferences);
-- ============================================================
-- DOWN: Reverse changes
-- ============================================================
DROP INDEX CONCURRENTLY IF EXISTS idx_users_preferences;
ALTER TABLE users DROP CONSTRAINT IF EXISTS chk_users_email;
ALTER TABLE users DROP COLUMN IF EXISTS notification_pref;
ALTER TABLE users DROP COLUMN IF EXISTS preferences;
-- WARNING: Cannot remove enum values in PostgreSQL
-- Enum 'notification_type' preserved (manual cleanup required)
Statement Ordering
The DDL generator follows a specific order to handle dependencies:
UP (Apply) Order:
- Create new enum types (before columns using them)
- Add values to modified enums
- Drop constraints being modified/removed (FK first, then others)
- Drop indexes being modified/removed
- Handle primary key changes
- Add/modify columns
- Add new constraints (check, unique, exclusion, then FK last)
- Create new indexes
DOWN (Reverse) Order:
- Drop new indexes
- Drop new constraints (FK first)
- Drop new columns
- Re-add dropped columns (if data preserved)
- Re-add dropped constraints
- Re-add dropped indexes
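The ordering above can be thought of as assigning each statement a phase number and sorting by it. The sketch below illustrates the idea with a hypothetical `Phase` enum and `order_up_statements` helper; these names are illustrative, not the Epoch generator's real internals.

```python
from enum import IntEnum

# Hypothetical phases mirroring the UP ordering above
class Phase(IntEnum):
    CREATE_ENUM = 0
    ALTER_ENUM = 1
    DROP_CONSTRAINT = 2
    DROP_INDEX = 3
    PRIMARY_KEY = 4
    ALTER_COLUMN = 5
    ADD_CONSTRAINT = 6
    CREATE_INDEX = 7

def order_up_statements(statements):
    """Sort (phase, sql) pairs so dependencies are satisfied.

    The sort is stable, so statements within a phase keep their
    original relative order.
    """
    return [sql for _, sql in sorted(statements, key=lambda s: s[0])]

stmts = [
    (Phase.CREATE_INDEX, "CREATE INDEX idx_users_pref ON users USING gin (preferences)"),
    (Phase.CREATE_ENUM, "CREATE TYPE notification_type AS ENUM ('email', 'sms', 'push')"),
    (Phase.ALTER_COLUMN, "ALTER TABLE users ADD COLUMN preferences JSONB"),
]
for sql in order_up_statements(stmts):
    print(sql)
```

A stable sort keeps the generator deterministic: two runs over the same diff emit the same script, which matters when migration files are committed to version control.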
DDL Safety Features
Dry-Run Mode
Preview changes without applying them:
# Show what would be executed
epoch schema apply users --from feature-branch --dry-run
# Output:
# Dry-run mode - no changes will be applied
#
# Statements to execute:
# 1. ALTER TABLE users ADD COLUMN preferences JSONB
# 2. CREATE INDEX CONCURRENTLY idx_users_preferences ON users USING gin (preferences)
#
# Estimated impact:
# Table size: 2.3 GB
# Row count: 15,000,000
# Recommended: Use --online mode for large tables
Destructive Operation Warnings
The DDL generator identifies and warns about destructive operations:
epoch schema diff main feature-branch --show-ddl
# WARNING: Destructive operations detected:
# - DROP COLUMN: users.legacy_field
# - ALTER COLUMN TYPE: orders.amount (numeric -> integer, may lose precision)
# - DROP INDEX: idx_users_legacy
#
# Use --force to generate DDL anyway
Destructive Operation Types:
| Operation | Risk Level | Description |
|---|---|---|
| DROP COLUMN | High | Data permanently lost |
| DROP TABLE | Critical | Table and all data lost |
| ALTER COLUMN TYPE | Medium | May fail or lose precision |
| DROP CONSTRAINT | Low | Constraint removed |
| DROP INDEX | Low | Index removed (can rebuild) |
| DROP TYPE | Medium | Enum type removed |
| TRUNCATE TABLE | Critical | All data removed |
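Detection like this typically amounts to pattern-matching the generated statements against the operation list above. The sketch below shows one way it might work; `RISK_PATTERNS` and `destructive_warnings` are illustrative names, not the Epoch generator's real internals.

```python
import re

# Patterns mirroring the risk table above (hypothetical helper)
RISK_PATTERNS = [
    (r"\bDROP\s+COLUMN\b", "High"),
    (r"\bDROP\s+TABLE\b", "Critical"),
    (r"\bALTER\s+COLUMN\s+\w+\s+TYPE\b", "Medium"),
    (r"\bDROP\s+CONSTRAINT\b", "Low"),
    (r"\bDROP\s+INDEX\b", "Low"),
    (r"\bDROP\s+TYPE\b", "Medium"),
    (r"\bTRUNCATE\b", "Critical"),
]

def destructive_warnings(statements):
    """Return (risk, statement) pairs for destructive DDL."""
    hits = []
    for sql in statements:
        for pattern, risk in RISK_PATTERNS:
            if re.search(pattern, sql, re.IGNORECASE):
                hits.append((risk, sql))
                break  # report each statement once, at its first match
    return hits

for risk, sql in destructive_warnings([
    "ALTER TABLE users DROP COLUMN legacy_field",
    "ALTER TABLE orders ALTER COLUMN amount TYPE integer",
    "CREATE INDEX idx_users_email ON users (email)",
]):
    print(f"{risk}: {sql}")
```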
Lock Timeout Configuration
Prevent long-running locks on production tables:
# Set lock timeout (default: 30 seconds)
epoch schema apply users --from feature-branch --lock-timeout 10s
# Set statement timeout (default: 5 minutes)
epoch schema apply users --from feature-branch --statement-timeout 2m
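These flags correspond to PostgreSQL's `lock_timeout` and `statement_timeout` session settings, which a migration session would issue as SET statements before running any DDL. The `session_timeouts` helper below is a hypothetical sketch of that translation, not the Epoch CLI's actual code; note that PostgreSQL duration strings use `min` for minutes (`ms`, `s`, `min`, `h`).

```python
def session_timeouts(lock_timeout="30s", statement_timeout="5min"):
    """Build the SET statements a migration session would issue first.

    Hypothetical helper showing what --lock-timeout/--statement-timeout
    translate to at the PostgreSQL session level.
    """
    return [
        f"SET lock_timeout = '{lock_timeout}';",
        f"SET statement_timeout = '{statement_timeout}';",
    ]

print("\n".join(session_timeouts("10s", "2min")))
```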
Retry Logic
Configure automatic retries for transient failures:
# Retry up to 5 times with exponential backoff
epoch schema apply users --from feature-branch --retries 5
# Configure backoff
epoch schema apply users --from feature-branch \
--retries 5 \
--initial-delay 500ms \
--max-delay 30s
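The retry behavior these flags describe can be sketched as a loop that doubles its delay after each failure, capped at the maximum. `apply_with_retries` below is an illustrative implementation of exponential backoff with jitter, not the CLI's own code.

```python
import random
import time

def apply_with_retries(run, retries=5, initial_delay=0.5, max_delay=30.0):
    """Call run(); on failure, sleep with exponential backoff and retry.

    Mirrors the --retries / --initial-delay / --max-delay flags above.
    """
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            return run()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            # Sleep with a little jitter, then double the delay up to the cap
            time.sleep(delay * (1 + random.uniform(0, 0.1)))
            delay = min(delay * 2, max_delay)
```

Jitter matters when several migration workers retry against the same lock: without it, they wake up in lockstep and collide again.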
Integration with External Tools
pt-online-schema-change (Percona Toolkit for MySQL)
While Horizon Epoch primarily targets PostgreSQL, it can also generate MySQL-compatible DDL for use with MySQL migration tools:
# Generate MySQL-compatible DDL
epoch schema diff main feature-branch --dialect mysql --ddl-output migration.sql
# Use with pt-online-schema-change
pt-online-schema-change \
--execute \
--alter "ADD COLUMN preferences JSON" \
D=mydb,t=users
Recommended workflow:
- Generate migration with Epoch
- Extract ALTER TABLE statements
- Apply using pt-online-schema-change for large tables
- Update Epoch commit after successful application
# After applying via pt-osc
epoch commit -m "Applied preferences column via pt-osc" --rescan-schema
pg_repack (PostgreSQL)
For table reorganization and bloat reduction:
# Identify tables needing repack
epoch schema status --check-bloat
# Output:
# Table | Size | Dead Tuples | Bloat % | Recommendation
# -----------|---------|-------------|---------|----------------
# users | 2.3 GB | 150,000 | 15% | Consider pg_repack
# orders | 8.1 GB | 2,500,000 | 28% | Recommended
# Run pg_repack
pg_repack -d mydb -t users
# Rescan schema after repack
epoch commit -m "Reorganized users table" --rescan-schema
Using pg_repack with index changes:
# Generate index DDL
epoch schema diff main feature-branch --indexes-only --ddl-output indexes.sql
# Create indexes with pg_repack (minimal locking)
pg_repack -d mydb -t users --only-indexes
# Or use CONCURRENTLY option in Epoch
epoch schema apply users --from feature-branch --online
Flyway Integration
Export migrations for Flyway:
# Generate Flyway-compatible migration
epoch schema diff main feature-branch \
--format flyway \
--ddl-output db/migration/V2__add_preferences.sql
# Migration file follows Flyway naming convention
cat db/migration/V2__add_preferences.sql
Liquibase Integration
Export migrations for Liquibase:
# Generate Liquibase changeset
epoch schema diff main feature-branch \
--format liquibase \
--ddl-output db/changelog/changes/002-add-preferences.xml
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">
<changeSet id="002-add-preferences" author="epoch">
<addColumn tableName="users">
<column name="preferences" type="JSONB"/>
</addColumn>
<createIndex indexName="idx_users_preferences" tableName="users">
<column name="preferences" type="gin"/>
</createIndex>
<rollback>
<dropIndex indexName="idx_users_preferences" tableName="users"/>
<dropColumn tableName="users" columnName="preferences"/>
</rollback>
</changeSet>
</databaseChangeLog>
Online Mode for Production
Using CONCURRENTLY
For indexes on large tables, use online mode to avoid blocking:
# Create indexes without blocking writes
epoch schema apply users --from feature-branch --online
# This generates:
# CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
#
# Instead of:
# CREATE INDEX idx_users_email ON users (email);
Limitations of CONCURRENTLY
- Cannot run inside a transaction
- May take longer than regular index creation
- Requires more resources (CPU, I/O)
- A failed build leaves an invalid index behind, which must be dropped or rebuilt
# Check index validity after CONCURRENTLY creation
epoch schema status users --check-indexes
# Output:
# Index | Valid | Size | Last Used
# -------------------------|-------|--------|----------
# idx_users_email | Yes | 45 MB | 2 min ago
# idx_users_preferences | No | 0 MB | Never
If an index is invalid:
# Reindex with CONCURRENTLY
epoch schema apply users --reindex idx_users_preferences --online
Transactional DDL
PostgreSQL supports transactional DDL, allowing atomic migrations:
# Apply migration in a transaction (default)
epoch schema apply users --from feature-branch
# Disable transaction (required for CONCURRENTLY)
epoch schema apply users --from feature-branch --no-transaction --online
What happens on failure:
# With transaction (default):
# - All statements run in one transaction
# - On failure, all changes rolled back
# - Database remains unchanged
# Without transaction (--no-transaction):
# - Statements run individually
# - On failure, partial changes may remain
# - Manual cleanup may be needed
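The difference between the two modes can be demonstrated without a PostgreSQL instance: SQLite also supports transactional DDL, so it can stand in here. The first run wraps both statements in one transaction and rolls everything back when the second fails; the second run executes them individually and the first statement's effect persists.

```python
import sqlite3

statements = [
    "CREATE TABLE prefs (id INTEGER PRIMARY KEY)",
    "CREATE TABLE prefs (id INTEGER PRIMARY KEY)",  # duplicate: fails
]

# With a transaction: all-or-nothing
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
try:
    conn.execute("BEGIN")
    for sql in statements:
        conn.execute(sql)
    conn.execute("COMMIT")
except sqlite3.OperationalError:
    conn.execute("ROLLBACK")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # [] -- the first CREATE TABLE was rolled back too

# Without a transaction: partial changes remain
conn2 = sqlite3.connect(":memory:", isolation_level=None)
for sql in statements:
    try:
        conn2.execute(sql)
    except sqlite3.OperationalError:
        pass

tables2 = [r[0] for r in conn2.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables2)  # ['prefs'] -- manual cleanup would be needed
```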
Table Size Considerations
Small Tables (< 100MB)
Standard migration is safe:
epoch schema apply users --from feature-branch
Medium Tables (100MB - 1GB)
Use lock timeouts:
epoch schema apply users --from feature-branch \
--lock-timeout 10s \
--retries 3
Large Tables (1GB - 10GB)
Use online mode:
epoch schema apply users --from feature-branch \
--online \
--statement-timeout 30m
Very Large Tables (> 10GB)
Consider specialized tools:
# 1. Generate DDL
epoch schema diff main feature-branch --ddl-output migration.sql
# 2. Review for large table operations
epoch schema diff main feature-branch --analyze-impact
# 3. Use pg_repack or manual process
pg_repack -d mydb -t users --no-superuser-check
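The size tiers above reduce to a simple decision function. `pick_strategy` below encodes them as a sketch for scripting around the CLI; it is a hypothetical helper, not part of Epoch itself.

```python
def pick_strategy(table_size_gb):
    """Map a table size in GB to the suggested migration approach above."""
    if table_size_gb < 0.1:
        return "standard"                    # plain epoch schema apply
    if table_size_gb < 1:
        return "lock-timeout + retries"      # --lock-timeout 10s --retries 3
    if table_size_gb < 10:
        return "online"                      # --online --statement-timeout 30m
    return "specialized tooling (pg_repack / manual)"

for size in (0.05, 0.5, 5, 50):
    print(f"{size:>6} GB -> {pick_strategy(size)}")
```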
Python SDK Usage
import asyncio

from horizon_epoch import Client
# Assuming PostgresDdlGenerator lives alongside DdlExecutor in horizon_epoch.ddl
from horizon_epoch.ddl import DdlExecutor, ExecutionConfig, PostgresDdlGenerator

async def generate_migration():
    async with Client.connect("postgresql://localhost/horizon_epoch") as client:
        # Get constraint diff between branches
        diff = await client.diff_constraints("main", "feature-branch", table_name="users")

        # Generate migration
        generator = PostgresDdlGenerator()
        migration = generator.generate_migration("users", diff.constraints)

        # Print statements
        print("UP statements:")
        for stmt in migration.up:
            print(f"  {stmt};")
        print("\nDOWN statements:")
        for stmt in migration.down:
            print(f"  {stmt};")

        # Check for warnings
        if migration.warnings:
            print("\nWARNINGS:")
            for warn in migration.warnings:
                print(f"  - {warn}")

        # Write to file
        migration.write_to_file("migrations/V002__update_users.sql")

        # Execute with safety features
        executor = DdlExecutor(client.connection)
        config = ExecutionConfig(
            dry_run=True,
            lock_timeout_seconds=10,
            statement_timeout_seconds=300,
            online=True,
        )
        result = executor.execute(migration.up, config)
        print(f"Executed {result.statements_run} statements")

asyncio.run(generate_migration())
Best Practices
1. Always Use Dry-Run First
epoch schema apply users --from feature-branch --dry-run
2. Backup Before Destructive Changes
# Export table data
epoch export users --format parquet > backup/users_$(date +%Y%m%d).parquet
# Then apply
epoch schema apply users --from feature-branch --force
3. Use Online Mode for Production
epoch schema apply users --from feature-branch --online
4. Set Appropriate Timeouts
epoch schema apply users --from feature-branch \
--lock-timeout 10s \
--statement-timeout 5m
5. Monitor Long-Running Migrations
# In another terminal
epoch schema apply --status
# Or query PostgreSQL directly
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE 'ALTER%' OR query LIKE 'CREATE INDEX%';
See Also
- Schema Versioning - Schema versioning workflows
- Constraint Reference - Constraint types
- CLI Reference - Schema commands