opencode-workflow/skills/migration-rollout-design/SKILL.md

name: migration-rollout-design
description: Knowledge contract for migration and rollout design. Provides principles and patterns for backward compatibility, rollout strategies, canary deployments, feature flags, schema evolution, and rollback. Referenced by design-architecture when defining migration and rollout strategy.

This is a knowledge contract, not a workflow skill. It provides theoretical guidance that the Architect references when designing migration and rollout strategies. It does not produce artifacts directly.

Core Principles

Backward Compatibility First

  • New versions must coexist with old versions during migration
  • APIs must be backward-compatible until all consumers have migrated
  • Database schemas must support both old and new code during migration
  • Never break existing functionality during migration
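As a rough illustration of API backward compatibility during a migration, the sketch below serves a renamed field under both its old and new name, and reads tolerantly on the consumer side. The field names `fullname` and `display_name` are hypothetical and chosen only for this example.

```python
from typing import Any


def serialize_user(user: dict[str, Any]) -> dict[str, Any]:
    """Build a response that both old and new consumers can read."""
    return {
        "id": user["id"],
        "display_name": user["display_name"],  # new field (hypothetical name)
        # Deprecated alias: keep it until every consumer has migrated.
        "fullname": user["display_name"],
    }


def parse_user_name(payload: dict[str, Any]) -> str:
    """Tolerant reader: prefer the new field, fall back to the old one."""
    if "display_name" in payload:
        return payload["display_name"]
    return payload["fullname"]
```

A consumer written this way keeps working whether it talks to an old or a new producer, which is what lets the two sides migrate independently.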

Incremental Over Big-Bang

  • Migrate incrementally, one step at a time
  • Each step must be independently deployable and reversible
  • Test each step before proceeding to the next
  • Big-bang migrations carry higher risk and are harder to roll back

Rollback by Default

  • Every migration step must have a clear rollback plan
  • Practice rollback before you need it
  • Automated rollback is preferred over manual rollback
  • Feature flags enable instant rollback without deployment

Rollout Strategies

Blue-Green Deployment

  • Maintain two identical environments (blue and green)
  • Deploy new version to the inactive environment
  • Switch traffic from active to inactive environment
  • If issues are detected, switch traffic back
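The switch-and-switch-back behaviour can be modelled in a few lines. This is a minimal in-memory sketch, not a real traffic router; the class name `BlueGreenRouter` and the version labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Tracks which of two environments receives live traffic."""

    active: str = "blue"
    versions: dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": "v1"}
    )

    @property
    def inactive(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def deploy(self, version: str) -> None:
        # New versions only ever go to the environment not serving traffic.
        self.versions[self.inactive] = version

    def switch(self) -> None:
        # Cut all traffic over at once; calling it again switches back.
        self.active = self.inactive

    def live_version(self) -> str:
        return self.versions[self.active]
```

Because the previous version stays deployed in the now-inactive environment, rollback is simply a second call to `switch()`.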
  • Best for: Infrastructure-level deployments with full environment replication

Canary Deployment

  • Route a small percentage of traffic to the new version, then increase it in stages (for example 1%, 5%, 10%, 25%, 50%, 100%)
  • Monitor metrics at each stage before increasing traffic
  • If issues are detected, shift traffic back to the old version
  • Best for: Application-level deployments where you want to test with real traffic gradually
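The staged progression described above might be sketched as a simple loop. The `is_healthy` callback is a stand-in for whatever metric check a real system would use, and the stage list is only an example; actually shifting traffic is left out.

```python
from typing import Callable, Sequence

STAGES: Sequence[int] = (1, 5, 10, 25, 50, 100)


def run_canary(
    is_healthy: Callable[[int], bool],
    stages: Sequence[int] = STAGES,
) -> int:
    """Walk through the stages; return the final percentage on the new version.

    Returns 0 if any stage looks unhealthy, meaning all traffic is shifted
    back to the old version.
    """
    for percent in stages:
        # A real controller would shift traffic here, then wait and observe.
        if not is_healthy(percent):
            return 0
    return stages[-1]
```

The important property is that traffic only increases after the current stage has been observed to be healthy.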

Rolling Deployment

  • Deploy new version to instances one at a time (or in small batches)
  • Old and new versions run side by side during the rollout
  • If issues are detected, stop the rollout and roll back the updated instances
  • Best for: Stateless services where instances can be updated independently
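A rolling update with stop-and-roll-back behaviour could look roughly like this. The fleet is modelled as a plain dict of instance name to version, and `is_healthy` is a hypothetical health probe supplied by the caller.

```python
from typing import Callable


def rolling_update(
    instances: dict[str, str],
    new_version: str,
    is_healthy: Callable[[str], bool],
    batch_size: int = 1,
) -> bool:
    """Update instances batch by batch; revert the updated ones on failure."""
    previous = dict(instances)  # remember what each instance was running
    names = list(instances)
    updated: list[str] = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            instances[name] = new_version
            updated.append(name)
        if not all(is_healthy(name) for name in batch):
            # Stop the rollout and roll back only the instances touched so far.
            for name in updated:
                instances[name] = previous[name]
            return False
    return True
```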

Feature Flag Deployment

  • Deploy new code with features disabled (feature flags set to false)
  • Enable features gradually using feature flags
  • Flags can be enabled per user, per tenant, or for a percentage of traffic
  • If issues are detected, disable the feature flag instantly
  • Best for: Feature-level deployments where you want to decouple code deployment from feature release
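One common way to implement per-user and per-percentage targeting is deterministic hashing, so that a given user always lands in the same bucket. The sketch below assumes string user IDs and a hypothetical allow-list parameter; it is not tied to any particular flag service.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(
    flag: str,
    user_id: str,
    percent: int,
    allow_users: frozenset[str] = frozenset(),
) -> bool:
    """Enable for allow-listed users, or for users whose bucket is below percent."""
    if user_id in allow_users:
        return True
    return bucket(flag, user_id) < percent
```

Because the bucket is stable, raising `percent` only ever adds users; nobody who already has the feature loses it mid-rollout. Setting `percent` to 0 turns the flag off for everyone outside the allow-list.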

Feature Flags

Types of Feature Flags

  • Release flags: Enable/disable new features during rollout (short-lived)
  • Operational flags: Enable/disable operational features (circuit breakers, maintenance mode)
  • Experiment flags: A/B testing and gradual rollout (medium-lived)
  • Permission flags: Enable features for specific users/tenants (long-lived)

Design Considerations

  • Feature flags must not add significant latency (evaluate quickly)
  • Feature flag evaluation must be consistent within a request (don't re-evaluate mid-request)
  • Feature flags must have a defined lifecycle: create, enable, monitor, remove
  • Remove feature flags after full rollout to prevent technical debt
  • Use a feature flag management service (not hardcoded flags)
  • Log feature flag evaluations for debugging
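Consistency within a request and logging of evaluations can both be handled by a small request-scoped cache. In this sketch `evaluate` stands in for a call to whatever flag service is in use; the class name and logger name are assumptions.

```python
import logging
from typing import Callable

logger = logging.getLogger("feature_flags")


class RequestFlags:
    """Evaluates each flag at most once per request and caches the result."""

    def __init__(self, evaluate: Callable[[str], bool]) -> None:
        self._evaluate = evaluate
        self._cache: dict[str, bool] = {}

    def is_enabled(self, flag: str) -> bool:
        if flag not in self._cache:
            self._cache[flag] = self._evaluate(flag)
            # Log once per request so the decision can be traced later.
            logger.info("flag=%s enabled=%s", flag, self._cache[flag])
        return self._cache[flag]
```

Creating one `RequestFlags` per incoming request means a flag flipped halfway through handling cannot produce a mix of old and new behaviour in the same response.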

Feature Flag Rollout

  • Start with 0% (flag off)
  • Enable for internal users (dogfood)
  • Enable for a small percentage of users (canary)
  • Enable for all users (full rollout)
  • Monitor metrics at each stage
  • Remove the flag after full rollout

Schema Evolution

Additive Changes (Safe)

  • Add a new column with a default value
  • Add a new table
  • Add a new index (with caution for large tables)
  • Add a new optional field to an API response
  • Add a new API endpoint

Destructive Changes (Require Migration)

  • Remove a column (requires migration)
  • Rename a column (requires migration)
  • Change a column type (requires migration)
  • Remove a table (requires migration)
  • Remove an API endpoint (requires consumer migration)

Migration Strategy for Destructive Changes

  1. Expand: Add the new structure alongside the old (both exist)
  2. Migrate: Migrate data and code to use the new structure (both exist)
  3. Contract: Remove the old structure (only new exists)

Example: Renaming a column

  1. Add new column, keep old column, dual-write to both
  2. Migrate existing data from old to new column
  3. Update all reads to use new column
  4. Stop writing to the old column, then remove it
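The rename steps can be walked through against an in-memory SQLite database. This is a sketch under stated assumptions: the `users` table and the `fullname` and `display_name` columns are hypothetical, and `ALTER TABLE ... DROP COLUMN` requires SQLite 3.35 or later, so a table rebuild is used as a fallback on older versions.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    # Expand: add the new column next to the old one.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def insert_user(conn: sqlite3.Connection, name: str) -> None:
    # Dual-write: new rows populate both columns while old code still runs.
    conn.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)", (name, name)
    )


def backfill(conn: sqlite3.Connection) -> None:
    # Migrate: copy existing data into the new column.
    conn.execute(
        "UPDATE users SET display_name = fullname WHERE display_name IS NULL"
    )


def contract(conn: sqlite3.Connection) -> None:
    # Contract: drop the old column once nothing reads or writes it.
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        conn.execute("ALTER TABLE users DROP COLUMN fullname")
    else:
        conn.executescript(
            "CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT);"
            "INSERT INTO users_new SELECT id, display_name FROM users;"
            "DROP TABLE users;"
            "ALTER TABLE users_new RENAME TO users;"
        )


def demo() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
    conn.execute("INSERT INTO users (fullname) VALUES ('Ada')")  # old code path
    expand(conn)
    insert_user(conn, "Grace")
    backfill(conn)
    names = [r[0] for r in conn.execute("SELECT display_name FROM users ORDER BY id")]
    contract(conn)
    columns = [r[1] for r in conn.execute("PRAGMA table_info(users)")]
    return names, columns
```

In a real rollout each function would ship as its own deployment, with time in between to verify that reads from the new column are correct before the old one is dropped.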

Database Migration Best Practices

  • Every migration must be reversible (up and down migration)
  • Test migrations against production-like data volumes
  • Run migrations in a transaction when possible
  • For large tables, use online schema change tools (for example pt-online-schema-change or gh-ost for MySQL)
  • Never lock a production table for more than a few seconds during a migration
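A reversible, transactional migration might be represented as a pair of up and down statements applied atomically. The sketch uses SQLite because its DDL is transactional; other databases differ on this point. The `Migration` class and the `orders` table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    name: str
    up: str    # SQL that applies the change
    down: str  # SQL that reverses it


def open_db() -> sqlite3.Connection:
    # isolation_level=None lets us manage BEGIN/COMMIT explicitly.
    return sqlite3.connect(":memory:", isolation_level=None)


def apply(conn: sqlite3.Connection, sql: str) -> None:
    """Run one migration direction atomically: all statements or none."""
    conn.execute("BEGIN")
    try:
        # Naive split on ';' is enough for this sketch.
        for statement in sql.split(";"):
            if statement.strip():
                conn.execute(statement)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def table_names(conn: sqlite3.Connection) -> set[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}


ADD_ORDERS = Migration(
    name="add_orders_table",
    up="CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER NOT NULL)",
    down="DROP TABLE orders",
)
```

Calling `apply(conn, ADD_ORDERS.up)` followed by `apply(conn, ADD_ORDERS.down)` should leave the schema as it started, which is a cheap way to test the down path before it is needed.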

Rollback

Application Rollback

  • Revert to previous deployment version
  • Feature flag disable (instant, no deployment needed)
  • Blue-green switch (instant, requires environment)
  • Canary shift-back (requires redirecting traffic)
  • Rolling redeploy of previous version (requires new deployment)

Database Rollback

  • Run the down migration (reverse of up migration)
  • Restore from backup (for destructive changes without down migration)
  • Feature flag to disable new code that uses new schema (code rollback, schema stays)

Rollback Decision Matrix

| What Failed | Rollback Method | Data Loss Risk |
| --- | --- | --- |
| Application bug | Deploy previous version | None |
| Feature bug | Disable feature flag | None |
| Schema migration bug | Run down migration | Low if reversible |
| Data migration bug | Restore from backup | High if not reversible |
| Integration failure | Circuit breaker / fallback | None |
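If the matrix needs to be consulted by tooling, for example a runbook script, it could be encoded as a lookup table. The enum and key names below are assumptions; the methods and risk levels are copied from the matrix.

```python
from enum import Enum
from typing import NamedTuple


class Failure(Enum):
    APPLICATION_BUG = "application bug"
    FEATURE_BUG = "feature bug"
    SCHEMA_MIGRATION_BUG = "schema migration bug"
    DATA_MIGRATION_BUG = "data migration bug"
    INTEGRATION_FAILURE = "integration failure"


class Rollback(NamedTuple):
    method: str
    data_loss_risk: str


ROLLBACK_MATRIX: dict[Failure, Rollback] = {
    Failure.APPLICATION_BUG: Rollback("Deploy previous version", "None"),
    Failure.FEATURE_BUG: Rollback("Disable feature flag", "None"),
    Failure.SCHEMA_MIGRATION_BUG: Rollback("Run down migration", "Low if reversible"),
    Failure.DATA_MIGRATION_BUG: Rollback("Restore from backup", "High if not reversible"),
    Failure.INTEGRATION_FAILURE: Rollback("Circuit breaker / fallback", "None"),
}


def choose_rollback(failure: Failure) -> Rollback:
    """Look up the rollback method and data loss risk for a failure type."""
    return ROLLBACK_MATRIX[failure]
```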

Anti-Patterns

  • Big-bang migration: Migrating everything at once carries high risk and makes rollback hard
  • Breaking API changes without versioning: Old clients will break
  • Schema migration without backward compatibility: Old code will fail against new schema
  • Deploying without feature flags: Can't instantly roll back if issues are detected
  • Not testing rollback: Rollback must be tested before you need it
  • Removing old code before consumers have migrated: Premature removal breaks dependencies
  • Not monitoring during rollout: Issues must be detected quickly to prevent wider impact