opencode-workflow/skills/data-modeling/SKILL.md


---
name: data-modeling
description: Knowledge contract for defining database schemas, partition keys, indexes, query patterns, denormalization strategy, TTL/caching, and data ownership. Referenced by design-architecture when designing data models.
---

This is a knowledge contract, not a workflow skill. It is referenced by design-architecture when the architect is designing database schemas and data models.

Core Principles

  • Data models must be driven by query and write patterns, not theoretical purity
  • Each table or collection must serve a clear purpose traced to PRD requirements
  • Indexes must be justified by identified query patterns
  • Data ownership must be unambiguous: each data item belongs to exactly one service

Table Definitions

For each table or collection, define:

  • Table name and purpose (traced to PRD requirement)
  • Column definitions:
    • Name
    • Data type
    • Nullable or not null
    • Default value (if any)
    • Constraints (unique, check, etc.)
  • Primary key
  • Foreign keys and relationships
  • Data volume estimates (when relevant for storage selection)
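As a minimal sketch, the fields above can be captured directly in DDL. The `orders` and `users` tables, column names, and the PRD requirement they trace to are all hypothetical, used only to show the shape of a complete definition:

```python
import sqlite3

# Hypothetical table traced to an assumed PRD requirement:
# "users can place orders". Every column carries its type,
# nullability, default, and constraints inline.
DDL = """
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,                          -- primary key
    user_id     INTEGER NOT NULL REFERENCES users(user_id),   -- foreign key
    status      TEXT    NOT NULL DEFAULT 'pending'            -- default value
                CHECK (status IN ('pending', 'shipped', 'cancelled')),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0),    -- check constraint
    created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.execute(DDL)
```

Data volume estimates do not fit in DDL; record them alongside the definition in the design document.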

Index Design

Indexes must be justified by query patterns:

  • Identify the queries this table must support
  • Design indexes to cover those queries
  • Avoid speculative indexes "just in case"
  • Consider write amplification: every index slows writes

Index justification format:

  • Index name
  • Columns (with sort direction)
  • Type (unique, non-unique, partial, composite)
  • Query pattern it serves
  • Estimated selectivity
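The justification format can live as a comment next to the index itself, so the query pattern it serves is never separated from the index. The table, index name, and query pattern below are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created_at TEXT    NOT NULL)""")

# Index name:   idx_orders_user_created
# Columns:      (user_id ASC, created_at DESC)
# Type:         non-unique composite
# Query served: "list a user's most recent orders"
# Selectivity:  high -- user_id narrows the scan to one user's rows
conn.execute("""CREATE INDEX idx_orders_user_created
                ON orders (user_id ASC, created_at DESC)""")
```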

Partition Keys

When designing distributed data stores:

  • Partition key must distribute data evenly across nodes
  • Partition key should align with the most common access pattern
  • Consider hot partition risks
  • Define partition strategy (hash, range, composite)
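A hash partition strategy can be sketched in a few lines. The partition count and key format are assumptions for illustration; the point is that a stable, uniform hash spreads keys evenly, whereas partitioning on a skewed natural key (e.g. tenant ID with one huge tenant) creates hot partitions:

```python
import hashlib

NUM_PARTITIONS = 8  # assumed cluster size, for illustration only

def partition_for(key: str) -> int:
    """Hash partitioning: a stable, uniform hash distributes keys
    evenly across partitions regardless of the key's natural ordering."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Range and composite strategies trade this uniformity for locality: range partitioning keeps adjacent keys together (good for scans, prone to hot spots on monotonically increasing keys), and composite strategies hash one component and range the other.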

Relationships

Define relationships explicitly:

  • One-to-one
  • One-to-many (with foreign key placement)
  • Many-to-many (with junction table)

For each relationship:

  • Direction of access (which side queries the other)
  • Cardinality (exactly N, at most N, unbounded)
  • Nullability (is the relationship optional?)
  • Cascade behavior (what happens on delete?)
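A many-to-many relationship with explicit cascade behavior can be sketched as a junction table. The `students`/`courses` domain is hypothetical; the pattern is the junction table's composite primary key plus `ON DELETE CASCADE` answering the "what happens on delete?" question in the schema itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY);

-- Junction table: many-to-many, unbounded cardinality on both sides.
-- ON DELETE CASCADE: deleting a student (or course) removes its enrollments.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL
               REFERENCES students(student_id) ON DELETE CASCADE,
    course_id  INTEGER NOT NULL
               REFERENCES courses(course_id)   ON DELETE CASCADE,
    PRIMARY KEY (student_id, course_id)
);
""")
```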

Denormalization Strategy

Denormalize when:

  • A query needs data from multiple entities and joins are expensive or unavailable
  • Read frequency significantly exceeds write frequency
  • The denormalized data has a clear source of truth that can be kept in sync

Do not denormalize when:

  • The data changes frequently and consistency is critical
  • Joins are cheap and the data store supports them well
  • The denormalization creates complex synchronization logic
  • There is no clear source of truth

For each denormalized field:

  • Identify the source of truth
  • Define the synchronization mechanism (eventual consistency, sync on read, sync on write)
  • Define the staleness tolerance
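A sync-on-write mechanism can be sketched as a single transaction that updates the source of truth and every denormalized copy together. The `users`/`posts` schema and `author_name` field are hypothetical; staleness tolerance here is zero because both writes commit atomically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- author_name is denormalized from users.name; users is the source of truth.
CREATE TABLE posts (
    post_id     INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    author_name TEXT    NOT NULL
);
""")

def rename_user(conn, user_id, new_name):
    """Sync on write: update the source of truth and the denormalized
    copies in one transaction, so readers never observe a mismatch."""
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE user_id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE posts SET author_name = ? WHERE user_id = ?",
                     (new_name, user_id))
```

Eventual-consistency variants would instead publish a `user_renamed` event and let consumers update their copies, accepting a nonzero staleness window.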

TTL and Caching

TTL (Time-To-Live)

Define TTL for:

  • Ephemeral data (sessions, temporary tokens, idempotency keys)
  • Time-bounded data (logs, analytics, expired records)
  • Data that must be purged after a regulatory period

For each TTL:

  • Duration and basis (absolute time, sliding window, last access)
  • Action on expiration (delete, archive, revoke)
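A sliding-window TTL for ephemeral session data can be sketched as follows. The store, its duration, and the injectable clock are illustrative assumptions; the two TTL decisions above appear as comments where they take effect:

```python
import time

class SessionStore:
    """Sliding-window TTL: each successful read refreshes the expiry.
    Basis: last access. Action on expiration: delete."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._data = {}             # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]     # action on expiration: delete
            return None
        # sliding window: reading the key extends its lifetime
        self._data[key] = (value, self.clock() + self.ttl)
        return value
```

An absolute-time basis would compute `expires_at` once in `put` and never refresh it; an archive-on-expiration action would move the entry elsewhere instead of deleting it.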

Caching

Define caching for:

  • Frequently read, rarely written data
  • Computed aggregates that are expensive to recalculate
  • Data that is accessed across service boundaries

For each cache:

  • Cache type (in-process, distributed, CDN)
  • Invalidation strategy (TTL-based, event-based, write-through)
  • Staleness tolerance
  • Cache miss behavior (stale-while-recompute, block-and-fetch)
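One combination of these choices, an in-process cache with TTL-based invalidation and block-and-fetch miss behavior, can be sketched as below. The class name, loader interface, and injectable clock are assumptions for illustration:

```python
import time

class ReadThroughCache:
    """In-process cache. Invalidation: TTL-based (staleness tolerance
    equals the TTL). Miss behavior: block-and-fetch from the loader."""

    def __init__(self, loader, ttl_seconds: float, clock=time.monotonic):
        self.loader = loader        # callable: key -> value (source of truth)
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._entries = {}          # key -> (value, cached_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, cached_at = entry
            if self.clock() - cached_at < self.ttl:
                return value        # hit within the staleness tolerance
        value = self.loader(key)    # miss (or stale): block and fetch
        self._entries[key] = (value, self.clock())
        return value
```

Event-based or write-through invalidation would replace the TTL check with explicit cache updates on each write to the underlying data.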

Data Ownership

Each piece of data must have exactly one owner:

  • The owning service is the single source of truth
  • Other services access that data via the owner's API or events
  • No service reads directly from another service's data store
  • If data is needed in multiple places, replicate via events with a clear source of truth

Data ownership format:

| Data Item | Owning Service | Access Pattern | Replication Strategy |
| --- | --- | --- | --- |
| ... | ... | ... | ... |

Query Pattern Analysis

For each table, document:

  • Primary query patterns (by which columns/keys is data accessed)
  • Write patterns (insert-heavy, update-heavy, or mixed)
  • Read-to-write ratio (when relevant)
  • Consistency requirements (strong, eventual, or tunable)
  • Scale expectations (rows per day, rows total, growth rate)
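The fields above can be recorded as a structured entry per table. Every value below is a hypothetical example for an assumed `orders` table, not a prescription:

```python
# Hypothetical query-pattern record for an "orders" table.
orders_query_patterns = {
    "table": "orders",
    "primary_queries": [
        "orders by user_id, newest first",
        "single order by order_id (point lookup)",
    ],
    "write_pattern": "insert-heavy",   # inserts dominate; status updates are rare
    "read_write_ratio": "20:1",
    "consistency": "strong",           # order totals must never read stale
    "scale": {"rows_per_day": 50_000, "rows_total": 10_000_000,
              "growth": "linear"},
}
```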

This analysis drives:

  • Index selection
  • Partition key selection
  • Storage engine selection
  • Denormalization decisions

Anti-Patterns

  • Tables without a clear PRD requirement
  • Indexes without a documented query pattern
  • Shared tables across service boundaries
  • Premature denormalization without a read/write justification
  • Missing foreign key constraints where referential integrity is required
  • Data models that assume a specific storage engine without justification