opencode-workflow/skills/data-modeling/SKILL.md


---
name: data-modeling
description: Knowledge contract for defining database schemas, partition keys, indexes, query patterns, denormalization strategy, TTL/caching, and data ownership. Referenced by design-architecture when designing data models.
---

This is a knowledge contract, not a workflow skill. It is referenced by design-architecture when the architect is designing database schemas and data models.

Core Principles

  • Data models must be driven by query and write patterns, not theoretical purity
  • Each table or collection must serve a clear purpose traced to PRD requirements
  • Indexes must be justified by identified query patterns
  • Data ownership must be unambiguous: each data item belongs to exactly one service

Table Definitions

For each table or collection, define:

  • Table name and purpose (traced to PRD requirement)
  • Column definitions:
    • Name
    • Data type
    • Nullable or not null
    • Default value (if any)
    • Constraints (unique, check, etc.)
  • Primary key
  • Foreign keys and relationships
  • Data volume estimates (when relevant for storage selection)
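As a minimal sketch, the fields above can be captured directly in DDL. The `orders` and `users` tables, column names, and the PRD requirement they trace to are all hypothetical, used only to show the shape of a complete definition:

```python
import sqlite3

# Hypothetical table traced to an assumed PRD requirement:
# "users can place orders". Every column carries its type,
# nullability, default, and constraints inline.
DDL = """
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,                          -- primary key
    user_id     INTEGER NOT NULL REFERENCES users(user_id),   -- foreign key
    status      TEXT    NOT NULL DEFAULT 'pending'            -- default value
                CHECK (status IN ('pending', 'shipped', 'cancelled')),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0),    -- check constraint
    created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.execute(DDL)
```

Data volume estimates do not fit in DDL; record them alongside the definition in the design document.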

Index Design

Indexes must be justified by query patterns:

  • Identify the queries this table must support
  • Design indexes to cover those queries
  • Avoid speculative indexes "just in case"
  • Consider write amplification: every index slows writes

Index justification format:

  • Index name
  • Columns (with sort direction)
  • Type (unique, non-unique, partial, composite)
  • Query pattern it serves
  • Estimated selectivity
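The justification format can live as a comment next to the index itself, so the query pattern it serves is never separated from the index. The table, index name, and query pattern below are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created_at TEXT    NOT NULL)""")

# Index name:   idx_orders_user_created
# Columns:      (user_id ASC, created_at DESC)
# Type:         non-unique composite
# Query served: "list a user's most recent orders"
# Selectivity:  high -- user_id narrows the scan to one user's rows
conn.execute("""CREATE INDEX idx_orders_user_created
                ON orders (user_id ASC, created_at DESC)""")
```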

Partition Keys

When designing distributed data stores:

  • Partition key must distribute data evenly across nodes
  • Partition key should align with the most common access pattern
  • Consider hot partition risks
  • Define partition strategy (hash, range, composite)
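A hash partition strategy can be sketched in a few lines. The partition count and key format are assumptions for illustration; the point is that a stable, uniform hash spreads keys evenly, whereas partitioning on a skewed natural key (e.g. tenant ID with one huge tenant) creates hot partitions:

```python
import hashlib

NUM_PARTITIONS = 8  # assumed cluster size, for illustration only

def partition_for(key: str) -> int:
    """Hash partitioning: a stable, uniform hash distributes keys
    evenly across partitions regardless of the key's natural ordering."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```

Range and composite strategies trade this uniformity for locality: range partitioning keeps adjacent keys together (good for scans, prone to hot spots on monotonically increasing keys), and composite strategies hash one component and range the other.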

Relationships

Define relationships explicitly:

  • One-to-one
  • One-to-many (with foreign key placement)
  • Many-to-many (with junction table)

For each relationship:

  • Direction of access (which side queries the other)
  • Cardinality (exactly N, at most N, unbounded)
  • Nullability (is the relationship optional?)
  • Cascade behavior (what happens on delete?)
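A many-to-many relationship with explicit cascade behavior can be sketched as a junction table. The `students`/`courses` domain is hypothetical; the pattern is the junction table's composite primary key plus `ON DELETE CASCADE` answering the "what happens on delete?" question in the schema itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY);

-- Junction table: many-to-many, unbounded cardinality on both sides.
-- ON DELETE CASCADE: deleting a student (or course) removes its enrollments.
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL
               REFERENCES students(student_id) ON DELETE CASCADE,
    course_id  INTEGER NOT NULL
               REFERENCES courses(course_id)   ON DELETE CASCADE,
    PRIMARY KEY (student_id, course_id)
);
""")
```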

Denormalization Strategy

Denormalize when:

  • A query needs data from multiple entities and joins are expensive or unavailable
  • Read frequency significantly exceeds write frequency
  • The denormalized data has a clear source of truth that can be kept in sync

Do not denormalize when:

  • The data changes frequently and consistency is critical
  • Joins are cheap and the data store supports them well
  • The denormalization creates complex synchronization logic
  • There is no clear source of truth

For each denormalized field:

  • Identify the source of truth
  • Define the synchronization mechanism (eventual consistency, sync on read, sync on write)
  • Define the staleness tolerance
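A sync-on-write mechanism can be sketched as a single transaction that updates the source of truth and every denormalized copy together. The `users`/`posts` schema and `author_name` field are hypothetical; staleness tolerance here is zero because both writes commit atomically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- author_name is denormalized from users.name; users is the source of truth.
CREATE TABLE posts (
    post_id     INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    author_name TEXT    NOT NULL
);
""")

def rename_user(conn, user_id, new_name):
    """Sync on write: update the source of truth and the denormalized
    copies in one transaction, so readers never observe a mismatch."""
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE user_id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE posts SET author_name = ? WHERE user_id = ?",
                     (new_name, user_id))
```

Eventual-consistency variants would instead publish a `user_renamed` event and let consumers update their copies, accepting a nonzero staleness window.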

TTL and Caching

TTL (Time-To-Live)

Define TTL for:

  • Ephemeral data (sessions, temporary tokens, idempotency keys)
  • Time-bounded data (logs, analytics, expired records)
  • Data that must be purged after a regulatory period

For each TTL:

  • Duration and basis (absolute time, sliding window, last access)
  • Action on expiration (delete, archive, revoke)
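A sliding-window TTL for ephemeral session data can be sketched as follows. The store, its duration, and the injectable clock are illustrative assumptions; the two TTL decisions above appear as comments where they take effect:

```python
import time

class SessionStore:
    """Sliding-window TTL: each successful read refreshes the expiry.
    Basis: last access. Action on expiration: delete."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._data = {}             # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]     # action on expiration: delete
            return None
        # sliding window: reading the key extends its lifetime
        self._data[key] = (value, self.clock() + self.ttl)
        return value
```

An absolute-time basis would compute `expires_at` once in `put` and never refresh it; an archive-on-expiration action would move the entry elsewhere instead of deleting it.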

Caching

Define caching for:

  • Frequently read, rarely written data
  • Computed aggregates that are expensive to recalculate
  • Data that is accessed across service boundaries

For each cache:

  • Cache type (in-process, distributed, CDN)
  • Invalidation strategy (TTL-based, event-based, write-through)
  • Staleness tolerance
  • Cache miss behavior (stale-while-recompute, block-and-fetch)
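One combination of these choices, an in-process cache with TTL-based invalidation and block-and-fetch miss behavior, can be sketched as below. The class name, loader interface, and injectable clock are assumptions for illustration:

```python
import time

class ReadThroughCache:
    """In-process cache. Invalidation: TTL-based (staleness tolerance
    equals the TTL). Miss behavior: block-and-fetch from the loader."""

    def __init__(self, loader, ttl_seconds: float, clock=time.monotonic):
        self.loader = loader        # callable: key -> value (source of truth)
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._entries = {}          # key -> (value, cached_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None:
            value, cached_at = entry
            if self.clock() - cached_at < self.ttl:
                return value        # hit within the staleness tolerance
        value = self.loader(key)    # miss (or stale): block and fetch
        self._entries[key] = (value, self.clock())
        return value
```

Event-based or write-through invalidation would replace the TTL check with explicit cache updates on each write to the underlying data.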

Data Ownership

Each piece of data must have exactly one owner:

  • The owning service is the single source of truth
  • Other services access that data via the owner's API or events
  • No service reads directly from another service's data store
  • If data is needed in multiple places, replicate via events with a clear source of truth

Data ownership format:

| Data Item | Owning Service | Access Pattern | Replication Strategy |
| --- | --- | --- | --- |
| ... | ... | ... | ... |

Query Pattern Analysis

For each table, document:

  • Primary query patterns (by which columns/keys is data accessed)
  • Write patterns (insert-heavy, update-heavy, or mixed)
  • Read-to-write ratio (when relevant)
  • Consistency requirements (strong, eventual, or tunable)
  • Scale expectations (rows per day, rows total, growth rate)
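The fields above can be recorded as a structured entry per table. Every value below is a hypothetical example for an assumed `orders` table, not a prescription:

```python
# Hypothetical query-pattern record for an "orders" table.
orders_query_patterns = {
    "table": "orders",
    "primary_queries": [
        "orders by user_id, newest first",
        "single order by order_id (point lookup)",
    ],
    "write_pattern": "insert-heavy",   # inserts dominate; status updates are rare
    "read_write_ratio": "20:1",
    "consistency": "strong",           # order totals must never read stale
    "scale": {"rows_per_day": 50_000, "rows_total": 10_000_000,
              "growth": "linear"},
}
```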

This analysis drives:

  • Index selection
  • Partition key selection
  • Storage engine selection
  • Denormalization decisions

Anti-Patterns

  • Tables without a clear PRD requirement
  • Indexes without a documented query pattern
  • Shared tables across service boundaries
  • Premature denormalization without a read/write justification
  • Missing foreign key constraints where referential integrity is required
  • Data models that assume a specific storage engine without justification