13 changes: 13 additions & 0 deletions .github/workflows/skill-review.yml
@@ -0,0 +1,13 @@
name: Skill Review
on:
  pull_request:
    paths: ['**/SKILL.md']
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: tesslio/skill-review@22e928dd837202b2b1d1397e0114c92e0fae5ead # main
27 changes: 0 additions & 27 deletions skills/commit-forge/SKILL.md
@@ -121,30 +121,3 @@ git show --stat

Confirm the commit looks right.

## What Makes a Good Commit

| Good | Bad |
|------|-----|
| One logical change | Multiple unrelated changes |
| All tests pass | Breaks something |
| Message says why | Message says "update stuff" |
| Can be reverted cleanly | Reverting would break other things |
| Specific files staged | `git add .` |

## What Makes a Bad Commit Message

- "fix stuff" / "update" / "changes" / "WIP"
- Using "and" to describe two unrelated things
- Describing how instead of what/why
- Over 72 characters in the subject line

## Checklist

Before committing:

- [ ] Changes are atomic (one logical unit)
- [ ] Specific files staged (not `git add .`)
- [ ] Tests pass (if applicable)
- [ ] Message follows `type(scope): description` format
- [ ] No co-author tags
- [ ] No sensitive files staged (.env, credentials, keys)
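
The message rules above (72-character subject limit, `type(scope): description` format, no vague subjects) can be checked mechanically. A minimal sketch, assuming a hypothetical `check_subject` helper; the regex and the vague-phrase set are illustrative choices, not part of the skill:

```python
import re

# Assumed pattern for `type(scope): description`; the skill does not define one.
SUBJECT_RE = re.compile(r"^[a-z]+\([\w./-]+\): \S.*$")
# Vague subjects taken from the "bad commit message" examples above.
VAGUE = {"fix stuff", "update", "update stuff", "changes", "wip"}


def check_subject(subject: str) -> list[str]:
    """Return a list of problems found in a commit subject line."""
    problems = []
    if subject.strip().lower() in VAGUE:
        problems.append("vague message")
    if len(subject) > 72:
        problems.append("subject over 72 characters")
    if not SUBJECT_RE.match(subject):
        problems.append("does not match type(scope): description")
    return problems
```

An empty list means the subject passes these three checks; atomicity, staged files, and test status still need a human or a separate hook.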
44 changes: 44 additions & 0 deletions skills/model-forge/CODEX.md
@@ -0,0 +1,44 @@
# Codex Reference

Detailed guidance for using gpt-5.4 (codex) effectively in a claude + codex hybrid workflow.

## Failure Modes

Codex is **too literal**. It will fulfill the exact request and break adjacent things. Real example from HN: asked to "fix compiler warnings", it made a bunch of values nullable to silence the warnings - technically correct, broke data integrity downstream.

**Codex will fail at (NEVER use it for):**
- **UI work of any kind** - layouts, components, styling, UX, design judgment. Codex has zero taste and no eye for visual hierarchy. Always claude.
- **Anything you need to discuss** - "yo i need to talk about this", brainstorming, exploration, "what do you think", trade-off conversations. Codex is a one-shot executor, not a collaborator. Always claude.
- **Reading user intent** - "make it better" → won't make leaps, will pick the most literal interpretation
- **Architecture/product judgment** - no sense of tradeoffs beyond what's written
- **Obscure libraries** - hallucinates rather than admitting unknown
- **Multi-file dependency chains** - gets stuck in circles
- **Continuous conversation context** - context degrades fast, not built for back-and-forth

**Codex will excel at:**
- Tasks where being literal is a feature (test writing, mechanical refactors)
- Clearly-bounded specs with acceptance criteria
- Adversarial passes with explicit focus arguments

## Prompt Requirements

- One concern per prompt (don't bundle)
- Explicit scope boundaries (which files, what behavior, what done looks like)
- Structured sections: General → Autonomy → Code Implementation → Editing Constraints → Exploration → Plan Tool
- Use the positional focus argument: `/codex:adversarial-review challenge whether this caching design was right`
- Heavy XML structure beats prose for multi-step specs
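
One way to keep prompts in that section order with XML structure is to render them from a dict. A rough sketch, assuming a hypothetical `build_codex_prompt` helper; the tag names are made up for illustration and are not a codex requirement:

```python
# Section order taken from the list above; the tag names are hypothetical.
SECTION_ORDER = [
    "general",
    "autonomy",
    "code_implementation",
    "editing_constraints",
    "exploration",
    "plan_tool",
]


def build_codex_prompt(sections: dict[str, str]) -> str:
    """Render the provided sections as XML-tagged blocks in the fixed order."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    blocks = []
    for name in SECTION_ORDER:
        body = sections.get(name)
        if body:  # skip sections the caller did not provide
            blocks.append(f"<{name}>\n{body.strip()}\n</{name}>")
    return "\n\n".join(blocks)
```

Rejecting unknown section names is a design choice here: it catches typos early instead of silently dropping a constraint from the prompt.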

## Optimal Workflow

The pattern that works (claude + codex hybrid):

1. **Claude builds** - architecture, exploration, back-and-forth iteration
2. **Codex reviews** - dispatch `/codex:adversarial-review` with a specific focus after a major change. 6-10 min wait, actionable findings
3. **Claude filters** - triage codex output: "which are real issues vs noise, which need design decisions"
4. **Codex fixes** - confirmed bugs with clear specs get delegated back to codex
5. **Batch maintenance** - queue codex with 3-5 P2 tasks at session start, work on real stuff while it churns

What doesn't work:
- Letting codex drive architecture decisions
- Vague task descriptions
- Auto-running codex on every claude response (drains usage fast)
56 changes: 7 additions & 49 deletions skills/model-forge/SKILL.md
@@ -144,58 +144,16 @@ Adversarial engineer with different priors than claude. Mechanically excellent o
| bug hunt | codex | worktree | no | yes |
| security audit | codex | feat | no | yes |

## Codex-Specific Failure Modes

Codex is **too literal**. It will fulfill the exact request and break adjacent things. Real example from HN: asked to "fix compiler warnings", it made a bunch of values nullable to silence the warnings - technically correct, broke data integrity downstream.

**Codex will fail at (NEVER use it for):**
- **UI work of any kind** - layouts, components, styling, UX, design judgment. Codex has zero taste and no eye for visual hierarchy. Always claude.
- **Anything you need to discuss** - "yo i need to talk about this", brainstorming, exploration, "what do you think", trade-off conversations. Codex is a one-shot executor, not a collaborator. Always claude.
- **Reading user intent** - "make it better" → won't make leaps, will pick the most literal interpretation
- **Architecture/product judgment** - no sense of tradeoffs beyond what's written
- **Obscure libraries** - hallucinates rather than admitting unknown
- **Multi-file dependency chains** - gets stuck in circles
- **Continuous conversation context** - context degrades fast, not built for back-and-forth

**Codex will excel at:**
- Tasks where being literal is a feature (test writing, mechanical refactors)
- Clearly-bounded specs with acceptance criteria
- Adversarial passes with explicit focus arguments

**Prompt requirements for codex:**
- One concern per prompt (don't bundle)
- Explicit scope boundaries (which files, what behavior, what done looks like)
- Structured sections: General → Autonomy → Code Implementation → Editing Constraints → Exploration → Plan Tool
- Use the positional focus argument: `/codex:adversarial-review challenge whether this caching design was right`
- Heavy XML structure beats prose for multi-step specs

## Optimal Codex Workflow

The pattern that works (claude + codex hybrid):

1. **Claude builds** - architecture, exploration, back-and-forth iteration
2. **Codex reviews** - dispatch `/codex:adversarial-review` with a specific focus after a major change. 6-10 min wait, actionable findings
3. **Claude filters** - triage codex output: "which are real issues vs noise, which need design decisions"
4. **Codex fixes** - confirmed bugs with clear specs get delegated back to codex
5. **Batch maintenance** - queue codex with 3-5 P2 tasks at session start, work on real stuff while it churns

What doesn't work:
- Letting codex drive architecture decisions
- Vague task descriptions
- Auto-running codex on every claude response (drains usage fast)
## Codex Reference

For codex failure modes, prompt requirements, and the optimal claude + codex hybrid workflow, read `CODEX.md`.

## Anti-patterns

1. **Codex for time-sensitive work** - 10+ min latency kills interactive flow
2. **Codex with vague prompts** - it will interpret literally and break things
3. **Opus for trivial tool calls** - haiku does it at 5% cost
4. **Worktrees for non-overlapping single-agent work** - merge overhead with no upside
5. **Big refactors on main** - hard rollback, polluted history
6. **Parallel agents on same files without worktrees** - concurrent write conflicts
7. **Dispatching codex synchronously** - always background, never wait
8. **Forgetting `--model gpt-5.4`** - default selection is unreliable
9. **Codex for UI work** - zero taste, no visual judgment, will produce bricks
10. **Codex when you need to discuss/brainstorm** - it's a one-shot executor, not a collaborator
1. **Opus for trivial tool calls** - haiku does it at 5% cost
2. **Worktrees for non-overlapping single-agent work** - merge overhead with no upside
3. **Big refactors on main** - hard rollback, polluted history
4. **Parallel agents on same files without worktrees** - concurrent write conflicts

## Examples

34 changes: 12 additions & 22 deletions skills/prompt-forge/SKILL.md
@@ -1,6 +1,6 @@
---
name: prompt-forge
description: Universal prompt engineering guide for writing, reviewing, and optimizing LLM prompts across Claude and OpenAI models. Use when writing system prompts, designing extraction pipelines, building classification or summarization prompts, optimizing for cost/latency, reviewing existing prompts for quality, or any task involving prompt design for production AI systems. Trigger on keywords like "prompt", "system prompt", "few-shot", "extraction prompt", "prompt engineering", "prompt review", or when the user is building any AI-powered feature that needs a well-crafted prompt.
description: "Write, review, and optimize LLM prompts across Claude and OpenAI models - design extraction pipelines, build classification or summarization prompts, optimize for cost/latency, and audit existing prompts for quality. Use when writing system prompts, designing extraction pipelines, building classification or summarization prompts, or reviewing prompts. Trigger on 'prompt', 'system prompt', 'few-shot', 'extraction prompt', 'prompt engineering', 'prompt review'. NOT for general AI application architecture or LLM integration plumbing."
---

# Prompt Forge
@@ -13,17 +13,17 @@ For SDK code examples and implementation patterns, read `references/code-pattern

1. **Be Explicit, Not Implicit.** Treat every prompt like onboarding a new hire - spell out role, task, constraints, output format, and edge cases.

2. **Structure Beats Prose.** Structured prompts with clear sections outperform wall-of-text instructions. Use XML tags for Claude, markdown headers or XML for GPT.
2. **Structure Beats Prose.** Use XML tags for Claude, markdown headers or XML for GPT.

3. **Show, Don't Just Tell.** Few-shot examples are the single highest-leverage technique. 3-5 examples covering happy paths and edge cases. With prompt caching, afford 20+.
3. **Show, Don't Just Tell.** 3-5 few-shot examples covering happy paths and edge cases. With prompt caching, afford 20+.

4. **Constrain the Output Space.** Define exactly what success looks like. Schemas, templates, or format specs. Tighter output contract = more reliable results.

5. **Null Over Hallucination.** For extraction tasks, always instruct the model to return null for missing fields rather than guessing.

6. **Positive Instructions Over Negative.** "Write in plain prose paragraphs" beats "Don't use markdown". Tell the model what TO do.

7. **Order Matters.** Place long documents ABOVE instructions. Place critical instructions at beginning and end (primacy and recency effects).
7. **Order Matters.** Place long documents ABOVE instructions. Place critical instructions at beginning and end.

## Prompt Section Ordering

@@ -43,12 +43,6 @@ System prompts are not monolithic strings. They are ordered arrays of sections.
10. **User instructions** - user-provided rules (config files, overrides)
11. **Memory** - persistent cross-session context

**Why this order works:**
- Static sections first = cacheable prefix (cache matches prefixes)
- Identity and rules before tools = model internalizes constraints before seeing capabilities
- User instructions AFTER defaults but marked as overrides = user can override any default
- Dynamic sections last = only the tail changes between turns, maximizing cache hits
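
The ordering rationale above can be illustrated with a toy assembler that puts static sections first, so consecutive turns share a long leading prefix. This is a sketch only: `Section`, `assemble`, and `shared_prefix_len` are hypothetical names, and real provider caches match on tokens or blocks rather than raw characters.

```python
from dataclasses import dataclass
from os.path import commonprefix


@dataclass
class Section:
    name: str
    text: str
    dynamic: bool = False  # True if the text changes between turns


def assemble(sections: list[Section]) -> str:
    """Join sections with static ones first, preserving relative order."""
    ordered = sorted(sections, key=lambda s: s.dynamic)  # stable sort
    return "\n\n".join(s.text for s in ordered)


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of two prompts."""
    return len(commonprefix([a, b]))
```

With this layout, changing only a dynamic section such as memory leaves the whole static block inside the shared prefix.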

**User instruction override pattern:**
```
User instructions are shown below. Be sure to adhere to these instructions.
@@ -131,18 +125,6 @@ Preprocess inputs: strip signatures, disclaimers, HTML, whitespace. Set max_char
### Code Generation
Role + `<conventions>` (existing patterns, stack, style) + `<rules>` (scope tightly, error handling, follow patterns) + `<context>` (relevant existing code).

### Multi-Step Reasoning (ReAct)
```
For each step:
1. THOUGHT: reason about what information you need
2. ACTION: call the appropriate tool
3. OBSERVATION: analyze the result
4. Repeat until you have enough to answer
5. ANSWER: provide the final response
```

Use Claude's extended thinking or GPT's reasoning_effort for complex reasoning rather than forcing CoT when the model natively supports it.

### Agent Delegation
Subagents should NOT inherit the full parent prompt. Strip to: identity, task scope, constraints, environment. For full patterns, read `references/agentic-patterns.md`.

@@ -192,6 +174,14 @@ Build confidence tracking into schemas: per-field confidence (HIGH/MEDIUM/LOW/MI

- **Batch API:** 50% discount for non-realtime workloads (both providers)

## Development Workflow

1. **Draft** - define role, rules, output format, and examples using the patterns above
2. **Test** - run against 5+ real inputs covering happy path, edge cases, and malformed input
3. **Evaluate** - score outputs against expected results, identify weak fields or failure modes
4. **Iterate** - add examples targeting weak spots, tighten constraints, adjust model tier
5. **Ship** - enable caching, configure model tiering, set up monitoring
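
The Test and Evaluate steps above can start as a few lines of scoring code. A minimal sketch, assuming extraction outputs are flat dicts; `score_fields`, `weak_fields`, and the 0.8 threshold are hypothetical, not part of the skill:

```python
def score_fields(expected: list[dict], actual: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy across paired expected/actual outputs."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for exp, act in zip(expected, actual):
        for field, want in exp.items():
            totals[field] = totals.get(field, 0) + 1
            # A missing key reads as None, so "null over hallucination" scores as a hit
            # when the expected value is None.
            if act.get(field) == want:
                hits[field] = hits.get(field, 0) + 1
    return {f: hits.get(f, 0) / totals[f] for f in totals}


def weak_fields(scores: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Fields whose accuracy falls below the (assumed) threshold."""
    return sorted(f for f, s in scores.items() if s < threshold)
```

The fields returned by `weak_fields` are the ones to target with extra examples or tighter constraints in the Iterate step.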

## Prompt Checklist

### Structure
16 changes: 3 additions & 13 deletions skills/readme-forge/SKILL.md
@@ -1,6 +1,6 @@
---
name: readme-forge
description: Generate a README in alxx's personal style. Use when user says "write a README", "create README", "readme for this project", "update README", or needs a repo README.
description: "Generate a README with project overview, quickstart, features, and architecture diagram in alxx's personal style. Use when user says 'write a README', 'create README', 'readme for this project', 'update README', or needs a repo README."
disable-model-invocation: false
argument-hint: "[repo path or leave blank for current]"
---
@@ -112,22 +112,12 @@ One line linking to LICENSE file. e.g. `[MIT](LICENSE)`

These are non-negotiable:

- **h1 or banner** - either `# Name` or full-width banner image with `---` separator
- **Bold tagline** directly under h1/banner, no gap
- **Overview** - first section is always Overview. Use the 🚀 emoji only when a banner is present, plain `## Overview` otherwise
- **Blockquote pitch** for the audience, in or after overview
- **Bold analogy** in overview (comparing to something familiar)
- **Bold italic** (`***text***`) for punchy one-liners
- **Bold feature names** with dash separator in features list
- **Mermaid diagram** - always include one, flowchart LR preferred
- **Quickstart must be copy-paste** - someone should be able to run every command in order
- **FAQ is optional** - rarely used, only include for projects with common objections (e.g. "why not X?"). if something needs explaining, prefer putting it in the relevant section
- **No badges** unless the project has CI set up
- **Bold key phrases in longer text** - in paragraphs, `**bold**` the important terms/concepts so readers can scan (e.g. "Nebula is **Git for agent context & tasks**")
- **Bold key phrases in longer text** - `**bold**` important terms so readers can scan
- **Dense, no fluff** - every sentence carries information
- **No "Getting Started" header** - use "Quickstart" for apps/CLIs, "Install" for packages/skills/libraries
- **No separate "Installation" header** - fold into Quickstart or Install
- **No "Usage" header** - fold into Quickstart or Features
- **No separate "Installation" or "Usage" headers** - fold into Quickstart or Features
- **No long dashes** - never use "—" (em dash), use "-" or commas instead
- **No AI slop** - no "furthermore", "it's worth noting", "comprehensive"
- **Casual but professional** - not corporate, not too informal for a repo README
14 changes: 1 addition & 13 deletions skills/skill-forge/SKILL.md
@@ -1,6 +1,6 @@
---
name: skill-forge
description: Write or improve an LLM agent skill. Use when user says "create skill", "write a skill", "improve this skill", "new skill", "skill for X", or wants to build a SKILL.md file.
description: "Write or improve an LLM agent skill - generate YAML frontmatter, define trigger conditions, structure step-by-step instructions, and validate against best practices. Use when user says 'create skill', 'write a skill', 'improve this skill', 'new skill', 'skill for X', or wants to build a SKILL.md file."
disable-model-invocation: true
argument-hint: "[skill name or path to existing SKILL.md]"
---
@@ -71,8 +71,6 @@ description: ... # THE most important field. See below.

## Step 4: Write the Description

The description is what Claude pattern-matches against to decide whether to load the skill. There is NO algorithmic routing - Claude's language model reads all descriptions and semantically matches.

### Formula

1. **What it does** - one sentence, outcome-first
@@ -165,13 +163,3 @@ Run this checklist on the generated skill:
- [ ] Steps are numbered when order matters
- [ ] Supporting files referenced if body would exceed 500 lines

## Common Mistakes

1. **Vague descriptions** - "Use when user wants to plan" triggers on everything
2. **Missing `disable-model-invocation: true` on side-effect skills** - Claude will auto-commit, auto-deploy
3. **Body too long** - 500+ lines burns context. Split into supporting files.
4. **Redundant description + body** - description matches, body instructs. Don't duplicate.
5. **`context: fork` on reference content** - subagent gets instructions with no actionable task
6. **Over-explaining in description** - the description isn't the instructions
7. **Missing NOT conditions** - for behavioral modes, add explicit "NEVER activate unless..."
8. **`user-invocable: false` on things users should manually trigger** - confusing