A curated collection of agents and skills for AI coding assistants. Plug them into Claude Code, Codex, or any agent that supports the .claude/ skill convention and instantly level up your development workflow.
- Why Agentarium?
- What's Inside
- Quick Start
- Agent Reference
- Skill Reference
- Architecture
- Configuration
- Creating Your Own Skills
- Contributing
- License
AI coding assistants are powerful out of the box, but they lack procedural memory -- the step-by-step knowledge of how to do specialized tasks reliably. Agentarium fills that gap with:
- Battle-tested workflows -- multi-phase implementation, review, and deployment pipelines that enforce quality gates
- Security-first defaults -- automated secret scanning and sensitive file protection on every push
- Composable pipelines -- chain skills together with skill-graph to orchestrate complex multi-step tasks
- Self-improving skills -- capture-learnings and skill-creator-v2 create a feedback loop where your skills get better over time
Agents are autonomous sub-processes that handle complex, multi-step tasks. They run in isolated contexts and return structured results.
| Agent | Model | Description |
|---|---|---|
| code-implementation | Opus | Plans, proposes, and implements code with approval gates and sub-agent delegation |
| code-reviewer | Inherit | Reviews completed work against plans, standards, and architectural patterns |
| integration-test-validator | Sonnet | Three-tier testing (unit, integration, system) with structured pass/fail reports |
| security-scanner | Inherit | Six-phase security audit covering secrets, OWASP, dependencies, and guardrails |
Skills are modular instruction sets that guide the agent through specialized workflows. They load on-demand and stay lean.
| Skill | Description |
|---|---|
| code-implementation | Full-stack feature implementation with TDD, planning, and code review |
| gitpush | Safe push workflow with repo/branch confirmation, secret scanning, and deploy options |
| skill-creator | Step-by-step guide for building new skills with scripts, references, and assets |
| skill-creator-v2 | Benchmark-driven skill creation with A/B testing via isolated sub-agents |
| skill-graph | Chain multiple skills into a Mermaid-rendered pipeline with approval gates |
| skill-guard | Intercept skill installs to detect overlap; audit existing skills for redundancy |
| capture-learnings | Extract bugs, gotchas, patterns, and decisions from sessions into learnings.md |
| find-skills | Discover and install skills from the open ecosystem via npx skills |
| screen-recording | Automated browser or Mac app screen capture with post-processing via Remotion |
git clone https://github.com/ashcastelinocs124/Agentarium.git && cd Agentarium && bash setup.sh
The setup script will:
- Detect your agent (Claude Code or Codex)
- Ask where to install -- global (~/.claude/skills/), project-local (./.claude/skills/), or custom path
- Collect your GitHub identity for commit attribution in the gitpush skill
- Copy and personalize all skills and agents to your chosen directory
If you prefer to pick and choose:
# Clone the repo
git clone https://github.com/ashcastelinocs124/Agentarium.git
cd Agentarium
# Make sure the target directories exist before copying
mkdir -p ~/.claude/skills ~/.claude/agents
# Copy a single skill
cp -r skills/gitpush ~/.claude/skills/
# Copy a single agent
cp agents/code-reviewer.md ~/.claude/agents/
# Copy everything
cp -r skills/* ~/.claude/skills/
cp agents/*.md ~/.claude/agents/
Model: Opus | Color: Red
An elite implementation engineer that follows a strict 7-phase workflow:
- Analysis & Planning -- break down requirements, identify components, consider edge cases
- Proposal & Suggestions -- present approach with design decisions, alternatives, and trade-offs
- Approval & Refinement -- wait for explicit user approval before coding
- Checklist Creation -- specific, measurable tasks with acceptance criteria
- Implementation -- work through checklist methodically; delegate heavy subtasks to sub-agents
- Quality Assurance -- comprehensive self-review for requirements, quality, and edge cases
- Completion & Documentation -- summary, usage examples, and next steps
Principles: SOLID, DRY, KISS, YAGNI, Separation of Concerns, Defensive Programming, Security First.
When to use: Feature development, refactoring, complex bug fixes, multi-file changes.
Model: Inherit
A senior code reviewer that validates completed work across five dimensions:
- Plan Alignment -- implementation vs. original plan; justified vs. problematic deviations
- Code Quality -- error handling, type safety, conventions, maintainability
- Architecture & Design -- SOLID principles, separation of concerns, scalability
- Documentation & Standards -- comments, file headers, project-specific conventions
- Issue Classification -- Critical (must fix), Important (should fix), Suggestions (nice to have)
When to use: After completing a major implementation step or feature milestone.
Model: Sonnet | Color: Blue
A test engineer that validates implementations through a three-tier testing methodology:
| Tier | Focus |
|---|---|
| Unit | Individual functions with valid inputs, edge cases, error handling, boundary conditions |
| Integration | Component interactions, data flow, API contracts, database operations, auth integration |
| System | End-to-end workflows, regression, concurrency, load scenarios, observability |
Output format: Structured report with total tests, pass/fail counts, severity-rated issues, regression check, and deployment recommendation.
When to use: After code review approval, before deployment.
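The report fields listed above can be pictured as a small data structure. This is a minimal sketch only: the class names, the severity scale, and the rule that turns results into a deployment recommendation are assumptions for illustration, not the agent's actual output schema.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    severity: str  # hypothetical scale: "critical", "high", "medium", "low"
    summary: str


@dataclass
class ValidationReport:
    total: int
    passed: int
    issues: list[Issue] = field(default_factory=list)
    regression_ok: bool = True

    @property
    def failed(self) -> int:
        return self.total - self.passed

    def recommendation(self) -> str:
        # Assumed rule: any failure, critical issue, or regression blocks deployment.
        has_critical = any(i.severity == "critical" for i in self.issues)
        if self.failed > 0 or has_critical or not self.regression_ok:
            return "hold"
        return "deploy"


report = ValidationReport(total=42, passed=41, issues=[Issue("high", "login retry flaky")])
print(report.failed, report.recommendation())  # prints: 1 hold
```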
Model: Inherit
A comprehensive security auditor that executes six scan phases:
| Phase | What It Checks |
|---|---|
| 1. Secrets Detection | API keys, tokens, passwords, private keys, connection strings, .env files |
| 2. Git Hygiene | .gitignore coverage, tracked sensitive files, exposed .git/ |
| 3. OWASP Patterns | Prompt injection, SQL/command injection, XSS, insecure deserialization, broken access control |
| 4. Config Security | YAML/JSON configs, env var handling, file permissions, bot intent declarations |
| 5. Guardrails Alignment | Input/output sanitization, RBAC, rate limiting, hardened system prompts |
| 6. Dependency Audit | Known CVEs, typosquatted packages, unnecessary dependencies |
Output format: Tabular report with severity levels (Critical/High/Medium/Low), guardrails status matrix, and push recommendation.
When to use: Before any git push, after major implementations, periodic security audits.
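To give a feel for what Phase 1 (Secrets Detection) involves, here is a rough sketch of line-by-line pattern matching. The three patterns and the function name are illustrative assumptions; the security-scanner agent's real rule set is not shown in this README and is likely broader.

```python
import re

# Illustrative patterns only -- not the scanner's actual rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = 'key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\npassword = "hunter22"\n'
for lineno, name in scan_text(sample):
    print(f"line {lineno}: {name}")
```

Regex scans like this produce false positives, which is one reason to review findings one at a time rather than block automatically.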
Trigger: /code-implementation "task description"
A full-stack implementation workflow with seven phases:
Phase 0: Architecture Context (if available)
Phase 1: Understand & Bound (requirements, affected files, frontend surface audit)
Phase 2: Plan (checklist-driven, with test cases identified upfront)
Phase 2.5: Approval Gate (for complex changes)
Phase 3: Implement (test-first TDD: red -> green -> refactor)
Phase 4: Verify (run all tests, check coverage >80%, frontend builds clean)
Phase 5: Code Review (invoke code-reviewer agent)
Phase 6: Summarize
Phase 7: Explain (on request)
Key principle: Every backend feature needs a frontend. Unless explicitly told otherwise, the skill assumes full-stack delivery including API layer, store, components, pages, and routing.
Trigger: /gitpush or "push my changes"
A safe push workflow with 7 blocking gates:
| Step | Gate | What Happens |
|---|---|---|
| 0 | Repo confirmation | Detects remote or lets you pick from gh repo list |
| 1 | Identity verification | Validates git config matches your stored GitHub identity |
| 2 | Branch selection | Choose current, main, or create new branch |
| 2.5 | Screen recording | Optional: record a demo and embed in README |
| 2.6 | README check | Create or update README before pushing |
| 3 | Sensitive file scan | Blocks .env, credentials, .claude/, plan files; auto-unstages .gitignore, memory.md, CLAUDE.md, learnings.md |
| 3.5 | Security scan | Launches the security-scanner agent; walks through each finding individually |
| 4 | Final confirmation | Shows repo, branch, files, commit message, and author for explicit approval |
| 5 | Execute | Commit and push only after "Yes, push it" |
| 6 | Deploy | Optional deploy to Vercel, Railway, GitHub Pages, Netlify, or Chrome Web Store |
Safety rules: Never force-push unless explicitly requested. Never push secrets. Always confirm repo and branch. Every security finding is presented individually with explain-why-it-matters descriptions.
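The sensitive file scan in step 3 amounts to sorting staged paths into three buckets. The sketch below is one way that could look, under stated assumptions: the matching rules and the classify name are hypothetical, and plan files are left out because this README does not say how they are identified.

```python
from pathlib import PurePosixPath

# File names the table says are auto-unstaged rather than blocked.
AUTO_UNSTAGE = {".gitignore", "memory.md", "CLAUDE.md", "learnings.md"}


def classify(path: str) -> str:
    """Return "block", "unstage", or "allow" for a staged path (assumed rules)."""
    p = PurePosixPath(path)
    name = p.name
    if ".claude" in p.parts:
        return "block"
    if name == ".env" or name.startswith(".env."):
        return "block"
    if "credentials" in name.lower():
        return "block"
    if name in AUTO_UNSTAGE:
        return "unstage"
    return "allow"


for staged in [".env", "src/app.py", "CLAUDE.md", ".claude/skills/gitpush/SKILL.md"]:
    print(staged, "->", classify(staged))
```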
Trigger: /skill-creator or "create a skill for X"
A six-step process for building new skills:
- Understand -- gather concrete usage examples through interview questions
- Plan -- identify reusable scripts, references, and assets
- Initialize -- run scripts/init_skill.py to scaffold the skill directory
- Edit -- implement resources and write SKILL.md with proper frontmatter
- Package -- validate and bundle into a distributable .skill file via scripts/package_skill.py
- Iterate -- refine based on real usage
Included scripts:
- scripts/init_skill.py -- scaffolds a new skill directory with template files
- scripts/package_skill.py -- validates and packages a skill for distribution
- scripts/quick_validate.py -- fast validation of skill structure and frontmatter
Trigger: /skill-creator-v2 or "create a skill" (advanced)
An enhanced skill creation system with two modes, selected via a mandatory prehook gate:
Simple Mode:
Prehook (3 questions) -> Focused Interview (1-2 rounds) -> Quick Research -> Build + Validate -> Package
Advanced Mode (with benchmarking):
Prehook -> Deep Interview (2-3 rounds) -> Research -> Build + Generate Evals
-> A/B Benchmark (parallel sub-agents: with-skill vs without-skill)
-> Grade (4 dimensions: correctness, completeness, quality, adherence)
-> HTML Comparison Viewer -> Iterate until satisfied -> Package
Included assets:
- assets/comparison-template.html -- side-by-side benchmark comparison viewer
- references/eval-format.md -- test case format specification
- references/skill-design-patterns.md -- established patterns for effective skills
- references/subagent-prompts.md -- prompt templates for A/B benchmark agents
- scripts/generate_comparison.py -- generates the HTML comparison report
- scripts/open_viewer.py -- opens the comparison in the browser
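The A/B grading step in Advanced Mode compares a with-skill run against a without-skill run on four dimensions. As a rough illustration, the comparison could be reduced to per-dimension deltas like this. The 0-10 scale, the compare function, and the unweighted average are assumptions, not the skill's actual grading logic.

```python
from statistics import mean

DIMENSIONS = ("correctness", "completeness", "quality", "adherence")


def compare(with_skill: dict[str, float], without_skill: dict[str, float]) -> dict[str, float]:
    """Per-dimension score delta (with minus without) plus an unweighted overall delta."""
    deltas = {d: with_skill[d] - without_skill[d] for d in DIMENSIONS}
    deltas["overall"] = mean(deltas[d] for d in DIMENSIONS)
    return deltas


# Hypothetical scores on an assumed 0-10 scale.
with_skill = {"correctness": 8.0, "completeness": 7.0, "quality": 9.0, "adherence": 8.0}
without_skill = {"correctness": 6.0, "completeness": 7.0, "quality": 5.0, "adherence": 6.0}
print(compare(with_skill, without_skill))
```

A positive overall delta suggests the skill helps; a delta near zero is a hint that the skill may not be worth its context cost.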
Trigger: /skill-graph "task description" or "chain skills for X"
Orchestrates multiple skills into an ordered pipeline:
Phase 1: Scan -- discover all installed skills, filter by relevance (two-pass: frontmatter first, deep read for matches)
Phase 2: Classify & Connect -- assign each skill to a workflow phase and infer edges:
| Phase | Role | Examples |
|---|---|---|
| 0 | Explore | brainstorming, explain |
| 1 | Design | system-arch, debate, validation |
| 2 | Research | doc-search |
| 3 | Build | code-implementation, frontend-design |
| 4 | Verify | code-reviewer, integration-test-validator |
| 5 | Ship | gitpush, document-changes, linkedin-post |
Detects parallel branches, feedback loops (max 3 iterations before escalation), and circular dependencies.
Phase 3: Render & Approve -- generates a Mermaid diagram, summary table, and exclusion list; requires explicit approval before execution.
Phase 4: Execute -- runs skills in phase order with context passing, progress updates, and error handling.
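Ordering skills, finding parallel branches, and rejecting circular dependencies is a topological sort. The sketch below uses Python's standard graphlib module to show the idea; the plan_batches name and the dependency map are hypothetical, and skill-graph itself is an instruction set rather than this code.

```python
from graphlib import CycleError, TopologicalSorter


def plan_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group skills into batches; skills in the same batch can run in parallel.

    deps maps each skill to the set of skills that must finish before it.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has a circular dependency
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


pipeline = {
    "code-implementation": {"system-arch"},
    "code-reviewer": {"code-implementation"},
    "integration-test-validator": {"code-implementation"},
    "gitpush": {"code-reviewer", "integration-test-validator"},
}
for step, batch in enumerate(plan_batches(pipeline)):
    print(step, batch)

try:
    plan_batches({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("circular dependency detected")
```

Here code-reviewer and integration-test-validator land in the same batch, which corresponds to a parallel branch in the rendered diagram.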
Trigger: /skill-guard or "should I install X" or "audit my skills"
Two operational modes:
Gate Mode (before installing a skill):
- Build fingerprint index of all installed skills (~2 lines each)
- Compare candidate against index using trigger/description overlap signals
- Categorize: No Match -> install, Close Match -> spawn sub-agent for deep diff, Obvious Duplicate -> block
- Present overlap report with Merge / Install Both / Skip options per candidate
- Execute user decisions (merge deltas into existing skill, install alongside, or skip)
Audit Mode (on-demand scan):
- Build fingerprint index
- Pairwise comparison of all installed skills
- Deep diff close matches via parallel sub-agents
- Read-only report with overlap findings and recommendations
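One simple overlap signal for comparing a candidate against the fingerprint index is token-set (Jaccard) similarity. The following is a sketch under assumptions: the thresholds, category labels, and function names are invented for illustration, and skill-guard may weigh trigger and description overlap differently.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def categorize(candidate: str, index: dict[str, str], close: float = 0.3, duplicate: float = 0.8):
    """Return (category, best matching skill name or None). Thresholds are assumed."""
    best_name, best_score = None, 0.0
    for name, fingerprint in index.items():
        score = jaccard(candidate, fingerprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= duplicate:
        return "obvious-duplicate", best_name
    if best_score >= close:
        return "close-match", best_name
    return "no-match", None


index = {
    "gitpush": "safe push workflow with secret scanning",
    "find-skills": "discover and install skills",
}
print(categorize("push workflow with secret checks", index))
```

Only the close-match bucket needs the more expensive deep diff, which is why a cheap first pass like this keeps the gate fast.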
Trigger: /capture-learnings or "save learnings" or at end of session
A two-phase process:
Phase 1: Extract learnings from the current session and append to learnings.md:
- Bugs and root causes
- API/library gotchas
- Architectural patterns and decisions
- Useful commands and configs
- Failed approaches (warnings)
Phase 2: Cross-reference learnings against existing skills and propose targeted improvements (e.g., adding a gotcha to a skill's caveats section).
Trigger: "how do I do X", "find a skill for X", "is there a skill that can..."
Discovers and installs skills from the open ecosystem using the Skills CLI:
npx skills find [query] # Search for skills
npx skills add <package> # Install a skill
npx skills check # Check for updates
npx skills update # Update all skills
Browse the ecosystem at skills.sh.
Trigger: "record this flow", "make a screen recording of X", "demo this feature"
Automates polished screen recordings with two auto-detected modes:
| Mode | Trigger | Pipeline |
|---|---|---|
| Browser | URL in prompt | Steel Dev (headless browser) -> Remotion |
| Mac App | App name in prompt | ffmpeg + AppleScript + cliclick -> Remotion |
Both modes produce a moments.json (timestamped action log) that feeds into Remotion for post-processing: dead-time trimming, clip merging/splitting, smooth zoom keyframes, and gradient backgrounds.
Prerequisites: Remotion, Steel Dev (browser mode), ffmpeg + cliclick (Mac app mode).
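Dead-time trimming can be thought of as keeping a short window around each logged action and merging windows that overlap. The sketch below assumes each entry in moments.json carries a timestamp in seconds under a key named t; that schema, the padding value, and the function name are all hypothetical, since the real format is not documented here.

```python
def keep_segments(moments: list[dict], pad: float = 1.0) -> list[tuple[float, float]]:
    """Return merged (start, end) windows, in seconds, to keep around each moment."""
    segments: list[list[float]] = []
    for t in sorted(m["t"] for m in moments):
        start, end = max(0.0, t - pad), t + pad
        if segments and start <= segments[-1][1]:
            # Overlaps the previous window: extend it instead of starting a new one.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])
    return [(s, e) for s, e in segments]


# Hypothetical action log: two clicks close together, then a long idle gap.
moments = [
    {"t": 0.5, "action": "click"},
    {"t": 2.0, "action": "type"},
    {"t": 10.0, "action": "click"},
]
print(keep_segments(moments))  # prints: [(0.0, 3.0), (9.0, 11.0)]
```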
| | Agents | Skills |
|---|---|---|
| What | Autonomous sub-processes | Instruction sets loaded into context |
| How they run | Launched via Agent tool in isolated contexts | Invoked via Skill tool or /command |
| Format | Single .md file with YAML frontmatter | SKILL.md + optional scripts/, references/, assets/ |
| Context cost | None until launched (runs in sub-process) | Metadata always in context (~100 words); body loaded on trigger |
| Best for | Heavy, isolated tasks (review, testing, scanning) | Workflows that guide the main agent (implementation, pushing, creating) |
~/.claude/ # Global installation (all projects)
skills/
gitpush/
SKILL.md
examples.md
code-implementation/
SKILL.md
skill-creator/
SKILL.md
scripts/
init_skill.py
package_skill.py
quick_validate.py
...
agents/
code-implementation.md
code-reviewer.md
integration-test-validator.md
security-scanner.md
Or install project-local at ./.claude/skills/ and ./.claude/agents/ for project-specific setups.
Skills use a three-level loading system to manage context efficiently:
- Metadata (~100 words) -- name + description from frontmatter; always in context
- SKILL.md body (<5k words) -- loaded only when the skill triggers
- Bundled resources (unlimited) -- scripts, references, and assets loaded as needed by the agent
This ensures the context window isn't bloated with instructions for skills that aren't being used.
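The split between always-loaded metadata and on-demand body can be illustrated with a small frontmatter reader. This is a simplified sketch: split_skill is a hypothetical helper that only handles flat key: value pairs, and it is not how the agent itself loads skills.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Return (frontmatter metadata, markdown body) for the contents of a SKILL.md."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


skill_md = """---
name: gitpush
description: Safe push workflow with secret scanning
---
# gitpush
Step 0: confirm the repo.
"""
meta, body = split_skill(skill_md)
print(meta)               # level 1: small, kept in context at all times
print(len(body.split()))  # level 2: word count of the body, read only on trigger
```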
The setup.sh script handles initial configuration. It personalizes the gitpush skill with your GitHub identity by replacing YOUR_GITHUB_USERNAME and YOUR_EMAIL placeholders in SKILL.md.
Supported install targets:
| Option | Path | Scope |
|---|---|---|
| Claude Code (global) | ~/.claude/skills/ + ~/.claude/agents/ | All projects |
| Claude Code (project) | ./.claude/skills/ + ./.claude/agents/ | Current project only |
| Codex | ~/.agents/skills/ + ~/.agents/agents/ | All Codex projects |
| Custom | Your choice | Your choice |
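The install targets in the table map naturally onto a lookup from option to directories. The sketch below mirrors that mapping in Python for illustration; the option keys, the resolve function, and the skills/agents layout under a custom path are assumptions, and setup.sh remains the actual installer.

```python
from pathlib import Path

# Paths taken from the table above; the option keys are hypothetical labels.
TARGETS = {
    "claude-global": ("~/.claude/skills", "~/.claude/agents"),
    "claude-project": ("./.claude/skills", "./.claude/agents"),
    "codex": ("~/.agents/skills", "~/.agents/agents"),
}


def resolve(option: str, custom: str = "") -> tuple[Path, Path]:
    """Return (skills directory, agents directory) for an install option."""
    if option == "custom":
        if not custom:
            raise ValueError("custom install needs a base path")
        base = Path(custom).expanduser()
        # Assumed layout: skills/ and agents/ directly under the custom base.
        return base / "skills", base / "agents"
    skills, agents = TARGETS[option]
    return Path(skills).expanduser(), Path(agents).expanduser()


print(resolve("claude-project"))
print(resolve("custom", "/tmp/my-agent"))
```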
Use the built-in skill-creator or skill-creator-v2 to build new skills:
# Quick creation
/skill-creator "my-new-skill"
# Benchmark-driven creation with A/B testing
/skill-creator-v2
Or scaffold manually:
# Initialize a skill directory
python3 ~/.claude/skills/skill-creator/scripts/init_skill.py my-skill --path ./.claude/skills/
# Validate
python3 ~/.claude/skills/skill-creator/scripts/quick_validate.py ./.claude/skills/my-skill/
# Package for distribution
python3 ~/.claude/skills/skill-creator/scripts/package_skill.py ./.claude/skills/my-skill/
Every skill needs at minimum a SKILL.md with YAML frontmatter (name and description) and markdown instructions. See skills/skill-creator/SKILL.md for the full creation guide.
- Fork the repo
- Create a feature branch (git checkout -b my-skill)
- Add your skill under skills/ or agent under agents/
- Validate your skill: python3 skills/skill-creator/scripts/quick_validate.py skills/your-skill/
- Open a pull request
Guidelines:
- Skills should be focused and modular -- one skill, one job
- Keep SKILL.md under 500 lines; split detailed content into references/
- Include at least one concrete example showing: user prompt -> skill behavior -> expected output
- Write comprehensive description fields in frontmatter (this is how agents decide when to use your skill)
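Two of these guidelines (the 500-line limit and the required frontmatter fields) are easy to check mechanically before opening a pull request. The function below is a hypothetical pre-check written for illustration; it is not quick_validate.py and does not replace it.

```python
def check_skill(text: str, max_lines: int = 500) -> list[str]:
    """Return a list of guideline problems found in the contents of a SKILL.md."""
    problems = []
    lines = text.splitlines()
    if len(lines) >= max_lines:
        problems.append(f"SKILL.md has {len(lines)} lines; keep it under {max_lines}")
    keys = set()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, sep, value = line.partition(":")
            if sep and value.strip():
                keys.add(key.strip())
    for required in ("name", "description"):
        if required not in keys:
            problems.append(f"missing frontmatter field: {required}")
    return problems


good = "---\nname: my-skill\ndescription: Formats changelogs from commit history\n---\n# my-skill\n"
print(check_skill(good))  # prints: []
```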
See individual skill files for license information.