feat: expand moderation categories by walkerfrankenberg · Pull Request #163 · muxinc/ai

walkerfrankenberg · 2026-04-08T21:31:21Z

Overview

Expand the moderation endpoint from 2 categories (sexual, violence) to 5 by adding hate, self-harm, and drugs. Both OpenAI and Hive providers now extract and return the new scores. Also fixes a bogus Hive category (garm_death_injury_or_military_conflict) that never matched anything, and moves yes_self_harm/yes_emaciated_body out of the violence bucket into their own self-harm category.

What was changed

src/workflows/moderation.ts — Added hate, selfHarm, drugs fields to ThumbnailModerationScore, ModerationResult, and ModerationOptions interfaces. Added HIVE_HATE_CATEGORIES, HIVE_SELF_HARM_CATEGORIES, HIVE_ILLICIT_CATEGORIES constant arrays. Removed garm_death_injury_or_military_conflict from HIVE_VIOLENCE_CATEGORIES (invalid class name from Hive's GARM model, not the visual moderation model). Moved yes_self_harm/yes_emaciated_body from violence to self-harm. OpenAI extraction now reads hate, self-harm, and illicit from category_scores (mapped to drugs in our API). exceedsThreshold now checks all 5 categories.
tests/unit/moderation.test.ts — Snapshot tests for all 5 Hive category arrays.
tests/eval/moderation.eval.ts — Response-integrity scorer and eval columns updated for new fields.
docs/API.md — Documented new threshold options and response fields.
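
For reviewers, a minimal sketch of how the five-category threshold check could look. The field names follow the description above, but the interface shapes, the `DEFAULT_THRESHOLD` value, and the strict `>` comparison are assumptions for illustration, not the actual contents of src/workflows/moderation.ts.

```typescript
// Hypothetical shape: field names come from the PR description, the rest is assumed.
interface CategoryScores {
  sexual: number;
  violence: number;
  hate: number;
  selfHarm: number;
  drugs: number;
}

type Thresholds = Partial<CategoryScores>;

// Assumed default; the PR description does not state the real defaults.
const DEFAULT_THRESHOLD = 0.7;

const CATEGORIES = ["sexual", "violence", "hate", "selfHarm", "drugs"] as const;

// Take the per-category maximum across all sampled thumbnails.
function maxScores(samples: CategoryScores[]): CategoryScores {
  const max: CategoryScores = { sexual: 0, violence: 0, hate: 0, selfHarm: 0, drugs: 0 };
  for (const sample of samples) {
    for (const category of CATEGORIES) {
      max[category] = Math.max(max[category], sample[category]);
    }
  }
  return max;
}

// Flag the asset if any of the five categories exceeds its threshold.
function exceedsThreshold(scores: CategoryScores, thresholds: Thresholds = {}): boolean {
  return CATEGORIES.some(
    (category) => scores[category] > (thresholds[category] ?? DEFAULT_THRESHOLD),
  );
}
```

Driving both functions from a single `CATEGORIES` list means a future category only needs to be added in one place, which is the kind of drift this PR has to guard against when going from 2 to 5 fields.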

Suggested review order

src/workflows/moderation.ts — types and Hive category constants (lines 30-190)
src/workflows/moderation.ts — OpenAI extraction changes (search for categoryScores.hate)
src/workflows/moderation.ts — Hive extraction and aggregation (search for HIVE_HATE_CATEGORIES)
src/workflows/moderation.ts — getModerationScores max score + threshold logic (bottom of file)
tests/unit/moderation.test.ts
docs/API.md
tests/eval/moderation.eval.ts

Note

Medium Risk
Expands the getModerationScores API surface and thresholding logic from 2 to 5 categories, which can break downstream consumers expecting the old schema and changes what content gets flagged. Provider-specific category mapping tweaks (especially Hive category buckets) also risk behavior changes in production moderation results.

Overview
getModerationScores now returns and thresholds five moderation categories (adds hate, selfHarm, drugs alongside sexual/violence) for both OpenAI and Hive providers, including updated maxScores, per-sample scores, and exceedsThreshold evaluation.

Hive category handling is refactored into explicit HIVE_HATE_CATEGORIES, HIVE_SELF_HARM_CATEGORIES, and HIVE_ILLICIT_CATEGORIES, removing an invalid category and moving self-harm-related classes out of the violence bucket. Docs and eval/unit tests are updated to reflect the new fields and category constants.
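
To illustrate why an invalid class name in a Hive bucket silently contributes nothing, here is a rough sketch of bucket scoring. It assumes a bucket's score is the highest score among its member classes; the `HiveClassScore` type and the `bucketScore` helper are hypothetical, and only the two self-harm class names are taken from this PR.

```typescript
// Hypothetical shape for one class entry in a Hive response; the real type may differ.
interface HiveClassScore {
  class: string;
  score: number;
}

// These two class names come from the PR description. The constant's full
// contents in src/workflows/moderation.ts are not shown there.
const HIVE_SELF_HARM_CATEGORIES: readonly string[] = ["yes_self_harm", "yes_emaciated_body"];

// Score a bucket as the highest score among the returned classes that belong to it.
// A bucket entry that matches no returned class simply contributes nothing.
function bucketScore(classes: HiveClassScore[], bucket: readonly string[]): number {
  return classes
    .filter((c) => bucket.indexOf(c.class) !== -1)
    .reduce((max, c) => Math.max(max, c.score), 0);
}
```

Under this assumption, a bucket containing only a class name the model never returns always scores 0, which would match the described behavior of garm_death_injury_or_military_conflict.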

Reviewed by Cursor Bugbot for commit a22dd6f. Bugbot is set up for automated code reviews on this repo.

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

snyk-io · 2026-04-08T21:31:31Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2cca2be.

daniel-hayes

LGTM

walkerfrankenberg · 2026-04-09T18:50:58Z

Okay, so according to the OpenAI docs, they only support image classification for sexual, violence, and self-harm. They don't do it for drugs or hate 😭 So going to put this on the shelf until they release a more up-to-date model
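
One way to check this at runtime rather than from the docs: OpenAI moderation results include a `category_applied_input_types` field listing which input types each category score was computed from. The sketch below is illustrative only. The helper name is hypothetical, and the sample object is hand-written to mirror the limitation described above, not captured from a real response.

```typescript
// Subset of the `category_applied_input_types` field on a moderation result.
type AppliedInputTypes = Record<string, Array<"text" | "image">>;

// Return the categories whose score was actually computed from an image input.
function imageScoredCategories(applied: AppliedInputTypes): string[] {
  return Object.keys(applied).filter(
    (category) => applied[category].indexOf("image") !== -1,
  );
}

// Hand-written example for an image-only request (illustrative, not real output).
const sampleApplied: AppliedInputTypes = {
  sexual: ["image"],
  violence: ["image"],
  "self-harm": ["image"],
  hate: [],
  illicit: [],
};

const supported = imageScoredCategories(sampleApplied);
```

A guard like this could let the workflow omit or null out hate and drugs for thumbnails instead of reporting scores that were never computed from the image.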

feat: expand moderation categories

2cca2be

claude Bot reviewed Apr 8, 2026

walkerfrankenberg requested a review from a team April 8, 2026 21:32

cursor Bot reviewed Apr 8, 2026

Comment thread src/workflows/moderation.ts

tests

a22dd6f

daniel-hayes approved these changes Apr 8, 2026

walkerfrankenberg wants to merge 2 commits into main from wf/moderation-categories

2 participants
