feat: add configurable egress redaction and privacy modes #59

Open
altantutar wants to merge 1 commit into serrrfirat:main from altantutar:codex/issue-47-redaction-egress

Conversation

@altantutar altantutar commented Feb 21, 2026

Collaborator

Summary

  • add a centralized egress privacy/redaction pipeline for activity context
  • introduce configurable privacy modes (minimal, standard, full) and redaction toggles for emails/secrets/ids/urls
  • apply the privacy pipeline consistently to API activity summaries, analysis prompt context, and memory-query term extraction
  • expose privacy controls via synapse config options

Validation

  • bunx tsc --noEmit
  • bun test

Closes #47

Summary by CodeRabbit

  • New Features
    • Added egress privacy modes (minimal, standard, full) to control sensitive data exposure in activity summaries and API responses.
    • Added configurable redaction controls for emails, secrets, identifiers, and URLs to protect sensitive information.
    • Added new CLI configuration options to manage privacy preferences and redaction settings.
    • Enabled privacy-aware activity formatting and memory query construction.

@gemini-code-assist

Summary of Changes

Hello @altantutar, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances user privacy by introducing a comprehensive and configurable egress redaction system. It centralizes the handling of sensitive data, allowing users to define privacy modes and specific redaction rules for various data types. This ensures that personal information is appropriately sanitized before being used in external contexts like AI analysis or API responses, providing greater control and transparency over data sharing.

Highlights

  • Centralized Egress Privacy Pipeline: Introduced a new, centralized pipeline for applying privacy and redaction rules to activity context before it leaves the system.
  • Configurable Privacy Modes: Added support for configurable egress privacy modes: 'minimal', 'standard', and 'full', allowing users to control the level of detail shared.
  • Granular Redaction Toggles: Implemented individual toggles for redacting sensitive information such as emails, secrets, IDs (like UUIDs, SSNs, phone numbers), and URLs from outbound data.
  • Consistent Application: Ensured that the new privacy pipeline is consistently applied across API activity summaries, analysis prompt contexts, and memory-query term extraction.
  • User Configuration: Exposed all new privacy controls via the Synapse Configuration, allowing users to manage these settings through CLI options and the config file.
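
The modes and toggles listed above could combine roughly as follows. This is a minimal sketch, not the PR's code: the type and function names come from the changelog, `includeWindowTitle` and `includeOcrText` are borrowed from the review suggestion further down this thread, and `maxOcrChars`, the preset values, and the default toggle values are assumptions made purely for illustration.

```typescript
type EgressPrivacyMode = 'minimal' | 'standard' | 'full';

interface EgressPrivacyInput {
  mode?: EgressPrivacyMode;
  redactEmails?: boolean;
  redactSecrets?: boolean;
  redactIds?: boolean;
  redactUrls?: boolean;
}

interface ModePreset {
  includeWindowTitle: boolean;
  includeOcrText: boolean;
  maxOcrChars: number; // hypothetical character limit, not taken from the PR
}

interface EgressPrivacySettings extends Required<EgressPrivacyInput>, ModePreset {}

// Assumed preset values: only "minimal hides window/OCR" is stated in the thread.
const MODE_PRESETS: Record<EgressPrivacyMode, ModePreset> = {
  minimal: { includeWindowTitle: false, includeOcrText: false, maxOcrChars: 0 },
  standard: { includeWindowTitle: true, includeOcrText: true, maxOcrChars: 500 },
  full: { includeWindowTitle: true, includeOcrText: true, maxOcrChars: 2000 },
};

// Returns null for anything that is not a known mode, so callers can report an error.
function parseEgressPrivacyMode(value: string): EgressPrivacyMode | null {
  const v = value.trim().toLowerCase();
  return v === 'minimal' || v === 'standard' || v === 'full' ? v : null;
}

// Fills in missing fields; defaulting to 'standard' with all toggles on is an assumption.
function resolveEgressPrivacySettings(input: EgressPrivacyInput = {}): EgressPrivacySettings {
  const mode = input.mode ?? 'standard';
  return {
    mode,
    redactEmails: input.redactEmails ?? true,
    redactSecrets: input.redactSecrets ?? true,
    redactIds: input.redactIds ?? true,
    redactUrls: input.redactUrls ?? true,
    ...MODE_PRESETS[mode],
  };
}
```

Keeping the presets in one table means a caller only passes the mode and any explicit toggle overrides; everything else is derived in a single place.
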
Changelog
  • src/api-server.ts
    • Updated getRecentActivity call to pass new privacy configuration settings for activity summaries.
  • src/config.ts
    • Imported EgressPrivacyMode type.
    • Added new configuration properties for egressPrivacyMode, egressRedactEmails, egressRedactSecrets, egressRedactIds, and egressRedactUrls to the SynapseConfig interface.
    • Initialized default values for the new privacy-related configuration options in DEFAULT_CONFIG.
  • src/db.test.ts
    • Imported formatActivityForClaude for testing.
    • Added a new test suite, "egress privacy formatting", to verify that formatActivityForClaude redacts sensitive content in standard mode and hides window/OCR text in minimal mode.
  • src/db.ts
    • Imported resolveEgressPrivacySettings, sanitizeSessionsForEgress, and EgressPrivacyInput from the new privacy.js module.
    • Modified formatActivityForClaude to accept privacy options, resolve privacy settings, and sanitize sessions before formatting.
    • Updated getRecentActivity to accept privacy options and pass them to formatActivityForClaude.
  • src/doctor.ts
    • Updated the default configuration in checkConfig to include the new egress privacy and redaction settings when creating or replacing the config file.
  • src/index.ts
    • Imported parseEgressPrivacyMode from the new privacy.js module.
    • Created an egressPrivacy object from runtime configuration and passed it to getRecentActivity and queryContext.
    • Added new CLI options for --privacy-mode, --redact-emails, --redact-secrets, --redact-ids, and --redact-urls.
    • Implemented logic to parse and save the new privacy configuration options via CLI.
    • Updated the config command output to display the current egress privacy mode and redaction settings.
  • src/memory-search.test.ts
    • Added a new test file to verify that buildMemoryQueries correctly redacts sensitive content and strips OCR/window terms based on privacy settings.
  • src/memory-search.ts
    • Imported privacy-related functions and types (resolveEgressPrivacySettings, sanitizeSessionsForEgress, EgressPrivacyInput).
    • Added an optional privacy property to the MemoryQueryContext interface.
    • Modified extractSessionTerms to accept and utilize privacy settings for sanitizing session data.
  • src/privacy.test.ts
    • Added a new test file to cover the functionality of the privacy.ts module, including redaction of various sensitive data types, behavior of minimal and full privacy modes, and parsing of privacy mode strings.
  • src/privacy.ts
    • Added a new module defining types (EgressPrivacyMode, EgressPrivacyInput, EgressPrivacySettings) and constants for privacy-related regex patterns (emails, URLs, UUIDs, SSNs, phone numbers, secrets).
    • Defined MODE_PRESETS for 'minimal', 'standard', and 'full' privacy modes, specifying window/OCR inclusion and character limits.
    • Implemented utility functions truncate and compactWhitespace.
    • Created redaction functions redactSecrets and redactIds.
    • Provided resolveEgressPrivacySettings to combine input with default presets.
    • Implemented sanitizeTextForEgress to apply redaction rules to a given string.
    • Developed sanitizeSessionsForEgress to process and sanitize an array of ActivitySession objects based on privacy settings.
    • Added parseEgressPrivacyMode to safely parse privacy mode strings.
  • src/scripts/daemon.ts
    • Imported resolveEgressPrivacySettings and sanitizeSessionsForEgress.
    • Collected egress privacy settings from the configuration.
    • Passed privacy settings to getRecentActivity and formatActivityForClaude.
    • Sanitized activity sessions using the resolved privacy settings before scoring and formatting for analysis.
  • src/scripts/export-activity.ts
    • Updated the getRecentActivity call to include egress privacy settings from the configuration when exporting activity.
Activity
  • The pull request introduces a new feature, indicating initial development work.
  • Validation was performed by running bun test v1.3.5 (1e86cebd), confirming that existing and new tests passed.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai Bot commented Feb 21, 2026

📝 Walkthrough


This PR introduces configurable egress privacy controls across the system. It adds privacy modes (minimal, standard, full), redaction logic for sensitive data (emails, secrets, IDs, URLs), and threads privacy settings through configuration, database formatting, API responses, memory queries, and activity processing.
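
A toggle-driven text sanitizer of the kind described here might look like the sketch below. It is illustrative only: the regex patterns are simplified stand-ins rather than the ones in src/privacy.ts, and the `[REDACTED_*]` placeholder strings and the `RedactionToggles` interface are hypothetical names.

```typescript
// Hypothetical toggle shape mirroring the four redaction options named in the PR.
interface RedactionToggles {
  redactEmails: boolean;
  redactSecrets: boolean;
  redactIds: boolean;
  redactUrls: boolean;
}

// Simplified patterns for illustration; the PR's real patterns are more extensive.
const EMAIL_REGEX = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const URL_REGEX = /https?:\/\/[^\s"'<>]+/g;
const UUID_REGEX = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;
const SECRET_REGEX = /\b(api[_-]?key|password|secret)\b\s*[:=]\s*([^\s,;"']+)/gi;

function sanitizeTextForEgress(text: string, toggles: RedactionToggles): string {
  let out = text;
  // Secrets first, so a key=value pair is replaced before other patterns can split it.
  if (toggles.redactSecrets) out = out.replace(SECRET_REGEX, '$1=[REDACTED_SECRET]');
  // URLs before emails, so an address embedded in a URL disappears with the URL.
  if (toggles.redactUrls) out = out.replace(URL_REGEX, '[REDACTED_URL]');
  if (toggles.redactEmails) out = out.replace(EMAIL_REGEX, '[REDACTED_EMAIL]');
  if (toggles.redactIds) out = out.replace(UUID_REGEX, '[REDACTED_ID]');
  // Collapse runs of whitespace so the outbound text stays compact.
  return out.replace(/\s+/g, ' ').trim();
}
```

Each toggle gates exactly one replacement pass, so turning a toggle off leaves that category of text untouched while the others still apply.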

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Privacy Core Module: src/privacy.ts | Introduces new privacy types, modes, and utilities: parseEgressPrivacyMode, resolveEgressPrivacySettings, sanitizeTextForEgress, and sanitizeSessionsForEgress with redaction patterns for PII and sensitive content. |
| Configuration & Initialization: src/config.ts, src/doctor.ts | Adds five new egress privacy fields to SynapseConfig (egressPrivacyMode, egressRedactEmails, egressRedactSecrets, egressRedactIds, egressRedactUrls) with defaults; updates config creation and fallback initialization paths. |
| Database Formatting: src/db.ts, src/db.test.ts | Expands formatActivityForClaude and getRecentActivity signatures to accept privacy options; applies privacy resolution and sanitization to sessions before formatting; validates redaction behavior in tests. |
| Memory Query Privacy: src/memory-search.ts, src/memory-search.test.ts | Adds privacy parameter to MemoryQueryContext; refactors extractSessionTerms to resolve and apply egress privacy; validates sensitive content redaction in memory query construction. |
| Privacy Testing: src/privacy.test.ts | Tests privacy modes, text sanitization, session sanitization, and mode parsing across minimal/standard/full modes. |
| API & Runtime Integration: src/api-server.ts, src/index.ts | Threads privacy object through API activity endpoint; imports parseEgressPrivacyMode; adds CLI options for privacy configuration (--privacy-mode, --redact-*); propagates privacy through activity queries and memory context. |
| Script Processing: src/scripts/daemon.ts, src/scripts/export-activity.ts | Constructs egressPrivacy context from config; passes privacy to getRecentActivity and formatActivityForClaude; sanitizes sessions before scoring and analysis. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Suggested reviewers

  • serrrfirat

Poem

🐰 With whiskers twitched and nose in code,
A rabbit hops down privacy's road,
Emails redacted, secrets concealed,
Each token, each ID, and URL sealed,
Now egress flows safe, what joy! What delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: add configurable egress redaction and privacy modes' clearly and specifically summarizes the main change: adding configurable privacy controls for egress data, which aligns with the core objective of implementing redaction and data-minimization controls for OCR context. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue #47 are met: (1) sensitive data (emails, secrets, IDs, URLs) are redacted through sanitizeTextForEgress and sanitizeSessionsForEgress functions with default-safe settings, (2) users can configure privacy levels via egressPrivacyMode and redaction toggles (minimal/standard/full modes), and (3) privacy handling is applied consistently across API responses (src/api-server.ts), analysis prompts (src/scripts/daemon.ts), and memory queries (src/memory-search.ts). |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing egress privacy/redaction controls as specified in issue #47. File modifications target privacy pipeline implementation (privacy.ts), configuration management (config.ts, doctor.ts, index.ts), activity data handling (db.ts, api-server.ts), memory search (memory-search.ts), and daemon/export scripts. No unrelated changes detected. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive and well-designed privacy redaction pipeline. The changes are consistently applied across the codebase, including the API, CLI, and memory search functionalities. The new privacy.ts module is a great addition, centralizing the redaction logic and making it easy to maintain. The test coverage for the new features is also good. I have one suggestion to improve maintainability by reducing code duplication in the CLI configuration handling.

Comment thread src/index.ts
Comment on lines +1116 to +1162
    if (options.redactEmails !== undefined) {
      const parsed = parseToggle(String(options.redactEmails));
      if (parsed === null) {
        console.log(chalk.red('Invalid value for --redact-emails. Use: on or off'));
        process.exitCode = 1;
        return;
      }
      saveConfig({ egressRedactEmails: parsed });
      console.log(chalk.green(`✓ Egress email redaction ${parsed ? 'enabled' : 'disabled'}`));
      console.log(chalk.gray('Applies to outbound API/model/memory context.\n'));
    }

    if (options.redactSecrets !== undefined) {
      const parsed = parseToggle(String(options.redactSecrets));
      if (parsed === null) {
        console.log(chalk.red('Invalid value for --redact-secrets. Use: on or off'));
        process.exitCode = 1;
        return;
      }
      saveConfig({ egressRedactSecrets: parsed });
      console.log(chalk.green(`✓ Egress secret redaction ${parsed ? 'enabled' : 'disabled'}`));
      console.log(chalk.gray('Applies to outbound API/model/memory context.\n'));
    }

    if (options.redactIds !== undefined) {
      const parsed = parseToggle(String(options.redactIds));
      if (parsed === null) {
        console.log(chalk.red('Invalid value for --redact-ids. Use: on or off'));
        process.exitCode = 1;
        return;
      }
      saveConfig({ egressRedactIds: parsed });
      console.log(chalk.green(`✓ Egress ID redaction ${parsed ? 'enabled' : 'disabled'}`));
      console.log(chalk.gray('Applies to outbound API/model/memory context.\n'));
    }

    if (options.redactUrls !== undefined) {
      const parsed = parseToggle(String(options.redactUrls));
      if (parsed === null) {
        console.log(chalk.red('Invalid value for --redact-urls. Use: on or off'));
        process.exitCode = 1;
        return;
      }
      saveConfig({ egressRedactUrls: parsed });
      console.log(chalk.green(`✓ Egress URL redaction ${parsed ? 'enabled' : 'disabled'}`));
      console.log(chalk.gray('Applies to outbound API/model/memory context.\n'));
    }

medium

These blocks for handling redaction toggles (--redact-emails, --redact-secrets, etc.) are very similar and contain duplicated logic. To improve maintainability and reduce redundancy, you can refactor this into a single helper function. This will make the code cleaner and easier to modify in the future if more redaction options are added.

    const handleRedactionToggle = (
      optionValue: unknown,
      optionCliName: string,
      configKey: keyof import('./config.js').SynapseConfig,
      displayName: string
    ): boolean => {
      if (optionValue !== undefined) {
        const parsed = parseToggle(String(optionValue));
        if (parsed === null) {
          console.log(chalk.red(`Invalid value for --${optionCliName}. Use: on or off`));
          process.exitCode = 1;
          return true; // Indicates should exit
        }
        saveConfig({ [configKey]: parsed } as Partial<import('./config.js').SynapseConfig>);
        console.log(chalk.green(`✓ Egress ${displayName} redaction ${parsed ? 'enabled' : 'disabled'}`));
        console.log(chalk.gray('Applies to outbound API/model/memory context.\n'));
      }
      return false; // Indicates should continue
    };

    if (handleRedactionToggle(options.redactEmails, 'redact-emails', 'egressRedactEmails', 'email')) return;
    if (handleRedactionToggle(options.redactSecrets, 'redact-secrets', 'egressRedactSecrets', 'secret')) return;
    if (handleRedactionToggle(options.redactIds, 'redact-ids', 'egressRedactIds', 'ID')) return;
    if (handleRedactionToggle(options.redactUrls, 'redact-urls', 'egressRedactUrls', 'URL')) return;
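
Both the original blocks and the suggested helper depend on `parseToggle`, which this thread does not show. The sketch below assumes it maps "on"/"off" to a boolean and returns null otherwise, which matches the `parsed === null` check and the "Use: on or off" message. The table-driven `collectToggleUpdates` is a hypothetical alternative to the helper above, not something proposed in the review: it validates every flag before anything is saved, so an invalid later flag cannot leave earlier ones half-applied.

```typescript
// Assumed behaviour of parseToggle: "on" -> true, "off" -> false, anything else -> null.
function parseToggle(value: string): boolean | null {
  const normalized = value.trim().toLowerCase();
  if (normalized === 'on') return true;
  if (normalized === 'off') return false;
  return null;
}

type ToggleKey =
  | 'egressRedactEmails'
  | 'egressRedactSecrets'
  | 'egressRedactIds'
  | 'egressRedactUrls';

// One row per CLI flag: the parsed option name, the flag as typed, and the config key.
const TOGGLES: ReadonlyArray<{ option: string; flag: string; key: ToggleKey }> = [
  { option: 'redactEmails', flag: '--redact-emails', key: 'egressRedactEmails' },
  { option: 'redactSecrets', flag: '--redact-secrets', key: 'egressRedactSecrets' },
  { option: 'redactIds', flag: '--redact-ids', key: 'egressRedactIds' },
  { option: 'redactUrls', flag: '--redact-urls', key: 'egressRedactUrls' },
];

type ToggleResult =
  | { ok: true; updates: Partial<Record<ToggleKey, boolean>> }
  | { ok: false; error: string };

// Hypothetical helper: validates all supplied toggles and returns one update object.
function collectToggleUpdates(options: Record<string, unknown>): ToggleResult {
  const updates: Partial<Record<ToggleKey, boolean>> = {};
  for (const { option, flag, key } of TOGGLES) {
    const raw = options[option];
    if (raw === undefined) continue; // flag not supplied on the command line
    const parsed = parseToggle(String(raw));
    if (parsed === null) {
      return { ok: false, error: `Invalid value for ${flag}. Use: on or off` };
    }
    updates[key] = parsed;
  }
  return { ok: true, updates };
}
```

A caller would print `error` and set the exit code on failure, or pass `updates` to a single save call on success.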

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/memory-search.ts (1)

365-383: Avoid placeholder terms leaking into memory queries in minimal mode. When window/OCR are hidden, the placeholder text (e.g., “[hidden by privacy mode]”) can inject noisy tokens like “hidden/privacy/mode” into queries. Consider skipping window/OCR terms when the policy excludes them.

♻️ Suggested refinement
-    const combined = [
-      session.appName,
-      session.windowName,
-      session.ocrTexts.slice(0, 2).join(' '),
-    ].join(' ');
+    const combinedParts = [session.appName];
+    if (policy.includeWindowTitle && session.windowName) {
+      combinedParts.push(session.windowName);
+    }
+    if (policy.includeOcrText && session.ocrTexts.length > 0) {
+      combinedParts.push(session.ocrTexts.slice(0, 2).join(' '));
+    }
+    const combined = combinedParts.join(' ');
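
As a standalone illustration of the refinement above, the following sketch builds the combined text from a session while honouring the policy flags and also dropping the placeholder string. `SessionLike`, `PolicyLike`, and `buildCombinedText` are hypothetical names introduced here; the placeholder text is the example quoted in the nitpick and may not match the real constant.

```typescript
// Hypothetical minimal shapes; the real ActivitySession and settings types have more fields.
interface SessionLike {
  appName: string;
  windowName: string;
  ocrTexts: string[];
}

interface PolicyLike {
  includeWindowTitle: boolean;
  includeOcrText: boolean;
}

// Placeholder text as quoted in the review comment (assumed, not verified against the code).
const HIDDEN_PLACEHOLDER = '[hidden by privacy mode]';

function buildCombinedText(session: SessionLike, policy: PolicyLike): string {
  const parts: string[] = [session.appName];

  // Only include the window title when the policy allows it and it is real content.
  if (policy.includeWindowTitle && session.windowName && session.windowName !== HIDDEN_PLACEHOLDER) {
    parts.push(session.windowName);
  }

  // Same for OCR text: drop empty or placeholder entries, then keep the first two.
  if (policy.includeOcrText) {
    const ocr = session.ocrTexts
      .filter((text) => text.length > 0 && text !== HIDDEN_PLACEHOLDER)
      .slice(0, 2);
    if (ocr.length > 0) parts.push(ocr.join(' '));
  }

  return parts.join(' ');
}
```

Checking both the policy flag and the placeholder value is deliberately redundant: either guard alone keeps tokens such as "hidden" or "privacy" out of the frequency count if the other is bypassed.
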
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/memory-search.ts` around lines 365 - 383, extractSessionTerms currently
builds combined tokens using session.windowName and session.ocrTexts even when
privacy sanitization replaced them with placeholder text, which leaks
placeholder tokens into queries; update extractSessionTerms to consult the
resolved policy (from resolveEgressPrivacySettings) and skip
appName/windowName/ocrTexts pieces that are suppressed by the policy (and also
ignore known placeholder strings like "[hidden by privacy mode]" or empty
strings) before joining, e.g., use the policy flags returned by
resolveEgressPrivacySettings and the sanitizedSessions from
sanitizeSessionsForEgress to conditionally include session.windowName and
session.ocrTexts (or filter out placeholder tokens) when creating combined for
frequency counting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/privacy.ts`:
- Around line 32-36: Summary: SECRET_ASSIGNMENT_REGEX can miss tokens when the
value is "Bearer <token>", leaving the actual token unredacted. Fix: update
SECRET_ASSIGNMENT_REGEX to explicitly handle Authorization: Bearer ... by adding
an alternative that matches optional "Bearer" and captures the following token
(e.g., add something like |authorization\b\s*:\s*Bearer\s*([^\s,;"']+) into the
pattern) so the regex consumes the bearer token text, keep the
global/case-insensitive flags, and apply the same change to the second
occurrence around SECRET_TOKEN_REGEX usage (the block you noted at lines ~72-78)
so all Authorization: Bearer headers are redacted.

Comment thread src/privacy.ts
Comment on lines +32 to +36
// Covers common assignment-style leaks: token=..., api_key: ..., Authorization: Bearer ...
const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi;
// Covers opaque credential formats (JWTs, GitHub/OpenAI-like keys, long random tokens).
const SECRET_TOKEN_REGEX = /\b(?:ghp_[A-Za-z0-9]{20,}|sk-[A-Za-z0-9]{16,}|AIza[0-9A-Za-z\-_]{20,}|eyJ[A-Za-z0-9_\-]{8,}\.[A-Za-z0-9._\-]{8,}\.[A-Za-z0-9._\-]{8,}|[A-Za-z0-9_\-]{32,})\b/g;

⚠️ Potential issue | 🟠 Major

Redaction can miss Bearer tokens in Authorization headers.
The assignment regex only replaces the first token after the delimiter, so Authorization: Bearer <token> can leave the actual token intact unless it matches the token regex. That’s a privacy leak for short/non-matching tokens.

🔧 Proposed fix
-const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi;
+const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd)\b\s*[:=]\s*(?:bearer\s+)?([^\s,;"']+)/gi;

Also applies to: 72-78
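
The difference between the two patterns can be checked directly. In the sketch below the two regexes are copied from the diff above; the `redactWith` helper and the `$1=[REDACTED_SECRET]` replacement string are assumptions, since the thread does not show how the PR performs the replacement.

```typescript
// Pattern as currently in the PR: "bearer" is one of the key names.
const ORIGINAL_REGEX =
  /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi;

// Proposed pattern: "bearer" becomes an optional prefix of the value instead.
const FIXED_REGEX =
  /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd)\b\s*[:=]\s*(?:bearer\s+)?([^\s,;"']+)/gi;

// Hypothetical helper: keeps the key name and replaces the whole matched value.
function redactWith(pattern: RegExp, text: string): string {
  return text.replace(pattern, '$1=[REDACTED_SECRET]');
}

const header = 'Authorization: Bearer abc123';

// Original: the value group captures only "Bearer", so the token survives.
console.log(redactWith(ORIGINAL_REGEX, header)); // Authorization=[REDACTED_SECRET] abc123

// Fixed: "Bearer " is consumed as a prefix and the token itself is captured.
console.log(redactWith(FIXED_REGEX, header)); // Authorization=[REDACTED_SECRET]
```

With the original pattern the short token `abc123` is only caught if SECRET_TOKEN_REGEX happens to match it, which is the leak described in this comment.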

Comment thread src/privacy.ts
// Covers common assignment-style leaks: token=..., api_key: ..., Authorization: Bearer ...
const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi;
// Covers opaque credential formats (JWTs, GitHub/OpenAI-like keys, long random tokens).
const SECRET_TOKEN_REGEX = /\b(?:ghp_[A-Za-z0-9]{20,}|sk-[A-Za-z0-9]{16,}|AIza[0-9A-Za-z\-_]{20,}|eyJ[A-Za-z0-9_\-]{8,}\.[A-Za-z0-9._\-]{8,}\.[A-Za-z0-9._\-]{8,}|[A-Za-z0-9_\-]{32,})\b/g;

Owner

Good addition overall. Potential over-redaction: SECRET_TOKEN_REGEX currently includes [A-Za-z0-9_-]{32,}, which may match normal long IDs/slugs/content that are not secrets. That can degrade signal in outbound context.

Consider tightening this fallback (e.g., require known prefixes/pattern families or entropy heuristics) to reduce false positives while preserving secret redaction.
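
One way to read the "entropy heuristics" suggestion is sketched below: keep the 32+ character fallback, but only redact candidates that mix letters and digits and have high per-character Shannon entropy. All names here are hypothetical, and the 3.5 bits/char threshold is an untuned assumption that would need checking against real slugs and real keys before use.

```typescript
// Shannon entropy in bits per character, computed from character frequencies.
function shannonEntropy(s: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] ?? 0) + 1;
  }
  let bits = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / s.length;
    bits -= p * (Math.log(p) / Math.LN2);
  }
  return bits;
}

// Same shape as the generic fallback in SECRET_TOKEN_REGEX.
const LONG_TOKEN_REGEX = /\b[A-Za-z0-9_\-]{32,}\b/g;

// Hypothetical threshold; natural-language slugs tend to score lower than random keys.
const MIN_ENTROPY_BITS = 3.5;

function looksLikeSecret(token: string): boolean {
  const hasDigit = /[0-9]/.test(token);
  const hasLetter = /[A-Za-z]/.test(token);
  return hasDigit && hasLetter && shannonEntropy(token) >= MIN_ENTROPY_BITS;
}

// Redacts only the long tokens that pass the heuristic; others are left intact.
function redactHighEntropyTokens(text: string): string {
  return text.replace(LONG_TOKEN_REGEX, (match) =>
    looksLikeSecret(match) ? '[REDACTED_SECRET]' : match,
  );
}
```

This trades some recall for precision: a long all-letter slug passes through untouched, but so would a secret made only of letters, so known-prefix patterns such as `ghp_` and `sk-` would still need their own rules.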

Development

Successfully merging this pull request may close these issues.

add redaction and data-minimization controls for OCR context egress

2 participants