feat: add configurable egress redaction and privacy modes#59
Conversation
Summary of ChangesHello @altantutar, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances user privacy by introducing a comprehensive and configurable egress redaction system. It centralizes the handling of sensitive data, allowing users to define privacy modes and specific redaction rules for various data types. This ensures that personal information is appropriately sanitized before being used in external contexts like AI analysis or API responses, providing greater control and transparency over data sharing. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughThis PR introduces configurable egress privacy controls across the system. It adds privacy modes (minimal, standard, full), redaction logic for sensitive data (emails, secrets, IDs, URLs), and threads privacy settings through configuration, database formatting, API responses, memory queries, and activity processing. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive and well-designed privacy redaction pipeline. The changes are consistently applied across the codebase, including the API, CLI, and memory search functionalities. The new privacy.ts module is a great addition, centralizing the redaction logic and making it easy to maintain. The test coverage for the new features is also good. I have one suggestion to improve maintainability by reducing code duplication in the CLI configuration handling.
| if (options.redactEmails !== undefined) { | ||
| const parsed = parseToggle(String(options.redactEmails)); | ||
| if (parsed === null) { | ||
| console.log(chalk.red('Invalid value for --redact-emails. Use: on or off')); | ||
| process.exitCode = 1; | ||
| return; | ||
| } | ||
| saveConfig({ egressRedactEmails: parsed }); | ||
| console.log(chalk.green(`✓ Egress email redaction ${parsed ? 'enabled' : 'disabled'}`)); | ||
| console.log(chalk.gray('Applies to outbound API/model/memory context.\n')); | ||
| } | ||
|
|
||
| if (options.redactSecrets !== undefined) { | ||
| const parsed = parseToggle(String(options.redactSecrets)); | ||
| if (parsed === null) { | ||
| console.log(chalk.red('Invalid value for --redact-secrets. Use: on or off')); | ||
| process.exitCode = 1; | ||
| return; | ||
| } | ||
| saveConfig({ egressRedactSecrets: parsed }); | ||
| console.log(chalk.green(`✓ Egress secret redaction ${parsed ? 'enabled' : 'disabled'}`)); | ||
| console.log(chalk.gray('Applies to outbound API/model/memory context.\n')); | ||
| } | ||
|
|
||
| if (options.redactIds !== undefined) { | ||
| const parsed = parseToggle(String(options.redactIds)); | ||
| if (parsed === null) { | ||
| console.log(chalk.red('Invalid value for --redact-ids. Use: on or off')); | ||
| process.exitCode = 1; | ||
| return; | ||
| } | ||
| saveConfig({ egressRedactIds: parsed }); | ||
| console.log(chalk.green(`✓ Egress ID redaction ${parsed ? 'enabled' : 'disabled'}`)); | ||
| console.log(chalk.gray('Applies to outbound API/model/memory context.\n')); | ||
| } | ||
|
|
||
| if (options.redactUrls !== undefined) { | ||
| const parsed = parseToggle(String(options.redactUrls)); | ||
| if (parsed === null) { | ||
| console.log(chalk.red('Invalid value for --redact-urls. Use: on or off')); | ||
| process.exitCode = 1; | ||
| return; | ||
| } | ||
| saveConfig({ egressRedactUrls: parsed }); | ||
| console.log(chalk.green(`✓ Egress URL redaction ${parsed ? 'enabled' : 'disabled'}`)); | ||
| console.log(chalk.gray('Applies to outbound API/model/memory context.\n')); | ||
| } |
There was a problem hiding this comment.
These blocks for handling redaction toggles (--redact-emails, --redact-secrets, etc.) are very similar and contain duplicated logic. To improve maintainability and reduce redundancy, you can refactor this into a single helper function. This will make the code cleaner and easier to modify in the future if more redaction options are added.
const handleRedactionToggle = (
optionValue: unknown,
optionCliName: string,
configKey: keyof import('./config.js').SynapseConfig,
displayName: string
): boolean => {
if (optionValue !== undefined) {
const parsed = parseToggle(String(optionValue));
if (parsed === null) {
console.log(chalk.red(`Invalid value for --${optionCliName}. Use: on or off`));
process.exitCode = 1;
return true; // Indicates should exit
}
saveConfig({ [configKey]: parsed } as Partial<import('./config.js').SynapseConfig>);
console.log(chalk.green(`✓ Egress ${displayName} redaction ${parsed ? 'enabled' : 'disabled'}`));
console.log(chalk.gray('Applies to outbound API/model/memory context.\n'));
}
return false; // Indicates should continue
};
if (handleRedactionToggle(options.redactEmails, 'redact-emails', 'egressRedactEmails', 'email')) return;
if (handleRedactionToggle(options.redactSecrets, 'redact-secrets', 'egressRedactSecrets', 'secret')) return;
if (handleRedactionToggle(options.redactIds, 'redact-ids', 'egressRedactIds', 'ID')) return;
if (handleRedactionToggle(options.redactUrls, 'redact-urls', 'egressRedactUrls', 'URL')) return;There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/memory-search.ts (1)
365-383: Avoid placeholder terms leaking into memory queries in minimal mode. When window/OCR are hidden, the placeholder text (e.g., “[hidden by privacy mode]”) can inject noisy tokens like “hidden/privacy/mode” into queries. Consider skipping window/OCR terms when the policy excludes them.♻️ Suggested refinement
- const combined = [ - session.appName, - session.windowName, - session.ocrTexts.slice(0, 2).join(' '), - ].join(' '); + const combinedParts = [session.appName]; + if (policy.includeWindowTitle && session.windowName) { + combinedParts.push(session.windowName); + } + if (policy.includeOcrText && session.ocrTexts.length > 0) { + combinedParts.push(session.ocrTexts.slice(0, 2).join(' ')); + } + const combined = combinedParts.join(' ');🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/memory-search.ts` around lines 365 - 383, extractSessionTerms currently builds combined tokens using session.windowName and session.ocrTexts even when privacy sanitization replaced them with placeholder text, which leaks placeholder tokens into queries; update extractSessionTerms to consult the resolved policy (from resolveEgressPrivacySettings) and skip appName/windowName/ocrTexts pieces that are suppressed by the policy (and also ignore known placeholder strings like "[hidden by privacy mode]" or empty strings) before joining, e.g., use the policy flags returned by resolveEgressPrivacySettings and the sanitizedSessions from sanitizeSessionsForEgress to conditionally include session.windowName and session.ocrTexts (or filter out placeholder tokens) when creating combined for frequency counting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/privacy.ts`:
- Around line 32-36: Summary: SECRET_ASSIGNMENT_REGEX can miss tokens when the
value is "Bearer <token>", leaving the actual token unredacted. Fix: update
SECRET_ASSIGNMENT_REGEX to explicitly handle Authorization: Bearer ... by adding
an alternative that matches optional "Bearer" and captures the following token
(e.g., add something like |authorization\b\s*:\s*Bearer\s*([^\s,;"']+) into the
pattern) so the regex consumes the bearer token text, keep the
global/case-insensitive flags, and apply the same change to the second
occurrence around SECRET_TOKEN_REGEX usage (the block you noted at lines ~72-78)
so all Authorization: Bearer headers are redacted.
---
Nitpick comments:
In `@src/memory-search.ts`:
- Around line 365-383: extractSessionTerms currently builds combined tokens
using session.windowName and session.ocrTexts even when privacy sanitization
replaced them with placeholder text, which leaks placeholder tokens into
queries; update extractSessionTerms to consult the resolved policy (from
resolveEgressPrivacySettings) and skip appName/windowName/ocrTexts pieces that
are suppressed by the policy (and also ignore known placeholder strings like
"[hidden by privacy mode]" or empty strings) before joining, e.g., use the
policy flags returned by resolveEgressPrivacySettings and the sanitizedSessions
from sanitizeSessionsForEgress to conditionally include session.windowName and
session.ocrTexts (or filter out placeholder tokens) when creating combined for
frequency counting.
| // Covers common assignment-style leaks: token=..., api_key: ..., Authorization: Bearer ... | ||
| const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi; | ||
| // Covers opaque credential formats (JWTs, GitHub/OpenAI-like keys, long random tokens). | ||
| const SECRET_TOKEN_REGEX = /\b(?:ghp_[A-Za-z0-9]{20,}|sk-[A-Za-z0-9]{16,}|AIza[0-9A-Za-z\-_]{20,}|eyJ[A-Za-z0-9_\-]{8,}\.[A-Za-z0-9._\-]{8,}\.[A-Za-z0-9._\-]{8,}|[A-Za-z0-9_\-]{32,})\b/g; | ||
|
|
There was a problem hiding this comment.
Redaction can miss Bearer tokens in Authorization headers.
The assignment regex only replaces the first token after the delimiter, so Authorization: Bearer <token> can leave the actual token intact unless it matches the token regex. That’s a privacy leak for short/non-matching tokens.
🔧 Proposed fix
-const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi;
+const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd)\b\s*[:=]\s*(?:bearer\s+)?([^\s,;"']+)/gi;Also applies to: 72-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/privacy.ts` around lines 32 - 36, Summary: SECRET_ASSIGNMENT_REGEX can
miss tokens when the value is "Bearer <token>", leaving the actual token
unredacted. Fix: update SECRET_ASSIGNMENT_REGEX to explicitly handle
Authorization: Bearer ... by adding an alternative that matches optional
"Bearer" and captures the following token (e.g., add something like
|authorization\b\s*:\s*Bearer\s*([^\s,;"']+) into the pattern) so the regex
consumes the bearer token text, keep the global/case-insensitive flags, and
apply the same change to the second occurrence around SECRET_TOKEN_REGEX usage
(the block you noted at lines ~72-78) so all Authorization: Bearer headers are
redacted.
| // Covers common assignment-style leaks: token=..., api_key: ..., Authorization: Bearer ... | ||
| const SECRET_ASSIGNMENT_REGEX = /\b(api[_-]?key|access[_-]?token|refresh[_-]?token|auth(?:orization)?|secret|password|passwd|bearer)\b\s*[:=]\s*([^\s,;"']+)/gi; | ||
| // Covers opaque credential formats (JWTs, GitHub/OpenAI-like keys, long random tokens). | ||
| const SECRET_TOKEN_REGEX = /\b(?:ghp_[A-Za-z0-9]{20,}|sk-[A-Za-z0-9]{16,}|AIza[0-9A-Za-z\-_]{20,}|eyJ[A-Za-z0-9_\-]{8,}\.[A-Za-z0-9._\-]{8,}\.[A-Za-z0-9._\-]{8,}|[A-Za-z0-9_\-]{32,})\b/g; |
There was a problem hiding this comment.
Good addition overall. Potential over-redaction: SECRET_TOKEN_REGEX currently includes [A-Za-z0-9_-]{32,}, which may match normal long IDs/slugs/content that are not secrets. That can degrade signal in outbound context.\n\nConsider tightening this fallback (e.g., require known prefixes/pattern families or entropy heuristics) to reduce false positives while preserving secret redaction.
Summary
minimal,standard,full) and redaction toggles for emails/secrets/ids/urlssynapse configoptionsValidation
bunx tsc --noEmitbun testCloses #47
Summary by CodeRabbit