⏱ ~15 min · 📍 Module 3 of 7
This workshop module focuses on identifying and preventing the exposure of sensitive credentials, API keys, and other secrets in source code, configuration files, and git history.
This is an old problem, but it is still a common one (especially with the AI surge). Secrets are the keys to your kingdom. If a password or token is accidentally committed to code, it's immediately at risk, potentially leading to:
- Data Breaches - Exposed credentials can lead to unauthorized access
- Access to our Systems - Compromised secrets enable lateral movement
- Supply Chain Attacks - Compromised release artifacts
All of this can lead to:
- Compliance Issues - Regulatory requirements for data protection
- Financial Loss - Unauthorized usage of cloud services
- Reputation Damage - Public exposure of security incidents
Secrets scan involves scanning code repositories, commit history, and configuration files to identify exposed credentials such as API keys, passwords, tokens, certificates, and other sensitive information that should not be stored in version control.
Tip
Ideally, we should implement this before pushing the code to the repository, using pre-commit or similar tools. But accidents happen, so it's important to have a process to detect them in the pipeline as a safety net.
GitGuardian detected 28.65M new hardcoded secrets in public GitHub during 2025 (+34% YoY), with AI-related credentials behind 8 of the 10 fastest-growing categories. 1
- Core providers (OpenAI, Anthropic, Google AI, Mistral, xAI, DeepSeek), LLM infrastructure (RAG, orchestration, vector stores) and AI gateways (OpenRouter, Groq)
- +81% YoY in 2025 — LLM infrastructure leaks growing 5× faster than core providers
- Credentials embedded in Model Context Protocol server configs (API keys passed as CLI args or
envblocks committed to version control) - 24,008 unique secrets exposed in public MCP configs in 2025 — top types: Google API, PostgreSQL, Firecrawl, Perplexity, Brave Search
- Hardcoded passwords, database connection strings, custom tokens, plaintext encryption keys
- The largest single bucket year after year, often slipping past specific-pattern detectors
- MongoDB connection strings, MySQL/PostgreSQL credentials
- Frequently bundled inside
.envfiles or MCPDATABASE_URIfields
- AWS IAM access keys, Google Cloud keys, Azure SAS tokens
- High blast radius — direct path to lateral movement in cloud accounts
- Stripe, Twilio, SendGrid, payment processors, monitoring SaaS, etc.
- Modern apps integrate dozens of these — each one a potential leak source
- Telegram Bot tokens, Slack/Discord webhooks
- Used by attackers for impersonation, data exfiltration and social-engineering pivots
- RSA/SSH private keys, JWT signing keys, TLS certificates
- Consistently in the top-10 leak types
- GitHub PATs, fine-grained PATs, GitLab tokens
- Frequently classified as critical — the Shai-Hulud 2 supply-chain attack alone harvested 581 GitHub PATs + 386 OAuth tokens from compromised dev machines in 2025
⚠️ Detection ≠ remediation. 64% of secrets confirmed valid in 2022 are still valid in 2026. 1 Pair detection with a clear rotation/revocation playbook.
There are other types of secrets or sensitive data that may not be covered by the tools we will use in this workshop, but that are still important to keep in mind:
- Webhooks are used to trigger actions in other systems, and can be used to trigger malicious actions.
- Like Slack webhooks, that can be used to impersonate a user and send messages to a channel.
- AWS account IDs (or other cloud provider identifiers) can be used to enumerate resources and help attackers to map the attack surface.
- There's a lot of discussion about if this should be considered a secret or not. According to AWS they are not, but we think this article from Daniel Grzelak is a good explanation of why they are: The Final Answer: AWS Account IDs Are Secrets
- PDFs, log files, customer data dumps, identity scans and even bank statements that slip into repos during troubleshooting or demo work.
| Tool | What it does | Notes |
|---|---|---|
| TruffleHog | Git history secrets scanner | Verifies findings against live providers when possible |
| Gitleaks | Git history and filesystem secrets scanner | Rule-based; tunable via .gitleaks.toml |
By the end of this module, you will:
- Understand different types of secrets and their risks
- Learn to use automated secrets detection tools
- Know how to scan git history for leaked credentials
- Learn to prevent future secret exposures
- Use environment variables for secrets
- Implement proper .gitignore patterns
- No secrets in source code
- Git history cleaned of secrets
- Environment variables properly configured
- Secrets management solution implemented
- Pre-commit hooks for secrets detection
- Rotate compromised credentials immediately
- Regular secrets auditing process
- GitGuardian — The State of Secrets Sprawl 2026: the 5th-edition report covering AI-fueled leaks, MCP configuration sprawl and credential persistence trends.
- How did OpenAI detect that my API key was pushed to GitHub?: Reddit user surprised because OpenAI contacted him seconds after pushing an API key to GitHub.
The bait is two hardcoded AWS credentials at
code/src/simple-app.js:4-5. They're a didactic AKIA — not real and not validated against AWS — but secret scanners detect them by format. The CI failures are intentional.
TruffleHog — Detector Type: AWS at code/src/simple-app.js:4
What TruffleHog flagged: 1 unverified result on the AKIA… access-key literal. Look in the Summarize findings step:
❌ Found 1 TruffleHog finding(s):
- AWS | verified=false
code/src/simple-app.js:4
Why the snippet uses --results=verified,unknown,unverified: the workshop's AKIA is not a real AWS key, so TruffleHog can't validate it against AWS. Under the default verified-only filter the build would silently pass — the extra flags are what make the bait fire.
Why we run with --json: the snippet pipes JSON output into a follow-up Summarize findings step (same shape as the other workshop modules). The default verified=false field tells you the secret matched a pattern but couldn't be validated against the live provider.
Fix — move the hardcoded credentials to environment variables:
-const AWS_ACCESS_KEY_ID = 'AKIA…REDACTED'
-const AWS_SECRET_ACCESS_KEY = '[redacted-40-char-secret]'
+const AWS_ACCESS_KEY_ID = process.env.AWS_ACCESS_KEY_ID;
+const AWS_SECRET_ACCESS_KEY = process.env.AWS_SECRET_ACCESS_KEY;Gitleaks — aws-access-token + generic-api-key at code/src/simple-app.js:4-5
What Gitleaks flagged: 2 leaks — the access-key (rule aws-access-token) and the secret-key (rule generic-api-key, high-entropy match). The output has rich detail (Finding / Secret / RuleID / Entropy / File / Line / Fingerprint), but the wget-based install pollutes the log with a progress bar — scroll past it to find Finding:.
Snippet note: the snippet uses --no-git so only the current filesystem is scanned, not git history. Once your latest commit is clean, Gitleaks goes green even if older commits on the branch still contain the literals.
Fix — same as TruffleHog (move both to process.env.*). One edit clears both rules.
⚠️ If you write notes inside this repo (e.g. adocs/writeup that quotes the bait literals), Gitleaks will trip on your notes too. Either redact (AKIA…REDACTED,[redacted-40-char-secret]) or add a.gitleaksignorecovering your notes path.