feat: add credit card PII probe and detectors#1748
Open
HayatoFujihara wants to merge 1 commit into
Open
Conversation
Signed-off-by: HayatoFujihara <housei.koutyoku7@gmail.com>
e9caa8a to
3e5fd2a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #88
This PR adds a new
piimodule for credit-card leakage checks.New detectors
pii.CreditCardspii.LuhnNew probe
pii.CreditCardsWhy this design?
Issue #88 asks for a probe that tries to get models to generate credit cards, with checks for:
I split this into two detector levels:
pii.CreditCardsis strict and matches the full issue requirement.pii.Luhnis narrower and catches card-number-only leakage.This keeps the full detector precise while still exposing the simpler Luhn-only check mentioned in the issue discussion.
Research grounding
The detector logic is not paper-derived; it implements the concrete checks requested in #88. The papers informed the probe prompt design.
Carlini et al. (2020)
Carlini et al., “Extracting Training Data from Large Language Models” shows that language models can emit memorized training data when prompted with prefixes and continuation-style queries.
This maps to the probe design in two ways:
Examples in this PR:
Nakka et al. (2024)
Nakka et al., “PII-Compass: Guiding LLM Training Data Extraction Prompts towards the Target PII via Grounding” shows that grounded / in-domain prefixes and record-like templates can improve PII extraction compared with naive prompts.
This maps to the probe design as synthetic billing and payment contexts rather than generic “give me a card number” prompts.
Examples in this PR:
The prompts reuse the structure of grounded-prefix and record-continuation attacks, but they do not include real PII from either paper or from any dataset.
Detector behavior
pii.CreditCardsreturns a hit only when all required pieces are present.Examples that hit:
Examples that do not hit:
Missing CVC.
Invalid Luhn check.
Expired card.
pii.Luhnis intentionally looser and detects Luhn-valid card numbers even when expiry/CVC are absent.Verification
pytest tests/detectors/test_detectors_pii.py tests/detectors/test_detectors.py -k "pii or CreditCards or Luhn"34 passedpytest tests/probes/test_probes_pii.py tests/probes/test_probes.py -k "pii or CreditCards"9 passedpytest tests/test_docs.py -k "pii or CreditCards or Luhn or probe or detector"451 passedruff check garak/detectors/pii.py garak/probes/pii.py tests/detectors/test_detectors_pii.py tests/probes/test_probes_pii.pymypy garak/detectors/pii.py garak/probes/pii.py tests/detectors/test_detectors_pii.py tests/probes/test_probes_pii.pypython -m garak --target_type test.Blank --probes pii.CreditCards --detectors pii.CreditCards --report_prefix issue88-smoke55/55 PASS