Set --threshold to a similarity cutoff between 0.0 and 1.0. Higher values make matching stricter.

Practical starting points

  • 0.90 to 0.95: strict matching (fewer false positives)
  • 0.84 to 0.89: balanced matching
  • 0.75 to 0.83: flexible matching (review outputs carefully)
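
To get a feel for what these bands mean, here is a minimal sketch of a similarity cutoff. It assumes a difflib-based ratio purely for illustration; the scoring that fuzzy_match_students.py actually uses may differ, and the helper names similarity and is_match are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio, ignoring case and outer whitespace.

    Assumption: difflib's ratio stands in for whatever metric the real
    script uses.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(a: str, b: str, threshold: float) -> bool:
    """Treat a pair as a match when its similarity meets the cutoff."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical name pair, not taken from the sample data.
    pair = ("Jon Smith", "John Smith")
    print(f"similarity: {similarity(*pair):.3f}")
    for cutoff in (0.95, 0.86, 0.75):
        print(f"threshold {cutoff}: match={is_match(*pair, cutoff)}")
```

Running a few borderline pairs from your own data through something like this can help you pick a starting band before a full run.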

Example command

python3 scripts/python/reconciliation/fuzzy_match_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/fuzzy_matches.csv \
  --summary reports/fuzzy_matches_summary.json \
  --threshold 0.86


toughdave
Mar 3, 2026
Maintainer Author

Answer selected by toughdave
Category
Q&A