cu-library/NZDedupeCheck

NZDedupeCheck

Checks the quality of matches between records in the CF NZ Dedupe Report.

Inputs:
•NZ Dedupe Report as a .csv file, sorted by Identifier (required)
NOTE: Requires columns for Title, Publication Date, Language Of Cataloging, Author, ISBN (Normalized), Edition, and Publisher, along with the standard columns. Use the "KB - NZ Dedupe Report with Comparison Fields" template available in the NZ Analytics instance.
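
Because the script depends on both the column set and the sort order, it can help to check an export before running it. The following is a minimal sketch using only the standard library; it is not part of the repository, and the helper names `missing_columns` and `identifiers_are_grouped` are hypothetical.

```python
import csv
import io

# Columns named in the note above, plus the Identifier column used for matching.
REQUIRED_COLUMNS = [
    "Identifier", "Title", "Publication Date", "Language Of Cataloging",
    "Author", "ISBN (Normalized)", "Edition", "Publisher",
]


def missing_columns(csv_file):
    """Return the required column names that are absent from the CSV header."""
    header = csv.DictReader(csv_file).fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


def identifiers_are_grouped(identifiers):
    """Return True if every Identifier value forms one contiguous run.

    This is the property that sorting by Identifier guarantees and that an
    adjacent-row comparison relies on.
    """
    seen = set()
    previous = None
    for value in identifiers:
        if value != previous:
            if value in seen:  # the value reappears after a different one
                return False
            seen.add(value)
            previous = value
    return True


# Example with an in-memory file standing in for a real export.
sample = io.StringIO("Identifier,Title\nA1,Dune\nA1,Dune\n")
print(missing_columns(sample))
```

For a real file, the same functions could be called on `open(path, newline="", encoding="utf-8")` and on the list of values read from the Identifier column.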

Outputs:
•Copy of the report with a Similarity (confidence) score next to every record whose Identifier value matches another record

Process:
•Prompts for the input file using tkinter filedialog
•Compares adjacent rows on the value in the Identifier column (the file must be sorted by Identifier so that matching records are adjacent)
•If a match is found, compares the key fields (Title, Publication Date, Language Of Cataloging, Author, ISBN (Normalized), Edition, and Publisher) using fuzz.WRatio
•Adds a Similarity column, populated with the average of the scores for all comparison fields, or 0 if no matching record is found
•Prompts the user to select a directory for the output file
•Saves the output as a file with a unique name based on the current date and time
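
The comparison and naming steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: it uses only the standard library, with `difflib.SequenceMatcher` standing in for `fuzz.WRatio` (the two scorers give different numbers), and the function names, the rule for picking which neighbour to compare against, and the output file name pattern are all hypothetical.

```python
from datetime import datetime
from difflib import SequenceMatcher

KEY_FIELDS = [
    "Title", "Publication Date", "Language Of Cataloging", "Author",
    "ISBN (Normalized)", "Edition", "Publisher",
]


def stand_in_ratio(a, b):
    """Stdlib stand-in for fuzz.WRatio: similarity of two strings, 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def pair_similarity(row_a, row_b, scorer=stand_in_ratio):
    """Average the per-field scores across all key fields."""
    scores = [
        scorer(row_a.get(field) or "", row_b.get(field) or "")
        for field in KEY_FIELDS
    ]
    return sum(scores) / len(scores)


def add_similarity(rows, scorer=stand_in_ratio):
    """Return copies of rows (sorted by Identifier) with a Similarity value.

    Assumption: a row is compared with the next row if their Identifiers
    match, otherwise with the previous row; rows with no match get 0.
    """
    scored = []
    for i, row in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        prev = rows[i - 1] if i > 0 else None
        if nxt is not None and nxt["Identifier"] == row["Identifier"]:
            similarity = pair_similarity(row, nxt, scorer)
        elif prev is not None and prev["Identifier"] == row["Identifier"]:
            similarity = pair_similarity(row, prev, scorer)
        else:
            similarity = 0
        scored.append({**row, "Similarity": similarity})
    return scored


def output_name(now):
    """Build a timestamped file name (hypothetical pattern and extension)."""
    return f"NZDedupeCheck_{now:%Y%m%d_%H%M%S}.csv"


print(output_name(datetime.now()))
```

If FuzzyWuzzy is installed, its scorer can be swapped in with `from fuzzywuzzy import fuzz` and `add_similarity(rows, scorer=fuzz.WRatio)`, since `fuzz.WRatio` also takes two strings and returns a 0 to 100 score.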

Dependencies:
•pandas
•FuzzyWuzzy
•NumPy
•datetime (Python standard library)
•tkinter (Python standard library)
•time (Python standard library)
