cu-library/marclinkcheck

marclinkcheck

Includes three scripts: marclinkcheck, which checks the validity of links in MARC records; invalidurldelete, which deletes 856 fields with invalid links from MARC records; and recordsplit2.py, which splits records into three files (records with no valid links, records with one valid link, and records with more than one valid link).

MARC Link Checker
Accepts as input a file of records in MARC Binary (.mrc). Collects URLs from 856$u and checks them using the following process:
 • Creates a list of all record IDs and the URLs found in 856$u
 • Checks all domains in the list for validity
 • Checks all URLs in the list:
   • If the domain is invalid, marks the URL as invalid and adds it to the list of broken links
   • If the domain is valid, sends a HEAD request to check the status of the result
   • If the HEAD request is denied, sends a GET request instead
   • Records the returned status code. If the status code != 200, adds the URL to the list of broken links
Returns a CSV with record IDs, broken URLs, and HTTP status codes or error messages.
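
The checking process above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the script's actual code. The function names (host_resolves, send_request, check_url, find_broken) are hypothetical, "valid domain" is assumed to mean "resolves in DNS", and "HEAD request denied" is assumed to mean a 403 or 405 response. Note that urllib follows redirects automatically, so the status seen here is that of the final response.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse


def host_resolves(host):
    """Assumption: a domain is 'valid' if it resolves in DNS."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def send_request(method, url, timeout=10):
    """Send one request and return the HTTP status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def check_url(url, send=send_request):
    """Return None if the link is fine, else a status code or error message."""
    try:
        status = send("HEAD", url)
        if status in (403, 405):  # assumption: these mean "HEAD denied"
            status = send("GET", url)
    except OSError as err:  # covers URLError, timeouts, connection errors
        return str(err)
    return None if status == 200 else status


def find_broken(pairs, host_ok=host_resolves, send=send_request):
    """pairs: iterable of (record_id, url). Returns rows for broken links."""
    pairs = list(pairs)
    # Check each distinct domain once, before checking individual URLs.
    hosts = {urlparse(url).hostname for _, url in pairs}
    valid_hosts = {host for host in hosts if host and host_ok(host)}

    rows = []
    for record_id, url in pairs:  # sequential, one request at a time
        if urlparse(url).hostname not in valid_hosts:
            rows.append((record_id, url, "invalid domain"))
            continue
        result = check_url(url, send)
        if result is not None:
            rows.append((record_id, url, result))
    return rows
```

The rows returned by find_broken could then be written out with csv.writer. Passing host_ok and send as parameters is only a convenience for trying the logic without network access.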


NOTES:
 • Requests are sent in sequence, not concurrently, to avoid hitting rate limits or IP blocks
 • Exponential backoff (up to 32 seconds) is implemented for URLs returning HTTP status code 429 (Too Many Requests)
 • Does not account for status code 403/405 beyond retrying with GET
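
As an illustration of the backoff note, here is a minimal sketch of retrying on 429 with doubling delays. The function name send_with_backoff is hypothetical, and the starting delay of 1 second is an assumption. The README only states that the backoff goes up to 32 seconds. The send argument is any callable that takes a method and a URL and returns a status code.

```python
import time


def send_with_backoff(send, method, url, sleep=time.sleep, max_delay=32):
    """Retry while the server answers 429, doubling the wait each time.

    Assumption: delays run 1, 2, 4, ... up to max_delay seconds, after
    which the last status code (possibly still 429) is returned.
    """
    delay = 1
    status = send(method, url)
    while status == 429 and delay <= max_delay:
        sleep(delay)
        delay *= 2
        status = send(method, url)
    return status
```

With these assumptions, a URL that keeps returning 429 is tried seven times in total, with waits of 1, 2, 4, 8, 16, and 32 seconds between attempts.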

Invalid URL Deleter
Accepts two inputs: the CSV output by marclinkcheck and a set of MARC records in MARC Binary. Checks each URL in the MARC records against the list of broken links and deletes any 856 fields with URLs that appear on the list. Saves the revised records to a new .mrc file.
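
The deletion step can be sketched as below. This is an illustration of the logic only, not the script itself. To stay self-contained it models a record as a plain list of (tag, subfields) pairs, where subfields is a list of (code, value) pairs. That data model is hypothetical, and a MARC library would normally handle reading and writing the .mrc files. The CSV header names (record_id, url, status) are also assumptions, since the README does not give the column names.

```python
import csv


def load_broken_urls(csv_file):
    """Collect broken URLs from the link checker's CSV.

    Assumption: the CSV has a header row with a 'url' column.
    """
    return {row["url"] for row in csv.DictReader(csv_file)}


def drop_broken_856(fields, broken_urls):
    """Return the record's fields without 856 fields whose $u is broken."""
    kept = []
    for tag, subfields in fields:
        urls = [value for code, value in subfields if code == "u"]
        if tag == "856" and any(url in broken_urls for url in urls):
            continue  # delete the whole 856 field
        kept.append((tag, subfields))
    return kept
```

Using a set for the broken URLs keeps each lookup cheap, which matters when the record file is large.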

Record Splitter
Accepts a set of MARC records in MARC Binary and splits them into three files: records with no 856 fields with URLs in $u, records with a single 856 with a URL in $u, and records with multiple 856 fields with URLs in $u.
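
A minimal sketch of the splitting rule follows, using a hypothetical in-memory model: each record is a list of (tag, subfields) pairs, and subfields is a list of (code, value) pairs. The function names and the bucket labels are illustrative only. The real script reads MARC Binary and writes three files.

```python
def count_url_fields(fields):
    """Count 856 fields that carry a non-empty URL in $u."""
    return sum(
        1
        for tag, subfields in fields
        if tag == "856"
        and any(code == "u" and value.strip() for code, value in subfields)
    )


def split_records(records):
    """Group records by how many 856 fields with a $u URL they contain."""
    buckets = {"none": [], "single": [], "multiple": []}
    for record in records:
        count = count_url_fields(record)
        if count == 0:
            buckets["none"].append(record)
        elif count == 1:
            buckets["single"].append(record)
        else:
            buckets["multiple"].append(record)
    return buckets
```

Each bucket would then be written to its own output file.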
