goestfish/federal_register

federal_register

Analysis of the Federal Register.

Files in pdf-txt:

"Find_cid.py" is used to find the txt file which just contain the cid code, after running the program, it will produce a txt file which contain all the cid code txt files' path, such as: "D:\pycharm\pythonProject\pdf-txt\FR(miner)\FR-2000\01\2000-01-03.txt".

"Find_empty.py" is used to check if there are any txt files which is empty because of the bad internet, after running the program, it will produce a txt file which contain all the empty txt files' name, such as "1939-08-18.txt".

"get_empty.py" is used to get the empty file, which need user run the "Find_empty.py" firstly to get the txt file which contain all the empty txt and then run the get_empty.py by changing the txt file's path.

"pdf-txt(miner).py" is used to transfer pdf file to the txt file, which use the pdfminer package. The txt file between 1936-1999 in oneDrive should be used this package.

"pdf-txt(pypdf2).py" is used to transfer pdf file to the txt file, which use the pypdf2 package. The txt file between 2000-2023 in oneDrive should be used this package.

"pdf-txt(pymupdf).py" is usdd to transfer pdf file to the txt file, which use the pymupdf package. None txt file use this one but it's the fastest package to transfer, only can be used after 2000 years

"minerWithHorizontal.py" is the final version and use 0.9 word margin

About

federal register project

Languages

  • Python 100.0%