-
- This module converts docx files into plain text, which you can then print or write to a txt file.
- It is sometimes very handy for getting student papers into the same format.
- Install it with `pip install docx2txt`. See this script for an implementation that takes all docx files in the local folder and converts them to txt files.
- Be aware that this package cannot yet convert a document that was created using Office Online. This will hopefully be addressed soon; as of Nov 2020 the issue already has a temporary fix that can be used if needed. Progress on this issue can be tracked here.
- An alternative (docx2python) can be found here. Another alternative (pypandoc) can be found here.
- Basic usage:

      import docx2txt

      text = docx2txt.process(file)  # file is the path to a docx document
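As a rough sketch of the folder conversion mentioned above (this is not the linked script; `convert_folder` and its `extract` parameter are hypothetical names used only for illustration), something like the following would write a txt file next to each docx file. It assumes `docx2txt.process` takes a file path and returns the document text as a string.

```python
from pathlib import Path


def convert_folder(folder=".", extract=None):
    """Write a .txt file next to every .docx file in `folder`.

    `extract` is a hypothetical hook: any callable that takes a path string
    and returns text. It defaults to docx2txt.process.
    """
    if extract is None:
        # Imported lazily so the helper can be loaded without docx2txt installed.
        import docx2txt

        extract = docx2txt.process

    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        text = extract(str(docx_path))
        txt_path = docx_path.with_suffix(".txt")
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling `convert_folder(".")` converts the current folder and returns the list of txt paths it wrote. Passing the extractor in as a parameter is only a convenience so that docx2python or pypandoc could be swapped in later.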
- PrettyErrors - "Prettify Python exception output to make it legible." Just add `import pretty_errors` to the file and errors will be much more readable.
- Rich: Rich is a Python library for rich text and beautiful formatting in the terminal.
  - This package makes it easy to print output that has different formatting. For example, if you want to print output that has red text, do the following:

        # simple example
        from rich.console import Console

        console = Console()
        console.print("Printing in red text using rich", style="red")


        # or in the form of a function with some different style options shown
        def rich_setup():
            from rich.console import Console
            from rich.style import Style

            console = Console()
            # just the words will have the color, can use color names or hex (and others)
            rich_prog = Style(color="blue", bold=False)
            rich_temp = Style(color="purple", bold=False)
            rich_test = Style(color="red", bold=True)
            # can use bgcolor to set the background color
            # (these assignments replace the three above; keep whichever set you want)
            rich_prog = Style(color="white", bgcolor="#0247FE", bold=True)
            rich_temp = Style(color="white", bgcolor="#8601AF", bold=True)
            rich_test = Style(color="white", bgcolor="#C21460", bold=True)
            return console, rich_prog, rich_temp, rich_test


        console, rich_prog, rich_temp, rich_test = rich_setup()
        test = "testing string"
        console.print(f"{test}: testing", style=rich_prog)
        console.print(f"{test}: testing", style=rich_temp)
        console.print(f"{test}: testing", style=rich_test)
        # there are other printing options such as rule (the style applies to the rule not the text)
        console.rule("testing", style=rich_prog)


        # another function can make the print function shorter and easier to use:
        def rprint(printing_string: str, style=""):
            console.print(printing_string, style=style)


        rprint(f"{test}: testing", style=rich_prog)
        rprint(f"{test}: testing", style=rich_temp)
        rprint(f"{test}: testing", style=rich_test)
- pdfminer.six - "extracting information from PDF documents. It focuses on getting and analyzing text data."
-
- This module is very good at graphing data taken from pandas. I chose it over other options such as matplotlib or seaborn because of the Vega and Vega-Lite backend, which also has plugins/packages in both Julia and R. You can also edit the charts manually in a browser if needed.
- Be aware that to save the charts as png or svg files you'll need to install node.js and npm. The instructions for installing them on Ubuntu/WSL/Debian are below:

      # install node.js
      sudo apt install nodejs
      # check to see if it worked
      nodejs -v
      # install npm
      sudo apt install npm
      # check to see if it worked
      npm --v
      # install vega lite and vega cli canvas for saving charts into the png, svg etc. formats
      npm install vega-lite vega-cli canvas
-
- This module allows you to create an executable file that can be run easily from a terminal, or through its GUI if you have added GUI components. I have used it to convert the script included here into an exe that I use frequently by calling just one command.
-
- Very useful for scraping websites.
- Some more advanced commands with requests can be found here.
-
- I use this over networkx for any networks that are very large, because networkx takes a very long time to deal with large networks.
- graph-tool is a network visualization package that cannot be installed using pip; the instructions for installation can be found here.
- For Ubuntu 20.04 you can use these commands, in order:
  - `sudo sh -c "echo 'deb [ arch=amd64 ] https://downloads.skewed.de/apt focal main' >> /etc/apt/sources.list"`
  - `sudo apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25`
  - `sudo apt-get update && sudo apt-get upgrade -y`
  - `sudo apt-get install python3-graph-tool`
- It seems that the better way to go about installing and using graph-tool is to use Anaconda; graph-tool has specifically been designed for installation and use on Anaconda. Steps to install and use Anaconda on Ubuntu are included below.

      # use pyenv to install anaconda and then activate it globally to create the virtual environment for graph-tools
      pyenv install anaconda3-2020.11
      pyenv global anaconda3-2020.11
      # create environment
      conda create -n graphtool python=3.8.5 -y
      # activate the environment
      conda activate graphtool
      # install required packages
      conda install numpy pandas tqdm flake8 black isort -y
      pip install pyarrow zstandard sandspythonfunctions
      # install graph-tool
      conda install -c conda-forge graph-tool
      # for using igraph add the following
      conda install -y -c conda-forge python-igraph
-
- This module makes logging much easier.
- The loguru_template.py file is courtesy of Acea Sands.
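For a sense of what a loguru setup can look like, here is a minimal sketch, assuming loguru has been installed with pip. It is not the contents of loguru_template.py; the `setup_logger` name, the default log file name, the levels, and the rotation size are all illustrative assumptions.

```python
import sys


def setup_logger(log_file="app.log"):
    """Send INFO and above to the terminal and DEBUG and above to a rotating file.

    Hypothetical helper for illustration; adjust the sinks and levels to taste.
    """
    # Imported lazily so this helper can be defined without loguru installed.
    from loguru import logger

    logger.remove()  # drop the default stderr handler
    logger.add(sys.stderr, level="INFO")
    logger.add(log_file, level="DEBUG", rotation="1 MB")
    return logger
```

After `log = setup_logger("my_script.log")`, calls such as `log.info("started")` and `log.debug("details")` go to the sinks configured above.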
This package allows for accessing the Windows API. The installation is a bit complicated, so I've included a script that does most of the installation automatically. The file is called pywin32install. Run it and follow the instructions, and it should finish the installation correctly.
This is a big data analysis package that requires a bit of setup.
- You need to install the Java JDK; then you can use pip to install pyspark.

      # for Windows
      choco install jdk8
      # for Ubuntu
      sudo apt update && sudo apt -y upgrade && sudo apt install default-jdk && java -version
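Once the JDK and pyspark are installed, a quick way to confirm the setup is to check for Java and then start a local session. The sketch below assumes Java is discoverable through `JAVA_HOME` or the `PATH`; `java_available`, `make_spark`, and the `"example"` app name are hypothetical names used only for illustration.

```python
import os
import shutil


def java_available():
    """Return True when JAVA_HOME is set or a `java` executable is on PATH."""
    return bool(os.environ.get("JAVA_HOME")) or shutil.which("java") is not None


def make_spark(app_name="example"):
    """Start (or reuse) a local SparkSession using all available cores."""
    if not java_available():
        raise RuntimeError("Java was not found; install a JDK first")
    # Imported lazily so the Java check above works without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master("local[*]").appName(app_name).getOrCreate()
```

If `make_spark()` returns without an error, `spark.range(5).show()` on the returned session is a simple smoke test.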
- To automatically install packages from a list, you can use either of the two example code options below.

      # Option 1: import each module, and if it is not installed, install it using pip
      # (this assumes the import name is the same as the name on PyPI)
      modules = ["pathlib", "docx2txt", "pprint", "pikepdf"]


      def install_and_import(modules):
          import importlib
          import subprocess
          import sys

          for module in modules:
              try:
                  globals()[module] = importlib.import_module(module)
                  print(module, "imported")
              except ImportError:
                  print(f"installing {module}")
                  # use the pip that belongs to the running interpreter
                  subprocess.run([sys.executable, "-m", "pip", "install", module], check=False)
                  importlib.invalidate_caches()
                  try:
                      globals()[module] = importlib.import_module(module)
                      print(module, "installed and imported")
                  except ImportError:
                      print(f"There was an error while trying to install {module}")


      install_and_import(modules)

      # Option 2: compare a required set against what is already installed
      import sys
      import subprocess
      import pkg_resources

      required = {'numpy', 'pandas', '<etc>'}
      installed = {pkg.key for pkg in pkg_resources.working_set}
      missing = required - installed

      if missing:
          # implement pip as a subprocess:
          subprocess.check_call([sys.executable, '-m', 'pip', 'install', *missing])
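Newer setuptools releases mark `pkg_resources` as deprecated, so here is a sketch of the same "install what is missing" idea using the standard library's `importlib.metadata` (Python 3.8+). The function names `missing_packages` and `install_missing` are hypothetical, and the names passed in are assumed to be distribution names as they appear on PyPI.

```python
import subprocess
import sys
from importlib import metadata


def missing_packages(required):
    """Return the names in `required` that have no installed distribution."""
    missing = []
    for name in required:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def install_missing(required):
    """Install whatever is missing with the pip that belongs to this interpreter."""
    missing = missing_packages(required)
    if missing:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

For example, `install_missing(["numpy", "pandas"])` only calls pip when at least one of the two is not installed, and returns the list of names it tried to install.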
- So it turns out that creating a package in Python is a mess. The recommended way to do it has changed several times. It is still kind of in flux, but things are settling on some improvements, and there are now a few ways to go about making a package the "better way." I decided to use Poetry, which is kind of like a package maker and a virtual environment manager all in one. Below are some of the things I would need in order to do all of it again.
  - This link was very useful to me in doing this, and this link was helpful in learning more about packaging a Python package when using Poetry.
- First up is the file structure. Below is an example of the structure I used for my package.

      PackageName/
      ┣ dist/
      ┃ ┣ PackageName-0.0.1a4-py3-none-any.whl
      ┃ ┗ PackageName-0.0.1a4.tar.gz
      ┣ src/
      ┃ ┣ PackageName/
      ┃ ┃ ┣ EmailFunctions.py
      ┃ ┃ ┣ ParquetFunctions.py
      ┃ ┃ ┗ __init__.py
      ┃ ┗ tests/
      ┃ ┃ ┣ parquet_test_files/
      ┃ ┃ ┃ ┣ SNAPPY_pandas_pyarrow_nation.parquet
      ┃ ┃ ┣ EmailFunctions_test.py
      ┃ ┃ ┗ ParquetFunctions_test.py
      ┣ .gitignore
      ┣ LICENSE
      ┣ poetry.lock
      ┣ pyproject.toml
      ┗ README.md

  - This will result in having to import like this: `import SandsPythonFunctions.ParquetFunctions as pf`. Then, to use a function within that `ParquetFunctions.py` file, call `pf.function()`.
  - The tests folder and the file inside I will explain in the Pytest section.
- Poetry will more or less create something like this for you if you follow their instructions here. When you enter the code below, it'll create a file structure and, more importantly, the `pyproject.toml` file, which tells Poetry and Python what to do with the project.
- The second line of code below will help you create the `pyproject.toml` file with some defaults and some prompts to get it started.
  - More details on the `pyproject.toml` file are located here.
- Once you're done, run the build command and then the `poetry install` command to test it out in the environment you have activated.
  - You'll then want to follow the instructions here that I created, which go over how to use a Python environment with pyenv-virtualenv and Poetry.
  - You can run the build command over and over and it'll just overwrite what you had before (in `baseDirectory/dist`). Each time you run `poetry install` right after `poetry build`, it should reinstall the newer version that you just built for more testing.
- If you want to publish this to PyPI for easy access anywhere, set up your PyPI account, get a token, and then enter the publish command.
  - You'll then enter your username and your token, which you will have gotten from PyPI. You will then be able to use `pip install packageName` anywhere to use your package.
  - Each time you upload, you will need to increase the version number contained in the `pyproject.toml` file before you upload it.

      # create new package/project
      poetry new --src my-package
      # help with creating the pyproject.toml
      poetry init
      # activate a virtual environment
      pyenv activate TestEnv
      # build the project
      poetry build
      # update the build of the package/project
      poetry install
      # publish the project
      poetry publish
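Since the version number has to go up before every upload, here is a small sketch of bumping it from Python. It assumes the `pyproject.toml` contains a plain `version = "X.Y.Z"` line; pre-release versions such as `0.0.1a4` are deliberately not handled and raise an error. `bump_patch` is a hypothetical helper name. Poetry also has a `poetry version patch` command that may be the simpler route.

```python
import re

# Matches a line such as: version = "0.0.1"
_VERSION_LINE = re.compile(r'^(version\s*=\s*")(\d+)\.(\d+)\.(\d+)(")', re.MULTILINE)


def bump_patch(pyproject_text):
    """Return the text with the first plain X.Y.Z version's patch number increased by one."""

    def _bump(match):
        prefix, major, minor, patch, suffix = match.groups()
        return f"{prefix}{major}.{minor}.{int(patch) + 1}{suffix}"

    new_text, count = _VERSION_LINE.subn(_bump, pyproject_text, count=1)
    if count == 0:
        raise ValueError("no plain X.Y.Z version line found")
    return new_text
```

To apply it to a real project, read `pyproject.toml` with `pathlib.Path.read_text`, pass the text through `bump_patch`, and write the result back before running `poetry build` and `poetry publish`.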
- AutoScraper - "A Smart, Automatic, Fast and Lightweight Web Scraper for Python"
- CustomTkinter - "A modern and customizable python UI-library based on Tkinter"
- DearPyGui - "Dear PyGui: A fast and powerful Graphical User Interface for Python with minimal dependencies" Has MIT licence
- Fil - for memory profiling a script
- Flet - "The fastest way to build Flutter apps in Python"
- Gooey - "Turn (almost) any Python command line program into a full GUI application with one line"
- Helium - "Selenium-python is great for web automation. Helium makes it easier to use."
- huak: "A Python package manager written in Rust inspired by Cargo."
- Hypothesis - "Hypothesis is a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for. It is stable, powerful and easy to add to any existing test suite. It works by letting you write tests that assert that something should be true for every case, not just the ones you happen to think of."
- Jupytext - "Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts"
- makeapp - "Simplifies Python application rollout and publishing."
- MyST - "MyST allows you to write Sphinx documentation entirely in markdown."
- PandasGUI - "A GUI for Pandas DataFrames"
- PyAutoGUI - "A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard."
- PyFlow - "An installation and dependency system for Python (it looks way easier than environments)"
- PySimpleGUI - "Python GUI For Humans - Transforms tkinter, Qt, Remi, WxPython into portable people-friendly Pythonic interfaces - RealPython Article"
- PySimpleGUI - "Create GUI applications trivially with a full set of widgets."
- PySnooper - "[Y]ou just add one decorator line to the function you're interested in. You'll get a play-by-play log of your function [...]"
- PyWebIO - "Write interactive web app in script way."
- Sidetable - "Create Simple Summary Tables in Pandas"
- Tabulate - "Pretty-print tabular data in Python, a library and a command-line utility."
- Typer - "Typer, build great CLIs. Easy to code. Based on Python type hints."