Useful Python Modules and Packages

Random Modules and Packages

  • docx2txt

    • This module converts docx files into plain text, which you can then print or write to a txt file
      • Sometimes very handy for student papers to all be in the same format
    • pip install docx2txt
    • see this script for an implementation that takes all docx files in the local folder and converts them to txt files
    • Be aware that this package cannot yet convert documents created with Office Online. This will hopefully be addressed soon (as of Nov 2020); the issue already has a temporary fix that can be used if needed. Progress on this issue can be tracked here.
    • An alternative (docx2python) can be found here. Another alternative (pypandoc) can be found here
    text = docx2txt.process(file)
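    The folder conversion mentioned above can be sketched roughly as follows. This is a minimal sketch, not the linked script: `convert_folder` and its `convert` parameter are hypothetical names introduced here, and the only docx2txt call it relies on is `docx2txt.process`, as shown above.

    ```python
    from pathlib import Path


    def convert_folder(folder=".", convert=None):
        """Write a .txt file next to every .docx file found in `folder`."""
        if convert is None:
            # Imported lazily so a stand-in converter can be passed in instead
            import docx2txt

            convert = docx2txt.process
        written = []
        for docx_path in sorted(Path(folder).glob("*.docx")):
            # The converter receives the path and returns the document's plain text
            text = convert(str(docx_path))
            txt_path = docx_path.with_suffix(".txt")
            txt_path.write_text(text, encoding="utf-8")
            written.append(txt_path)
        return written
    ```

    Calling `convert_folder()` with no arguments converts the docx files in the current working directory and returns the paths of the txt files it wrote.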
  • PrettyErrors - "Prettify Python exception output to make it legible." Just add import pretty_errors to the file and errors will be much more readable.

  • Rich: Rich is a Python library for rich text and beautiful formatting in the terminal.

    • This package makes it easy to print output with different formatting. For example, to print red text, do the following:

      # simple example
      from rich.console import Console
      
      console = Console()
      console.print("Printing in red text using rich", style="red")
      
      # or in the form of a function with some different style options shown
      def rich_setup():
          from rich.console import Console
          from rich.style import Style
          
          console = Console()
          # just the words will have the color, can use color names or hex (and others)
          rich_prog = Style(color="blue", bold=False)
          rich_temp = Style(color="purple", bold=False)
          rich_test = Style(color="red", bold=True)
          # can use bgcolor to set the background color (these reassignments replace the three styles above)
          rich_prog = Style(color="white", bgcolor="#0247FE", bold=True)
          rich_temp = Style(color="white", bgcolor="#8601AF", bold=True)
          rich_test = Style(color="white", bgcolor="#C21460", bold=True)
          return console, rich_prog, rich_temp, rich_test
      console, rich_prog, rich_temp, rich_test = rich_setup()
      test = "testing string"
      console.print(f"{test}: testing", style=rich_prog)
      console.print(f"{test}: testing", style=rich_temp)
      console.print(f"{test}: testing", style=rich_test)
      # there are other printing options such as rule (the style applies to the rule not the text)
      console.rule("testing", style=rich_prog)
      
      # another function can make the print function shorter and easier to use:
      def rprint(printing_string: str, style=""):
          console.print(printing_string, style=style)
      rprint(f"{test}: testing", style=rich_prog)
      rprint(f"{test}: testing", style=rich_temp)
      rprint(f"{test}: testing", style=rich_test)
  • pdfminer.six - "extracting information from PDF documents. It focuses on getting and analyzing text data."

  • Altair

    • This module is very good at graphing data taken from pandas. I chose it over other options such as matplotlib or seaborn because of the vega and vega-lite backend, which also has plugins/packages in both Julia and R. You can also edit the charts manually in a browser if needed.
    • Be aware that to save the charts as png or svg files you'll need to install node.js and npm. The instructions for installing them on Ubuntu/WSL/Debian are below:
    # install node.js
    sudo apt install nodejs
    # check to see if it worked
    nodejs -v
    # install npm
    sudo apt install npm
    # check to see if it worked
    npm -v
    # install vega lite and vega cli canvas for saving charts into the png, svg etc. formats
    npm install vega-lite vega-cli canvas
  • Pyinstaller

    • This module lets you create an executable file that can be run easily from a terminal; if you have added GUI components, the executable includes those too. I have used it to convert the script included here into an exe that I use frequently by calling just one command.
  • Requests

    • Very useful for scraping websites
    • Some more advanced commands with requests can be found here
  • Graph-tool

    • I use this over networkx for very large networks, which networkx takes a very long time to process
    • This is a network visualization package that cannot be installed using pip; the instructions for installation can be found here
    • For Ubuntu 20.04 you can use these commands: sudo sh -c "echo 'deb [ arch=amd64 ] https://downloads.skewed.de/apt focal main' >> /etc/apt/sources.list" then sudo apt-key adv --keyserver keys.openpgp.org --recv-key 612DEFB798507F25 then sudo apt-get update && sudo apt-get upgrade -y and then sudo apt-get install python3-graph-tool
    • It seems that the better way to install and use graph-tool is through anaconda, since graph-tool has specifically been designed for installation and use on anaconda. Steps to install and use anaconda on Ubuntu are included below
    # use pyenv to install anaconda and then activate it globally to create the virtual environment for graph-tool
    pyenv install anaconda3-2020.11
    pyenv global anaconda3-2020.11
    # create environment
    conda create -n graphtool python=3.8.5 -y
    # activate the environment
    conda activate graphtool
    # install required packages
    conda install numpy pandas tqdm flake8 black isort -y
    pip install pyarrow zstandard sandspythonfunctions
    # install graph-tool
    conda install -c conda-forge graph-tool
    
    # to use igraph, add the following
    conda install -y -c conda-forge python-igraph
  • Loguru

    • This module makes logging much easier
    • The loguru_template.py file is courtesy of Acea Sands

pywin32

This package provides access to the Windows API. The installation is a bit complicated, so I've included a script that does most of it automatically. The file is called pywin32install. Run it and follow the instructions, and it should finish the installation correctly.

PySpark

This is a big data analysis package that requires a bit of setup

  • You need to install the Java JDK, then you can use pip to install pyspark

    # for Windows
    choco install jdk8
    # for Ubuntu
    sudo apt update && sudo apt -y upgrade && sudo apt install default-jdk && java -version

Install Python Modules Within a Script

  • To automatically install packages from a list, you can use either of the two example code options below

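    # Option 1: import each module, installing it with pip first if the import fails
    # (this assumes the pip package name matches the import name)
    modules = ["pathlib", "docx2txt", "pprint", "pikepdf"]
    def install_and_import(modules):
        import importlib
        import subprocess
        import sys
        for module in modules:
            try:
                # store the module in globals() so it can be used by name afterwards
                globals()[module] = importlib.import_module(module)
                print(module, "imported")
            except ImportError:
                print(f"installing {module}")
                # call pip through the running interpreter so the package lands in the active environment
                subprocess.run([sys.executable, "-m", "pip", "install", module])
                print(module, "installed")
                # make the import system notice the newly installed package
                importlib.invalidate_caches()
                try:
                    globals()[module] = importlib.import_module(module)
                    print(module, "imported")
                except ImportError:
                    print(f"There was an error while trying to install {module}")
    install_and_import(modules)

    # Option 2: compare a set of required packages against what is already installed
    import sys
    import subprocess
    import pkg_resources  # note: pkg_resources is deprecated in recent setuptools releases
    
    required = {'numpy','pandas','<etc>'}  # '<etc>' is a placeholder, replace it with your own package names
    installed = {pkg.key for pkg in pkg_resources.working_set}
    missing = required - installed
    if missing:
        # implement pip as a subprocess:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install',*missing])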
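
    Because `pkg_resources` is deprecated in recent setuptools releases, the same check can be sketched with the standard library's `importlib.metadata` (Python 3.8+). This is a swapped-in alternative rather than the approach shown above, and `find_missing` and `install_missing` are hypothetical helper names.

    ```python
    import subprocess
    import sys
    from importlib import metadata


    def find_missing(required):
        """Return the names in `required` that have no installed distribution."""
        missing = set()
        for name in required:
            try:
                # version() raises PackageNotFoundError when nothing by that name is installed
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.add(name)
        return missing


    def install_missing(required):
        """Install whatever find_missing() reports, using the running interpreter's pip."""
        missing = find_missing(required)
        if missing:
            subprocess.check_call([sys.executable, "-m", "pip", "install", *sorted(missing)])
        return missing
    ```

    For example, `install_missing({"numpy", "pandas"})` would install only the packages that are not already present and return their names.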

Creating a Local Package

  • Creating a package in Python turns out to be a mess. The recommended approach has changed several times and is still somewhat in flux, but it is settling on some improvements, and there are now a few ways to build a package the "better way." I decided to use poetry, which is a package builder and a virtual environment manager all in one. Below are the things I would need in order to do it all again.

  • First up is the file structure. Below is an example of the structure I used for my package.

    • With a package named SandsPythonFunctions in place of PackageName, this results in imports like this: import SandsPythonFunctions.ParquetFunctions as pf and then, to use a function within that ParquetFunctions.py file, pf.function()
    • I will explain the tests folder and the files inside it in the Pytest section
    PackageName/
    ┣ dist/
    ┃ ┣ PackageName-0.0.1a4-py3-none-any.whl
    ┃ ┗ PackageName-0.0.1a4.tar.gz
    ┣ src/
    ┃ ┣ PackageName/
    ┃ ┃ ┣ EmailFunctions.py
    ┃ ┃ ┣ ParquetFunctions.py
    ┃ ┃ ┗ __init__.py
    ┃ ┗ tests/
    ┃   ┣ parquet_test_files/
    ┃   ┃ ┗ SNAPPY_pandas_pyarrow_nation.parquet
    ┃   ┣ EmailFunctions_test.py
    ┃   ┗ ParquetFunctions_test.py
    ┣ .gitignore
    ┣ LICENSE
    ┣ poetry.lock
    ┣ pyproject.toml
    ┗ README.md
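
    To make the link between this layout and the import statement concrete, here is a small sketch that builds a throwaway package in a temporary folder and imports one of its modules under an alias. `DemoPackage`, `Greetings`, and `hello` are hypothetical names used only for illustration, and adding src/ to `sys.path` by hand stands in for actually installing the package.

    ```python
    import importlib
    import sys
    import tempfile
    from pathlib import Path

    # Build a throwaway package that mirrors the src/PackageName layout
    src_dir = Path(tempfile.mkdtemp()) / "src"
    package_dir = src_dir / "DemoPackage"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("", encoding="utf-8")
    (package_dir / "Greetings.py").write_text(
        "def hello(name):\n    return f'Hello, {name}!'\n", encoding="utf-8"
    )

    # An installed package is found automatically; here src/ is put on the path by hand
    sys.path.insert(0, str(src_dir))
    importlib.invalidate_caches()

    # Same shape as: import PackageName.ParquetFunctions as pf
    import DemoPackage.Greetings as gr

    message = gr.hello("world")
    print(message)
    ```

    The folder name becomes the package name and each .py file inside it becomes a module, which is why the import path has two parts.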
    
  • Poetry will create something close to this for you if you follow their instructions here. The first command below creates a file structure and, more importantly, the pyproject.toml file, which tells poetry and python what to do with the project.

  • The second command below helps you create the pyproject.toml file with some defaults and some prompts to get it started.

    • More details on the pyproject.toml file are located here
  • Once you're done, run the build command and then the poetry install command to test it out in the environment you have activated.

    • You'll then want to follow the instructions I wrote here, which go over how to use a python environment with pyenv-virtualenv and poetry.
    • You can run the build command over and over; it just overwrites what you had before (in baseDirectory/dist). Each time you run poetry install right after poetry build, it should reinstall the newer version that you just built for more testing
  • If you want to publish this to PyPI for easy access anywhere, set up your PyPI account, get a token, and then enter the publish command

    • You'll then enter your username and the token you got from PyPI. After that you can use pip install packageName anywhere to use your package
    • Each time you upload, you need to increase the version number in the pyproject.toml file first
    # create new package/project
    poetry new --src my-package
    # help with creating the pyproject.toml
    poetry init
    # activate a virtual environment
    pyenv activate TestEnv
    # build the project
    poetry build
    # update the build of the package/project
    poetry install
    # publish the project
    poetry publish
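
    Since the version number has to go up before every upload, the bump can be scripted. The following is a minimal sketch, assuming the version appears in pyproject.toml as a line of the form version = "0.0.1a4"; `bump_last_number` and `bump_pyproject_text` are hypothetical helper names, and the logic only increments the final number in the version string.

    ```python
    import re


    def bump_last_number(version):
        """Increment the final run of digits in a version string."""
        match = re.search(r"(\d+)$", version)
        if match is None:
            raise ValueError(f"no trailing number in version {version!r}")
        return version[: match.start()] + str(int(match.group(1)) + 1)


    def bump_pyproject_text(text):
        """Return pyproject.toml text with its first `version = "..."` line bumped."""
        pattern = re.compile(r'^(version\s*=\s*")([^"]+)(")', re.MULTILINE)
        if pattern.search(text) is None:
            raise ValueError("no version line found")
        # Only the first match is rewritten; everything else is left untouched
        return pattern.sub(
            lambda m: m.group(1) + bump_last_number(m.group(2)) + m.group(3),
            text,
            count=1,
        )


    print(bump_last_number("0.0.1a4"))  # prints 0.0.1a5
    ```

    To apply it, read pyproject.toml as text, pass it through `bump_pyproject_text`, and write the result back before running the build and publish commands.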

Modules to Look Into

  • AutoScraper - "A Smart, Automatic, Fast and Lightweight Web Scraper for Python"
  • CustomTkinter - "A modern and customizable python UI-library based on Tkinter"
  • DearPyGui - "Dear PyGui: A fast and powerful Graphical User Interface for Python with minimal dependencies" Has MIT licence
  • Fil - for memory profiling a script
  • Flet - "The fastest way to build Flutter apps in Python"
  • Gooey - "Turn (almost) any Python command line program into a full GUI application with one line"
  • Helium - "Selenium-python is great for web automation. Helium makes it easier to use."
  • huak: "A Python package manager written in Rust inspired by Cargo."
  • Hypothesis - "Hypothesis is a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for. It is stable, powerful and easy to add to any existing test suite. It works by letting you write tests that assert that something should be true for every case, not just the ones you happen to think of."
  • Jupytext - "Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts"
  • makeapp - "Simplifies Python application rollout and publishing."
  • MyST - "MyST allows you to write Sphinx documentation entirely in markdown."
  • PandasGUI - "A GUI for Pandas DataFrames"
  • PyAutoGUI - "A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard."
  • PyFlow - "An installation and dependency system for Python (it looks way easier than environments)"
  • PySimpleGUI - "Python GUI For Humans - Transforms tkinter, Qt, Remi, WxPython into portable people-friendly Pythonic interfaces - RealPython Article"
  • PySimpleGUI - "Create GUI applications trivially with a full set of widgets."
  • PySnooper - "[Y]ou just add one decorator line to the function you're interested in. You'll get a play-by-play log of your function [...]"
  • PyWebIO - "Write interactive web app in script way."
  • Sidetable - "Create Simple Summary Tables in Pandas"
  • Tabulate - "Pretty-print tabular data in Python, a library and a command-line utility."
  • Typer - "Typer, build great CLIs. Easy to code. Based on Python type hints."