Useful Python Packages

As you work through a problem, you will find invaluable community-developed packages at your disposal. Below are a few that I have found useful.

DS_Util

DS_Util is a repository that the data science team is developing. It collects useful functions such as Redshift connectors and S3 interfaces. To use it:

git clone https://github.com/Ibotta/ds_util

pip install <path to cloned repository, usually ~/ds_util/>

Data Science

  • Pandas

    • Data Storage and manipulation workhorse. If you can do it in SQL you can do it in Pandas, and if you can't do it in SQL you can still do it in Pandas!

    • Built on top of NumPy, which provides optimized array manipulation. Useful for working with arrays and performing aggregate operations.

  • Scikit-learn

    • Algorithms, advanced data manipulation, imputation, normalization, prediction, and all-around data science.
  • Keras

    • Deep Learning building blocks. Unbelievably easy to get started with all flavors of Neural Nets!
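
To make the "if you can do it in SQL you can do it in Pandas" point concrete, here is a minimal sketch. The `purchases` table and its column names are invented for illustration; the SQL in the comment is only the rough equivalent of the Pandas call.

```python
import numpy as np
import pandas as pd

# Hypothetical purchase data, made up for this example.
purchases = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "amount": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# Rough SQL equivalent:
#   SELECT store, SUM(amount) AS total, COUNT(*) AS n
#   FROM purchases GROUP BY store
summary = (
    purchases.groupby("store")["amount"]
    .agg(total="sum", n="count")
    .reset_index()
)

# The column is backed by a NumPy array, so array math needs no explicit loop.
amounts = purchases["amount"].to_numpy()
log_amounts = np.log(amounts)

print(summary)
print(amounts.mean())
```

The same `groupby` pattern extends to multiple keys and multiple aggregations, which is usually where Pandas starts to feel like SQL.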

Web Scraping

  • Requests

    • Makes HTTP requests to bring the HTML to your computer.
  • BeautifulSoup

    • If you need to parse HTML code when you web scrape, this is a great tool.
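
The two packages are typically used together: Requests fetches the page and BeautifulSoup parses it. The sketch below assumes you only want the link text and target of every anchor tag. The helper names `fetch_html` and `extract_links` and the sample HTML are made up for illustration, and the parsing step runs on a static string so the example works without network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text (needs network access)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


# Hypothetical page content, standing in for fetch_html(<some url>).
sample_html = """
<html><body>
  <h1>Offers</h1>
  <a href="/offers/1">First offer</a>
  <a href="/offers/2"> Second offer </a>
  <a>No link here</a>
</body></html>
"""

links = extract_links(sample_html)
print(links)
```

For a real page, pass the result of `fetch_html(url)` to `extract_links` instead of the sample string.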

Some Useful Basics Worth Knowing

  • os, sys, collections (defaultdict)
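
As a small taste of these standard-library modules, the sketch below groups file names by extension. The function names are made up for illustration; the point is that `defaultdict(list)` removes the need to check whether a key exists before appending to it.

```python
import os
import sys
from collections import defaultdict


def group_by_extension(filenames):
    """Map each lowercased file extension to the file names that use it."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for name in filenames:
        _, ext = os.path.splitext(name)
        groups[ext.lower()].append(name)
    return groups


def print_summary(directory=".", stream=sys.stdout):
    """Print how many files of each extension a directory contains."""
    groups = group_by_extension(os.listdir(directory))
    for ext in sorted(groups):
        label = ext or "(no extension)"
        print(f"{label}: {len(groups[ext])} file(s)", file=stream)
```

Calling `print_summary()` reports on the current directory; passing `stream=sys.stderr` sends the report to standard error instead.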