Prison Reform Trust Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work—adapted for Prison Reform Trust.

Getting started

Setting up a development environment

Before starting your project, your computer needs to be able to interpret all of this code. We have created a separate section which sets out a step-by-step guide to help make this process as straightforward as possible, with information that we have found helpful along the way.

!!! tip If you've never created a Python environment before, or you need a refresher on how we approach this seemingly straightforward task, then this section should be your starting point.

Requirements to use the cookiecutter template


Starting a new project

Starting a new project is as easy as running this command at the command line. There is no need to create a directory first; the cookiecutter will do it for you.

cookiecutter https://github.com/Prison-Reform-Trust/prt-cookiecutter-data-science

Now that you've got your project, you're ready to go! You should do the following:

  • Check out the directory structure below so you know what's in the project and how to use it.
  • Read the opinions that are baked into the project so you understand best practices and the philosophy behind the project structure.

Directory structure

├── LICENSE
├── Makefile           <- Makefile with commands like `make create_environment`,
│                         `make update_environment` or `make data`.
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for analysis.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project for adding documentation to this project; 
│                         see sphinx-doc.org for details.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `conda list --export > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── analysis       <- Scripts to process raw data for analysis
│   │   └── process_data.py
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
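
The notebook naming convention described in the tree above can be checked with a short script. This is a minimal sketch, not part of the template: the regular expression is an assumption inferred from the single example `1.0-jqp-initial-data-exploration`, and `parse_notebook_name` is a hypothetical helper name.

```python
import re

# Assumed pattern, inferred from `1.0-jqp-initial-data-exploration`:
# an ordering number, the creator's initials, then a `-` delimited description.
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"                   # e.g. 1.0
    r"-(?P<initials>[a-z]+)"                       # e.g. jqp
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$" # e.g. initial-data-exploration
)


def parse_notebook_name(stem):
    """Return the parts of a notebook name (without `.ipynb`), or None if it does not fit."""
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_notebook_name("1.0-jqp-initial-data-exploration"))
    print(parse_notebook_name("Untitled"))
```

A check like this could be run over the `notebooks` directory before committing; adjust the pattern if your team allows upper-case initials or other separators.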

Contributing

This project is currently in development and borrows heavily from the excellent work of DrivenData and their cookiecutter-data-science template (CCDS), which we have relied on for a considerable period. The development of this template is also indebted to EasyData and their excellent cookiecutter template, which inspired the development of the PRT project and provided a deeper understanding of how to create reproducible environments, datasets and workflows for data analysis.

As such, this project is primarily focused on adapting the project to suit our own organisational needs and isn't actively seeking contributions from outside of the organisation.

If you would like to find out more about the DrivenData cookiecutter-data-science template and how to contribute to their project, see their docs for guidelines.

Links to related projects and references

Here are some projects and blog posts that have provided a huge amount of information and guidance to inform this cookiecutter template and which you may find useful.

Finally, a huge thanks to the Cookiecutter project (github), which is helping us all spend less time thinking about and writing boilerplate and more time getting things done.
