Prison Reform Trust Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work—adapted for Prison Reform Trust.

Getting started

Setting up a development environment

Before starting your project, your computer needs a working Python environment so that it can run this code. A separate section sets out a step-by-step guide to make this process as straightforward as possible, with information that we have found helpful along the way.

!!! tip If you've never created a Python environment before, or you need a refresher on how we approach this seemingly straightforward task, then this section should be your starting point.

Setting up a development environment

Requirements to use the cookiecutter template

Python 3.5+
Cookiecutter Python package >= 1.4.0: we recommend installing this with Conda, using the Miniconda distribution.
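
The two requirements above can be checked from Python before running the template. The following is a minimal sketch, not part of the template itself: the helper names `parse_version` and `check_requirements` are hypothetical, the version parsing is deliberately simple (it ignores pre-release suffixes), and `importlib.metadata` assumes you are running Python 3.8 or newer.

```python
import sys
from importlib import metadata  # in the standard library from Python 3.8

# Minimum versions taken from the requirements listed above.
MIN_PYTHON = (3, 5)
MIN_COOKIECUTTER = (1, 4, 0)


def parse_version(text):
    """Turn '1.4.0' into (1, 4, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '1.4' compares equal to '1.4.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements():
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.5 or newer is required")
    try:
        installed = parse_version(metadata.version("cookiecutter"))
    except metadata.PackageNotFoundError:
        problems.append("the cookiecutter package is not installed")
    else:
        if installed < MIN_COOKIECUTTER:
            problems.append("cookiecutter 1.4.0 or newer is required")
    return problems
```

Calling `check_requirements()` in the environment you plan to use returns an empty list when everything is in place, or a short description of what is missing.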

Starting a new project

Starting a new project is as easy as running this command at the command line. There is no need to create a directory first; the cookiecutter will do it for you.

cookiecutter https://github.com/Prison-Reform-Trust/prt-cookiecutter-data-science
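
If you would rather start a project from a Python script than from the command line, Cookiecutter also exposes a Python entry point, `cookiecutter.main.cookiecutter`. The sketch below wraps it with the template URL shown above. The wrapper name `start_project` is hypothetical, and the assumption that the call returns the path of the generated project holds for recent Cookiecutter releases but is worth checking against the version you have installed.

```python
# URL of the template, as given in the command above.
TEMPLATE_URL = "https://github.com/Prison-Reform-Trust/prt-cookiecutter-data-science"


def start_project(output_dir=".", no_input=False):
    """Generate a new project from the template inside output_dir.

    With no_input=True the template defaults are accepted without prompting.
    """
    # Imported lazily so this module loads even if cookiecutter is missing.
    from cookiecutter.main import cookiecutter

    return cookiecutter(TEMPLATE_URL, output_dir=output_dir, no_input=no_input)
```

Running `start_project()` prompts for the same values as the command-line version and needs network access to fetch the template.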

Example

Now that you've got your project, you're ready to go! You should do the following:

Check out the directory structure below so you know what's in the project and how to use it.
Read the opinions that are baked into the project so you understand best practices and the philosophy behind the project structure.

Directory structure

├── LICENSE
├── Makefile           <- Makefile with commands like `make create_environment`,
│                         `make update_environment` or `make data`.
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for analysis.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project for adding documentation to this project; 
│                         see sphinx-doc.org for details.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `conda list --export > requirements.txt`
│
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python package
│   │
│   ├── analysis       <- Scripts to process raw data for analysis
│   │   └── process_data.py
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   └── visualization  <- Scripts to create exploratory and results-oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
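
The notebook naming convention described in the tree above (an ordering number, the creator's initials, and a short `-` delimited description) can be applied consistently with a small helper. This is an illustrative sketch only: the function names are hypothetical, and the regular expression assumes the number takes the `major.minor` form used in the example `1.0-jqp-initial-data-exploration`.

```python
import re

# Assumed shape: <major>.<minor>-<initials>-<hyphen-delimited description>
NOTEBOOK_NAME = re.compile(r"^\d+\.\d+-[a-z]+-[a-z0-9]+(?:-[a-z0-9]+)*$")


def notebook_name(order, initials, description):
    """Build a notebook name from its ordering number, initials and description."""
    # Lower-case the description and collapse anything non-alphanumeric to '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return "{}-{}-{}".format(order, initials.lower(), slug)


def is_valid_notebook_name(name):
    """Check a name (without the .ipynb suffix) against the assumed pattern."""
    return NOTEBOOK_NAME.match(name) is not None
```

For example, `notebook_name("1.0", "JQP", "Initial data exploration")` produces the name used in the tree above, and `is_valid_notebook_name("Untitled")` is false.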

Contributing

This project is currently in development and borrows heavily from the excellent work of DrivenData and their cookiecutter-data-science template (CCDS), which we have relied on for a considerable period. The development of this template is also indebted to EasyData and their excellent cookiecutter template, which inspired the development of the PRT project and provided a deeper understanding of how to create reproducible environments, datasets and workflows for data analysis.

As such, this project is primarily focused on adapting the template to suit our own organisational needs and isn't actively seeking contributions from outside the organisation.

If you would like to find out more about the DrivenData cookiecutter-data-science template and how to contribute to their project, see their docs for guidelines.

Links to related projects and references

The following projects and blog posts provided a great deal of the information and guidance behind this cookiecutter template, and you may find them useful too.

Cookiecutter Data Science (CCDS) - The original template on which this is based.
EasyData - A revised implementation of the CCDS template, which triggered the development of this project.
Coding for Economists - An excellent blog by Arthur Turrell covering a wide range of topics, from getting your development environment started to suggested workflows, data transformation, designing reproducible analysis and much, much more.
Government Analysis Function guidance aka The Duck Book - Another great guidance doc to draw from, which covers guiding principles, modular coding, documentation, version control and loads more.

Finally, a huge thanks to the Cookiecutter project (GitHub), which is helping us all spend less time thinking about and writing boilerplate and more time getting things done.
