Skip to content

Alessandrarye/datascience-devcontainer

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data science development environment

Sync release Python scikit-learn XGBoost Plotly CUDA Docker Pulls datascience-nvidia Docker Pulls datascience-cpu Docker Pulls datascience-mac

A ready-to-use data science environment for VS Code, designed for intro Python and ML bootcamp students. Covers data visualization, data cleaning, feature engineering, and traditional machine learning. Available in three configurations: NVIDIA GPU, CPU-only, and Mac (Apple Silicon).

What's included

Package Purpose
numpy, pandas, scipy Core data science stack
scikit-learn, xgboost, statsmodels Machine learning and statistics
matplotlib, seaborn, plotly Visualization
optuna Hyperparameter optimization
jupyterlab Interactive notebooks
cupy-cuda12x GPU-accelerated arrays (NVIDIA only)

Devcontainer configurations

Configuration Image Use when
DataScience NVIDIA gperdrizet/datascience-nvidia You have an NVIDIA GPU
DataScience CPU gperdrizet/datascience-cpu CPU-only machine (any OS)
DataScience Mac gperdrizet/datascience-mac Apple Silicon Mac (M1/M2/M3)

Project structure

datascience-devcontainer/
├── .devcontainer/
│   ├── nvidia/
│   │   └── devcontainer.json   # NVIDIA GPU dev container configuration
│   ├── cpu/
│   │   └── devcontainer.json   # CPU dev container configuration
│   └── mac/
│       └── devcontainer.json   # Mac (ARM64) dev container configuration
├── data/                       # Store datasets here
├── notebooks/
│   └── environment_test.ipynb  # Verify your setup
├── .gitignore
├── LICENSE
└── README.md

Requirements

NVIDIA configuration (additional requirements)

  • NVIDIA GPU (Pascal or newer) with driver ≥570
  • NVIDIA Container Toolkit (Linux): install guide

Mac configuration

Note: GPU acceleration is not available inside Docker containers on Apple Silicon. Metal/MPS is a macOS-only framework with no Docker passthrough. The Mac configuration provides native ARM64 CPU performance.

GPU compatibility (NVIDIA)

Architecture Example GPUs Compute Capability
Pascal GTX 1050-1080, Tesla P100 6.0-6.1
Volta Tesla V100, Titan V 7.0
Turing RTX 2060-2080, GTX 1660 7.5
Ampere RTX 3060-3090, A100 8.0-8.6
Ada Lovelace RTX 4060-4090 8.9
Hopper H100, H200 9.0
Blackwell RTX 5070-5090, B100, B200 10.0

Quick start

  1. Fork this repository (click "Fork" button above)

  2. Clone your fork:

    git clone https://github.com/<your-username>/datascience-devcontainer.git
  3. Open VS Code

  4. Open Folder in Container from the VS Code command palette (Ctrl+Shift+P), start typing Open Folder in...

    VS Code will prompt you to choose a devcontainer configuration. Select the one that matches your hardware.

  5. Verify by running notebooks/environment_test.ipynb

Using as a template for new projects

One-time setup: Make your fork a template

  1. Go to your fork on GitHub
  2. Click Settings → scroll to Template repository
  3. Check the box to enable it

Creating a new project from your template

  1. Go to your fork on GitHub
  2. Click the green Use this template button → Create a new repository
  3. Enter your new repository name and settings, click Create repository
  4. Clone your new repository:
    git clone https://github.com/<your-username>/my-new-project.git

Now you have a fresh data science project with the dev container configuration ready to go!

Adding Python packages

Using pip directly

Install packages in the container terminal:

pip install <package-name>

Note: Packages installed this way will be lost when the container is rebuilt.

Using requirements.txt (recommended)

  1. Create a requirements.txt file in the repository root:

    lightgbm
    shap
    
  2. Update the appropriate devcontainer.json to install packages on container creation:

    "postCreateCommand": "pip install -r requirements.txt"
  3. Rebuild the container (F1 → "Dev Containers: Rebuild Container")

Keeping your fork updated

# Add upstream (once)
git remote add upstream https://github.com/gperdrizet/datascience-devcontainer.git

# Sync
git fetch upstream
git merge upstream/main

Troubleshooting

Problem Solution
Docker won't start Enable virtualization in BIOS
Permission denied (Linux) Add user to docker group, then log out/in
GPU not detected Update NVIDIA drivers (≥570), install NVIDIA Container Toolkit
Container build fails Check internet connection
Module not found Rebuild container after adding to requirements.txt

About

Containerized development environment for data science projects

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Jupyter Notebook 100.0%