A ready-to-use data science environment for VS Code, designed for intro Python and ML bootcamp students. Covers data visualization, data cleaning, feature engineering, and traditional machine learning. Available in three configurations: NVIDIA GPU, CPU-only, and Mac (Apple Silicon).
| Package | Purpose |
|---|---|
| numpy, pandas, scipy | Core data science stack |
| scikit-learn, xgboost, statsmodels | Machine learning and statistics |
| matplotlib, seaborn, plotly | Visualization |
| optuna | Hyperparameter optimization |
| jupyterlab | Interactive notebooks |
| cupy-cuda12x | GPU-accelerated arrays (NVIDIA only) |
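
Since `cupy-cuda12x` is installed only in the NVIDIA image, a notebook meant to run under all three configurations can pick its array module at startup. The following is a minimal sketch, assuming the NumPy-compatible subset of CuPy's API covers the operations you use; `GPU_AVAILABLE` and `to_numpy` are hypothetical names, not part of this repository:

```python
# Pick an array backend: CuPy on the NVIDIA image, NumPy on the CPU and Mac images.
try:
    import cupy as xp  # only installed in the NVIDIA configuration

    # Importing CuPy can succeed without a usable GPU, so also ask the CUDA runtime.
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device visible")
    GPU_AVAILABLE = True
except Exception:  # ImportError, or a CUDA error when no driver/device is present
    import numpy as xp

    GPU_AVAILABLE = False

import numpy as np


def to_numpy(array):
    """Hypothetical helper: return a NumPy array whether `array` lives on the GPU or CPU."""
    if GPU_AVAILABLE:
        return xp.asnumpy(array)  # copy device memory back to host memory
    return np.asarray(array)


# The same array code runs on either backend.
values = xp.arange(10, dtype=xp.float32)
mean_of_squares = float(xp.mean(values ** 2))
print(f"GPU: {GPU_AVAILABLE}, mean of squares: {mean_of_squares}")
```

Keeping the backend choice in one place means the rest of a notebook only refers to `xp`, and results headed for pandas or matplotlib go through `to_numpy` first.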
| Configuration | Image | Use when |
|---|---|---|
| DataScience NVIDIA | `gperdrizet/datascience-nvidia` | You have an NVIDIA GPU |
| DataScience CPU | `gperdrizet/datascience-cpu` | CPU-only machine (any OS) |
| DataScience Mac | `gperdrizet/datascience-mac` | Apple Silicon Mac (M1/M2/M3) |
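
If you aren't sure which row applies, the choice can be written as a small heuristic to run on the host machine (not inside a container). This is a hypothetical helper, not part of the repository: it treats a macOS `arm64` host as Apple Silicon and the presence of `nvidia-smi` on the PATH as a sign of an installed NVIDIA driver, which by itself doesn't confirm a Pascal-or-newer GPU.

```python
import platform
import shutil


def suggest_configuration(system: str, machine: str, has_nvidia_smi: bool) -> str:
    """Hypothetical helper mapping host facts to one of the three dev container configurations."""
    if system == "Darwin" and machine == "arm64":
        return "DataScience Mac"     # Apple Silicon: no GPU passthrough into Docker
    if has_nvidia_smi:
        return "DataScience NVIDIA"  # NVIDIA driver tools found on the host
    return "DataScience CPU"         # everything else


if __name__ == "__main__":
    print(suggest_configuration(
        platform.system(),                     # "Linux", "Darwin", or "Windows"
        platform.machine(),                    # e.g. "x86_64", "arm64", "AMD64"
        shutil.which("nvidia-smi") is not None,
    ))
```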
```text
datascience-devcontainer/
├── .devcontainer/
│   ├── nvidia/
│   │   └── devcontainer.json   # NVIDIA GPU dev container configuration
│   ├── cpu/
│   │   └── devcontainer.json   # CPU dev container configuration
│   └── mac/
│       └── devcontainer.json   # Mac (ARM64) dev container configuration
├── data/                       # Store datasets here
├── notebooks/
│   └── environment_test.ipynb  # Verify your setup
├── .gitignore
├── LICENSE
└── README.md
```
- Docker (Windows | Linux)
- VS Code with the Dev Containers extension
- For the NVIDIA configuration:
  - NVIDIA GPU (Pascal or newer) with driver ≥570
  - NVIDIA Container Toolkit (Linux): install guide
- For the Mac configuration:
  - Docker Desktop for Mac (Apple Silicon): install guide
Note: GPU acceleration is not available inside Docker containers on Apple Silicon. Metal/MPS is a macOS-only framework with no Docker passthrough. The Mac configuration provides native ARM64 CPU performance.
| Architecture | Example GPUs | Compute Capability |
|---|---|---|
| Pascal | GTX 1050-1080, Tesla P100 | 6.0-6.1 |
| Volta | Tesla V100, Titan V | 7.0 |
| Turing | RTX 2060-2080, GTX 1660 | 7.5 |
| Ampere | RTX 3060-3090, A100 | 8.0-8.6 |
| Ada Lovelace | RTX 4060-4090 | 8.9 |
| Hopper | H100, H200 | 9.0 |
| Blackwell | B100, B200; RTX 5070-5090 | 10.0 (B100/B200); 12.0 (RTX 50 series) |
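
Inside the NVIDIA container you can look up your GPU's compute capability and map it onto the architectures in the table. This sketch assumes CuPy's `Device.compute_capability`, which reports the capability as a string of major and minor digits (for example `"86"` for 8.6); `architecture` is a hypothetical helper that only checks the minimum capability of each architecture.

```python
def architecture(major: int, minor: int) -> str:
    """Map a CUDA compute capability to an architecture name from the table."""
    cc = (major, minor)
    # Checked from newest to oldest; tuples compare element by element.
    if cc >= (10, 0):
        return "Blackwell"
    if cc >= (9, 0):
        return "Hopper"
    if cc >= (8, 9):
        return "Ada Lovelace"
    if cc >= (8, 0):
        return "Ampere"
    if cc >= (7, 5):
        return "Turing"
    if cc >= (7, 0):
        return "Volta"
    if cc >= (6, 0):
        return "Pascal"
    return "unsupported (older than Pascal)"


def main() -> None:
    try:
        import cupy

        cc = cupy.cuda.Device(0).compute_capability  # e.g. "86", or "120" for 12.0
        major, minor = int(cc[:-1]), int(cc[-1])     # last digit is the minor version
        print(f"Compute capability {major}.{minor}: {architecture(major, minor)}")
    except Exception as err:  # CuPy missing, or no CUDA device visible
        print(f"No CUDA device visible: {err}")


if __name__ == "__main__":
    main()
```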
- Fork this repository (click the "Fork" button at the top of its GitHub page)
- Clone your fork:

  ```bash
  git clone https://github.com/<your-username>/datascience-devcontainer.git
  ```

- Open VS Code
- Open the command palette (Ctrl+Shift+P, or Cmd+Shift+P on Mac), start typing `Open Folder in...`, choose Dev Containers: Open Folder in Container..., and select your cloned folder. VS Code will prompt you to choose a dev container configuration; select the one that matches your hardware.
- Verify the setup by running `notebooks/environment_test.ipynb`
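
The contents of `environment_test.ipynb` aren't reproduced here. As a rough, hypothetical stand-in, a cell like the following reports which of the packages listed at the top of this README are installed, using only the standard library's `importlib.metadata` (so it works even when a package fails to import):

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the package table; cupy-cuda12x is expected only on the NVIDIA image.
EXPECTED = [
    "numpy", "pandas", "scipy",
    "scikit-learn", "xgboost", "statsmodels",
    "matplotlib", "seaborn", "plotly",
    "optuna", "jupyterlab",
]


def installed_versions(names):
    """Return {distribution name: version string, or None if it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(EXPECTED + ["cupy-cuda12x"]).items():
        print(f"{name:15} {ver or 'NOT INSTALLED'}")
```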
To turn your fork into a reusable template:

- Go to your fork on GitHub
- Click Settings → scroll to Template repository
- Check the box to enable it

To start a new project from that template:

- Go to your fork on GitHub
- Click the green Use this template button → Create a new repository
- Enter a name and settings for the new repository, then click Create repository
- Clone your new repository:

  ```bash
  git clone https://github.com/<your-username>/my-new-project.git
  ```
Now you have a fresh data science project with the dev container configuration ready to go!
Install packages in the container terminal:

```bash
pip install <package-name>
```

Note: Packages installed this way will be lost when the container is rebuilt.
To keep packages across rebuilds:

- Create a `requirements.txt` file in the repository root, for example:

  ```text
  lightgbm
  shap
  ```

- Update the appropriate `devcontainer.json` to install the packages when the container is created:

  ```json
  "postCreateCommand": "pip install -r requirements.txt"
  ```

- Rebuild the container (F1 → "Dev Containers: Rebuild Container")
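
After a rebuild, one way to diagnose a "Module not found" error is to compare `requirements.txt` against what is installed. This is a minimal sketch with a deliberately simplified parser (it skips pip options such as `-r` and does not handle VCS or URL requirements); `requirement_names` and `missing_packages` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# A package name ends at the first version specifier, extras bracket, marker, or space.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text: str) -> list[str]:
    """Extract bare package names from requirements.txt content (simplified parser)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        missing = missing_packages(requirement_names(req.read_text()))
        print("Missing:", missing or "none")
```

Anything listed as missing usually means `postCreateCommand` did not run or failed, so the container build log is the next place to look.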
To pull updates from the original repository into your fork:

```bash
# Add upstream (once)
git remote add upstream https://github.com/gperdrizet/datascience-devcontainer.git
# Sync
git fetch upstream
git merge upstream/main
```

| Problem | Solution |
|---|---|
| Docker won't start | Enable hardware virtualization (Intel VT-x / AMD-V) in BIOS/UEFI |
| Permission denied (Linux) | Add your user to the `docker` group (`sudo usermod -aG docker $USER`), then log out and back in |
| GPU not detected | Update NVIDIA drivers (≥570) and install the NVIDIA Container Toolkit |
| Container build fails | Check your internet connection |
| Module not found | Rebuild container after adding to requirements.txt |