| title | TorchCode |
|---|---|
| emoji | 🔥 |
| colorFrom | red |
| colorTo | yellow |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
Crack the PyTorch interview.
Practice implementing operators and architectures from scratch — the exact skills top ML teams test for.
Like LeetCode, but for tensors. Self-hosted. Jupyter-based. Instant feedback.
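A personal practice problem set based on a fork of duoan/TorchCode. It organizes the main topics of CNN training in PyTorch into four weeks (W2-W5) with 29 problems, all translated into Japanese. On top of the 40 upstream problems, this fork adds 16 problems tied directly to typical CNN training recipes (pooling, augmentation, evaluation metrics, modern optimizers), together with a spec-driven generation infrastructure.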

| Week | Theme | Problems | Folder |
|---|---|---|---|
| W2 | MLP / basic classification / basic optimization | 8 | `practice/W2/` |
| W3 | Regularization / normalization / advanced optimization | 9 | `practice/W3/` |
| W4 | CNN basics + basic transforms | 7 | `practice/W4/` |
| W5 | CIFAR-10 advanced recipes | 5 | `practice/W5/` |

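The README.md in each week folder lists its problems in study order. In each .ipynb you write the implementation and then grade it automatically with check("...") (5 tests per problem, 145 tests in total). To open a problem other than the first one, click the Colab badge of the .ipynb you want in that week's README.

In Colab (recommended, no setup required):

Click the Colab badge at the top right of any .ipynb under practice/W{n}/ → Run All → write your implementation in the ✏️ cell → grade it with the final check("...") cell.

Locally (Docker / JupyterLab):

```bash
make run   # start Docker
# In a browser, open http://localhost:8888 and navigate to practice/W2/
```

Run the regeneration script that matches the type of change:
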
| Goal | What to edit | Regeneration command |
|---|---|---|
| Add a new problem | Create `problem_specs/{id}.py` | `python scripts/build.py --verify` |
| Fix the description, tests, or solution of a spec problem (#41-56) | `problem_specs/{id}.py` | `python scripts/build.py --verify` |
| Fix the intro of an upstream problem (#01-40) | Cell 0 of `templates/{file}.ipynb` | `python scripts/sync_solutions.py` |
| Change the week mapping | `scripts/week_mapping.py` | `python scripts/build_weeks.py` |
| Fully reset the week folders (discards in-progress work) | (same as above) | `python scripts/build_weeks.py --reset` |
| Sanity-check all 56 solutions | (none) | `python scripts/verify_all_solutions.py` |

This repository is a hybrid of spec-driven problems (16) and upstream hand-written problems (40). If you edit a do-not-edit file directly, your changes are lost at the next regeneration.

| Source (OK to edit) | Generated (do not edit, regenerated) |
|---|---|
| `problem_specs/*.py` (16 problems) | `torch_judge/tasks/{id}.py` + `templates/{4,5}*.ipynb` + `solutions/{4,5}*_solution.ipynb` |
| Cell 0 of `templates/0*-40_*.ipynb` (40 existing problems) | Cell 0 of the corresponding `solutions/*_solution.ipynb` (intro only; the code stays as upstream) |
| `scripts/week_mapping.py` | `practice/W{n}/`, `practice/W{n}/README.md`, `practice/README.md` |

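After a large change, it is safest to run verify_all_solutions.py and confirm that all 56 solutions pass.

For details, see also Architecture / Adding Your Own Problems below.

The upstream duoan/TorchCode is released under the MIT License. This fork inherits that license and is also released under the MIT License; the fork's own additions and modifications are likewise available under MIT. See LICENSE for details.

What follows is the upstream TorchCode README (in English, describing all 56 problems). The problems added by this fork are #41 onward.
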
Top companies (Meta, Google DeepMind, OpenAI, etc.) expect ML engineers to implement core operations from memory on a whiteboard. Reading papers isn't enough: you need to be able to write softmax, LayerNorm, MultiHeadAttention, and full Transformer blocks in code.
TorchCode gives you a structured practice environment with:
No cloud. No signup. No GPU needed. Just make run — or try it instantly on Hugging Face.
Launch on Hugging Face Spaces — opens a full JupyterLab environment in your browser. Nothing to install.
Or open any problem directly in Google Colab. Every notebook has an Open in Colab badge.
In Google Colab, install the judge from this fork's git URL (so you get the full task set, including the additions in this fork that aren't in the upstream PyPI package):

```
!pip install -q --force-reinstall --no-deps git+https://github.com/alextfkd/TorchCode.git
```

(The notebook templates already have this install cell at the top, so you can just Run All in Colab.)

Then in a notebook cell:

```python
from torch_judge import check, status, hint, reset_progress
status()        # list all problems and your progress
check("relu")   # run tests for the "relu" task
hint("relu")    # show a hint
```

```bash
docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest
```

If the registry image is unavailable for your platform, use Option 2 (`make run`, below) instead. This is the common path on Apple Silicon / arm64.

```bash
make run
```

`make run` will try the prebuilt image first and automatically fall back to a local build when needed.

Open http://localhost:8888 and you're done. Works with both Docker and Podman (auto-detected).

Frequency: 🔥 = very likely in interviews, ⭐ = commonly asked, 💡 = emerging / differentiator
The bread and butter of ML coding interviews. You'll be asked to write these without torch.nn.

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 1 | ReLU | `relu(x)` | 🔥 | Activation functions, element-wise ops |
| 2 | Softmax | `my_softmax(x, dim)` | 🔥 | Numerical stability, exp/log tricks |
| 16 | Cross-Entropy Loss | `cross_entropy_loss(logits, targets)` | 🔥 | Log-softmax, logsumexp trick |
| 17 | Dropout | `MyDropout` (nn.Module) | 🔥 | Train/eval mode, inverted scaling |
| 18 | Embedding | `MyEmbedding` (nn.Module) | 🔥 | Lookup table, `weight[indices]` |
| 19 | GELU | `my_gelu(x)` | ⭐ | Gaussian error linear unit, `torch.erf` |
| 20 | Kaiming Init | `kaiming_init(weight)` | ⭐ | `std = sqrt(2/fan_in)`, variance scaling |
| 21 | Gradient Clipping | `clip_grad_norm(params, max_norm)` | ⭐ | Norm-based clipping, direction preservation |
| 31 | Gradient Accumulation | `accumulated_step(model, opt, ...)` | 💡 | Micro-batching, loss scaling |
| 40 | Linear Regression | `LinearRegression` (3 methods) | 🔥 | Normal equation, GD from scratch, nn.Linear |
| 3 | Linear Layer | `SimpleLinear` (nn.Module) | 🔥 | `y = xW^T + b`, Kaiming init, `nn.Parameter` |
| 4 | LayerNorm | `my_layer_norm(x, γ, β)` | 🔥 | Normalization, running stats, affine transform |
| 7 | BatchNorm | `my_batch_norm(x, γ, β)` | ⭐ | Batch vs layer statistics, train/eval behavior |
| 8 | RMSNorm | `rms_norm(x, weight)` | ⭐ | LLaMA-style norm, simpler than LayerNorm |
| 15 | SwiGLU MLP | `SwiGLUMLP` (nn.Module) | ⭐ | Gated FFN, `SiLU(gate) * up`, LLaMA/Mistral-style |
| 22 | Conv2d | `my_conv2d(x, weight, ...)` | 🔥 | Convolution, unfold, stride/padding |
| 41 | 2D Max Pooling | `my_max_pool2d(x, k, stride, padding)` | 🔥 | Unfold + amax, pad with `-inf` for negative inputs |
| 49 | 2D Average Pooling | `my_avg_pool2d(x, k, stride, padding)` | ⭐ | Unfold + mean, `count_include_pad=True` default |
| 50 | Global Average Pooling | `global_avg_pool(x)` | 🔥 | Mean over (H, W), ResNet/MobileNet head replacing FC |
| 51 | Label Smoothing CE | `label_smoothing_ce(logits, targets, ε)` | ⭐ | Smoothed target dist, modern training recipe |
| 52 | Top-k Accuracy | `top_k_accuracy(logits, targets, k)` | 🔥 | `topk` indices + `any`, ImageNet eval standard |
| 53 | NLL Loss | `my_nll_loss(log_probs, targets)` | ⭐ | Advanced indexing, CE = log_softmax + NLL |

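
To make the "numerical stability" and "logsumexp trick" entries in the table concrete, here is a minimal sketch of a stable softmax and cross-entropy. The function names follow the table; the default `dim=-1`, the mean reduction, and the sample tensors are assumptions for illustration, and the reference solutions in this repository may be structured differently.

```python
import torch

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtracting the max along `dim` leaves the result unchanged
    # but keeps exp() from overflowing on large logits.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = shifted.exp()
    return exp / exp.sum(dim=dim, keepdim=True)

def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # log_softmax(x) = x - logsumexp(x); then pick the target class per row.
    log_probs = logits - torch.logsumexp(logits, dim=-1, keepdim=True)
    picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -picked.mean()  # mean reduction is an assumption

# Large logits that would overflow a naive exp()-based softmax.
logits = torch.tensor([[1000.0, 1001.0, 1002.0], [0.5, -1.0, 2.0]])
targets = torch.tensor([2, 0])
probs = my_softmax(logits, dim=-1)
loss = cross_entropy_loss(logits, targets)
```

Both functions can be compared against `torch.softmax` and `torch.nn.functional.cross_entropy` as a quick self-check before running the judge.
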
If you're interviewing for any role touching LLMs or Transformers, expect at least one of these.

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 23 | Cross-Attention | `MultiHeadCrossAttention` (nn.Module) | ⭐ | Encoder-decoder, Q from decoder, K/V from encoder |
| 5 | Scaled Dot-Product Attention | `scaled_dot_product_attention(Q, K, V)` | 🔥 | `softmax(QK^T/√d_k)V`, the foundation of everything |
| 6 | Multi-Head Attention | `MultiHeadAttention` (nn.Module) | 🔥 | Parallel heads, split/concat, projection matrices |
| 9 | Causal Self-Attention | `causal_attention(Q, K, V)` | 🔥 | Autoregressive masking with `-inf`, GPT-style |
| 10 | Grouped Query Attention | `GroupQueryAttention` (nn.Module) | ⭐ | GQA (LLaMA 2), KV sharing across heads |
| 11 | Sliding Window Attention | `sliding_window_attention(Q, K, V, w)` | ⭐ | Mistral-style local attention, O(n·w) complexity |
| 12 | Linear Attention | `linear_attention(Q, K, V)` | 💡 | Kernel trick, `φ(Q)(φ(K)^TV)`, O(n·d²) |
| 14 | KV Cache Attention | `KVCacheAttention` (nn.Module) | 🔥 | Incremental decoding, cache K/V, prefill vs decode |
| 24 | RoPE | `apply_rope(q, k)` | 🔥 | Rotary position embedding, relative position via rotation |
| 25 | Flash Attention | `flash_attention(Q, K, V, block_size)` | 💡 | Tiled attention, online softmax, memory-efficient |

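
The formula `softmax(QK^T/√d_k)V` and the `-inf` causal mask from the table can be sketched as follows. The function names follow the table, but the optional `mask` argument and the `(batch, seq, dim)` tensor layout are assumptions made for this illustration.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    # `mask` is an assumed optional argument: True means "may attend".
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V

def causal_attention(Q, K, V):
    # Lower-triangular mask: position i attends only to positions <= i.
    n = Q.size(-2)
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool, device=Q.device))
    return scaled_dot_product_attention(Q, K, V, mask)

torch.manual_seed(0)
Q, K, V = (torch.randn(2, 4, 8) for _ in range(3))
out = causal_attention(Q, K, V)
```

A useful property to check by hand: the first output position can only attend to itself, so it equals the first row of `V`.
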

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 26 | LoRA | `LoRALinear` (nn.Module) | ⭐ | Low-rank adaptation, frozen base + `BA` update |
| 27 | ViT Patch Embedding | `PatchEmbedding` (nn.Module) | 💡 | Image → patches → linear projection |
| 13 | GPT-2 Block | `GPT2Block` (nn.Module) | ⭐ | Pre-norm, causal MHA + MLP (4x, GELU), residual connections |
| 28 | Mixture of Experts | `MixtureOfExperts` (nn.Module) | ⭐ | Mixtral-style, top-k routing, expert MLPs |

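
As an illustration of the "frozen base + `BA` update" idea in the LoRA row, here is a minimal sketch. The class name follows the table; the constructor arguments (`rank`, `alpha`), the initialization scale, and the attribute names are assumptions, not the interface the judge expects.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=4, alpha=8.0):
        super().__init__()
        # Frozen base layer: its weight and bias receive no gradient.
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Low-rank factors: A is small random, B starts at zero,
        # so the layer initially behaves exactly like the base layer.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Equivalent to using the weight W + scale * (B @ A).
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

torch.manual_seed(0)
layer = LoRALinear(8, 4, rank=2)
x = torch.randn(3, 8)
y = layer(x)
trainable = sorted(n for n, p in layer.named_parameters() if p.requires_grad)
```

Only `A` and `B` are trainable here, which is what keeps the number of updated parameters small.
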
The data side of the recipe. Together with normalization + cosine LR, these turn a baseline CNN into a competitive one.

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 42 | Per-Channel Normalize | `my_normalize(x, mean, std)` | 🔥 | Channel-wise `(x − μ) / σ`, broadcast to `(C, 1, 1)` |
| 43 | Random Horizontal Flip | `random_horizontal_flip(x, p)` | 🔥 | Per-sample Bernoulli mask + `torch.flip` |
| 44 | Random Crop with Padding | `random_crop(x, size, padding)` | 🔥 | `F.pad` + per-sample random offset slice |
| 45 | Cutout / RandomErasing | `cutout(x, size)` | ⭐ | Random rectangle zero-mask, DeVries 2017 |
| 46 | Mixup | `mixup(x, y, α)` | ⭐ | `Beta(α, α)`, 4-tuple `(x_mix, y_a, y_b, lam)` interface |
| 47 | CutMix | `cutmix(x, y, α)` | ⭐ | Area-based λ recomputed after boundary clipping |
| 48 | TTA (Horizontal Flip) | `tta_hflip(model, x)` | 💡 | Probability-space averaging, free 0.3–1% bump |

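
The Mixup row describes a `Beta(α, α)` draw and a 4-tuple return value. A minimal sketch under those assumptions is shown below; drawing a single `lam` per batch and pairing samples with a random permutation are common choices, but whether the judge expects exactly this is an assumption.

```python
import torch

def mixup(x, y, alpha=1.0):
    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Pair every sample with a randomly chosen partner from the same batch.
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    # Labels stay separate; the loss is mixed later with the same lam.
    return x_mix, y, y[perm], lam

torch.manual_seed(0)
x = torch.randn(6, 3, 4, 4)
y = torch.arange(6)
x_mix, y_a, y_b, lam = mixup(x, y, alpha=0.4)
```

With this interface, the training loss is typically computed as `lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b)`.
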

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 29 | Adam Optimizer | `MyAdam` | ⭐ | Momentum + RMSProp, bias correction |
| 30 | Cosine LR Scheduler | `cosine_lr_schedule(step, ...)` | ⭐ | Linear warmup + cosine annealing |
| 54 | SGD with Momentum | `MySGDMomentum` | 🔥 | `v = μ·v + g` (PyTorch convention, no `(1−μ)` factor) |
| 55 | Weight Decay (L2) | `apply_weight_decay(params, wd)` | ⭐ | `g += wd·p`, compare with decoupled WD (#56) |
| 56 | AdamW | `MyAdamW` | 🔥 | Decoupled WD: `p *= (1 − lr·λ)`, Transformer default |

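
The update rules quoted in the table can be worked through on plain Python scalars. The sketch below is only an arithmetic illustration: the helper names `sgd_momentum_step`, `l2_gradient`, and `decoupled_decay`, and every parameter of `cosine_lr_schedule` after `step`, are hypothetical and not taken from the repository.

```python
import math

def sgd_momentum_step(p, g, v, lr=0.1, mu=0.9):
    # PyTorch convention: v = mu * v + g, with no (1 - mu) factor on g.
    v = mu * v + g
    return p - lr * v, v

def l2_gradient(p, g, wd):
    # Coupled L2 weight decay: the penalty is folded into the gradient.
    return g + wd * p

def decoupled_decay(p, lr, wd):
    # AdamW-style decay: shrink the weight directly, bypassing the gradient.
    return p * (1.0 - lr * wd)

def cosine_lr_schedule(step, total_steps, warmup_steps, max_lr, min_lr=0.0):
    # Linear warmup from 0 to max_lr, then cosine annealing down to min_lr.
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Two momentum steps with a constant gradient of 1.0.
p, v = 1.0, 0.0
for _ in range(2):
    p, v = sgd_momentum_step(p, g=1.0, v=v)
```

After the two steps the velocity is 1.9 rather than 1.0, which shows how momentum amplifies a persistent gradient direction.
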

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 32 | Top-k / Top-p Sampling | `sample_top_k_top_p(logits, ...)` | 🔥 | Nucleus sampling, temperature scaling |
| 33 | Beam Search | `beam_search(log_prob_fn, ...)` | 🔥 | Hypothesis expansion, pruning, eos handling |
| 34 | Speculative Decoding | `speculative_decode(target, draft, ...)` | 💡 | Accept/reject, draft model acceleration |

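
Top-k and top-p (nucleus) filtering with temperature scaling can be sketched as below for a 1-D logits vector. The table elides the arguments after `logits`, so `top_k`, `top_p`, `temperature`, and `generator`, as well as the helper `filter_top_k_top_p`, are hypothetical names chosen for this illustration.

```python
import torch

def filter_top_k_top_p(logits, top_k=0, top_p=1.0):
    logits = logits.clone()
    if top_k > 0:
        # Keep only the k largest logits; everything else becomes -inf.
        kth = torch.topk(logits, top_k).values[..., -1, None]
        logits[logits < kth] = float("-inf")
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        cum = probs.cumsum(dim=-1)
        # Drop a token once the mass *before* it already exceeds top_p,
        # so the most likely token is always kept.
        remove = (cum - probs) > top_p
        sorted_logits[remove] = float("-inf")
        # Scatter the filtered values back to their original positions.
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
    return logits

def sample_top_k_top_p(logits, top_k=0, top_p=1.0, temperature=1.0, generator=None):
    filtered = filter_top_k_top_p(logits / temperature, top_k, top_p)
    probs = torch.softmax(filtered, dim=-1)
    return torch.multinomial(probs, num_samples=1, generator=generator).item()

logits = torch.tensor([2.0, 1.0, 0.0, -1.0])
kept_k = torch.isfinite(filter_top_k_top_p(logits, top_k=2)).tolist()
kept_p = torch.isfinite(filter_top_k_top_p(logits, top_p=0.8)).tolist()
token = sample_top_k_top_p(logits, top_k=1)
```

With `top_k=1` the sampler is deterministic and always returns the argmax, which makes a convenient sanity check.
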

| # | Problem | What You'll Implement | Freq | Key Concepts |
|---|---|---|---|---|
| 35 | BPE Tokenizer | `SimpleBPE` | 💡 | Byte-pair encoding, merge rules, subword splits |
| 36 | INT8 Quantization | `Int8Linear` (nn.Module) | 💡 | Per-channel quantize, scale/zero-point, buffer vs param |
| 37 | DPO Loss | `dpo_loss(chosen, rejected, ...)` | 💡 | Direct preference optimization, alignment training |
| 38 | GRPO Loss | `grpo_loss(logps, rewards, group_ids, eps)` | 💡 | Group relative policy optimization, RLAIF, within-group normalized advantages |
| 39 | PPO Loss | `ppo_loss(new_logps, old_logps, advantages, clip_ratio)` | 💡 | PPO clipped surrogate loss, policy gradient, trust region |

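
The PPO clipped surrogate loss named in the last row can be sketched in a few lines. The signature follows the table; the default `clip_ratio=0.2`, the mean reduction, and the sample values are assumptions.

```python
import torch

def ppo_loss(new_logps, old_logps, advantages, clip_ratio=0.2):
    # Probability ratio between the new and the old policy.
    ratio = torch.exp(new_logps - old_logps)
    unclipped = ratio * advantages
    # Clipping the ratio keeps each update inside a trust region.
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    # Take the pessimistic (smaller) objective and negate it to get a loss.
    return -torch.min(unclipped, clipped).mean()

old_logps = torch.tensor([-1.0, -2.0, -0.5])
advantages = torch.tensor([1.0, -0.5, 2.0])
loss_same = ppo_loss(old_logps.clone(), old_logps, advantages)
loss_far = ppo_loss(old_logps + 1.0, old_logps, torch.ones(3))
```

When the new and old policies coincide, the ratio is 1 and the loss reduces to the negative mean advantage. When the ratio is large and the advantage positive, the clipped term caps the objective at `1 + clip_ratio`.
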
Each problem has two notebooks:

| File | Purpose |
|---|---|
| `01_relu.ipynb` | ✏️ Blank template: write your code here |
| `01_relu_solution.ipynb` | 📖 Reference solution: check when stuck |

1. Open a blank notebook → Read the problem description
2. Implement your solution → Use only basic PyTorch ops
3. Debug freely → print(x.shape), check gradients, etc.
4. Run the judge cell → check("relu")
5. See instant colored feedback → ✅ pass / ❌ fail per test case
6. Stuck? Get a nudge → hint("relu")
7. Review the reference solution → 01_relu_solution.ipynb
8. Click 🔄 Reset in the toolbar → Blank slate — practice again!

```python
from torch_judge import check, hint, status
check("relu")               # Judge your implementation
hint("causal_attention")    # Get a hint without full spoiler
status()                    # Progress dashboard: solved / attempted / todo
```

Total: ~12–16 hours spread across 3–4 weeks. Perfect for interview prep on a deadline.

| Week | Focus | Problems | Time |
|---|---|---|---|
| 1 | 🧱 Foundations | ReLU → Softmax → CE Loss → Dropout → Embedding → GELU → Linear → LayerNorm → BatchNorm → RMSNorm → SwiGLU MLP → Conv2d | 2–3 hrs |
| 2 | 🧠 Attention Deep Dive | SDPA → MHA → Cross-Attn → Causal → GQA → KV Cache → Sliding Window → RoPE → Linear Attn → Flash Attn | 3–4 hrs |
| 3 | 🏗️ Architecture + Training | GPT-2 Block → LoRA → MoE → ViT Patch → Adam → Cosine LR → Grad Clip → Grad Accumulation → Kaiming Init | 3–4 hrs |
| 4 | 🎯 Inference + Advanced | Top-k/p Sampling → Beam Search → Speculative Decoding → BPE → INT8 Quant → DPO Loss → GRPO Loss → PPO Loss + speed run | 3–4 hrs |

```
┌──────────────────────────────────────────┐
│  Docker / Podman Container               │
│                                          │
│  JupyterLab (:8888)                      │
│  ├── templates/   (reset on each run)    │
│  ├── solutions/   (reference impl)       │
│  ├── torch_judge/ (auto-grading)         │
│  ├── torchcode-labext (JLab plugin)      │
│  │     🔄 Reset: restore template        │
│  │     🔗 Colab: open in Colab           │
│  └── PyTorch (CPU), NumPy                │
│                                          │
│  Judge checks:                           │
│    ✓ Output correctness (allclose)       │
│    ✓ Gradient flow (autograd)            │
│    ✓ Shape consistency                   │
│    ✓ Edge cases & numerical stability    │
└──────────────────────────────────────────┘
```

Single container. Single port. No database. No frontend framework. No GPU.

```bash
make run    # Build & start (http://localhost:8888)
make stop   # Stop the container
make clean  # Stop + remove volumes + reset all progress
```

TorchCode uses auto-discovery. Just drop a new file in `torch_judge/tasks/`:

```python
TASK = {
    "id": "my_task",
    "title": "My Custom Problem",
    "difficulty": "medium",
    "function_name": "my_function",
    "hint": "Think about broadcasting...",
    "tests": [ ... ],
}
```

No registration needed. The judge picks it up automatically.

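
How such auto-discovery could work is sketched below: scan a directory, import each file, and collect any module-level `TASK` dict. This is an assumed mechanism written for illustration; the function `discover_tasks` is hypothetical, and the actual loader in `torch_judge` may work differently.

```python
import importlib.util
import tempfile
from pathlib import Path

def discover_tasks(tasks_dir):
    """Collect TASK dicts from every non-private .py file in tasks_dir."""
    tasks = {}
    for path in sorted(Path(tasks_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and other private modules
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        task = getattr(module, "TASK", None)
        if isinstance(task, dict) and "id" in task:
            tasks[task["id"]] = task
    return tasks

# Demo in a temporary directory standing in for torch_judge/tasks/.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "my_task.py").write_text(
        'TASK = {"id": "my_task", "title": "My Custom Problem", "tests": []}\n'
    )
    Path(tmp, "__init__.py").write_text("")
    found = discover_tasks(tmp)
```

The design keeps task definitions declarative: adding a problem means adding one file, with no central registry to update.
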
The judge is published as a separate package so Colab/users can pip install torch-judge without cloning the repo.
Pushing to master after changing the package version triggers .github/workflows/pypi-publish.yml, which builds and uploads to PyPI. No git tag is required.

- Bump version in `torch_judge/_version.py` (e.g. `__version__ = "0.1.1"`).
- Configure PyPI Trusted Publisher (one-time):
  - PyPI → Your project torch-judge → Publishing → Add a new pending publisher
  - Owner: `duoan`, Repository: `TorchCode`, Workflow: `pypi-publish.yml`, Environment: (leave empty)
  - Run the workflow once (push a version bump to `master` or Actions → Publish torch-judge to PyPI → Run workflow); PyPI will then link the publisher.
- Release: commit the version bump and `git push origin master`.

Alternatively, use an API token: add repository secret PYPI_API_TOKEN (value = pypi-... from PyPI) and set TWINE_USERNAME=__token__ and TWINE_PASSWORD from that secret in the workflow if you prefer not to use Trusted Publishing.

```bash
pip install build twine
python -m build
twine upload dist/*
```

Version is in `torch_judge/_version.py`; bump it before each release.

Do I need a GPU?
No. Everything runs on CPU. The problems test correctness and understanding, not throughput.
Can I keep my solutions between runs?
Blank templates reset on every `make run` so you practice from scratch. Save your work under a different filename if you want to keep it. You can also click the 🔄 Reset button in the notebook toolbar at any time to restore the blank template without restarting.
Can I use Google Colab instead?
Yes! Every notebook has an Open in Colab badge at the top. Click it to open the problem directly in Google Colab — no Docker or local setup needed. You can also use the Colab toolbar button inside JupyterLab.
How are solutions graded?
The judge runs your function against multiple test cases using `torch.allclose` for numerical correctness, verifies that gradients flow properly via autograd, and checks edge cases specific to each operation.
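
The kinds of checks described in this answer can be approximated in a few lines. The sketch below is a simplified stand-in, not the actual judge: the helper `judge_case`, its return format, and the tolerance are assumptions.

```python
import torch

def judge_case(fn, reference, x, atol=1e-6):
    # Work on a fresh leaf tensor so gradients can be inspected afterwards.
    x = x.clone().requires_grad_(True)
    out = fn(x)
    expected = reference(x)
    ok_shape = out.shape == expected.shape
    ok_value = bool(ok_shape and torch.allclose(out, expected, atol=atol))
    # Gradient flow: backprop through the candidate and inspect x.grad.
    out.sum().backward()
    ok_grad = x.grad is not None and bool(torch.isfinite(x.grad).all())
    return {"shape": ok_shape, "value": ok_value, "grad": ok_grad}

def relu(x):
    # Candidate implementation under test.
    return x.clamp(min=0)

sample = torch.tensor([-2.0, 0.0, 3.0])
good = judge_case(relu, torch.relu, sample)
bad = judge_case(torch.abs, torch.relu, sample)
```

Here the correct candidate passes all three checks, while `torch.abs` has the right shape and a finite gradient but fails the value comparison.
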
Who is this for?
Anyone preparing for ML/AI engineering interviews at top tech companies, or anyone who wants to deeply understand how PyTorch operations work under the hood.
Thanks to everyone who has contributed to TorchCode.

- duoan
- Ando233
- ThierryHJ

Auto-generated from the GitHub contributors graph with avatars and GitHub usernames.
