Ahmedaltu/programming-parallel-computers

programming-parallel-computers

An interactive learning platform for the Programming Parallel Computers course (CS-E4580) by Aalto University.

This repo turns the course material into hands-on, visual, and interactive content — making parallel programming concepts easier to understand and experiment with.

🌐 Live platform → ahmedaltu.github.io/programming-parallel-computers


What's inside

  • Interactive visualizations — diagrams, memory layouts, and performance charts you can explore in the browser
  • Code walkthroughs — step-by-step breakdowns of V0→V7 optimisations from the case study
  • Assembly analysis — annotated assembly showing what the CPU actually executes
  • Exercises — fully optimised solutions with documented reasoning

Course structure

| Chapter | Topic | Key techniques |
|---|---|---|
| Chapter 1 | Role of parallelism: why and how | Moore's Law, latency vs throughput |
| Chapter 2 | CPU optimisation case study (0.6% → ~100% of peak) | OpenMP, ILP, SIMD/AVX-512, register tiling, Z-order, prefetch |
| Chapter 3 | Multithreading | OpenMP memory model, false sharing, scheduling, atomics |
| Chapter 4 | GPU programming | CUDA V0→V4, coalescing, shared memory tiling, float4, occupancy, Nsight |

Exercises

correlate — Pearson correlation matrix

Fully optimised C++ implementation targeting the course grader (AVX-512):

  • AVX-512 float16_t SIMD with 6×16 register-tiled kernel
  • Z-order (Morton) tile traversal for cache locality
  • Software prefetching
  • Two-pass normalisation
  • OpenMP parallelised over all tile pairs
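The normalisation and Z-order ideas in the list above can be sketched in scalar C++. This is a minimal illustration, not the repository's AVX-512 kernel: the function names (`normalise_rows`, `correlate_pair`, `spread_bits`, `morton_index`) are hypothetical, and reading "two-pass normalisation" as "centre each row, then scale it to unit length" is an assumption. Under that reading, the Pearson correlation of two rows reduces to a plain dot product.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: normalise each row of a row-major ny x nx matrix.
// Pass 1 subtracts the row mean; pass 2 scales the row to unit Euclidean length.
// Assumes no row is constant (a constant row has zero variance).
std::vector<double> normalise_rows(int ny, int nx, const std::vector<float>& data) {
    assert(data.size() == static_cast<std::size_t>(ny) * nx);
    std::vector<double> norm(data.size());
    for (int y = 0; y < ny; ++y) {
        double mean = 0.0;
        for (int x = 0; x < nx; ++x) mean += data[y * nx + x];
        mean /= nx;
        double sum_sq = 0.0;
        for (int x = 0; x < nx; ++x) {           // pass 1: centre the row
            double d = data[y * nx + x] - mean;
            norm[y * nx + x] = d;
            sum_sq += d * d;
        }
        double inv_len = 1.0 / std::sqrt(sum_sq);
        for (int x = 0; x < nx; ++x)             // pass 2: scale to unit length
            norm[y * nx + x] *= inv_len;
    }
    return norm;
}

// Correlation of rows i and j of an already normalised matrix: a dot product.
double correlate_pair(int nx, const std::vector<double>& norm, int i, int j) {
    double sum = 0.0;
    for (int x = 0; x < nx; ++x) sum += norm[i * nx + x] * norm[j * nx + x];
    return sum;
}

// Spread the low 16 bits of v so that bit k moves to bit 2k.
std::uint32_t spread_bits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Z-order (Morton) index of tile (tx, ty): interleave the bits of tx and ty.
// Sorting tiles by this index keeps tiles that are close in 2D close in time.
std::uint32_t morton_index(std::uint16_t tx, std::uint16_t ty) {
    return spread_bits(tx) | (spread_bits(ty) << 1);
}
```

In a tiled implementation one would typically compute `morton_index` for every tile pair, sort the pairs by it, and hand the sorted list to the parallel loop. The inner dot product is the part that a register-tiled SIMD kernel replaces.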

is — Image segmentation

  • CPU version: 2D prefix sums turn each w×h rectangle sum into an O(1) lookup, reducing the search from O(nx²·ny²·w·h) to O(nx²·ny²); OpenMP parallelised
  • GPU version: CUDA kernel with shared memory block reduction and prefix sums on device
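The prefix-sum trick behind both versions can be sketched on the CPU as a summed-area table. This is a simplified illustration with a single channel of `double` values. The names `build_prefix_sums` and `rect_sum` are hypothetical, and the repository's solution may lay out its data differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build an (ny+1) x (nx+1) table where entry (y, x) holds the sum of all
// input values in rows [0, y) and columns [0, x). Row 0 and column 0 are zero.
std::vector<double> build_prefix_sums(int ny, int nx, const std::vector<double>& v) {
    assert(v.size() == static_cast<std::size_t>(ny) * nx);
    const int w = nx + 1;
    std::vector<double> s(static_cast<std::size_t>(ny + 1) * w, 0.0);
    for (int y = 0; y < ny; ++y) {
        for (int x = 0; x < nx; ++x) {
            // Inclusion-exclusion: add the block above and the block to the
            // left, then subtract their overlap, which was counted twice.
            s[(y + 1) * w + (x + 1)] = v[y * nx + x]
                                     + s[y * w + (x + 1)]
                                     + s[(y + 1) * w + x]
                                     - s[y * w + x];
        }
    }
    return s;
}

// Sum over the half-open rectangle rows [y0, y1), columns [x0, x1) using
// four table lookups, independent of the rectangle's size.
double rect_sum(int nx, const std::vector<double>& s, int y0, int x0, int y1, int x1) {
    const int w = nx + 1;
    return s[y1 * w + x1] - s[y0 * w + x1] - s[y1 * w + x0] + s[y0 * w + x0];
}
```

Building the table costs one pass over the image. Every candidate rectangle in the search can then be scored with four lookups instead of a loop over its pixels.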

Tech stack

| Layer | Tools |
|---|---|
| CPU parallelism | OpenMP, AVX-512 (float16_t, ZMM registers) |
| GPU parallelism | CUDA (nvcc), shared memory, float4 vectorised loads |
| Profiling | Nsight Systems, Nsight Compute, perf |
| Language | C++17, CUDA C++ |
| Platform | Aalto course grader (AVX-512), Maari GPU machines |

Author

Built while studying the course — combining learning with building.
