High Performance Computing Course Project (CS610)

Course: CS610 – High Performance Computing
Instructor: Prof. Swarnendu Biswas
Institute: IIT Kanpur
Duration: Aug 2025 – Dec 2025

This repository contains optimized CPU and GPU implementations of core numerical workloads, covering instruction-level (SIMD), thread-level (OpenMP), and accelerator-based (CUDA) parallelism.

🚀 Implemented Projects

1. Serial Matrix Multiplication with AVX2 & SSE4

File: matmul.cpp
Optimized using:
- AVX2 and SSE4 vector intrinsics
- Loop unrolling and cache-friendly access
Achieved up to 6× speedup over the naive implementation
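
The contents of matmul.cpp are not reproduced here, so the following is only a minimal sketch of the two ideas listed above: a cache-friendly i-k-j loop order (the inner loop walks rows of B and C contiguously) and a 256-bit vectorized inner loop. The function name `matmul_ikj`, the row-major square layout, and the `float` element type are assumptions made for illustration. The vector path is compiled only when the compiler defines `__AVX2__` (for example with `-mavx2`); otherwise the scalar loop handles everything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Hypothetical helper: computes C += A * B for row-major n x n float matrices.
void matmul_ikj(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            // A[i][k] is reused across the whole inner loop.
            const float a = A[i * n + k];
            const float* b_row = &B[k * n];  // row k of B, contiguous
            float* c_row = &C[i * n];        // row i of C, contiguous
            std::size_t j = 0;
#ifdef __AVX2__
            // These 256-bit float intrinsics come from AVX, which every
            // AVX2 target also supports. Eight floats per iteration.
            const __m256 a_vec = _mm256_set1_ps(a);
            for (; j + 8 <= n; j += 8) {
                __m256 b = _mm256_loadu_ps(b_row + j);
                __m256 c = _mm256_loadu_ps(c_row + j);
                c = _mm256_add_ps(c, _mm256_mul_ps(a_vec, b));
                _mm256_storeu_ps(c_row + j, c);
            }
#endif
            // Scalar tail (or the full loop when AVX2 is not enabled).
            for (; j < n; ++j) {
                c_row[j] += a * b_row[j];
            }
        }
    }
}
```

Because the function accumulates into C, the caller is expected to zero C first. Unaligned loads and stores are used so that the sketch places no alignment requirement on the vectors.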

2. Grid Search (Baseline – Serial)

File: gridsearch_original.cpp
Reference serial implementation
Used for correctness and performance comparison

3. Grid Search (Parallel – OpenMP)

File: gridsearch_openmp.cpp
Parallelized using OpenMP
Achieved ~6.5× speedup over the serial baseline on a multi-core CPU
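
As an illustration of how a grid search can be parallelized with OpenMP, here is a minimal sketch. It does not reproduce gridsearch_openmp.cpp: the objective function, the 2D square grid, and the name `grid_search_min` are all hypothetical. The pattern shown is a collapsed parallel loop with a `min` reduction, so each thread keeps a private best value that OpenMP combines at the end. Compile with `-fopenmp` to enable the pragma; without it the code runs serially and returns the same result.

```cpp
#include <cassert>
#include <limits>

// Hypothetical objective used only for this sketch.
inline double objective(double x, double y) {
    return (x - 1.0) * (x - 1.0) + (y + 2.0) * (y + 2.0);
}

// Evaluates the objective on a steps x steps grid over [lo, hi] x [lo, hi]
// and returns the smallest value found.
double grid_search_min(double lo, double hi, int steps) {
    assert(steps >= 2 && hi > lo);
    const double h = (hi - lo) / (steps - 1);
    double best = std::numeric_limits<double>::infinity();

    // Grid points are independent, so both loops are collapsed into one
    // iteration space and split across threads.
    #pragma omp parallel for collapse(2) reduction(min : best)
    for (int i = 0; i < steps; ++i) {
        for (int j = 0; j < steps; ++j) {
            const double v = objective(lo + i * h, lo + j * h);
            if (v < best) best = v;
        }
    }
    return best;
}
```

Returning only the minimum value keeps the reduction to a built-in operator. Tracking the location of the minimum as well would need a user-defined reduction or a per-thread best that is merged in a critical section.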

4. 2D/3D Convolution using CUDA

File: convolution_gpu.cu
GPU implementation using CUDA
Achieved ~4× speedup compared to the serial CPU version
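
For context on what is being accelerated, the sketch below shows what a serial CPU 2D convolution can look like. It is not the repository's code: the name `conv2d_same`, the zero-padded "same" output size, the odd square kernel, and the unflipped (cross-correlation) form are assumptions chosen for illustration. Each output element depends only on the input and the kernel, which is why a GPU version can typically assign one thread per output element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference: 2D convolution with zero padding.
// `in` is a row-major h x w image, `ker` is a row-major k x k kernel (k odd).
// The kernel is applied without flipping (cross-correlation form).
std::vector<float> conv2d_same(const std::vector<float>& in, int h, int w,
                               const std::vector<float>& ker, int k) {
    assert(h > 0 && w > 0 && k > 0 && k % 2 == 1);
    assert(in.size() == static_cast<std::size_t>(h) * w);
    assert(ker.size() == static_cast<std::size_t>(k) * k);

    const int r = k / 2;  // kernel radius
    std::vector<float> out(in.size(), 0.0f);

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    const int iy = y + ky - r;
                    const int ix = x + kx - r;
                    // Samples outside the image count as zero.
                    if (iy >= 0 && iy < h && ix >= 0 && ix < w) {
                        acc += in[iy * w + ix] * ker[ky * k + kx];
                    }
                }
            }
            out[y * w + x] = acc;
        }
    }
    return out;
}
```

In a CUDA port of this sketch, the two outer loops would usually become the thread and block indices, with the two inner loops left as they are inside the kernel.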

🛠️ Build Instructions

Requirements

g++ (with OpenMP support)
nvcc (CUDA Toolkit)
Linux environment recommended

Compile All Programs

make
