cotenthusiast/README.md

Hi, I'm Karl

Software Engineering student at Queen's University Belfast, focused on ML robustness, LLM evaluation, and building systems from first principles.

Current Work

  • Investigating positional bias in multiple-choice LLM evaluation using two-stage prompting and mitigation baselines.
  • Running open-source model generalization experiments on the Kelvin2 HPC cluster with Qwen 2.5 models.
  • Contributing to a survey on LLM/MLLM robustness.

Featured Projects

  • Two-Stage Prompting for MCQ Evaluation: Research project evaluating whether decomposing MCQ answering into free-text reasoning and option matching reduces positional bias.

  • MCQ Bias Generalization Experiments: Open-source model experiments testing whether MCQ bias-mitigation methods generalize across model scale, dataset, and method family.

  • Multilayer Perceptron from Scratch: NumPy-only neural network trained on Fashion-MNIST, with reproducible training artifacts and evaluation.

  • Logistic Regression from Scratch: From-scratch implementation focused on optimization, decision boundaries, and reproducible experiment outputs.

Tech

Python · NumPy · PyTorch · Hugging Face · LaTeX · Git · Linux · Bash · Docker

Pinned

  1. model-generalization-paper (Public)

    Method generalization experiments across open-source LLMs on MMLU and ARC-Challenge. Built on mcq-framework; runs on the Kelvin2 HPC cluster.

    Python

  2. two-stage-prompting-paper (Public)

    Code and experiments for "Two-Stage Prompting for Robust MCQ Evaluation", a study of positional bias in LLM multiple-choice evaluation.

    Python

  3. mcq-framework (Public)

    A reusable framework for evaluating LLM robustness on multiple-choice questions, with pluggable backends, runner templates, and config-driven experiments. Under active development.

    Python

  4. neural-network-from-scratch (Public)

    Neural network from scratch in NumPy with a clean src/ layout, unit tests, and runnable train/eval scripts that save reproducible artifacts (loss curves, configs, metrics).

    Python