AI engineer focused on LLM safety, alignment, and reliable systems. Mississauga, ON · pahealyai@gmail.com · LinkedIn
I build LLM-powered systems with an emphasis on hallucination mitigation, adversarial robustness, and auditable decision-making. My projects ship runnable code and the evaluation harnesses to know when the code stops working.
I care about scalable oversight, interpretability, and the kind of boring engineering discipline that makes alignment techniques actually usable in production — structured outputs, refusal strategies, per-claim validation, and eval sets you can read in an afternoon.
Grounded Q&A over AI safety research papers, with multi-layer source validation and a 30-prompt adversarial eval harness.
Every claim has to trace back to a retrieved chunk — or the system refuses. Built to make citation hallucination a first-class failure mode. Ships with a MockLLM so the pipeline runs without an API key, plus OpenAI + FAISS for real deployments.
Python · LangChain · FAISS · OpenAI API
Chain-of-thought reasoning agent that produces auditable, Pydantic-validated JSON across 500+ synthetic test cases.
Three-layer architecture (reason → classify → score), each layer unit-tested in isolation. Retry + fallback logic; NEEDS_REVIEW is a first-class class for the cases the model is genuinely unsure about.
Python · LangChain · Pydantic · Structured Reasoning
CrewAI-style multi-agent workflow with a Reviewer critique loop and a built-in prompt-sensitivity study.
Models scalable-oversight patterns on a mundane task: one source → three formats (Twitter, LinkedIn, email). Every run writes a full transcript. The sensitivity command measures how much the output varies under prompt perturbations — because "it worked last time" is not a guarantee.
Python · CrewAI · Prompt Engineering
Red-teaming toolkit with 40+ curated attacks spanning jailbreaks, prompt injections, goal misspecification, and boundary probes.
Each attack has an expected field — so the evaluator catches over-refusal, not just under-refusal. Generates a readable markdown report with per-category rates and paraphrase-consistency checks. Works against any target model via the Defender protocol.
Python · OpenAI API · Red-teaming
- Reading: Constitutional AI (Anthropic), Scaling Monosemanticity (Anthropic), Eliciting Latent Knowledge (ARC), Sleeper Agents (Anthropic)
- Following: Anthropic, ARC, Redwood Research, EleutherAI, Apollo Research
- Building: adversarial evaluation tooling and better refusal-boundary measurement
AI/ML: LangChain · CrewAI · LlamaIndex · Hugging Face Transformers · OpenAI API · Anthropic API RAG / search: FAISS · ChromaDB · embedding pipelines · source grounding Languages: Python (advanced) · Bash · TypeScript Practice: RLHF concepts · Constitutional AI · adversarial prompt testing · structured evaluation frameworks · refusal design
Open to AI safety / alignment engineering roles, research assistantships, and fellowship programs.