kaustubh202/inside_llms

Inside LLMs

We’re a small group of undergrads from the Data Science Group at IIT Roorkee who got unusually curious about one question:

What’s actually going on inside neural networks?

The mechanisms, circuits, representations, algebraic structure, failure modes, all of it.

This GitBook is where we document that curiosity.


Why this exists

A lot of ML research focuses on what models can do.
Mechanistic interpretability asks something different:

How are they actually doing it?

That question pulled us into probing attention layers, dissecting MLPs, testing causal interventions, studying fine-tuning shifts, and even poking at ideas like self-modeling, model introspection, and more interpretable foundational architectures.

Some of this became workshop papers, some rejected submissions, some ongoing experiments, and some things we’re still confused about.


What you’ll find here

Think of this GitBook as a public research notebook.

Inside:

  • Work on domain specialization and circuit discovery in LLMs
  • Our AAAI workshop paper on bilinear MLP interpretability
  • Experiments on self-modeling and model introspection
  • Probing studies, causal interventions, fine-tuning analyses
  • Random mech-interp curiosities we couldn't resist exploring

If something worked, we explain it.
If it didn’t, we try to explain that too.


Who we are (and who we aren’t)

We’re not a formal lab.

Just a bunch of students who:

  • ask slightly obsessive questions about models
  • try to read and understand mech-interp papers
  • run questionable experiments at 2 AM
  • and somehow keep coming back for more

You’ll find the humans behind this in the Our Team section below.


Why make this public?

Because mech-interp benefits from openness:

  • ideas cross-pollinate fast
  • half-formed intuitions sometimes help others
  • and honestly, we learned most of this from people who shared generously

So this is our attempt to do the same.

If you’re into interpretability, alignment, or just curious how neural nets tick, you’ll probably find something interesting here.

And if not, at least you’ll know some IITR undergrads tried 🙂

---

P.S. Contact details are on the Team page if you want to reach out.
