Inside LLMs

We’re a small group of undergrads from the Data Science Group at IIT Roorkee who got unusually curious about one question:

What’s actually going on inside neural networks?

The mechanisms, circuits, representations, algebraic structure, failure modes, all of it.

This GitBook is where we document that curiosity.

Why this exists

A lot of ML research focuses on what models can do.
Mechanistic interpretability asks something different:

How are they actually doing it?

That question pulled us into probing attention layers, dissecting MLPs, testing causal interventions, studying fine-tuning shifts, and even poking at ideas like self-modeling, model introspection, and more interpretable foundational architectures.

Some of this became workshop papers, some ended up as rejected submissions, some is still running as experiments, and some is stuff we’re still confused about.

What you’ll find here

Think of this GitBook as a public research notebook.

Inside:

Work on domain specialization and circuit discovery in LLMs
Our AAAI workshop paper on bilinear MLP interpretability
Experiments on self-modeling and model introspection
Probing studies, causal interventions, fine-tuning analyses
Random mech-interp curiosities we couldn’t resist exploring

If something worked, we explain it.
If it didn’t, we try to explain that too.

Who we are (and who we aren’t)

We’re not a formal lab.

Just a bunch of students who:

ask slightly obsessive questions about models
try to read and understand mech-interp papers
run questionable experiments at 2 AM
and somehow keep coming back for more

You’ll find the humans behind this in the Our Team section below.

Why make this public?

Because mech-interp benefits from openness:

ideas cross-pollinate fast
half-formed intuitions sometimes help others
and honestly, we learned most of this from people who shared generously.

So this is our attempt to do the same.

If you’re into interpretability, alignment, or just curious how neural nets tick, you’ll probably find something interesting here.

And if not, at least you’ll know some IITR undergrads tried 🙂

---

P.S. Contact details are on the Team page if you want to reach out.
