feat(implicit): add theoretical bridge between VCMs and neural models #89
Open
cnellington wants to merge 1 commit into
Adds a varying-coefficient regression view of differentiable models with context inputs, showing the intermediate regression parameters can be recovered by differentiating with respect to context — a first-order Taylor approximation.
Pull request overview
Adds a short theoretical link in the “Foundations of Implicit Adaptation” section arguing that context-as-input differentiable models (e.g., neural nets) can be related to varying-coefficient regression via local linearization / post-hoc derivatives.
Changes:
- Renames the subsection to “Theoretical Bridge: Architectural Conditioning via Context Inputs”.
- Adds a math-based bridge describing how intermediate coefficients could be recovered from a differentiable model via differentiation, with a citation to post-hoc interpretation work.
Comment on lines +20 to +26
The connection is explicit for differentiable models $g$. Consider the model $P(Y \mid X, C)$ as a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through
$$\hat{f} = \text{argmin}_f \sum_i (y_i - x_i \cdot f(c_i))^2,$$
while a differentiable model (e.g., a neural network) will solve
$$\hat{\Phi} = \text{argmin}_\Phi \sum_i (y_i - g([x_i, c_i]; \Phi))^2.$$
Under mild assumptions, these result in an identical solution for the intermediate regression parameters $\beta$. While the varying-coefficient model solves for these parameters explicitly, they can be obtained post-hoc from the differentiable model by differentiating with respect to $c_i$:
$$\beta_i = \frac{\partial}{\partial c_i} g([x_i, c_i]; \Phi).$$
This is the first-order Taylor approximation of the model, a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.
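The local linearization in the quoted passage can be checked numerically. The following is a minimal sketch, not the PR's method. It assumes a hypothetical coefficient map `f` and a toy model `g([x, c]) = x · f(c)`, neither of which appears in the PR, and uses central finite differences to obtain the first-order Taylor coefficients of `g` at one sample. In this toy setup the slopes along the `x` inputs coincide with `f(c_i)`, while the slope along the context input equals `x_i · f'(c_i)`.

```python
def f(c):
    # Hypothetical coefficient map beta = f(c); an assumption, not from the PR.
    return [1.0 + c, 2.0 - 0.5 * c]


def g(z):
    # Toy differentiable model on the concatenated input z = [x_1, x_2, c].
    x, c = z[:2], z[2]
    return sum(x_j * b_j for x_j, b_j in zip(x, f(c)))


def local_linear_coefficients(model, z, eps=1e-6):
    # Central finite differences: first-order Taylor coefficients of model at z.
    grads = []
    for j in range(len(z)):
        up, down = list(z), list(z)
        up[j] += eps
        down[j] -= eps
        grads.append((model(up) - model(down)) / (2 * eps))
    return grads


x_i, c_i = [0.3, -1.2], 0.7
coeffs = local_linear_coefficients(g, x_i + [c_i])
beta_from_model = coeffs[:2]  # slopes along the x inputs
context_slope = coeffs[2]     # slope along the context input
print(beta_from_model, f(c_i), context_slope)
```

For a trained network, the same coefficients would typically come from automatic differentiation rather than finite differences. Finite differences are used here only to keep the sketch dependency-free.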
Summary
Adds a brief theoretical bridge in the Foundations of Implicit Adaptation section, showing that differentiable models with context inputs (e.g., neural networks) recover the varying-coefficient regression solution: the intermediate regression parameters $\beta_i$ can be obtained post-hoc by differentiating the model with respect to $c_i$, a first-order Taylor approximation often used in post-hoc interpretation.
Renames the relevant subheading to "Theoretical Bridge: Architectural Conditioning via Context Inputs".
Split out from #88 to keep the amortized-estimation discussion separate from the theoretical bridge content.
Test plan
`@doi:10.48550/arXiv.1602.04938` resolves