Shrinking the repo - important but disruptive chore #9020
Replies: 2 comments 3 replies
This would cause an enormous amount of churn for contributors with clones / forks / active PRs. Is it really necessary now that git has good tooling like …? Also: it's currently possible to cryptographically verify that every commit in main (or at least every recent commit; I didn't check back to the start of the repo) came from a GitHub PR, because we use squash merge, so the commits are created and signed by GitHub. If we rewrite history locally and force-push, that will no longer be the case.
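For reference, that signature status is easy to spot-check with stock git (a sketch; the `-20` window and branch name are arbitrary choices here):

```shell
# %G? prints one letter per commit: G = good signature, N = unsigned,
# E = signature can't be checked (signer's key not in your keyring), B = bad.
# Squash commits created through the GitHub UI are signed by GitHub's
# web-flow key, so after importing that key they should all show G.
git log --format='%h %G? %s' -20 main
```

An `N` anywhere in that column would mean a commit that did not come through the signed squash-merge path.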
This is the output of git-sizer. The biggest issues it flags are the gh-pages-related artifacts. We have a few ways to trim this down, but the real solution is publishing via a workflow instead of a branch.
The cloned repo is around 1.7 GB at the time of writing, which is unfortunately large: probably at least 4x what it should be.
Some reasons:
1. One blog post has very large full-res photos which could be compressed.
2. Some dead directories with checked-in binaries (for Windows and other things) and some accidental node_modules commits.
3. Other large bits of content, since the repo also acts as a bit of a CMS for docs/content.
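Before deciding exactly what to filter, it helps to see which blobs dominate. A sketch using only stock git (no git-sizer or filter-repo needed), run from inside a clone:

```shell
# List the 10 largest blobs ever committed, with an example path for each.
# rev-list emits "<sha> <path>" pairs; cat-file resolves type and size,
# carrying the path through via %(rest).
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $3, $4 }' |
  sort -rn |
  head -10
```

Sizes are uncompressed object sizes, which is usually what you want when deciding what to purge from history.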
The largest one to fix is 2, but it's also the most disruptive:
Filter out old paths, rewrite history, force push
This does mean that basically every open PR will break and everyone has to re-clone, but otherwise all history is kept.
How to do this:
git filter-repo --invert-paths --path temporal-service/ --path pr-preview/ --path documentation/prompt-library/node_modules/ --path extensions-site/node_modules/ --path crates/goose-demo/ --path crates/goose-ffi/ --path ui/desktop/src/bin/goosed --path ui/desktop/src/bin/goose_ffi.dll --path ui/desktop/src/bin/goose_llm.dll --path ui/desktop/package-lock.json --path input.wav

This will shave off 500+ MB of totally dead noise.
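A sketch of the workflow around that command, for whoever picks this up (the repo URL is an assumption here, and the `--path` list is abbreviated to one entry; use the full list above):

```shell
# Run the rewrite on a fresh clone, never your working copy.
# git-filter-repo refuses to run on a non-fresh clone for exactly this reason.
git clone https://github.com/block/goose.git goose-rewrite   # assumed URL
cd goose-rewrite

# Substitute the full --invert-paths command from above; abbreviated here.
git filter-repo --invert-paths --path ui/desktop/src/bin/goosed

# Inspect the result before pushing anything.
git count-objects -vH

# filter-repo removes the origin remote as a safety measure;
# re-add it, then force-push branches and tags.
git remote add origin https://github.com/block/goose.git
git push --force --all origin
git push --force --tags origin
```

Everyone else then deletes their old clone and clones fresh; pulling over a rewritten history produces a mess of duplicate commits.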
For 1: documentation/blog/2025-04-17-goose-goes-to-NY/ has 256 MB of photos that could be ~3 MB, so this really should be done as part of 2 above: shrink them, then filter the old versions out of history so we don't keep the large ones.
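One way to do the shrinking step, assuming ImageMagick is available (the 1600px cap and 80% quality are example values, not decided numbers):

```shell
# Downscale and recompress the blog photos in place.
# '1600x1600>' only shrinks images whose long edge exceeds 1600px.
mogrify -resize '1600x1600>' -quality 80 \
  documentation/blog/2025-04-17-goose-goes-to-NY/*.jpg
```

The full-res originals could be parked somewhere outside the repo if anyone still needs them.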
For the others: there are videos which could be moved elsewhere, we could also use Git LFS to prevent this in future, and perhaps we should consider splitting docs/blogs into a peer repo next to goose, aaif-goose/goose-docs for example.
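If videos do stay in the repo, LFS tracking would keep future ones out of everyone's clone history (git-lfs assumed installed; the extension patterns are examples):

```shell
# Track large media via Git LFS so pushes store small pointers, not blobs.
git lfs install
git lfs track '*.mp4' '*.mov'
# The patterns are recorded in .gitattributes, which must itself be committed.
git add .gitattributes
git commit -m "Track video files with Git LFS"
```

Note LFS only helps going forward: blobs already in history still need the filter-repo rewrite above to actually shrink clones.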
With all of the above, I think we can get it down to a few hundred MB at most: a vastly better experience, though it does require some pain.
Anyone up for this?