Propose modernising hackage-server project #67

Open
qnikst wants to merge 4 commits into haskellfoundation:main from tweag:tweag/proposal-modernising-hackage-server

@qnikst qnikst commented May 21, 2026

This commit introduces Tweag's proposal for the modernising hackage-server project. The project includes a plan to improve hackage-server's scalability and resource use by migrating the data store to a relational database, as well as a zero-downtime migration plan.

Rendered document: 0000-modernising-hackage-server.md
Related discussion on Discourse: https://discourse.haskell.org/t/feedback-request-modernising-hackage-server-community-project-proposal/14142

### Migration Sequence

For the migration we 5 distinct phases:

noticed a slight typo!

Author

Fixed, thanks!

@LaurentRDC

Contributor

This is wonderful. I'm glad someone is taking a stab at this.

A few thoughts:

I'd like the proposal to go a bit further. Since incremental changes are hard on the current hackage-server, what are some of the ways in which hackage-server-v2 will be forward-looking? How do we ensure that, in 10 years, there isn't a similar proposal for hackage-server-v3 because the architecture for hackage-server-v2 is lacking, or is hard to incrementally change?
Today's problem is horizontal scalability, and the proposal addresses that. What could be tomorrow's problem, and how do we ensure that the new design allows it to be solved? For example, the proposal mentions the use of IO () callbacks as being problematic due to an unclear control flow. What's the alternative being proposed here?

I strongly support the choice to go with Servant. Generating HTML pages is a bit annoying out-of-the-box, but the ability to create a hackage-server-api package is unparalleled.

Finally, one crucial detail to get right here, that I think needs to be addressed, is the solution around long-term data migrations once hackage-server-v2 is the source-of-truth. I'm not familiar with acid-state in practice, but I assume that it involves writing migrations in Haskell. SQL migrations can be painful if not managed appropriately

@hasufell

Contributor

This is direly needed.

But I found the section about flora a bit handwavy... why exactly can this not be used to build a modern hackage-server? Have you reached out to @Kleidukos? It's possible these projects have largely different scope, but it's also possible this may cause more fragmentation that could have been avoided.

I also find it unclear who is going to maintain this project after the proposal is done and implemented.

@gbaz

gbaz commented May 22, 2026

Collaborator

Flora has none of the APIs or backend necessary to be hackage. It is only a database and frontend. Most of the "juice" in hackage is the backend structures, not just the interface it provides to an existing database.

@gbaz

gbaz commented May 22, 2026

Collaborator

On the whole I think this proposal is reasonable and addresses a real problem. The proposed architecture -- servant and postgres -- is a standard and nice one that makes sense. That said, here are some comments.

HTML is generated manually throughout, as opposed to being a structured, templated system. This means it is prohibitively expensive to do any sort of modernizing of the generated documents, despite them conceptually being simple projections of the data.

This is not true. Many, though not all, pages are generated using the hstringtemplate library, and the usage could be further pursued.

My main question is I don't understand the migration plan. The existing system will take all API requests, no? So how will the new system have the data to serve? Or is the idea all requests go to the new system and then it also "forwards" them to the old one? Additionally, will we need to run both servers at once on the existing hackage box? If so, will that cause even further resource costs on an already resource-starved box? Or is the plan to have a second box as well? (Which is fine, except then the filestore will need to be shared across boxes?).

An additional issue regarding just the proposal text (not the plan) is that we do not need horizontal scaling -- mirrors suffice for the most part, and we can build UI mirroring beyond that. What we need is to reduce the in-memory footprint. The motivation for switching datastores (a much needed thing, and thank you so much for looking at it!) is not scalability in the requests-per-second sense. I believe that a well-written hackage server could comfortably be served for quite some time on a much less beefy box than we now have, if it did not use acid-state. The motivation is just that the quantity of resident memory required by the current architecture is too high per each incremental package upload.

Finally, while on the whole I think a clean API-for-API rewrite would be ideal, I do wonder if there's another "middle balance" for now, which is to not swap the whole of the backend at once from acid-state to postgres, but to just swap the most expensive part, which I believe is the packagedb. It seems from skimming the migration document that most of the lines of code that require touching (20% or so in total) are not related to the packagedb, but rather to the user store, etc -- which are much less costly, I believe, to keep in acid-state for the time being.

A partial migration does not make hackage more horizontally scalable, but as I said above, it is not horizontal scalability that is our obstacle -- it is the single-box-cost of keeping too much data in memory.

@gbaz

gbaz commented May 22, 2026

Collaborator

All that said, if a full rewrite can be done by two engineers in three months as this proposal states, then I think that we should absolutely go for it despite my reservations -- the cost-benefit analysis and my concerns are based on my experience of the very slow development of hackage in the past, and my fear of the scale of a large rewrite. So I would encourage the proposal submitters to really be sure they understand the scope of hackage well enough to give such an estimate (though the inventory of APIs and features indicates they have already thought about this.) If that is a genuine estimate of good engineers with sound timeline judgement, then that very much incentivizes going this path.

@hasufell

Contributor

All that said, if a full rewrite can be done by two engineers in three months

Maybe we should ask directly: is Tweag planning to use AI assistance and if so in what shape or form?

I don't see an LLM contribution policy in the hackage-server project, but this is probably useful to clear up anyway.

@qnikst

qnikst commented May 22, 2026

Author

Thanks for the replies. I'll try to address them:

@LaurentRDC:

I'd like the proposal to go a bit further. Since incremental changes are hard on the current hackage-server, what are some of the ways in which hackage-server-v2 will be forward-looking? How do we ensure that, in 10 years, there isn't a similar proposal for hackage-server-v3 because the architecture for hackage-server-v2 is lacking, or is hard to incrementally change?

Any reply here would be a bit philosophical. No answer will convince everyone, as there is no agreement in the community on what is right or wrong. What we can guarantee is that Tweag will use the best (and safe) practices as of 2026 and avoid overly experimental approaches. At the very least we will separate the storage and query layers, so that the implementation can be changed without affecting other layers of the server, and we will take care of documentation. We believe that the proposed incremental approach to migration will ensure that the codebase can be modified without major rewrites, so there will be no need for a similar proposal.
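
To make the idea of separate storage and query layers concrete, here is a minimal sketch. It is an illustration only: the names `PackageStore`, `newInMemoryStore` and `uploadAndList` are hypothetical and come neither from the proposal nor from hackage-server, and the types are drastically simplified.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import Data.List (sort)

-- Hypothetical, simplified types; a real schema would be far richer.
type PackageName = String
type Version = String

-- The only storage interface the rest of the server would see.
data PackageStore = PackageStore
  { insertVersion :: PackageName -> Version -> IO ()
  , listVersions  :: PackageName -> IO [Version]
  }

-- In-memory backend, handy for tests. A PostgreSQL backend would build
-- the same record from SQL queries instead (not shown here).
newInMemoryStore :: IO PackageStore
newInMemoryStore = do
  ref <- newIORef []
  pure PackageStore
    { insertVersion = \name ver -> modifyIORef' ref ((name, ver) :)
    , listVersions  = \name -> do
        rows <- readIORef ref
        -- Plain string ordering, used only to make the result deterministic.
        pure (sort [v | (n, v) <- rows, n == name])
    }

-- Handler code depends only on the record, never on the backend behind it.
uploadAndList :: PackageStore -> PackageName -> Version -> IO [Version]
uploadAndList store name ver = do
  insertVersion store name ver
  listVersions store name
```

With this shape, swapping the backend means constructing a different `PackageStore` value; callers such as `uploadAndList` stay unchanged.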

I'm not familiar with acid-state in practice, but I assume that it involves writing migrations in Haskell. SQL migrations can be painful if not managed appropriately

These days we prefer to use rel8 for working with the database (we already have a proof of concept for that) and sqitch for migrations. Both have been used in various Haskell projects, which supports the sustainability of the solution.

@hasufell, with regards to flora.pm: yes, we are definitely in contact with @Kleidukos. At this point (May 2026) flora.pm has some features that are not compatible with hackage (e.g. because of namespace support), and it has no background task coverage. If we continued with flora.pm while keeping hackage-server as it is, it would require significant work in flora.pm itself, and also updates to the tooling, which would have to support the modern API. With all respect to flora.pm, which I believe is a very important project for the entire Haskell ecosystem, modernising hackage-server looks like the better strategy in terms of efficiency and required investment.

I also find it unclear who is going to maintain this project after the proposal is done and implemented.

We expect that the proper long-term strategy is for the Haskell Foundation to own hackage-server, as it's important that the core infrastructure does not depend on a single entity. But Tweag will support code maintenance and address bugs as much as we can.

@gbaz

HTML is generated manually throughout....
This is not true. Many, though not all pages are generated using the hstringtemplate library, and the usage could be further pursued.

Thanks! We will remove the false statement. In the course of the implementation we will check what the best way forward is: whether to pursue it further, or whether there is a safer or more efficient approach.

My main question is I don't understand the migration plan.

We will need to update the document to be more explicit, but long story short, we expect to have a second box for the duration of the migration. The only complex part is sharing access to the data storage during the first step of the migration, but this problem has well-known solutions.

... I do wonder if there's another "middle balance" for now, which is to not swap the whole of the backend at once from acid-state to postgres, but to just swap the most expensive part, which I believe is the packagedb.

This was a part of the migration plan: we first move the package db, and move the users db as a separate step. But now that you mention it, I am starting to think that this step would be a great milestone in our work. When we wrote the proposal we had not anticipated that, and saw a benefit to the community only once all the work was done. We look forward to doing the complete rewrite, and the current approach to working with data still sets some limitations. But I think it is worth explicitly mentioning the milestone.

... estimates ...

With regards to the timing: our initial, very safe assumption after the initial work was 6 months for 3 developers, but this would be too costly a request. With our experience of similar-looking packages and a concrete plan, 3 months for 2 devs is an optimistic but still possible assumption for the iterative migration, even without any AI tools being involved. (Though it's possible that, if we hit unknown unknowns, we will be able to deliver only the packagedb-related milestone in that time.)

@hasufell, we do not plan to use an agentic approach for any code rewrite, where the rewrite itself is done solely or largely using AI tools.


Following actions from us:

  • add information about migrations.
  • remove statement about html generation.
  • add details about the migration steps and requirements.
  • add details if it's feasible to move only packagedb related parts to a relational database.

I'll add another comment once we complete those actions.

@LaurentRDC

Contributor

flora.pm has some features that are not compatible with hackage (e.g. because of namespace support)

Package namespaces are something that comes up quite often. Perhaps you could mention what hackage-server-v2 could do differently from hackage-server in order to allow this feature to be added in the future?

@Kleidukos

Kleidukos commented May 22, 2026

Regarding several things that have been said about https://flora.pm in this thread:

  • Is it ready to act as a package repository today?
    • No
  • Does it want to replace Hackage Central today?
    • No
  • Namespaces are incompatible with what we have, what do?
    • A swift read of https://flora.pm/documentation/namespaces will inform you that currently, namespaces on Flora refer to package repositories that are indexed, because https://flora.pm is a meta-index of Haskell repositories
      • If Cabal finally supports namespaces, then it's not much work on the Flora side.

I don't think flora-server is the adequate choice to replace hackage-server today. I'd like it if work on hackage-server could ensure that the service still works for people, like being able to upload through cabal upload without a timeout error, for instance.

@qnikst

qnikst commented May 22, 2026

Author

@LaurentRDC to be honest I would love Tweag to concentrate on the concrete problem: high memory usage (and the resulting instability problems, such as the cabal upload issue mentioned above), and to keep the interfaces of hackage-server-v2 fully compatible with the current hackage-server.

After this work we will be in a place where we can discuss what can be improved or adopted from other solutions, and think about more advanced features (e.g. namespaces) and the migration path to support them in the central repository. There are many interesting ideas floating around hackage, so it's all too easy to slide into endless feature creep.

For now I, personally, would prefer to leave the exploration of namespaces to the solutions that would solve it better (flora.pm).

@LaurentRDC

LaurentRDC commented May 22, 2026

Contributor

@qnikst I totally agree. I wasn't clear enough. My ask isn't to add this feature (namespaces) or any other; it's to ensure that the new hackage-server-v2 is designed in an extensible way, whereas hackage-server is apparently not, such that future work on hackage-server-v2 is easier than current work on hackage-server

That's a bit of a vague ask, I concede

@gbaz

gbaz commented May 22, 2026

Collaborator

To be clear, hackage-server is currently extensible despite living on (due to its age) an effectively custom web framework. It is modular, and mostly (with the exception of the problems caused by shared in-memory state) well factored. The problem is that the core foundations that it is built on are nonextensible, and along with that the general design feature of the Haskell language that migrating large chunks of code from pure to effectful can be extremely invasive.
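
The invasiveness of moving from pure to effectful code can be shown with a toy example. This is a sketch under loud assumptions: `AppState`, `latestPure`, `latestIO` and the other names are invented for illustration and do not correspond to actual hackage-server code.

```haskell
type PackageName = String

-- acid-state style: the whole state is an ordinary in-memory value,
-- so helpers can be pure functions over it.
newtype AppState = AppState { packages :: [(PackageName, [String])] }

safeLast :: [a] -> Maybe a
safeLast [] = Nothing
safeLast xs = Just (last xs)

latestPure :: AppState -> PackageName -> Maybe String
latestPure st name = lookup name (packages st) >>= safeLast

-- Database style: the lookup itself is an effect, so this helper and
-- everything that calls it has to move into IO (or some other monad).
latestIO :: (PackageName -> IO [String]) -> PackageName -> IO (Maybe String)
latestIO queryVersions name = safeLast <$> queryVersions name

-- A pure caller composes with plain `map` ...
summaryPure :: AppState -> [PackageName] -> [Maybe String]
summaryPure st = map (latestPure st)

-- ... while its effectful counterpart changes both type and structure.
summaryIO :: (PackageName -> IO [String]) -> [PackageName] -> IO [Maybe String]
summaryIO query = mapM (latestIO query)
```

Every caller of `summaryPure` would need the same treatment in turn, which is why such a change tends to ripple through large parts of a codebase.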

Extending hackage with namespaces would not be difficult for technical reasons in the design of hackage. It would be difficult for reasons having to do with the design of namespaces vis-a-vis cabal, the package ecosystem as a whole, how packages are designed and dependencies declared, and even what the purpose would be and getting a large group to agree to it. I think that it's out of scope to worry about such things -- they're not hackage problems.

Ultimately, the problem with hackage as it exists is it was built using what is now a very idiosyncratic stack at its very foundations. This proposal seems sound to me particularly in that it swaps that out for broadly used and maintained code. In fact, I would hope that a full rewrite could lead to a significant drop in source lines, because much of the "framework" in the hackage codebase would not need to be rewritten -- rather that is now duplicative of servant, etc.

I do think the migration plan needs to be thought through in greater detail. On that front, we should check with the rest of the admin team but I understand hackage now runs on a cloud box because it outgrew the physical box we had allocated for it. However, we still have the physical box around -- so a cost-efficient way to have (at least for a while) two servers would be to use the current cloud and the physical box together -- but bear in mind those are not at the same datacenter, so it would not be especially easy for them to directly share disk space.

That said, mirroring packages alone can be done efficiently by polling, because one need only repeatedly check the timestamp.json file to check when the index-01 tarball has been updated, then incrementally refetch that to get the incremental updates to the core package store since last fetch.
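
The polling scheme described above could be modelled roughly as follows. This is a sketch, not the mirror client's actual logic: the `MirrorState` and `Action` types are hypothetical, the timestamp is treated as an opaque string, and it is an assumption of this sketch that the index tarball only grows by appending, so that resuming from the previously fetched length yields exactly the new entries.

```haskell
-- Hypothetical view of what a mirror remembers between polls.
data MirrorState = MirrorState
  { lastTimestamp :: String   -- last value seen in the timestamp file (opaque here)
  , indexBytes    :: Integer  -- how many bytes of the index tarball we already hold
  } deriving (Eq, Show)

data Action
  = NothingToDo
  | FetchIndexFrom Integer    -- byte offset to resume the download from
  deriving (Eq, Show)

-- Decide what to do after one poll of the timestamp file.
-- Assumption: the index is append-only, so the old length is a valid offset.
planPoll :: MirrorState -> String -> Action
planPoll st remoteTimestamp
  | remoteTimestamp == lastTimestamp st = NothingToDo
  | otherwise                            = FetchIndexFrom (indexBytes st)

-- Record the outcome of a successful incremental fetch.
applyFetch :: MirrorState -> String -> Integer -> MirrorState
applyFetch st remoteTimestamp newBytes =
  st { lastTimestamp = remoteTimestamp
     , indexBytes    = indexBytes st + newBytes
     }
```

The cheap path (an unchanged timestamp) does no further network work, which is what makes frequent polling affordable.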

All told, I think the new server would probably be best written not with a balancer at front between old and new, but proxying all requests, and passing through the subset (large at first) that it did not know how to handle.
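
A minimal sketch of that pass-through arrangement, with heavily simplified and hypothetical types (`Request`, `Response`, `Handler`, `serve`); a real implementation would sit on an HTTP stack and forward the request over the network.

```haskell
-- Hypothetical, heavily simplified request and response types.
data Request = Request { method :: String, path :: [String] } deriving (Eq, Show)
newtype Response = Response String deriving (Eq, Show)

-- A new-server handler either answers the request or declines it.
type Handler = Request -> Maybe Response

-- One route the new server already implements (illustrative only).
packageList :: Handler
packageList (Request "GET" ["packages"]) = Just (Response "package list from new server")
packageList _ = Nothing

-- Try each migrated handler in turn; if none matches, pass the request
-- through unchanged to the legacy server.
serve :: [Handler] -> (Request -> IO Response) -> Request -> IO Response
serve handlers legacy req =
  case [resp | h <- handlers, Just resp <- [h req]] of
    (resp : _) -> pure resp
    []         -> legacy req
```

Migration then proceeds by growing the handler list; the set of requests reaching `legacy` shrinks as routes are ported.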

@LaurentRDC

Contributor

Thanks @gbaz , that's all very helpful!

@blackheaven

A quick word from the SRT: every few months, especially since the rise of LLM-based security audits, we get security reports (we still have ongoing reports).
We don't expect a possible rework to fix all the vulnerabilities, but could the proposal mention keeping security in mind, either through regular (possibly async) code review and/or LLM reviews (if Tweag uses them)?

Address all the comments from the GitHub discussion.

Co-authored-by: Sandy Maguire <sandy.maguire@tweag.io>
@qnikst

qnikst commented May 22, 2026

Author

Thanks everyone for the comments.

We believe that we have addressed all of them: removed the false statement about the HTML generation, and added information about the migrations and security.

@gbaz we have changed the architecture of the solution, so now all requests will go through hackage-server-v2 and unknown ones will fall through to hackage-server.

As for sharing the disk, we don't think we need that. In the proposal we have 2 alternatives (lsync-based or pull-based), and the best one can be chosen together with the admin team, as it largely depends on the existing infrastructure.
I hope it matches your vision.


We propose a complete rewrite of `hackage-server`, into the following form. Although full rewrites are often hard to justify it is our opinion that this is the best approach forwards (see “Why Incremental Refactoring Is Not Feasible” and “Correctness Guarantees” for the specific details.)

The Hackage Server V2 project represents a *complete rewrite* of the existing infrastructure, utilizing contemporary Haskell libraries and development methodologies. This new version is architected into two primary segments:

@ysangkok ysangkok May 23, 2026

The current version of Hackage is v2, as you can see from the announcement. This version number was previously advertised on the main page. It is confusing to also label this proposal v2; it would be better to call it v3.

EDIT: Here is the Well-Typed blog post: https://www.well-typed.com/blog/2013/09/hackage-2-now-available-for-beta-testing/

EDIT2: v0 was at hackage-scripts and v1 was announced at ICFP '09 at 11m58s: https://vimeo.com/6571975

Author

Fixed, thanks!

@bcardiff

I support the goal of modernizing hackage. I’ll keep an eye out in case there is something concrete I could collaborate on.

From my experience with the codebase and from the conversations that happened at AmeriHack, I do have some questions/comments.

I think the main pain point is acid-state. Migrating out of happstack to servant is appealing, but is not a must have.

Are we betting that the investment of the full rewrite is relatively similar to replacing acid-state in the happstack stack?

I wonder if it’s not safer to narrow this effort to migrating out of acid-state. Leaving the happstack to servant migration as another proposal. One that can happen iteratively without proxy.

Is it expected to keep, at least during the transition period, the same auth? The lack of cookie-based sessions limits the UX. Changing that should happen sometime IMO, but we need to keep it the same to allow the proxying. Unless the proxy is just for public routes and we expect to also modernize the authentication. (I am/was actually working on a happstack-session package to try to have cookie-based sessions in hackage.)

Is there a plan on how other hackage installations would be able to migrate to v2? Maybe after the transition period have a one-shot migration program.

Letting happstack and acid-state aside, hackage is structured in a specific way. Features are represented as modules. What is the expected code organization of the v2?

I do think this effort will unlock many improvements and would enable contributions that will benefit us all, so again I support this.

@qnikst

qnikst commented May 26, 2026

Author

@ysangkok

Thanks for the comment about the version; surely we need to call it v3 at least. I'll update the proposal.

@bcardiff hello, we will come back with a more detailed answer about the features and modules later; we need to sync inside the team on that.

But right now I can say that migrating away from acid-state alone, without touching other bits such as happstack, is not feasible in terms of time and energy investment. See the section on why incremental refactoring is not feasible. Originally we thought that the work could be done in 1 developer-month and would be pretty straightforward, but after sending several PRs it became evident that it involves refactoring from pure to effectful code, which leads to refactoring more than 25% of the code (see Appendix 1) and changing how the codebase works; @isovector can give more details. Since other legacy parts of the codebase are heavily involved as well, it's easier to start a migration. So after our research it appeared that properly replacing acid-state alone would be comparable in time to the new implementation.

Is it expected to keep, at least during transition period, the same auth?

In our migration plan we intend to keep all of the current API, including auth, fully backwards compatible until the end of the migration. Dropping any backwards compatibility should happen outside of this proposal and this scope of work, either after a discussion in the hackage-server-v3 tracker or, I believe, after a wider community discussion.

Is there a plan on how other hackage installations would be able to migrate to v2?

To be honest we have not considered that. In my opinion it would be best to leave that to the administrators' decision, but we will need to provide scripts for a one-off data migration. This is an important point, thanks!

@isovector

@bcardiff:

Are we betting that the investment of the full rewrite is relatively similar to replacing acid-state in the happstack stack?

Yes, definitely. Please see the section "Why Incremental Refactoring is not Feasible." Swapping out acid-state touches at least 25% of the codebase, which is a huge refactor, and if we're going to do that much work we might as well get all of the architectural benefits at the same time.

Is it expected to keep, at least during transition period, the same auth?

Yes, this change will be invisible to end users (modulo some HTML differences.)

The lack of cookie based session limits the UX.

I don't think this will be very hard to add, though it's not a huge priority. If we can get it for free we'll add it, if not, a PR would be pretty small I think.

Is there a plan on how other hackage installations would be able to migrate to v2? Maybe after the transition period have a one-shot migration program.

We'll need tooling to dump acid-state into postgresql as part of the online migration, and we will make sure to ship that code as well. I'll make a note to update the proposal with that.

Letting happstack and acid-state aside, hackage is structured in a specific way. Features are represented as modules. What is the expected code organization of the v2?

Whatever ends up being the cleanest way of carving the problem at its joints :) While the existing architecture has some benefits, it's not nearly as modularized as it pretends to be --- see e.g. the HTML "feature", which takes 21 XFeature arguments!

@chreekat chreekat left a comment

Member

As the Haskell Foundation DevOps Engineer, I think this proposal would be strengthened with some details about how the new infrastructure will be implemented.

I think rewriting the application with a modern stack is a great idea, because my main value is creating open and maintainable Haskell infrastructure. The rewrite helps with maintainability. I won't quibble over the details of how it's implemented. You had me at "replace acid-state".

When I think about the openness and maintainability of Hackage, though, my main worry is operational. Some day, a new admin will need to be responsible for Hackage. It can't keep being the same 2 people for the entire future lifetime of the Haskell programming language. :) But currently, there is a lot of operational know-how buried in those peoples' brains and living live on the server. How do we onboard new admins? How do new versions of hackage server get deployed? How do we upgrade the OS underneath the app? Do we deploy with systemd? Where are the app logs? What are the disaster recovery plans?

To be clear, I'm just stating the facts, and I think it's entirely reasonable to be where we're at. I also don't expect this proposal to answer any of these (rhetorical?) questions.

I just want to make a point. You talk about spinning up a second app and a proxy. That's new stuff. If you could describe how you'll spin that up, I might be able to make useful suggestions!


Some background:

In 2024, I took over Stackage. I created open-source infrastructure-as-code for the whole operation, creating a central location for documentation, issues, and public development. Onboarding devs get easier. Drive-by infrastructure improvements were enabled (and actually happen.) I even have VM-based tests in CI! I'd like this repo to be a model for other public infrastructure. I would even be so bold as to suggest that any public config go into the same repo.


The core issue is the IO boundary. Replacing `acid-state` with a real database requires IO, but the existing codebase assumes pure access to the full application state. Consequently, the first alternative leaves performance bottlenecks unaddressed, while the second requires very substantial engineering investment (see Appendix 1 for a thorough estimate of what needs to change, and how.)

Additional complexity arises from the fact that the two maintainers of `hackage-server` as listed in the cabal file haven't authored any commits since 2016 and 2013, respectively. The copyright field hasn't been updated since 2015. There is no changelog. The original authors are no longer involved. The current maintainers did not write the system. There is little-to-no documentation of the architectural invariants, no record of why key decisions were made, and no single person who holds a complete mental model of how the pieces fit together.

Member

Well actually... 🤓

https://liferay.dev/b/how-and-why-to-properly-write-copyright-statements-in-your-code

This is the smallest and most insignificant of nitpicks. Please don't whip me

Author

Fixed, thanks!


#### Proxy

A reverse proxy implemented in Haskell that for each route:

Member

Are you sure this proxy can't be implemented with Caddy?

...Hm, it looks like duplicating writes to two backends may not be a native Caddy feature, so maybe not. But on the other hand, that's also a sketchy idea. Hackage is known to send 502s (hence this proposal 😃 ). What's the error response in that case? Does the write still happen on the new server? Seems like we'd get split-brain pretty fast. Would it be better to have hackage-server itself push writes to the new service? Phase 3 and Phase 4 both assume that all writes will have the same response/success on both servers, and I can guarantee that will not be true. You might need to rethink that plan.

Author

Oh... We will update this. After receiving the comments, we revised the idea: instead of having a separate proxy, we proposed that the new hackage acts as the proxy and passes through all unsupported requests. This is simpler from the infrastructure perspective and provides a nicer migration path. We have updated the schema, but missed that part of the text. In this case we do not need any proxy.

As for Caddy or any other configurable proxy, the only reason for proposing a Haskell one was that during the first stages of the migration it would be nice to verify that the responses match, and the logic there could be too tricky for a generic tool. But as the proposal was revised, it's no longer a part of the discussion.

I think the other parts of the comment are also affected by this decision, as it's now up to the new implementation to make a request to the old hackage and ensure we have no split brain, but I'll recheck our plan and add relevant details about your questions.

@qnikst

qnikst commented May 27, 2026

Author

@chreekat to be honest, we have deliberately not touched infrastructure beyond high-level details, in order not to step on the toes of the Haskell Foundation DevOps engineering team. I truly believe that your team should set the rules and constraints there, and Tweag, as the implementor, should adopt them or suggest alternatives for consideration during the normal implementation step. I would love to keep that flexibility. That said, it makes sense to dig into how we see this being implemented in more detail than we do right now.

As for the knowledge, all the details of how to configure and run the service, and all source code, will be documented. Some details (like the note about cronjobs) target exactly that: it would be nice to reuse more widely known tools rather than having a smart but niche implementation that requires project-specific knowledge. I'm not sure how to properly highlight that in the proposal, as all of that looks like a basic assumption about the project to me :) But regarding infrastructure as code and deployment, we would be happy to adopt the workflow that exists.

What do you think here?


Next actions from our side

  • fix typos mentioned above
  • rework statements about copyright
  • update the information to say that we do not need a proxy
  • recheck the migration plan steps (highlight where the source of truth is at each step and why we do not have split brain, possibly update the steps)
  • align with infrastructure team how much of the process should be in proposal

@chreekat

Member

@qnikst thanks for the thoughts!

It's important for me to state that "HF DevOps" hasn't had anything to do with Hackage operation yet. I haven't been inducted, and the actual operations pre-date my role and the HF as a whole. My comments are aspirational and forward-looking. My intent is to make the proposal stronger by highlighting that ops stuff should be a bit more explicit.

For instance, I think Hackage is one of the last holdouts in the Haskell ecosystem that isn't run on NixOS+systemd, so that might be a good starting point.

Depending on the wishes of the actual operators (@davean and @gbaz) and the HF leadership, I'm happy to be involved on the ops side, too. Right now I defer 100% to them.

@qnikst

qnikst commented May 28, 2026

Author

I've addressed the action items I mentioned above. In particular, I updated the information about the proxy, added some details on the migration path, added details on when the load on the current server can be reduced during the migration (best and worst case scenarios), added a statement on the deployment, and fixed typos.

@Ericson2314

Contributor

I strongly disagree with the "Why Incremental Refactoring Is Not Feasible" section.

Haskell is great for refactoring, LLMs are great for refactoring, refactoring has never been easier in the history of programming! Hackage has a long tail of random features and the unknown-unknowns of a full rewrite are great. OTOH, the challenges of refactoring are fully known.

For example, we can do silly incremental steps like reading out the entire database into the pure value and then writing it back. Will that be better than acid state? Probably not! But it would be PostgreSQL in use! That's a very good first step, and then from there, incremental refactoring can make SQL used in an efficient way.

I am a strong -1 on any approach that refuses to try refactoring.

@qnikst

qnikst commented May 28, 2026

Author

@Ericson2314 Let me give a bit more context.

First, I think nobody in this thread would disagree that Haskell is a great language for refactoring, possibly one of the best. But this proposal did not come out of nowhere. It follows roughly a month of work, during which we initially chose the path of incremental refactoring and explored it seriously.

The issue is not that the refactoring is impossible. The issue is that, from an economic and delivery perspective, it does not seem justified. When a refactoring touches up to 25% of the codebase and the changes are not mechanical, you have to ask whether that refactoring is actually worth doing. In our view, the answer should depend on whether it gives us one or more of the following:

  • a faster path to the goal
  • faster iteration cycles
  • a safer migration path

In this case, we do not think those conditions hold. It is easier to introduce a proxy service that can start receiving real Hackage traffic sooner and immediately improve the developer experience. It is also easier to iterate feature by feature in a separate service than to prepare large invasive patches for each feature inside the existing codebase. Most importantly, the proxy approach gives us a safer migration path, because it allows the old and new services to run in parallel.

I would add that refactoring old software, especially when the original authors are no longer around, inevitably involves unknown unknowns. Strong typing helps a lot, and so do modern tools, but they do not remove that risk entirely. This is not a purely theoretical concern for us; we did a fair amount of work following an iterative refactoring approach before arriving at the current proposal.

One possible alternative that has been suggested is something like:
reading the entire database into a pure value and then writing it back

We considered this kind of approach, but our conclusion was that it would make the situation worse rather than better. Writes would require either an efficient way to compute diffs, or a fairly complex procedure for storing the whole state back into the database. Both options introduce new complexity in design, evaluation, and implementation, and most of that work would eventually need to be thrown away.

It would also require more resources, be less stable operationally, and still leave us with essentially the same amount of refactoring to do later. In addition, it would not give us an incremental deployment path: the server could not realistically be deployed until the full refactoring was completed.

That said, we are very open to discussing concrete technical opportunities or alternative approaches, especially if they take into account the constraints and goals described in the proposal.

@isovector


@Ericson2314

The challenge is not only to fix the code, but also to do it in a way that is reviewable, deployable, and migratable. Keep in mind that hackage.haskell.org is not the only production instance of hackage-server!

Any sort of break-it-to-fix-it solution (like your proposal to effectively implement acid-state in postgres) needs a migration plan; at the very least this means keeping all of the acid-state around in order to replay it later. But this means we can't just change the types and follow the type errors! And since acid-state isn't referentially transparent, moving the acid-state modules elsewhere isn't trivial.

There's also the question of what the migration story is for the unknown number of downstream mirror sysadmins. If we roll out the incremental changes incrementally, what do mirror operators need to do to stay up to date? Does each incremental change require a partial migration? Can it be rolled back without data loss if necessary? If we instead roll out the incremental changes in bulk, how feasible is it to review?

We have done our due diligence here. I firmly believe that the proposal is our best and safest path forwards; dismissing it as "refusing to try refactoring" is, frankly, insultingly uncharitable and unkind.

@Ericson2314

Ericson2314 commented May 28, 2026

Contributor

I would add that refactoring old software, especially when the original authors are no longer around, inevitably involves unknown unknowns.

The same unknowns are in rewriting. If someone is confused about the implementation, how are they not also confused about how to replicate it from scratch? Conversely, if someone is exactly sure what it is doing, they can refactor more aggressively.

The proxy stuff sounds highly non-trivial because now we have two backends and they can disagree on the state of hackage. That scares me far more than any one-time acid-state -> postgresql migration.

Any sort of break-it-to-fix-it solution (like your proposal to effectively implement acid-state in postgres) needs a migration plan; at the very least this means keeping all of the acid-state around in order to replay it later.

Huh? You just convert the acid state data to a postgresql db in one go.

Writes would require either an efficient way to compute diffs, or a fairly complex procedure for storing the whole state back into the database.

It can just be the simple stupid "truncate everything and write it all back". Yes, this sucks, but it also works.

Moreover, I never said you need to actually deploy the postgresql-acid-state-style version. It can merely be a refactoring aid, and then nothing is actually deployed until the SQL usage is fixed up.
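
For readers following the disagreement, the "truncate everything and write it all back" option could look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. The `packages` table, its columns, the `write_back` and `read_all` helpers, and the version strings are all made up for illustration and do not reflect the hackage-server schema.

```python
import sqlite3
from typing import Dict


def write_back(conn: sqlite3.Connection, state: Dict[str, str]) -> None:
    """Replace the whole table with the in-memory state in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM packages")  # SQLite has no TRUNCATE
        conn.executemany(
            "INSERT INTO packages (name, version) VALUES (?, ?)",
            state.items(),
        )


def read_all(conn: sqlite3.Connection) -> Dict[str, str]:
    """Load the whole table back into a pure in-memory value."""
    return dict(conn.execute("SELECT name, version FROM packages"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT NOT NULL)")

state = {"aeson": "2.2.1.0", "text": "2.1"}
write_back(conn, state)

# Any update, however small, rewrites every row.
state = {**state, "text": "2.1.1"}
write_back(conn, state)
```

The write cost is proportional to the size of the whole state rather than to the size of the change, which is the trade-off both sides of the thread are pointing at.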

But this means we can't just change the types and follow the type errors!

I don't get it. You absolutely can do that.


I dunno if it is useful to continue debating it. I fully disagree on the premise. I think many/most people are prone to "yes refactoring is better but this project is different...." exceptional thinking. I really can't think of a situation where I would agree with the rewrite approach unless the old thing is totally busted beyond belief and barely working.

@gbaz

gbaz commented May 28, 2026

Collaborator

Just for clarity's sake, here's some of the steps towards refactoring that were already done. We might disagree on different paths forward, but people can confirm the extent of exploration of the refactoring path: https://github.com/haskell/hackage-server/issues?q=is%3Apr%20author%3Aisovector

@hasufell

Contributor

I hear the "let's refactor it first" approach a lot too, but then people never actually do it, because it turns out often times it's not really trivial and there is little benefit in playing motivation archeology on a codebase that has no active maintainer anymore.

Although I have not looked at hackage-server itself, I did look at hackage-security and had a deeper look at its TUF implementation years ago: haskell/hackage-security#249

My conclusion here is it would make no sense whatsoever trying to refactor it. There's no benefit in trying to understand why this half baked implementation of TUF was executed. There is a TUF spec. The right thing would be to rewrite it.

A similar approach can be beneficial in general, especially when the requirements of the system can be reasonably extracted. And I think that is the case here. We're not dealing with something like cabal-install, where we don't even know what the requirements (or behaviors for that matter) are.

Lastly, I would distrust anyone who claims LLMs are great for refactoring. It's well known that they corrupt documents. I'd much rather trust people who write the code by hand and are as a result very intimately familiar with it.

@Ericson2314

Contributor

The one saving grace I will say for rewrite is that Hackage may have a long tail of features that we no longer care about, and jettisoning those features whether intentionally or because rewriting is a lossy process may be a beneficial way to reduce complexity and the maintenance burden going forward.

@gbaz

gbaz commented May 29, 2026

Collaborator

This plan does not include jettisoning features, and I would be worried if it did. However, there is a lot of the code base that is building up a framework that is no longer necessary, because third party libraries now handle the job well and efficiently. For example, routing etc being handled by servant should remove the need for a fair amount of code. And I think one hope is that database features we've had to "reinvent" in haskell code can now be delegated directly to postgres.

@Ericson2314

Ericson2314 commented May 30, 2026

Contributor

I want to put my money where my mouth is, so today I started refactoring: https://github.com/Ericson2314/hackage-server/tree/sql

Currently this is a bunch of barely looked at slop. If you look at it closely I am sure it will have tons of garbage and be totally fake --- I am certainly not claiming otherwise. That's not the point. The point is figuring out an approach.

The real problem with hackage today is the event sourcing that acid-state does, and how that leads to the memory pressure, etc. However, acid-state is still annoying in other ways. For example, the unclear separation between in-memory and on-disk data structures (a bit too cute) makes it very hard to know when one is breaking the old data format --- that is what @isovector linked above.

What I have started is to slop together some bespoke postgresql-based event sourcing. Again event sourcing is the problem, and it has to go. But just getting to PostgreSQL at all, and away from acid-state, so we can refactor with less fear, is a huge victory. And the preservation of event sourcing means that the actual business logic of hackage-server is not yet touched.

It is certainly more than 1 day's work, but not much more than 1 day's work. Eventually the nasty temporary event sourcing can be finished, and thoroughly reviewed. And then that can be switched to. It ought to perform similarly badly to what we have today, but not worse.

After that, humans can much more freely rip out the event sourcing component by component. (I am told hackage is rather a layer cake in its data model, and this aids reimplementing and refactoring alike.) Those PRs would touch the business logic, since it is so tied together with the event sourcing approach. But those PRs would be small, as they work module by module, table by table. (For each conceptual data type, "checkpoint" tables would become the only tables, now mutated directly, and the "event" tables would be dropped.)
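
To make the checkpoint/event distinction concrete, here is a toy sketch for a voting feature. It is not taken from the `sql` branch. `VoteEvent`, `replay`, `add_vote`, and `remove_vote` are hypothetical names, plain dictionaries stand in for tables, and Python is used only for brevity.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, Iterable, Set

Votes = Dict[str, Set[str]]  # package name -> set of user names


@dataclass(frozen=True)
class VoteEvent:
    kind: str  # "add" or "remove"
    package: str
    user: str


def apply_event(state: Votes, ev: VoteEvent) -> Votes:
    """Pure transition: return a new state with one event applied."""
    voters = set(state.get(ev.package, set()))
    if ev.kind == "add":
        voters.add(ev.user)
    elif ev.kind == "remove":
        voters.discard(ev.user)
    else:
        raise ValueError(f"unknown event kind: {ev.kind}")
    return {**state, ev.package: voters}


def replay(checkpoint: Votes, events: Iterable[VoteEvent]) -> Votes:
    """Event-sourced read: current state = checkpoint + all later events."""
    return reduce(apply_event, events, checkpoint)


# De-event-sourced version: the "checkpoint" table is the only table
# and is mutated directly, so there is no event log to replay.
def add_vote(table: Votes, package: str, user: str) -> None:
    table.setdefault(package, set()).add(user)


def remove_vote(table: Votes, package: str, user: str) -> None:
    table.setdefault(package, set()).discard(user)
```

Both styles reach the same final state. The difference is that the event-sourced version has to keep and replay the log, while the direct version only ever holds the current rows.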

Time permitting I hope to:

  1. Finish the hand-rolled event sourcing enough to pass tests.
  2. Do one of the de-event-sourcing refactors on top to demonstrate how much easier they are.

After that, I think the approach will be pretty well demonstrated, and can be extrapolated to be safer and cheaper than the estimated 6 person-months for the rewrite.

@Bodigrim

Bodigrim commented May 30, 2026

Collaborator

It is certainly more than 1 day's work, but not much more than 1 day's work. Eventually the nasty temporary event sourcing can be finished, and thoroughly reviewed.
...
Time permitting I hope to:

  1. Finish the hand-rolled event sourcing enough to pass tests.

  2. Do one of the de-event-sourcing refactors on top to demonstrate how much easier they are.

"Time permitting" and "enough to pass tests" are the famous last words, aren't they? Pending that, https://github.com/Ericson2314/hackage-server/commits/sql/ is just +5000 (-1000) lines of vibe code (for comparison, the entire Hackage is ~30000 lines) that IMHO hardly demonstrate anything.

@tomjaguarpaw

Contributor

"Time permitting" and "enough to pass tests" are the famous last words, aren't they?

It seems reasonable to be open to the possibility of @Ericson2314 demonstrating that his refactoring approach has legs whilst this proposal is moved forward in parallel. If, before this proposal is accepted, @Ericson2314 demonstrates convincingly that a refactoring approach is plausible then that would be very useful information to take into account when considering the proposal. If he doesn't then he doesn't, and nothing is lost except his own time and tokens. I don't see any benefit in deciding up front that his words are "famous last". Let him try to demonstrate that they're not, if he wants.

@Bodigrim

Collaborator

I don't see any benefit in deciding up front that his words are "famous last".

I didn't imply that they are literally last.

@tomjaguarpaw

Contributor

I didn't mean to imply that you did!

@jappeace

Contributor

I think this is a good direction

The proposal doesn't cover how to deal with TUF and the ceremonies (with 2 servers in the migration phase). Also, does the rewrite keep using hackage-security?

I think this should be split in scope, like the "middle balance" mentioned before by @gbaz, doing just the packagedb as a phase 1 migration.

If you cut the scope to just the packagedb I'd believe the timeline, I think the current timeline is too optimistic.

@Ericson2314

Ericson2314 commented May 31, 2026

Contributor

"Time permitting" and "enough to pass tests" are the famous last words, aren't they?

Well, the main HighLevelTest test suite now passes. (Well, it failed to connect to the SASS HTML validator, but that was already an allowed failure.) The only thing that doesn't pass is the DocTests, and that is because I have forgotten the cursed knowledge needed to run Haskell doc tests in general.

Pending that, https://github.com/Ericson2314/hackage-server/commits/sql/ is just +5000 (-1000) lines of vibe code (for comparison, the entire Hackage is ~30000 lines) that IMHO hardly demonstrate anything.

I am confused...what is your point?

  • If the point is 5000 is big, yes I concede that means I have a lot of reading yet to do.

  • If the point is that 5000 is much smaller than 30000, that's a huge benefit of this approach: the event sourcing infrastructure is being rewritten, but the vast majority of the application stays the same in this first commit.

    As the event sourcing is gotten rid of, yes, more of that 30000 will also change, but that is the smooth incremental part, once we are already in SQL land and acid-state free.

@Ericson2314

Contributor

but that is the smooth incremental part, once we are already in SQL land and acid-state free.

Just to drive that point home, I spent the time since I wrote that last comment (and none before, so less than an hour) picking off the first tables/modules to de-event-source. The result is the second commit added to my sql branch --- the voting machinery was converted.

Again, the code is not properly reviewed by me yet (I spent most of the time just trying to get the LLM to stop messing with the haddocks/comments, hah), but already we can see it is much smaller: <200 lines added and removed. Commits like these are easy to exhaustively review, much easier than the initial 5000 lines.

I fully expect we can rapidly create a queue of 15 or so small such commits, and then we can work through iteratively polishing, thoroughly reviewing, and deploying each one. This is the exact very incremental and very low-risk process we need to get this project done as quickly and efficiently as possible.

@hasufell

Contributor

@qnikst who is funding this work? Or is this proposal meant to ask the HF for funding?

@jmct

jmct commented May 31, 2026

Contributor

This isn't addressing @hasufell's point directly, but his question made me think that it's worth clarifying some things.

The HF is revamping the way it goes about its technical agenda (even more pointedly, it's revamping the way it goes about making a technical agenda). So it's natural that there's going to be some questions about process.

That said, I see no reason why community-sourced proposals can't continue living here for the HF to consider. While outside sources of funding are always appreciated (see Mercury's initial funding of the Botan work), I think it's fine if every proposal here is seen as an implicit request for HF funding.

Eventually the new committee will be fully set up (hopefully soon!), but several current HF board members and members of the former TWG are already taking part in the discussion, so my impression is that the right eyes are on this. My request for patience is on the "formal decision" aspect, as the HF won't be able to give one until the new structure is in place. Hopefully that will be in the coming few weeks.

- horizontal scaling need clarified
- tuf and hackage-security
- budget clarification
@qnikst

qnikst commented May 31, 2026

Author

Hello, I've addressed some of the questions. Let me note them here:

@jappeace I've added a note that we plan to keep using hackage-security in the new implementation, unless that proves impossible. Any other improvement can be done in a separate proposal, or during normal development of hackage-server. Is that enough, or should we do a larger investigation of the potential problems and explain that in the migration plan?

I've also added a missing note about horizontal scaling: even if it's not needed at the moment, it provides functionality not available otherwise, such as zero-downtime rollouts (or host maintenance).

As many requested, I've updated the budget section to keep the packagedb rewrite as the MVP, but if we manage to update userdb as well, we want to do it. In any case, our plan is to reduce the load on hackage-server as fast as possible to finally get rid of the 502 errors that happen during interactions with it.

An additional benefit (not mentioned in the proposal) is that Modus Create, the parent company of Tweag, has quite a strong security department, so the most important parts will be independently reviewed. We can add that to the proposal as well.

@hasufell we'd like to get some support from the HF. If that's not possible but we have general approval of the proposal, we would try to find outside sources of funding, though this thread does not seem to be the proper place for that discussion. Initially we planned to fund the work ourselves with an iterative approach, but found that it's too big a chunk of work to fund internally.


To keep the discussion going, I am tagging @haskellfoundation/tech-proposals as per the process.

And I would appreciate it if further discussion followed the policy:

@tomjaguarpaw

tomjaguarpaw commented May 31, 2026

Contributor

[Comment no longer relevant]

@LaurentRDC

Contributor

we'd like to get some support from the HF,

Speaking as a member of the Haskell Foundation's Board, I can say that, in general, this proposal is definitely something that the Foundation would be willing to fund. Hackage is a critical part of the Haskell ecosystem, and deserves some love.

It is perfectly appropriate to explicitly request funding from the Foundation in the proposal under the Budget heading.

@Bodigrim

Bodigrim commented May 31, 2026

Collaborator

I am confused...what is your point?

  • If the point is 5000 is big, yes I concede that means I have a lot of reading yet to do.

I'm generally more on the "refactor not rewrite" side, but if +5000 lines (as compared to the existing 30000 in total) is what it takes just to replace a storage layer, then I can definitely see the appeal of a rewrite from scratch.

@Ericson2314

Contributor

OK Thanks @Bodigrim for clarifying.

Since then I have made first drafts of all the little commits de-event sourcing each feature (the Hackage term of art, as I am learning) / table. And then after that, I can delete the event sourcing infrastructure which is now dead code. That gets us to a current status of 103 files changed, 5492 insertions(+), 3319 deletions(-), +2200 not +5000.

@gbaz

gbaz commented Jun 1, 2026

Collaborator

I'm not so sure of the proper protocol here. It seems very strange to have effectively a counter-proposal being implemented and described in real-time in the discussion of another proposal. I tend to think that if this development continues it should be advocated and described elsewhere, then the appropriate decision making bodies can look at both thoughtfully and critically but meanwhile discussion can remain focused on these individually.

@Ericson2314

Contributor

That sounds fine with me. I didn't want to be a peanut gallery naysayer without any real evidence --- that would also be annoying. I feel I am now far enough along with the refactor to have provided the evidence, and I will make any further updates elsewhere.

@tomjaguarpaw

Contributor

I will make any further updates elsewhere

As someone who will be voting on this proposal, I would be interested to know how your efforts turn out.
