I've been wrestling with this intermittently over the last couple of weeks.
The problem is:
You create a sandbox
You edit a workflow in the sandbox
Someone else edits the same workflow on main
The two workflows have now diverged. You cannot simply take one version of the workflow and use it in production: you must merge the changes together.
Since the inception of sandboxes there's always been a rough plan to resolve this:
Use version hashes to track the edit history of a workflow so that we can recognise a conflict (works great!)
On conflict, you can either duplicate the workflows or use git branches to resolve
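A rough sketch of the conflict check, assuming each workflow keeps an ordered list of version hashes (oldest first). The `VersionHistory` type and `hasDiverged` function are hypothetical names for illustration, not the CLI's or Lightning's actual API:

```typescript
// Assumption: a workflow's history is a list of version hashes, oldest first.
type VersionHistory = string[];

// Two copies of a workflow have diverged when neither head appears in the
// other's history, i.e. both sides were edited after they last agreed.
function hasDiverged(source: VersionHistory, target: VersionHistory): boolean {
  const sourceHead = source[source.length - 1];
  const targetHead = target[target.length - 1];
  // With no history on one side there is nothing to compare against.
  if (sourceHead === undefined || targetHead === undefined) return false;
  return !source.includes(targetHead) && !target.includes(sourceHead);
}

const mainHistory: VersionHistory = ["a1", "b2", "c3"];
const sandboxHistory: VersionHistory = ["a1", "b2", "d4"];

console.log(hasDiverged(sandboxHistory, mainHistory)); // true
console.log(hasDiverged(["a1", "b2", "c3", "d4"], mainHistory)); // false
```

In the second call the sandbox history contains main's head, so the sandbox is simply ahead of main and can be merged without a conflict.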
I've been trying to create clear concise documentation on this and it's just hard.
Problems
There are some problems in the status quo:
Because of how merging works with the CLI, when merging staging into main I have to resolve the merge on the main branch, which means I have to force-push to main on the app. That feels icky to me. In git, when there's a conflict, I resolve it on the branch and then merge into main. It should be the same in the app: I resolve the conflict in the sandbox and can then merge without divergence warnings.
It's too hard to tell which steps have diverged. We track versions for workflows but not steps. Diffs are easy but divergence is hard.
Future Flow
Here's a vision of how I think conflict resolution should be working:
In the app, I want to merge staging into main
I have a conflict - divergence in the app, oh no!
I pull both projects locally
I checkout staging (because I want to pull main into it and "fast forward")
I run openfn merge main
The merge should IGNORE older changes on main, and generate conflict files on any divergence for newer changes
I arduously resolve all conflicts
My local workflow histories include any missing versions from main
I push to staging on the app (including an updated history!)
In app, I now merge staging into main
No divergence now, because main head is in the staging history
Gotchas abound here.
Required work
I might have released a simple conflict helping function by the time we start this epic. If not, we'll need to create one. See CLI: openfn reconcile #1181
When we merge right now, we basically dumbly overwrite all workflows in the target with those from the source. So if main is the source and staging is the target, the merge is just going to revert all changes on staging. What you actually want is to understand the direction of the merge: only merge workflows that are newer (that have version histories since the target), and never roll back. I suppose I should double check the code, but I'm sure we don't do this check today
When merging, we need to do some syncing of version histories. If we've merged state X into a workflow, then X must appear in that workflow's history. There was some doubt about whether we needed to do this - but now I see it
When pushing to the provisioner, we need to be able to send an updated version history. This is needed so that main's workflows sit on the version history of staging, and so the app doesn't report divergence. Otherwise, when resolving the merge and pushing, the app will still tell you that there's a divergence.
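The direction check and the history sync described in the list above could look roughly like this. Everything here is a sketch under assumptions: `WorkflowState`, `planWorkflowMerge` and `syncHistory` are made-up names, histories are assumed to be hash lists ordered oldest first, and the real merge code may be structured quite differently.

```typescript
// Hypothetical shapes, not the CLI's real types.
interface WorkflowState {
  history: string[]; // version hashes, oldest first
  spec: string; // stand-in for the workflow definition
}
type MergeAction = "apply" | "skip" | "conflict";

// Decide what to do with one workflow when merging source into target.
function planWorkflowMerge(source: WorkflowState, target: WorkflowState): MergeAction {
  const sourceHead = source.history[source.history.length - 1];
  const targetHead = target.history[target.history.length - 1];
  if (sourceHead === targetHead) return "skip"; // already identical
  if (source.history.includes(targetHead)) return "apply"; // source is newer
  if (target.history.includes(sourceHead)) return "skip"; // target is newer: never roll back
  return "conflict"; // both sides have moved on
}

// After merging source into target, append any source versions the target
// has not seen, so the source head becomes part of the target's history.
function syncHistory(target: string[], source: string[]): string[] {
  const seen = new Set(target);
  return [...target, ...source.filter((hash) => !seen.has(hash))];
}

const stagingWf: WorkflowState = { history: ["a1", "b2", "s3"], spec: "staging edit" };
const mainWf: WorkflowState = { history: ["a1", "b2", "m3"], spec: "main edit" };
console.log(planWorkflowMerge(mainWf, stagingWf)); // conflict

// Once the conflict is resolved locally, the new staging version ("r4")
// sits on top of a history that now includes main's head.
const resolvedWf: WorkflowState = {
  history: [...syncHistory(stagingWf.history, mainWf.history), "r4"],
  spec: "resolved",
};
console.log(planWorkflowMerge(resolvedWf, mainWf)); // apply
```

The last call is the "no divergence now" case: main's head is in the resolved staging history, so merging staging into main is a plain fast-forward.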
Additional: Saving History
The step "arduously resolve all conflicts" could be improved. The problem is that right now, the CLI can only report on what's different between workflows. It can't report on which steps/expressions/lines have diverged between projects. That's a critical difference.
In git, when you're merging two branches, there are three reference points:
The state of the branch now
The state of the branch when it was created
The state of the target
Git uses a three-way merge algorithm to work out which lines to keep on the original branch, which to apply on the new branch, and where both branches have made a change in the same place. It can do this because a full history is available at that point.
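To make the idea concrete, here is a minimal three-way merge applied per step rather than per line (git itself works on lines of text). The `Steps` shape, mapping a step id to its expression, and the `threeWayMergeSteps` function are assumptions for illustration only:

```typescript
// Hypothetical shape: step id -> expression source.
// A missing key means the step does not exist in that version.
type Steps = Record<string, string>;

interface StepMergeResult {
  merged: Steps;
  conflicts: string[]; // ids of steps changed differently on both sides
}

function threeWayMergeSteps(base: Steps, ours: Steps, theirs: Steps): StepMergeResult {
  const merged: Steps = {};
  const conflicts: string[] = [];
  const ids = new Set([...Object.keys(base), ...Object.keys(ours), ...Object.keys(theirs)]);

  ids.forEach((id) => {
    const b: string | undefined = base[id];
    const o: string | undefined = ours[id];
    const t: string | undefined = theirs[id];
    let winner: string | undefined;
    if (o === t) winner = o; // same on both sides (or deleted on both)
    else if (o === b) winner = t; // only theirs changed
    else if (t === b) winner = o; // only ours changed
    else {
      conflicts.push(id); // both changed differently: keep ours, flag for review
      winner = o;
    }
    if (winner !== undefined) merged[id] = winner;
  });

  return { merged, conflicts };
}

const baseSteps: Steps = { fetch: "get('/patients')", map: "fn(s => s)" };
const ourSteps: Steps = { fetch: "get('/patients?active=1')", map: "fn(s => s)" };
const theirSteps: Steps = { fetch: "get('/v2/patients')", map: "fn(clean)", upload: "post('/x')" };

const stepResult = threeWayMergeSteps(baseSteps, ourSteps, theirSteps);
console.log(stepResult.conflicts.join(",")); // fetch
console.log(stepResult.merged.map); // fn(clean)
```

Only `fetch` is reported as a conflict, because both sides changed it. The `map` change and the new `upload` step are taken from the other side automatically, which is exactly what a two-way diff cannot decide.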
In the CLI (and Lightning), we really only have two reference points: the state of the target and the state of the source.
So when there's a conflict between the branches, we can do a deep diff to tell what's changed, but we can't really tell where a conflict exists. A diff between the branches isn't actually helpful on its own: what you really need is to know the conflicts.
To be able to do anything like this, we need to save the whole workflow history: the spec for each hash.
If we had this, we could produce much smarter diffs and make conflict resolution easier.
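As a sketch of why saved specs matter: with a spec stored per hash, the missing third reference point can be looked up as the most recent version shared by both histories. `SpecStore`, `findCommonAncestor` and `loadBaseSpec` are hypothetical names, and specs are represented as plain strings to keep the example small:

```typescript
// Hypothetical store: version hash -> saved workflow spec (serialised as a string here).
type SpecStore = Record<string, string>;

// The latest hash present in both histories (each ordered oldest first),
// or undefined when the histories share nothing.
function findCommonAncestor(source: string[], target: string[]): string | undefined {
  const inTarget = new Set(target);
  for (let i = source.length - 1; i >= 0; i--) {
    if (inTarget.has(source[i])) return source[i];
  }
  return undefined;
}

// The spec both sides started from, if we still have it saved.
function loadBaseSpec(store: SpecStore, source: string[], target: string[]): string | undefined {
  const hash = findCommonAncestor(source, target);
  return hash === undefined ? undefined : store[hash];
}

const savedSpecs: SpecStore = {
  a1: "spec v1",
  b2: "spec v2",
  s3: "spec with staging edit",
  m3: "spec with main edit",
};

console.log(findCommonAncestor(["a1", "b2", "s3"], ["a1", "b2", "m3"])); // b2
console.log(loadBaseSpec(savedSpecs, ["a1", "b2", "s3"], ["a1", "b2", "m3"])); // spec v2
```

With the base spec in hand, the source and target can each be compared against it, instead of only against each other.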
Things to consider:
Should the history sync with the app? Or is it just local?
The app already saves snapshots - so the app could save a snapshot with a version hash, and now the app is a backup of a project's whole history
When should old versions be cleared? You should only need to save the versions which are actually referenced somewhere locally (or perhaps in the app). This should be calculated but is obviously harder in a distributed system (so maybe we just preserve the local versions)
A lot of workflows aren't too big - but some are massive because they embed data sets. So we likely need the capability to switch history tracking off
Could we utilise git internally to represent versions as commits? Git compresses really effectively and tracks patches, which is not something we want to implement ourselves
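On clearing old versions, a purely local cleanup might look like the following. It assumes the saved specs live in a hash-keyed object and that all locally known histories can be listed; `pruneSpecStore` is an invented name, and this deliberately ignores anything the app might still reference:

```typescript
// Keep only the saved specs whose hash is still referenced by some known history.
function pruneSpecStore(
  store: Record<string, string>,
  histories: string[][]
): Record<string, string> {
  const referenced = new Set<string>();
  histories.forEach((hashes) => hashes.forEach((hash) => referenced.add(hash)));

  const kept: Record<string, string> = {};
  Object.keys(store).forEach((hash) => {
    if (referenced.has(hash)) kept[hash] = store[hash];
  });
  return kept;
}

const localSpecs = { a1: "spec v1", b2: "spec v2", x9: "abandoned experiment" };
const prunedSpecs = pruneSpecStore(localSpecs, [["a1", "b2"], ["a1"]]);
console.log(Object.keys(prunedSpecs).join(",")); // a1,b2
```

The function returns a new object rather than mutating the store, so a caller can inspect what would be dropped before committing to it.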