norns converged: crone + matron --> single binary - bring-up & stabilization #1883

Merged
tehn merged 50 commits into monome:norns-converged-2026 from colinmcardell:norns-converged-2026
May 20, 2026

Conversation

@colinmcardell

colinmcardell commented Feb 27, 2026

Collaborator

what

Building on the work of @catfact and @ngwese on the norns-converged branch, which @Dewb then rebased forward onto the latest main, this PR works through hardware bring-up and stabilization of that branch via a series of fixes and tidying commits.

This PR brings the work over to a new branch, monome/norns:norns-converged-2026, in an effort to land it: combining the matron and crone processes into a single process, and using shared memory rather than OSC for communication between the control and audio threads.

the bring-up has been iterative, but I find it to be quite stable and worth sharing at this time.

there is more work to do, which I'm triaging into a draft task list. at the moment I'm finishing up work on maiden-repl, which I will PR to this branch soon, once I've tidied it up and tested it on a pre-converged system to ensure backwards compatibility. maiden-repl is broken in this PR.

notes

  • Added nng as a git submodule under third-party/nng, and updated the cmake build to prefer a system-installed nng and fall back to the submodule. As a follow-up it makes sense to remove this submodule in favor of the system-installed library, following the trend in the build wscript. Also, the submodule is version 2.0 of nng, while the system package installs some version of 1.x, as does macOS homebrew. This is one more reason not to use the submodule as a final solution, and a note to align on a specific version of nng (which is relevant for maiden-repl compilation on macOS/Linux).
  • Fixed std::bad_function_call crash on boot by initializing the VU and tape poll callbacks in MixerClient, so they are no longer null on first invocation.
  • Expanded VU meters from 4 → 12 channels (in/out, engine, monitor, softcut, tape) across the crone → matron → Lua bindings.
  • Added visual rendering for the 8 new channels in the mix menu.
  • Improved crone::Poll thread lifecycle safety.
  • Added (and then later removed) a ConcurrentQueueWorker (io_queue) for tape and softcut operations to prevent UI blocking during heavy disk I/O.
  • Removed the previously added io_queue from oracle, since it was redundant for softcut. Tape operations were still synchronous and blocking, so I updated Tape.h to defer sf_open to the tape's internal disk thread, making it safe to call from oracle.
  • Fixed a race condition in the main --> sidecar startup by adding a sync pipe that blocks main until the sidecar is ready, and introduced graceful cleanup of the sidecar process on exit.
  • Moved the concurrentqueue and readerwriterqueue submodules from crone/lib/ to third-party/.
  • Replaced the custom POSIX queue in sidecar with BlockingReaderWriterQueue.
  • Got all the c/cpp unit tests running again.
  • Switched Commands.h to use ConcurrentQueue (previously ReaderWriterQueue) because it seems like commands need support for multiple producers (Lua, and external OSC messages).
  • Added a basic unit test suite for Commands.h.
  • Updated clang-format paths to include additional converged files and .cc files (matron).
  • Ran clang-format.
  • Resolved a self-kill issue in system --> restart, where restart would stop the single-binary norns-main.service, killing the process before it could start itself up again.
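
The sync-pipe idea from the sidecar startup note above can be sketched roughly as follows. This is a minimal POSIX illustration under stated assumptions, not the code in this PR: `spawn_and_wait_ready` and the `child_setup` callback are hypothetical names, and a real sidecar would go on to its service loop instead of exiting.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the PR): fork a child, run `child_setup` in
// it, and block the parent until the child reports readiness over a pipe.
// Returns the child's pid once it is ready, or -1 if it never signalled.
pid_t spawn_and_wait_ready(const std::function<bool()>& child_setup) {
    assert(static_cast<bool>(child_setup));

    int fds[2];  // fds[0] = read end (parent), fds[1] = write end (child)
    if (pipe(fds) != 0) return -1;

    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        // Child: finish setup (e.g. bind the IPC socket), then send one byte.
        close(fds[0]);
        const char ready = 1;
        const bool ok = child_setup() && write(fds[1], &ready, 1) == 1;
        close(fds[1]);
        // A real sidecar would enter its service loop here; the sketch exits.
        _exit(ok ? 0 : 1);
    }

    // Parent: read() blocks until the child writes its byte, or returns 0
    // (EOF) if the child exits without signalling.
    close(fds[1]);
    char byte = 0;
    ssize_t n;
    do {
        n = read(fds[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == 1) return pid;
    waitpid(pid, nullptr, 0);  // reap the child that failed to start
    return -1;
}
```

Because the parent only proceeds after the byte arrives, it cannot try to connect to the sidecar before the sidecar has finished its own setup. An EOF on the pipe doubles as a failure signal, so the parent does not hang if the child dies early.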

In order to consistently run the system on device, I've removed norns-matron.service and norns-crone.service, and added norns-main.service. I've also updated the norns.target with the appropriate changes so that the converged binary is properly started on boot.

/etc/systemd/system/norns-main.service

[Unit]
After=norns-jack.service norns-sclang.service
Requires=norns-jack.service

[Service]
Type=simple
User=we
Group=we
LimitRTPRIO=95
LimitMEMLOCK=infinity
Restart=always
RestartSec=500ms
ExecStart=/home/we/norns/build/ws-wrapper/ws-wrapper ws://*:5555 /home/we/norns/build/norns/norns

[Install]
WantedBy=norns.target

/etc/systemd/system/norns-sclang.service

[Unit]
After=norns-jack.service
Requires=norns-jack.service

[Service]
Type=simple
User=we
Group=we
LimitRTPRIO=95
LimitMEMLOCK=infinity
ExecStart=/home/we/norns/build/ws-wrapper/ws-wrapper ws://*:5556 /usr/local/bin/sclang -i maiden

[Install]
WantedBy=norns.target

/etc/systemd/system/norns.target

[Unit]
Description=norns

Requires=sockets.target
Requires=sound.target

Requires=norns-jack.service
Requires=norns-sclang.service
Requires=norns-main.service
Requires=norns-maiden.service

AllowIsolate=yes

[Install]
WantedBy=multi-user.target

Test guide: https://gist.github.com/colinmcardell/3b8982327c009c2c74f7e2a0dff22cab

@colinmcardell

Collaborator Author

I will look into these merge conflicts. Likely an artifact of the rebase.

@Dewb

Dewb commented Feb 28, 2026

Member

awesome!

to fix the conflict, the branch just needs to be rebased against main again (there was a small change in main to device_midi.c this week.)

@colinmcardell

Collaborator Author

Ah! Way less fiddly than I suspected. Perfect. Will do. Thank you!

Dewb and others added 27 commits March 1, 2026 01:15
fix compilation (sidecar)

fix many compiler errors (safe-c, etc), some linker problem remains

add the RELEASE cxx flags to norns/wscript, retire crone/wscript
not sure how often this script is used but figured it should probably default to release builds
assuming the contents of this file were moved to jack_client.h at some point
- fix/rename flag to enable gprof profiling
- add flag to enable debugging/syms
- fix duplicate/conflicting c++ standard flags
- add help strings to some configure flags
performs a single fork for the sidecar process along with:
- set the child process name for visibility in ps
- set the thread name in child process for visibility in gdb
- send child a SIGHUP if parent dies
- disconnect child from parent stdin

the primary motivation for this change is simplicity and
to allow easier debugging. the parent process remains the
one running crone/matron.
- check for all dependencies in wscript
- fix SEGV when ^D is entered
- collapse duplicate rx loop implementations
- increase const'ness of various functions
change to libnng-dev
provides the ability to alter audio routing for jack
clients either by specific port(s) or by blocks of ports
given a routing table. routing tables for system, softcut,
and supercollider simplify connection management for those
components.

modifications to audio routing are tracked and the default
routing for all components is restored on script cleanup.

the `audio_post_restore_default_routing` hook allows mods
to tweak the routing in a consistent fashion
- removes explicit handling of reverse io
- adds method to temporarily disable change tracking
- capture buffer termination (issue monome#1569)
- moves async command capture logic from lua to c
- ensures sidecar requests from _norns.execute and _norns.system_cmd
  are not interleaved when called from different threads
- adds progressive capture buffer allocation in server up to
  maximum of 10MB
giving explicitly created threads names makes it
substantially easier to narrow in on a particular
thread in gdb
- init vuPoll and tapePoll callbacks in MixerClient to prevent std::bad_function_call on boot.
- expand VU meters from 4 to 12 channels across the crone -> matron -> lua bindings.
- added visual rendering for the 8 new channels in the mix menu.
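
A `std::bad_function_call` is thrown when an empty `std::function` is invoked, so one way to avoid it is to make sure the callback member is never empty. The sketch below illustrates that idea only; `PollingClient`, `setVuCallback`, and `reportVu` are hypothetical names and do not reflect the actual MixerClient interface.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a client that reports poll data via callbacks.
class PollingClient {
public:
    using VuCallback = std::function<void(const float* levels, int count)>;

    // Install a callback; an empty one is replaced by the no-op default.
    void setVuCallback(VuCallback cb) {
        if (cb) {
            vuCallback_ = std::move(cb);
        } else {
            vuCallback_ = &PollingClient::noop;
        }
    }

    // Called whenever new meter levels arrive.
    void reportVu(const float* levels, int count) {
        assert(count >= 0);
        vuCallback_(levels, count);
    }

private:
    static void noop(const float*, int) {}

    // Never empty, so the first poll cannot throw std::bad_function_call.
    VuCallback vuCallback_ = &PollingClient::noop;
};
```

With a no-op default, a poll that fires before any binding has registered its handler is simply ignored instead of crashing on boot.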
* When rotated 90/270 degrees LEDs aren't set appropriately

When a `Grid.rotate()` is called with 1 or 3, the correct quadrant isn't
chosen.

The issue lies in the function `dev_monome_quad_idx`. This doesn't take
into account rotation state, thus when called with something like:

```c
dev_monome_quad_idx(md->m, 2, 12)
```

It returns quadrant 2. I think we technically want quadrant 1.

We can read the rotation state from the monome device, and use that to
correctly calculate the quadrant with a slightly different formula that
works for both 128 and 256 grids.

* Fix missing variable

* We need to pass in the md reference

* Possible fix for rows/cols not being reset after a rotation

* Don't query grid for rotation status

* Use Grid.update_devices

* Respond to tehn's comments

* Linter
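
One formula with the property described in the grid rotation commit above (quadrant 1 rather than 2 for a rotated 128 grid, while still covering 256 grids) is sketched below. It is an assumption for illustration and not necessarily the formula used in the actual fix; `quad_index` and `logical_cols` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical sketch: choose the 8x8 quadrant for an LED at logical (x, y),
// given the grid's logical column count after rotation is applied.
//   128 grid, rotation 0/2: 16 cols x 8 rows  -> quadrants 0, 1 side by side
//   128 grid, rotation 1/3:  8 cols x 16 rows -> quadrants 0, 1 stacked
//   256 grid:               16 cols x 16 rows -> quadrants 0..3, row-major
constexpr int kQuadSize = 8;

int quad_index(int x, int y, int logical_cols) {
    assert(x >= 0 && y >= 0 && logical_cols >= kQuadSize);
    const int quads_per_row = logical_cols / kQuadSize;
    return (y / kQuadSize) * quads_per_row + (x / kQuadSize);
}
```

For the call quoted in the commit message, `quad_index(2, 12, 8)` models a 128 grid rotated by 90 or 270 degrees, while `quad_index(2, 12, 16)` models the rotation-unaware calculation.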
@colinmcardell

Collaborator Author

@Dewb I put together a step-by-step guide that should get all the things going. Let me know if anything is wonky. Happy to help.

https://gist.github.com/colinmcardell/3b8982327c009c2c74f7e2a0dff22cab

@tehn

tehn commented Mar 27, 2026

Member

this is fantastic, thank you! working through testing.

@tehn

tehn commented Mar 27, 2026

Member

this is working wonderfully in all of my tests so far. maiden, maiden-repl, all great.

i don't expect any issues with "hardware variants" as none of this really touched hardware-specific bits, but i'll test on a shield.

similarly, i think mods should 'just work' but i'll try a few also (i tend not to use them in my own process)

thank you again, this is very encouraging!

@ngwese

ngwese commented Mar 27, 2026 via email

Member

@colinmcardell

Collaborator Author

I found a "bug" in the dev workflow where calling kill on the main service causes connection to the sidecar to fail, requiring reboot/noodling. I have a patch. Will push this weekend.

@tehn

tehn commented Mar 28, 2026

Member

> one thing which probably needs to be adjusted is the names of the systemd units in maiden. …it uses those to implement ;restart in the web repls. i can help sort that out if it needs to be updated. -greg

this makes sense, i'll try your new maiden dev setup and see how i do

@Dewb

Dewb commented Mar 28, 2026

Member

So far so good for me as well. Are there any particular areas of risk/concern that should get extra attention? Tasks that involve the sidecar, messages that were formerly matron-crone but are now in-process, …?

@colinmcardell

Collaborator Author

> So far so good for me as well. Are there any particular areas of risk/concern that should get extra attention? Tasks that involve the sidecar, messages that were formerly matron-crone but are now in-process, …?

That's great! I've been focusing on reliable restarts and reconnection and general stability, trying to ensure audio/UI functions during extended sessions.

The command queue is an area that might be a potential issue if a script fires off bursts of commands to Softcut. It would take a burst of more than 2048 commands within a single process block, which would be remarkable: 128 samples at 48kHz is ~2.67ms. 2048 commands is a good amount, but if stress testing shows that more is needed in practice, we could increase it at little memory cost. The trade-off there is the latency of draining the queue, and potentially dropouts.
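
The block-period arithmetic and the "bounded queue drained once per block" behaviour can be sketched as below. This is a simplified model, not Commands.h: `BoundedCommandQueue`, `Command`, and `block_period_ms` are hypothetical names, and a mutex-guarded `std::deque` stands in for the lock-free ConcurrentQueue, which is what a real audio thread would want.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Duration of one audio block in milliseconds.
constexpr double block_period_ms(int frames, int sample_rate) {
    return 1000.0 * frames / sample_rate;
}

struct Command {
    int id;
    float value;
};

// Stand-in for a bounded multi-producer command queue. The mutex keeps the
// sketch short; it is not real-time safe and is only here for illustration.
class BoundedCommandQueue {
public:
    explicit BoundedCommandQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producers (any thread): returns false when the queue is full.
    bool try_push(const Command& c) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= capacity_) return false;
        queue_.push_back(c);
        return true;
    }

    // Consumer (once per block): handle everything queued so far and
    // return how many commands were handled.
    template <typename Handler>
    std::size_t drain(Handler&& handle) {
        std::deque<Command> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(queue_);
        }
        for (const Command& c : pending) handle(c);
        return pending.size();
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::deque<Command> queue_;
};
```

At 128 frames and 48 kHz there are 375 blocks per second, so a script would need to sustain more than 2048 × 375 = 768,000 commands per second to keep the queue full. A larger capacity mostly raises the worst-case time spent in a single drain.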

@tehn

tehn commented Mar 29, 2026

Member

maiden ;restart likely fix here: monome/maiden#228

@colinmcardell

Collaborator Author

Looking into what specifically is doing a shell out via sidecar.

  • Networking
  • System Info
  • Service restart, shutdown, reset
  • File operations

A few things worth looking into for sidecar that I can think of are large outputs for things like find, and error propagation. I think most things are working as expected.
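
For the large-output case, the commit notes mention progressive capture buffer allocation up to a maximum of 10MB. A rough sketch of that kind of capped capture is below; it is an illustration under assumptions, not the sidecar code, and `Capture` and `capture_stream` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

struct Capture {
    std::string output;      // at most max_bytes of the stream
    bool truncated = false;  // true if the stream produced more than that
};

// Hypothetical helper: read `stream` to EOF in fixed-size chunks, letting the
// buffer grow as data arrives but never storing more than `max_bytes`.
Capture capture_stream(std::FILE* stream, std::size_t max_bytes) {
    assert(stream != nullptr);
    Capture result;
    char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, stream)) > 0) {
        const std::size_t room = max_bytes - result.output.size();
        const std::size_t keep = n < room ? n : room;
        result.output.append(chunk, keep);
        // Keep reading past the cap so the writer is not left blocked.
        if (keep < n) result.truncated = true;
    }
    return result;
}
```

A caller could pass a stream from POSIX `popen` with a cap of `10u * 1024 * 1024`, then check `truncated` and the status returned by `pclose` to report both oversized output and command failure back to the script.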

I've been putting together little test scripts here and there, but it makes me think how nice it would be to have something like a diagnostic runner that checks all the things. :) Maybe I can start pulling together all of the little tests into something more comprehensive to be added as a SYSTEM > DIAGNOSTICS at some point.

@Dewb

Dewb commented Mar 30, 2026

Member

> I've been putting together little test scripts here and there, but it makes me think how nice it would be to have something like a diagnostic runner that checks all the things. :)

Could these become unit tests?

@colinmcardell

Collaborator Author

> Could these become unit tests?

Definitely. Something between unit and integration testing.

I will wait until the dust settles and see what I can pull together.

@tehn

tehn commented Apr 4, 2026

Member

Do we feel like this is stable enough to merit posting to the BETA upgrade channel? (Basically people could use it by doing SYSTEM > UPDATE while holding K1 and selecting BETA).

Reverting wouldn't be trivial (I'd need to make a shell script if someone wanted to revert, though I'd likely just ask them to re-image instead).

But I haven't run into any issues myself. This would allow us some time with a larger userbase and feedback--- it's no problem to incrementally edit/fix the BETA release.

Edit: if so, I can put one together!

@Dewb

Dewb commented Apr 5, 2026

Member

I've noticed some weird behavior around recovery from supercollider fails and restarts -- I'm not sure if the same things wouldn't have happened in the main branch, but given that maiden ;restart needed updating, maybe there is some other kill-code that needs to be aligned with the new service names?

@catfact

catfact commented Apr 5, 2026

Collaborator

i'm not sure what weirdness you've noticed, but it's not unexpected for things to be out of whack if sclang/scsynth is restarted w/o also restarting matron

@ngwese

ngwese commented Apr 12, 2026 via email

Member

@tehn

tehn commented Apr 12, 2026

Member

@ngwese looks like an email resend? but in case you missed it, i believe i fixed this concern: #1883 (comment)

@colinmcardell

Collaborator Author

What do we think about merging this PR in as is (colinmcardell:norns-converged-2026 -> monome:norns-converged-2026)? It's not going into monome:main at this time, and it would be helpful to be able to flatten down some of the other work-in-progress I have, such as unit tests on the converged binary changes, without overloading this PR.

Additionally, I can share a couple of options I have been scratching out regarding the update.sh changes needed to get this into a releasable form. Maybe that can be brought over to a thread within the discussion #1879... and then once update script implementation details are hashed out, the update.sh changes can be a follow-up PR to monome:norns-converged-2026.

@Dewb

Dewb commented May 11, 2026

Member

I’ve been using this branch for a month or two now without encountering any novel issues. I think we should be talking about what the confidence line ought to be for merging it into main for a beta.

@Dewb

Dewb commented May 11, 2026

Member

Ah, one issue I’m having is that monome/maiden#228 did not fix ;restart, but that could be user error in integrating the updated maiden.

@colinmcardell

colinmcardell commented May 11, 2026

Collaborator Author

> Ah, one issue I’m having is that monome/maiden#228 did not fix ;restart, but that could be user error in integrating the updated maiden.

Oh good call, I can try to repro.

@tehn

tehn commented May 11, 2026

Member

i think it'd be great to merge/flatten and get this ready for a beta!

i believe i already made the appropriate changes somewhere for update.sh but they may have gotten lost in the shuffle. it's easy for me to recreate, however.

can also test out the maiden service fix and confirm in the next couple days.

thanks again for keeping this update alive.

@colinmcardell

Collaborator Author

> Ah, one issue I’m having is that monome/maiden#228 did not fix ;restart, but that could be user error in integrating the updated maiden.

> Oh good call, I can try to repro.

Interesting. Ok. Got maiden built and reproduced the issue. Two findings.

  1. the units.json change in monome/maiden#228 renames the JSON key from matron to norns. The frontend looks up units by REPL component name (still matron, hardcoded in web/src/constants.js), so ;restart resolves to undefined and never reaches the backend. Simplest fix would be to keep the key as matron while pointing at norns-main.service.
  2. Separate bug... ;restart leaks a websocket connection on every invocation. The explicit reconnect creates a new socket without closing the previous one. Each surviving connection re-receives the stdout, so REPL output appears duplicated (2x after one extra restart, 3x after two, and so on).
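
Both findings can be modelled in a few lines. The real code is maiden's JavaScript frontend and its units.json; the C++ below is only a toy model of the behaviour described above, and `UnitTable`, `unit_for`, and `ReplModel` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Finding 1: units are looked up by REPL component name.
using UnitTable = std::map<std::string, std::string>;

// Returns the unit for `component`, or nullptr when the key is missing
// (the analogue of the frontend lookup resolving to undefined).
const std::string* unit_for(const UnitTable& units, const std::string& component) {
    const auto it = units.find(component);
    return it == units.end() ? nullptr : &it->second;
}

// Finding 2: every open socket delivers each stdout line once, so leaked
// sockets show up as duplicated REPL output.
class ReplModel {
public:
    // Buggy reconnect: opens a new socket and leaves the old one open.
    void reconnect_leaky() { ++open_sockets_; }

    // Fixed reconnect: closes the previous socket before opening a new one.
    void reconnect_clean() { open_sockets_ = 1; }

    int copies_per_line() const {
        assert(open_sockets_ >= 1);
        return open_sockets_;
    }

private:
    int open_sockets_ = 1;
};
```

In this model, a table keyed by `norns` yields nothing for the component name `matron`, while keeping the `matron` key and pointing it at `norns-main.service` resolves as expected. Each leaky reconnect adds one more copy of every output line.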

I can open a PR with fixes for each of these issues tomorrow.

@tehn

tehn commented May 18, 2026

Member

Thanks tons, I'm doing a final round of testing and looking forward to getting this finished up

@tehn tehn merged commit 230e83c into monome:norns-converged-2026 May 20, 2026
@tehn

tehn commented May 20, 2026

Member

pushed a change to update.sh on the branch

df737a3

can prepare a beta release now. last pending issue is the maiden repl rename: monome/maiden#232, which is testing well for me, but i'd prefer someone glance at it before i post a release

tehn added a commit that referenced this pull request May 27, 2026
* norns converged: crone + matron --> single binary - bring-up & stabilization (#1883)

* Add nng to dev/ci container definition to support work on converged branch

* single-process rewrite (squashed)

fix compilation (sidecar)

fix many compiler errors (safe-c, etc), some linker problem remains

add the RELEASE cxx flags to norns/wscript, retire crone/wscript

* fix jack_client for c++

* change CRLF endings to LF in various source files

* matron: embed lua-cjson module as `json` global

* ensure edge.sh does release builds (#1552)

not sure how often this script is used but figured it should probably default to release builds

* remove empty jack_cpu.h file (#1553)

assuming the contents of this file were moved to jack_client.h at some point

* various build improvements

- fix/rename flag to enable gprof profiling
- add flag to enable debugging/syms
- fix duplicate/conflicting c++ standard flags
- add help strings to some configure flags

* norns: only fork sidecar

performs a single fork for the sidecar process along with:
- set the child process name for visibility in ps
- set the thread name in child process for visibility in gdb
- send child a SIGHUP if parent dies
- disconnect child from parent stdin

the primary motivation for this change is simplicity and
to allow easier debugging. the parent process remains the
one running crone/matron.

* norns: move sidecar to nng

* ws-wrapper: replace nanomsg with nng

* maiden-repl: replace nanomsg with nng

- check for all dependencies in wscript
- fix SEGV when ^D is entered
- collapse duplicate rx loop implementations
- increase const'ness of various functions

* Update readme-setup.md

change to libnng-dev

* audio: connect, disconnect, and inspect audio routing

provides the ability to alter audio routing for jack
clients either by specific port(s) or by blocks of ports
given a routing table. routing tables for system, softcut,
and supercollider simplify connection management for those
components.

modifications to audio routing are tracked and the default
routing for all components is restored on script cleanup.

the `audio_post_restore_default_routing` hook allows mods
to tweak the routing in a consistent fashion

* audio: simplify handling of system routing

- removes explicit handling of reverse io
- adds method to temporarily disable change tracking

* sidecar: serialize command invocation requests

- capture buffer termination (issue #1569)
- moves async command capture logic from lua to c
- ensures sidecar requests from _norns.execute and _norns.system_cmd
  are not interleaved when called from different threads
- adds progressive capture buffer allocation in server up to
  maximum of 10MB

* matron: set thread names for easier debugging

giving explicitly created threads names makes it
substantially easier to narrow in on a particular
thread in gdb

* Run clang-format on rebased converged branch

* Add libatomic dependency for nng

* fix: add converged tape and VU bindings and expanded meters

- init vuPoll and tapePoll callbacks in MixerClient to prevent std::bad_function_call on boot.
- expand VU meters from 4 to 12 channels across the crone -> matron -> lua bindings.
- added visual rendering for the 8 new channels in the mix menu.

* refactor: improve crone `Poll` thread lifecycle safety

* fix: Initialize mix menu engine, monitor, cut, and tape parameters to prevent UI rendering crashes.

* fix: Add tape pause, resume, and loop functions

* fix: sidecar - resolves IPC startup race with a sync pipe and implement graceful cleanup

* Adds nng git submodule

* Update test wscript includes and .cc source mapping.

* test: remove unnecessary c-linkage from c++ matron tests

* test: migrate clock test helpers from c to c++

* matron: update atomic types in internal clock

* build: update cmake includes and nng library fallback

* Adds command queue concurrency support and increased capacity + unit tests

* feat: oracle asynchronous i/o for tape and softcut operations to prevent UI blocking for heavy disk i/o

* tidy: moves the git submodules for concurrentqueue and readerwriterqueue from the ./crone/lib path to ./third-party

* refactor: replace custom queue in sidecar with BlockingReaderWriterQueue

* update: adding additional paths to clang-format.sh

* fix: apply clang-format

* fix: wscript include of nng

* refactor: update Tape to defer `sf_open` to the tapes internal disk thread, removes previously introduced `io_thread` in oracle.

* fix: resolves issue with self-kill in system restart

* fix: update no longer calls stop on old norns-crone.service

* maiden-repl: wip - removes nanomsg in favor of nng

maiden-repl application logic still needs nng registration and dialer setup

* refactor: replace legacy internal OSC comms with direct C calls and cleanup unused handlers

* build: use system libnng-dev and remove third-party/nng submodule

* fix: resolve boot instability from JACK auto-start conflict and cleans up debug logs

* refactor(maiden-repl): force IPv4, thread-safe rendering, input echo, resize support, rename crone→sc

* fix: use systemd Restart=always and self-terminate for reliable restart

* fix(ws-wrapper): add retry loop for WebSocket listener binding

adds a retry mechanism to gracefully handle momentarily held ports during restarts, preventing crash loops.

* Fix `Grid.rotation` bug seen in #1888 (#1890)

* When rotated 90/270 degrees LEDs aren't set appropriately

When a `Grid.rotate()` is called with 1 or 3, the correct quadrant isn't
chosen.

The issue lies in the function `dev_monome_quad_idx`. This doesn't take
into account rotation state, thus when called with something like:

```c
dev_monome_quad_idx(md->m, 2, 12)
```

It returns quadrant 2. I think we technically want quadrant 1.

We can read the rotation state from the monome device, and use that to
correctly calculate the quadrant with a slightly different formula that
works for both 128 and 256 grids.

* Fix missing variable

* We need to pass in the md reference

* Possible fix for rows/cols not being reset after a rotation

* Don't query grid for rotation status

* Use Grid.update_devices

* Respond to tehn's comments

* Linter

* fix(sidecar): unlink stale IPC socket before bind

---------

Co-authored-by: Michael Dewberry <712405+Dewb@users.noreply.github.com>
Co-authored-by: emb <emb@catfact.net>
Co-authored-by: catfact <catfact@users.noreply.github.com>
Co-authored-by: Greg Wuller <greg@afofo.com>
Co-authored-by: brian crabtree <tehn@monome.org>
Co-authored-by: Levi Cole <lckennedy@gmail.com>

* update.sh: converged services

* changelog

* update.sh: source, nng

* version.txt

* update.sh: fix install

* fix linter check

---------

Co-authored-by: Colin McArdell <colin@colinmcardell.com>
Co-authored-by: Michael Dewberry <712405+Dewb@users.noreply.github.com>
Co-authored-by: emb <emb@catfact.net>
Co-authored-by: catfact <catfact@users.noreply.github.com>
Co-authored-by: Greg Wuller <greg@afofo.com>
Co-authored-by: Levi Cole <lckennedy@gmail.com>