Integrate Zenoh #2362

Open
paul-nechifor wants to merge 2 commits into main from paul/feat-integrate-zenoh

Conversation

@paul-nechifor

@paul-nechifor paul-nechifor commented Jun 5, 2026

Contributor

Problem

We need to support Zenoh as well.

Closes DIM-955

Solution

How to Test

Run a blueprint with Zenoh communication:

uv run dimos --transport=zenoh --simulation run unitree-go2-agentic

Start humancli, also with Zenoh:

uv run humancli --transport=zenoh

Contributor License Agreement

  • I have read and approved the CLA.

@paul-nechifor paul-nechifor marked this pull request as draft June 5, 2026 02:33
@codecov

codecov Bot commented Jun 5, 2026


Codecov Report

❌ Patch coverage is 90.18182% with 108 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dimos/robot/cli/topic.py | 30.00% | 21 Missing ⚠️ |
| dimos/core/transport.py | 70.49% | 11 Missing and 7 partials ⚠️ |
| dimos/protocol/pubsub/impl/zenohpubsub.py | 90.78% | 13 Missing and 1 partial ⚠️ |
| dimos/visualization/rerun/bridge.py | 62.50% | 5 Missing and 4 partials ⚠️ |
| dimos/perception/detection/module3D.py | 12.50% | 7 Missing ⚠️ |
| dimos/protocol/pubsub/benchmark/testdata.py | 46.15% | 7 Missing ⚠️ |
| dimos/perception/detection/module2D.py | 0.00% | 5 Missing ⚠️ |
| dimos/protocol/service/zenohservice.py | 92.72% | 2 Missing and 2 partials ⚠️ |
| dimos/hardware/sensors/camera/realsense/camera.py | 50.00% | 3 Missing ⚠️ |
| dimos/agents/web_human_input.py | 60.00% | 2 Missing ⚠️ |

... and 11 more
@@            Coverage Diff             @@
##             main    #2362      +/-   ##
==========================================
+ Coverage   69.61%   70.74%   +1.13%     
==========================================
  Files         878      876       -2     
  Lines       79326    78448     -878     
  Branches     7126     6968     -158     
==========================================
+ Hits        55220    55499     +279     
+ Misses      22301    21145    -1156     
+ Partials     1805     1804       -1     
| Flag | Coverage Δ |
|---|---|
| OS-ubuntu-24.04-arm | 63.41% <89.27%> (+0.41%) ⬆️ |
| OS-ubuntu-latest | 66.19% <89.27%> (+0.36%) ⬆️ |
| Py-3.10 | 66.19% <89.27%> (+0.37%) ⬆️ |
| Py-3.11 | 66.19% <89.27%> (+0.37%) ⬆️ |
| Py-3.12 | ? |
| Py-3.13 | 66.19% <89.27%> (+0.36%) ⬆️ |
| Py-3.14 | 63.41% <89.27%> (-2.43%) ⬇️ |
| Py-3.14t | ? |
| SelfHosted-macOS | ? |


| Files with missing lines | Coverage Δ |
|---|---|
| dimos/agents/conftest.py | 88.00% <100.00%> (ø) |
| dimos/agents/mcp/conftest.py | 88.00% <100.00%> (ø) |
| dimos/agents/mcp/mcp_server.py | 91.12% <100.00%> (ø) |
| dimos/agents/mcp/test_tool_stream.py | 94.03% <100.00%> (ø) |
| dimos/agents/mcp/tool_stream.py | 92.23% <100.00%> (+0.07%) ⬆️ |
| dimos/core/coordination/coordinator_rpc.py | 88.00% <100.00%> (+0.24%) ⬆️ |
| dimos/core/coordination/module_coordinator.py | 83.85% <100.00%> (+0.39%) ⬆️ |
| dimos/core/coordination/test_module_reloading.py | 100.00% <ø> (ø) |
| dimos/core/global_config.py | 83.95% <100.00%> (+2.52%) ⬆️ |
| dimos/core/module.py | 77.61% <100.00%> (ø) |

... and 37 more

... and 37 files with indirect coverage changes


@paul-nechifor paul-nechifor force-pushed the paul/feat-integrate-zenoh branch from b66e833 to f8d2d42 Compare June 5, 2026 02:34
@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Contributor

Greptile Summary

This PR integrates Zenoh as an alternative pub/sub transport alongside the existing LCM backend. A GlobalConfig.transport field (lcm | zenoh, defaulting to zenoh on macOS) drives the selection, and a new transport_factory module wires up ZenohTransport/pZenohTransport in place of their LCM counterparts throughout the system.

  • New Zenoh stack: ZenohSessionPool for process-scoped session reuse, ZenohPubSubBase with per-key-expr QoS, ZenohRPC (pub/sub RPC), ZenohTF (transform frames), and a ZenohService base that threads the shared session through all sub-components.
  • Coercion layer (_coerce_transport_to_backend): blueprint-pinned LCM transports are transparently rebuilt as Zenoh transports (and vice-versa) when the global switch differs, leaving deliberate non-default choices (JPEG, SHM, ROS, DDS) untouched.
  • Bridge update: _resolve_pubsubs in RerunBridgeModule selects the correct backend at runtime, treating the historical [LCM()] field value as the legacy default rather than an explicit override.
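The "process-scoped session reuse" idea in the first bullet can be illustrated with a small reference-counted pool. This is only a sketch under stated assumptions: `SessionPool`, its `open_session` callable, and the `acquire`/`release` signatures are hypothetical names chosen for illustration, and no real Zenoh session is opened. The PR's actual `ZenohSessionPool` may work differently.

```python
import threading
from typing import Any, Callable, Dict, Hashable, Tuple


class SessionPool:
    """Hypothetical refcounted pool: one shared session per config key."""

    def __init__(self, open_session: Callable[[Hashable], Any]) -> None:
        # open_session is an assumed factory; a real pool would open a
        # transport session here instead.
        self._open_session = open_session
        self._lock = threading.Lock()
        self._sessions: Dict[Hashable, Tuple[Any, int]] = {}

    def acquire(self, key: Hashable) -> Any:
        """Return the session for key, opening it on first use."""
        with self._lock:
            if key in self._sessions:
                session, refs = self._sessions[key]
                self._sessions[key] = (session, refs + 1)
                return session
            session = self._open_session(key)
            self._sessions[key] = (session, 1)
            return session

    def release(self, key: Hashable) -> bool:
        """Drop one reference; return True if the session was closed."""
        with self._lock:
            session, refs = self._sessions[key]
            if refs > 1:
                self._sessions[key] = (session, refs - 1)
                return False
            del self._sessions[key]
        # Close outside the lock so a slow close does not block acquirers.
        close = getattr(session, "close", None)
        if callable(close):
            close()
        return True
```

Two components that acquire with the same key share one session, and the session is only closed when the last holder releases it.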

Confidence Score: 4/5

Safe to merge with one known defect: subscriptions silently stop working after any stop/restart cycle on a Zenoh transport.

ZenohPubSubBase._stopped is set to True in stop() but is never reset when start() is re-called. Any code path that stops a ZenohTransport/pZenohTransport and then calls subscribe() again — including the auto-start guard in broadcast(), which re-enters start() after a stop — will silently receive a no-op unsubscribe function instead of a live subscription. No error is logged, and test_stop_and_restart only checks the outer _started flag, not subscription delivery after the cycle.

dimos/protocol/pubsub/impl/zenohpubsub.py (the stop()/start() lifecycle and the _stopped flag) and dimos/core/test_zenoh_transport.py (the test_stop_and_restart test needs a round-trip subscription assertion after the restart).

Important Files Changed

| Filename | Overview |
|---|---|
| dimos/protocol/pubsub/impl/zenohpubsub.py | New Zenoh pub/sub base class with LCM- and pickle-encoding subclasses. Contains a real bug: _stopped is set in stop() but never reset in start(), causing all subsequent subscribe() calls to silently return a no-op after a stop/restart cycle. |
| dimos/protocol/service/zenohservice.py | New Zenoh session pool and service base. Session pooling by config key is clean; start() is the path that needs to reset _stopped in subclasses. |
| dimos/core/transport.py | Implements ZenohTransport and pZenohTransport. The _start_lock removal from DDSTransport was flagged in a prior review. The auto-restart guard in broadcast() won't correctly restore subscriptions after stop due to the _stopped bug in ZenohPubSubBase. |
| dimos/core/transport_factory.py | New factory for backend-agnostic transport construction. Topic-prefix mapping (LCM /foo ↔ Zenoh dimos/foo), typed vs. pickled selection, and CLI --transport parsing all look correct. |
| dimos/core/global_config.py | Adds transport (defaulting to zenoh on macOS, lcm elsewhere) and zenoh_qos fields with env-var aliases. validate_assignment=True enables runtime validation on update(). Clean. |
| dimos/core/coordination/module_coordinator.py | Adds _coerce_transport_to_backend to remap blueprint-pinned LCM↔Zenoh transports on the fly. Topic stripping logic is correct. |
| dimos/visualization/rerun/bridge.py | Adds _resolve_pubsubs to select the active transport backend for the bridge. Correctly reads from config.g to pick up CLI overrides propagated via the worker GlobalConfig. |
| dimos/protocol/rpc/pubsubrpc.py | Adds ZenohRPC, mirroring LCMRPC with Zenoh-compatible topic generation under dimos/rpc/. Clean. |
| dimos/protocol/pubsub/impl/zenohqos.py | New Pydantic model and defaults for per-key-expr Zenoh publisher QoS rules. Import-free from zenoh/dimos so it's safe to use in lightweight entry-points. |
| dimos/core/test_zenoh_transport.py | Good coverage for transport selection, coercion, and basic round-trip. test_stop_and_restart only asserts _started, not whether subscriptions work after restart — missing the _stopped regression. |
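The topic-prefix mapping mentioned for transport_factory.py (LCM /foo ↔ Zenoh dimos/foo) could look roughly like the following. This is a minimal sketch, not the PR's code: the function names `lcm_to_zenoh` and `zenoh_to_lcm` and the `ZENOH_PREFIX` constant are hypothetical, and only the `/foo` to `dimos/foo` correspondence is taken from the summary above.

```python
# Assumed from the summary's example "LCM /foo <-> Zenoh dimos/foo".
ZENOH_PREFIX = "dimos"


def lcm_to_zenoh(topic: str) -> str:
    """Map an LCM-style topic ("/foo") to a Zenoh key expression ("dimos/foo")."""
    return f"{ZENOH_PREFIX}/{topic.lstrip('/')}"


def zenoh_to_lcm(key_expr: str) -> str:
    """Map a Zenoh key expression ("dimos/foo") back to an LCM topic ("/foo")."""
    prefix = ZENOH_PREFIX + "/"
    if not key_expr.startswith(prefix):
        raise ValueError(f"key expression {key_expr!r} lacks the {prefix!r} prefix")
    return "/" + key_expr[len(prefix):]
```

Stripping the leading slash matters because Zenoh key expressions do not start with `/`, whereas the LCM-style topics in this PR do.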

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CLI as dimos CLI
    participant GC as GlobalConfig
    participant MF as transport_factory
    participant MC as module_coordinator
    participant ZT as ZenohTransport
    participant ZSP as ZenohSessionPool
    participant Z as ZenohPubSubBase

    CLI->>GC: "apply_transport_arg(--transport=zenoh)"
    GC-->>CLI: "transport = zenoh"

    MC->>MF: make_transport(name, msg_type)
    MF->>GC: "g.transport == zenoh?"
    MF-->>MC: ZenohTransport(topic, msg_type)

    MC->>ZT: start()
    ZT->>Z: zenoh.start()
    Z->>ZSP: acquire(config)
    ZSP-->>Z: session (reused or new)

    MC->>ZT: broadcast(msg)
    ZT->>Z: publish(topic, msg)
    Z-->>Z: _get_publisher() then put(bytes)

    MC->>ZT: subscribe(callback)
    ZT->>Z: subscribe(topic, callback)
    Z-->>Z: declare_subscriber, track in _subscribers

    MC->>ZT: stop()
    ZT->>Z: zenoh.stop()
    Z-->>Z: "undeclare publishers/subscribers, _stopped=True"
    Note over Z: _stopped never reset on restart

Reviews (10): Last reviewed commit: "Merge branch 'main' into paul/feat-integ..."

Comment thread dimos/protocol/service/zenohservice.py Outdated
Comment thread dimos/protocol/pubsub/impl/zenohpubsub.py Outdated
Comment thread dimos/visualization/rerun/bridge.py
Comment thread pyproject.toml
@paul-nechifor paul-nechifor force-pushed the paul/feat-integrate-zenoh branch 6 times, most recently from 0afc3a9 to 4472fc9 Compare June 9, 2026 00:56
def make_transport(
    name: str, msg_type: type | None = None, *, g: GlobalConfig = global_config
) -> PubSubTransport[Any]:
    """Construct the active-backend pub/sub transport for a logical channel.

@leshy leshy Jun 9, 2026

Member

A few things: a Transport isn't necessarily a PubSubTransport. In theory we could use TCP with an IP address as the setting here (not implemented yet).

In the case of a PubSubTransport, a string doesn't define a full topic. There is a reason why Topic is a different object for LCM and for Zenoh: Zenoh offers per-channel QoS settings, and possibly specific router config, etc.

So I'm thinking that when the global Zenoh-or-LCM switch is used, for LCM that can literally be just LCM(topic_string), but Zenoh probably wants reliable delivery for RPC specifically, or specific per-topic configuration (Image can be unreliable, but agent messages cannot).

I'm not sure right now what to suggest here: whether we can normalize transport requirements across topics ("this is reliable", "this is unreliable"), or whether we need a per-transport global blueprint config overlay that this global config switch just applies. A global overlay seems better to me.

return Topic(topic=topic)


class ZenohRPC(PubSubRPCMixin[Topic, Any], PickleZenoh):

Member

We can do this initially, but Zenoh actually supports RPC at the protocol level, so we don't need to piggyback on pub/sub here.

@paul-nechifor paul-nechifor force-pushed the paul/feat-integrate-zenoh branch 2 times, most recently from eab7ba7 to ee8cb1f Compare June 10, 2026 05:51
@paul-nechifor paul-nechifor changed the title WIP: integrate zenoh Integrate Zenoh Jun 10, 2026
@paul-nechifor paul-nechifor marked this pull request as ready for review June 10, 2026 11:57
Comment thread dimos/core/transport.py
@paul-nechifor paul-nechifor force-pushed the paul/feat-integrate-zenoh branch from 96fead3 to 719fd0b Compare June 12, 2026 20:37
@paul-nechifor paul-nechifor force-pushed the paul/feat-integrate-zenoh branch 8 times, most recently from 259ef4b to 3427b07 Compare June 22, 2026 12:22
@paul-nechifor paul-nechifor force-pushed the paul/feat-integrate-zenoh branch from 3427b07 to b85edf4 Compare June 23, 2026 04:34
@github-actions github-actions Bot added the ready-to-merge Required CI checks have passed on this PR label Jun 24, 2026

from dimos.msgs.helpers import resolve_msg_type
from dimos.protocol.pubsub.encoders import LCMEncoderMixin, PickleEncoderMixin
from dimos.protocol.pubsub.impl.lcmpubsub import Topic as Topic

@leshy leshy Jun 24, 2026

Member

You should implement your own Topic that configures Zenoh-specific topic settings like QoS, instead of adding another global config layer for QoS.

module.image.transport = Zenoh(Topic("/bla", qos=...))

I think this simplifies things a lot.
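One way to read this suggestion is a topic object that carries its own QoS, so each stream declares its delivery needs where the transport is assigned. The sketch below is an assumption-laden illustration: `ZenohTopic`, `ZenohQoS`, and `Reliability` are hypothetical names, and the single `reliability` field stands in for whatever QoS settings the real implementation would expose.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reliability(Enum):
    """Hypothetical delivery modes for a topic."""

    RELIABLE = "reliable"
    BEST_EFFORT = "best_effort"


@dataclass(frozen=True)
class ZenohQoS:
    """Hypothetical per-topic QoS; reliable delivery is the assumed default."""

    reliability: Reliability = Reliability.RELIABLE


@dataclass(frozen=True)
class ZenohTopic:
    """Hypothetical topic that bundles the topic string with its QoS."""

    topic: str
    qos: ZenohQoS = field(default_factory=ZenohQoS)


# Images may tolerate drops; agent messages keep the reliable default.
image_topic = ZenohTopic("/camera/image", qos=ZenohQoS(Reliability.BEST_EFFORT))
agent_topic = ZenohTopic("/agent/messages")
```

With this shape the QoS travels with the topic, so no separate global QoS layer is needed to look it up by key expression.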

connection.cmd_vel.transport = make_transport(f"{prefix}/cmd_vel", Twist)

connection.camera_info.transport = LCMTransport(f"{prefix}/camera_info", CameraInfo)
connection.camera_info.transport = make_transport(f"{prefix}/camera_info", CameraInfo)

@leshy leshy Jun 24, 2026

Member

We can remove these deploy() functions, not to mention the transports within modules.

@github-actions github-actions Bot removed the ready-to-merge Required CI checks have passed on this PR label Jun 24, 2026
Comment on lines +253 to +267
        stop_drain()
        with self._subscriber_lock:
            for subscriber in self._subscribers:
                subscriber.undeclare()
            self._subscribers.clear()
        with self._publisher_lock:
            for publisher in self._publishers.values():
                publisher.undeclare()
            self._publishers.clear()
        super().stop()


class Zenoh(  # type: ignore[misc]
    LCMEncoderMixin,
    ZenohPubSubBase,

Contributor

P1 _stopped is never reset on start(), silently breaking subscriptions after a stop/restart cycle

ZenohPubSubBase.stop() sets _stopped = True, but neither ZenohService.start() (the only start() in the hierarchy) nor any other path resets it. After a stop() + start() cycle — which the ZenohTransport.broadcast() auto-start path triggers on any post-stop publish — every subsequent subscribe() call silently declares a Zenoh subscriber and immediately undeclares it, returning a no-op closure. The callback is never invoked, and no error is logged.

test_stop_and_restart tests the _started flag on the outer transport but does not exercise subscription delivery after restart, so the regression goes undetected. ZenohPubSubBase.start() (or an override in the class) needs to reset self._stopped = False (under _subscriber_lock) before the session is re-acquired.
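The lifecycle issue and the proposed fix can be reproduced with a toy in-memory class. This is a sketch only: `RestartablePubSub` is a hypothetical stand-in that borrows the `_stopped` and `_subscriber_lock` names from the review, uses no Zenoh session, and does not claim to mirror the real `ZenohPubSubBase` beyond the flag handling described above.

```python
import threading
from typing import Callable, List

Callback = Callable[[bytes], None]


class RestartablePubSub:
    """Toy stand-in for the reviewed class; no real Zenoh session involved."""

    def __init__(self) -> None:
        self._subscriber_lock = threading.Lock()
        self._subscribers: List[Callback] = []
        self._stopped = False

    def start(self) -> None:
        with self._subscriber_lock:
            # The reset the review asks for; without it, subscribe()
            # keeps returning a no-op after a stop()/start() cycle.
            self._stopped = False

    def stop(self) -> None:
        with self._subscriber_lock:
            self._subscribers.clear()
            self._stopped = True

    def subscribe(self, callback: Callback) -> Callable[[], None]:
        with self._subscriber_lock:
            if self._stopped:
                return lambda: None  # silent no-op, as described in the review
            self._subscribers.append(callback)

        def unsubscribe() -> None:
            with self._subscriber_lock:
                if callback in self._subscribers:
                    self._subscribers.remove(callback)

        return unsubscribe

    def publish(self, payload: bytes) -> None:
        # Copy under the lock, then call back outside it.
        with self._subscriber_lock:
            targets = list(self._subscribers)
        for target in targets:
            target(payload)


# Round-trip check after a restart, the assertion the review says is missing.
bus = RestartablePubSub()
bus.start()
bus.stop()
bus.start()
received: List[bytes] = []
unsubscribe = bus.subscribe(received.append)
bus.publish(b"ping")
```

A regression test in the same spirit would assert that a message published after the restart actually reaches a newly registered callback, rather than only checking a started flag.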
