Integrate Zenoh #2362
Codecov Report
@@ Coverage Diff @@
## main #2362 +/- ##
==========================================
+ Coverage 69.61% 70.74% +1.13%
==========================================
Files 878 876 -2
Lines 79326 78448 -878
Branches 7126 6968 -158
==========================================
+ Hits 55220 55499 +279
+ Misses 22301 21145 -1156
+ Partials 1805 1804 -1
Greptile Summary
This PR integrates Zenoh as an alternative pub/sub transport alongside the existing LCM backend.
Confidence Score: 4/5
Safe to merge with one known defect: subscriptions silently stop working after any stop/restart cycle on a Zenoh transport.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant CLI as dimos CLI
participant GC as GlobalConfig
participant MF as transport_factory
participant MC as module_coordinator
participant ZT as ZenohTransport
participant ZSP as ZenohSessionPool
participant Z as ZenohPubSubBase
CLI->>GC: "apply_transport_arg(--transport=zenoh)"
GC-->>CLI: "transport = zenoh"
MC->>MF: make_transport(name, msg_type)
MF->>GC: "g.transport == zenoh?"
MF-->>MC: ZenohTransport(topic, msg_type)
MC->>ZT: start()
ZT->>Z: zenoh.start()
Z->>ZSP: acquire(config)
ZSP-->>Z: session (reused or new)
MC->>ZT: broadcast(msg)
ZT->>Z: publish(topic, msg)
Z-->>Z: _get_publisher() then put(bytes)
MC->>ZT: subscribe(callback)
ZT->>Z: subscribe(topic, callback)
Z-->>Z: declare_subscriber, track in _subscribers
MC->>ZT: stop()
ZT->>Z: zenoh.stop()
Z-->>Z: "undeclare publishers/subscribers, _stopped=True"
Note over Z: _stopped never reset on restart
Reviews (10): Last reviewed commit: "Merge branch 'main' into paul/feat-integ..."
def make_transport(
    name: str, msg_type: type | None = None, *, g: GlobalConfig = global_config
) -> PubSubTransport[Any]:
    """Construct the active-backend pub/sub transport for a logical channel.
A few things. A Transport isn't necessarily a PubSubTransport: in theory we could use TCP with an IP address as the setting here (not implemented yet).
For a PubSubTransport, a string doesn't define a full topic. There is a reason why Topic is a different object for LCM and for Zenoh: Zenoh offers per-channel QoS settings, and possibly specific router config, etc.
So when the global Zenoh-or-LCM switch is used, for LCM this can literally be just LCM(topic_string), but Zenoh probably wants reliable delivery for RPC specifically, or specific per-topic configuration (Image can be unreliable, but agent messages cannot).
I'm not sure what to suggest here right now: either we normalize transport requirements across topics ("this is reliable", "this is unreliable"), or we need a per-transport global blueprint config overlay that this global config switch just applies. A global overlay seems better to me.
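To make the "global overlay" option concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical and not part of this PR: the names `Delivery`, `OverlayRule`, `ZENOH_OVERLAY`, and `resolve_delivery`, as well as the topic patterns. It only illustrates how a per-transport overlay could map topic strings to normalized delivery requirements when the global switch selects Zenoh.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch


class Delivery(Enum):
    """Normalized delivery requirement (hypothetical, backend-agnostic)."""
    RELIABLE = "reliable"
    BEST_EFFORT = "best_effort"


@dataclass(frozen=True)
class OverlayRule:
    pattern: str        # shell-style pattern matched against the topic string
    delivery: Delivery  # requirement applied to every matching topic


# Hypothetical overlay for the Zenoh backend. The first matching rule wins,
# so more specific patterns should come first.
ZENOH_OVERLAY = [
    OverlayRule("*/rpc/*", Delivery.RELIABLE),
    OverlayRule("*/agent/*", Delivery.RELIABLE),
    OverlayRule("*/image*", Delivery.BEST_EFFORT),
]


def resolve_delivery(topic, overlay=ZENOH_OVERLAY, default=Delivery.RELIABLE):
    """Return the delivery requirement for a topic, or the default if no rule matches."""
    for rule in overlay:
        if fnmatch(topic, rule.pattern):
            return rule.delivery
    return default
```

With this shape, blueprints keep using plain topic strings and the overlay stays the single place where transport-specific requirements live; the trade-off is that pattern matching is implicit, so a mistyped pattern silently falls back to the default.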
    return Topic(topic=topic)


class ZenohRPC(PubSubRPCMixin[Topic, Any], PickleZenoh):
We can do this initially, but Zenoh actually supports RPC at the protocol level, so we don't need to piggyback on pub/sub here.
from dimos.msgs.helpers import resolve_msg_type
from dimos.protocol.pubsub.encoders import LCMEncoderMixin, PickleEncoderMixin
from dimos.protocol.pubsub.impl.lcmpubsub import Topic as Topic
You should implement your own Topic that configures Zenoh-specific topic settings like QoS, instead of adding another global config layer for QoS:
module.image.transport = Zenoh(Topic("/bla", qos=...))
I think this simplifies things a lot.
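A Zenoh-specific topic object along these lines could look like the sketch below. The names `ZenohTopic`, `ZenohQoS`, `Reliability`, the `express` field, and `key_expr()` are all assumptions made for illustration; none of them comes from the PR or from the Zenoh Python API, and the leading-slash handling is likewise an assumption about what the backend expects.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reliability(Enum):
    RELIABLE = "reliable"
    BEST_EFFORT = "best_effort"


@dataclass(frozen=True)
class ZenohQoS:
    """Per-topic QoS settings (hypothetical field set)."""
    reliability: Reliability = Reliability.RELIABLE
    express: bool = False  # hypothetical flag: favor latency over batching


@dataclass(frozen=True)
class ZenohTopic:
    """Topic string plus the Zenoh-specific settings that travel with it."""
    topic: str
    qos: ZenohQoS = field(default_factory=ZenohQoS)

    def key_expr(self) -> str:
        # Assumption: the backend wants a key without leading or trailing slashes.
        return self.topic.strip("/")


# Usage in the spirit of the comment above: images may be dropped, so mark them best-effort.
image_topic = ZenohTopic("/bla", qos=ZenohQoS(reliability=Reliability.BEST_EFFORT))
```

Because the dataclasses are frozen, a topic can be shared between modules or used as a dictionary key without the risk of one module changing another's QoS.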
connection.cmd_vel.transport = make_transport(f"{prefix}/cmd_vel", Twist)

connection.camera_info.transport = LCMTransport(f"{prefix}/camera_info", CameraInfo)
connection.camera_info.transport = make_transport(f"{prefix}/camera_info", CameraInfo)
We can remove these deploy() functions, not to mention the transports within modules.
        stop_drain()
        with self._subscriber_lock:
            for subscriber in self._subscribers:
                subscriber.undeclare()
            self._subscribers.clear()
        with self._publisher_lock:
            for publisher in self._publishers.values():
                publisher.undeclare()
            self._publishers.clear()
        super().stop()


class Zenoh(  # type: ignore[misc]
    LCMEncoderMixin,
    ZenohPubSubBase,
_stopped is never reset on start(), silently breaking subscriptions after a stop/restart cycle
ZenohPubSubBase.stop() sets _stopped = True, but neither ZenohService.start() (the only start() in the hierarchy) nor any other path resets it. After a stop() + start() cycle — which the ZenohTransport.broadcast() auto-start path triggers on any post-stop publish — every subsequent subscribe() call silently declares a Zenoh subscriber and immediately undeclares it, returning a no-op closure. The callback is never invoked, and no error is logged.
test_stop_and_restart tests the _started flag on the outer transport but does not exercise subscription delivery after restart, so the regression goes undetected. ZenohPubSubBase.start() (or an override in the class) needs to reset self._stopped = False (under _subscriber_lock) before the session is re-acquired.
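The failure mode and the suggested fix can be reproduced in isolation with a small stand-in class. This is a sketch, not the PR's code: `FakeSubscriber` and `PubSubSketch` are hypothetical, and the `reset_on_start` switch exists only to show both behaviors side by side. It assumes, as described above, that `subscribe()` checks `_stopped` under `_subscriber_lock` and undeclares immediately when the flag is set.

```python
import threading


class FakeSubscriber:
    """Stand-in for a Zenoh subscriber handle (hypothetical)."""

    def __init__(self):
        self.active = True

    def undeclare(self):
        self.active = False


class PubSubSketch:
    """Minimal model of the start/stop/subscribe lifecycle described in the review."""

    def __init__(self, reset_on_start):
        self._subscriber_lock = threading.Lock()
        self._subscribers = []
        self._stopped = False
        self._reset_on_start = reset_on_start  # True models the suggested fix

    def start(self):
        if self._reset_on_start:
            # Suggested fix: clear the flag under the lock before reuse.
            with self._subscriber_lock:
                self._stopped = False

    def subscribe(self):
        sub = FakeSubscriber()
        with self._subscriber_lock:
            if self._stopped:
                # Declared and immediately undeclared: the callback would never fire.
                sub.undeclare()
                return sub
            self._subscribers.append(sub)
        return sub

    def stop(self):
        with self._subscriber_lock:
            self._stopped = True
            for sub in self._subscribers:
                sub.undeclare()
            self._subscribers.clear()


def subscribe_after_restart(reset_on_start):
    """Run a start/stop/start cycle and report whether a new subscription is live."""
    bus = PubSubSketch(reset_on_start)
    bus.start()
    bus.stop()
    bus.start()
    return bus.subscribe().active
```

A regression test in the same spirit would assert on delivery after restart, not only on the `_started` flag, which is the gap the comment identifies in test_stop_and_restart.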
Problem
We need to support Zenoh as well.
Closes DIM-955
Solution
How to Test
Run a blueprint with Zenoh communication:
Start humancli, also with Zenoh:
Contributor License Agreement