feat: Add ZeroMQ support for DFTracer #267

Draft
izzet wants to merge 39 commits into llnl:develop from izzet:feature/streaming

izzet commented May 27, 2025

- Updated CMakeLists.txt to include ZeroMQ and cppzmq dependencies based on the writer type.
- Introduced DFTRACER_WRITER_TYPE_ENV constant for environment variable configuration.
- Refactored metadata handling to use MetadataMap type instead of std::unordered_map.
- Enhanced DFTracer and DFTLogger classes to accommodate new metadata structure.
- Implemented ZeroMQWriter class for sending log messages over ZeroMQ.
- Created a base WriterBase class to standardize logging interfaces for different writers.
- Updated ChromeWriter to utilize the new base class and refactored JSON conversion methods.
- Added configuration management for writer type selection via environment variables.
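The writer refactoring described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the environment variable name `DFTRACER_WRITER_TYPE`, the value type of `MetadataMap`, the `log`/`name` methods, and the `Sketch*` classes are hypothetical stand-ins, not the PR's actual definitions.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: MetadataMap aliases a string-keyed map. The real alias in
// include/dftracer/core/typedef.h may use a different value type.
using MetadataMap = std::unordered_map<std::string, std::string>;

// Hypothetical interface; the PR's WriterBase may expose other methods.
class WriterBase {
 public:
  virtual ~WriterBase() = default;
  virtual void log(const std::string& event_name, const MetadataMap& metadata) = 0;
  virtual std::string name() const = 0;

  // Shared JSON conversion used by every writer (no string escaping here).
  static std::string to_json(const std::string& event_name, const MetadataMap& metadata) {
    std::string out = "{\"name\":\"" + event_name + "\",\"args\":{";
    bool first = true;
    for (const auto& entry : metadata) {
      if (!first) out += ",";
      out += "\"" + entry.first + "\":\"" + entry.second + "\"";
      first = false;
    }
    return out + "}}";
  }
};

// Stand-in for a file-backed writer: keeps lines in memory instead of a file.
class SketchChromeWriter : public WriterBase {
 public:
  void log(const std::string& event_name, const MetadataMap& metadata) override {
    lines.push_back(to_json(event_name, metadata));
  }
  std::string name() const override { return "chrome"; }
  std::vector<std::string> lines;
};

// Stand-in for a ZeroMQ-backed writer: a real one would send each message
// over a socket; this one only records what would be sent.
class SketchZeroMQWriter : public WriterBase {
 public:
  void log(const std::string& event_name, const MetadataMap& metadata) override {
    messages.push_back(to_json(event_name, metadata));
  }
  std::string name() const override { return "zeromq"; }
  std::vector<std::string> messages;
};

// Hypothetical environment variable name, chosen only for this sketch.
constexpr const char* kWriterTypeEnv = "DFTRACER_WRITER_TYPE";

// Picks a writer from the environment and falls back to the file writer.
std::unique_ptr<WriterBase> make_writer() {
  const char* value = std::getenv(kWriterTypeEnv);
  const std::string type = value ? value : "chrome";
  if (type == "zeromq") return std::make_unique<SketchZeroMQWriter>();
  return std::make_unique<SketchChromeWriter>();
}
```

With this shape, callers hold a `std::unique_ptr<WriterBase>` and never need to know which transport is behind it, which is presumably what "standardize logging interfaces" refers to.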

Copilot AI left a comment

Pull Request Overview

This PR introduces ZeroMQ support for DFTracer along with a refactoring of logging writers to inherit from a new WriterBase interface and use a unified MetadataMap type for metadata management. Key changes include the addition of the ZeroMQWriter class, updates to configuration management and build files for ZeroMQ dependencies, and refactoring of ChromeWriter, DFLogger, and DFTracerCore to use the new metadata and writer type configuration.

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/dftracer/writer/zeromq_writer.cpp | Adds ZeroMQWriter implementation and integrates ZeroMQ logging functionality. |
| src/dftracer/writer/writer_base.h | Introduces a common base class for logging writers. |
| src/dftracer/writer/writer_base.cpp | Implements shared JSON conversion functions for events and metadata. |
| src/dftracer/writer/chrome_writer.{h,cpp} | Refactors ChromeWriter to inherit from WriterBase and adapt to MetadataMap. |
| src/dftracer/utils/configuration_manager.{h,cpp} | Adds writer_type configuration via environment variables. |
| src/dftracer/{dftracer.cpp,df_logger.h,core/dftracer_main.{h,cpp}} | Updates to use MetadataMap for metadata handling. |
| include/dftracer/core/typedef.h | Defines MetadataMap as a type alias for metadata handling. |
| CMakeLists.txt and related cmake modules/configure files | Update dependencies and build configuration for ZeroMQ support. |

Comments suppressed due to low confidence (1)

src/dftracer/writer/chrome_writer.cpp:60

- [nitpick] The initialization and subsequent update of `is_first_write` seems inconsistent with its intended use for formatting; please verify that the starting value and update logic correctly reflect the expected behavior for the first log entry.
  `is_first_write = false;`
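For context on what such a flag usually guards, here is a minimal sketch of the common "first write" pattern for emitting a JSON array of trace events. It assumes the flag exists to decide whether a separator is needed before an entry. The class `JsonArrayWriter` is hypothetical and does not reproduce the logic in chrome_writer.cpp.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical writer that emits events as elements of one JSON array.
class JsonArrayWriter {
 public:
  explicit JsonArrayWriter(std::ostream& out) : out_(out) { out_ << "["; }

  void write(const std::string& json_object) {
    // Every entry except the first is preceded by a comma.
    if (!is_first_write_) out_ << ",";
    out_ << "\n" << json_object;
    // The flag starts as true and flips only after an entry is written.
    is_first_write_ = false;
  }

  void close() { out_ << "\n]"; }

 private:
  std::ostream& out_;
  bool is_first_write_ = true;
};
```

The property to check in a review is that the flag starts out `true` and is cleared only after the first entry has actually been emitted; otherwise the output gains a leading comma or loses a separator.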

Comment thread src/dftracer/writer/zeromq_writer.cpp Outdated

izzet commented May 27, 2025

Addresses #78

izzet added 21 commits June 15, 2025 06:58
- Introduced `PerfettoChromeFileWriter` for logging events in Perfetto format.
- Added `PerfettoProtoFileWriter` to handle ProtoBuf-based logging.
- Implemented `PerfettoProtoZMQWriter` for ZeroMQ integration with Perfetto.
- Removed the legacy `ZeroMQWriter` and `writer_base` implementations.
- Updated test cases to utilize the new `PerfettoChromeFileWriter`.
- Adjusted CMake configuration to conditionally compile tests based on writer type.
- Enhanced logging and error handling in the new writers.
- Updated CMakeLists.txt to include new writer type options for PERFETTO_CHROME_ZMQ.
- Modified configuration header files to define the new writer type.
- Adjusted dependency management to handle ZeroMQ when the new writer type is selected.
- Enhanced dftracer core to initialize and manage the new PERFETTO_CHROME_ZMQ writer.
- Created new writer class PerfettoChromeZMQWriter that inherits from PerfettoChromeWriterBase.
- Implemented buffer flushing and message sending logic for the new writer using ZeroMQ.
- Updated logging and metadata handling in the new writer class.
- Refactored existing PerfettoChromeFileWriter to share common functionality with the new writer base class.
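The split between a shared writer base and transport-specific subclasses, with buffer flushing in the base, might look roughly like the sketch below. The names `BufferedWriterBase`, `send`, and `InMemoryMessageWriter` are hypothetical, and the in-memory subclass only stands in for a file or ZeroMQ transport; the PR's `PerfettoChromeWriterBase` may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base class: owns the buffer and decides when to flush.
class BufferedWriterBase {
 public:
  explicit BufferedWriterBase(std::size_t capacity) : capacity_(capacity) {}
  virtual ~BufferedWriterBase() = default;

  void log(const std::string& event_json) {
    // Flush first if appending this event (plus newline) would overflow.
    if (!buffer_.empty() && buffer_.size() + event_json.size() + 1 > capacity_) {
      flush();
    }
    buffer_ += event_json;
    buffer_ += '\n';
  }

  void flush() {
    if (buffer_.empty()) return;
    send(buffer_);  // transport-specific step supplied by the subclass
    buffer_.clear();
  }

 protected:
  // A file writer would append to a file here; a ZeroMQ writer would send
  // the payload as one message.
  virtual void send(const std::string& payload) = 0;

 private:
  std::size_t capacity_;
  std::string buffer_;
};

// Stand-in transport that records each flushed payload as one "message".
class InMemoryMessageWriter : public BufferedWriterBase {
 public:
  using BufferedWriterBase::BufferedWriterBase;
  std::vector<std::string> messages;

 protected:
  void send(const std::string& payload) override { messages.push_back(payload); }
};
```

One design point worth noting: the base destructor cannot call the virtual `send`, so in a layout like this each concrete writer has to call `flush()` itself before it is destroyed.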

izzet added 14 commits January 5, 2026 17:38
- Add automatic fabric protocol detection (CXI/TCP) for mofka server startup
- Implement PID-based process tracking for reliable server lifecycle management
- Update CMake to use Spack environment paths for LD_LIBRARY_PATH and PYTHONPATH
- Consolidate environment variable settings in set_common_properties function

Key changes:
- Add ZMQWriter class with non-blocking PUSH socket implementation
- Handle fork() safely with automatic socket reconnection in child processes
- Add CMake configuration for ZMQ dependencies (libzmq, cppzmq)
- Add test infrastructure with ZMQ sink process for integration testing

Detect fork via pid mismatch and rebuild Mofka driver/producer state
in child processes. This allows PyTorch DataLoader workers (forked from
the main process) to send POSIX I/O events through Mofka mid-run.

Previously, forked children inherited stale Mofka connections that
silently failed, so only main-process events were captured.
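
The pid-mismatch check described in these commit messages can be sketched independently of any transport. The class `ForkAwareProducer` and the helper `connections_seen_by_forked_child` are hypothetical, and `connect()` is only a placeholder for rebuilding a ZeroMQ socket or Mofka driver/producer state; no Mofka or ZeroMQ API is used or implied.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical producer that rebuilds its connection after a fork().
class ForkAwareProducer {
 public:
  void send(const std::string& event) {
    ensure_connection();
    (void)event;  // a real producer would hand the event to the transport
  }
  int connections_opened() const { return connections_opened_; }

 private:
  void ensure_connection() {
    const pid_t current = getpid();
    // A mismatch means first use, or that this is a forked child holding a
    // connection that belongs to the parent process.
    if (owner_pid_ != current) {
      connect();
      owner_pid_ = current;
    }
  }
  // Placeholder: only counts how many times a connection was (re)built.
  void connect() { ++connections_opened_; }

  pid_t owner_pid_ = -1;
  int connections_opened_ = 0;
};

// Sends once in the parent, forks, sends once in the child, and reports how
// many connections the child's copy of the producer has opened.
int connections_seen_by_forked_child() {
  ForkAwareProducer producer;
  producer.send("parent event");  // opens the connection in the parent
  const pid_t child = fork();
  if (child < 0) return -1;
  if (child == 0) {
    producer.send("child event");  // pid mismatch triggers a reconnect
    _exit(producer.connections_opened());
  }
  int status = 0;
  if (waitpid(child, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Checking the pid lazily on each send, rather than registering a `pthread_atfork` handler, keeps the logic in one place and also covers children created by libraries the tracer does not control, at the cost of one `getpid()` call per event.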