Add NCCL RAS monitoring for distributed training diagnostics #104

Open
asaiacai wants to merge 2 commits into main from claude/nccl-ras-log-polling-08hRB

feat(nccl-ras): prefer ncclras -f json -v, fall back to verbose socke…

2d3e994