Skip to content

fix: detect dead upload workers while draining#1218

Open
ratishoberoi wants to merge 1 commit into
qdrant:devfrom
ratishoberoi:fix-upload-points-dead-worker-hang
Open

fix: detect dead upload workers while draining#1218
ratishoberoi wants to merge 1 commit into
qdrant:devfrom
ratishoberoi:fix-upload-points-dead-worker-hang

Conversation

@ratishoberoi
Copy link
Copy Markdown

@ratishoberoi ratishoberoi commented May 28, 2026

Summary

Fixes #973.

  • Poll worker process health while waiting for ParallelWorkerPool output.
  • Keep the existing processing_timeout behavior for healthy workers.
  • Add a focused regression test for a child process that exits without sending QueueSignals.error.

Tests

  • /tmp/qdrant-client-973-venv/bin/python -m pytest tests/test_parallel_processor.py -q
  • /tmp/qdrant-client-973-venv/bin/python -m ruff check qdrant_client/parallel_processor.py tests/test_parallel_processor.py
  • /tmp/qdrant-client-973-venv/bin/python -m mypy qdrant_client/parallel_processor.py tests/test_parallel_processor.py --disallow-incomplete-defs --disallow-untyped-defs
  • git diff --check

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@netlify
Copy link
Copy Markdown

netlify Bot commented May 28, 2026

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit be7903a
🔍 Latest deploy log https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/6a17e04d12143e0008e38819
😎 Deploy Preview https://deploy-preview-1218--poetic-froyo-8baba7.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ada41cb-5faa-4430-a925-4700e4ce7d23

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce3042 and be7903a.

📒 Files selected for processing (2)
  • qdrant_client/parallel_processor.py
  • tests/test_parallel_processor.py

📝 Walkthrough

Walkthrough

This PR adds deadline-based health polling to the worker pool's output queue retrieval. A new _get_output() helper method replaces single blocking queue.get() calls with repeated polling on a deadline, invoking periodic health checks between attempts. The method is integrated into two call sites in unordered_map: the main streaming loop and the post-input drain loop. A regression test with a CrashingWorker confirms the pool now detects worker failures during draining and surfaces the expected RuntimeError.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main fix: detecting dead upload workers while draining the output queue.
Description check ✅ Passed The description provides context by referencing issue #973 and explaining the changes: polling worker health while waiting for output, preserving timeout behavior, and adding a regression test.
Linked Issues check ✅ Passed The changes address issue #973 by introducing health polling during output queue operations to detect dead workers that don't send error signals, preventing hangs during draining.
Out of Scope Changes check ✅ Passed All changes are scoped to addressing the worker health detection issue: adding _get_output() helper, updating unordered_map loops, and adding a targeted regression test.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant