[PolySwarm] May cleanup + full connector pair, mirroring upstream PR #6149 #1

Open
erickingleby-polyswarm wants to merge 107 commits into polyswarm:master from erickingleby-polyswarm:feature/polyswarm-connectors

@erickingleby-polyswarm

Summary

Brings the PolySwarm internal-enrichment + sandbox connector work into the polyswarm/connectors org fork so the team has visibility without needing to read OpenCTI-Platform/connectors#6149 directly. Targets master because the org fork doesn't currently have a feature/polyswarm-connectors branch — this is the same scope as the upstream PR (107 commits / 300 files) plus four small May-cleanup fixes on top.

The org fork's master is currently 269 commits behind OpenCTI-Platform/connectors:master. After upstream PR OpenCTI-Platform#6149 merges, syncing org-fork master from upstream would supersede this PR naturally — so merging this here is optional and gives the team a stable internal branch to work from in the meantime.

What's in here

The full PolySwarm connector pair (polyswarm-enrichment + polyswarm-sandbox) plus all of Bassi's branch work AND the four May-cleanup commits added on top.

The four most recent commits (the May cleanup)

These are what's new vs. the prior tip of feature/polyswarm-connectors on the personal fork (4af84f309).

1. [PolySwarm] Pin pycti>=7.260515.0,<8 and bump polyswarm-api min to 3.21.0

  • pycti>=7.260515.0,<8 — floor captures the SDK API the connector depends on; <8 cap protects against major-version breaks. Loose constraint (not a hard pin) because connectors-sdk@master itself pins pycti==7.260520.0.
  • polyswarm-api>=3.21.0,<4.0.0 — for client features used by both connectors.
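
As a rough illustration of why the loose pycti range coexists with the exact pin in connectors-sdk, the sketch below checks whether a version falls inside a floor/cap range. It is a simplification that only handles purely numeric dotted versions (not full PEP 440), and the helper names `parse_version` and `in_range` are hypothetical, not part of pycti or pip.

```python
def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn a numeric dotted version such as '7.260520.0' into a zero-padded tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version: str, floor: str, cap: str) -> bool:
    """True when floor <= version < cap, compared component by component."""
    return parse_version(floor) <= parse_version(version) < parse_version(cap)


# Range from the commit: pycti>=7.260515.0,<8
PYCTI_FLOOR, PYCTI_CAP = "7.260515.0", "8"

# Exact pin that connectors-sdk@master carries, per the PR description.
SDK_PIN = "7.260520.0"

print(in_range(SDK_PIN, PYCTI_FLOOR, PYCTI_CAP))
```

Because the SDK's pinned version sits inside the range, a resolver can satisfy both constraints at once; a hard `==` pin in the connector on a different version could not.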

2. [PolySwarm] Add --no-network to apk del in Dockerfiles

Without --no-network, apk del attempts a network refresh during cleanup, which can hang or fail silently on Alpine when the package mirror is unreachable.

3. [PolySwarm] Add observable-to-malware STIX relationship in sandbox

Single-line addition in polyswarm-sandbox/src/connector/stix_builder.py:

objects.append(self._create_rel(entity["id"], "related-to", malware_id))

After the sandbox creates a malware object with full enrichment, this links the originating observable to that malware. The enrichment connector already had this edge — brings the sandbox to parity.
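
For readers unfamiliar with the shape of that edge, here is a minimal sketch of a STIX 2.1 relationship built as a plain dict. It is not the connector's `_create_rel`; the function `create_rel`, the uuid5 seeding scheme, and the example IDs are all assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone


def create_rel(source_ref: str, relationship_type: str, target_ref: str) -> dict:
    """Build a STIX 2.1 relationship dict (hypothetical stand-in for _create_rel)."""
    # Assumed ID scheme: uuid5 over the triple, so the same edge always gets the same ID.
    seed = f"{relationship_type}:{source_ref}:{target_ref}"
    rel_id = f"relationship--{uuid.uuid5(uuid.NAMESPACE_URL, seed)}"
    # STIX timestamps are UTC; this keeps millisecond precision.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "id": rel_id,
        "created": now,
        "modified": now,
        "relationship_type": relationship_type,
        "source_ref": source_ref,
        "target_ref": target_ref,
    }


# Made-up IDs, for illustration only.
entity = {"id": "file--11111111-1111-4111-8111-111111111111"}
malware_id = "malware--22222222-2222-4222-8222-222222222222"

objects = []
objects.append(create_rel(entity["id"], "related-to", malware_id))
print(objects[0]["source_ref"], "->", objects[0]["target_ref"])
```

The edge points from the observable to the malware object, which is what lets an OpenCTI user navigate from the file or hash to the family identified by the sandbox.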

4. [PolySwarm] Harden test suite for portable local + CI runs

Three test issues surfaced when I ran the full suite locally. All are pre-existing on Bassi's branch (verified by reverting my edits and re-running):

a) Sandbox STIX validator schema fixture: stix2-validator>=3.3 stopped bundling the OASIS STIX 2.1 JSON schemas. Without them, every test in test_stix_validation.py fails. A conftest fixture clones oasis-open/cti-stix2-json-schemas once to ~/.cache/polyswarm-stix-tests/ and patches _get_error_generator to inject schema_dir.

b) test_connector_active — hardcoded "exactly one polyswarm-named connector"; trips on any stack with both connectors deployed. Filtered to "PolySwarm Enrichment" specifically.

c) Two sandbox e2e tests vs gh0stRAT scoring drift: test_score_updated expected >= 80; current consensus is ~33. test_malware_linked expected malware-family attribution every time; with low scores the sandbox legitimately doesn't attribute a family. Loosened: score > 0, malware-linked rel only required when score >= 50.
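
The logic behind items b) and c) can be sketched roughly as follows. This is not the test code from the branch: the helper names, the dict shapes, and the "PolySwarm Sandbox" connector name are assumptions; only the "PolySwarm Enrichment" name and the thresholds (score > 0, linkage required at score >= 50) come from the description above.

```python
def enrichment_connectors(connectors: list[dict]) -> list[dict]:
    """Keep only the enrichment connector instead of every 'polyswarm'-named one."""
    return [c for c in connectors if c.get("name") == "PolySwarm Enrichment"]


def sandbox_result_ok(score: int, relationships: list[dict]) -> bool:
    """Loosened e2e check: a nonzero score, and a malware link only at score >= 50."""
    if score <= 0:
        return False
    if score >= 50:
        # High-confidence result: require a relationship that targets a malware object.
        return any(r["target_ref"].startswith("malware--") for r in relationships)
    # Low-confidence result: any relationship at all is acceptable.
    return len(relationships) > 0


# A stack with both connectors deployed (the sandbox name is assumed).
stack = [{"name": "PolySwarm Enrichment"}, {"name": "PolySwarm Sandbox"}]
print(len(enrichment_connectors(stack)))

# A low-score run with no family attribution still passes.
low_score_rels = [{"target_ref": "identity--example"}]
print(sandbox_result_ok(33, low_score_rels))
```

Gating the malware-link requirement on the score keeps the test meaningful for high-confidence results without failing whenever engine consensus drifts.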

Test results (full live OpenCTI + PolySwarm)

Ran against Cortana's local stack (running 4 days) plus live PolySwarm API:

  • Sandbox: 312 passed, 0 failed, 0 skipped (61s)
  • Enrichment: 190 passed, 0 failed, 0 skipped (4:02)
  • Combined: 502/502, all green

Relationship to upstream PR OpenCTI-Platform#6149

Same commits, same diff. PR OpenCTI-Platform#6149 was opened from erickingleby-polyswarm/connectors:feature/polyswarm-connectors → OpenCTI-Platform/connectors:master in April. The May cleanup commits were pushed there on 2026-05-20. This PR mirrors that work onto our org fork for team visibility.

If maintainer review on OpenCTI-Platform#6149 lands changes, they should be ported back here (or we'll wait for upstream merge then sync this fork's master).

Observation worth surfacing

While running the e2e suite, gh0stRAT consensus score on PolySwarm has dropped from ~90 to ~33, and the sandbox no longer reliably attributes a malware family for it. This isn't a test bug — it's a data-quality observation about PolySwarm engine consensus on that sample. Worth raising with engineering separately.

Reviewer notes

Requesting Bassi as reviewer. Anyone with write on polyswarm/connectors can merge if/when the team wants this in. No external dependencies — happy for someone with org write to merge into master, or to leave open as a visibility artifact until the upstream PR settles.

Filigran-Automation and others added 30 commits March 9, 2026 14:43
…atform#4385)

Co-authored-by: tanvik-metron <tanvi.karale@metronlabs.com>
…ssing (OpenCTI-Platform#5933)

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Powlinett <pauline.eustachy@filigran.io>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
… OCTI models (OpenCTI-Platform#5951)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Thibaut Rouxel <98959405+throuxel@users.noreply.github.com>
ocd-acauchy and others added 30 commits March 31, 2026 16:36
…rm#6123)

Co-authored-by: Lullah <chaos@efqr.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Chaos Pjeles <fqrious@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Powlinett <pauline.eustachy@filigran.io>
- Fix black formatting on enrichment source files
- Fix isort import ordering (isort 7.0.0 --profile black)
- Fix ruff SIM102 nested if statements
- Fix ruff RET504 unnecessary assignments
- Fix ruff SIM210 bool conversion
- All 4 CI checks pass: isort, black, flake8, pylint STIX
- Fix connector_manifest.json source_code to point to OpenCTI-Platform/connectors
- Add CONNECTOR_ID env var to Pydantic config tests (required by SDK 7.260401.0)
CI runs black with default line-length (88), not our pyproject.toml (120).
Reformatted test_connector.py to comply.
Use short cooldown + sleep instead of backdating internal state.
Prevents race condition on CI where monotonic time could match.
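
A minimal sketch of the "short cooldown + sleep" testing pattern this commit describes, assuming a cooldown tracked with `time.monotonic()`. The `Cooldown` class and the test below are hypothetical and do not reproduce the connector's actual cooldown implementation.

```python
import time


class Cooldown:
    """Hypothetical cooldown gate based on the monotonic clock."""

    def __init__(self, seconds: float) -> None:
        self.seconds = seconds
        self._last_fired = None

    def fire(self) -> None:
        # Record when the guarded action last ran.
        self._last_fired = time.monotonic()

    def ready(self) -> bool:
        if self._last_fired is None:
            return True
        return time.monotonic() - self._last_fired >= self.seconds


def test_cooldown_expires() -> None:
    # Use a short real cooldown and wait it out, rather than rewriting
    # _last_fired to a value computed from the current monotonic time.
    cooldown = Cooldown(seconds=0.2)
    cooldown.fire()
    assert not cooldown.ready()
    time.sleep(0.3)
    assert cooldown.ready()


test_cooldown_expires()
```

Sleeping past a short cooldown costs a fraction of a second per test but avoids depending on how close two monotonic readings happen to be on a given CI machine.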
- Network observables (IP, domain): uuid5 seeded on value string
- Unknown filename fallback: use observable_id suffix instead of random
- File entity ID fallback: uuid5 seeded on entity_id
- Zero uuid4() calls remain in source (only historical comment)
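
The deterministic-ID approach in this commit can be illustrated with the standard library's `uuid.uuid5`. The namespace, function names, and filename format below are assumptions for illustration; the connector's real seeds and namespace are not shown in this PR.

```python
import uuid

# Hypothetical namespace; the connector's actual namespace is not shown here.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def observable_uuid(value: str) -> str:
    """Derive a stable UUID from an observable's value string (IP, domain, ...)."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, value))


def fallback_filename(observable_id: str) -> str:
    """Name an unknown file after the tail of its observable ID rather than a random value."""
    suffix = observable_id.rsplit("--", 1)[-1][:8]
    return f"unknown-{suffix}"


# Same input, same ID on every run, unlike uuid4().
print(observable_uuid("198.51.100.7") == observable_uuid("198.51.100.7"))
print(fallback_filename("file--11111111-1111-4111-8111-111111111111"))
```

Stable IDs mean that re-running enrichment on the same observable updates the existing objects instead of creating duplicates.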
…ration

- connector.py: use observable['id'] not undefined observable_id
- connector.py: remove unused uuid import
- polyswarm_connector.py: use opencti_entity.get('id') not undefined entity_id
…21.0

May cleanup fix. pycti floor at 7.260515.0 captures the SDK API the
connector depends on; <8 cap protects against major-version breaks.
polyswarm-api min raised to 3.21.0 for client features used by both
connectors. Loose pycti constraint lets pip resolve with whatever
connectors-sdk pulls in (currently pycti==7.260520.0).
May cleanup fix. Without --no-network, apk del will silently attempt
to refresh the package index during build, which can hang or fail
silently on Alpine when the package mirror is unreachable. Applied to
both enrichment and sandbox Dockerfiles.
May cleanup fix. The sandbox connector was creating the malware object
and caching it but never linking it back to the observable that triggered
the enrichment. The enrichment connector already had this relationship.
This adds a single 'related-to' edge from entity[id] to malware_id so
OpenCTI users can navigate from the file/hash observable to the malware
family identified by sandbox analysis.
Three pre-existing test issues found while running the full suite on
the polyswarm-may-cleanup-on-bassi branch:

1. STIX validator (sandbox): stix2-validator>=3.3 ships without the
   OASIS STIX 2.1 JSON schemas in its wheel. Added a session-scoped
   conftest fixture that clones cti-stix2-json-schemas to a user cache
   dir (one-time, ~/.cache/polyswarm-stix-tests/) and monkey-patches
   _get_error_generator to inject the schema_dir for the default
   validator pass. Tests in test_stix_validation.py now pass in any
   environment without external prerequisites.

2. test_connector_active (enrichment): hardcoded len == 1 across all
   'polyswarm' connectors; stacks deploying both enrichment + sandbox
   would trip it. Filter to 'PolySwarm Enrichment' specifically.

3. test_malware_linked + test_score_updated (sandbox): assumed
   gh0stRAT would always score >= 80 with malware-family attribution.
   PolySwarm engine consensus drifts; observed scores as low as ~30
   with no family attributed. Score test now checks score > 0
   (non-trivial result). Malware-linked test requires the linkage
   only when score >= 50, falls back to checking that any relationships
   were produced otherwise.