Add Synchronoss / Verizon Cloud legal return parser (9 artifacts)#284

Open
OneSixForensics wants to merge 1 commit into abrignoni:main from OneSixForensics:synchronoss-parser

OneSixForensics (Contributor) commented Jun 25, 2026

Synchronoss / Verizon Cloud legal return parser

Adds scripts/artifacts/synchronoss.py (new) and a SYNCHRONOSS icon block in scripts/report_icons.py. Parses Synchronoss / Verizon Cloud legal returns, which are commonly received in ICAC investigations.

Follows the same patterns as my merged kik.py parser, now on the modern @artifact_processor / LAVA pipeline.

Artifacts (9)

All nine use the v2 @artifact_processor and emit HTML, TSV, timeline, and LAVA output:

  1. Messages (SMS and MMS): messages/YYYYMMDD.csv
  2. Calls: same CSVs, rows with Type = call. Sender/Recipients are kept verbatim because their meaning flips with call direction, so neither is mislabeled as the "account number".
  3. MMS Media Received / 4. MMS Media Sent: inline media, linked to its message by file-existence resolution (each attachment token is resolved against the actual files in the message's own date folder), with a Link Status column: linked; referenced but absent (likely quarantined); or ambiguous (needs manual review).
  5. MMS Folder Media (Unlinked): media physically present in the mms/in and mms/out folders but referenced only via SMIL placeholders (chiefly extensionless files named 0). Surfaced so no media is lost; dated by folder and deliberately not attributed to a message, to avoid fabricated attribution.
  6. Contacts: contacts_YYYYMMDD.txt (JSON); surfaces deleted contacts.
  7. DV Access Log Uploads / 8. Sync Events: DV/IP logs; the first IP is the user's, as opposed to CDN IPs; upload rows carry a SHA-256 checksum that correlates to NCMEC CyberTip files.
  9. VZMOBILE Device Backup: device cloud-backup media, rendered inline.
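The user-vs-CDN IP split and the checksum correlation in items 7 and 8 can be sketched as below. This is a minimal illustration, not the parser's code. It assumes (hypothetically) that a DV log row carries its IPs as one comma-separated field with the user's IP first, and that the logged upload checksum is a hex SHA-256 digest; split_dv_ips, sha256_file, and checksum_matches are invented names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def split_dv_ips(ip_field: str) -> tuple[str | None, list[str]]:
    """Split a comma-separated IP field into (user_ip, cdn_ips).

    Hypothetical format: the first entry is the user's IP, the rest are CDN IPs.
    """
    ips = [part.strip() for part in ip_field.split(",") if part.strip()]
    if not ips:
        return None, []
    return ips[0], ips[1:]


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large media is never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(logged_checksum: str, path: Path) -> bool:
    """Compare a logged hex checksum (case and surrounding spaces ignored) to a file."""
    return logged_checksum.strip().lower() == sha256_file(path)
```

For example, split_dv_ips("203.0.113.7, 198.51.100.20") would treat 203.0.113.7 as the user IP and 198.51.100.20 as a CDN IP, and checksum_matches can compare a logged upload checksum with the hash of a file you hold, such as one supplied with a CyberTip.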

Notes for review

  • Media registration: media is registered via an exact seeker.file_infos lookup (_register_media) rather than check_in_media. check_in_media resolves files with Path.match (glob), which fails on real-world Synchronoss filenames containing glob metacharacters (e.g. IMG_0347[1].jpg, [clips4sale.com]...) and is O(n²) per artifact. The helper otherwise mirrors check_in_media exactly (media-id/ref scheme, hardlink-or-copy, guess_mime magic-byte mimetype). Happy to switch back if check_in_media is fixed upstream — see the related issue.
  • Extensionless files (the files named 0) are typed by their magic-byte mimetype and rendered inline.
  • Timestamps are treated as UTC, per Synchronoss documentation; the epoch value is cross-checked against the Message ID prefix.
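Typing an extensionless file by its leading bytes can be sketched with the standard library alone. This is a hypothetical stand-in for the guess_mime helper the PR uses, not its implementation: sniff_mime, sniff_file, and the small signature table (JPEG, PNG, GIF, AMR, and ISO-media "ftyp" brands) are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical signature table: (offset, magic bytes, mimetype).
_SIGNATURES = [
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"#!AMR", "audio/amr"),
]

_HEIC_BRANDS = {b"heic", b"heix", b"hevc", b"hevx"}


def sniff_mime(header: bytes) -> Optional[str]:
    """Return a mimetype guessed from a file's first bytes, or None if unknown."""
    for offset, magic, mime in _SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return mime
    # ISO base media files (MP4, MOV, 3GP, HEIC) carry "ftyp" at offset 4,
    # followed by a 4-byte major brand.
    if header[4:8] == b"ftyp":
        brand = header[8:12]
        if brand in _HEIC_BRANDS:
            return "image/heic"
        if brand == b"qt  ":
            return "video/quicktime"
        if brand.startswith(b"3gp"):
            return "video/3gpp"
        return "video/mp4"
    return None


def sniff_file(path: Path) -> Optional[str]:
    """Read only the first 16 bytes; enough for every signature in the table."""
    with path.open("rb") as handle:
        return sniff_mime(handle.read(16))
```

A real magic-byte library covers far more formats; the point here is only that the file name (e.g. 0) plays no part in the decision.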

Validation

  • Known-ground-truth synthetic dataset (group MMS, emoji, deleted contacts, DV upload/sync, inline + extensionless media, cross-date name collisions, quarantined-absent refs).
  • A real ~32 GB ICAC return (read-only): all 9 artifacts, 17,214 VZMOBILE files registered, full run end-to-end into the LAVA viewer.
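The per-date-folder linking that the cross-date-collision and quarantined-reference cases exercise can be sketched as a small classifier. This is an illustrative sketch, not the parser's logic: it assumes each message yields a list of attachment tokens and that the message's own date folder has already been listed into a set of filenames. resolve_attachments, Resolution, and the rule that a stem-only match is "ambiguous" are hypothetical choices for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical labels mirroring the Link Status values described in the PR.
LINKED = "linked"
ABSENT = "referenced-but-absent (likely quarantined)"
AMBIGUOUS = "ambiguous (manual review)"


@dataclass
class Resolution:
    token: str
    status: str
    matches: list[str] = field(default_factory=list)


def resolve_attachments(tokens: list[str], folder_files: set[str]) -> list[Resolution]:
    """Classify attachment tokens against ONE message's date folder.

    Only that folder's listing is consulted, so a same-named file stored under
    another date can never be linked by accident.
    """
    by_stem: dict[str, list[str]] = {}
    for name in folder_files:
        by_stem.setdefault(PurePosixPath(name).stem, []).append(name)

    results = []
    for token in tokens:
        if token in folder_files:
            # Exact, literal name comparison: no glob semantics involved.
            results.append(Resolution(token, LINKED, [token]))
            continue
        # Assumed rule: a stem-only match (e.g. a different extension) is not
        # proof of identity, so it is flagged for review rather than linked.
        candidates = sorted(by_stem.get(PurePosixPath(token).stem, []))
        status = AMBIGUOUS if candidates else ABSENT
        results.append(Resolution(token, status, candidates))
    return results


if __name__ == "__main__":
    folder = {"IMG_0347[1].jpg", "clip.mp4", "clip.3gp"}
    for r in resolve_attachments(["IMG_0347[1].jpg", "photo.jpg", "clip.mov"], folder):
        print(r.token, "->", r.status, r.matches)
```

Flagging rather than guessing keeps the output conservative: a wrong link is harder to catch in review than an explicit "ambiguous".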

Media-heavy artifacts like VZMOBILE benefit substantially from the companion LAVA WAL perf change (separate PR).
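For context on the WAL remark: in SQLite, write-ahead logging is enabled per database file with PRAGMA journal_mode=WAL, and inserting many rows inside one transaction avoids a commit per row. The sketch below assumes, without confirmation from this PR, that the LAVA output is a SQLite database; the media table, open_output_db, and bulk_insert are hypothetical, and this is not the companion PR's change.

```python
import sqlite3
from typing import Iterable, Tuple


def open_output_db(path: str) -> sqlite3.Connection:
    """Open a hypothetical output database in WAL mode."""
    conn = sqlite3.connect(path)
    # In WAL mode readers (e.g. a viewer) do not block the writer, and vice versa.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    # NORMAL is the synchronous level SQLite suggests pairing with WAL.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def bulk_insert(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Insert all rows in a single transaction instead of committing per row."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany("INSERT INTO media (media_id, path) VALUES (?, ?)", rows)
```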

OneSixForensics (Contributor, Author)

Rebased onto current main after the base-code sync. Note this branch now also includes a one-commit fix to Context.get_source_file_path(): after the sync, file resolution verifies candidates with Path.match (glob), which silently drops media whose filenames contain glob metacharacters ([, ], *, ?) — pervasive in real device-backup returns (e.g. IMG_0347[1].jpg, [clips4sale.com]...). The parser needs that to register VZMOBILE media, and the fix is general, so it closes #286. Media now goes through the stock check_in_media. Re-validated against the new base (synthetic incl. bracketed names; real ~32 GB return = 17,214 VZMOBILE files).
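The failure mode described above is easy to reproduce with pathlib alone: PurePath.match treats its argument as a glob pattern, so [1] is read as a one-character class rather than literal text. The sketch shows the false negative (and a matching false positive) plus one general workaround when a verification step must keep using match: escaping the expected name with glob.escape. It is an illustration, not the patch in this branch; verify_candidate is a hypothetical name.

```python
import glob
from pathlib import PurePosixPath


def verify_candidate(candidate: str, expected_name: str) -> bool:
    """Return True if candidate's last component is literally expected_name.

    glob.escape wraps '*', '?' and '[' in brackets so PurePath.match treats
    them as ordinary characters instead of wildcards. expected_name is assumed
    to be a bare file name with no '/'.
    """
    return PurePosixPath(candidate).match(glob.escape(expected_name))


name = "IMG_0347[1].jpg"
real = "VZMOBILE/2021/IMG_0347[1].jpg"
other = "VZMOBILE/2021/IMG_03471.jpg"

# Unescaped, "[1]" is a one-character class: the real file is rejected ...
print(PurePosixPath(real).match(name))   # False
# ... while an unrelated file that fits the class is accepted.
print(PurePosixPath(other).match(name))  # True
# Escaped, only the literal name matches.
print(verify_candidate(real, name))      # True
print(verify_candidate(other, name))     # False
```

An exact string or dictionary lookup, like the seeker.file_infos lookup described in the PR body, sidesteps pattern semantics entirely; escaping is the smaller change when the surrounding code is built around match.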

JamesHabben (Collaborator) commented Jun 26, 2026

a couple general notes for your future contributions (hopefully) :)

  1. the report.py file and the icons in there are from the old processing pathway. the newer @artifact_processor decorator pathway uses the artifact-icon key declared in the v2 header. the code here is fine to include as it sits because we will be going through an overhaul of the core code once all the artifacts are converted to v2. once all artifacts are on v2 headers, we don't need any of those icons in the report module file.
  2. generally it's better practice to separate things like the context class patch from a new/updated module. for one thing, it's easier to deal with rebasing like you had to do. it also gives us the ability to evaluate and merge the context patch on a separate path from the module updates. as it sits, they are stuck together and need to be evaluated together. no need to update anything here either.

i am way less familiar with the R setup and usage, so i will defer to @stark4n6 or @abrignoni to evaluate the module. as for the context patch, i merged all the others in. that part of this PR is good with me to merge.

New scripts/artifacts/synchronoss.py parsing Synchronoss/Verizon Cloud legal
returns for ICAC investigations, plus a SYNCHRONOSS icon block in
scripts/report_icons.py.

Artifacts (all v2 @artifact_processor; HTML + TSV + timeline + LAVA):
  - Messages (SMS and MMS)
  - Calls
  - MMS Media Received / Sent (inline media, message-linked by file-existence
    resolution with a Link Status column)
  - MMS Folder Media (Unlinked) (extensionless "0" files referenced only via
    SMIL placeholders; surfaced, dated by folder, not attributed)
  - Contacts (surfaces deleted contacts)
  - DV Access Log Uploads / Sync Events (user-IP vs CDN split; upload checksums
    for CyberTip correlation)
  - VZMOBILE Device Backup (inline media; extensionless files typed by magic
    bytes)

Media is registered with the framework check_in_media. Validated against a real
~32 GB ICAC return (17,214 VZMOBILE files) and a known-ground-truth synthetic
dataset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BdD6DdQA21KqDRSUjQHaTK

OneSixForensics (Contributor, Author)

Thanks @JamesHabben — good calls. Per your point #2, I've split the Context.get_source_file_path() patch out into its own PR #309 so it can be evaluated/merged independently; this PR is now just the Synchronoss module (scripts/artifacts/synchronoss.py + scripts/report_icons.py). The module relies on that resolver fix at runtime to link media with bracketed filenames, so #309 is effectively a prerequisite. Left the report_icons entries in place as you noted (fine until the v2 icon overhaul). Deferring to @stark4n6 / @abrignoni on the module itself.
