
Migrate Kik parser to the LAVA pipeline (@artifact_processor) #287

Open
OneSixForensics wants to merge 1 commit into abrignoni:main from OneSixForensics:kik-lava

OneSixForensics (Contributor) commented Jun 25, 2026

Migrate the Kik parser to the LAVA pipeline

kik.py (merged earlier) hand-builds ArtifactHtmlReport objects, so its artifacts render in HTML but never populate the LAVA database, which means they don't appear in the LAVA viewer. This PR rebuilds the module on the modern @artifact_processor decorator, so every artifact emits HTML, TSV, timeline, and LAVA output from a single (data_headers, data_list, source_path) return value. No parsing logic changed.
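
To make that contract concrete, here is a toy sketch. Everything in it is hypothetical: artifact_processor below is a stand-in written for illustration, not the framework's decorator, and kik_chat_demo, its registry entry, and the returned dict are invented. It only mimics the behaviour described above: metadata is looked up by function name, one (data_headers, data_list, source_path) return is fanned out to several outputs, and (name, 'datetime') headers feed a timeline.

```python
import csv
import functools
import io

# Hypothetical registry: each key must equal the name of its processing function.
__artifacts_v2__ = {
    "kik_chat_demo": {"name": "Kik Chat (demo)", "category": "Kik Return"},
}


def artifact_processor(func):
    """Toy stand-in for the framework decorator (illustration only)."""
    # Metadata is looked up by function name; an unmatched name raises KeyError.
    meta = __artifacts_v2__[func.__name__]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        data_headers, data_list, source_path = func(*args, **kwargs)
        # A header is either a plain column name or a (name, type) tuple.
        names = [h[0] if isinstance(h, tuple) else h for h in data_headers]
        types = [h[1] if isinstance(h, tuple) else "text" for h in data_headers]

        # One output: a TSV rendering of the same rows.
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        writer.writerow(names)
        writer.writerows(data_list)

        # Another output: timeline events from every 'datetime'-typed column.
        dt_cols = [i for i, t in enumerate(types) if t == "datetime"]
        timeline = [(row[i], meta["name"]) for row in data_list for i in dt_cols]

        return {
            "artifact": meta["name"],
            "source": source_path,
            "tsv": buf.getvalue(),
            "timeline": timeline,
            "columns": list(zip(names, types)),
        }

    return wrapper


@artifact_processor
def kik_chat_demo():
    # Made-up rows; a real parser would read them from the Kik return.
    headers = [("Sent", "datetime"), "Sender", "Message"]
    rows = [
        ("2026-06-01 12:00:00", "alice", "hi"),
        ("2026-06-01 12:01:00", "bob", "hey"),
    ]
    return headers, rows, "messages.csv"


result = kik_chat_demo()
print(result["timeline"])
```

Because the toy decorator resolves metadata at decoration time, a function whose name drifts from its registry key fails immediately instead of silently producing no output, which is one way to see why the rename matters.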

Changes

  • Functions renamed to match their __artifacts_v2__ keys (the decorator and plugin loader key off the function name).
  • Subscriber PDF split: the subscriber function previously wrote three tables (Info, Profile Pics, Events) but only kik_subscriber was a declared artifact, so Profile Pics and Events would never reach LAVA. Split into three declared artifacts (kik_subscriber / kik_subscriber_pics / kik_subscriber_events) sharing a cached PDF text extract. report_icons.py already contained icon entries for all three.
  • Chat media: now a typed ('Media','media') column registered via an exact seeker.file_infos lookup (handles the Base64-encoded medias/ filenames), rendering inline in both HTML and LAVA. Referenced-but-absent media shows the filename with an empty media cell.
  • Human-readable timestamp columns typed ('…','datetime') for the LAVA timeline.
  • Schema-fallback (raw passthrough on unexpected headers) preserved.
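
The subscriber split depends on three artifacts sharing one PDF text extract. A minimal sketch of that idea follows, assuming functools.lru_cache as the caching mechanism (the PR does not say how its cache is built). The helpers _subscriber_pdf_text and _section are hypothetical, the "== Section ==" text layout is invented, the signatures are simplified to take a path, and the extractor just reads a text file in place of a real PDF parser.

```python
import functools
import os
import tempfile

extract_calls = 0  # counts real extractions so the caching is visible


@functools.lru_cache(maxsize=None)
def _subscriber_pdf_text(path):
    """Hypothetical extractor: reads plain text here; the real module parses the PDF."""
    global extract_calls
    extract_calls += 1
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def _section(text, title):
    """Return non-empty lines under a '== Title ==' heading (a made-up layout)."""
    lines, inside = [], False
    for line in text.splitlines():
        if line.startswith("== "):
            inside = line.strip("= ") == title
        elif inside and line.strip():
            lines.append(line.strip())
    return lines


def kik_subscriber(path):
    text = _subscriber_pdf_text(path)
    rows = [tuple(ln.split(": ", 1)) for ln in _section(text, "Info")]
    return ["Field", "Value"], rows, path


def kik_subscriber_pics(path):
    text = _subscriber_pdf_text(path)
    rows = [(ln,) for ln in _section(text, "Profile Pics")]
    return ["Picture"], rows, path


def kik_subscriber_events(path):
    text = _subscriber_pdf_text(path)
    rows = [tuple(ln.split(" | ", 1)) for ln in _section(text, "Events")]
    return ["Timestamp", "Event"], rows, path


# Usage: three artifacts, one extraction.
sample = (
    "== Info ==\nUsername: demo_user\nEmail: demo@example.com\n"
    "== Profile Pics ==\npic_1.jpg\n"
    "== Events ==\n2026-06-01 12:00 | login\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
info = kik_subscriber(tmp.name)
pics = kik_subscriber_pics(tmp.name)
events = kik_subscriber_events(tmp.name)
os.remove(tmp.name)
print(extract_calls)
```

Keying the cache on the path means each artifact function stays independent (any one of them can run first) while the expensive extraction still happens only once per file.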
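
For the chat media column, here is a minimal sketch of an exact, name-keyed lookup. It is entirely hypothetical: the file index is modelled as a plain dict from path-in-return to extracted path (a stand-in for seeker.file_infos, whose real structure this does not assert), build_media_index and chat_media_columns are illustrative helpers, and the paths and Base64-style names are made up. Matching the exact file name keeps the comparison case-sensitive, which matters because Base64 text is case-sensitive. A referenced name with no matching file yields the filename plus an empty media cell.

```python
# Hypothetical model of the file index: {path inside the return: extracted path}.
# This stands in for seeker.file_infos; its real structure is not shown here.


def build_media_index(file_infos):
    """Map each bare file name under a 'medias/' folder to its extracted path."""
    index = {}
    for source_path, extracted_path in file_infos.items():
        parts = source_path.replace("\\", "/").split("/")
        if len(parts) >= 2 and parts[-2] == "medias":
            index[parts[-1]] = extracted_path  # exact, case-sensitive key
    return index


def chat_media_columns(index, referenced_name):
    """Return (file name cell, media cell); the media cell is empty if the file is absent."""
    return referenced_name, index.get(referenced_name, "")


# Usage with made-up paths and Base64-style names.
infos = {
    "return/content/medias/QmFzZTY0K05hbWU=": "/tmp/extract/QmFzZTY0K05hbWU=",
    "return/content/other.csv": "/tmp/extract/other.csv",
}
index = build_media_index(infos)
headers = ["Sent", "Media File", ("Media", "media")]  # typed media column
rows = [
    ("2026-06-01 12:00:00", *chat_media_columns(index, "QmFzZTY0K05hbWU=")),
    ("2026-06-01 12:05:00", *chat_media_columns(index, "TWlzc2luZ0ZpbGU=")),
]
print(rows)
```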
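
The preserved schema fallback can be illustrated the same way. In this sketch (hypothetical column names and function name, not the module's actual code), a CSV whose header matches the expected layout gets curated, typed headers; anything else is passed through with its own header and values, so an unexpected export still produces rows instead of being dropped.

```python
import csv
import io

# Hypothetical expected header for one CSV; real Kik column names may differ.
EXPECTED_HEADER = ["sender", "receiver", "timestamp", "message"]
TYPED_HEADERS = ["Sender", "Receiver", ("Timestamp", "datetime"), "Message"]


def parse_kik_csv(text, source_path):
    """Return (data_headers, data_list, source_path), falling back to raw passthrough."""
    reader = csv.reader(io.StringIO(text))
    raw_header = next(reader, [])
    rows = [tuple(r) for r in reader]
    if [h.strip().lower() for h in raw_header] == EXPECTED_HEADER:
        # Known schema: emit the curated, typed headers.
        return TYPED_HEADERS, rows, source_path
    # Unexpected schema: keep the file's own header and values untouched.
    return raw_header, rows, source_path


known = parse_kik_csv("Sender,Receiver,Timestamp,Message\na,b,2026-06-01 12:00:00,hi\n", "chat.csv")
unknown = parse_kik_csv("col_a,col_b\n1,2\n", "other.csv")
```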

Artifact count goes 13 → 15 (the two new subscriber sub-artifacts).

Validation

Tested end-to-end on a synthetic Kik return (a generated subscriber PDF, every CSV, and a real media file): all 15 artifacts register in LAVA with correct row counts, and the media item/reference is created (the real file is linked; the absent file is shown by name).

Real-return QA is recommended before merge. I validated against a faithful synthetic return; if you have a sanitized Kik return to confirm against, even better.

Companion to the Synchronoss parser PR (same LAVA-migration recipe).

kik.py previously hand-built ArtifactHtmlReport objects, so its artifacts
rendered in HTML but never populated the LAVA database. Rebuilt all functions
onto the modern @artifact_processor decorator: each returns
(data_headers, data_list, source_path) and the framework writes
HTML + TSV + timeline + LAVA.

Changes:
- Functions renamed to match their __artifacts_v2__ keys (the decorator and
  plugin loader key off the function name).
- The subscriber PDF previously emitted three tables from one function
  (Info, Profile Pics, Events); only the first was a declared artifact, so
  Pics/Events would not appear in LAVA. Split into three declared artifacts
  (kik_subscriber / kik_subscriber_pics / kik_subscriber_events) sharing a
  cached PDF text extract. report_icons.py already had entries for all three.
- Chat media now uses a typed ('Media','media') column registered via the
  framework's check_in_media, rendering inline in HTML and LAVA; absent media
  shows the filename with an empty media cell.
- Human-readable timestamp columns typed ('...','datetime') for LAVA timeline.
- Schema-fallback behaviour preserved.

Validated end-to-end on a synthetic Kik return (PDF + all CSVs + a media file):
all 15 artifacts register in LAVA with correct counts; media item/reference
created. Real-return QA recommended before merge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BdD6DdQA21KqDRSUjQHaTK