[codex] Add meeting memory transcriber by Savin99 · Pull Request #74 · salute-developers/GigaAM

Savin99 · 2026-06-16T13:20:26Z

What changed

Adds the local mac_transcriber pipeline for GigaAM transcription, meeting report generation, Zoom backfill/import helpers, launchd examples, and local web UI assets.
Adds repository/agent instructions plus implementation notes for the meeting memory work.
Includes tests for archive handling, diarization/reporting, memory DB behavior, service endpoints, Markdown transcription CLI, Zoom backfill, and Zoom import.
Renders adaptive protocol sections in Markdown and HTML reports.

Validation

git diff --cached --check
.venv/bin/python -m pytest mac_transcriber/tests (83 passed)

Notes

Real secrets and local transcript artifacts were left out of git; examples use placeholders only.

…g-memory work
- AI report critic (MAC_TRANSCRIBER_REPORT_CRITIC_MODEL, off by default): structured edit-ops (keep/drop/merge/rewrite) for dedup, cross-section dedup, and ASR/clarity rewrite of formal sections; citations are preserved from source items and re-validated, with rollback to the pre-critic report on failure.
- recovery: stop raw verbatim transcript utterances leaking into formal sections via recover_base_items (_is_clean_recovered_statement).
- includes pending meeting-memory-postgres pipeline work (asr/memory_db/service/scripts/tests).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
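The critic's edit-op flow described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Item` and `EditOp` types, the `apply_ops` helper, and the validation rule (every surviving item must keep at least one citation) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    text: str
    citations: tuple[str, ...]  # hypothetical: ids of cited transcript segments


@dataclass(frozen=True)
class EditOp:
    kind: str                    # "keep" | "drop" | "merge" | "rewrite"
    index: int                   # item the op applies to
    into: Optional[int] = None   # merge target, used by "merge"
    text: Optional[str] = None   # new wording, used by "rewrite"


def apply_ops(items: list[Item], ops: list[EditOp]) -> list[Item]:
    """Apply critic ops; return the original items unchanged if anything fails."""
    result: dict[int, Item] = dict(enumerate(items))
    try:
        for op in ops:
            if op.kind == "keep":
                continue
            if op.kind == "drop":
                del result[op.index]
            elif op.kind == "rewrite":
                src = result[op.index]
                # Wording may change, but citations always come from the source item.
                result[op.index] = Item(op.text or src.text, src.citations)
            elif op.kind == "merge":
                src = result.pop(op.index)
                dst = result[op.into]
                extra = tuple(c for c in src.citations if c not in dst.citations)
                result[op.into] = Item(dst.text, dst.citations + extra)
            else:
                raise ValueError(f"unknown op: {op.kind}")
        edited = [result[i] for i in sorted(result)]
        # Re-validate: an empty report or an item without citations is rejected.
        if not edited or any(not item.citations for item in edited):
            raise ValueError("validation failed")
        return edited
    except (KeyError, ValueError):
        return list(items)  # rollback to the pre-critic report
```

Working on a copy and returning the untouched input on any failure keeps the rollback path trivial: the caller never sees a half-edited report.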

Add MAC_TRANSCRIBER_VAD flag to choose the per-track speech detector in build_segments: rms (default, unchanged) or silero (silero-vad neural VAD). silero fragments less and keeps in-sentence boundaries intact while capturing slightly more speech at comparable speed. Segment timestamps stay absolute (source sample position), so chronological order across tracks and the pyannote diarization path are unaffected; in multi-track mode the speaker is still the track, not the VAD. Falls back to rms with a logged warning when silero-vad is missing. Includes ab_vad_eval.py A/B harness and test_vad.py coverage. Co-authored-by: Claude <claude@anthropic.com>
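A minimal sketch of how the flag and its fallback might be resolved. The `resolve_vad` helper, its parameters, and the handling of unknown values are assumptions for illustration; only the `MAC_TRANSCRIBER_VAD` variable, the `rms`/`silero` values, and the warn-and-fall-back behaviour come from the commit message. The import name `silero_vad` is assumed to match the installed silero-vad package.

```python
import importlib.util
import logging
import os

logger = logging.getLogger("mac_transcriber.vad")  # hypothetical logger name


def resolve_vad(env=None, has_module=None, default="rms"):
    """Return the detector to use: "rms" or "silero"."""
    env = os.environ if env is None else env
    if has_module is None:
        # Checks availability without importing the package.
        has_module = lambda name: importlib.util.find_spec(name) is not None
    choice = env.get("MAC_TRANSCRIBER_VAD", default).strip().lower()
    if choice not in ("rms", "silero"):
        # Assumption: unknown values are treated like a missing dependency.
        logger.warning("unknown MAC_TRANSCRIBER_VAD=%r; using rms", choice)
        return "rms"
    if choice == "silero" and not has_module("silero_vad"):
        logger.warning("silero-vad is not installed; falling back to rms")
        return "rms"
    return choice
```

Passing `env` and `has_module` explicitly makes the fallback testable without uninstalling anything.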

Flip MAC_TRANSCRIBER_VAD default from rms to silero so all new recordings use the neural detector; rms stays as the opt-out fallback. Validated across all multi-track meetings: cross-track speech overlap is the same or lower than rms on all but one recording (no systematic ghost turns), and silero captures >= speech on 23/33 tracks while fragmenting less and keeping in-sentence boundaries intact. Co-authored-by: Claude <claude@anthropic.com>

Local-fallback reports put raw transcript fragments into decisions/tasks; those leaked into meeting_facts and biased later reports into fabricating decisions (a meeting's report invented a decision not in the discussion). _upsert_facts now skips facts unless report_health.json reports a non-local generated_by. Status (ok/degraded) is not a usable signal — local reports are often marked ok. Existing junk (213 facts, all from local reports) was purged separately with a JSON backup; the fact axis refills only from AI reports going forward. Co-authored-by: Claude <claude@anthropic.com>
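The gate could look something like the sketch below. The `facts_are_trusted` helper and the concrete marker values treated as "local" are assumptions; the commit message only states that facts are skipped unless report_health.json reports a non-local generated_by, and that status is deliberately ignored.

```python
import json
from pathlib import Path

# Hypothetical set of values that mark a report as locally generated.
LOCAL_MARKERS = {"", "local", "local_fallback"}


def facts_are_trusted(meeting_dir: Path) -> bool:
    """True only when report_health.json names a non-local generator."""
    health_path = meeting_dir / "report_health.json"
    try:
        health = json.loads(health_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # missing or unreadable health file: do not trust
    if not isinstance(health, dict):
        return False
    generated_by = str(health.get("generated_by") or "").strip().lower()
    # "status" is intentionally not consulted: local reports are often "ok".
    return generated_by not in LOCAL_MARKERS
```

Failing closed on a missing or malformed file means a broken report can never reintroduce raw-transcript facts.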

…tracts
regen_facts.py integrates clean facts (produced offline by Claude subagents reading transcripts) into memory: it rewrites report.json fact lists, stamps report_health.json with a non-local generated_by so the fact gate trusts them, and upserts via upsert_meeting_memory WITHOUT embeddings. Facts are token-searchable, so no OpenAI calls are made. Used to replace the purged local-fallback junk facts (213 raw-transcript items) with 344 clean, owner-attributed facts across 18 meetings, 0% raw-transcript.
Co-authored-by: Claude <claude@anthropic.com>

…of local fallback
Previously only quota errors parked the meeting; any other AI failure (network/timeout/5xx/rate-limit/missing key) fell back to the raw local keyword report, the same junk that polluted memory. New ReportUnavailableError marks API-availability failures and propagates through every path (direct/chunked/synthesis/critic/wrapper) without being downgraded to a generic error or swallowed as a skipped chunk. build_report re-raises it (no local fallback); the service parks the meeting as blocked_on_ai (transcript kept, no report written); reprocess_blocked.py drains both blocked_on_quota and blocked_on_ai when AI is back (schedulable for automatic draining).
Co-authored-by: Claude <claude@anthropic.com>
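The propagate-then-park behaviour can be illustrated with a small sketch. ReportUnavailableError, build_report, and blocked_on_ai are named in the commit; the function signatures, the `generate` callable, the `process_meeting` wrapper, and the "done" status are hypothetical.

```python
class ReportUnavailableError(RuntimeError):
    """The AI backend is unavailable (network, timeout, 5xx, rate limit, missing key)."""


def build_report(transcript, generate):
    """Call the AI backend; availability failures surface as ReportUnavailableError."""
    try:
        return generate(transcript)
    except ReportUnavailableError:
        raise  # never downgraded, never replaced by a local fallback
    except (TimeoutError, ConnectionError) as exc:
        raise ReportUnavailableError(str(exc)) from exc


def process_meeting(transcript, generate):
    """Park the meeting instead of writing a junk report."""
    try:
        report = build_report(transcript, generate)
    except ReportUnavailableError as exc:
        # Transcript is kept, no report is written; a later drain retries.
        return {"status": "blocked_on_ai", "report": None, "reason": str(exc)}
    return {"status": "done", "report": report}
```

A dedicated exception type lets every intermediate layer re-raise it untouched, so only the service decides what "unavailable" means for the queue.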

…ueue
Adds com.slack-zoom.gigaam-reprocess: a periodic (StartInterval 900s) LaunchAgent that runs reprocess_blocked.py to drain blocked_on_quota/blocked_on_ai meetings once the AI API is back, making the 'queue until available, then process' loop fully automatic. Carries no secrets (reads .env.local via --env-file). Wired into install_launchd.sh and documented.
Co-authored-by: Claude <claude@anthropic.com>
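For orientation, a LaunchAgent of this shape can be generated with the standard-library plistlib module. The label, the 900-second StartInterval, and the --env-file flag come from the commit message; the `reprocess_agent_plist` helper, the paths in the usage comment, and the RunAtLoad setting are assumptions, and the example plist shipped in the PR may differ.

```python
import plistlib


def reprocess_agent_plist(python, script, env_file):
    """Build the XML plist bytes for a periodic drain job."""
    agent = {
        "Label": "com.slack-zoom.gigaam-reprocess",
        # Secrets stay in the env file; only its path appears in the plist.
        "ProgramArguments": [python, script, "--env-file", env_file],
        "StartInterval": 900,   # seconds between runs
        "RunAtLoad": False,     # assumption: wait for the first interval
    }
    return plistlib.dumps(agent)


# Hypothetical usage:
# data = reprocess_agent_plist(".venv/bin/python", "reprocess_blocked.py", ".env.local")
# open("com.slack-zoom.gigaam-reprocess.plist", "wb").write(data)
```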

Coverage data (.coverage, *,cover, htmlcov) no longer ends up in the tree. Co-authored-by: Claude <claude@anthropic.com>

The script uploads meeting recordings (the mixed-down audio.m4a plus separate per-speaker tracks) to Yandex.Disk via the REST API, deduplicating by sha256. Layout: one folder per recording, named '<YYYY-MM-DD> <name>'. The OAuth token is read from .env.local (YANDEX_DISK_OAUTH_TOKEN); no secrets are hardcoded. Transcripts stay local; the local audio copy is deleted only after a confirmed upload. It is run manually and is not wired into launchd. Co-authored-by: Claude <claude@anthropic.com>
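The deduplication and folder layout can be sketched without touching the network. The helper names (`sha256_of`, `remote_folder`, `plan_uploads`) and the idea of keeping a set of already-uploaded digests are assumptions for illustration; the actual script's REST calls and bookkeeping are not shown here.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remote_folder(recorded_on: date, name: str) -> str:
    """One folder per recording: '<YYYY-MM-DD> <name>'."""
    return f"{recorded_on.isoformat()} {name}"


def plan_uploads(files, already_uploaded):
    """Return the files whose content has not been uploaded yet."""
    seen = set(already_uploaded)
    todo = []
    for path in files:
        digest = sha256_of(path)
        if digest in seen:
            continue  # identical content already on disk or queued
        seen.add(digest)
        todo.append(Path(path))
    return todo
```

Deleting the local copy would only follow a confirmed upload, so a dry planning step like `plan_uploads` is safe to run repeatedly.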

Compares the in-house retriever against LlamaIndex and mem0 on a shared context_pack (facts/segments/embedding_chunks). Stages: retrieve (cheap, metrics only) and report (expensive, full generation plus an LLM judge). This is a dev tool run from an isolated venv and is not part of production. Secrets come from env (OPENAI_API_KEY, DATABASE_URL). Co-authored-by: Claude <claude@anthropic.com>

…hardening
Replaces the paid LLM API for report generation with headless `claude -p` run on a schedule (launchd), and hardens the blocked-meeting queue and the Yandex.Disk archive.
- reporting: kill-switch MAC_TRANSCRIBER_REPORT_BACKEND=claude: build_ai_report raises ReportUnavailableError and the service parks the meeting in blocked_on_ai
- agent_report.py: prepare/finalize seam (transcript.json -> ai_payload -> rendering with the standard renderer; coverage is computed automatically, citations are filtered)
- drain_reports_via_claude.py: hourly drain of the blocked/stale queue via `claude -p`. Cross-process flock, health gate (failed stays blocked), per-item upload to Yandex.Disk plus self-heal (report_disk_uploaded marker), handling of a temporary limit (no penalty, retry), attempt cap for deterministic failures, DISABLE_AUTOUPDATER
- upload_reports_to_yandex.py: upload/rename of reports on Yandex.Disk (title folders, dedup, size verification)
- promote_reports.py: dedup render plus distribution of reports across duplicate meetings
- launchd plist example + wrapper + report_agent_instructions.md
- reprocess_blocked.py / archive_audio_to_yandex.py: queue and archive improvements
- tests: kill-switch, agent_report round-trip, drainer helpers; conftest isolates the production report-backend flag in .env.local from the tests
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
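The cross-process flock mentioned for the drainer might be used along these lines. This is a sketch under assumptions: the `single_instance` helper and the lock path are hypothetical, and the real drainer may hold the lock differently. It relies on `fcntl`, so it applies to macOS and other Unix systems only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process holds the lock, False if another run has it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking: a second run gives up instead of queueing behind the first.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Hypothetical usage in an hourly job:
# with single_instance("/tmp/drain_reports.lock") as acquired:
#     if acquired:
#         drain_queue()
```

Because the kernel releases the lock when the descriptor closes, a crashed run cannot leave the queue permanently locked.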

Ilya and others added 13 commits June 16, 2026 16:14

Add meeting memory transcriber

57896eb

Render adaptive protocol sections

aa5d2d6

chore(mac_transcriber): ignore coverage artifacts

7afacda


Savin99 wants to merge 13 commits into salute-developers:main from Savin99:codex/meeting-memory-postgres
