
feat(perf-data): add WeChat Channels and extend Bilibili adapters #47

Open
NEYMAR946 wants to merge 2 commits into XBuilderLAB:main from NEYMAR946:contrib/wechat-bilibili-adapters


@NEYMAR946


Summary

This contribution adds a WeChat Channels creator adapter and extends the existing Bilibili adapter after end-to-end validation on Windows with NAS-mapped content projects.

WeChat Channels

  • persistent Playwright login isolated in .auth-wechat-channels/
  • recent post list and creator-side performance metrics
  • views, likes, favorites, comments, shares, completion rate, average watch time, fast-swipe rate, and follows where available
  • automatic Interaction Management -> Comments capture
  • pagination, root/reply normalization, creator reply detection, and Top 20 comment reporting
  • avoids persisting internal usernames and signed media URLs
  • wechat_channels integration in cheat-retro with manual fallback
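The root/reply normalization, creator reply detection and Top 20 reporting listed above could work roughly like the sketch below. This is not the adapter's code. The raw field names (`id`, `parent_id`, `author`, `text`, `likes`) and detecting the creator by display name are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    comment_id: str
    parent_id: Optional[str]  # None for a root comment (hypothetical field)
    author: str
    text: str
    likes: int
    replies: List[Comment] = field(default_factory=list)
    creator_replied: bool = False


def build_threads(raw: List[dict], creator_name: str) -> List[Comment]:
    """Turn a flat list collected across pages into root comments with nested replies."""
    by_id: dict = {}
    for item in raw:
        # A comment seen on two pages is kept once; the last copy wins.
        by_id[item["id"]] = Comment(
            comment_id=item["id"],
            parent_id=item.get("parent_id"),
            author=item["author"],
            text=item["text"],
            likes=int(item.get("likes", 0)),
        )
    roots: List[Comment] = []
    for c in by_id.values():
        parent = by_id.get(c.parent_id) if c.parent_id else None
        if parent is None:
            # Root comments, plus replies whose parent was never captured.
            roots.append(c)
            continue
        parent.replies.append(c)
        if c.author == creator_name:
            parent.creator_replied = True
    return roots


def top_comments(roots: List[Comment], n: int = 20) -> List[Comment]:
    """Top-n root comments by likes (ties keep capture order)."""
    return sorted(roots, key=lambda c: c.likes, reverse=True)[:n]
```

Deduplicating by ID before threading keeps pagination overlap from double-counting comments in the Top 20.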

Bilibili

  • danmaku text capture with video timestamps and deflate decoding
  • nested comment replies and commenter names in reports
  • persistent Playwright login for automatic creator-space video/BV discovery
  • removes mandatory httpx dependency from single-video public retrieval
  • Windows mapped-drive, Chinese-path, virtualenv, UTF-8, and canonical-output fixes in run.sh
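The danmaku deflate decoding could be sketched as follows. This assumes the payload is the legacy XML danmaku document compressed as raw deflate (no zlib header). It also assumes the first comma-separated field of each `<d p="...">` attribute is the offset from the start of the video, in seconds. The function names are hypothetical and this is not the PR's crawler.py.

```python
import xml.etree.ElementTree as ET
import zlib
from typing import List, Tuple


def decode_danmaku(payload: bytes) -> List[Tuple[float, str]]:
    """Inflate a raw-deflate danmaku XML payload into (offset_seconds, text) pairs."""
    # Negative wbits = raw deflate stream without a zlib/gzip header (assumed format).
    xml_bytes = zlib.decompress(payload, -zlib.MAX_WBITS)
    root = ET.fromstring(xml_bytes)
    items = []
    for d in root.iter("d"):
        p = d.get("p", "")
        # Assumed: first field of p="..." is the offset from video start, in seconds.
        offset = float(p.split(",")[0]) if p else 0.0
        items.append((offset, d.text or ""))
    items.sort(key=lambda item: item[0])
    return items


def format_timestamp(seconds: float) -> str:
    """Render an offset as mm:ss for the report."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```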

Additional fix

  • Xiaohongshu login now waits for the creator backend to be ready, instead of closing as soon as an intermediate SSO cookie appears while phone verification is still pending.
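A readiness gate of this kind can be sketched as a polling loop. It assumes the caller supplies an `is_ready` check (for example, that the browser has reached the creator backend) rather than testing for a cookie. `wait_for_backend_ready` and its parameters are hypothetical. The injectable `clock` and `sleep` exist only to make the loop testable.

```python
import time
from typing import Callable


def wait_for_backend_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until is_ready() is true or timeout_s elapses.

    Returns True once the backend is ready and False on timeout. The login
    window stays open for the whole wait, so an intermediate SSO cookie or a
    pending phone verification does not end the login early.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```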

Validation

  • py_compile passes for all changed Python modules
  • WeChat Channels tested with two real creator accounts, including post metrics and comment/reply capture
  • Bilibili creator space automatically listed two real videos and matched the expected UP account
  • Bilibili single-video reports captured statistics, comments, replies, and available danmaku
  • run.sh generated report.md end to end on a Windows NAS-mapped path containing Chinese characters

Privacy and scope

  • no account cookies, creator data, signed URLs, local paths, or debug captures are included
  • .auth-wechat-channels/ and .auth-bilibili/ are gitignored
  • adapters are intended for creators reading their own backend/public content data

@Jooonnn left a comment

Contributor


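Thanks @NEYMAR946 for this PR. The wechat-channels adapter itself is a genuine contribution. Its architecture lines up with douyin-session, and the Privacy section is thoughtful (no persisting of internal usernames or signed URLs). The cheat-retro integration is also complete. But the PR currently bundles 3 things, and the bilibili part has a bigger impact than the description suggests. It needs to be split up and clarified before it can be merged.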

🔴 1. bilibili-stat is a rewrite, not an extension

The PR description says:

"removes mandatory httpx dependency from single-video public retrieval"

That reads as "httpx becomes optional". But the actual change to requirements.txt is:

```diff
-httpx>=0.27
+playwright>=1.44
```

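httpx is removed outright, and crawler.py changes by ~300 lines. This is the exact opposite of the USP #41 had when it was merged just recently. #41's selling point was "pure httpx: clone it, pip install httpx, and it works" (~10MB), an order of magnitude lighter than the other adapters.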

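The PR removes this lightweight path instead of keeping it as an optional mode. Users who only want single-video public data are the audience for the Bilibili USP, and they are now forced to install Playwright Chromium (~500MB).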

Two possible fixes; pick one:

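  • A (recommended): split the bilibili changes out of this PR into a separate PR. That PR should state explicitly that it is an architectural rewrite, so the maintainers can review it independently.
  • B: keep bilibili here but actually make "httpx optional" true. requirements.txt lists both httpx and playwright, and run.sh detects what is installed. The single-video path keeps using httpx, and the creator-space path uses Playwright. #41's "pure httpx, pip install httpx" install story must be preserved.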
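Option B's dispatch could be sketched like this. The code probes for each optional dependency and requires Playwright only on the creator-space path. The mode names and `choose_backend` are hypothetical; only `importlib.util.find_spec` is the standard library.

```python
import importlib.util
from typing import Callable


def module_available(name: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_backend(mode: str, available: Callable[[str], bool] = module_available) -> str:
    """Pick the fetch backend for a hypothetical bilibili-stat mode name."""
    if mode == "single-video":
        # Lightweight path from #41: public data over plain HTTP, no browser.
        if available("httpx"):
            return "httpx"
        raise RuntimeError("single-video mode needs httpx: pip install httpx")
    if mode == "creator-space":
        # Logged-in creator space needs a real browser session.
        if available("playwright"):
            return "playwright"
        raise RuntimeError(
            "creator-space mode needs Playwright: "
            "pip install playwright && playwright install chromium"
        )
    raise ValueError(f"unknown mode: {mode!r}")
```

With only httpx installed, the single-video path keeps working. The creator-space path fails with an actionable message, which is the split Option B asks for.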

🟠 2. The PR bundles 3 things; I suggest splitting them

The current PR changes all of the following at once:

  1. NEW wechat-channels adapter (7 files)
  2. bilibili-stat rewrite (7 files, affects #41 users)
  3. small xhs-explore fix (login readiness wait, 1 file)

Each of the 3 changes has a different risk surface, rollback boundary and review focus. Suggested split:

  • PR A: wechat-channels (new, clean, low risk)
  • PR B: bilibili rework (handled as described in issue 1)
  • PR C: small xhs-explore fix (trivial)

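In particular, PR C touches the same file as #46 (already merged). Once it is split out and rebased, it will be clear whether the change is trivial or has a real conflict.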

🟠 3. wechat-channels is not wired into cheat-init Q2.1

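In skills/cheat-init/SKILL.md, Q2.1 ("Which platform is your content mainly on?") currently offers Douyin / Xiaohongshu / YouTube / Bilibili / Other. This PR adds wechat-channels to the cheat-retro table but not to the Q2.1 platform options. As a result, new users will not see a "WeChat Channels" option during init.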

#41 changed both cheat-init Q2.1 and the state.enabled_perf_adapters mapping. The split-out PR A (wechat-channels) should follow that pattern and add this integration.
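A hypothetical sketch of the Q2.1-to-adapter mapping follows. The answer keys are invented English slugs. Storing adapter directory names in `state.enabled_perf_adapters` is an assumption, and so is treating YouTube and Other as unmapped (manual entry) here. None of this comes from the repository.

```python
from typing import Dict, Iterable, List

# Hypothetical keys for Q2.1 answers; values are adapter directory names
# mentioned in this thread. YouTube / Other are left unmapped (manual entry).
PLATFORM_ADAPTERS: Dict[str, str] = {
    "douyin": "douyin-session",
    "xiaohongshu": "xhs-explore",
    "bilibili": "bilibili-stat",
    "wechat_channels": "wechat-channels",  # the option PR A would add
}


def enabled_perf_adapters(platforms: Iterable[str]) -> List[str]:
    """Map Q2.1 answers to a de-duplicated, order-preserving adapter list."""
    enabled: List[str] = []
    for platform in platforms:
        adapter = PLATFORM_ADAPTERS.get(platform)
        if adapter is not None and adapter not in enabled:
            enabled.append(adapter)
    return enabled
```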

🟡 4. Double-check the TOS signals for wechat-channels

The PR description says:

"avoids persisting internal usernames and signed media URLs"

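Good that you're aware of this. But WeChat is a closed ecosystem, and its TOS is stricter than Douyin's or Xiaohongshu's. I suggest adding a separate "TOS and risks" section to the wechat-channels README that states explicitly: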

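  • users fetching their own WeChat Channels backend data (for personal use) is a matter between the user and WeChat; cheat-on-content does not endorse it
  • no scraping of other people's posts / no bulk crawling / no bypassing login
  • a reminder for users to read the WeChat Channels Assistant user agreement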

This follows the same pattern as the "TOS risks" block in the douyin-session README.


Recommended path

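Split the PR into three: A, B and C. I expect A (wechat-channels) and C (the small xhs-explore fix) can be merged directly. B (bilibili) would go through a separate review as described in issue 1. Squashing everything into one PR is also fine, but the bilibili part must keep the lightweight httpx path, in line with #41's design intent. Please don't turn bilibili into an implementation as heavy as the other adapters.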

Your call on how to proceed.

@Jooonnn

Jooonnn commented Jun 16, 2026

Contributor

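@NEYMAR946 I see the PR was force-pushed (head moved to `85865ef`). However, the file list and `bilibili-stat/requirements.txt` (still a one-line `httpx → playwright` replacement) are exactly the same as at my last review. So the 4 concerns from that review still stand:
- bilibili is a rewrite, not an extension
- the 3 changes should be split into 3 PRs
- cheat-init Q2.1 is missing WeChat Channels
- the wechat README should get a TOS section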

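Was the force-push just a rebase with no change in scope, or are you planning to address these in a follow-up? Let me know your intent so I can decide whether to keep waiting or close as won't-address.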

The bilibili part matters most. I'd like #41's lightweight "`pip install httpx` and it works" path kept as the fallback for single-video public data.

@Jooonnn

Jooonnn commented Jun 18, 2026

Contributor

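Thanks for the contribution! The new wechat-channels adapter and the bilibili extension look good overall. The security posture is clean: auth directories are gitignored, no secrets/cookies/__pycache__/real metrics are committed, and shell variables are quoted. However, I suggest first rebasing onto current main and then re-applying the recently merged fixes. Otherwise, this PR will silently revert bugfixes that have already been merged. Please address the following before merging: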

  1. The PR currently conflicts with main because it rewrites code based on an old main. Please rebase onto origin/main.
  2. `adapters/perf-data/bilibili-stat/crawler.py`: `fetch_comments(video["aid"], …)` is missing the `aid is None` guard. This reverts the fix from #49 (fix: full-repo code review, fixing 34 bugs across Python / Shell / protocol / config).
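  3. `adapters/perf-data/bilibili-stat/run.sh` and `adapters/perf-data/wechat-channels/run.sh` still use the old `dirname`/`dirname`/`realpath` project-root lookup. #49 standardized this on searching upward for `.cheat-state.json`. The new lookup does not depend on nesting depth, validates the file, and exits with code 3 if none is found. The new adapters should follow this convention too.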
  4. `adapters/perf-data/xhs-explore/crawler.py`: `ensure_login` overlaps with the rewrite in #46 (feat(xhs-explore): public-page fallback, archive/summarize, image download). It needs a manual merge that keeps both behaviors: main's QR refresh/Chrome fallback and this PR's backend-readiness gate.
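  5. `adapters/perf-data/wechat-channels/crawler.py`: an unknown `post_id` returns an all-zero stub, which writes a zero-valued `report.md` and exits 0. `/cheat-retro` then treats the run as a success and never falls back to manual entry. Unknown IDs should return a non-zero exit code.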
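For item 3, the upward search for the project root could look like the Python sketch below. The real convention from #49 lives in run.sh and may differ in detail. Two choices here are assumptions: the search starts from the current directory, and an unreadable or malformed `.cheat-state.json` means "keep searching". `find_project_root` is a hypothetical name.

```python
import json
import sys
from pathlib import Path
from typing import Optional

STATE_FILE = ".cheat-state.json"
EXIT_NO_PROJECT_ROOT = 3  # matches the "exit 3 if not found" convention from #49


def find_project_root(start: Path) -> Optional[Path]:
    """Walk from start up to the filesystem root; return the first dir with a valid state file."""
    for directory in [start, *start.parents]:
        candidate = directory / STATE_FILE
        if not candidate.is_file():
            continue
        try:
            json.loads(candidate.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Assumption: an unreadable or malformed file does not count; keep looking.
            continue
        return directory
    return None


def main() -> int:
    start = Path.cwd().resolve()
    root = find_project_root(start)
    if root is None:
        print(f"error: no {STATE_FILE} found in {start} or any parent", file=sys.stderr)
        return EXIT_NO_PROJECT_ROOT
    print(root)
    return 0
```

Unlike `dirname`/`dirname`, this search does not care how deeply the adapter directory is nested under the content project.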
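For item 5, the sketch below fails loudly on an unknown `post_id` instead of writing an all-zero report. `build_report`, the metrics shape and exit code 4 are hypothetical. Code 4 is picked only to stay clear of argparse's usage-error code 2 and of the exit code 3 used for a missing project root.

```python
import sys
from typing import Dict, Optional, Tuple

EXIT_OK = 0
EXIT_POST_NOT_FOUND = 4  # hypothetical; any non-zero code lets /cheat-retro fall back


def build_report(post_id: str, posts: Dict[str, Dict[str, int]]) -> Tuple[int, Optional[str]]:
    """Return (exit_code, report_markdown); no report for an unknown post_id."""
    metrics = posts.get(post_id)
    if metrics is None:
        # Do not emit an all-zero stub: a zero-valued report.md with exit 0
        # looks like success to /cheat-retro.
        print(f"error: post_id {post_id!r} not found in recent posts", file=sys.stderr)
        return EXIT_POST_NOT_FOUND, None
    lines = [f"# Post {post_id}", ""]
    lines += [f"- {name}: {value}" for name, value in metrics.items()]
    return EXIT_OK, "\n".join(lines) + "\n"
```

The caller would write `report.md` only when a report comes back, and otherwise pass the returned code to `sys.exit`.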


Labels

None yet

Projects

None yet


2 participants