[codex] fix youshedubao import side effect #2

Draft
datehoer wants to merge 1 commit into main from codex/fix-youshedubao-import
datehoer commented Jun 9, 2026

Owner

Summary

Fix the hotToday spider startup failure caused by youshedubao doing network work during module import.

Root Cause

task.py imports get_youshedubao_data, but youshedubao.py also called print(get_youshedubao_data()) at module top level. When the upstream page started returning JS-style escaped single quotes such as Sam\'s, import-time parsing raised JSONDecodeError and stopped the entire hourly spider run before other sources could update.
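The parsing failure described above can be reproduced in isolation. The snippet below is a minimal sketch, not the project's code: the sample payload and the helper name `is_valid_json` are made up for illustration. It shows that a backslash followed by a single quote is not a legal JSON escape, so `json.loads` rejects it.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if json.loads accepts the text (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Illustrative payload only; the real upstream response is not shown in this PR.
# The raw string keeps the backslash, so the JSON text contains \' literally.
js_style = r'{"title": "Sam\'s Club"}'
plain = '{"title": "Sam\'s Club"}'  # normal literal: the JSON text contains just '

print(is_valid_json(js_style))  # JSON does not define a \' escape
print(is_valid_json(plain))
```

Because the original call ran at module top level, an exception like this one would surface during `import` rather than inside the spider's own error handling.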

Changes

  • Move the debug execution behind if __name__ == "__main__" so importing the module has no side effects.
  • Handle escaped single quotes before JSON parsing.
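The two changes above can be sketched together as follows. This is an illustration under assumptions rather than the actual contents of youshedubao.py: `unescape_js_quotes`, `parse_payload`, `main`, and the sample string are hypothetical names and data, and only the chained `.replace()` calls mirror the diff in this PR.

```python
import json


def unescape_js_quotes(raw: str) -> str:
    """Undo JS-style escapes with the same chained replaces as the PR diff."""
    return raw.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")


def parse_payload(raw: str) -> dict:
    """Unescape the JS string layer, then parse the result as JSON."""
    return json.loads(unescape_js_quotes(raw))


def main() -> None:
    # Hypothetical sample standing in for the escaped upstream payload.
    sample = r'''{\"title\": \"Sam\'s Club\"}'''
    print(parse_payload(sample))


# Debug execution only runs when the file is executed directly,
# so `from module import parse_payload` triggers no parsing or network work.
if __name__ == "__main__":
    main()
```

With the guard in place, a caller such as task.py can import the function without running the debug code, and any parsing error is raised only when the function is actually called.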

Validation

  • git diff --cached --check
  • docker exec hottoday-spider python -m py_compile /app/youshedubao/youshedubao.py
  • docker exec hottoday-spider python -c 'from youshedubao.youshedubao import get_youshedubao_data; data=get_youshedubao_data(); print(len(data["data"]))' returned 6.
  • Manually ran the spider after the fix and refreshed /rank/hot; API returned updated data with latest timestamp 2026-06-08 15:11:38.

gemini-code-assist (bot) left a comment

Code Review

This pull request updates the youshedubao.py script to handle escaped single quotes in the JSON string and wraps the execution entry point in an if __name__ == "__main__": block. The review feedback points out that using chained .replace() calls to unescape strings is fragile and suggests a more robust approach using codecs.escape_decode.

Comment on lines +26 to +28
uisdc_news = json.loads(
    uisdc_news.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")
)

medium

Using chained .replace() calls to unescape JS/JSON strings is fragile and can lead to corruption. For example, if the string contains a literal backslash followed by a single quote, the first replacement of double backslashes with a single backslash will convert it, and the subsequent replacement of escaped single quotes will incorrectly strip the backslash.

A more robust and standard way to unescape all JS/Python-style escape sequences (like escaped quotes, backslashes, newlines, etc.) in a single pass is to use codecs.escape_decode.

Suggested change
-uisdc_news = json.loads(
-    uisdc_news.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")
-)
+import codecs
+uisdc_news = json.loads(
+    codecs.escape_decode(uisdc_news.encode("utf-8"))[0].decode("utf-8")
+)
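The corruption case raised in the review comment can be checked with a small experiment. The sketch below compares the chained replaces from the diff with a single-pass regular expression. The regex version is an alternative shown for illustration, not the reviewer's suggestion or the author's code, and the function names and sample input are hypothetical. It handles only the three escapes the diff touches (`\"`, `\\`, `\'`).

```python
import re


def unescape_chained(text: str) -> str:
    """Chained replaces, in the same order as the PR diff."""
    return text.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")


# One backslash followed by a double quote, single quote, or backslash.
_ESCAPE = re.compile(r"""\\(["'\\])""")


def unescape_single_pass(text: str) -> str:
    """Scan once, so output of one substitution is never re-read as an escape."""
    return _ESCAPE.sub(lambda m: m.group(1), text)


# Escaped form of: literal backslash immediately followed by a single quote.
escaped = r"C:\\'quoted'"

print(repr(unescape_chained(escaped)))      # the backslash is lost
print(repr(unescape_single_pass(escaped)))  # the backslash survives
```

Whether this edge case matters depends on the upstream data, which the PR does not show. Note also that `codecs.escape_decode` does not appear in the documented `codecs` module API as far as I know, so a small explicit unescape like the one above may be easier to reason about if only a few escapes need handling.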
