Draft
9 changes: 7 additions & 2 deletions youshedubao/youshedubao.py
@@ -23,6 +23,11 @@ def get_youshedubao_data():
     doc = pyquery.PyQuery(res.content)
     items = doc(".news-main>script").text()
     uisdc_news = re.findall('var uisdc_news = "(.*?)";', items)[0]
-    uisdc_news = json.loads(uisdc_news.replace('\\"', '"').replace("\\\\", "\\"))
+    uisdc_news = json.loads(
+        uisdc_news.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")
+    )
Comment on lines +26 to +28

medium

Chaining .replace() calls to unescape a JS/JSON string is fragile and can corrupt the data, because each call rescans the output of the previous one. For example, if the string contains an escaped backslash followed by a single quote (\\'), the second replacement collapses the double backslash into a single backslash, and the third replacement then treats the resulting \' as an escaped quote and incorrectly strips the backslash.

A more robust way to unescape Python-style escape sequences (escaped quotes, backslashes, newlines, etc.) in a single pass is codecs.escape_decode. Note that it operates on bytes and is an undocumented helper in the codecs module.
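The failure mode described above can be reproduced in a few lines. This is a minimal sketch: `chained_unescape` is a hypothetical name that wraps the same three replacements as the patched line, and the input string is made up for illustration.

```python
def chained_unescape(s: str) -> str:
    # Same three replacements, in the same order, as the patched line.
    return s.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")


# Escaped text containing an escaped backslash followed by a plain single quote.
# The actual characters are: C : \ \ ' x
escaped = "C:\\\\'x"

# Step 2 turns the double backslash into one backslash, leaving \' behind;
# step 3 then removes that backslash as if it were escaping the quote.
result = chained_unescape(escaped)
print(result)
```

A correct unescape would keep one backslash in front of the quote; here the backslash disappears entirely.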

Suggested change
-    uisdc_news = json.loads(
-        uisdc_news.replace('\\"', '"').replace("\\\\", "\\").replace("\\'", "'")
-    )
+    import codecs
+    uisdc_news = json.loads(
+        codecs.escape_decode(uisdc_news.encode("utf-8"))[0].decode("utf-8")
+    )
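A standalone sketch of the suggested approach, for trying it outside the scraper. The helper name `unescape_js_string` and both sample strings are assumptions made for illustration; they are not taken from the page being scraped. Escape sequences that Python bytes literals do not define (for example \uXXXX) are not covered by this sketch.

```python
import codecs
import json


def unescape_js_string(raw: str) -> str:
    """Hypothetical helper: undo backslash escapes in one left-to-right pass."""
    # escape_decode works on bytes and returns (decoded_bytes, consumed_length).
    decoded, _ = codecs.escape_decode(raw.encode("utf-8"))
    return decoded.decode("utf-8")


# Made-up sample of what the regex might capture from the page's script tag.
raw = r'[{\"title\":\"优设 It\'s \\\"ok\\\"\"}]'
news = json.loads(unescape_js_string(raw))

# An escaped backslash followed by a quote keeps its backslash here.
tricky = unescape_js_string("C:\\\\'x")
```

Because every escape is consumed exactly once, the result does not depend on the order in which individual sequences are handled.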

     return {"data": uisdc_news}
-print(get_youshedubao_data())
+
+
+if __name__ == "__main__":
+    print(get_youshedubao_data())