WaimaiAnalyzer is a machine-learning-powered sentiment analysis tool for Chinese food delivery (外卖, waimai) reviews. Enter a review in any language, optionally have it translated to Chinese automatically, and the app tells you whether the sentiment is positive 😋 or negative 😡, complete with a confidence percentage.
- 🌏 Multi-language support: Accepts reviews in any language and auto-translates them to Chinese (Simplified) via Google Translate before analysis.
- ✂️ Chinese text segmentation: Uses Jieba to tokenise Chinese text the same way the training data was tokenised.
- 🤖 Naive Bayes classifier: A lightweight, fast `CountVectorizer` + `MultinomialNB` pipeline trained on ~10,000 real-world Waimai reviews.
- 📊 Confidence scores: Shows the exact probability for both positive and negative predictions so you always know how certain the model is.
- 🎈 Visual feedback: Balloons for positive reviews, clear error styling for negative ones.
- ⚡ Streamlit web UI: A clean, interactive browser-based interface that anyone can use without writing a single line of code.
| Positive Review | Negative Review |
|---|---|
| 😋 Positive! (92.3% confident) | 😡 Negative! (87.1% confident) |
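
The verdict strings in the table above could be produced from the classifier's two class probabilities along these lines. This is a minimal sketch, not the app's actual code: `format_result` is a hypothetical helper, and the `[P(negative), P(positive)]` ordering is an assumption based on the 0/1 labels.

```python
def format_result(probabilities):
    """Turn [P(negative), P(positive)] into a one-line verdict string.

    The helper name and the input ordering are assumptions for illustration.
    """
    p_negative, p_positive = probabilities
    if p_positive >= p_negative:
        return f"😋 Positive! ({p_positive:.1%} confident)"
    return f"😡 Negative! ({p_negative:.1%} confident)"


print(format_result([0.077, 0.923]))
print(format_result([0.871, 0.129]))
```

The confidence shown is simply the probability of the winning class, so a value close to 50% signals a review the model finds ambiguous.
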

    WaimaiAnalyzer/
    ├── app.py                # Streamlit web application
    ├── main.py               # Model training script
    ├── requirements.txt      # Python dependencies
    ├── data/
    │   └── waimai_10k.csv    # ~10,000 labelled Waimai reviews (0 = negative, 1 = positive)
    └── model/
        └── waimai_model.pkl  # Trained Naive Bayes pipeline (generated by main.py)


Clone the repository and install the dependencies:

    git clone https://github.com/Flo1632/WaimaiAnalyzer.git
    cd WaimaiAnalyzer
    pip install -r requirements.txt

Train the model:

    python main.py

This reads `data/waimai_10k.csv`, trains the classifier, prints the accuracy, and saves the model to `model/waimai_model.pkl`.

Start the web app:

    streamlit run app.py

Open your browser at http://localhost:8501 and start analysing reviews!
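
As a rough illustration of the training step described above, the sketch below fits a `CountVectorizer` + `MultinomialNB` pipeline and round-trips it through `joblib`. It is not the project's training script. The four toy reviews are made up and already split into words, `str.split` stands in for Jieba, and the accuracy is measured on the training data only, so treat it as a sketch under those assumptions.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up, pre-segmented reviews (assumption); the real data lives in the CSV.
reviews = ["味道 很 好", "送餐 很 快", "味道 很 差", "送餐 太 慢"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# str.split is a whitespace stand-in for a Jieba-based tokenizer (assumption).
model = make_pipeline(
    CountVectorizer(tokenizer=str.split, token_pattern=None, lowercase=False),
    MultinomialNB(),
)
model.fit(reviews, labels)

# Accuracy on the training data only; a real script would hold out a test split.
train_accuracy = accuracy_score(labels, model.predict(reviews))
print(f"Training accuracy: {train_accuracy:.2f}")

# Save the fitted pipeline and load it back, as a web app would at start-up.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "waimai_model.pkl"
    joblib.dump(model, model_path)
    reloaded = joblib.load(model_path)

print(reloaded.predict(["味道 很 好"]))
```

Saving the whole pipeline rather than the classifier alone keeps the vocabulary and the model together, so prediction code only needs to load one file.
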
- Input: The user types a review (any language) into the text field.
- Translation: If the auto-translate checkbox is enabled, the text is sent to Google Translate and converted to Simplified Chinese.
- Tokenisation: Jieba splits the Chinese text into individual word tokens (e.g. `"外卖很棒"` → `["外卖", "很", "棒"]`).
- Vectorisation: `CountVectorizer` converts the token list into a bag-of-words feature vector.
- Classification: `MultinomialNB` predicts the sentiment label (`0` = negative, `1` = positive) and returns class probabilities.
- Display: The app shows the result, confidence score, and a per-class probability breakdown.
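
To make the vectorisation and classification steps above more concrete, here is a from-scratch sketch of what a bag-of-words multinomial Naive Bayes model computes. It deliberately does not use scikit-learn and is not the app's code. The function names and the toy reviews are hypothetical, and add-one smoothing is assumed (it matches the default `alpha=1.0` of `MultinomialNB`).

```python
import math
from collections import Counter


def train_naive_bayes(token_lists, labels):
    """Count tokens per class; return what is needed to score new reviews."""
    token_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for tokens, label in zip(token_lists, labels):
        token_counts[label].update(tokens)  # bag-of-words counts per class
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, class_counts, vocabulary


def predict_proba(tokens, token_counts, class_counts, vocabulary):
    """Return {label: probability} using add-one (Laplace) smoothing."""
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, counts in token_counts.items():
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(class_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok in vocabulary:  # words never seen in training are skipped
                score += math.log((counts[tok] + 1) / denominator)
        log_scores[label] = score
    # Normalise the log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Made-up, pre-tokenised reviews (1 = positive, 0 = negative).
train_tokens = [["外卖", "很", "棒"], ["味道", "很", "好"],
                ["外卖", "很", "差"], ["味道", "不", "好"]]
train_labels = [1, 1, 0, 0]
model = train_naive_bayes(train_tokens, train_labels)

probabilities = predict_proba(["外卖", "很", "棒"], *model)
prediction = max(probabilities, key=probabilities.get)
print(prediction, probabilities)
```

Working in log space avoids numerical underflow on long reviews. The final normalisation is what turns the raw scores into the per-class probabilities the app displays.
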
| Package | Purpose |
|---|---|
| `streamlit` | Interactive web UI |
| `pandas` | Data loading and manipulation |
| `scikit-learn` | ML pipeline (`CountVectorizer` + `MultinomialNB`) |
| `joblib` | Model serialisation |
| `jieba` | Chinese text segmentation |
| `deep-translator` | Automatic translation via Google Translate |

Install all dependencies with:

    pip install -r requirements.txt

The model is trained on `waimai_10k.csv`, a publicly available dataset of approximately 10,000 Chinese food-delivery reviews labelled as:

- `1`: Positive review
- `0`: Negative review

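A quick way to check the label balance of a file in this format is sketched below using only the standard library. The `label` and `review` column names are an assumption about the CSV header, and the three inline rows are made-up stand-ins so the snippet runs without the real file.

```python
import csv
import io
from collections import Counter

# Inline stand-in for data/waimai_10k.csv; header names and rows are assumed.
sample_csv = io.StringIO(
    "label,review\n"
    "1,味道很好，送餐也快\n"
    "0,等了两个小时，饭都凉了\n"
    "1,分量足，下次还点\n"
)

rows = list(csv.DictReader(sample_csv))
label_counts = Counter(int(row["label"]) for row in rows)

print(f"{len(rows)} reviews")
print(f"positive: {label_counts[1]}, negative: {label_counts[0]}")
```

To run it against the real dataset, replace `sample_csv` with `open("data/waimai_10k.csv", encoding="utf-8", newline="")`, adjusting the column names if the header differs.
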
Contributions, ideas, and feedback are welcome! Feel free to open an issue or submit a pull request.
This project is open-source. See the repository for license details.