Data Mining Class Exercise 2 for Olga, Simon and Fabian
scripts: includes all R scripts needed to reproduce this project
output: contains the outputs generated by the R scripts, including the knitted HTML report
- Set up API: Simon
- Create corpus of Guardian articles on the company Amazon: Fabian
- Note: the NER classifier is not 100% accurate, so the corpus also includes mentions of the Amazon rainforest (see word cloud)
- Sentiment analysis of corpus and 2-3 sentences on the analysis: Simon
- Word cloud and/or word frequencies of corpus and 2-3 sentences on the analysis: Olga
- Topic modelling of corpus and 2-3 sentences on the analysis: Fabian
- Create final report: Fabian
Andrea's Feedback from CE1
You worked reproducibly using advanced features of GitHub (e.g., the todo list!). The substantive part (the idea, the research question...) is usually not considered in this seminar, but in your case it is really well-developed and so it boosted the grade a bit. It could lead to a research paper. If you want to work on it together, I am willing to supervise. Excellent! You missed the 6.0 grade because you did not use issues and had only one PR; ideally each one of you would have made one. Also, you could do a few more commits to practice (your commit count is below average).