R code group project for the Data Mining course STAT5703 at Carleton University, dated 1 April 2017.
It identifies the important words and word-pairs in a collection of documents.
Abiola Smith: Responsible for finding appropriate text mining techniques and providing ideas for the R code.
Brendan Maher: Responsible for the R code and the analysis of the resulting plots and tables.
Deepesh Khaneja: Responsible for finding relevant content and preparing the presentation and report.
The results, graphs, and tables that the R code generates from the Excel file are shown in the PDF.
Term weights are computed as TermFrequency/DocumentFrequency, and word-pairs work just as well as word correlation.
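
As a rough illustration of the weighting described above, here is a minimal base R sketch. The toy corpus, the variable names, and the reading of "word-pairs" as adjacent word pairs are all assumptions made for the example; the project itself reads its documents from the Excel file and may tokenize and pair words differently.

```r
# Toy corpus (hypothetical; not the project's data).
docs <- c(
  d1 = "data mining finds patterns in data",
  d2 = "text mining finds important words",
  d3 = "important words describe text data"
)

# Tokenize: lower-case, split on non-letters, drop empty strings.
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(x) x[x != ""])
names(tokens) <- names(docs)

# Term frequency: one row per document, one column per vocabulary term.
vocab <- sort(unique(unlist(tokens)))
tf <- t(vapply(
  tokens,
  function(x) tabulate(match(x, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(tf) <- vocab

# Document frequency: number of documents that contain each term.
df <- colSums(tf > 0)

# Term weight = TermFrequency / DocumentFrequency (column-wise division).
weight <- sweep(tf, 2, df, "/")

# Word-pairs, assumed here to mean adjacent words within a document.
pairs <- unlist(lapply(tokens, function(x) {
  if (length(x) < 2) return(character(0))
  paste(x[-length(x)], x[-1])
}))
pair_counts <- sort(table(pairs), decreasing = TRUE)

print(round(weight, 2))
print(head(pair_counts))
```

Terms that occur often in one document but in few documents overall receive the largest weights, which is what makes them candidates for "important" words.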
Top-down clustering of the documents with K-means gives 3 clusters, while bottom-up hierarchical clustering gives 7 clusters.
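
The two clustering approaches can be sketched side by side in base R. This is only an illustration under stated assumptions: the random matrix stands in for the project's document-term weights, the cluster counts 3 and 7 are copied from the text above rather than derived, and Euclidean distance with Ward linkage is an assumed choice, not necessarily what the project used.

```r
set.seed(1)

# Hypothetical document-term weights: 30 documents by 8 terms.
dtm <- matrix(
  runif(30 * 8),
  nrow = 30,
  dimnames = list(paste0("doc", 1:30), paste0("term", 1:8))
)

# K-means: the number of clusters is fixed up front (3, as reported above).
km <- kmeans(dtm, centers = 3, nstart = 20)

# Hierarchical (bottom-up): merge the closest documents step by step,
# then cut the resulting tree into 7 groups (as reported above).
hc <- hclust(dist(dtm), method = "ward.D2")  # linkage is an assumption
hc_groups <- cutree(hc, k = 7)

# Cross-tabulate the two assignments to see how the groupings relate.
print(table(kmeans = km$cluster, hierarchical = hc_groups))
```

Calling `plot(hc)` draws the dendrogram, which is one common way to judge where to cut the tree.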
Each document is assigned a main topic, but there are not enough unique terms to account for the number of publishers.
BrendanMaher/TextMining