
BrendanMaher/TextMining

TextMining

Group project in R for the Data Mining course STAT5703 at Carleton University, 1st April 2017.
It was written to identify the important words and word pairs in a collection of documents.
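
As a minimal sketch of that first step, the Python below counts single words and adjacent word pairs (bigrams) across a small set of documents. The project itself was written in R. The tokenizer (lowercase letter runs only), the helper names and the example sentences are assumptions for illustration, not the project's actual preprocessing.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase the text and keep only runs of letters.
    return re.findall(r"[a-z]+", text.lower())

def count_words_and_pairs(docs):
    """Count single words and adjacent word pairs over all documents."""
    words, pairs = Counter(), Counter()
    for doc in docs:
        tokens = tokenize(doc)
        words.update(tokens)
        # Pairs are formed only within a document, never across documents.
        pairs.update(zip(tokens, tokens[1:]))
    return words, pairs

docs = [
    "Text mining finds important words.",
    "Text mining also finds word pairs.",
]
words, pairs = count_words_and_pairs(docs)
print(words.most_common(3))
print(pairs.most_common(1))
```

Raw counts like these are only a starting point; frequent but uninformative words are usually down-weighted before ranking.
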
Abiola Smith: Responsible for finding appropriate text mining techniques and providing ideas for the R code.
Brendan Maher: Responsible for the R code and the analysis of the plots and tables produced.
Deepesh Khaneja: Responsible for finding relevant content and preparing the presentation and report.
The results, graphs, and tables that the R code generates from the Excel file are shown in the PDF.
Terms are weighted by term frequency relative to document frequency (TF-IDF), and word pairs work just as well as word correlations.
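
A minimal sketch of this kind of weighting, assuming the common log form w = tf × log(N / df), where N is the number of documents and df is the number of documents containing the term. The project's exact formula may differ (for example, a plain tf / df ratio). The hypothetical helper `term_correlation` computes the Pearson correlation of two terms' raw counts across documents, which is one common way to measure word association. The documents are made-up token lists.

```python
import math
from collections import Counter

def tf_idf(docs_tokens):
    """Weight each term in each document by tf * log(N / df)."""
    n_docs = len(docs_tokens)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    weights = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

def term_correlation(docs_tokens, a, b):
    """Pearson correlation of the counts of terms a and b across documents."""
    xs = [tokens.count(a) for tokens in docs_tokens]
    ys = [tokens.count(b) for tokens in docs_tokens]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A term with constant counts has no variance, so report zero correlation.
    return cov / (sx * sy) if sx and sy else 0.0

docs = [
    ["stock", "market", "stock", "price"],
    ["football", "match", "score"],
    ["stock", "market", "crash"],
]
w = tf_idf(docs)
for term in ("stock", "market", "price"):
    print(term, round(w[0][term], 3))
print("corr(stock, market) =", round(term_correlation(docs, "stock", "market"), 3))
```

In the sample data, "price" occurs once and only in the first document, so it outweighs "stock", which occurs twice there but also appears in another document.
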
Clustering the documents with k-means (a partitional method) gives 3 clusters, while bottom-up hierarchical clustering gives 7 clusters.
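
To illustrate the two approaches, here is a small stdlib Python sketch on 2-D points standing in for document vectors. This is an assumption: real inputs would be rows of TF-IDF weights. The hypothetical `kmeans` is plain Lloyd's algorithm seeded with the first k points for reproducibility. The hypothetical `agglomerative` starts with one cluster per point and repeatedly merges the closest pair under average linkage until k clusters remain. The cluster counts in the README came from the project's own analysis, not from this code.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    # Seed with the first k points (real runs usually use random restarts).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def agglomerative(points, k):
    # Bottom-up: every point starts in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def avg_link(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the closest pair of clusters under average linkage.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (21, 0)]
print("k-means:     ", kmeans(points, 3))
print("hierarchical:", agglomerative(points, 3))
```

Unlike k-means, the agglomerative procedure builds a full merge history, so the same run can be cut at any number of clusters, which is one reason the two methods can settle on different counts.
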
Each document is assigned a main topic, but there are not enough distinct terms to account for the number of publishers.
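
One simple way to give each document a main topic is to take its highest-weighted term. The sketch below assumes that approach; the project may have used another method, such as a topic model. The hypothetical `main_terms` picks that term, and comparing the number of distinct main terms with the number of distinct publishers shows the kind of shortfall described above. The weights and publisher names are made up.

```python
from collections import Counter

def main_terms(weights):
    """Return the highest-weighted term of each document (None if empty).

    Ties are broken alphabetically, because max() keeps the first maximum
    it sees while scanning the sorted terms.
    """
    return [max(sorted(w), key=w.get) if w else None for w in weights]

# Hypothetical per-document term weights (e.g. TF-IDF) and publishers.
weights = [
    {"stock": 0.8, "price": 1.1},
    {"football": 1.1, "score": 1.1},
    {"stock": 0.4, "crash": 1.1},
    {"price": 0.9, "market": 0.2},
]
publishers = ["Pub A", "Pub B", "Pub C", "Pub D"]

main = main_terms(weights)
topic_counts = Counter(main)
n_topics = len(topic_counts)
n_publishers = len(set(publishers))
print("main terms:", main)
print(f"{n_topics} distinct main terms for {n_publishers} publishers")
```
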
