TextMining

R code group project for the Data Mining course STAT5703 at Carleton University, dated 1 April 2017.
It identifies the important words and word pairs in a collection of documents.
Abiola Smith: Responsible for finding appropriate text mining techniques and providing ideas for the R code.
Brendan Maher: Responsible for the R code and the analysis of the plots and tables produced.
Deepesh Khaneja: Responsible for finding relevant content and preparing the presentation and report.
The results, graphs, and tables that the R code generates from the Excel file are shown in the PDF.
Term weights are computed as term frequency divided by document frequency (TF/DF), and word pairs work just as well as word correlation.
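
As a rough illustration of the weighting described above, here is a minimal base R sketch. It is not taken from TextMining.R: the toy corpus, the variable names, and the reading of "word pairs" as adjacent words within a document are all assumptions made for the example.

```r
# Toy corpus (hypothetical; not the project's data).
docs <- c(
  d1 = "data mining finds patterns in data",
  d2 = "text mining finds important words",
  d3 = "important words and word pairs"
)

# Lowercase each document and split it into word tokens.
tokens <- strsplit(tolower(docs), "[^a-z]+")

# All distinct terms in the corpus.
vocab <- sort(unique(unlist(tokens)))

# Term-document count matrix: one row per term, one column per document.
tdm <- vapply(
  tokens,
  function(tk) as.integer(table(factor(tk, levels = vocab))),
  integer(length(vocab))
)
rownames(tdm) <- vocab

# Term frequency: total occurrences of each term across the corpus.
tf <- rowSums(tdm)
# Document frequency: number of documents that contain each term.
df <- rowSums(tdm > 0)

# Weight = term frequency / document frequency.
weight <- tf / df

# Word pairs, assumed here to mean adjacent words within a document.
pairs <- unlist(lapply(tokens, function(tk) paste(head(tk, -1), tail(tk, -1))))
pair_counts <- sort(table(pairs), decreasing = TRUE)

print(head(sort(weight, decreasing = TRUE)))
print(head(pair_counts))
```

With this weighting, a term that is repeated inside a few documents scores higher than a term that appears once in many documents.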
Clustering the documents top-down with k-means gives 3 clusters, while bottom-up hierarchical clustering gives 7 clusters.
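
The two clustering approaches can be sketched with base R's `kmeans` and `hclust`. This is only an illustration on made-up data: the matrix `x`, the Ward linkage, and the choice of 3 clusters for both methods are assumptions, and the cluster counts reported above come from the project's own data, not from this example.

```r
set.seed(1)  # k-means starts from random centers

# Hypothetical document-term weights: 9 documents, 4 terms, three clear groups.
x <- rbind(
  matrix(rnorm(12, mean = 0, sd = 0.1), ncol = 4),
  matrix(rnorm(12, mean = 3, sd = 0.1), ncol = 4),
  matrix(rnorm(12, mean = 6, sd = 0.1), ncol = 4)
)
rownames(x) <- paste0("doc", 1:9)

# K-means: partition the documents into a chosen number of clusters.
km <- kmeans(x, centers = 3, nstart = 10)

# Hierarchical (bottom-up, agglomerative) clustering on Euclidean distances.
hc <- hclust(dist(x), method = "ward.D2")

# Cut the tree to obtain a flat grouping.
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate the two groupings to see where they agree.
print(table(kmeans = km$cluster, hierarchical = hc_groups))
```

Calling `plot(hc)` draws the dendrogram, which is one common way to judge how many clusters a hierarchical result supports.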
Each document is assigned a main topic, but there are not enough unique terms to account for the number of publishers.

Files in the repository:
CSV_All_Texts.csv
README.md
STAT5703TextMiningReport.pdf
TextMining.R
