mmarinated/topic-modeling

topic-modeling

Inferring the topics of Wikipedia articles in different languages.
Capstone project, Fall 2019.
Poster ranked in the top 3 among 36 capstone teams.

Research directions

  • Improving the architecture of the currently deployed model for English articles.
    • bag-of-words models with fastText embeddings
    • LSTM, LSTM with self-attention, LSTM with IDF self-attention weights, Transformer
  • Transferring the model to articles in other languages (Hindi, Russian).
    • using fastText multilingual word embeddings, we compare a model trained only on English articles with a model trained on several languages simultaneously.
  • Exploring language-agnostic models based on links between articles.
    • bag-of-words model
    • graph CNN model (GraphSAGE)
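As a rough illustration of the bag-of-words direction, the sketch below averages word vectors into a single document vector and scores it with one independent sigmoid per topic. It can optionally weight each word by its inverse document frequency, in the spirit of the IDF weights listed above. This is a minimal sketch under stated assumptions, not the project's code. The task is assumed to be multi-label (an article may belong to several topics). The helper names `idf_weights`, `pool` and `predict_topics` are hypothetical. The 2-dimensional toy vectors stand in for real 300-dimensional fastText vectors, and the topic names and classifier weights are made up rather than trained. Because fastText's aligned multilingual vectors place words from different languages in a shared space, the same pooling could in principle be applied to Hindi or Russian tokens without changes.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Smoothed inverse document frequency for every token.

    documents: list of token lists. Uses log((1 + N) / (1 + df)) + 1,
    the same smoothing as scikit-learn's default TF-IDF.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each token once per document
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def pool(tokens, embeddings, weights=None):
    """Weighted average of the embeddings of in-vocabulary tokens.

    weights=None gives the plain bag-of-words mean; passing an IDF table
    makes rare tokens contribute more than frequent ones.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    norm = 0.0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        for i, x in enumerate(vec):
            total[i] += w * x
        norm += w
    return [x / norm for x in total] if norm > 0 else total


def predict_topics(doc_vec, weight_matrix, biases, threshold=0.5):
    """Multi-label output: an independent sigmoid score per topic."""
    scores = {}
    for topic, row in weight_matrix.items():
        logit = sum(w * x for w, x in zip(row, doc_vec)) + biases[topic]
        scores[topic] = 1.0 / (1.0 + math.exp(-logit))
    labels = {t for t, p in scores.items() if p >= threshold}
    return labels, scores


if __name__ == "__main__":
    # Hypothetical toy data: 2-d "embeddings" and hand-set topic weights.
    docs = [["the", "goal", "match"], ["the", "cell"], ["the", "match"]]
    emb = {"goal": [1.0, 0.0], "match": [0.8, 0.2],
           "cell": [0.0, 1.0], "the": [0.5, 0.5]}
    W = {"Sports": [4.0, -4.0], "Biology": [-4.0, 4.0]}
    b = {"Sports": 0.0, "Biology": 0.0}
    idf = idf_weights(docs)
    vec = pool(["the", "goal"], emb, idf)
    labels, _ = predict_topics(vec, W, b)
    print(labels)  # prints {'Sports'}
```

With IDF weighting, the frequent word "the" counts less than "goal", so the pooled vector moves toward the "goal" direction. In the LSTM variants, a learned attention layer would take the place of this fixed weighting.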

Report
Poster

Created by Marina Zavalina, Peeyush Jain, Sarthak Agarwal, Chinmay Singhal in Fall 2019.
Advisors: Isaac Johnson (Wikimedia Foundation), Anastasios Noulas (NYU CDS).
Project for DS-GA 1006, NYU Center for Data Science.
