Get a representative sample of a domain's articles for use in further studies, such as assessing the domain's credibility or any other type of analysis.
The project is made up of two stages:
- Size reduction: the number of articles is reduced based on the statistical theory of sampling from a limited (finite) population.
- Topic sampling: a representative sample is taken from each topic to ensure that the final sample is diverse and representative. Topics are identified with BERTopic.
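
The two stages above can be sketched roughly as follows. This is only an illustration, not the code in this repo: it assumes the size reduction uses Cochran's sample size formula with a finite population correction, and that topic sampling is a proportional random draw from each topic. The function names, the default parameters, and the plain list of topic labels (standing in for the output of a topic model such as BERTopic) are all hypothetical.

```python
import math
import random
from collections import defaultdict


def finite_population_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Assumed formula: Cochran's sample size with finite population correction.

    z is the z-score for the confidence level (1.96 for about 95%),
    margin is the accepted margin of error, p is the expected proportion.
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)  # size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)         # correction for a finite population
    return min(population, math.ceil(n))


def sample_per_topic(topic_labels, total, seed=0):
    """Hypothetical helper: draw from every topic in proportion to its size.

    topic_labels[i] is the topic of article i. Returns sorted article indices.
    Each topic contributes at least one article, so the result can be
    slightly larger than `total` when there are many small topics.
    """
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for index, topic in enumerate(topic_labels):
        by_topic[topic].append(index)

    chosen = []
    for indices in by_topic.values():
        share = max(1, round(total * len(indices) / len(topic_labels)))
        chosen.extend(rng.sample(indices, min(share, len(indices))))
    return sorted(chosen)


# Example: 10,000 articles spread over three made-up topics.
labels = [0] * 6000 + [1] * 3000 + [2] * 1000
target = finite_population_sample_size(len(labels))
sample = sample_per_topic(labels, target)
```

Taking at least one article per topic keeps small topics in the sample, at the cost of a sample that may not match the target size exactly.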
To use the domain sampler, clone this Git repo and follow these steps:
- In your terminal, run: pip install -r requirements.txt
- Create a new .py file, import the DomainSampler class, and use it.