RTT-app/collector-api


Collector-API: Reddit Data Extractor and Transformer API

This REST API provides two functions: extracting text data from Reddit through the Reddit API, and transforming that data. It uses Redis to store the raw data and NLTK for text transformation. The API runs in a container with Docker, and the provided Makefile manages the build steps.

Features

  • Data Extraction: Retrieve text data from Reddit using PRAW, the Python Reddit API Wrapper.

  • Data Transformation: Transform the extracted text data using NLTK for tasks such as text cleaning, tokenization, and more.

  • Data Storage: Utilize Redis as a data store to efficiently manage raw data.
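As a rough illustration of the transformation feature, the sketch below lowercases a text, tokenizes it with NLTK's `RegexpTokenizer`, and drops a few stop words. The function name `clean_and_tokenize`, the token pattern, and the tiny `STOP_WORDS` set are assumptions made for this example; the actual logic in the repository may differ.

```python
from nltk.tokenize import RegexpTokenizer

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to"}

# Regex-based tokenizer: needs no extra NLTK data downloads.
_tokenizer = RegexpTokenizer(r"[a-z0-9']+")


def clean_and_tokenize(text):
    """Lowercase the text, split it into tokens, and drop stop words."""
    tokens = _tokenizer.tokenize(text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(clean_and_tokenize("The API is great, and it's fast!"))
```

A regex tokenizer is used here only because it works without downloading NLTK corpora; a fuller pipeline could swap in other NLTK tokenizers or stop-word lists.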

Getting Started

Prerequisites

Before running the API, make sure Docker and Make are installed; the application runs in a container and is built through the provided Makefile.

You also need to create a .env file with the following environment variables:

REDIS_PORT ----> Port used to connect to Redis.
REDIS_HOST ----> Public IP address of the Redis host.
CLIENT_ID -----> Client ID provided by Reddit.
SECRET_TOKEN --> Secret token provided by Reddit.
USER_AGENT ----> User agent string that identifies the application to Reddit.
SUBREDDIT -----> Subreddit to extract data from.
N_POSTS -------> Number of most recent posts to extract.
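For illustration, the variables above could be read and validated in Python roughly as follows. This is a minimal sketch, not the repository's own code: the names `REQUIRED_VARS` and `load_config` are hypothetical, and treating REDIS_PORT and N_POSTS as integers is an assumption.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "REDIS_PORT",
    "REDIS_HOST",
    "CLIENT_ID",
    "SECRET_TOKEN",
    "USER_AGENT",
    "SUBREDDIT",
    "N_POSTS",
)


def load_config(env=None):
    """Return the settings as a dict, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED_VARS}
    # Assumption: these two values are numeric.
    config["REDIS_PORT"] = int(config["REDIS_PORT"])
    config["N_POSTS"] = int(config["N_POSTS"])
    return config
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.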

Running the API

  1. Clone this repository to your local machine:

    $ git clone https://github.com/yourusername/reddit-data-api.git
    $ cd reddit-data-api
  2. To run the application as a container, run:

    $ make run

Once the container is running, the Swagger documentation is available at localhost:5000/apidoc/swagger, where you can explore the API endpoints for data extraction and transformation.

To stop the containers and remove the application files from the Docker environment, run:

    $ make clean

API Endpoints

  • /extract: Extract text data from Reddit.

  • /transform/<string:id>: Transform the extracted data using NLTK for various text processing tasks.
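The two endpoints could be called from Python along these lines. This is a sketch under stated assumptions: the base URL reuses the localhost:5000 address of the Swagger page, while the HTTP method, the JSON response format, and the helper names `extract_url`, `transform_url`, and `call` are not confirmed by this README. Check the Swagger documentation for the actual contract.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the API listens on the same host and port as the Swagger page.
BASE_URL = "http://localhost:5000"


def extract_url(base_url=BASE_URL):
    """Build the URL of the extraction endpoint."""
    return base_url.rstrip("/") + "/extract"


def transform_url(record_id, base_url=BASE_URL):
    """Build the URL of the transformation endpoint for a given id."""
    return base_url.rstrip("/") + "/transform/" + quote(str(record_id), safe="")


def call(url, method="GET", timeout=30):
    """Send a request and decode the body as JSON.

    Assumption: the method and the JSON response format are guesses.
    """
    request = Request(url, method=method)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `call(extract_url())` followed by `call(transform_url(some_id))` would exercise both steps, where `some_id` stands for whatever identifier the extraction step reports.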

Usage

  • Use the API endpoints /extract and /transform/<string:id> to retrieve and transform Reddit data as needed for your application.

  • Customize the extraction and transformation logic in the Python files located in the app/services directory.
