This REST API provides two functions to extract and transform text data from Reddit using the Reddit API. It utilizes Redis for storing raw data and NLTK for text transformation. You can easily run the API in a containerized environment using Docker and manage the build steps using the provided Makefile.
- Data Extraction: Retrieve text data from Reddit using PRAW, the Reddit API client library for Python.
- Data Transformation: Transform the extracted text data using NLTK for tasks such as text cleaning and tokenization.
- Data Storage: Use Redis as the data store for the raw extracted data.
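As a rough illustration of the storage step, the sketch below serializes one extracted post to JSON and stores it under an id, so that the transform step can look it up later. It relies only on the `set` and `get` calls that a redis-py client offers. The key scheme (`reddit:raw:<id>`), the function names, and the `InMemoryClient` stand-in are hypothetical and not taken from this project.

```python
import json


def raw_key(post_id: str) -> str:
    # Hypothetical key scheme; the project's real key layout may differ.
    return f"reddit:raw:{post_id}"


def store_raw(client, post_id: str, post: dict) -> None:
    """Serialize one extracted post to JSON and store it under its id."""
    client.set(raw_key(post_id), json.dumps(post))


def load_raw(client, post_id: str):
    """Return the stored post as a dict, or None if the id is unknown."""
    value = client.get(raw_key(post_id))
    if value is None:
        return None
    if isinstance(value, bytes):
        # redis-py returns bytes unless the client uses decode_responses=True.
        value = value.decode("utf-8")
    return json.loads(value)


class InMemoryClient:
    """Dict-backed stand-in with the same set/get calls, for local experiments."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        self._data[name] = value.encode("utf-8")
        return True

    def get(self, name):
        return self._data.get(name)
```

Passing a real `redis.Redis(host=..., port=...)` instance instead of `InMemoryClient()` should work the same way, since both functions only call `set` and `get`.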
Before running the API, ensure you have the following prerequisites installed:
- Docker
- Docker Compose
- Python 3.10
- NLTK
- Make (to automate the project execution)
- Reddit API credentials (to access Reddit data)
To use the application, create a .env file with the following environment variables:
REDIS_PORT ----> Port used to connect to the Redis database.
REDIS_HOST ----> Public IP address of the Redis host.
CLIENT_ID -----> Client ID provided by Reddit.
SECRET_TOKEN --> Secret token provided by Reddit.
USER_AGENT ----> User agent string sent with Reddit API requests.
SUBREDDIT -----> The subreddit used for data extraction.
N_POSTS -------> The number of most recent posts to extract.
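A minimal sketch of how these variables could be read and validated at startup is shown below. It assumes the values have already been exported into the process environment (for example by Docker Compose reading the .env file). The `Settings` class and `load_settings` function are hypothetical names used for illustration, not the project's actual code.

```python
import os
from dataclasses import dataclass

REQUIRED = (
    "REDIS_HOST", "REDIS_PORT", "CLIENT_ID", "SECRET_TOKEN",
    "USER_AGENT", "SUBREDDIT", "N_POSTS",
)


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    client_id: str
    secret_token: str
    user_agent: str
    subreddit: str
    n_posts: int

    def praw_kwargs(self) -> dict:
        # Keyword names accepted by praw.Reddit(...).
        return {
            "client_id": self.client_id,
            "client_secret": self.secret_token,
            "user_agent": self.user_agent,
        }


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default), failing early on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        redis_host=env["REDIS_HOST"],
        redis_port=int(env["REDIS_PORT"]),  # numeric values arrive as strings
        client_id=env["CLIENT_ID"],
        secret_token=env["SECRET_TOKEN"],
        user_agent=env["USER_AGENT"],
        subreddit=env["SUBREDDIT"],
        n_posts=int(env["N_POSTS"]),
    )
```

Failing at startup with the full list of missing names tends to be easier to debug than a connection error later in a request.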
- Clone this repository to your local machine:
$ git clone https://github.com/yourusername/reddit-data-api.git
$ cd reddit-data-api
- To run the application as a container, run the following command:
$ make run
The Swagger documentation will be available at localhost:5000/apidoc/swagger. From there you can access the API endpoints for data extraction and transformation.
- To stop the containers and remove all application files from the Docker container environment, run:
$ make clean
- /extract: Extract text data from Reddit.
- /transform/<string:id>: Transform the extracted data using NLTK for various text processing tasks.
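The sketch below shows one way a client could call the two endpoints listed above using only the Python standard library. The base URL reuses the host and port from the Swagger address. The HTTP method and the JSON response format are assumptions, since they are not stated here; the Swagger page is the place to confirm them. All function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # host and port from the Swagger address


def extract_url(base_url: str = BASE_URL) -> str:
    return f"{base_url.rstrip('/')}/extract"


def transform_url(item_id: str, base_url: str = BASE_URL) -> str:
    # safe="" also escapes slashes, so the id stays a single path segment.
    return f"{base_url.rstrip('/')}/transform/{quote(item_id, safe='')}"


def call(url: str, method: str = "GET", timeout: float = 10.0):
    """Send a request and decode a JSON body.

    The default method and the JSON response are assumptions, not documented facts.
    """
    request = Request(url, method=method)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `call(extract_url())` followed by `call(transform_url(some_id))` would exercise both steps, where `some_id` is whatever identifier the extraction step reports.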
- Use the API endpoints /extract and /transform/<string:id> to retrieve and transform Reddit data as needed for your application.
- Customize the extraction and transformation logic in the Python files located in the app/services directory.
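To give a sense of what a customized transformation step might look like, here is a small sketch of text cleaning, tokenization, and stopword removal. It deliberately uses plain regular expressions as a stand-in rather than NLTK, so that it runs without any corpus downloads; in the project itself an NLTK tokenizer and stopword list would presumably take the place of `tokenize` and `STOPWORDS`. All names below are hypothetical.

```python
import re
from typing import List

URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")
# Tiny illustrative stopword list; NLTK's stopwords corpus is far larger.
STOPWORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})


def clean_text(text: str) -> str:
    """Lowercase the text, drop URLs, and collapse runs of whitespace."""
    without_urls = URL_PATTERN.sub(" ", text.lower())
    return " ".join(without_urls.split())


def tokenize(text: str) -> List[str]:
    """Split cleaned text into word-like tokens (regex stand-in for an NLTK tokenizer)."""
    return TOKEN_PATTERN.findall(text)


def transform(text: str) -> List[str]:
    """Clean, tokenize, and remove stopwords from one piece of raw text."""
    return [token for token in tokenize(clean_text(text)) if token not in STOPWORDS]
```

Keeping each step in its own function makes it straightforward to swap a single stage, such as the tokenizer, without touching the rest of the pipeline.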