Big Data project

Motivation for this project was to create a system architecture for storing large amount of car data, its efficient batch and streaming processing in order to get useful information from the data.

Technologies used: Apache Hadoop, Apache Spark, Apache Kafka, Metabase, PostgreSQL, Airflow, Druid, Docker

Architecture

Running the application

Create "datasets" directory in the root directory
Download batch and streaming datasets from these 2 links:

Download PostgreSQL JDBC Driver v42.3.0 in jar format for saving processed data to database

https://jdbc.postgresql.org/download.html

Batch

Go to "batch" directory and start docker-compose

docker-compose up --build

Go to "upload_data" directory and run command below to transfer datasets to HDFS

./docker_copy.sh

Go to Airflow URL http://localhost:8282/ and run "batch-dag"

All batch queries will be applied to the batch dataset
Also, 2 tables will be created for data visualization - "price_decline" and "num_of_ads"

Go to Metabase URL http://localhost:3000/ and run SQL queries for data visualization

Create SQL query below to visualize car model's price decline per year

SELECT "brandName", "modelName", "vf_ModelYear", "AvgPrice", "PriceDiff", "TotalCars" from price_decline
WHERE "modelName" = {{modelname}}
GROUP BY "modelName", "vf_ModelYear", "brandName", "AvgPrice", "PriceDiff", "TotalCars"
ORDER BY "vf_ModelYear" desc;

Set "vf_ModelYear" as x-axis and "AvgPrice" for y-axis

Streaming

Go to "streaming" directory and run:

./docker-run.sh

After producer starts filling up Kafka broker with the data, in another terminal run:

./spark-jobs-run.sh

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
airflow		airflow
batch		batch
streaming		streaming
upload_data		upload_data
.gitignore		.gitignore
ASVSP - Tehnička specifikacija.pdf		ASVSP - Tehnička specifikacija.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Big Data project

Architecture

Running the application

Batch

Streaming

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Big Data project

Architecture

Running the application

Batch

Streaming

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages