Music streaming platforms rely heavily on data-driven systems to understand user preferences and song characteristics.
For this reason, EchoScope explores how machine learning can be applied to analyse Spotify tracks and map the musical universe through data.
In particular, this project aims to provide the following services:
- Grouping songs based on their main features.
- Predicting the popularity potential of a song.
- Generating song recommendations by balancing acoustic similarity with predicted commercial success.
The project is structured as a modular pipeline, with each notebook focusing on a specific task:
- Audio Mapping: Clustering tracks based on the main musical features.
- Predictive Analytics: Estimating popularity with a Random Forest regression model.
- Smart Discovery: Implementing a hybrid song and playlist recommendation engine based on similarity and predicted success.
Source: Kaggle – Spotify Tracks Dataset by Maharshi Pandya.
Size: ~114,000 tracks.
Key Features: danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo.
The repository is organized into five main notebooks:
- data_visualization.ipynb: Exploratory data analysis to identify patterns within the dataset.
- data_preprocessing.ipynb: Cleaning the raw dataset by handling missing values and performing feature scaling.
- clustering.ipynb: Implementing Principal Component Analysis and K-Means clustering to group songs into musical identikits based on their features.
- popularity_prediction.ipynb: Training a Random Forest regressor to predict track popularity scores and investigating which features drive popularity.
- song_recommendation.ipynb: A recommendation engine based on cosine similarity, filtered by cluster and weighted by predicted popularity.
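The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the synthetic `features` array stands in for the preprocessed audio features, and the range of candidate `k` values, the two principal components, and all variable names are hypothetical choices. Only the silhouette score is used here to pick `k`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the audio features (assumption: the real
# notebook would load the preprocessed Spotify data instead).
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 0]], dtype=float)
features = np.vstack([c + rng.normal(size=(100, 4)) for c in centers])

# Scale the features, then project them onto two principal components.
scaled = StandardScaler().fit_transform(features)
components = PCA(n_components=2, random_state=0).fit_transform(scaled)

# Fit K-Means for several values of k and record the silhouette score of each.
scores = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(components)
    scores[k] = silhouette_score(components, labels_k)

# Keep the k with the highest silhouette score and compute the final labels.
best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(components)
```

For an elbow plot, the fitted `KMeans` object also exposes the within-cluster sum of squares as `inertia_`, which can be collected in the same loop.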
- Dimensionality Reduction: Principal Component Analysis.
- Clustering: K-Means with Elbow Method and Silhouette Score.
- Regression: Random Forest Regressor.
- Similarity Metrics: Cosine Similarity.
- Dynamic Thresholding: Evaluating song popularity against the popularity percentiles of each cluster.
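The dynamic thresholding idea in the last bullet can be sketched as below. This is only an illustration: the 75th percentile cut-off, the function name `high_potential_mask`, and the toy arrays are assumptions, not values taken from the project.

```python
import numpy as np

def high_potential_mask(popularity, clusters, percentile=75.0):
    """Flag tracks whose popularity is at or above their own cluster's percentile.

    `percentile=75.0` is an assumed cut-off, not a value taken from the project.
    """
    popularity = np.asarray(popularity, dtype=float)
    clusters = np.asarray(clusters)
    mask = np.zeros(popularity.shape, dtype=bool)
    for cluster_id in np.unique(clusters):
        in_cluster = clusters == cluster_id
        # The threshold is computed per cluster, so a cluster with low
        # overall popularity still gets its own "popular" tracks.
        threshold = np.percentile(popularity[in_cluster], percentile)
        mask[in_cluster] = popularity[in_cluster] >= threshold
    return mask

# Toy data: cluster 0 is generally more popular than cluster 1.
popularity = [80, 70, 60, 50, 20, 15, 10, 5]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
mask = high_potential_mask(popularity, clusters)
```

On this toy data, the top track of each cluster is flagged. A single global 75th-percentile threshold would flag nothing in cluster 1, which is the situation per-cluster thresholds are meant to avoid.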
<pre>
├── data/
│ ├── dataset.csv # Original raw data
│ └── echoscope_production_data.csv # Processed data with clusters and predictions
├── notebooks/
│ ├── data_visualization.ipynb
│ ├── data_preprocessing.ipynb
│ ├── clustering.ipynb
│ ├── popularity_prediction.ipynb
│ └── song_recommendation.ipynb
├── main.py # FastAPI application script
├── requirements.txt # Project dependencies
├── LICENSE
└── README.md
</pre>
- Clone the repository:
  `git clone https://github.com/yourusername/echoscope.git`
  `cd echoscope`
- Install the required dependencies:
  `pip install -r requirements.txt`
- Run the pipeline, adjusting the paths passed to `pd.read_csv()` and `df.to_csv()` where needed so that they match your directory layout.
The EchoScope project includes a production-ready REST API built with FastAPI that lets you search for tracks and get real-time song recommendations.
To use the API:
- Ensure you have installed all dependencies: run `pip install -r requirements.txt`, then `pip install uvicorn`.
- Run the following command in your terminal, in the same directory as the `main.py` file: `uvicorn main:app --reload`
- Once the server is running, visit http://127.0.0.1:8000/docs. There, you can use the `/search` endpoint to search for songs in the database and the `/recommend` endpoint to find songs similar to a given song title.
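Besides the interactive docs page, the endpoints can be called from a script. The sketch below uses only the standard library and assumes that both endpoints accept GET requests with query parameters named as in the parameter tables of this README. The helper names `build_url` and `get_json` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # address of the local uvicorn server

def build_url(endpoint, **params):
    """Build a GET URL, dropping parameters that were left as None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{endpoint}?{query}"

def get_json(url):
    """Send the request and decode the JSON body (requires a running server)."""
    with urlopen(url) as response:
        return json.load(response)

# Assumption: both endpoints take their inputs as query parameters.
search_url = build_url("/search", title="Primadonna")
recommend_url = build_url(
    "/recommend", track_name="Flamingo", artist_name="Kenshi Yonezu", n=5
)

if __name__ == "__main__":
    print(search_url)
    print(recommend_url)
    # Uncomment once the server is running:
    # print(get_json(recommend_url))
```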
Use the `/search` endpoint to find track metadata and confirm the exact name of a song in the database.
| Parameter | Type | Example Value | Description |
|---|---|---|---|
| title | string | Primadonna | The name of the track to search for |
Example Response:
{
"track_name": "Primadonna",
"artists": "MARINA",
"album_name": "Electra Heart (Deluxe)",
"popularity_pred": 43.32
}

The `/recommend` endpoint is the core engine: it identifies the track's cluster and generates a list of recommendations based on audio similarity and popularity potential.
| Parameter | Type | Example Value | Description |
|---|---|---|---|
| track_name | string | Flamingo | Target song for similarity analysis |
| artist_name | string | Kenshi Yonezu | (Optional) To filter specific artists |
| n | int | 5 | Number of recommendations to return |
Example Response:
{
"base_song": {
"title": "Flamingo",
"artist": "Kenshi Yonezu",
"cluster": 3
},
"recommendations": [
{
"track_name": "Potion (with Dua Lipa & Young Thug)",
"artists": "Calvin Harris;Dua Lipa;Young Thug",
"similarity": 0.99,
"popularity_pred": 65.54
},
{
"track_name": "Super Freaky Girl",
"artists": "Nicki Minaj",
"similarity": 0.98,
"popularity_pred": 65.63
},
{
"track_name": "Junio",
"artists": "Maluma",
"similarity": 0.99,
"popularity_pred": 61.24
},
{
"track_name": "Mañana",
"artists": "Ozuna",
"similarity": 0.98,
"popularity_pred": 63.98
},
{
"track_name": "One Kiss (with Dua Lipa)",
"artists": "Calvin Harris;Dua Lipa",
"similarity": 0.97,
"popularity_pred": 66.99
}
]
}
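A response like the one above could come from a ranking step along the following lines. This is a hedged sketch, not the code in `main.py`: the blending weight `alpha`, the linear combination of similarity and rescaled popularity, and the function name `recommend` are all assumptions made for illustration.

```python
import numpy as np

def recommend(features, clusters, popularity_pred, query_idx, n=5, alpha=0.8):
    """Rank same-cluster tracks by a blend of similarity and predicted popularity.

    `alpha` (weight of similarity versus popularity) is an assumed parameter.
    """
    features = np.asarray(features, dtype=float)
    clusters = np.asarray(clusters)
    popularity_pred = np.asarray(popularity_pred, dtype=float)

    # Candidates: tracks in the query's cluster, excluding the query itself.
    candidates = np.flatnonzero(clusters == clusters[query_idx])
    candidates = candidates[candidates != query_idx]

    # Cosine similarity between the query vector and each candidate.
    query = features[query_idx]
    cand = features[candidates]
    norms = np.linalg.norm(cand, axis=1) * np.linalg.norm(query)
    similarity = cand @ query / np.where(norms == 0, 1.0, norms)

    # Popularity is on a 0-100 scale, so rescale it before blending.
    score = alpha * similarity + (1 - alpha) * popularity_pred[candidates] / 100.0
    order = np.argsort(-score)[:n]
    return [(int(candidates[i]), float(similarity[i]), float(score[i])) for i in order]

# Toy example with four tracks and two features.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
clusters = [0, 0, 0, 1]
popularity_pred = [50.0, 40.0, 90.0, 70.0]
results = recommend(features, clusters, popularity_pred, query_idx=0, n=2)
```

Each result is a tuple of track index, cosine similarity, and blended score. Track 3 never appears in the results because it belongs to a different cluster than the query.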