This project analyzes a football players dataset created by merging two separate datasets, each containing approximately 4,000 player records. After merging, the final dataset consists of approximately 8,000 records and 11 attributes related to football players.
The project demonstrates the use of Python for data cleaning, exploratory data analysis (EDA), and data visualization to uncover insights about player characteristics such as age, height, weight, wages, preferred foot, and playing positions.
players1.csv
players2.csv
players data md.csv
- Approximately 8,000 records
- 11 columns
- Python
- Pandas
- Matplotlib
- Seaborn
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
The two datasets were merged using an inner join based on the short_name column.
merge = pd.merge(data1, data2, how='inner', on='short_name')
The following cleaning steps were performed:
- Removed missing values using dropna()
- Removed duplicate records using drop_duplicates()
cl.dropna(inplace=True)
cl.drop_duplicates(inplace=True)
The following exploratory analysis methods were applied:
- Dataset inspection (head(), tail(), sample())
- Missing value analysis
- Statistical summary using describe()
- Data type inspection using info()
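The merge, cleaning, and inspection steps described above can be sketched end to end on toy data. This is a minimal illustration, not the project's actual notebook: the two small frames stand in for players1.csv and players2.csv, every column other than short_name is an assumption, and treating cl as a copy of the merged frame is a guess at how the two variable names relate.

```python
import pandas as pd

# Toy stand-ins for players1.csv and players2.csv. Only short_name is taken
# from the project description; the other columns are assumed for illustration.
data1 = pd.DataFrame({
    "short_name": ["A. Silva", "B. Jones", "C. Müller", "C. Müller", "D. Rossi"],
    "age": [24, 31, 27, 27, 22],
    "height_cm": [180, None, 188, 188, 170],
})
data2 = pd.DataFrame({
    "short_name": ["A. Silva", "B. Jones", "C. Müller", "E. Kim"],
    "wage_eur": [12000, 45000, 30000, 8000],
    "preferred_foot": ["Right", "Left", "Right", "Right"],
})

# Inner join: only names present in BOTH frames survive the merge.
merged = pd.merge(data1, data2, how="inner", on="short_name")

# Missing value analysis: count NaNs per column before dropping anything.
missing_per_column = merged.isna().sum()

# Clean a copy so the raw merge result stays available for comparison
# (assumption: this is how cl relates to the merged frame).
cl = merged.copy()
cl.dropna(inplace=True)
cl.drop_duplicates(inplace=True)

# Inspection helpers named in the list above.
print(merged.shape, "->", cl.shape)
print(cl.head())
cl.info()
print(cl.describe())
```

Because an inner join keeps only matching keys and repeats rows when a key occurs more than once, comparing the shape before and after each step is a quick way to see how many records the join and the cleaning actually kept.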
A correlation heatmap was created to examine relationships between:
- Weight (kg)
- Height (cm)
- Age
- Heatmap
Identify whether age, weight, and height are positively or negatively correlated.
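A correlation heatmap of this kind could be produced roughly as follows. The column names age, height_cm, and weight_kg and the toy values are assumptions, so this is a sketch of the technique rather than the project's exact code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data; the column names are assumed, not taken from the real dataset.
cl = pd.DataFrame({
    "age": [19, 22, 25, 28, 31, 34],
    "height_cm": [172, 180, 185, 178, 190, 183],
    "weight_kg": [65, 74, 80, 72, 86, 79],
})

# Pearson correlation matrix: each value lies between -1 and 1.
corr = cl[["weight_kg", "height_cm", "age"]].corr()

fig, ax = plt.subplots(figsize=(5, 4))
# Fix the colour scale to [-1, 1] so positive and negative values are comparable.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation: weight, height, age")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

The sign of each cell answers the positive-or-negative question directly, and the annotated value shows how strong the linear relationship is.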
A box plot was used to visualize the distribution of player ages.
- Box Plot
- Examine age distribution
- Identify median age
- Detect outliers
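The three goals above can also be checked numerically alongside the plot. The sketch below uses made-up ages and applies the 1.5 × IQR rule, which is the default whisker rule of Matplotlib's boxplot; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ages for illustration only.
ages = pd.Series([19, 21, 22, 23, 24, 24, 25, 26, 27, 28, 29, 41], name="age")

# Median and quartiles, as drawn by the box.
median_age = ages.median()
q1, q3 = ages.quantile([0.25, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are what the plot draws as outliers.
outliers = ages[(ages < q1 - 1.5 * iqr) | (ages > q3 + 1.5 * iqr)]

fig, ax = plt.subplots(figsize=(3, 4))
ax.boxplot(ages.to_numpy())
ax.set_ylabel("Age (years)")
ax.set_title("Player age distribution")
fig.tight_layout()
fig.savefig("age_boxplot.png")

print("median:", median_age, "outliers:", outliers.tolist())
```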
A scatter plot was generated to explore the relationship between player age and wage.
- Scatter Plot
Determine whether player earnings tend to increase or decrease with age.
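One way to draw such a scatter plot, together with a single-number summary of the trend, is sketched below. The wage column name wage_eur and the values are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; wage_eur is an assumed column name.
cl = pd.DataFrame({
    "age": [18, 21, 24, 27, 30, 33, 36],
    "wage_eur": [2000, 8000, 20000, 35000, 30000, 18000, 9000],
})

# Pearson correlation between age and wage (a linear summary only).
r = cl["age"].corr(cl["wage_eur"])

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(cl["age"], cl["wage_eur"], alpha=0.6)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Wage")
ax.set_title(f"Age vs wage (Pearson r = {r:.2f})")
fig.tight_layout()
fig.savefig("age_vs_wage_scatter.png")
```

A correlation coefficient only measures a straight-line trend, so it can understate a pattern in which wages rise and then fall with age; the scatter plot itself is what reveals that shape.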
A histogram was created to analyze player height distribution.
- Histogram
Identify the most common height ranges among football players.
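The most common height range can be read off the histogram counts directly. The following sketch uses invented heights and 5 cm bins, both of which are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented heights in centimetres, for illustration only.
heights = pd.Series(
    [168, 172, 176, 178, 179, 180, 181, 182, 183, 184, 185, 188, 191, 195],
    name="height_cm",
)

# 5 cm wide bins from 165 to 200 (the bin width is an arbitrary choice).
bin_edges = list(range(165, 201, 5))
counts, edges = np.histogram(heights, bins=bin_edges)

# The fullest bin is the most common height range.
top = int(counts.argmax())
most_common_range = (int(edges[top]), int(edges[top + 1]))

fig, ax = plt.subplots(figsize=(5, 4))
ax.hist(heights, bins=bin_edges, edgecolor="black")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Number of players")
fig.tight_layout()
fig.savefig("height_histogram.png")

print("most common range:", most_common_range)
```

The result depends on the bin edges chosen, so it is worth trying more than one bin width before drawing conclusions.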
A pie chart was used to show the proportion of left-footed and right-footed players.
- Pie Chart
Compare the distribution of preferred foot usage among players.
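The proportions behind such a pie chart can be computed with value_counts. In this sketch the column name preferred_foot and the labels "Left" and "Right" are assumptions about how the data is encoded.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy column; the name and the "Left"/"Right" labels are assumed.
feet = pd.Series(
    ["Right", "Right", "Left", "Right", "Right", "Left", "Right", "Right"],
    name="preferred_foot",
)

# normalize=True returns fractions that sum to 1 instead of raw counts.
share = feet.value_counts(normalize=True)

fig, ax = plt.subplots(figsize=(4, 4))
ax.pie(share.to_numpy(), labels=share.index.tolist(), autopct="%1.1f%%", startangle=90)
ax.set_title("Preferred foot")
fig.tight_layout()
fig.savefig("preferred_foot_pie.png")
```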
A count plot was created to display the frequency of each playing position.
- Count Plot
Determine the most common positions occupied by football players.
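A count plot sorted by frequency makes the most common positions easy to read. The sketch below assumes a column named position holding one position code per player; a real dataset may store several positions per row, which would need splitting first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data; the column name and the position codes are assumed.
positions = pd.DataFrame({
    "position": ["ST", "CM", "CB", "CM", "GK", "CB", "CM", "ST", "CB", "CM"],
})

# value_counts sorts from most to least frequent, which gives the bar order.
counts = positions["position"].value_counts()
order = counts.index.tolist()

fig, ax = plt.subplots(figsize=(5, 4))
sns.countplot(data=positions, x="position", order=order, ax=ax)
ax.set_xlabel("Position")
ax.set_ylabel("Number of players")
fig.tight_layout()
fig.savefig("position_countplot.png")
```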
- Correlation Heatmap
- Age Distribution Box Plot
- Age vs Wage Scatter Plot
- Height Distribution Histogram
- Preferred Foot Pie Chart
- Position Count Plot
Through this project, the following data analysis skills were practiced:
- Data merging and integration
- Data cleaning and preprocessing
- Exploratory data analysis (EDA)
- Statistical correlation analysis
- Data visualization with Matplotlib and Seaborn
- Insight generation from sports datasets
Possible enhancements for future versions:
- Analyze player nationality distributions
- Investigate wage differences by position
- Explore player performance metrics
- Build predictive models for player wages
- Create interactive dashboards using Plotly or Tableau
Football Players Dataset Analysis Project
Created using Python, Pandas, Matplotlib, and Seaborn for exploratory data analysis and visualization.