DES646 Project

Dataset Bias and Quality Diagnostics Dashboard

Project Description

This project presents a lightweight, interactive Streamlit dashboard designed to help researchers and designers detect, visualize, and mitigate dataset bias and quality issues before training machine learning models.
By combining principles of explainable AI and visual analytics, it enables transparent and interpretable exploration of image datasets even for users without deep technical expertise.

The tool automatically identifies common dataset issues such as duplicates, class imbalance, outliers, and potential bias-conflicting samples using pretrained vision models. It also supports semi-supervised labeling for partially labeled datasets and provides actionable feedback for improving data quality and fairness.
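As a rough illustration of two of these checks, the sketch below computes a class imbalance ratio and flags near-duplicate images by the cosine similarity of their embeddings. It is not the project's actual implementation: the function names `imbalance_ratio` and `find_near_duplicates`, the `threshold` default, and the assumption that each image already has a non-zero embedding vector from some pretrained model are all hypothetical.

```python
from collections import Counter

import numpy as np


def imbalance_ratio(labels):
    """Return (per-class counts, majority count / minority count)."""
    counts = Counter(labels)
    return dict(counts), max(counts.values()) / min(counts.values())


def find_near_duplicates(embeddings, threshold=0.98):
    """Return index pairs (i, j), i < j, with cosine similarity >= threshold.

    Assumes every embedding is a non-zero vector (hypothetical input format).
    """
    x = np.asarray(embeddings, dtype=float)
    # L2-normalise rows so the dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    # Look only at the strict upper triangle to skip self-pairs and mirrored pairs.
    rows, cols = np.triu_indices(len(x), k=1)
    keep = sim[rows, cols] >= threshold
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))


if __name__ == "__main__":
    counts, ratio = imbalance_ratio(["cat"] * 6 + ["dog"] * 2)
    print(counts, ratio)
    print(find_near_duplicates([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]))
```

The pairwise similarity matrix grows quadratically with the number of images, so a sketch like this suits small samples; a larger dataset would likely need an approximate nearest-neighbour index instead.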

Key functionalities include:

Data Diagnostics: Detect duplicates, imbalance, outliers, and low-quality samples.
Bias & Self-Influence Analysis: Identify samples that disproportionately affect model fairness or accuracy.
Visual Exploration: Explore class distributions, embedding projections (UMAP), and influence-ranked images.
Label Completion (Optional): Suggest probable labels for unlabeled images using embedding similarity.
Actionable Feedback: Recommend relabeling or removing problematic samples to improve dataset quality.
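One plausible way to suggest labels from embedding similarity is a k-nearest-neighbour majority vote, sketched below. This is an assumption about how such a step could work, not a description of the project's code: `suggest_labels`, its parameters, and the returned vote share are hypothetical, and the embeddings are assumed to be non-zero vectors produced beforehand.

```python
from collections import Counter

import numpy as np


def suggest_labels(labeled_emb, labels, unlabeled_emb, k=3):
    """Suggest a label for each unlabeled embedding.

    Each suggestion is the majority label among the k most cosine-similar
    labeled embeddings, returned as (label, vote_share).
    """
    a = np.asarray(labeled_emb, dtype=float)
    b = np.asarray(unlabeled_emb, dtype=float)
    # Normalise rows so the dot product equals cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = b @ a.T  # shape: (n_unlabeled, n_labeled)
    k = min(k, len(labels))
    suggestions = []
    for row in sim:
        nearest = np.argsort(-row)[:k]  # indices of the k most similar samples
        votes = Counter(labels[i] for i in nearest)
        label, count = votes.most_common(1)[0]
        suggestions.append((label, count / k))
    return suggestions


if __name__ == "__main__":
    labeled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    names = ["shirt", "shirt", "boot", "boot"]
    print(suggest_labels(labeled, names, [[1.0, 0.05], [0.05, 1.0]], k=3))
```

The vote share gives a crude confidence signal: a low share suggests the image sits between classes and is better left for manual review than auto-labeled.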

How to Run

1. Clone the repository and navigate to the project directory

git clone <repo-url>
cd <repo-name>

2. Install dependencies

Make sure you have Python 3.8 or above, then run:

pip install -r requirements.txt

3. Launch the Streamlit dashboard

streamlit run dashboard/app.py

4. Use the interface

Choose whether to upload a partially labeled dataset for label completion or run diagnostics on an existing dataset.
Explore duplicates, imbalance, outliers, and influence scores through interactive visualizations.
Download reports for duplicate and imbalance analysis.

Folders and files

dashboard
data_utils
diagnostics
embedding
influence
outputs
testing
uploaded_data_mode2/fashionmnist_sample
.gitignore
Final_Report.pdf
LICENSE
README.md
requirements.txt
