📊 Data Analytics Activities Repository

Course: Data Analytics
Authors: Shawn Jurgen Mayol, Elgen Mar Arinasa
University: University of San Carlos

🔍 Overview

This repository contains implementations of various assignments from our Data Analytics course. Each assignment explores different analytical techniques, data processing methods, and visualization strategies. The goal is to apply theoretical concepts to real-world datasets and develop proficiency in Python for data analysis.

📌 Assignments

Each assignment is structured as a Jupyter Notebook (.ipynb) or Python script (.py), with clear documentation and visualization of results.

📂 Assignment 1: Balanced Risk Set Matching

Objective: Implement the Balanced Risk Set Matching algorithm for an observational study of the effects of cystoscopy and hydrodistention on patients with interstitial cystitis.
Key Steps:
1. Load patient data from a CSV file.
2. Compute Mahalanobis distances to compare treated and control patients.
3. Identify feasible treated-control pairs, enforcing the treatment-time (risk set) constraints.
4. Solve an integer programming (IP) problem to determine the optimal matching.
5. Analyze treatment effects (compare symptom changes between groups).
6. Perform sensitivity analysis to check robustness of findings.
Tech Stack: pandas, numpy, scipy, matplotlib, seaborn
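A minimal sketch of steps 2 to 4, assuming numeric covariate arrays and a known treatment time per patient (`np.inf` for patients never treated). It is a simplification, not the repository's implementation: it replaces the full balanced risk set IP with SciPy's `linear_sum_assignment` (a 1:1 minimum-total-distance assignment), so it does not enforce the covariate-balance constraints. The function names `mahalanobis_matrix` and `risk_set_match` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def mahalanobis_matrix(treated, controls):
    """Pairwise Mahalanobis distances (rows: treated, columns: controls)."""
    pooled = np.vstack([treated, controls])
    # Inverse of the pooled covariance; pinv tolerates a singular matrix
    vi = np.linalg.pinv(np.cov(pooled, rowvar=False))
    return cdist(treated, controls, metric="mahalanobis", VI=vi)


def risk_set_match(treated, controls, t_treated, t_control, big=1e6):
    """1:1 minimum-distance matching restricted to the risk set.

    Control j is feasible for treated patient i only if j was still
    untreated at i's treatment time (t_control[j] > t_treated[i]).
    Simplification: linear assignment instead of the balanced-matching IP.
    """
    dist = mahalanobis_matrix(treated, controls)
    feasible = t_control[None, :] > t_treated[:, None]
    # Infeasible pairs get a prohibitive cost instead of being removed
    cost = np.where(feasible, dist, big)
    rows, cols = linear_sum_assignment(cost)
    # Drop any pair the solver was forced into despite being infeasible
    keep = feasible[rows, cols]
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))
```

Encoding infeasible pairs as a large cost keeps the cost matrix rectangular, and the final mask discards any treated patient who had no eligible control.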

📂 Assignment 2: Data Visualization & Network Analysis

Objective: Utilize data visualization techniques to analyze relationships and distributions.
Key Steps:
1. Bar Chart Analysis: Visualize the distribution of Yes/No responses by category.
2. Sankey Diagram: Illustrate the flow distribution between different categories.
3. Network Graph: Construct a network of category connections, highlighting core and external nodes.
Tech Stack: Python (Matplotlib, Seaborn, Plotly, NetworkX)
Generated Visualizations:
- 📊 Bar Graph: Displays Yes/No distribution across labeled categories.
- 🔗 Network Graph: Maps node connections, distinguishing between core and external entities.
- 📈 Sankey Diagram: Represents flow relationships between categorized entities.
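A minimal pandas sketch of the data preparation behind the bar chart and the Sankey diagram. It assumes a long-format response table with hypothetical `category` and `answer` columns and a flow table with hypothetical `source`, `target`, and `value` columns. The crosstab yields per-category Yes/No counts, and the Sankey helper maps node labels to the integer indices that Plotly's Sankey trace uses for link sources and targets.

```python
import pandas as pd


def yes_no_counts(df, category_col="category", answer_col="answer"):
    """Count responses per category; missing combinations become 0."""
    return pd.crosstab(df[category_col], df[answer_col])


def sankey_links(flows, source_col="source", target_col="target", value_col="value"):
    """Convert a flow table into node labels and parallel link lists.

    Duplicate (source, target) rows are summed so each link appears once.
    """
    agg = flows.groupby([source_col, target_col], as_index=False)[value_col].sum()
    # Node labels in order of first appearance (sources first, then targets)
    labels = pd.unique(pd.concat([agg[source_col], agg[target_col]])).tolist()
    index = {label: i for i, label in enumerate(labels)}
    src = agg[source_col].map(index).tolist()
    tgt = agg[target_col].map(index).tolist()
    val = agg[value_col].tolist()
    return labels, src, tgt, val
```

The returned lists can be passed to Plotly as `go.Sankey(node=dict(label=labels), link=dict(source=src, target=tgt, value=val))`, and the counts table can be drawn with `counts.plot(kind="bar")`.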

📂 Assignment 3: Clustering with the Sessa Empirical Estimator

Objective: Apply clustering techniques (K-Means and DBSCAN) to prescription duration data using the Sessa Empirical Estimator (SEE) method.
Key Steps:
1. Preprocess and clean the dataset, ensuring accurate calculations of prescription duration intervals.
2. Implement K-Means and DBSCAN clustering algorithms to identify patterns in prescription refill behavior.
3. Compare the performance of both algorithms using silhouette scores and other evaluation metrics.
4. Visualize the clustering results, comparing patterns in dosage per day and prescription duration.
5. Analyze the clinical implications of the clustering results, focusing on improving patient adherence and healthcare management.
Tech Stack: pandas, numpy, matplotlib, seaborn, sklearn (DBSCAN, K-Means)
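A simplified, SEE-style sketch of steps 1 to 3, assuming a prescription table with one row per fill and hypothetical column names `pnr` (patient ID) and `eksd` (dispensing date). It computes each patient's refill intervals, keeps intervals up to the 80th percentile of their empirical distribution, clusters the log-intervals with K-Means, selects k by silhouette score, and reports each cluster's median interval as a duration estimate. The DBSCAN comparison is not shown, and this is not the repository's exact pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def refill_intervals(df, patient_col="pnr", date_col="eksd"):
    """Days between consecutive fills for each patient (column names assumed)."""
    df = df.sort_values([patient_col, date_col])
    gaps = df.groupby(patient_col)[date_col].diff().dt.days
    return gaps.dropna().to_numpy(dtype=float)


def see_kmeans(intervals, max_k=5, ecdf_cut=0.8, seed=0):
    """Cluster refill intervals; return {cluster: median interval in days}."""
    # Drop same-day refills and the longest gaps above the ECDF cut-off
    cutoff = np.quantile(intervals, ecdf_cut)
    x = intervals[(intervals > 0) & (intervals <= cutoff)]
    z = np.log(x).reshape(-1, 1)
    best_k, best_score = 2, -1.0
    # Silhouette needs 2 <= k <= number of distinct points - 1
    for k in range(2, min(max_k, len(np.unique(z)) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(z)
        score = silhouette_score(z, labels)
        if score > best_score:
            best_k, best_score = k, score
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(z)
    return {int(c): float(np.median(x[labels == c])) for c in np.unique(labels)}
```

Clustering on the log scale keeps short and long refill patterns from being dominated by the absolute size of the long intervals.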

📂 Assignment 4: Target Trial Emulation (TTE) & TTE-V2 with Clustering

Objective: Implement the Target Trial Emulation (TTE) methodology in Python, replicating results from an R-based framework, and extend it by integrating clustering techniques to improve patient subgroup analysis.
Key Steps:
1. Replicate TTE in Python: Convert the original R-based Target Trial Emulation (TTE) into Python, ensuring the methodology and results remain consistent.
2. Perform Causal Inference: Apply the Marginal Structural Model (MSM) to estimate treatment effects while adjusting for confounders and censoring.
3. Validate Against R Implementation: Ensure that the results from Python match those obtained from the original R-based TTE framework.
4. Develop TTE-V2 (Enhanced with Clustering): Introduce a clustering mechanism within TTE to segment patients into meaningful subgroups.
5. Apply K-Means Clustering: Group patients based on baseline characteristics and analyze how treatment effects differ across clusters.
6. Compare TTE vs. TTE-V2: Evaluate whether clustering improves the robustness of treatment effect estimation.
7. Discuss Findings: Interpret the impact of clustering in observational studies and discuss its limitations and advantages.
Tech Stack:
- pandas, numpy, matplotlib, seaborn
- statsmodels (for Marginal Structural Models)
- sklearn (for clustering: K-Means, Silhouette Score)

📈 Visual Representations

This repository includes:

✅ Data visualizations using matplotlib and seaborn
✅ Statistical analysis and data preprocessing
✅ Interactive data exploration via Jupyter Notebooks

Sample Output:

(Include example graphs and insights from the analysis.)

🛠 Setup & Usage

To run the notebooks or scripts in this repository:

Clone the repository and change into its directory (DataAnalytics_Activities):
```
git clone https://github.com/yourusername/DataAnalytics_Activities.git
```

Install dependencies:
```
pip install -r requirements.txt
```
Open Jupyter Notebook:
```
jupyter notebook
```
Navigate to the desired .ipynb file and run the cells.

📜 Conclusion

This repository serves as a portfolio of data analytics projects, demonstrating various data processing, statistical analysis, and visualization techniques.

Repository contents:
- Assignment_1
- Assignment_2
- Assignment_3
- Assignment_4
- Lecture Activities - Experiment/Lecture 1 - Activity
- README.md
