A data science capstone project analyzing customer behavior on Vooks, a children's animated storybook SaaS platform, with the goal of improving trial-to-paid conversion and reducing subscriber churn. Vooks is designed to replicate the read-aloud experience, encouraging early childhood literacy while promoting healthy, non-addictive screen time.
Vooks operates on a subscription model in which parents sign up for a 7-day free trial before converting to a paid plan. This project identifies the behavioral patterns that separate trial users who convert to a paid plan from those who do not, and subscribers who stay from those who churn, then builds predictive models to act on those patterns.
Core objectives:
- Predict whether a trial user will convert to a paid subscription
- Predict whether an active subscriber will churn
- Identify watch behavior patterns linked to churn and trial conversion
Mux and Vooks platform data are preprocessed with SQL queries and consolidated into a fact table in which each row holds one user's behavioral metrics. This fact table is then fed to several predictive models to predict trial conversion and churn.
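As a rough illustration of the consolidation step, the sketch below uses pandas to do something analogous to the SQL preprocessing: it aggregates view-level rows to one row per user and joins them onto a user table. The column names (`user_id`, `watch_seconds`, `plan`) and all sample values are hypothetical and do not reflect the actual schema.

```python
import pandas as pd

# Hypothetical view-level rows, standing in for Mux engagement data.
views = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "watch_seconds": [120, 300, 60, 45, 90],
})

# Hypothetical user-level rows, standing in for platform data.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "plan": ["trial", "annual", "trial"],
})

def build_fact_table(users: pd.DataFrame, views: pd.DataFrame) -> pd.DataFrame:
    """Return one row per user with aggregated watch metrics."""
    per_user = (
        views.groupby("user_id")
        .agg(
            view_count=("watch_seconds", "count"),
            total_watch_seconds=("watch_seconds", "sum"),
        )
        .reset_index()
    )
    # Left join keeps users who never watched anything.
    fact = users.merge(per_user, on="user_id", how="left")
    # Users with no views get zeros rather than missing values.
    metric_cols = ["view_count", "total_watch_seconds"]
    fact[metric_cols] = fact[metric_cols].fillna(0).astype(int)
    return fact

fact_table = build_fact_table(users, views)
```

The left join is a deliberate choice here: users with no recorded views stay in the table with zeroed metrics, since inactivity is itself a behavioral signal.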
| Source | Description |
|---|---|
| Platform data | 12 tables with subscriber, user, and profile metrics (daily snapshots, 2021–2025) |
| Mux (video streaming) | View-level engagement data: watch duration, view counts, startup times, locations |
After filtering to parent users who joined through the standard sign-up flow and intersecting date ranges across all tables, the final dataset contains 62,268 users, with a subset of 5,113 who trialed or subscribed within the analysis window (2021-06-24 to 2025-09-09).
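Intersecting date ranges across tables amounts to taking the latest start date and the earliest end date. A minimal sketch of that logic follows; the table names and dates are illustrative placeholders, not the real coverage of any source table.

```python
from datetime import date

def intersect_date_ranges(ranges):
    """Return the (start, end) window covered by every range, or None if there is no overlap."""
    start = max(r[0] for r in ranges.values())
    end = min(r[1] for r in ranges.values())
    return (start, end) if start <= end else None

# Hypothetical coverage per table; names and dates are illustrative only.
coverage = {
    "table_a": (date(2020, 1, 1), date(2024, 12, 31)),
    "table_b": (date(2020, 6, 1), date(2024, 6, 30)),
    "table_c": (date(2019, 1, 1), date(2023, 12, 31)),
}

window = intersect_date_ranges(coverage)
```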
User segments:
- Trialing — within the 7-day free trial period
- Converted — active paying subscribers
- Churned — cancelled or lapsed within 14 days of renewal
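One way these segment definitions could be turned into a labelling rule is sketched below. The argument names are hypothetical, and reading "within 14 days of renewal" as a 14-day grace period after a missed renewal date is an assumption, not something confirmed by the project.

```python
from datetime import date, timedelta

TRIAL_DAYS = 7    # trial length, from the definitions above
GRACE_DAYS = 14   # renewal window, from the definitions above (assumed to be a grace period)

def label_segment(trial_start, has_paid, cancelled, renewal_due, as_of):
    """Label one user as of a given date; returns None if no segment applies.

    All arguments are hypothetical fields: trial_start and renewal_due are
    dates (renewal_due is the next unpaid renewal, or None), while has_paid
    and cancelled are booleans.
    """
    if as_of < trial_start + timedelta(days=TRIAL_DAYS):
        return "Trialing"
    if not has_paid:
        return None  # trial ended without payment; not one of the three segments
    lapsed = renewal_due is not None and as_of > renewal_due + timedelta(days=GRACE_DAYS)
    if cancelled or lapsed:
        return "Churned"
    return "Converted"
```

For example, a user whose trial started on 2024-01-01 and who is checked on 2024-01-05 would be labelled `Trialing` under these assumptions.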
├── EDA/ # Exploratory data analysis notebooks
├── ML/ # Predictive modeling and feature importance (conversion & churn)
├── SQL_Queries/ # Data extraction and feature engineering queries
├── Report/ # Final report and supporting notebooks
└── README.md
Exploratory analysis of platform and Mux data. Covers distributions of watch behavior, engagement metrics, demographic breakdowns, and patterns across user lifecycle segments.
Two predictive models:
- Conversion model — predicts whether a trialing user will become a paying subscriber
- Churn model — predicts whether an active subscriber will cancel
Survival analysis model:
- Survival model - estimates each user's relative churn risk over the duration of their subscription
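The model families are not specified here. As one possible baseline for the conversion classifier, and purely as an assumption, the sketch below fits a scikit-learn logistic regression to synthetic trial-period features (`views` and `watch_minutes`, both made up) and scores held-out users with a conversion probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_users = 500

# Synthetic features; not real Vooks data.
views = rng.poisson(5, n_users)
watch_minutes = views * rng.uniform(2, 10, n_users)
X = np.column_stack([views, watch_minutes])

# Synthetic label: more watching makes conversion more likely.
p_convert = 1 / (1 + np.exp(-(0.05 * watch_minutes - 1.5)))
y = rng.binomial(1, p_convert)

# Hold out 25% of users, keeping the class balance similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features, then fit a logistic regression baseline.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Probability of the positive class (converted) for each held-out user.
conversion_proba = model.predict_proba(X_test)[:, 1]
```

The same pattern would carry over to the churn model by swapping in subscriber-level features and a churn label.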
Queries used to extract and aggregate raw data from platform and Mux tables into analysis-ready features mapped to individual users.
Final capstone report and any associated notebooks used to generate figures or results included in the write-up.
| Term | Definition |
|---|---|
| Trial | 7-day free access period after sign-up |
| Conversion | Trial user upgrades to a paid monthly or annual plan |
| Churn | Subscriber cancels or fails to renew within 14 days |
| Profile | A sub-account within a parent user account |
- Google BigQuery
- Python (pandas, scikit-learn, matplotlib)
- SQL
- Jupyter Notebooks
- Mux Analytics API
