alfskoyen/README.md
Visual Depiction of Proposed CNN EfficientNet Model w/ Transfer Learning

Figure: Visual Depiction of EfficientNet CNN Model Highlighting Transfer Learning and Open Layers.

🌌 📈 🚄 🌎 Hello, I'm Alfred Haugen. Welcome to my Git.

Since 2019, I've been bridging my extensive background in risk and audit leadership with machine learning and algorithmic design, applying ML and AI to solve complex, real-world problems. I recently completed my M.S. in Data Science at the University of Virginia's School of Data Science (August 2025), building on prior machine learning studies at Georgetown University in 2020.

My technical expertise spans statistical modeling, from linear and non-linear techniques to deep learning architectures, with applications of Bayesian inference for deriving posteriors across feature variables and latent constructs. This portfolio highlights my focus on data structure design, feature engineering, importance analysis, outlier detection, associational coefficient analysis, and predictive inference.

Recent projects apply these methods to computer vision (landslide detection via CNNs with transfer learning) and healthcare analytics, leveraging tools like GVIF for multicollinearity assessment, SHAP for model interpretability, and rigorous cross-validation for robust, reproducible workflows. My current interests extend to time series modeling, including frequency-domain analysis (Fourier transforms) and deep learning approaches (RNNs, LSTMs) for sequential and temporal pattern recognition.

๐Ÿ› ๏ธ Core Skills

  • Languages and Packages:
    Python, R, MySQL, SQLite
    Pandas, NumPy, Scikit-Learn, XGBoost, SciPy, Stan, TensorFlow, Keras
    RPy2, QuantReg in R, SHAP, car package

  • Data Engineering:
    AWS S3, AWS SageMaker, AWS Lambda

  • Visualization:
    Matplotlib, Seaborn, ggplot2, Yellowbrick, Tableau, Plotly, Dash

  • Other Tools: Git, JupyterLab, Overleaf


Projects


A current passion project: a full end-to-end quantitative alpha engine that scans ~600 NYSE and Nasdaq symbols nightly and ranks the universe with a two-dimensional Premium × Risk scoring model to surface asymmetric put-selling opportunities. The pipeline runs from the Alpha Vantage API through feature engineering and scoring to an interactive Plotly/Dash dashboard deployed live on Render.

Key methods: Cross-sectional percentile scoring · IV/HV ratio analysis · Vol spike detection · Term structure regression · Delta-bucketed premium efficiency
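As an illustration of the first method, here is a minimal sketch of cross-sectional percentile scoring feeding a Premium × Risk quadrant label. It is not the scanner's actual code: the function names, the 50th-percentile cutoff, the raw input scores, and every quadrant label other than Q1 (High Premium / Low Risk) are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(values):
    """Percentile rank (0-100) of each value within the cross-section.

    Ties share the midpoint rank, so the result does not depend on input order.
    """
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)
        equal = bisect_right(ordered, v) - below
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def assign_quadrants(premium, risk, cutoff=50.0):
    """Label each symbol by where its premium and risk percentiles fall.

    Q1 = high premium / low risk (the target zone). The remaining labels
    and the cutoff are hypothetical choices for this sketch.
    """
    labels = []
    for p, r in zip(percentile_ranks(premium), percentile_ranks(risk)):
        if p >= cutoff and r < cutoff:
            labels.append("Q1")  # high premium, low risk
        elif p >= cutoff:
            labels.append("Q2")  # high premium, high risk
        elif r < cutoff:
            labels.append("Q3")  # low premium, low risk
        else:
            labels.append("Q4")  # low premium, high risk
    return labels


# Hypothetical raw scores for four symbols.
premium_scores = [0.9, 0.8, 0.2, 0.1]
risk_scores = [0.1, 0.9, 0.2, 0.8]
quadrants = assign_quadrants(premium_scores, risk_scores)
```

Ranking within the nightly cross-section, rather than using raw values, keeps the two axes comparable even when overall volatility levels shift from day to day.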

Live Demo
Options Alpha Scanner: Premium vs Risk Quadrant Map

Figure: Premium vs Risk scatter across ~600 symbols. Q1 (mint), High Premium / Low Risk, is the primary target zone.


Team Project: Harold Haugen, Max Pearton, Daniel Sery, Elena Tsvetkova

Landslide Detection Through Deep Learning: Applied convolutional neural networks with transfer learning and hyperparameter tuning to classify satellite imagery into landslide and non-landslide regions, supporting risk mitigation and disaster management.

Documentation:

My Project Notebooks:

The full repository contains notebooks across all team members. Highlighted below are my primary contributions:


Scalable Inference with AWS Step Functions and Lambda

To support large-scale astronomical modeling, we built a serverless workflow using AWS Step Functions, AWS Lambda, AWS SageMaker, Amazon S3, and CloudWatch to distribute and execute tasks in parallel. Amazon S3 provided a structured data architecture to store segmented training inputs and capture outputs from distributed runs, while Step Functions and Lambda coordinated execution across many workloads. We also developed Python scripts to collect and organize CloudWatch metrics alongside JSON outputs from the parallel runs, so that mean performance measures could be calculated for review and visualization. This design allows complex astronomical computations to scale automatically while providing clear insight into execution flow, runtime performance, and cost efficiency.
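The aggregation step described above can be sketched roughly as follows. This is a simplified stand-in, not the project's scripts: the field names (`run_id`, `duration_ms`, `billed_ms`) are hypothetical, and the JSON payloads are inlined as strings here rather than read from S3 or CloudWatch.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_metrics(json_payloads):
    """Average every numeric field across the JSON outputs of parallel runs."""
    collected = defaultdict(list)
    for payload in json_payloads:
        record = json.loads(payload)
        for key, value in record.items():
            # Keep numbers only; bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                collected[key].append(value)
    return {key: mean(values) for key, values in collected.items()}


# Hypothetical outputs from two parallel runs.
runs = [
    '{"run_id": "a", "duration_ms": 1200, "billed_ms": 1300}',
    '{"run_id": "b", "duration_ms": 800, "billed_ms": 900}',
]
summary = mean_metrics(runs)
```

Non-numeric fields such as the run identifier are skipped, so the same function works on payloads that mix metadata and metrics.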

📶 Presentation: AWS Big Data - Parallel Inference Project (Dec 2024)


Predictive Inference in Critical Care: Data Wrangling to Model Evaluation

This project explored whether ICU ventilation duration can be classified reliably with predictive models built on the MIMIC-III dataset. We focused on structuring and wrangling clinical data, engineering features, and training models including logistic regression, random forest, XGBoost, and neural networks. To identify the most informative predictors, we combined multicollinearity analysis (GVIF) with feature importance methods (SHAP, regression diagnostics), supported by cross-validation across iterative models. Emphasis was placed on building reproducible pipelines that link data preparation, feature selection, and predictive modeling to generate insights for ICU decision-making and resource planning. Note: the repository is private to protect patient data and comply with research ethics requirements.
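For readers unfamiliar with the multicollinearity step: for a predictor that occupies a single model column, GVIF reduces to the ordinary variance inflation factor, which equals the corresponding diagonal entry of the inverse correlation matrix of the predictors. The sketch below computes that in plain Python on made-up data. It is an illustration of the quantity, not the project's pipeline, and all function names are chosen for the example.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        standardized.append([(x - mu) / sigma for x in col])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
        for ci in standardized
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    size = len(matrix)
    aug = [
        list(row) + [float(i == j) for j in range(size)]
        for i, row in enumerate(matrix)
    ]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def vif(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Made-up predictors with correlation 0.8, so VIF = 1 / (1 - 0.8**2).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
factors = vif([x1, x2])
```

A VIF near 1 indicates a predictor that is nearly uncorrelated with the others; larger values flag redundancy worth reviewing before interpreting coefficients or importances.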

📶 Presentation: Predicting Ventilation Duration in ICU Patients with Data-Driven Models

📑 Predictive Inference within a Clinical Setting

Next Steps

Given the high-dimensional and sparse nature of our dataset, a natural next step is to explore LightGBM as an alternative modeling framework. LightGBM is well-suited for these data characteristics and offers several innovations that could provide both performance gains and computational efficiency:

  • Leaf-wise vs. Level-wise Growth
  • Gradient-based One-Side Sampling (GOSS)
  • Sparse Data Mastery
  • Exclusive Feature Bundling (EFB)
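To make one of these ideas concrete, here is a minimal sketch of the sampling rule behind Gradient-based One-Side Sampling: keep the instances with the largest absolute gradients, draw a random subset of the rest, and up-weight that subset by (1 - a) / b so the gradient statistics stay approximately unbiased. It is a standalone illustration and does not call LightGBM; the function name, rates, and seed are assumptions for the example.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) chosen by a GOSS-style sampling rule.

    top_rate   (a): fraction of instances kept for having the largest |gradient|
    other_rate (b): fraction of all instances drawn at random from the remainder
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)

    top = order[:n_top]
    rest = order[n_top:]
    sampled = random.Random(seed).sample(rest, n_other)

    # Small-gradient instances are under-sampled, so amplify their weight.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights


# Hypothetical gradients whose magnitude grows with the index.
grads = [(-1) ** i * i for i in range(100)]
chosen, sample_weights = goss_sample(grads)
```

With the default rates, each boosting iteration would fit on 30% of the rows while still retaining every high-gradient (poorly fit) instance.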

LightGBM Docs


International Clinical Field Studies: Data Research and Regression Insights

This project highlights my work in associational coefficient analysis across historical low- and middle-income country (LMIC) clinical field studies, examining relationships between covariates and outcomes. The project involved extensive data wrangling, cleaning, and research to manage heterogeneous datasets and assess variable interconnectedness. Methods included statistical and machine learning models, multicollinearity diagnostics, and interaction effects to capture nuanced patterns. Emphasis was placed on robust regression pipelines, comparative model evaluation, and reproducible workflows to deliver meaningful clinical insights. Note: the repository is private to protect patient data and comply with research ethics requirements.

Pair-Plot During Associative EDA: Child to Mother Interactions

Figure: Pair-Plot During Associative EDA: Interactions between maternal health indicators and socio-educational factors.

📶 Presentation: Uncovering Covariate Relationships in LMIC Clinical Data for Health Insights

📑 Child Health in Bangladesh: Regression and Coefficient Analysis Across Field Data


Dynamic Programming

As part of the computer science curriculum, I developed an understanding of key topics such as algorithmic design, computational complexity, and dynamic programming, applying them through Python coding to build efficient and reproducible solutions.

The notebook below presents my manually derived solution to the Gerrymandering problem, along with an analysis of its computational complexity.
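For context, one common textbook formulation of the problem can be solved with a subset-sum style dynamic program, sketched below. This is not the notebook's solution. It assumes n precincts with the same number of voters each, two districts of n/2 precincts, and asks whether one party can hold a strict majority in both. The function and parameter names are chosen for the example.

```python
def can_gerrymander(a_votes, voters_per_precinct):
    """Can the precincts be split into two equal-size districts that party A both wins?

    a_votes[i] is the number of party-A voters in precinct i.
    """
    n = len(a_votes)
    if n % 2:
        raise ValueError("need an even number of precincts")
    half = n // 2
    total_a = sum(a_votes)
    # A strict majority means more than half of one district's voters.
    threshold = voters_per_precinct * half / 2

    # reachable[k] = set of A-vote totals achievable with exactly k precincts in district 1.
    reachable = [set() for _ in range(half + 1)]
    reachable[0].add(0)
    for votes in a_votes:
        # Walk k downward so the current precinct is counted at most once.
        for k in range(half - 1, -1, -1):
            for x in reachable[k]:
                reachable[k + 1].add(x + votes)

    # District 2 receives whatever A votes district 1 does not.
    return any(
        x > threshold and total_a - x > threshold
        for x in reachable[half]
    )


# Four precincts of 100 voters each.
winnable = can_gerrymander([55, 43, 60, 47], 100)      # {55, 47} and {43, 60} both exceed 100
not_winnable = can_gerrymander([55, 43, 60, 40], 100)  # only 198 A votes in total
```

With m voters per precinct there are at most (n/2 + 1)(mn/2 + 1) states, and each precinct visits each state once, so this sketch runs in O(n³m) time, which is pseudo-polynomial because it depends on the magnitude of m.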

🧮 💻 Dynamic Programming & Time Complexity Analysis


Time Series Data Structuring - Short Presentation

A short presentation on the structure of time series data, highlighting temporal dependency, order, seasonality, and frequency. I demonstrate how tools like the Fourier Transform can decompose signals to uncover cycles and hidden patterns. These insights are often overlooked by models but can greatly enhance forecasting, anomaly detection, and decision-making.
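As a small illustration of the frequency-domain idea, the sketch below applies a naive discrete Fourier transform to a synthetic series and reads off its dominant cycle length. The series and function names are made up for the example, and the O(n²) transform is used only to keep the code dependency-free; an FFT routine would be the practical choice.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of each non-negative frequency bin of a real-valued signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def dominant_period(signal):
    """Cycle length, in samples, of the strongest non-constant component."""
    magnitudes = dft_magnitudes(signal)
    # Skip bin 0, which only carries the mean level of the series.
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return len(signal) / k


# Synthetic series: level 10, a strong 12-step cycle, and a weaker 4-step cycle.
series = [
    10 + 3 * math.sin(2 * math.pi * t / 12) + math.sin(2 * math.pi * t / 4)
    for t in range(120)
]
period = dominant_period(series)
```

Here the strongest bin corresponds to the 12-step cycle, the kind of seasonality that could then be encoded as a feature for a forecasting model.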

📶 Presentation on Options for Time Series Data Design

🎥 Short Video on Time Series Data
