57 commits
1d9aec2
Add README to project
jameno Jan 28, 2020
7587d81
Add iCGM Condition Finder script skeleton
jameno Jan 29, 2020
9d5c095
Add condition labeling feature
jameno Jan 29, 2020
a74b3a5
Add 48-hour snapshot location finder
jameno Jan 30, 2020
ea2d4e9
Add results summary function
jameno Jan 31, 2020
f4113bf
Update README
jameno Jan 31, 2020
d166fb1
Update main function to take in a dataframe
jameno Jan 31, 2020
f8585fe
Add file_name to main condition finder function
jameno Jan 31, 2020
bd1c441
Add batch processing script for condition finder
jameno Jan 31, 2020
4b88681
Update evaluation points to use id instead of index
jameno Jan 31, 2020
4b31b28
Add rounded CGM point deduplication
jameno Jan 31, 2020
171f713
Add metric for the percent of CGM points that are duplicated
jameno Jan 31, 2020
0f26626
Add an empty results frame for batch processing
jameno Jan 31, 2020
3f2d0d2
Add logic for counting only not-null values
jameno Jan 31, 2020
9db1600
Fix condition column names in output
jameno Feb 3, 2020
3ca481a
Update README with output information
jameno Feb 3, 2020
f75713e
Add utc=True argument to utc datetime converter
jameno Feb 11, 2020
44b48da
Add snapshot processor
jameno Feb 11, 2020
d7c439c
Add check for null evaluation points
jameno Feb 11, 2020
f2537b9
Add condition text to pickle file export
jameno Feb 11, 2020
8eb977b
Update README with script information and numbered condition labels
jameno Feb 11, 2020
a72ec17
Update README formatting
jameno Feb 11, 2020
7279283
Update README anchor link
jameno Feb 11, 2020
6d5d713
Update README anchor link
jameno Feb 11, 2020
aacf164
Update README anchor link
jameno Feb 11, 2020
eedd491
Add new condition constraints + update time to local time
jameno Feb 14, 2020
4abbec2
Update basal 5-minute downsampling
jameno Feb 14, 2020
a7cec6e
Limit basal dataframe to necessary columns for merging
jameno Feb 14, 2020
14a90ca
Update processor with rounded local times and csv output
jameno Feb 14, 2020
61eaa97
Add carbInput as a snapshot requirement
jameno Feb 18, 2020
dc8c04f
Remove no_travel as a snapshot requirement
jameno Feb 18, 2020
5d557a8
Update activeSchedule finder and bgTarget logic
jameno Feb 18, 2020
83167f6
Update carb event finding for snapshots
jameno Feb 18, 2020
cf0506d
Update schedule finding algorithm
jameno Feb 20, 2020
95def25
Add CGM Smoothing
jameno Feb 20, 2020
f924a1f
Add functions to simplify settings and add empty events
jameno Feb 20, 2020
d0fd8e9
Fix CGM rolling error and target bg bug
jameno Feb 20, 2020
69ccdea
Update empty carb/dose events to remove all other events from dataframe
jameno Feb 20, 2020
1e310a9
Add placeholder for age and ylw
jameno Feb 20, 2020
e09eb97
Update active schedule algorithm
jameno Feb 21, 2020
6680bde
Add a cgm data validation for 576 points
jameno Feb 21, 2020
caf605f
Update CGM deduplication
jameno Mar 4, 2020
5538d31
Change condition finding to use 10 days of data
jameno Mar 4, 2020
ce183c8
Update snapshot settings to use bolus calculator entries if needed
jameno Mar 4, 2020
d00f9ba
Add smoothing and interpolation to CGM values
jameno Mar 4, 2020
fb11ca1
Add handling of birthdate/diagnosisDate metadata from file
jameno Mar 4, 2020
6b4e585
Use 10 days from evaluation point instead of 48 hours
jameno Mar 4, 2020
70ac504
Add condition validation check
jameno Mar 4, 2020
c06415c
Round CGM values to 1 decimal
jameno Mar 26, 2020
f28c406
Round basal rates to nearest 0.05u
jameno Mar 26, 2020
88bc534
Update interpolation logic
jameno Mar 26, 2020
1b765e5
Update max basal & bolus settings defaults
jameno Mar 26, 2020
d2b2308
Add a final NaN CGM backfill after interpolating
jameno Mar 26, 2020
0870a71
Add logic to handle carb to insulin ratios of 0
jameno Mar 26, 2020
008867e
Add weighted cgm gap blending to snapshot processor
jameno Apr 7, 2020
255f0b6
Add sample data for testing
jameno Apr 8, 2020
081cb70
Update README
jameno Apr 8, 2020
101 changes: 101 additions & 0 deletions projects/iCGM-test-matrix/README.md
@@ -0,0 +1,101 @@
# iCGM Test Matrix

This project finds snapshots of data for exploring the sensitivity of the Loop algorithm over the entire range of blood glucose (BG) values. Each snapshot is a 10-day window of data that ends at an "evaluation point", which falls into one of the 9 conditions detailed in the table below.

The primary goal of this project is to find 9 snapshots for each condition from 100 datasets in the Tidepool Big Data Donation Project (TBDDP).

The secondary goal is to calculate the distribution of all 9 conditions within the entire TBDDP donor population.

## Scripts

There are 3 Python scripts used in this project:

- **icgm_condition_finder.py** - Given a Tidepool donor dataset, returns the location of one evaluation point for each of the 9 conditions (if available), along with some other statistics (see [Condition Finder Output](#condition-finder-output) below)
- **batch-icgm-condition-stats.py** - A batch script wrapper for the icgm_condition_finder. Given a folder of Tidepool datasets, creates a .csv output of condition locations and stats for every file.
- **snapshot_processor.py** - Given the output of batch-icgm-condition-stats.py, takes each snapshot location for every dataset and converts it into a formatted .csv of input data tables used by the pyLoopKit simulator.

## Condition Table

There are 3 BG value conditions and 3 rate-of-change conditions. Combined, they define 9 unique iCGM states that an iCGM data point can fall into, as shown in the table below.

<table>
<tbody>
<tr>
<td></td>
<td></td>
<td colspan=3><b>Median BG value of the previous 6 BG values<br>(mg/dL)</b></td>
</tr>
<tr>
<td></td>
<td></td>
<td>[40-70)</td>
<td>[70-180]</td>
<td>(180-400]</td>
</tr>
<tr>
<td rowspan=3><b>Rate of change of the<br>previous 3 BG values <br>(mg/dL/min)</b></td>
<td>< -1</td>
<td>[40-70) <br>&<br> < -1 </td>
<td>[70-180] <br>&<br> < -1 </td>
<td>(180-400] <br>&<br> < -1 </td>
</tr>
<tr>
<td>[-1 to 1]</td>
<td>[40-70) <br>&<br> [-1 to 1]</td>
<td>[70-180] <br>&<br> [-1 to 1]</td>
<td>(180-400] <br>&<br> [-1 to 1]</td>
</tr>
<tr>
<td>> 1</td>
<td>[40-70) <br>&<br> > 1</td>
<td>[70-180] <br>&<br> > 1</td>
<td>(180-400] <br>&<br> > 1</td>
</tr>
</tbody>
</table>

The conditions are numbered 1-9 as follows:

| Condition # | 30min Median BG (mg/dL) <br />& <br />15min Rate of Change (mg/dL/min) |
| :---------: | :----------------------------------------------------------- |
| 1 | [40-70) & < -1 |
| 2 | [70-180] & < -1 |
| 3 | (180-400] & < -1 |
| 4 | [40-70) & [-1 to 1] |
| 5 | [70-180] & [-1 to 1] |
| 6 | (180-400] & [-1 to 1] |
| 7 | [40-70) & > 1 |
| 8 | [70-180] & > 1 |
| 9 | (180-400] & > 1 |
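
The numbering above can be read as a row-major walk through the 3x3 condition table. A minimal sketch of that mapping follows. The function name `label_condition` is hypothetical and is not taken from icgm_condition_finder.py, and returning 0 for missing or out-of-range inputs is an assumption based on the `cond0` column described in the output section.

```python
import math


def label_condition(median_bg, rate_of_change):
    """Return the condition number (1-9), or 0 if the point cannot be labeled.

    Hypothetical helper: median_bg is the 30-minute median BG (mg/dL) and
    rate_of_change is the 15-minute rate of change (mg/dL/min).
    """
    # Assumption: missing inputs map to condition 0
    if median_bg is None or rate_of_change is None:
        return 0
    if math.isnan(median_bg) or math.isnan(rate_of_change):
        return 0

    # Column of the condition table: BG range
    if 40 <= median_bg < 70:
        bg_col = 0
    elif 70 <= median_bg <= 180:
        bg_col = 1
    elif 180 < median_bg <= 400:
        bg_col = 2
    else:
        return 0  # Assumption: values outside [40, 400] are not labeled

    # Row of the condition table: rate of change
    if rate_of_change < -1:
        rate_row = 0
    elif rate_of_change <= 1:
        rate_row = 1
    else:
        rate_row = 2

    # Conditions are numbered left to right, top to bottom
    return 3 * rate_row + bg_col + 1
```

For example, a median of 100 mg/dL with a flat trend gives `label_condition(100, 0) == 5`, and a median of 250 mg/dL rising at 1.5 mg/dL/min gives condition 9.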

## Condition Finder Algorithm

The algorithm for finding a snapshot is as follows:

- Fit the CGM trace to a 5-minute time series to uncover gaps
- Calculate the median mg/dL value with a 30-minute (6 CGM points) rolling window
- Calculate the slope in mg/dL/min with a 15-minute (3 CGM points) rolling window
- Apply one of the 9 condition labels to each CGM point
- Calculate the max gap size of the CGM trace in a 24-hour *centered* rolling window (where the evaluation point is in the center)
- Randomly select one evaluation point for each condition that does not overlap with any other 48-hour snapshot and has a max gap <= 15 minutes
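
The steps above can be sketched roughly with pandas as shown below. This is an illustration under stated assumptions, not the code in icgm_condition_finder.py. The function names, the `time`/`value` column names, the least-squares slope over 3 points, and the definition of gap size as minutes of consecutive missing data are all assumptions. The random selection is simplified and does not enforce the non-overlap rule. `groupby(...).sample` requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd


def label_cgm_points(cgm):
    """Label each 5-minute CGM point with a condition (0 = not evaluable).

    Assumes `cgm` has a datetime-like 'time' column and a 'value' column
    in mg/dL (hypothetical column names).
    """
    cgm = cgm.copy()
    cgm["time"] = pd.to_datetime(cgm["time"]).dt.round("5min")
    cgm = cgm.drop_duplicates(subset="time").set_index("time").sort_index()

    # Fit to a contiguous 5-minute grid so that gaps show up as NaN
    grid = pd.date_range(cgm.index.min(), cgm.index.max(), freq="5min")
    ts = cgm.reindex(grid)
    v = ts["value"]

    # 30-minute rolling median (6 points, all required)
    ts["median_bg"] = v.rolling(6, min_periods=6).median()

    # 15-minute slope: for 3 equally spaced points, the least-squares
    # slope reduces to (last - first) / (2 * 5 minutes)
    slope = (v - v.shift(2)) / 10.0
    ts["slope"] = slope.where(v.shift(1).notna())

    # Condition label from the 3x3 table (NaN comparisons are False)
    bg, roc = ts["median_bg"], ts["slope"]
    bg_col = np.select(
        [(bg >= 40) & (bg < 70), (bg >= 70) & (bg <= 180),
         (bg > 180) & (bg <= 400)], [0, 1, 2], default=-1)
    rate_row = np.select(
        [roc < -1, (roc >= -1) & (roc <= 1), roc > 1], [0, 1, 2], default=-1)
    ts["condition"] = np.where(
        (bg_col >= 0) & (rate_row >= 0), 3 * rate_row + bg_col + 1, 0)

    # Gap size (assumed: minutes of consecutive missing points), then the
    # max gap in a 24-hour centered window (288 intervals -> 289 points)
    missing = v.isna()
    run_id = (missing != missing.shift()).cumsum()
    run_len = missing.astype(int).groupby(run_id).transform("sum")
    ts["gap_min"] = run_len * 5
    ts["max_gap_24h"] = ts["gap_min"].rolling(
        289, center=True, min_periods=1).max()
    return ts


def sample_evaluation_points(ts, max_gap_min=15, seed=0):
    """Randomly pick one eligible point per condition (no overlap check)."""
    eligible = ts[(ts["condition"] > 0) & (ts["max_gap_24h"] <= max_gap_min)]
    return eligible.groupby("condition").sample(n=1, random_state=seed)
```

A fixed `seed` keeps the random selection reproducible between runs, which is a design choice of this sketch rather than something the README specifies.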

## Condition Finder Output

The output columns of icgm_condition_finder.py and the batch processing script are:

- **file_name** - The file name of the .csv analyzed
- **nRoundedTimeDuplicatesRemoved** - The number of CGM duplicates removed after rounding to the nearest 5 minutes
- **cgmPercentDuplicated** - Percent of the CGM data that was duplicated
- **gte40_lt70** - The number of CGM entries with a median BG value of the previous 6 BG values in the range [40, 70) (mg/dL)
- **gte70_lte180** - The number of CGM entries with a median BG value of the previous 6 BG values in the range [70, 180] (mg/dL)
- **gt180_lte400** - The number of CGM entries with a median BG value of the previous 6 BG values in the range (180, 400] (mg/dL)
- **lt-1** - The number of CGM entries with a rate of change of the previous 3 BG values less than -1 (mg/dL/min)
- **gte-1_lte1** - The number of CGM entries with a rate of change of the previous 3 BG values in the range [-1, 1] (mg/dL/min)
- **gt1** - The number of CGM entries with a rate of change of the previous 3 BG values greater than 1 (mg/dL/min)
- **cond[0-9]** - The total number of evaluation points that match a given condition (note that cond0 is the number of CGM entries that could not be evaluated under a condition due to a lack of data)
- **cond[1-9]_eval_time** - The rounded local timestamp of a randomly sampled evaluation point
- **status** - The batch processing completion status of each file
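
As an illustration of how these columns could serve the secondary goal (the distribution of the 9 conditions across donors), here is a minimal sketch. It assumes the batch .csv has been loaded into a DataFrame and that the count columns are literally named `cond1` through `cond9`. The function name `condition_distribution` is hypothetical, and excluding `cond0` from the denominator is an assumption.

```python
import pandas as pd


def condition_distribution(summary):
    """Return the percent of labeled CGM points in each condition.

    `summary` is assumed to hold one row per file with integer count
    columns cond1..cond9 (cond0 is left out of the denominator).
    """
    cond_cols = [f"cond{i}" for i in range(1, 10)]
    totals = summary[cond_cols].sum()
    # Yields NaN for every condition if no points were labeled at all
    return 100 * totals / totals.sum()
```

With the batch output loaded through `pd.read_csv`, the returned Series is indexed by `cond1` through `cond9` and sums to 100 whenever at least one point was labeled.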

## Snapshot Processor Output

The output for **snapshot_processor.py** is a "snapshot_export" folder containing the pyLoopKit-formatted .csv tables. These .csvs can also be used in the risk simulation pipeline (public repository coming soon).
120 changes: 120 additions & 0 deletions projects/iCGM-test-matrix/batch-icgm-condition-stats.py
@@ -0,0 +1,120 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Batch iCGM Condition Stats
===========================
:File: batch-icgm-condition-stats.py
:Description: A batch processing script for the icgm_condition_finder.py module
Given a folder of Tidepool datasets, get a summary of all results
using the condition finder script.
:Version: 0.0.1
:Created: 2020-01-30
:Authors: Jason Meno (jam)
:Dependencies: A folder of .csvs containing Tidepool CGM device data
:License: BSD-2-Clause
"""
import pandas as pd
import icgm_condition_finder
import time
import datetime as dt
import os
from multiprocessing import Pool, cpu_count
import traceback
import sys
# %%

data_location = "sample_data/"
file_list = os.listdir(data_location)

# Filter only files with .csv in their name (includes .csv.gz files)
file_list = [filename for filename in file_list if '.csv' in filename]

# %%


def get_icgm_condition_stats(file_name, data_location, user_loc):
    """Run the condition finder on one file and return its results frame."""

    file_path = data_location + file_name
    # print(str(user_loc) + " STARTING")

    # Log progress every 100 files
    if (user_loc % 100 == 0) and (user_loc > 99):
        print(user_loc)
        with open('batch-icgm-condition-stats-log.txt', 'a') as log_file:
            log_file.write(str(user_loc)+"\n")

    results = icgm_condition_finder.get_empty_results_frame()
    results['file_name'] = file_name

    try:
        df = pd.read_csv(file_path, low_memory=False)

        if 'type' in set(df):
            if 'cbg' in set(df['type']):
                results = icgm_condition_finder.main(df, file_name)
                results['status'] = "Complete"
            else:
                results['status'] = "No CGM Data"
        else:
            results['status'] = "Empty Dataset"

    except Exception as e:
        print("Processing Failed For: " + file_path)
        exception_text = "Failed - " + str(e)
        results['status'] = "Failed"
        results['exception_text'] = exception_text

    return results


# %%
if __name__ == "__main__":
    # Start Pipeline
    start_time = time.time()

    # Startup CPU multiprocessing pool
    pool = Pool(int(cpu_count()))

    pool_array = [pool.apply_async(
        get_icgm_condition_stats,
        args=[file_list[user_loc],
              data_location,
              user_loc
              ]
        ) for user_loc in range(len(file_list))]

    pool.close()
    pool.join()

    end_time = time.time()
    elapsed_minutes = (end_time - start_time)/60
    elapsed_time_message = "Batch iCGM Condition Stats completed in: " + \
        str(elapsed_minutes) + " minutes\n"
    print(elapsed_time_message)
    log_file = open('batch-icgm-condition-stats-log.txt', 'a')
    log_file.write(str(elapsed_time_message)+"\n")
    log_file.close()

    # %% Append results of each pool into an array

    results_array = []

    for result_loc in range(len(pool_array)):
        try:
            results_array.append(pool_array[result_loc].get())
        except Exception as e:
            print('Failed to get results! ' + str(e))
            exception_text = traceback.format_exception(*sys.exc_info())
            print('\nException Text:\n')
            for text_string in exception_text:
                print(text_string)

    # %%
    # Convert results into dataframe
    icgm_condition_summary_df = pd.concat(results_array, sort=False)
    today_timestamp = dt.datetime.now().strftime("%Y-%m-%d")
    results_export_filename = \
        'batch-icgm-condition-stats-' + \
        today_timestamp + \
        '.csv'
    icgm_condition_summary_df.to_csv(results_export_filename, index=False)