Feature/assessment normalizations #65
base: main
Changes from all commits
45d2cbe
0b0034c
b7b56f5
0b50098
6aa0dc7
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,12 @@ with score_results as ( | |
| xwalk_scores as ( | ||
| select * from {{ ref('xwalk_assessment_scores') }} | ||
| ), | ||
| xwalk_score_values as ( | ||
| select * from {{ ref('xwalk_assessment_score_values') }} | ||
| ), | ||
| xwalk_score_value_thresholds as ( | ||
| select * from {{ ref('xwalk_assessment_score_value_thresholds') }} | ||
| ), | ||
| performance_levels as ( | ||
| select | ||
| tenant_code, | ||
|
|
@@ -34,16 +40,35 @@ dedupe_results as ( | |
| ), | ||
| merged_xwalk as ( | ||
| select | ||
| tenant_code, | ||
| api_year, | ||
| k_student_assessment, | ||
| score_name as original_score_name, | ||
| coalesce(normalized_score_name, 'other') as normalized_score_name, | ||
| score_result | ||
| dedupe_results.tenant_code, | ||
| dedupe_results.api_year, | ||
| dedupe_results.k_student_assessment, | ||
| dedupe_results.score_name as original_score_name, | ||
| coalesce(xwalk_scores.normalized_score_name, 'other') as normalized_score_name, | ||
| dedupe_results.score_result, | ||
| coalesce(xwalk_score_value_thresholds.normalized_score_result::varchar, | ||
| xwalk_score_values.normalized_score_result::varchar, | ||
| score_result::varchar | ||
| ) as normalized_score_result | ||
| from dedupe_results | ||
| left join xwalk_scores | ||
| on dedupe_results.assessment_identifier = xwalk_scores.assessment_identifier | ||
| and dedupe_results.namespace = xwalk_scores.namespace | ||
| and dedupe_results.score_name = xwalk_scores.original_score_name | ||
| left join xwalk_score_values | ||
| on dedupe_results.assessment_identifier = xwalk_score_values.assessment_identifier | ||
| and dedupe_results.namespace = xwalk_score_values.namespace | ||
| and xwalk_scores.normalized_score_name = xwalk_score_values.normalized_score_name | ||
| and dedupe_results.score_result = xwalk_score_values.original_score_result | ||
| left join xwalk_score_value_thresholds | ||
| on dedupe_results.assessment_identifier = xwalk_score_value_thresholds.assessment_identifier | ||
| and dedupe_results.namespace = xwalk_score_value_thresholds.namespace | ||
| and xwalk_scores.normalized_score_name = xwalk_score_value_thresholds.normalized_score_name | ||
| -- todo check these comparators -- what if there's a value between the upper and next lower? eg value is 20.4 and the cutoffs are 20 and 21 | ||
| -- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error | ||
| and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound | ||
|
Contributor
this will default to int since no scale argument is given. I think that's fine, but maybe we should consider allowing decimals (so try_to_decimal)? I assume you could still write out the values in the xwalk as integers.
Collaborator
Author
that's a good point, maybe we should be explicit about the data type of this column -- I'm still unsure about this question I put in the PR: "Is it right to overload a general "normalized_" column with various use cases, when some may need to be integers vs. characters, etc.?"
Contributor
Yeah, that's a good question. I think in a lot of cases, though, the point of a column like this is to normalize values to a similar set of values across all assessments in the table. I don't necessarily think that's always true, but my guess is this column would be used for a single particular downstream purpose: for example, a BI user might use a normalized column where all PLs are integers when creating charts, so that they order properly. But again, maybe there is another use case I'm not considering where this could have serious negative effects.
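To make the scale question concrete, here is a rough Python emulation of the threshold join. It is a sketch, not the model's code. The band rows are hypothetical, and `try_to_numeric` here is a stand-in that assumes the warehouse cast returns NULL on failure and rounds half up to the requested scale, which is worth checking against the warehouse documentation.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Hypothetical threshold rows: (lower_bound, upper_bound, normalized_score_result)
BANDS = [
    (Decimal(0), Decimal(20), "1"),
    (Decimal(21), Decimal(40), "2"),
]


def try_to_numeric(text, scale=0):
    """Stand-in for a safe numeric cast: None on failure, else rounded to `scale` digits.

    Assumption: rounding is half-up, as a fixed-scale cast is expected to behave.
    """
    try:
        value = Decimal(str(text).strip())
    except (InvalidOperation, ValueError):
        return None
    if not value.is_finite():
        return None
    return value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


def match_band(score_text, scale=0):
    """Return the normalized result of the first band containing the score, else None."""
    value = try_to_numeric(score_text, scale)
    if value is None:
        return None  # non-numeric scores simply do not join
    for lower, upper, result in BANDS:
        if lower <= value <= upper:
            return result
    return None


print(match_band("20.4"))           # scale 0: 20.4 is rounded to 20 before comparing
print(match_band("20.4", scale=1))  # scale 1: 20.4 sits in the gap between 20 and 21
print(match_band("Pass"))           # not numeric, so no match and no error
```

Under these assumptions, a scale of 0 hides the gap between integer cutoffs by rounding the score first, while a decimal scale exposes it. Allowing decimals would therefore also need either contiguous bounds in the xwalk or a half-open comparison such as `>= lower_bound and < upper_bound`.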
||
| and try_to_numeric(dedupe_results.score_result) <= xwalk_score_value_thresholds.upper_bound | ||
| -- todo in future, may need to include subject & grade level in this join (with options to join across subjects) | ||
|
Contributor
we will definitely run into this at some point, but we can start without it, especially considering there will be additional assessment normalization features anyway.
||
|
|
||
| ) | ||
| select * from merged_xwalk | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,5 @@ | ||
| -- depends_on: {{ ref('xwalk_assessment_score_values') }} | ||
| -- depends_on: {{ ref('xwalk_assessment_score_value_thresholds') }} | ||
| {{ | ||
| config( | ||
| post_hook=[ | ||
|
|
@@ -30,9 +32,9 @@ student_assessments_wide as ( | |
| student_assessments.tenant_code, | ||
| student_assessments.student_assessment_identifier, | ||
| student_assessments.serial_number, | ||
| school_year, | ||
| administration_date, | ||
| administration_end_date, | ||
| student_assessments.school_year, | ||
| student_assessments.administration_date, | ||
| student_assessments.administration_end_date, | ||
| event_description, | ||
| administration_environment, | ||
| administration_language, | ||
|
|
@@ -50,6 +52,18 @@ student_assessments_wide as ( | |
| else_value='NULL', | ||
| agg='max', | ||
| quote_identifiers=False | ||
| ) }}, | ||
| {#- find distinct score names that are in one of the normalize_result xwalks (distinct scores to add normalized_ column for) -#} | ||
| {% set normalized_names_values = dbt_utils.get_column_values(ref('xwalk_assessment_score_values'), 'normalized_score_name') or [] %} | ||
| {% set normalized_names_thresholds = dbt_utils.get_column_values(ref('xwalk_assessment_score_value_thresholds'), 'normalized_score_name') or [] %} | ||
| {{ dbt_utils.pivot( | ||
| 'normalized_score_name', | ||
| (normalized_names_values + normalized_names_thresholds) | unique, | ||
|
Contributor
The idea here is that we only want normalized versions of scores that are included in either xwalk (because scores like ...
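As a rough illustration of that selection step, the Python sketch below combines the score names from both xwalks, drops duplicates, and applies the `normalized_` prefix. The function name and the sample score names are hypothetical. One difference to note: Jinja's `unique` filter compares case-insensitively by default, while this sketch is case-sensitive.

```python
def normalized_columns(values_names, threshold_names, prefix="normalized_"):
    """Return the prefixed column names to pivot, de-duplicated in first-seen order.

    `None` inputs are treated as empty lists, mirroring the `or []` guards
    in the Jinja block.
    """
    combined = [*(values_names or []), *(threshold_names or [])]
    unique_names = dict.fromkeys(combined)  # preserves order, drops repeats
    return [f"{prefix}{name}" for name in unique_names]


# Hypothetical score names, for illustration only
print(normalized_columns(["performance_level"], ["performance_level", "scale_score"]))
print(normalized_columns(None, None))
```

A score name that appears in neither xwalk produces no `normalized_` column at all, which matches the intent described in the comment.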
||
| then_value='normalized_score_result', | ||
| else_value='NULL', | ||
| prefix='normalized_', | ||
| agg='max', | ||
| quote_identifiers=False | ||
| ) }} | ||
| {%- endif %} | ||
| from student_assessments | ||
|
|
||
We talked about whether the original score result should be the default when no normalization happens for a score, and leaned toward yes, for the case when normalization is not necessary. What this could mean is that a score that should be normalized but isn't yet included in either normalization xwalk will make its way into this column in an ugly format that might not match what is necessary for reporting. But in order for a column to be included here, it must be added to the xwalk_assessment_scores column in the first place, so there is at least a manual step that needs to happen anyway. Someone might not know that this normalized column exists, or that its values should be integers if it's a performance level (as a random example), but I think we can communicate this out, and it avoids having to map values to themselves.
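The fallback order under discussion can be sketched in Python as follows. This only emulates the `coalesce` in `merged_xwalk` (threshold match first, then value match, then the original result, all cast to text). The function names and sample values are hypothetical.

```python
def coalesce(*candidates):
    """Return the first candidate that is not None, like SQL COALESCE."""
    return next((c for c in candidates if c is not None), None)


def normalized_score_result(score_result, threshold_match=None, value_match=None):
    """Pick the normalized result, falling back to the original score result.

    Every branch is converted to text, mirroring the ::varchar casts in the model.
    """
    def as_text(value):
        return None if value is None else str(value)

    return coalesce(as_text(threshold_match), as_text(value_match), as_text(score_result))


# Hypothetical examples
print(normalized_score_result("Proficient", value_match=3))  # value xwalk hit
print(normalized_score_result("412"))                        # no xwalk hit: original passes through
```

The pass-through case is the trade-off raised above: an unmapped score keeps its original format in the normalized column, so consumers cannot assume that every value has been mapped.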