@@ -4,6 +4,12 @@ with score_results as (
xwalk_scores as (
select * from {{ ref('xwalk_assessment_scores') }}
),
xwalk_score_values as (
select * from {{ ref('xwalk_assessment_score_values') }}
),
xwalk_score_value_thresholds as (
select * from {{ ref('xwalk_assessment_score_value_thresholds') }}
),
performance_levels as (
select
tenant_code,
@@ -34,16 +40,35 @@ dedupe_results as (
),
merged_xwalk as (
select
tenant_code,
api_year,
k_student_assessment,
score_name as original_score_name,
coalesce(normalized_score_name, 'other') as normalized_score_name,
score_result
dedupe_results.tenant_code,
dedupe_results.api_year,
dedupe_results.k_student_assessment,
dedupe_results.score_name as original_score_name,
coalesce(xwalk_scores.normalized_score_name, 'other') as normalized_score_name,
dedupe_results.score_result,
coalesce(xwalk_score_value_thresholds.normalized_score_result::varchar,
xwalk_score_values.normalized_score_result::varchar,
score_result::varchar
Contributor

We discussed here whether the original score result should be the default when no normalization happens for a score, and leaned toward yes for the case where normalization is not necessary. This could mean that a score that should be normalized, but isn't yet included in either normalization xwalk, makes its way into this column in an ugly format that might not match what reporting needs. However, for a column to be included here, the score must first be added to the xwalk_assessment_scores xwalk, so there is at least one manual step that needs to happen anyway. Someone might not know that this normalized column exists, or that its values should be integers if the score is a performance level (as a random example), but I think we can communicate this out, and it avoids having to map values to themselves.
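
To make the fallback order discussed above concrete, here is a minimal sketch of how the `coalesce` resolves. The literal rows and the short column names `threshold_result` and `value_result` are hypothetical stand-ins for the joined `xwalk_score_value_thresholds` and `xwalk_score_values` columns; they are not taken from the PR.

```sql
-- Sketch only: literal rows stand in for dedupe_results after both left joins.
with joined as (
    select * from (values
        ('412',     '2',  null),  -- threshold xwalk matched: '2' wins
        ('Level 3', null, '3'),   -- only the value xwalk matched: '3' wins
        ('Pass',    null, null)   -- neither matched: raw 'Pass' passes through
    ) as t (score_result, threshold_result, value_result)
)
select
    score_result,
    coalesce(threshold_result, value_result, score_result) as normalized_score_result
from joined;
```

The third row is the case the comment worries about: an unmapped score lands in the normalized column in its original format.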

) as normalized_score_result
from dedupe_results
left join xwalk_scores
on dedupe_results.assessment_identifier = xwalk_scores.assessment_identifier
and dedupe_results.namespace = xwalk_scores.namespace
and dedupe_results.score_name = xwalk_scores.original_score_name
left join xwalk_score_values
on dedupe_results.assessment_identifier = xwalk_score_values.assessment_identifier
and dedupe_results.namespace = xwalk_score_values.namespace
and xwalk_scores.normalized_score_name = xwalk_score_values.normalized_score_name
and dedupe_results.score_result = xwalk_score_values.original_score_result
left join xwalk_score_value_thresholds
on dedupe_results.assessment_identifier = xwalk_score_value_thresholds.assessment_identifier
and dedupe_results.namespace = xwalk_score_value_thresholds.namespace
and xwalk_scores.normalized_score_name = xwalk_score_value_thresholds.normalized_score_name
-- todo check these comparators -- what if there's a value between the upper and next lower? eg value is 20.4 and the cutoffs are 20 and 21
-- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error
and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound
Contributor

This will default to an integer since no scale argument is given. I think that's fine, but maybe we should consider allowing decimals (so try_to_decimal)? I assume you could still write out the values in the xwalk as integers.
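
A small sketch of the scale behavior raised above, assuming the warehouse is Snowflake (suggested by `try_to_numeric` and `::varchar`, but not stated in this PR). There, `try_to_decimal` and `try_to_numeric` are synonyms, so keeping decimals would mean passing an explicit precision and scale rather than switching function names. The literals and the `38, 4` precision and scale are illustrative choices only.

```sql
-- Assumes Snowflake semantics; inputs are illustrative.
select
    try_to_numeric('20.4')        as default_scale,   -- scale defaults to 0, so this yields 20
    try_to_numeric('20.4', 38, 4) as explicit_scale,  -- keeps the fractional part (20.4000)
    try_to_numeric('Level 3')     as non_numeric;     -- null instead of a conversion error
```

With the default scale, the example value 20.4 from the todo is compared against the bounds as 20, which may hide the gap between cutoffs rather than resolve it.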

Collaborator Author

That's a good point; maybe we should be explicit about the data type of this column. I'm still unsure about this question I put in the PR: "Is it right to overload a general "normalized_" column with various use cases, when some may need to be integers vs. characters, etc.?"

Contributor

Yeah, that's a good question. In a lot of cases, though, I think the point of a column like this is to normalize values to a similar set of values across all assessments in the table. I don't necessarily think that's always true, but my guess is that this column would be used for a single, particular downstream purpose. For example, a BI user might use a normalized column where all PLs are integers so that charts order properly. But again, maybe there is another use case I'm not considering where this could have serious negative effects.
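
On the todo about the `>=` / `<=` comparators in this join: the sketch below shows how a value of 20.4 behaves against integer cutoffs under closed bounds versus a half-open alternative. The threshold rows are hypothetical, and the half-open form assumes integer cutoffs where each `upper_bound + 1` equals the next `lower_bound`; it is one possible approach, not what the PR implements.

```sql
-- Sketch only: hypothetical thresholds and a single decimal score.
with thresholds as (
    select * from (values
        (0,  20, '1'),
        (21, 40, '2')
    ) as t (lower_bound, upper_bound, normalized_score_result)
),
scores as (
    select try_to_numeric('20.4', 38, 4) as score_value
)
select
    scores.score_value,
    closed.normalized_score_result    as closed_match,     -- null: 20.4 falls between 20 and 21
    half_open.normalized_score_result as half_open_match   -- '1': 0 <= 20.4 < 21
from scores
left join thresholds as closed
    on scores.score_value >= closed.lower_bound
    and scores.score_value <= closed.upper_bound
left join thresholds as half_open
    on scores.score_value >= half_open.lower_bound
    and scores.score_value < half_open.upper_bound + 1;
```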

and try_to_numeric(dedupe_results.score_result) <= xwalk_score_value_thresholds.upper_bound
-- todo in future, may need to include subject & grade level in this join (with options to join across subjects)
Contributor

We will definitely run into this at some point, but we can start without it, especially considering there will be additional assessment normalization features anyway.


)
select * from merged_xwalk
@@ -4,6 +4,12 @@ with score_results as (
xwalk_scores as (
select * from {{ ref('xwalk_objective_assessment_scores') }}
),
xwalk_score_values as (
select * from {{ ref('xwalk_assessment_score_values') }}
),
xwalk_score_value_thresholds as (
select * from {{ ref('xwalk_assessment_score_value_thresholds') }}
),
performance_levels as (
select
tenant_code,
@@ -39,13 +45,31 @@ merged_xwalk as (
api_year,
k_student_objective_assessment,
score_name as original_score_name,
coalesce(normalized_score_name, 'other') as normalized_score_name,
score_result
coalesce(xwalk_scores.normalized_score_name, 'other') as normalized_score_name,
score_result,
coalesce(xwalk_score_value_thresholds.normalized_score_result::varchar,
xwalk_score_values.normalized_score_result::varchar,
score_result::varchar
) as normalized_score_result
from dedupe_results
left join xwalk_scores
on dedupe_results.assessment_identifier = xwalk_scores.assessment_identifier
and dedupe_results.namespace = xwalk_scores.namespace
and dedupe_results.objective_assessment_identification_code = xwalk_scores.objective_assessment_identification_code
and dedupe_results.score_name = xwalk_scores.original_score_name
left join xwalk_score_values
on dedupe_results.assessment_identifier = xwalk_score_values.assessment_identifier
and dedupe_results.namespace = xwalk_score_values.namespace
and xwalk_scores.normalized_score_name = xwalk_score_values.normalized_score_name
and dedupe_results.score_result = xwalk_score_values.original_score_result
left join xwalk_score_value_thresholds
on dedupe_results.assessment_identifier = xwalk_score_value_thresholds.assessment_identifier
and dedupe_results.namespace = xwalk_score_value_thresholds.namespace
and xwalk_scores.normalized_score_name = xwalk_score_value_thresholds.normalized_score_name
-- todo check these comparators -- what if there's a value between the upper and next lower? eg value is 20.4 and the cutoffs are 20 and 21
-- todo review my use of try_to_numeric here -- the idea is to allow numeric values to merge, otherwise don't merge without error
and try_to_numeric(dedupe_results.score_result) >= xwalk_score_value_thresholds.lower_bound
and try_to_numeric(dedupe_results.score_result) <= xwalk_score_value_thresholds.upper_bound
-- todo in future, may need to include subject & grade level in this join (with options to join across subjects)
)
select * from merged_xwalk
20 changes: 17 additions & 3 deletions models/core_warehouse/fct_student_assessment.sql
@@ -1,3 +1,5 @@
-- depends_on: {{ ref('xwalk_assessment_score_values') }}
-- depends_on: {{ ref('xwalk_assessment_score_value_thresholds') }}
{{
config(
post_hook=[
@@ -30,9 +32,9 @@ student_assessments_wide as (
student_assessments.tenant_code,
student_assessments.student_assessment_identifier,
student_assessments.serial_number,
school_year,
administration_date,
administration_end_date,
student_assessments.school_year,
student_assessments.administration_date,
student_assessments.administration_end_date,
event_description,
administration_environment,
administration_language,
@@ -50,6 +52,18 @@ student_assessments_wide as (
else_value='NULL',
agg='max',
quote_identifiers=False
) }},
{#- find distinct score names that are in one of the normalize_result xwalks (distinct scores to add normalized_ column for) -#}
{% set normalized_names_values = dbt_utils.get_column_values(ref('xwalk_assessment_score_values'), 'normalized_score_name') or [] %}
{% set normalized_names_thresholds = dbt_utils.get_column_values(ref('xwalk_assessment_score_value_thresholds'), 'normalized_score_name') or [] %}
{{ dbt_utils.pivot(
'normalized_score_name',
(normalized_names_values + normalized_names_thresholds) | unique,
Contributor

The idea here is that we only want normalized versions of scores that are included in either xwalk (scores like scale_score and sem will rarely be normalized in this way, so including them would be overkill in my opinion).
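
For illustration, this is roughly the SQL this `dbt_utils.pivot` call would compile to if `performance_level` were the only `normalized_score_name` found in either xwalk. The `scores` CTE, its rows, and the grouping key are hypothetical stand-ins so the sketch runs on its own; the real model pivots inside `student_assessments_wide`.

```sql
-- Sketch only: hypothetical rows; scale_score is not in either xwalk,
-- so it gets no normalized_ column.
with scores as (
    select * from (values
        (1, 'performance_level', '3'),
        (1, 'scale_score',       '412'),
        (2, 'performance_level', '2')
    ) as t (k_student_assessment, normalized_score_name, normalized_score_result)
)
select
    k_student_assessment,
    max(
        case
            when normalized_score_name = 'performance_level'
                then normalized_score_result
            else NULL
        end
    ) as normalized_performance_level
from scores
group by k_student_assessment;
```

Each distinct name in the combined list produces one `normalized_<name>` column, which is why limiting the list to names present in the xwalks keeps the fact table from gaining a normalized column for every score.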

then_value='normalized_score_result',
else_value='NULL',
prefix='normalized_',
agg='max',
quote_identifiers=False
) }}
{%- endif %}
from student_assessments
14 changes: 14 additions & 0 deletions models/core_warehouse/fct_student_objective_assessment.sql
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
-- depends_on: {{ ref('xwalk_objective_assessment_score_values') }}
-- depends_on: {{ ref('xwalk_objective_assessment_score_value_thresholds') }}
{{
config(
post_hook=[
@@ -51,6 +53,18 @@ student_obj_assessments_wide as (
else_value='NULL',
agg='max',
quote_identifiers=False
) }},
{#- find distinct score names that are in one of the normalize_result xwalks (distinct scores to add normalized_ column for) -#}
{% set normalized_names_values = dbt_utils.get_column_values(ref('xwalk_objective_assessment_score_values'), 'normalized_score_name') or [] %}
{% set normalized_names_thresholds = dbt_utils.get_column_values(ref('xwalk_objective_assessment_score_value_thresholds'), 'normalized_score_name') or [] %}
{{ dbt_utils.pivot(
'normalized_score_name',
(normalized_names_values + normalized_names_thresholds) | unique,
then_value='normalized_score_result',
else_value='NULL',
prefix='normalized_',
agg='max',
quote_identifiers=False
) }}
{%- endif %}
from student_obj_assessments