diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment2.md index 01f991d02..595e2706f 100644 --- a/02_activities/assignments/DC_Cohort/Assignment2.md +++ b/02_activities/assignments/DC_Cohort/Assignment2.md @@ -56,7 +56,10 @@ The store wants to keep customer addresses. Propose two architectures for the CU **HINT:** search type 1 vs type 2 slowly changing dimensions. ``` -Your answer... +There are two ways to architect a CUSTOMER_ADDRESS table depending on whether the store needs to retain historical address data or not. +The first approach is a Type 1 Slowly Changing Dimension, which overwrites the existing record whenever a customer's address changes. The table simply holds one row per customer with the usual address columns, and when an update comes in, that row is replaced. This is simple and keeps the table small, but the old address is gone forever — there is no way to recover what it was before the change. +The second approach is a Type 2 Slowly Changing Dimension, which preserves history by inserting a new row every time an address changes rather than overwriting the old one. Each row carries a valid_from date, a valid_to date (left null while the record is still active), and an is_current flag. When a customer moves, the existing row gets a valid_to date stamped on it and is_current set to false, and a fresh row is inserted representing the new address. This means the table grows over time, but you can query it at any point in the past and know exactly what address was on file at that moment. +For a bookstore, Type 2 is generally the better choice. If a customer disputes a delivery or you need to audit where an order was shipped, you can look up the address that was active on the date of that order — something Type 1 makes impossible once the customer has since moved. ``` *** @@ -191,5 +194,13 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c ``` -Your thoughts... 
+Boykis's article challenges a comfortable assumption that runs through most public conversation about artificial intelligence — that these systems are somehow automatic, neutral, or self-generated. The title cuts straight to the point: neural nets are just people all the way down. Behind every model is a chain of human decisions, human data, and human labour, and the ethical problems that come with AI are largely problems about how that human foundation is treated and obscured. + +The most concrete issue the article surfaces is labour. Building a machine learning model requires vast quantities of labeled data — images identified, sentences classified, content flagged as safe or harmful. That work is done by large, distributed workforces of low-wage contractors, often in the Global South, paid per task with no benefits, no stability, and no recognition in the final product. The companies that deploy these models present them as technological achievements while the human assembly line underneath stays invisible. That invisibility is not incidental — it is part of how the industry sustains itself. + +Content moderation sits at an especially troubling intersection of labour and harm. Workers who review and categorize violent, abusive, or explicit content are exposed to a relentless stream of disturbing material, frequently without meaningful psychological support. The clean, safe-feeling experience at the user end exists because someone else processed the worst of what humans produce online, quietly and cheaply. That is an ethical cost that rarely appears in conversations about AI progress. + +Bias is another direct consequence of this human foundation. Data reflects the world as it has been, not as it should be, and the people who collect and label it bring their own cultural assumptions with them. A model trained on that data and then deployed globally will encode those assumptions as though they were objective facts. 
When AI systems influence decisions in hiring, lending, criminal justice, or healthcare, biased outputs stop being an abstract concern and start producing concrete harm for people who have no visibility into why a decision was made. + +What makes Boykis's framing valuable is that it closes off a common escape route. When something goes wrong with an AI system, the tendency is to treat it as a technical problem — a parameter to adjust, a dataset to clean. But if the system is people all the way down, then the failures are human failures: of accountability, of fair compensation, of whose knowledge and perspective gets treated as the default. Fixing them requires more than better engineering. ``` diff --git a/02_activities/assignments/DC_Cohort/Bookstore_logical_model_1.png b/02_activities/assignments/DC_Cohort/Bookstore_logical_model_1.png new file mode 100644 index 000000000..e6dcf9bb4 Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Bookstore_logical_model_1.png differ diff --git a/02_activities/assignments/DC_Cohort/Bookstore_logical_model_2.png b/02_activities/assignments/DC_Cohort/Bookstore_logical_model_2.png new file mode 100644 index 000000000..e334e43c9 Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Bookstore_logical_model_2.png differ diff --git a/02_activities/assignments/DC_Cohort/assignment2.sql b/02_activities/assignments/DC_Cohort/assignment2.sql index f7515f625..b23e39b68 100644 --- a/02_activities/assignments/DC_Cohort/assignment2.sql +++ b/02_activities/assignments/DC_Cohort/assignment2.sql @@ -23,7 +23,9 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil All the other rows will remain the same. */ --QUERY 1 - +SELECT + product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')' +FROM product; --END QUERY @@ -41,7 +43,14 @@ HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). 
Filter the visits to dates before April 29, 2022. */ --QUERY 2 - +SELECT + customer_id, + market_date, + DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number +FROM customer_purchases +WHERE market_date < '2022-04-29' +GROUP BY customer_id, market_date +ORDER BY customer_id, market_date; --END QUERY @@ -53,7 +62,17 @@ only the customer’s most recent visit. HINT: Do not use the previous visit dates filter. */ --QUERY 3 - +SELECT * +FROM ( + SELECT + customer_id, + market_date, + DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number + FROM customer_purchases + GROUP BY customer_id, market_date +) AS ranked_visits +WHERE visit_number = 1 +ORDER BY customer_id; --END QUERY @@ -66,7 +85,17 @@ You can make this a running count by including an ORDER BY within the PARTITION Filter the visits to dates before April 29, 2022. */ --QUERY 4 - +SELECT + customer_id, + product_id, + market_date, + COUNT(*) OVER ( + PARTITION BY customer_id, product_id + ORDER BY market_date + ) AS purchase_count +FROM customer_purchases +WHERE market_date < '2022-04-29' +ORDER BY customer_id, product_id, market_date; --END QUERY @@ -85,7 +114,14 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */ --QUERY 5 - +SELECT + product_name, + CASE + WHEN INSTR(product_name, '-') > 0 + THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) + ELSE NULL + END AS description +FROM product; --END QUERY @@ -94,7 +130,9 @@ Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR w /* 2. Filter the query to show any product_size value that contain a number with REGEXP. */ --QUERY 6 - +SELECT * +FROM product +WHERE product_size REGEXP '[0-9]'; --END QUERY @@ -111,7 +149,32 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling with a UNION binding them. 
*/ --QUERY 7 - +WITH daily_sales AS ( + SELECT + market_date, + ROUND(SUM(quantity * cost_to_customer_per_qty), 2) AS total_sales + FROM customer_purchases + GROUP BY market_date +), + +ranked_sales AS ( + SELECT + market_date, + total_sales, + RANK() OVER (ORDER BY total_sales DESC) AS best_rank, + RANK() OVER (ORDER BY total_sales ASC) AS worst_rank + FROM daily_sales +) + +SELECT market_date, total_sales, 'Best Day' AS label +FROM ranked_sales +WHERE best_rank = 1 + +UNION + +SELECT market_date, total_sales, 'Worst Day' AS label +FROM ranked_sales +WHERE worst_rank = 1; --END QUERY @@ -132,7 +195,24 @@ How many customers are there (y). Before your final group by you should have the product of those two queries (x*y). */ --QUERY 8 - +SELECT + v.vendor_name, + p.product_name, + ROUND(SUM(5 * vi.original_price), 2) AS total_revenue +FROM ( + SELECT DISTINCT vendor_id, product_id, original_price + FROM vendor_inventory +) AS vi +CROSS JOIN ( + SELECT DISTINCT customer_id + FROM customer +) AS c +JOIN vendor AS v + ON vi.vendor_id = v.vendor_id +JOIN product AS p + ON vi.product_id = p.product_id +GROUP BY v.vendor_name, p.product_name +ORDER BY v.vendor_name, p.product_name; --END QUERY @@ -145,7 +225,12 @@ It should use all of the columns from the product table, as well as a new column Name the timestamp column `snapshot_timestamp`. */ --QUERY 9 - +CREATE TABLE product_units AS +SELECT + *, + CURRENT_TIMESTAMP AS snapshot_timestamp +FROM product +WHERE product_qty_type = 'unit'; --END QUERY @@ -155,7 +240,22 @@ Name the timestamp column `snapshot_timestamp`. */ This can be any product you desire (e.g. add another record for Apple Pie). */ --QUERY 10 - +INSERT INTO product_units ( + product_id, + product_name, + product_size, + product_category_id, + product_qty_type, + snapshot_timestamp +) +VALUES ( + 999, + 'Apple Pie', + '10 inch', + 1, + 'unit', + CURRENT_TIMESTAMP +); --END QUERY @@ -167,7 +267,13 @@ This can be any product you desire (e.g.
add another record for Apple Pie). */ HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/ --QUERY 11 - +DELETE FROM product_units +WHERE product_id = 999 + AND snapshot_timestamp = ( + SELECT MIN(snapshot_timestamp) + FROM product_units + WHERE product_id = 999 + ); --END QUERY @@ -191,7 +297,23 @@ Finally, make sure you have a WHERE statement to update the right row, When you have all of these components, you can run the update statement. */ --QUERY 12 - +ALTER TABLE product_units +ADD current_quantity INT; + +UPDATE product_units +SET current_quantity = ( + SELECT COALESCE( + (SELECT quantity + FROM vendor_inventory + WHERE vendor_inventory.product_id = product_units.product_id + ORDER BY market_date DESC + LIMIT 1), + 0 + ) +) +WHERE product_units.product_id IN ( + SELECT product_id FROM product_units +); --END QUERY