diff --git a/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 1.drawio b/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 1.drawio
new file mode 100644
index 000000000..72d432a2d
--- /dev/null
+++ b/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 1.drawio
@@ -0,0 +1,293 @@
[draw.io diagram XML not shown: 293 added lines]
diff --git a/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 1.pdf b/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 1.pdf
new file mode 100644
index 000000000..82ca93cf8
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 1.pdf differ
diff --git a/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 2.pdf b/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 2.pdf
new file mode 100644
index 000000000..2cbfadaf2
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Assignment 2 Bookstore Logical Model - Prompt 2.pdf differ
diff --git a/02_activities/assignments/DC_Cohort/Assignment1.md b/02_activities/assignments/DC_Cohort/Assignment1.md
index f650c9752..116026411 100644
--- a/02_activities/assignments/DC_Cohort/Assignment1.md
+++ b/02_activities/assignments/DC_Cohort/Assignment1.md
@@ -207,7 +207,14 @@ Link if you encounter a paywall: https://archive.is/srKHV or https://web.archive
Consider, for example, concepts of fariness, inequality, social structures, marginalization, intersection of technology and society, etc.
+Many of the databases and data systems I encounter in day-to-day life have value systems embedded in them. I rarely think about these value systems, but after reading this article I see how much they shape both how databases are designed and how society functions.
-```
-Your thoughts...
-```
+The most immediate example I can think of is government ID, or any identity/account creation. Most databases include only legal names and binary gender fields (although some, such as passports, now offer an X option for people who do not fit this binary). This is still restrictive: there could be more flexibility for chosen names, and even within the non-binary category, a single X does not capture all of the differences within it.
+
+The same is true of many surveys I complete. Race/ethnicity options often fall back on an 'Other' category, income brackets are centered on middle-class assumptions, and marital status often does not account for common-law or other non-traditional relationships. These categories, while improving, could be expanded instead of offering only binary or narrow options.
+
+Another example I thought a lot about was banking. Until the 1970s, women often could not open a credit card in their own name without a male co-signer. Databases of that era would have reflected this rule, and significant reform had to happen before the system would allow me to bank as I do today.
+
+The last example is healthcare. Many diagnostic tools and symptom descriptions are based predominantly on research data from biological males. If medical records and historical databases are used to train machine learning tools for future healthcare systems, those tools could seriously misrepresent biological females and anyone outside the binary categories.
+
+This article has made me realize that all of these value systems and databases were originally designed around a "normal" or "default" user. When a user differs from this default, it can have large consequences for how the database treats them. As technology advances, we need to be aware that not every user fits the default and that databases should not assume one.
diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment2.md
index 01f991d02..e2023e9d0 100644
--- a/02_activities/assignments/DC_Cohort/Assignment2.md
+++ b/02_activities/assignments/DC_Cohort/Assignment2.md
@@ -55,11 +55,30 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.
-```
-Your answer...
-```
-
-***
+A Type 1 slowly changing dimension overwrites the existing address with the new one, so no history is kept. Type 2 retains changes by adding a new row each time the address changes, so the full history is kept.
+
+For example, a Type 1 customer address table could have the following columns:
+
+- address_id
+- customer_id
+- street
+- city
+- province
+- postal_code
+- country
+
+For Type 2, the customer address table could have the same columns plus three that track each row's validity:
+
+- address_id
+- customer_id
+- street
+- city
+- province
+- postal_code
+- country
+- start_date
+- end_date
+- is_current
## Section 2:
You can start this section following *session 4*.
@@ -189,7 +208,11 @@ Read: Boykis, V. (2019, October 16). _Neural nets are just people all the way do
Consider, for example, concepts of labour, bias, LLM proliferation, moderating content, intersection of technology and society, ect.
+This story touches on many ethical concepts. As discussed briefly in the slides (Slides #6 in the DSI course), human labour is one of the largest contributors to the development of any machine learning system or large database. Much of that labour (for example, the people described in the article who sorted through thousands of images for ImageNet) is invisible. We hear about the code that is written and how developers are making AI systems smarter and faster, but those gains often depend on human input that helps train the models. The individuals doing that training are probably not paid as much as the developers themselves.
+
+In addition, machine learning systems and databases are built on human biases. As I reflected in Assignment 1, humans have preconceptions, and the choices, labels, and determinations they make can influence how a model functions or how a database is set up (for example, databases that only allow the options male and female). This is critical because a system that is supposed to be "smarter" and save us time is being built on a foundation that is already prone to unfairness and bias, which may lead to unfair classifications. Unless this is fully addressed, we cannot depend on machine learning and AI as an all-knowing entity.
+
+Furthermore, one of my concerns is how anonymous AI models can be. If there is a clear bias or problem in a program, and the system is built on human-inputted information, where does the blame go? How can companies moderate content and ensure it is safe and reliable, without bias? One specific example is ImageNet, which included categories with unsafe or socially inappropriate labels. My concern is how to ensure that the models we are training today account for that potential issue. Another ethical concern is data privacy and consent. Any image or piece of content can be put into these models. How is that controlled, and how is consent provided or revoked? Does digital consent even exist at this stage of model training? This is especially relevant to ImageNet, which pulled images from the internet for training.
+
+Lastly, reading this article, I was struck by how robots struggled to fold laundry and how tasks we think of as menial are incredibly difficult for these models. At the end of the day, the brains behind these processes are not AI but humans themselves. For something that is treated as a gold standard, this article highlighted for me that many of these advances are built on the work of people who may not be adequately recognized.
-```
-Your thoughts...
-```
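A hedged sketch of the Type 1 versus Type 2 customer address designs described in the Assignment2.md answer above, using Python's built-in `sqlite3` module with an in-memory database. The table names `customer_address_type1` and `customer_address_type2`, the sample addresses, and the dates are illustrative assumptions, not part of the assignment, and the `province`, `postal_code`, and `country` columns are left out only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customer_address_type1 (
    address_id INTEGER PRIMARY KEY,
    customer_id INTEGER, street TEXT, city TEXT
);
CREATE TABLE customer_address_type2 (
    address_id INTEGER PRIMARY KEY,
    customer_id INTEGER, street TEXT, city TEXT,
    start_date TEXT, end_date TEXT, is_current INTEGER
);
INSERT INTO customer_address_type1 VALUES (1, 1, '1 Old St', 'Toronto');
INSERT INTO customer_address_type2
VALUES (1, 1, '1 Old St', 'Toronto', '2020-01-01', NULL, 1);
""")

# Type 1: overwrite in place, so the old address is lost.
conn.execute(
    "UPDATE customer_address_type1 SET street = ?, city = ? WHERE customer_id = ?",
    ("2 New Ave", "Ottawa", 1),
)

# Type 2: close out the current row, then add a new current row.
conn.execute(
    "UPDATE customer_address_type2 SET end_date = ?, is_current = 0 "
    "WHERE customer_id = ? AND is_current = 1",
    ("2024-06-01", 1),
)
conn.execute(
    "INSERT INTO customer_address_type2 "
    "(customer_id, street, city, start_date, end_date, is_current) "
    "VALUES (?, ?, ?, ?, NULL, 1)",
    (1, "2 New Ave", "Ottawa", "2024-06-01"),
)

type1_rows = conn.execute(
    "SELECT street FROM customer_address_type1 WHERE customer_id = 1"
).fetchall()
type2_rows = conn.execute(
    "SELECT street, is_current FROM customer_address_type2 "
    "WHERE customer_id = 1 ORDER BY start_date"
).fetchall()
print(type1_rows)  # one row: only the new address survives
print(type2_rows)  # two rows: the old address plus the current one
```

After the move, the Type 1 table holds a single row per customer, while the Type 2 table holds one row per address the customer has had, with `is_current` marking the live one.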
diff --git a/02_activities/assignments/DC_Cohort/Section1_LogicalDataModel.drawio b/02_activities/assignments/DC_Cohort/Section1_LogicalDataModel.drawio
new file mode 100644
index 000000000..f28929fe6
--- /dev/null
+++ b/02_activities/assignments/DC_Cohort/Section1_LogicalDataModel.drawio
@@ -0,0 +1,40 @@
[draw.io diagram XML not shown: 40 added lines]
diff --git a/02_activities/assignments/DC_Cohort/Section1_LogicalDataModel.pdf b/02_activities/assignments/DC_Cohort/Section1_LogicalDataModel.pdf
new file mode 100644
index 000000000..49e967ca7
Binary files /dev/null and b/02_activities/assignments/DC_Cohort/Section1_LogicalDataModel.pdf differ
diff --git a/02_activities/assignments/DC_Cohort/assignment1.sql b/02_activities/assignments/DC_Cohort/assignment1.sql
index 2ec561e2a..2ccfdd397 100644
--- a/02_activities/assignments/DC_Cohort/assignment1.sql
+++ b/02_activities/assignments/DC_Cohort/assignment1.sql
@@ -2,39 +2,38 @@
--Please write responses between the QUERY # and END QUERY blocks
/* SECTION 2 */
-
--SELECT
/* 1. Write a query that returns everything in the customer table. */
--QUERY 1
-
-
+SELECT *
+FROM customer;
--END QUERY
-
/* 2. Write a query that displays all of the columns and 10 rows from the customer table,
sorted by customer_last_name, then customer_first_ name. */
--QUERY 2
-
-
+SELECT *
+FROM customer
+ORDER BY customer_last_name, customer_first_name
+LIMIT 10;
--END QUERY
-
--WHERE
/* 1. Write a query that returns all customer purchases of product IDs 4 and 9.
Limit to 25 rows of output. */
--QUERY 3
-
-
+SELECT *
+FROM customer_purchases
+WHERE product_id = 4 OR product_id = 9
+LIMIT 25;
--END QUERY
-
-
/*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty),
filtered by customer IDs between 8 and 10 (inclusive) using either:
1. two conditions using AND
@@ -43,12 +42,13 @@ Limit to 25 rows of output.
*/
--QUERY 4
-
-
+SELECT *, quantity * cost_to_customer_per_qty AS price
+FROM customer_purchases
+WHERE customer_id BETWEEN 8 AND 10
+LIMIT 25;
--END QUERY
-
--CASE
/* 1. Products can be sold by the individual unit or by bulk measures like lbs. or oz.
Using the product table, write a query that outputs the product_id and product_name
@@ -56,36 +56,51 @@ columns and add a column called prod_qty_type_condensed that displays the word
if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */
--QUERY 5
-
-
+SELECT product_id, product_name
+, CASE WHEN product_qty_type = 'unit'
+ THEN 'unit'
+ ELSE 'bulk'
+ END AS prod_qty_type_condensed
+
+FROM product;
--END QUERY
-
/* 2. We want to flag all of the different types of pepper products that are sold at the market.
add a column to the previous query called pepper_flag that outputs a 1 if the product_name
contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */
--QUERY 6
-
-
+SELECT product_id, product_name
+, CASE WHEN product_qty_type = 'unit'
+ THEN 'unit'
+ ELSE 'bulk'
+ END AS prod_qty_type_condensed
+
+, CASE WHEN LOWER(product_name) LIKE '%pepper%'
+ THEN 1
+ ELSE 0
+ END AS pepper_flag
+
+FROM product;
--END QUERY
-
--JOIN
/* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the
vendor_id field they both have in common, and sorts the result by market_date, then vendor_name.
Limit to 24 rows of output. */
--QUERY 7
-
-
+SELECT *
+FROM vendor
+INNER JOIN vendor_booth_assignments
+ ON vendor.vendor_id = vendor_booth_assignments.vendor_id
+ORDER BY market_date, vendor_name
+LIMIT 24;
--END QUERY
-
-
/* SECTION 3 */
-- AGGREGATE
@@ -93,12 +108,12 @@ Limit to 24 rows of output. */
at the farmer’s market by counting the vendor booth assignments per vendor_id. */
--QUERY 8
-
-
+SELECT vendor_id, count(*) AS booth_count
+FROM vendor_booth_assignments
+GROUP BY vendor_id;
--END QUERY
-
/* 2. The Farmer’s Market Customer Appreciation Committee wants to give a bumper
sticker to everyone who has ever spent more than $2000 at the market. Write a query that generates a list
of customers for them to give stickers to, sorted by last name, then first name.
@@ -106,12 +121,20 @@ of customers for them to give stickers to, sorted by last name, then first name.
HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */
--QUERY 9
+SELECT
+ customer_first_name
+,customer_last_name
+, SUM(quantity*cost_to_customer_per_qty) as total_spend
-
+FROM customer_purchases as cp
+INNER JOIN customer as c
+ ON c.customer_id = cp.customer_id
+GROUP BY cp.customer_id
+HAVING total_spend > 2000
+ORDER BY customer_last_name, customer_first_name;
--END QUERY
-
--Temp Table
/* 1. Insert the original vendor table into a temp.new_vendor and then add a 10th vendor:
Thomass Superfood Store, a Fresh Focused store, owned by Thomas Rosenthal
@@ -125,13 +148,20 @@ VALUES(col1,col2,col3,col4,col5)
*/
--QUERY 10
+-- Created table from the original
+CREATE TABLE temp.new_vendor AS
+SELECT *
+FROM vendor;
-
+-- Add the 10th vendor
+INSERT INTO temp.new_vendor
+ (vendor_id, vendor_name, vendor_type, vendor_owner_first_name, vendor_owner_last_name)
+VALUES
+ (10, 'Thomass Superfood Store', 'Fresh Focused', 'Thomas', 'Rosenthal');
--END QUERY
-
--- Date
+-- Date DO NOT COMPLETE
/*1. Get the customer_id, month, and year (in separate columns) of every purchase in the customer_purchases table.
HINT: you might need to search for strfrtime modifers sqlite on the web to know what the modifers for month
@@ -139,12 +169,8 @@ and year are!
Limit to 25 rows of output. */
--QUERY 11
-
-
-
--END QUERY
-
/* 2. Using the previous query as a base, determine how much money each customer spent in April 2022.
Remember that money spent is quantity*cost_to_customer_per_qty.
@@ -153,7 +179,4 @@ but remember, STRFTIME returns a STRING for your WHERE statement...
AND be sure you remove the LIMIT from the previous query before aggregating!! */
--QUERY 12
-
-
-
--END QUERY
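A small, hedged illustration of the join + aggregate + HAVING pattern that QUERY 9 in assignment1.sql relies on, run against a toy in-memory SQLite database through Python's built-in `sqlite3` module. The two customers and their purchase amounts are made-up sample data. The sketch repeats the aggregate expression in HAVING instead of using the `total_spend` alias, which SQLite accepts but some other SQL dialects do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    customer_first_name TEXT,
    customer_last_name TEXT
);
CREATE TABLE customer_purchases (
    customer_id INTEGER,
    quantity REAL,
    cost_to_customer_per_qty REAL
);
INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO customer_purchases VALUES
    (1, 10, 200.0), (1, 5, 100.0),  -- Ann: 2000 + 500 = 2500
    (2, 2, 50.0);                   -- Bob: 100
""")

# WHERE filters rows before grouping; HAVING filters the grouped totals.
big_spenders = conn.execute("""
    SELECT c.customer_first_name,
           c.customer_last_name,
           SUM(cp.quantity * cp.cost_to_customer_per_qty) AS total_spend
    FROM customer_purchases AS cp
    INNER JOIN customer AS c
        ON c.customer_id = cp.customer_id
    GROUP BY c.customer_id, c.customer_first_name, c.customer_last_name
    HAVING SUM(cp.quantity * cp.cost_to_customer_per_qty) > 2000
    ORDER BY c.customer_last_name, c.customer_first_name
""").fetchall()
print(big_spenders)
```

Only the customer whose summed spend exceeds 2000 is returned; the other customer is dropped after aggregation, not before.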
diff --git a/02_activities/assignments/DC_Cohort/assignment2.sql b/02_activities/assignments/DC_Cohort/assignment2.sql
index f7515f625..96340c2c9 100644
--- a/02_activities/assignments/DC_Cohort/assignment2.sql
+++ b/02_activities/assignments/DC_Cohort/assignment2.sql
@@ -23,12 +23,12 @@ Edit the appropriate columns -- you're making two edits -- and the NULL rows wil
All the other rows will remain the same. */
--QUERY 1
-
-
+SELECT
+product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')'
+FROM product;
--END QUERY
-
--Windowed Functions
/* 1. Write a query that selects from the customer_purchases table and numbers each customer’s
visits to the farmer’s market (labeling each market date with a different number).
@@ -41,8 +41,12 @@ HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK().
Filter the visits to dates before April 29, 2022. */
--QUERY 2
-
-
+SELECT
+ customer_id,
+ market_date,
+ DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_number
+FROM customer_purchases
+WHERE market_date < '2022-04-29';
--END QUERY
@@ -53,8 +57,15 @@ only the customer’s most recent visit.
HINT: Do not use the previous visit dates filter. */
--QUERY 3
-
-
+SELECT *
+FROM (
+ SELECT
+ customer_id,
+ market_date,
+ ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+ FROM customer_purchases
+) sub
+WHERE visit_number = 1;
--END QUERY
@@ -66,12 +77,16 @@ You can make this a running count by including an ORDER BY within the PARTITION
Filter the visits to dates before April 29, 2022. */
--QUERY 4
-
-
+SELECT
+ customer_id,
+ product_id,
+ market_date,
+ COUNT(*) OVER (PARTITION BY customer_id, product_id) AS times_bought
+FROM customer_purchases
+WHERE market_date < '2022-04-29';
--END QUERY
-
-- String manipulations
/* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
These are separated from the product name with a hyphen.
@@ -85,17 +100,23 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for
Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
--QUERY 5
-
-
+SELECT
+ product_name,
+ CASE
+ WHEN INSTR(product_name, '-') > 0
+ THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1))
+ ELSE NULL
+ END AS description
+FROM product;
--END QUERY
-
/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
--QUERY 6
-
-
+SELECT *
+FROM product
+WHERE product_size REGEXP '[0-9]';
--END QUERY
@@ -111,13 +132,27 @@ HINT: There are a possibly a few ways to do this query, but if you're struggling
with a UNION binding them. */
--QUERY 7
-
-
+WITH sales AS (
+ SELECT market_date, SUM(quantity * cost_to_customer_per_qty) AS total_sales
+ FROM customer_purchases
+ GROUP BY market_date
+),
+ranked AS (
+ SELECT market_date, total_sales,
+ RANK() OVER (ORDER BY total_sales DESC) AS r_best,
+ RANK() OVER (ORDER BY total_sales ASC) AS r_worst
+ FROM sales
+)
+SELECT market_date, total_sales
+FROM ranked
+WHERE r_best = 1
+UNION
+SELECT market_date, total_sales
+FROM ranked
+WHERE r_worst = 1;
--END QUERY
-
-
/* SECTION 3 */
-- Cross Join
@@ -132,12 +167,15 @@ How many customers are there (y).
Before your final group by you should have the product of those two queries (x*y). */
--QUERY 8
-
-
+SELECT v.vendor_name, p.product_name, SUM(5 * vi.original_price) AS total_money
+FROM (SELECT DISTINCT vendor_id, product_id, original_price FROM vendor_inventory) vi
+JOIN vendor v ON vi.vendor_id = v.vendor_id
+JOIN product p ON vi.product_id = p.product_id
+CROSS JOIN customer c
+GROUP BY v.vendor_name, p.product_name;
--END QUERY
-
-- INSERT
/*1. Create a new table "product_units".
This table will contain only products where the `product_qty_type = 'unit'`.
@@ -145,34 +183,39 @@ It should use all of the columns from the product table, as well as a new column
Name the timestamp column `snapshot_timestamp`. */
--QUERY 9
-
-
+CREATE TABLE product_units AS
+SELECT *, CURRENT_TIMESTAMP AS snapshot_timestamp
+FROM product
+WHERE product_qty_type = 'unit';
--END QUERY
-
/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
This can be any product you desire (e.g. add another record for Apple Pie). */
--QUERY 10
-
-
+INSERT INTO product_units
+SELECT *, CURRENT_TIMESTAMP
+FROM product
+WHERE product_name = 'Apple Pie';
--END QUERY
-
-- DELETE
/* 1. Delete the older record for the whatever product you added.
HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
--QUERY 11
-
-
-
+DELETE FROM product_units
+WHERE product_name = 'Apple Pie'
+AND snapshot_timestamp < (
+ SELECT MAX(snapshot_timestamp)
+ FROM product_units
+ WHERE product_name = 'Apple Pie');
+
--END QUERY
-
-- UPDATE
/* 1.We want to add the current_quantity to the product_units table.
First, add a new column, current_quantity to the table using the following syntax.
@@ -191,8 +234,16 @@ Finally, make sure you have a WHERE statement to update the right row,
When you have all of these components, you can run the update statement. */
--QUERY 12
+ALTER TABLE product_units
+ADD current_quantity INT;
-
+UPDATE product_units
+SET current_quantity = COALESCE(
+ (SELECT vi.quantity
+ FROM vendor_inventory vi
+ WHERE vi.product_id = product_units.product_id
+ ORDER BY vi.market_date DESC
+ LIMIT 1), 0);
--END QUERY
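The windowed-function prompt in assignment2.sql hints that one approach uses ROW_NUMBER() and one uses DENSE_RANK(). The hedged sketch below shows how the two differ when a customer makes several purchases on the same market date. It uses Python's built-in `sqlite3` module, the three purchase rows are made-up sample data, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_purchases (
    customer_id INTEGER,
    market_date TEXT,
    product_id INTEGER
);
-- Customer 1 buys two products on the first date and one on the second.
INSERT INTO customer_purchases VALUES
    (1, '2022-04-02', 4),
    (1, '2022-04-02', 9),
    (1, '2022-04-09', 4);
""")

visit_rows = conn.execute("""
    SELECT market_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date) AS row_num,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY market_date) AS visit_num
    FROM customer_purchases
    ORDER BY market_date, row_num
""").fetchall()

for row in visit_rows:
    print(row)
```

ROW_NUMBER() gives every purchase row its own number, so it counts purchases unless the dates are de-duplicated first. DENSE_RANK() gives all rows on the same market date the same number, so it counts visits directly.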
diff --git a/04_this_cohort/live_code/module_2/module_2.sqbpro b/04_this_cohort/live_code/module_2/module_2.sqbpro
index a9e018190..bf0eb2288 100644
--- a/04_this_cohort/live_code/module_2/module_2.sqbpro
+++ b/04_this_cohort/live_code/module_2/module_2.sqbpro
@@ -3,43 +3,35 @@
/* 1. Select everything in the customer table */
-SELECT * FROM customer;
+SELECT
/* 2. Use sql as a calculator */
-SELECT 1+1 AS addition, 10*5 as multiplication, pi() as pi;
+
/* 3. Add order by and limit clauses */
-SELECT *
-FROM customer
-ORDER BY customer_first_name
-LIMIT 10;
+
/* 4. Select multiple specific columns */
-SELECT customer_first_name,customer_last_name
-FROM customer;
/* 5. Add a static value in a column */
-SELECT 2026 as this_year, 'March' as this_month, customer_id
-FROM customer
---------------------------------------------------------------------------------------------------------------------------------------------/* MODULE 2 */
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 2 */
/* WHERE */
/* 1. Select only customer 1 from the customer table */
SELECT *
FROM customer
-WHERE customer_id = 1;
+WHERE
/* 2. Differentiate between AND and OR */
-SELECT *
-FROM customer
-WHERE customer_id = 1
-OR customer_id = 2; -- OR is two rows, AND is 0 rows
+
/* 3. IN */
@@ -49,27 +41,15 @@ WHERE customer_id IN (3,4,5,6);
/* 4. LIKE */
--- all the peppers
-SELECT * FROM product
-WHERE product_name LIKE '%pepper%';
+
/* 5. Nulls and Blanks*/
-SELECT *
-FROM product
-WHERE product_size IS NULL -- null
-OR product_size = ''; -- blank, two single quotes not one double quote, different from NULL
+
/* 6. BETWEEN x AND y */
-SELECT *
-FROM customer
-WHERE customer_id BETWEEN 1 AND 20;
---dates
-SELECT *
-FROM market_date_info
-WHERE market_date BETWEEN '2022-10-01' AND '2022-10-31'
--------------------------------------------------------------------------------------------------------------------------------------------/* MODULE 2 */
/* CASE */
@@ -77,56 +57,37 @@ WHERE market_date BETWEEN '2022-10-01' AND '2022-10-31'
SELECT *
/* 1. Add a CASE statement declaring which days vendors should come */
-,CASE WHEN vendor_type = 'Fresh Focused' THEN 'Wednesday'
- WHEN vendor_type = 'Eggs & Meats' THEN 'Thursday'
- ELSE 'Saturday'
- END as day_of_specialty
/* 2. Add another CASE statement for Pie Day */
-,CASE WHEN vendor_name = "Annie's Pies" -- double quotes okay here
- THEN 'Annie is the best'
- END as pi_day
+
/* 3. Add another CASE statement with an ELSE clause to handle rows evaluating to False */
-,CASE WHEN vendor_name LIKE '%pie%'
- THEN 'Wendesday'
- ELSE 'Friday'
- END as another_pie_day
-FROM vendor;
+
/* 4. Experiment with selecting a different column instead of just a string value */
-SELECT *
-,CASE WHEN cost_to_customer_per_qty < 1.00
-THEN cost_to_customer_per_qty*5
-ELSE cost_to_customer_per_qty
-END AS inflation
-FROM customer_purchases
---------------------------------------------------------------------------------------------------------------------------------------------/* MODULE 2 */
+FROM vendor
+
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 2 */
/* DISTINCT */
/* 1. Compare how many customer_ids are the customer_purchases table, one select with distinct, one without */
-- 4221 rows
-SELECT customer_id FROM customer_purchases ;
-
-SELECT DISTINCT customer_id FROM customer_purchases;
+SELECT customer_id FROM customer_purchases
/* 2. Compare the difference between selecting market_day in market_date_info, with and without distinct:
what do these difference mean?*/
-SELECT market_day
-FROM market_date_info;
--- market is only open on 2 days, wed and sat
-SELECT DISTINCT market_day
-FROM market_date_info;
/* 3. Which vendor has sold products to a customer */
@@ -142,8 +103,6 @@ FROM customer_purchases;
/* 5. Which vendor has sold products to a customer
... and which product was it?
... AND to whom was it sold*/
-SELECT DISTINCT vendor_id, product_id, customer_id
-FROM customer_purchases
--------------------------------------------------------------------------------------------------------------------------------------------/* MODULE 2 */
/* INNER JOIN */
@@ -153,16 +112,7 @@ FROM customer_purchases
... use an INNER JOIN to see only products that have been purchased */
-- without table aliases
-SELECT product_name, -- coming from the product table
-vendor_id, -- rest of these are coming from the customer_purchases table
-market_date,
-customer_id,
-customer_purchases.product_id,
-product.product_id
-FROM product
-INNER JOIN customer_purchases
- ON customer_purchases.product_id = product.product_id;
@@ -172,32 +122,17 @@ INNER JOIN customer_purchases
Add customers' first and last names with an INNER JOIN */
-- using table aliases
-SELECT DISTINCT
-vendor_id, -- coming from cp
-product_id,
-c.customer_id, -- coming from c
-customer_first_name,
-customer_last_name
-
-FROM customer_purchases AS cp
-INNER JOIN customer AS c
- ON c.customer_id = cp.customer_id
-
---------------------------------------------------------------------------------------------------------------------------------------------/* MODULE 2 */
+
+
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 2 */
/* LEFT JOIN */
/* 1. There are products that have been bought
... but are there products that have not been bought?
Use a LEFT JOIN to find out*/
-SELECT DISTINCT
-p.product_id
-,cp.product_id as [cp.product_id]
-,product_name
-
-FROM product as p
-LEFT JOIN customer_purchases as cp
- ON p.product_id = cp.product_id;
/* 2. Directions of LEFT JOINs matter ...*/
@@ -206,25 +141,21 @@ p.product_id
,cp.product_id as [cp.product_id]
,product_name
+
FROM customer_purchases as cp
LEFT JOIN product as p
ON p.product_id = cp.product_id;
-
-
-
+
+-- products never purchased have no product_id in customer_purchases, so they are not included when customer_purchases is the left table
+
/* 3. As do which values you filter on ... */
-SELECT DISTINCT
-p.product_id
-,cp.product_id as [cp.product_id]
-,product_name
+
FROM product as p
LEFT JOIN customer_purchases as cp
ON p.product_id = cp.product_id
-
-WHERE p.product_id BETWEEN 1 AND 6 -- if we pick product, 6 rows (1-6) but if we pick cp....only 5 rows because zinnias never existed in customer purchases table
-
-
+
+WHERE p.product_id BETWEEN 1 AND 6; -- if we pick product, 6 rows (1-6) but if we pick cp...only 5 rows because zinnias never existed in customer_purchases
/* 4. Without using a RIGHT JOIN, make this query return the RIGHT JOIN result set
...**Hint, flip the order of the joins** ...
@@ -239,14 +170,10 @@ LEFT JOIN product AS p
...Note how the row count changed from 24 to 23
*/
-SELECT *
-FROM product as p
-LEFT JOIN product_category as pc
- ON pc.product_category_id = p.product_category_id
- ORDER by pc.product_category_id
---------------------------------------------------------------------------------------------------------------------------------------------/* MODULE 2 */
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 2 */
/* Multiple Table JOINs */
@@ -254,42 +181,14 @@ LEFT JOIN product_category as pc
(Which vendor has sold products to a customer AND which product was it AND to whom was it sold)
Replace all the IDs (customer, vendor, and product) with the names instead*/
-SELECT DISTINCT
---vendor_id
-vendor_name
-,p.product_id
-,product_name
-,c.customer_id
-,customer_first_name
-,customer_last_name
-FROM customer_purchases as cp
-INNER JOIN vendor as v
- ON v.vendor_id = cp.vendor_id
-INNER JOIN product as p
- ON p.product_id = cp.product_id
-INNER JOIN customer as c
- ON c.customer_id = cp.customer_id;
/* 2. Select product_category_name, everything from the product table, and then LEFT JOIN the customer_purchases table
... how does this LEFT JOIN affect the number of rows?
-Why do we have more rows now?*/
-SELECT product_category_name, p.*
-, cp.product_id as productid_in_cust_purchases_table
-
-FROM product_category as pc
-INNER JOIN product as p -- will give us product_name, product_size, product_qty_type
- ON p.product_category_id = pc.product_category_id
-LEFT JOIN customer_purchases as cp
- ON cp.product_id = p.product_id
-
-
-ORDER by cp.product_id
-
---------------------------------------------------------------------------------------------------------------------------------------------
+Why do we have more rows now?*/
-
+
diff --git a/04_this_cohort/live_code/module_3/module_3.sqbpro b/04_this_cohort/live_code/module_3/module_3.sqbpro
index 01fd0ba4c..8d5b8b169 100644
--- a/04_this_cohort/live_code/module_3/module_3.sqbpro
+++ b/04_this_cohort/live_code/module_3/module_3.sqbpro
@@ -1,3 +1,326 @@
+<<<<<<< HEAD
+/* MODULE 3 */
+/* COUNT */
+
+
+/* 1. Count the number of products */
+
+ SELECT COUNT(product_id) as num_of_products
+ FROM product;
+
+
+/* 2. How many products per product_qty_type */
+
+SELECT product_qty_type, COUNT(product_id) as num_of_products
+FROM product
+GROUP BY product_qty_type;
+
+/* 3. How many products per product_qty_type and per their product_size */
+
+SELECT product_size
+,product_qty_type,
+COUNT(product_id) as num_of_products
+FROM product
+GROUP BY product_size, product_qty_type;
+
+
+/* COUNT DISTINCT
+ 4. How many unique products were bought */
+
+SELECT COUNT(DISTINCT product_id) as bought_products
+FROM customer_purchases;
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* SUM & AVG */
+
+
+/* 1. How much did customers spend each day */
+
+ SELECT
+market_date
+,customer_id
+,SUM(quantity*cost_to_customer_per_qty) as total_spend
+
+FROM customer_purchases
+GROUP BY market_date, customer_id;
+
+/* 2. How much does each customer spend on average */
+
+SELECT
+customer_first_name
+,customer_last_name
+,ROUND(AVG(quantity*cost_to_customer_per_qty),2) as avg_spend
+
+FROM customer_purchases as cp
+INNER JOIN customer as c
+ ON c.customer_id = cp.customer_id
+
+GROUP BY c.customer_id;
+
+
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* MIN & MAX */
+
+
+/* 1. What is the most expensive product
+...pay attention to how it doesn't handle ties very well
+*/
+
+SELECT product_name, max(original_price) as most_expensive
+FROM vendor_inventory as vi
+INNER JOIN product as p
+ ON p.product_id = vi.product_id;
+
+
+/* 2. Prove that max is working */
+
+SELECT DISTINCT
+product_name,
+original_price
+FROM vendor_inventory as vi
+INNER JOIN product as p
+ ON p.product_id = vi.product_id
+
+ORDER BY original_price DESC;
+
+/* 3. Find the minimum price per each product_qty_type */
+
+SELECT product_name
+,product_qty_type
+,MIN(original_price)
+
+FROM vendor_inventory as vi
+INNER JOIN product as p
+ ON p.product_id = vi.product_id
+
+GROUP BY product_qty_type;
+
+
+/* 4. Prove that min is working */
+
+SELECT DISTINCT
+ product_name
+,product_qty_type
+--,MIN(original_price);
+,original_price
+
+FROM vendor_inventory as vi
+INNER JOIN product as p
+ ON p.product_id = vi.product_id
+
+ORDER BY product_qty_type, original_price;
+
+
+/* 5. Min/max on a string
+... not particularly useful? */
+
+SELECT max(product_name)
+FROM product;
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* Arithmetic */
+
+
+/* 1. power, pi(), ceiling, division, integer division, etc */
+SELECT power(4,2), pi();
+
+SELECT 10.0 / 3.0 as division,
+CAST(10.0 as INT) / CAST(3.0 as INT) as integer_division;
+
+
+
+/* 2. Every even vendor_id with modulo */
+SELECT * FROM vendor
+WHERE vendor_id % 2 = 0;
+
+
+/* 3. What about every third? */
+
+SELECT * FROM vendor
+WHERE vendor_id % 3 = 0;
+
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* HAVING */
+
+
+/* 1. How much did a customer spend on each day?
+Filter to customer_id between 1 and 5 and total_spend > 50
+... What order of execution occurs? */
+
+SELECT
+market_date
+,customer_id
+,SUM(quantity*cost_to_customer_per_qty) as total_spend
+
+FROM customer_purchases
+WHERE customer_id BETWEEN 1 AND 5
+GROUP BY market_date, customer_id
+HAVING total_spend > 50;
+
+/* 2. How many products were bought?
+Filter to number of purchases between 300 and 500 */
+
+SELECT count(product_id) as num_of_prod
+,product_id
+FROM customer_purchases
+GROUP BY product_id
+HAVING count(product_id) BETWEEN 300 AND 500; -- equivalent to using the alias "num_of_prod", but not every SQL dialect accepts an alias in HAVING
+
+
+
+
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* Subquery FROM */
+
+
+/* 1. Simple subquery in a FROM statement, e.g. for inflation
+...we could imagine joining this to a more complex query perhaps */
+
+
+SELECT DISTINCT
+product_id
+,inflation
+
+FROM (
+ SELECT product_id, cost_to_customer_per_qty,
+ CASE WHEN cost_to_customer_per_qty < 1.00 THEN cost_to_customer_per_qty*5
+ ELSE cost_to_customer_per_qty END as inflation
+
+ FROM customer_purchases
+);
+
+/* 2. What is the single item that has been bought in the greatest quantity?*/
+
+
+--outer QUERY
+SELECT product_name -- coming from product table
+,MAX(quantity_purchased) -- coming from the subquery ("x")
+
+FROM product AS p
+INNER JOIN (
+--inner query
+	SELECT product_id
+	,SUM(quantity) as quantity_purchased -- SUM totals the units bought; COUNT would only count purchase rows
+
+ FROM customer_purchases
+ GROUP BY product_id
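+) AS x ON p.product_id = x.product_id;
+
+--------------------------------------------------------------------------------------------------------------------------------------------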
+/* MODULE 3 */
+/* Subquery WHERE */
+
+
+/* 1. How much did each customer spend at each vendor for each day at the market WHEN IT RAINS */
+
+SELECT market_date
+,customer_id
+,vendor_id
+,SUM(quantity*cost_to_customer_per_qty) as total_spent
+
+FROM customer_purchases
+
+-- filter by rain_flag
+-- "what dates was it raining?"
+WHERE market_date IN
+(
+ SELECT market_date
+ FROM market_date_info
+ WHERE market_rain_flag = 1
+)
+GROUP BY market_date, vendor_id, customer_id;
+
+/* 2. What is the name of the vendor who sells pie */
+
+SELECT DISTINCT vendor_name
+
+FROM customer_purchases as cp
+INNER JOIN vendor as v
+ ON cp.vendor_id = v.vendor_id
+
+WHERE product_id IN (
+	SELECT product_id
+	FROM product
+	WHERE product_name LIKE '%pie%'
+	);
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* Temp Tables */
+
+
+/* 1. Put our inflation query into a temp table, e.g. as temp.new_vendor_inventory*/
+
+/* some structural code */
+/* ...heads up, sometimes this query can be finicky -- it's good to try highlighting different sections to help it succeed... */
+
+-- if a table named new_vendor_inventory exists, delete it, otherwise do NOTHING
+DROP TABLE IF EXISTS temp.new_vendor_inventory;
+
+--make the table
+CREATE TABLE temp.new_vendor_inventory AS
+
+-- definition of the table
+
+SELECT *,
+original_price*5 as inflation
+FROM vendor_inventory;
+
+
+/* 2. put the previous table into another temp table, e.g. as temp.new_new_vendor_inventory */
+
+DROP TABLE IF EXISTS temp.new_new_vendor_inventory;
+CREATE TABLE temp.new_new_vendor_inventory AS
+
+SELECT *
+,inflation * 2 as super_inflation
+FROM temp.new_vendor_inventory;
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* Common Table Expression (CTE) */
+
+
+/* 1. Calculate sales per vendor per day */
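+SELECT
+market_date
+,vendor_id
+,SUM(quantity*cost_to_customer_per_qty) as sales
+
+FROM customer_purchases
+GROUP BY market_date, vendor_id;
+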
+
+/* ... re-aggregate the daily sales for each WEEK instead now */
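+-- A possible sketch of the weekly roll-up: a daily sales-per-vendor query is wrapped
+-- in a CTE and then re-aggregated by week.
+-- Assumptions (not from the class notes): the CTE name vendor_daily_sales and the
+-- aliases are illustrative, and market_date is stored as 'YYYY-MM-DD' text so that
+-- strftime can derive the year and week number.
+WITH vendor_daily_sales AS (
+	SELECT
+	market_date
+	,vendor_id
+	,SUM(quantity*cost_to_customer_per_qty) as sales
+
+	FROM customer_purchases
+	GROUP BY market_date, vendor_id
+)
+SELECT
+strftime('%Y', market_date) as market_year
+,strftime('%W', market_date) as market_week
+,vendor_id
+,SUM(sales) as weekly_sales
+
+FROM vendor_daily_sales
+GROUP BY strftime('%Y', market_date), strftime('%W', market_date), vendor_id;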
+
+
+
+--------------------------------------------------------------------------------------------------------------------------------------------
+/* MODULE 3 */
+/* Date functions */
+
+
+/* 1. now */
+SELECT DATE('now') as today, DATETIME('now') as right_now;
+
+
+/* 2. strftime */
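+-- A small sketch of strftime pulling parts out of a date.
+-- Assumptions: the aliases are illustrative, and market_date is stored as 'YYYY-MM-DD' text.
+SELECT
+market_date
+,strftime('%Y', market_date) as market_year
+,strftime('%m', market_date) as market_month
+,strftime('%d', market_date) as day_of_month
+
+FROM market_date_info;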
+
+
+
+/* 3. adding dates, e.g. last date of the month */
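+-- One possible way to get the last date of the current month with date modifiers:
+-- jump to the first of the month, add one month, then step back one day.
+-- The alias end_of_month is illustrative.
+SELECT DATE('now', 'start of month', '+1 month', '-1 day') as end_of_month;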
+
+
+
+/* 4. difference between dates,
+	a. number of days between now and each market_date
+	b. number of YEARS between now and market_date
+	c. number of HOURS between now and market_date
+	*/
+
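+-- A hedged sketch of date differences using julianday, which returns a fractional day count.
+-- Assumptions: the aliases are illustrative, market_date is stored as 'YYYY-MM-DD' text,
+-- and the year figure is only an approximation (days / 365.25).
+SELECT
+market_date
+,CAST(julianday('now') - julianday(market_date) as INT) as days_between
+,(julianday('now') - julianday(market_date)) / 365.25 as years_between
+,(julianday('now') - julianday(market_date)) * 24 as hours_between
+
+FROM market_date_info;
+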
+=======
/* MODULE 3 */
/* COUNT */
@@ -357,3 +680,4 @@ FROM market_date_info;
FROM market_date_info
+>>>>>>> a9a8d6ff02d88a1ce013678a5297c3e319efa373
diff --git a/05_src/sql/farmersmarket.db b/05_src/sql/farmersmarket.db
index 4720f2483..844499fed 100644
Binary files a/05_src/sql/farmersmarket.db and b/05_src/sql/farmersmarket.db differ