
add solutions for problems 1, 2, and 3 #29

Open
krishnadheerajkrovi wants to merge 1 commit into super30admin:main from krishnadheerajkrovi:main

Conversation

@krishnadheerajkrovi


Completed Pandas10

@super30admin

Owner

Let's evaluate each solution one by one.

For problem1.py:

  • The problem requires grouping by 'sell_date' and aggregating the number of unique products and a sorted, comma-separated list of unique product names.
  • The student uses groupby and agg with 'nunique' for counting unique products and a lambda function to create the sorted comma-separated string. This is correct.
  • The lambda function uses sorted(x.unique()) which first gets unique products and then sorts them. This is efficient and meets the requirement.
  • The code then sorts by 'sell_date' and returns the result. The problem states the result should be sorted by sell_date, so this is correct.
  • Time complexity: the groupby pass itself is O(n). Sorting the unique products adds O(m log m) per group, where m is the number of unique products in that group, so the worst case is O(n log n) when most rows land in one group with distinct products. Since the groups are dates and the products per date are limited, this should be efficient in practice.
  • Space complexity: O(n) for the grouped data and the resulting DataFrame.
  • Code quality: Good. The code is concise and readable. A lambda in agg is generally slower than a built-in aggregation such as 'nunique' because it is called once per group in Python, but no built-in produces a sorted, joined string, so it is a reasonable choice here. An equivalent alternative is lambda x: ','.join(sorted(set(x))); the student's x.unique() performs the same deduplication before sorting.
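
The approach described for problem1.py can be sketched roughly as follows. This is not the student's file, only a minimal illustration of the pattern under discussion. The column name `product`, the output names `num_sold` and `products`, the function name, and the sample rows are all assumptions made for the example; only `sell_date` comes from the review itself.

```python
import pandas as pd


def categorize_products(activities: pd.DataFrame) -> pd.DataFrame:
    """Per sell_date: count of unique products and a sorted, comma-separated list."""
    grouped = (
        activities.groupby("sell_date")
        .agg(
            # Built-in aggregation: number of distinct products per date.
            num_sold=("product", "nunique"),
            # Custom aggregation: deduplicate, sort, then join with commas.
            products=("product", lambda x: ",".join(sorted(x.unique()))),
        )
        .reset_index()
    )
    # Explicit sort, mirroring the step the review describes.
    return grouped.sort_values("sell_date").reset_index(drop=True)


# Hypothetical sample data for illustration only.
activities = pd.DataFrame(
    {
        "sell_date": ["2020-05-30", "2020-06-01", "2020-06-02",
                      "2020-05-30", "2020-06-01", "2020-06-02", "2020-05-30"],
        "product": ["Headphone", "Pencil", "Mask",
                    "Basketball", "Bible", "Mask", "T-Shirt"],
    }
)
result = categorize_products(activities)
print(result)
```

Named aggregation keeps the output column names next to the functions that produce them, which avoids a separate rename step.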

For problem2.py:

  • The problem requires grouping by 'date_id' and 'make_name', then counting unique leads and unique partners.
  • The student uses groupby and agg with 'nunique' for both columns. This is correct and efficient.
  • The code returns the result directly without an explicit sort. The problem does not specify an order, so this is acceptable. Note that groupby sorts by the group keys by default (sort=True), so the output will already be ordered by date_id and make_name unless that default was overridden.
  • Time complexity: O(n) for the groupby operation.
  • Space complexity: O(n) for the grouped data.
  • Code quality: Good. The code is very concise and clear.
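
A minimal sketch of the problem2.py pattern, again as an illustration rather than the student's actual code. The column names `lead_id` and `partner_id`, the output names `unique_leads` and `unique_partners`, the function name, and the sample rows are assumptions; `date_id` and `make_name` come from the review.

```python
import pandas as pd


def daily_leads_and_partners(daily_sales: pd.DataFrame) -> pd.DataFrame:
    """Count distinct leads and partners per (date_id, make_name)."""
    return (
        daily_sales.groupby(["date_id", "make_name"])
        .agg(
            unique_leads=("lead_id", "nunique"),
            unique_partners=("partner_id", "nunique"),
        )
        .reset_index()
    )


# Hypothetical sample data for illustration only.
daily_sales = pd.DataFrame(
    {
        "date_id": ["2020-12-08", "2020-12-08", "2020-12-08",
                    "2020-12-07", "2020-12-07"],
        "make_name": ["toyota", "toyota", "honda", "toyota", "toyota"],
        "lead_id": [0, 1, 1, 0, 0],
        "partner_id": [1, 0, 2, 2, 1],
    }
)
result = daily_leads_and_partners(daily_sales)
print(result)
```

Because groupby sorts its keys by default, the rows come back ordered by date_id and then make_name even though the function never calls sort_values.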

For problem3.py:

  • The problem requires finding actor-director pairs who have collaborated at least 3 times.
  • The student groups by ['actor_id', 'director_id'] and uses size() to count the number of collaborations. Then filters for counts >=3 and returns the required columns.
  • This is correct and efficient.
  • Time complexity: O(n) for the groupby, plus O(g) for filtering the g distinct pairs (g ≤ n), so O(n) overall.
  • Space complexity: O(n) for the intermediate DataFrame.
  • Code quality: Good. The code is clear. Note that .size() counts every row in a group, including rows with NaN in other columns, while groupby drops rows whose grouping keys are NaN by default. In this dataset actor_id and director_id should not contain NaN, since each row records a collaboration, so this is safe.
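
The problem3.py approach, and the difference between size() and count() raised above, can be sketched like this. It is an illustration under assumptions, not the student's file: the `timestamp` column, the function name, and the sample rows (including the deliberately missing timestamp) are hypothetical.

```python
import pandas as pd


def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
    """Return (actor_id, director_id) pairs that appear at least 3 times."""
    counts = (
        actor_director.groupby(["actor_id", "director_id"])
        .size()                      # rows per pair, regardless of NaN elsewhere
        .reset_index(name="count")
    )
    keep = counts["count"] >= 3
    return counts.loc[keep, ["actor_id", "director_id"]].reset_index(drop=True)


# Hypothetical sample data; one timestamp is missing on purpose.
actor_director = pd.DataFrame(
    {
        "actor_id": [1, 1, 1, 1, 1, 2, 2],
        "director_id": [1, 1, 1, 2, 2, 1, 1],
        "timestamp": [0, 1, None, 3, 4, 5, 6],
    }
)
result = actors_and_directors(actor_director)
print(result)

# size() counts all rows in the group; count() skips NaN in the chosen column.
by_pair = actor_director.groupby(["actor_id", "director_id"])
size_11 = int(by_pair.size().loc[(1, 1)])
count_11 = int(by_pair["timestamp"].count().loc[(1, 1)])
print(size_11, count_11)
```

Using size() here means a collaboration still counts even if an unrelated column such as the timestamp is missing, which matches the intent of counting rows.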

General feedback:

  • The student's solutions are correct and efficient for all three problems.
  • The code is concise and uses appropriate pandas functions.
  • The student includes comments explaining the approach, which is good for readability.
  • There are no major issues. The solutions follow standard pandas best practices.

