Parse actor data from delimited string format #2
Open: Copilot wants to merge 6 commits into `main` from `copilot/update-actor-data`
Commits (6):

- 1333565 Initial plan (Copilot)
- 28257d2 Add actor data parser with tests and documentation (Copilot)
- a991185 Add .gitignore and remove `__pycache__` artifacts (Copilot)
- 8859afd Fix parser to correctly extract single valid actor name (Copilot)
- 95eccd2 Update README.md (Darliewithrow)
- 97143a0 Merge branch 'main' into copilot/update-actor-data (Darliewithrow)

File: `.gitignore`

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| # Python | ||
| __pycache__/ | ||
| *.py[cod] | ||
| *$py.class | ||
| *.so | ||
| .Python | ||
| build/ | ||
| develop-eggs/ | ||
| dist/ | ||
| downloads/ | ||
| eggs/ | ||
| .eggs/ | ||
| lib/ | ||
| lib64/ | ||
| parts/ | ||
| sdist/ | ||
| var/ | ||
| wheels/ | ||
| *.egg-info/ | ||
| .installed.cfg | ||
| *.egg | ||
| | ||
| # Virtual environments | ||
| venv/ | ||
| ENV/ | ||
| env/ | ||
| .venv | ||
| | ||
| # IDEs | ||
| .vscode/ | ||
| .idea/ | ||
| *.swp | ||
| *.swo | ||
| *~ | ||
| | ||
| # OS | ||
| .DS_Store | ||
| Thumbs.db | ||

File: `README.md`

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,111 @@ | ||
| https://app.chime.com/link/qr?u=Darlie-Withrow | ||
| # Actor Data Parser | ||
| | ||
| This repository contains a Python script to parse actor data from a formatted string. | ||
| | ||
| ## Problem Statement | ||
| | ||
| Parse actor information from the following format: | ||
| ``` | ||
| actor:Daractor:Darliewithrowliewithrowactor:Darliewithrowactor:Darliewithrow | ||
| ``` | ||
| | ||
| ## Solution | ||
| | ||
| The `actor_parser.py` script parses the input string by splitting on the `actor:` delimiter, then intelligently filters out corrupted and fragmented actor names to extract only valid actor names. | ||
| | ||
| ### Usage | ||
| | ||
| Run with default data: | ||
| ```bash | ||
| python3 actor_parser.py | ||
| ``` | ||
| | ||
| Run with data from a file: | ||
| ```bash | ||
| python3 actor_parser.py actor_data.txt | ||
| ``` | ||
| | ||
| ### Output | ||
| | ||
| ``` | ||
| Parsed Actors: | ||
| 1. Darliewithrow | ||
| | ||
| Total unique actors: 1 | ||
| ``` | ||
| | ||
| ## Testing | ||
| | ||
| Run the test suite: | ||
| | ||
| ```bash | ||
| python3 test_actor_parser.py | ||
| ``` | ||
| | ||
| ## Implementation Details | ||
| | ||
| - The parser splits the input string by `actor:` delimiter | ||
| - Identifies and filters out corrupted names with internal repetitions | ||
| - Removes fragment names that are prefixes of longer valid names | ||
| - Maintains unique actors in order of first appearance | ||
| - Returns a list of valid actor names | ||
| | ||
| ## License | ||
| | ||
| This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details. | ||
| https://app.chime.com/link/qr?u=Darlie-Withrow | ||
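
The Implementation Details list in the README names the parsing steps, but the parser's own diff is only partly visible on this page. As a rough sketch only, those steps could be written as follows. The function name `parse_actors` comes from the test suite; the repeated-substring regex, the minimum repeat length of 3, and the constant names are assumptions made for illustration and are not the PR's actual code.

```python
import re
from typing import List

DELIMITER = "actor:"

# Assumed heuristic: a run of 3 or more characters that is immediately
# repeated (e.g. "liewithrowliewithrow") marks a name as corrupted.
_INTERNAL_REPEAT = re.compile(r"(.{3,})\1")


def parse_actors(data: str) -> List[str]:
    """Return unique, valid actor names in order of first appearance."""
    # Step 1: split on the delimiter, trim whitespace, drop empty pieces.
    candidates = [part.strip() for part in data.split(DELIMITER)]
    candidates = [name for name in candidates if name]

    # Step 2: drop names that contain an internal repetition.
    valid = [name for name in candidates if not _INTERNAL_REPEAT.search(name)]

    # Step 3: de-duplicate while keeping first-appearance order.
    unique = list(dict.fromkeys(valid))

    # Step 4: drop fragments that are strict prefixes of another valid name.
    return [
        name
        for name in unique
        if not any(other != name and other.startswith(name) for other in unique)
    ]


if __name__ == "__main__":
    sample = "actor:Daractor:Darliewithrowliewithrowactor:Darliewithrowactor:Darliewithrow"
    print(parse_actors(sample))
```

For the sample string, the raw split yields `Dar`, `Darliewithrowliewithrow`, and `Darliewithrow` twice. Under these assumptions the second piece is rejected by the repetition check and `Dar` is rejected as a prefix, which leaves the single name shown in the README's Output section. Both heuristics can reject legitimate names (a real name that happens to be a prefix of another, for instance), so the thresholds would need review against real data.
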

File: `actor_data.txt`

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| actor:Daractor:Darliewithrowliewithrowactor:Darliewithrowactor:Darliewithrow |
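
The README's Usage and Output sections describe running the script either with built-in data or with a file such as the one above, then printing a numbered list. A minimal sketch of that command-line layer might look like this. The helper names `load_data` and `format_report` and the constant `DEFAULT_DATA` are hypothetical, and the exact spacing of the report is inferred from the README's Output block as rendered here.

```python
import sys
from typing import List

# Sample string from the problem statement, used when no file is given.
DEFAULT_DATA = (
    "actor:Daractor:Darliewithrowliewithrowactor:Darliewithrowactor:Darliewithrow"
)


def load_data(argv: List[str], default: str = DEFAULT_DATA) -> str:
    """Read the file named by argv[1] if present, else return the default."""
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as handle:
            return handle.read().strip()
    return default


def format_report(actors: List[str]) -> str:
    """Build the numbered listing followed by a total count."""
    lines = ["Parsed Actors:"]
    lines += [f"{index}. {name}" for index, name in enumerate(actors, start=1)]
    lines += ["", f"Total unique actors: {len(actors)}"]
    return "\n".join(lines)


if __name__ == "__main__":
    raw = load_data(sys.argv)
    # A real script would pass `raw` through the parser first; this sketch
    # prints a report for an already-parsed list to stay self-contained.
    print(format_report(["Darliewithrow"]))
```

Keeping input loading and report formatting apart from the parsing function lets the test suite import the parser without touching `sys.argv` or the file system.
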

File: `actor_parser.py`

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| #!/usr/bin/env python3 | ||
| """ | ||
| Actor Data Parser | ||
| | ||
| This script parses actor data from a formatted string. | ||
| The input format is: actor:<name>actor:<name>... | ||
| """ | ||
| | ||
| import re | ||
| import sys | ||
| from typing import List, Set | ||

Review conversation after `import re`: Darliewithrow marked this conversation as resolved.

Suggested change after `from typing import List, Set`:

| Before | After |
|---|---|
| `from typing import List, Set` | `from typing import List` |

Darliewithrow marked this conversation as resolved.

File: `test_actor_parser.py`

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| #!/usr/bin/env python3 | ||
| """ | ||
| Test suite for Actor Data Parser | ||
| """ | ||
| | ||
| import unittest | ||
| from actor_parser import parse_actors | ||
| | ||
| | ||
| class TestActorParser(unittest.TestCase): | ||
| """Test cases for the actor parser""" | ||
| | ||
| def test_simple_actors(self): | ||
| """Test parsing simple actor list""" | ||
| data = "actor:John actor:Jane actor:Bob" | ||
| expected = ["John", "Jane", "Bob"] | ||
| self.assertEqual(parse_actors(data), expected) | ||
| | ||
| def test_duplicate_actors(self): | ||
| """Test that duplicate actors are removed""" | ||
| data = "actor:Alice actor:Bob actor:Alice" | ||
| expected = ["Alice", "Bob"] | ||
| self.assertEqual(parse_actors(data), expected) | ||
| | ||
| def test_empty_string(self): | ||
| """Test parsing empty string""" | ||
| self.assertEqual(parse_actors(""), []) | ||
| | ||
| def test_single_actor(self): | ||
| """Test parsing single actor""" | ||
| data = "actor:SingleActor" | ||
| expected = ["SingleActor"] | ||
| self.assertEqual(parse_actors(data), expected) | ||
| | ||
| def test_problem_statement_data(self): | ||
| """Test the actual problem statement data""" | ||
| data = "actor:Daractor:Darliewithrowliewithrowactor:Darliewithrowactor:Darliewithrow" | ||
| result = parse_actors(data) | ||
| # Should parse into distinct actors | ||
| self.assertIsInstance(result, list) | ||
| self.assertGreater(len(result), 0) | ||
| # Check that Darliewithrow is in the results | ||
| self.assertIn("Darliewithrow", result) | ||
| | ||
| def test_actors_with_whitespace(self): | ||
| """Test parsing actors with whitespace""" | ||
| data = "actor: SpaceActor actor:NoSpace " | ||
| result = parse_actors(data) | ||
| self.assertIn("SpaceActor", result) | ||
| self.assertIn("NoSpace", result) | ||
| | ||
| def test_no_actor_prefix(self): | ||
| """Test string without actor prefix""" | ||
| data = "JustAName" | ||
| expected = ["JustAName"] | ||
| self.assertEqual(parse_actors(data), expected) | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
| unittest.main() | ||