
refactor: enable deep username scan for user_scan modules#355

Draft
kaifcodec wants to merge 3 commits into main from feature/deep-user-scan

Conversation

@kaifcodec
Owner

kaifcodec commented May 31, 2026

  • Dynamic OSINT Metadata Extraction: Upgraded 20 Social, Developer, and Music check modules to dynamically extract public profile metadata (bios, locations, follower/following counts, join dates, and avatar URLs).
  • Defensive Parsing Wrappers: Wrapped all JSON, GraphQL, and HTML regex parsers in try/except blocks so that a module degrades gracefully to a basic "taken" status if a site's structure changes.
  • API Rate-Limit Bypasses: Migrated modules (like GitHub) from signup APIs to public profile scraping to avoid hitting unauthenticated API rate limits.
  • Upgraded Social Modules (user_scan/social/):
    • anilist, bluesky, mastodon, openstreetmap: Scrapes GraphQL responses, __NEXT_DATA__ payloads, and HTML to pull detailed bios, locations, follower counts, and avatars.
    • 35photo, pinterest, snapchat: Extracts photographer stats, Redux state JSON, and Snapcode SVG profiles.
  • Upgraded Developer Modules (user_scan/dev/):
    • github: Scrapes the profile page directly for names, bios, locations, public emails, and social links, using microdata attributes (itemprop="image") to avoid picking up sponsor avatars by mistake.
    • gitlab, huggingface, dockerhub, cratesio: Switched to public APIs to capture user IDs, join dates, active states, and avatar URLs.
  • Upgraded Music Modules (user_scan/music/):
    • soundcloud, bandcamp: Extracts embedded hydration state and data-blob attributes containing full user details, follower counts, and play statistics.
    • discogs, bandlab, freesound, lastfm, audiojungle, beatstars: Scrapes AJAX stats and profile headers and queries public REST APIs to extract detailed artist profiles and music metrics.
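A minimal sketch of the defensive-parsing pattern described above, applied to a Next.js `__NEXT_DATA__` payload. The function name `extract_profile`, the regex (which assumes one exact attribute order), the `props.pageProps.user` key path, and the `{"status": ..., "extras": ...}` result shape are all hypothetical, not the project's actual module API.

```python
import json
import re

# Assumption: the page embeds Next.js hydration data in exactly this tag form.
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
    re.DOTALL,
)


def extract_profile(html: str) -> dict:
    """Hypothetical helper: parse optional metadata for a username already known to be taken."""
    basic = {"status": "taken"}
    try:
        match = NEXT_DATA_RE.search(html)
        if match is None:
            return basic
        data = json.loads(match.group(1))
        # Hypothetical key path; every site nests its user object differently.
        user = data["props"]["pageProps"]["user"]
        extras = {
            "bio": user.get("bio"),
            "location": user.get("location"),
            "followers": user.get("followersCount"),
            "avatar": user.get("avatarUrl"),
        }
    except (ValueError, KeyError, TypeError, AttributeError):
        # Invalid JSON (JSONDecodeError is a ValueError) or a changed schema:
        # fall back to the basic "taken" status instead of failing the check.
        return basic
    # Drop fields the profile does not expose.
    extras = {k: v for k, v in extras.items() if v is not None}
    return {**basic, "extras": extras} if extras else basic
```

Keeping the fallback inside the parser means a site redesign only costs the extra metadata, never the taken/available verdict itself.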

TODO:

  • Refactor and migrate the rest of the modules

  • Test all modules to check whether any data extraction was missed or ignored

  • Update docs

kaifcodec requested a review from VamatoHD May 31, 2026 15:50
kaifcodec added the enhancement (New feature or request) label May 31, 2026
@kaifcodec
Owner Author

@VamatoHD You can add deep scan to other modules as well.
I will refactor more modules whenever I get time.

kaifcodec force-pushed the feature/deep-user-scan branch 2 times, most recently from ed1bd23 to b8695ae, May 31, 2026 17:54
kaifcodec force-pushed the feature/deep-user-scan branch from b8695ae to 379ce08, May 31, 2026 17:57
@VamatoHD
Collaborator

@kaifcodec After some testing, I found a slight problem: some extras, such as bios, span multiple lines and break the tree layout, for example:

 [✔] Anilist (___): Found
      ├── id: ___
      ├── about: ~~~

✧ ____ ✧

~!
☁️☁️☁️
on AniList since __.__.__

youtube(https://www.youtube.com/watch?v=___)
youtube(https://www.youtube.com/watch?v=___)
!~

How should they be handled?

@kaifcodec
Owner Author

Yeah, that is definitely an issue we need to catch.

If a module returns data containing newlines (\n), we'll need to update the extra_display logic to handle it gracefully. The plan is to find every \n in a value string and replace it with a newline followed by the correct indentation.

Basically, we want each continuation line to align directly below the start of the first line of that key's value. This keeps multi-line details visually confined inside an "invisible square" so they don't break the CLI tree structure.
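A minimal sketch of that plan. The helper name `format_extras`, its `indent` default, the `└──` branch for the last item, and keeping a `│` rail on continuation lines of non-last items are assumptions modeled on the output pasted above, not the project's actual extra_display code. Each \n in a value is replaced with a newline plus padding, so continuation lines start under the first character of the value.

```python
def format_extras(extras: dict, indent: str = "      ") -> str:
    """Hypothetical helper: render extras as a tree, indenting multi-line values."""
    lines = []
    items = list(extras.items())
    for i, (key, value) in enumerate(items):
        last = i == len(items) - 1
        branch = "└── " if last else "├── "
        # Keep the vertical rail for non-last items so the tree stays connected.
        rail = "    " if last else "│   "
        head = f"{key}: "
        # Padding that puts continuation lines under the first character of the value.
        pad = indent + rail + " " * len(head)
        text = str(value).replace("\r\n", "\n")
        lines.append(indent + branch + head + text.replace("\n", "\n" + pad))
    return "\n".join(lines)
```

For example, `format_extras({"id": "42", "about": "line1\nline2"})` prints `line2` in the same column as `line1`, one line below the `└── about:` entry.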


Labels

enhancement New feature or request


2 participants