
Add built-in Python source analyzer (sgraph.analyzers) #157

Open
villelaitila wants to merge 1 commit into softagram:main from villelaitila:feature/code-analysis-for-python


@villelaitila villelaitila commented Jun 1, 2026

Summary

Adds sgraph.analyzers, a built-in analysis layer that turns a Python project directory into an SGraph model directly — no external analyzer required.

from sgraph.analyzers import analyze_python

result = analyze_python("./src")
print(result.summary())
result.graph.to_xml("model.xml")

Implementation technique

  • Standard-library ast only — no third-party Python parser. Each file is parsed with ast.parse and walked by an ast.NodeVisitor subclass.

  • Structural pattern matching (match/case) drives name, decorator and type-annotation extraction over AST expression nodes.

  • Two-phase pipeline:

    1. Discover + parse every file, build the element tree, and register each module in a registry.
    2. Resolve the collected import statements into dependency edges once the whole registry is known.

    Deferring import resolution to phase 2 means forward references and cross-module imports resolve regardless of file processing order. Duplicate edges are suppressed and self-references skipped.
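
A minimal sketch of the two-phase idea, not the code in this PR. It assumes an in-memory project (`SOURCES`) instead of files on disk, and the names `ImportCollector`, `resolve`, `analyze` and the `packages` set are hypothetical. Phase 1 only records raw imports; phase 2 resolves them once every module is registered, so `pkg.a` can point at `pkg.b` even though `pkg.b` is parsed later.

```python
import ast

# Hypothetical in-memory project: module path -> source text.
SOURCES = {
    "pkg": "",
    "pkg.a": "from . import b\nimport pkg.b\nimport os\n",
    "pkg.b": "from .a import helper\nimport pkg.b\n",
}


class ImportCollector(ast.NodeVisitor):
    """Phase 1 helper: record raw imports without resolving them."""

    def __init__(self):
        self.raw = []  # list of (dot level, dotted target name)

    def visit_Import(self, node):
        for alias in node.names:
            self.raw.append((0, alias.name))

    def visit_ImportFrom(self, node):
        base = node.module or ""
        for alias in node.names:
            name = f"{base}.{alias.name}" if base else alias.name
            self.raw.append((node.level, name))


def resolve(importer, level, name, registry, packages):
    """Return the registered module an import points at, or None if external."""
    if level:
        parts = importer.split(".")
        if importer not in packages:
            parts = parts[:-1]  # a plain module is relative to its package
        keep = len(parts) - (level - 1)  # each extra dot climbs one package
        if keep <= 0:
            return None
        name = ".".join(parts[:keep] + [name])
    # Parent-module fallback: "pkg.a.helper" may name a symbol, so try "pkg.a".
    while name:
        if name in registry:
            return name
        name = name.rpartition(".")[0]
    return None


def analyze(sources, packages):
    # Phase 1: parse every module and fill the registry before resolving.
    registry, pending = set(), []
    for module, text in sources.items():
        registry.add(module)
        collector = ImportCollector()
        collector.visit(ast.parse(text))
        pending.extend((module, lvl, name) for lvl, name in collector.raw)

    # Phase 2: resolve against the now-complete registry.
    edges = []
    for importer, level, name in pending:
        target = resolve(importer, level, name, registry, packages)
        if target is None or target == importer or (importer, target) in edges:
            continue  # external target, self-reference, or duplicate edge
        edges.append((importer, target))
    return edges


edges = analyze(SOURCES, packages={"pkg"})
print(edges)
```

In this sketch `import os` yields no edge because `os` is not in the registry, the second import of `pkg.b` from `pkg.a` is dropped as a duplicate, and `pkg.b` importing itself is skipped.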

Architecture

| Module | Responsibility |
| --- | --- |
| analyzers/base.py | Framework-agnostic types: AnalyzerConfig (root path, include/exclude globs, external-import + stdlib toggles), AnalysisResult (graph + errors + stats + summary()), AnalysisError, SourceLocation, and the AnalysisLevel enum. |
| analyzers/code/base.py | Language-agnostic source plumbing: glob-based discovery with exclude filtering, file-path → module-path mapping, encoding-tolerant reading (utf-8 with latin-1 fallback). |
| analyzers/code/python/python_analyzer.py | Orchestrates the two phases; maps the filesystem into SElements (packages from __init__.py, modules from files, nested classes/functions/methods) and builds the import/from_import SElementAssociations. |
| analyzers/code/python/ast_visitor.py | Creates SElements per scope according to the configured level; records decorators, parameters and return-type annotations at FULL. |
| analyzers/code/python/import_resolver.py | Resolves absolute and relative imports (dot-level aware) against the module registry with parent-module fallback; external/stdlib targets skipped unless configured. |
| analyzers/database/, analyzers/infrastructure/ | Reserved namespaces for future analyzers. |

Analysis levels

AnalysisLevel lets callers trade detail for speed:

PACKAGES_ONLY  <  FILES  <  CLASSES  <  FUNCTIONS  <  FULL
  • PACKAGES_ONLY / FILES — coarse structural graph
  • CLASSES / FUNCTIONS — nested classes, functions and methods
  • FULL — additionally captures parameters, return types and decorators
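
One way such ordered levels could gate what gets recorded, sketched with `enum.IntEnum`. The class `Level` is a stand-in for AnalysisLevel; its numeric values, the choice of `IntEnum`, and the `element_kinds` helper are assumptions for illustration, not the PR's definitions.

```python
from enum import IntEnum


class Level(IntEnum):
    """Stand-in for AnalysisLevel; the numeric values are assumed."""
    PACKAGES_ONLY = 1
    FILES = 2
    CLASSES = 3
    FUNCTIONS = 4
    FULL = 5


def element_kinds(level):
    """List what a visitor would record at the given level (hypothetical)."""
    kinds = ["package"]
    if level >= Level.FILES:
        kinds.append("module")
    if level >= Level.CLASSES:
        kinds.append("class")
    if level >= Level.FUNCTIONS:
        kinds.append("function")
    if level >= Level.FULL:
        kinds += ["parameters", "return type", "decorators"]
    return kinds


for level in Level:
    print(level.name, element_kinds(level))
```

Because the members compare as integers, each gate is a single `>=` check and a deeper level always includes everything a shallower one records.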

Tests

27 tests covering source discovery, module-path mapping, level gating, class/function extraction, and absolute/relative import resolution. All passing locally (pytest tests/analyzers/).

Notes for reviewers

  • Pure standard library — no new runtime dependencies.
  • database/ and infrastructure/ ship as empty reserved packages (placeholders for upcoming analyzers).
  • In-code docstrings/comments are currently in Finnish; happy to translate them to English in this PR or a follow-up if preferred for the public API surface.

@villelaitila villelaitila force-pushed the feature/code-analysis-for-python branch from 28aafcd to d5ec0a9 Compare June 1, 2026 15:02
Introduce sgraph.analyzers, a built-in analysis layer that turns a Python
project directory into an SGraph model directly, without an external analyzer.

Public API:

    from sgraph.analyzers import analyze_python
    result = analyze_python("./src")
    result.graph.to_xml("model.xml")

Implementation technique
- Pure standard-library parsing via the `ast` module — no third-party Python
  parser. Each file is parsed with `ast.parse` and walked by an
  `ast.NodeVisitor` subclass.
- Name/decorator/annotation extraction uses structural pattern matching
  (match/case) over AST expression nodes.
- Two-phase pipeline: (1) discover + parse every file, build the element tree
  and register modules; (2) resolve collected import statements into
  dependency edges once the whole module registry is known. Deferring import
  resolution to phase 2 lets forward references and cross-module imports
  resolve regardless of file processing order.

Architecture
- base.py — shared, framework-agnostic types: AnalyzerConfig (root path,
  include/exclude globs, external-import + stdlib toggles), AnalysisResult
  (graph + errors + stats + summary), AnalysisError, SourceLocation, and the
  AnalysisLevel enum (PACKAGES_ONLY < FILES < CLASSES < FUNCTIONS < FULL) that
  controls how deep the model is built.
- code/base.py — language-agnostic source-file plumbing: glob-based
  discovery with exclude filtering, file path -> module path mapping, and
  encoding-tolerant reading (utf-8 with latin-1 fallback).
- code/python/ — the Python implementation:
    - python_analyzer.py orchestrates the two phases and maps the filesystem
      into SElements (packages from __init__.py, modules from files, nested
      classes/functions/methods), then builds deduplicated import/from_import
      SElementAssociations.
    - ast_visitor.py creates SElements per scope according to AnalysisLevel
      and records decorators, parameters and return-type annotations at FULL.
    - import_resolver.py resolves both absolute and relative imports
      (dot-level aware) against the module registry with parent-module
      fallback; external/stdlib targets are skipped unless configured.
- database/ and infrastructure/ are reserved namespaces for future analyzers.

Levels let callers trade detail for speed, from a coarse package/file graph up
to a full model with classes, functions, parameters and decorators.

Tests: 27 tests covering source discovery, module-path mapping, level
gating, class/function extraction, and absolute/relative import resolution.
@villelaitila villelaitila force-pushed the feature/code-analysis-for-python branch from d5ec0a9 to f5e847e Compare June 1, 2026 15:53
@softagram-bot

Softagram Impact Report for pull/157 (head commit: f5e847e)

TL;DR Changed code files: 15 | Directly impacted code files: 0

⭐ Change Overview

[Image: the changed files, dependency changes and the impact]

⭐ Details of Dependency Changes

[Image: details of dependency changes]

📄 Full report

Impact Report explained. Give feedback on this report to support@softagram.com
