stubpy.ast_pass

AST pre-pass — harvests structural metadata from source without executing the module.

This module runs a read-only walk over the source file’s AST before (or instead of) importing the module. Because no code is executed, this pass is free from import-time side effects.

The harvested data is stored in ASTSymbols and fed into build_symbol_table() to construct the SymbolTable.
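To illustrate why a parse-only pass is free from import-time side effects, here is a minimal sketch that uses only the standard-library ast module. It is not stubpy's own code, and the helper name top_level_names is hypothetical. The sample source contains a statement that would raise if the module were imported, yet parsing succeeds and the definitions can still be listed.

```python
import ast

SOURCE = '''
raise RuntimeError("import-time side effect")

class Foo:
    pass

def bar():
    pass
'''

def top_level_names(source: str) -> list:
    """Return names of classes and functions defined directly in the module body."""
    tree = ast.parse(source)  # parses only; nothing in `source` is executed
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]

print(top_level_names(SOURCE))
```

Importing the same text as a module would stop at the raise statement before Foo or bar were ever defined.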

What is harvested

  • Classes — name, source line, base class expressions (as strings), decorator names, and directly-defined methods.

  • Module-level functions — name, line, async flag, decorator names, and a flag for @overload-decorated variants.

  • Annotated variables (name: Type = value) at module scope.

  • __all__: the explicit public API list, when present.

  • Type alias declarations (all forms):

    • Name: TypeAlias = <rhs> — explicit PEP 613 annotation

    • Name = int | float — bare PEP 604 union

    • Name = Union[str, int] — subscripted generic

    • Name = int — known built-in or typing type name

    • type Name = <rhs> — Python 3.12+ PEP 695 soft keyword

    • type Stack[T] = list[T] — generic alias (PEP 695)

  • TypeVar / ParamSpec / TypeVarTuple / NewType call-expression declarations.

Ignore directive

If the source file begins (before any code) with a comment containing # stubpy: ignore (case-insensitive), the harvester returns an otherwise empty ASTSymbols with skip_file set to True, and the caller should skip stub generation for that file. Check ASTSymbols.skip_file to detect this.

What is not harvested

  • Nested functions or classes inside other functions.

  • Import statements (handled by stubpy.imports).

  • Runtime values — those require the module to be executed.

Examples

>>> from stubpy.ast_pass import ast_harvest
>>> syms = ast_harvest("x: int = 1\nclass Foo: pass\n")
>>> syms.variables[0].name
'x'
>>> syms.classes[0].name
'Foo'
ast_harvest(source: str) → ASTSymbols

Parse source and return structural metadata without executing any code.

This is the main entry point for the AST pre-pass stage. A fresh ASTHarvester is created for each call, making this function fully re-entrant.

Parameters:

source (str) – Raw Python source text.

Returns:

ASTSymbols – Populated container of all harvested metadata. On a SyntaxError the result will be empty but valid — no exception is raised.

Examples

>>> syms = ast_harvest("")
>>> syms.classes
[]
>>> syms = ast_harvest("class Foo(Bar): pass")
>>> syms.classes[0].name, syms.classes[0].bases
('Foo', ['Bar'])
>>> syms = ast_harvest("async def fetch(url: str) -> None: ...")
>>> fn = syms.functions[0]
>>> fn.is_async, fn.name
(True, 'fetch')
>>> syms = ast_harvest("X = TypeVar('X')")
>>> syms.typevar_decls[0].kind
'TypeVar'
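The "empty but valid on SyntaxError" behaviour described for the return value can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: MiniSymbols is a hypothetical one-field stand-in for ASTSymbols, and safe_harvest is a hypothetical helper.

```python
import ast
from dataclasses import dataclass, field

@dataclass
class MiniSymbols:
    """Hypothetical stand-in for ASTSymbols with a single field."""
    classes: list = field(default_factory=list)

def safe_harvest(source: str) -> MiniSymbols:
    """Collect top-level class names; return an empty container on SyntaxError."""
    symbols = MiniSymbols()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return symbols  # empty, but still a usable object for the caller
    symbols.classes = [n.name for n in tree.body if isinstance(n, ast.ClassDef)]
    return symbols

print(safe_harvest("class Foo: pass").classes)
print(safe_harvest("class Foo(:").classes)
```

Returning an empty container rather than raising lets a caller process many files in a loop without wrapping every call in its own try block.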
class ASTSymbols(classes: list[ClassInfo] = <factory>, functions: list[FunctionInfo] = <factory>, variables: list[VariableInfo] = <factory>, typevar_decls: list[TypeVarInfo] = <factory>, all_exports: list[str] | None = None, skip_file: bool = False)

Bases: object

Container for all metadata harvested from a single source file’s AST.

Created by ast_harvest() and consumed by build_symbol_table().

classes

All top-level class definitions, in source order.

Type:

list of ClassInfo

functions

All top-level function definitions, in source order.

Type:

list of FunctionInfo

variables

All top-level annotated (and plain) variable assignments.

Type:

list of VariableInfo

typevar_decls

TypeVar / ParamSpec / TypeVarTuple / TypeAlias / NewType declarations.

Type:

list of TypeVarInfo

all_exports

Contents of __all__, or None when the module has no __all__ declaration.

Type:

list of str or None

skip_file

True when the source begins (before any code) with a # stubpy: ignore comment; the other fields are then empty.

Type:

bool

classes: list[ClassInfo]
functions: list[FunctionInfo]
variables: list[VariableInfo]
typevar_decls: list[TypeVarInfo]
all_exports: list[str] | None = None
skip_file: bool = False
__init__(classes: list[ClassInfo] = <factory>, functions: list[FunctionInfo] = <factory>, variables: list[VariableInfo] = <factory>, typevar_decls: list[TypeVarInfo] = <factory>, all_exports: list[str] | None = None, skip_file: bool = False) → None
class ASTHarvester(source: str)

Bases: NodeVisitor

Walk the top-level AST of a Python source file and collect structural metadata without executing any code.

Only top-level definitions are collected (class/function/variable statements that are direct children of the module body). Statements nested inside if, with, or try blocks at the module level are visited transitively so that patterns like if TYPE_CHECKING: ... are still partially harvested.
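The traversal rule described above (descend into module-level if, try, and with blocks, but not into function bodies) can be sketched with a small ast.NodeVisitor subclass. The class name TopLevelCollector is hypothetical and the sketch only tracks synchronous function names; the actual harvester collects considerably more.

```python
import ast

class TopLevelCollector(ast.NodeVisitor):
    """Hypothetical visitor that records function names at module level only."""

    def __init__(self) -> None:
        self.functions = []

    def _visit_all(self, *blocks) -> None:
        # Visit every statement of every statement list passed in.
        for block in blocks:
            for stmt in block:
                self.visit(stmt)

    def visit_Module(self, node: ast.Module) -> None:
        self._visit_all(node.body)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        # Record the name without descending, so nested defs stay invisible.
        self.functions.append(node.name)

    def visit_If(self, node: ast.If) -> None:
        self._visit_all(node.body, node.orelse)

    def visit_Try(self, node: ast.Try) -> None:
        handler_bodies = [handler.body for handler in node.handlers]
        self._visit_all(node.body, *handler_bodies, node.orelse, node.finalbody)

    def visit_With(self, node: ast.With) -> None:
        self._visit_all(node.body)

    def generic_visit(self, node: ast.AST) -> None:
        # Any other node type (classes, imports, expressions) is ignored here.
        pass

SOURCE = """\
import typing
if typing.TYPE_CHECKING:
    def only_for_checkers(): ...
try:
    import fast_impl
except ImportError:
    def fallback(): ...
def outer():
    def inner(): ...
"""

collector = TopLevelCollector()
collector.visit(ast.parse(SOURCE))
print(collector.functions)
```

Overriding generic_visit to do nothing is what keeps the walk shallow: only the node types with an explicit visit_ method are ever entered.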

Parameters:

source (str) – Raw Python source text.

Examples

>>> h = ASTHarvester("async def foo(): pass")
>>> syms = h.harvest()
>>> syms.functions[0].is_async
True
__init__(source: str) → None
harvest() → ASTSymbols

Parse the source and return the populated ASTSymbols.

Returns an empty (but valid) ASTSymbols on SyntaxError without raising.

If the source begins (before any code) with a # stubpy: ignore comment, skip_file is set to True and the returned ASTSymbols is otherwise empty.

visit_ClassDef(node: ClassDef) → None

Harvest a class definition and its directly-defined methods.

visit_FunctionDef(node: FunctionDef) → None

Harvest a top-level synchronous function.

visit_AsyncFunctionDef(node: AsyncFunctionDef) → None

Harvest a top-level asynchronous function.

visit_Assign(node: Assign) → None

Handle:

  1. __all__ = [...] — populates all_exports.

  2. X = TypeVar(...) / X = NewType(...) — explicit TypeVar declarations.

  3. X = int | float / X = Union[int, str] — implicit TypeAlias (bare union or subscripted generic RHS without an annotation).

  4. Plain name = value assignments — recorded as VariableInfo.
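A rough sketch of how cases 1, 2, and 4 of this dispatch might be told apart, using only the standard ast module. The function classify_assign, the TYPEVAR_LIKE set, and the returned category strings are all hypothetical; case 3 (implicit aliases) is left out to keep the example short.

```python
import ast

# Callee names treated as TypeVar-like declarations (taken from the list above).
TYPEVAR_LIKE = {"TypeVar", "ParamSpec", "TypeVarTuple", "NewType"}

def classify_assign(stmt: ast.Assign):
    """Return a (category, payload) pair for a single-target assignment."""
    target = stmt.targets[0]
    if not isinstance(target, ast.Name):
        return ("other", None)
    value = stmt.value
    # Case 1: __all__ = [...] with string literals.
    if target.id == "__all__" and isinstance(value, (ast.List, ast.Tuple)):
        names = [
            elt.value
            for elt in value.elts
            if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
        ]
        return ("all_exports", names)
    # Case 2: X = TypeVar(...), also matching typing.TypeVar(...).
    if isinstance(value, ast.Call):
        func = value.func
        if isinstance(func, ast.Attribute):
            callee = func.attr
        elif isinstance(func, ast.Name):
            callee = func.id
        else:
            callee = None
        if callee in TYPEVAR_LIKE:
            return (callee, ast.unparse(value))
    # Case 4: anything else is a plain variable.
    return ("variable", ast.unparse(value))

for line in ["__all__ = ['a', 'b']", "T = TypeVar('T')", "x = 1"]:
    print(classify_assign(ast.parse(line).body[0]))
```

Matching on the callee name alone is a heuristic: a user-defined function that happens to be called TypeVar would be classified the same way.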

visit_TypeAlias(node: AST) → None

Handle Python 3.12+ type Name = ... soft-keyword statement (PEP 695).

The AST node is ast.TypeAlias (available from Python 3.12). The parameter is typed as AST and its fields are accessed by attribute, so the module still loads on Python 3.10/3.11, where ast.TypeAlias does not exist and this method is never called.

Examples

The following source:

type Vector = list[float]

produces a TypeVarInfo with kind="TypeAlias" and source_str="list[float]".
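One way to handle the PEP 695 node without referring to ast.TypeAlias directly is to look the class up with getattr, as in the sketch below. The helper harvest_type_statements is hypothetical, and returning an empty list on older interpreters is an assumption made for this example rather than documented stubpy behaviour.

```python
import ast
import sys

def harvest_type_statements(source: str) -> list:
    """Return (name, aliased type) pairs for PEP 695 `type` statements."""
    # ast.TypeAlias only exists on Python 3.12+; older versions yield None here.
    type_alias_cls = getattr(ast, "TypeAlias", None)
    if type_alias_cls is None:
        return []  # the syntax cannot occur in code this interpreter can parse
    found = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, type_alias_cls):
            # stmt.name is an ast.Name node; stmt.value is the aliased expression.
            found.append((stmt.name.id, ast.unparse(stmt.value)))
    return found

if sys.version_info >= (3, 12):
    print(harvest_type_statements("type Vector = list[float]"))
```

On Python 3.12 or later the call prints a single pair for Vector; on earlier versions the guard skips the call because the statement would not parse.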

visit_AnnAssign(node: AnnAssign) → None

Handle annotated assignments:

  • name: TypeAlias = int | str → TypeVarInfo

  • name: Type = value → VariableInfo

visit_If(node: If) → None

Recurse into if/else bodies (handles if TYPE_CHECKING: blocks).

visit_Try(node: Try) → None

Recurse into try/except/else/finally bodies.

visit_TryStar(node: AST) → None
visit_With(node: With) → None

Recurse into with blocks.

generic_visit(node: AST) → None

Called if no explicit visitor function exists for a node.

Data containers

class FunctionInfo(name: str, lineno: int, is_async: bool = False, decorators: list[str] = <factory>, is_overload: bool = False, raw_arg_annotations: dict[str, str] = <factory>, raw_return_annotation: str | None = None, kwargs_forwarded_to: list[str] = <factory>, args_forwarded_to: list[str] = <factory>)

Bases: object

Metadata for a single function or method definition from the AST.

Parameters:
  • name (str)

  • lineno (int)

  • is_async (bool) – True for async def definitions.

  • decorators (list of str) – Plain names of all decorators (e.g. ["classmethod"]).

  • is_overload (bool) – True when overload appears in decorators.

  • raw_arg_annotations (dict) – Maps parameter name → unparsed annotation string for every annotated parameter. Variadic names are prefixed: "*args", "**kwargs".

  • raw_return_annotation (str or None) – Unparsed return-annotation string, or None when absent.

  • kwargs_forwarded_to (list of str) – Names of callables to which **kwargs is forwarded in the body. Populated by the body scanner in ASTHarvester._harvest_function(). Used by resolve_function_params() to expand variadic parameters into their concrete counterparts.

  • args_forwarded_to (list of str) – Names of callables to which *args is forwarded in the body. Same purpose as kwargs_forwarded_to for positional variadics.

Examples

>>> info = FunctionInfo(name="greet", lineno=5, is_async=False)
>>> info.is_overload
False
>>> info.kwargs_forwarded_to
[]
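The shape of raw_arg_annotations and raw_return_annotation can be reproduced with ast.unparse, as in this sketch. The helper raw_annotations is hypothetical and only mirrors the documented conventions (unparsed strings, "*" and "**" prefixes for variadic names); it is not taken from the stubpy source.

```python
import ast

def raw_annotations(func: ast.FunctionDef):
    """Return ({parameter: annotation source}, return annotation source or None)."""
    args = func.args
    result = {}
    # Ordinary parameters: positional-only, regular, and keyword-only.
    for arg in args.posonlyargs + args.args + args.kwonlyargs:
        if arg.annotation is not None:
            result[arg.arg] = ast.unparse(arg.annotation)
    # Variadic parameters keep their star prefix in the key.
    if args.vararg is not None and args.vararg.annotation is not None:
        result["*" + args.vararg.arg] = ast.unparse(args.vararg.annotation)
    if args.kwarg is not None and args.kwarg.annotation is not None:
        result["**" + args.kwarg.arg] = ast.unparse(args.kwarg.annotation)
    returns = ast.unparse(func.returns) if func.returns is not None else None
    return result, returns

fn = ast.parse("def f(a: int, b, *args: str, **kwargs: float) -> None: ...").body[0]
print(raw_annotations(fn))
```

The unannotated parameter b does not appear in the mapping, which matches the "every annotated parameter" wording above.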
name: str
lineno: int
is_async: bool = False
decorators: list[str]
is_overload: bool = False
raw_arg_annotations: dict[str, str]
raw_return_annotation: str | None = None
kwargs_forwarded_to: list[str]
args_forwarded_to: list[str]
__init__(name: str, lineno: int, is_async: bool = False, decorators: list[str] = <factory>, is_overload: bool = False, raw_arg_annotations: dict[str, str] = <factory>, raw_return_annotation: str | None = None, kwargs_forwarded_to: list[str] = <factory>, args_forwarded_to: list[str] = <factory>) → None
class ClassInfo(name: str, lineno: int, bases: list[str] = <factory>, decorators: list[str] = <factory>, methods: list[FunctionInfo] = <factory>)

Bases: object

Metadata for a single class definition from the AST.

Parameters:
  • name (str)

  • lineno (int)

  • bases (list of str) – Base class expressions as unparsed strings (e.g. ["Element"]).

  • decorators (list of str) – Plain decorator names.

  • methods (list of FunctionInfo) – Methods defined directly in the class body.

Examples

>>> info = ClassInfo(name="Widget", lineno=10, bases=["Element"])
>>> info.decorators
[]
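For orientation, the following sketch shows how the fields of a ClassInfo could be derived from an ast.ClassDef node. The function describe_class and the way decorators are reduced to plain names are assumptions made for illustration; the harvester's exact reduction rules are not documented here.

```python
import ast

def decorator_name(dec: ast.expr) -> str:
    """Reduce @name, @pkg.name and @name(...) to a plain name (assumed rule)."""
    if isinstance(dec, ast.Call):
        dec = dec.func
    if isinstance(dec, ast.Attribute):
        return dec.attr
    if isinstance(dec, ast.Name):
        return dec.id
    return ast.unparse(dec)

def describe_class(node: ast.ClassDef) -> dict:
    """Build a plain dict with the same keys as the ClassInfo fields."""
    return {
        "name": node.name,
        "lineno": node.lineno,
        # Base expressions are kept as source text, e.g. "Generic[T]".
        "bases": [ast.unparse(base) for base in node.bases],
        "decorators": [decorator_name(dec) for dec in node.decorator_list],
        # Only functions defined directly in the class body count as methods.
        "methods": [
            stmt.name
            for stmt in node.body
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef))
        ],
    }

SOURCE = """\
@dataclass
class Widget(Element, Generic[T]):
    def draw(self): ...
    async def load(self): ...
"""

info = describe_class(ast.parse(SOURCE).body[0])
print(info["bases"], info["decorators"], info["methods"])
```

In the real container, methods holds FunctionInfo objects rather than bare names; names are used here only to keep the sketch short.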
name: str
lineno: int
bases: list[str]
decorators: list[str]
methods: list[FunctionInfo]
__init__(name: str, lineno: int, bases: list[str] = <factory>, decorators: list[str] = <factory>, methods: list[FunctionInfo] = <factory>) → None
class VariableInfo(name: str, lineno: int, annotation_str: str | None = None, value_repr: str | None = None)

Bases: object

Metadata for a module-level variable assignment.

Covers both annotated assignments (name: Type = value) and plain assignments without annotations (name = value).

Parameters:
  • name (str)

  • lineno (int)

  • annotation_str (str or None) – Unparsed annotation expression, or None for unannotated assignments.

  • value_repr (str or None) – Unparsed right-hand side expression, or None when absent.

name: str
lineno: int
annotation_str: str | None = None
value_repr: str | None = None
__init__(name: str, lineno: int, annotation_str: str | None = None, value_repr: str | None = None) → None
class TypeVarInfo(name: str, lineno: int, kind: str, source_str: str)

Bases: object

Metadata for a TypeVar, ParamSpec, TypeVarTuple, TypeAlias, or NewType declaration.

Parameters:
  • name (str)

  • lineno (int)

  • kind (str) – One of "TypeVar", "ParamSpec", "TypeVarTuple", "TypeAlias", "NewType".

  • source_str (str) – Unparsed right-hand side expression (for TypeVar/NewType) or the aliased type expression (for TypeAlias).

name: str
lineno: int
kind: str
source_str: str
__init__(name: str, lineno: int, kind: str, source_str: str) → None

Variadic forwarding detection

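_harvest_function() walks every function body and records call sites where the function’s own **kwargs or *args is spread into another callable. Results are stored in two fields on FunctionInfo:

  • kwargs_forwarded_to: names of callables that receive **kwargs.

  • args_forwarded_to: names of callables that receive *args.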

These lists are consumed at emission time by resolve_function_params(). The scan runs for both top-level functions and class methods (including @classmethod bodies where the cls(...) forwarding pattern is detected as the special "cls" target).
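The forwarding scan can be approximated with ast.walk, as sketched below. The helper forwarded_targets is hypothetical, and unlike a production scanner it also looks inside nested functions; it is meant only to show which AST shapes correspond to f(**kwargs) and f(*args).

```python
import ast

def forwarded_targets(func: ast.FunctionDef):
    """Return (kwargs_targets, args_targets) for one function definition."""
    kwarg = func.args.kwarg.arg if func.args.kwarg else None
    vararg = func.args.vararg.arg if func.args.vararg else None
    kwargs_to, args_to = [], []
    for node in ast.walk(func):
        if not isinstance(node, ast.Call):
            continue
        callee = ast.unparse(node.func)
        # `f(**kwargs)` is a keyword whose arg is None and whose value is a Name.
        for kw in node.keywords:
            if kw.arg is None and isinstance(kw.value, ast.Name) and kw.value.id == kwarg:
                kwargs_to.append(callee)
        # `f(*args)` is a Starred node among the positional arguments.
        for arg in node.args:
            if (
                isinstance(arg, ast.Starred)
                and isinstance(arg.value, ast.Name)
                and arg.value.id == vararg
            ):
                args_to.append(callee)
    return kwargs_to, args_to

SOURCE = """\
def wrapper(*args, **kwargs):
    log(*args)
    return target(1, *args, **kwargs)
"""

kwargs_to, args_to = forwarded_targets(ast.parse(SOURCE).body[0])
print(kwargs_to, sorted(args_to))
```

Applied to a classmethod whose body is return cls(**kwargs), the same function reports "cls" as the keyword target, which corresponds to the special case mentioned above.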

Type alias detection

The harvester recognises type alias declarations in five forms:

  1. Explicit annotation (Name: TypeAlias = <rhs>)

  2. PEP 604 bare union (Name = int | float)

  3. Subscripted generic (Name = Union[str, int], Name = list[int])

  4. Known type name (Name = int, Name = str, Name = Any)

  5. Python 3.12+ PEP 695 (type Name = <rhs>, type Stack[T] = list[T])

All five forms are stored as TypeVarInfo with kind="TypeAlias" and emitted via generate_alias_stub().

Assignments where the RHS is an arbitrary user-defined name (MyAlias = SomeClass) are not promoted — the harvester cannot determine at parse time whether SomeClass is a type or a runtime value. Use MyAlias: TypeAlias = SomeClass for unambiguous declaration.
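The promotion rule for the assignment-based forms can be sketched as a purely syntactic check. Everything here is an assumption for illustration: looks_like_alias is a hypothetical helper, KNOWN_TYPE_NAMES is a small sample rather than the harvester's real table, and the PEP 695 statement is omitted because it needs Python 3.12+ to parse.

```python
import ast

# Illustrative subset only; the real list of known type names is not shown here.
KNOWN_TYPE_NAMES = {"int", "str", "float", "bool", "bytes", "Any"}

def looks_like_alias(stmt: ast.stmt) -> bool:
    """Heuristically decide whether a module-level statement declares a type alias."""
    if isinstance(stmt, ast.AnnAssign):
        # Name: TypeAlias = <rhs>, also accepting typing.TypeAlias.
        annotation = ast.unparse(stmt.annotation).split(".")[-1]
        return annotation == "TypeAlias" and stmt.value is not None
    if not isinstance(stmt, ast.Assign):
        return False
    rhs = stmt.value
    if isinstance(rhs, ast.BinOp) and isinstance(rhs.op, ast.BitOr):
        return True  # Name = int | float
    if isinstance(rhs, ast.Subscript):
        return True  # Name = Union[str, int] or Name = list[int]
    # Name = int is promoted; Name = SomeClass is not.
    return isinstance(rhs, ast.Name) and rhs.id in KNOWN_TYPE_NAMES

for line in ["A: TypeAlias = SomeClass", "B = int | float", "D = int", "E = SomeClass"]:
    print(line, looks_like_alias(ast.parse(line).body[0]))
```

A check this loose would also accept runtime expressions such as flags = A | B or first = items[0], so a practical implementation would need stricter tests on the operands.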

The # stubpy: ignore directive

A source file that begins with # stubpy: ignore (case-insensitive, before any code) will have skip_file set to True. The generator detects this and skips normal emission, writing only a minimal stub. Other comments and blank lines may surround the directive, provided it appears before the first code statement:

# Auto-generated file — do not stub.
# stubpy: ignore
...
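A directive check of this kind can be sketched with the standard tokenize module, which separates comments from code reliably. The helper has_ignore_directive and the exact whitespace tolerance of the regular expression are assumptions; whether a leading docstring counts as code in stubpy is also not stated, and this sketch treats it as code.

```python
import io
import re
import tokenize

# Assumed pattern: optional spaces after "#" and after the colon, any letter case.
DIRECTIVE = re.compile(r"#\s*stubpy:\s*ignore", re.IGNORECASE)
# Token types that may appear before the first code statement.
SKIPPABLE = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.ENCODING}

def has_ignore_directive(source: str) -> bool:
    """Return True if an ignore comment appears before any code token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    try:
        for tok in tokens:
            if tok.type == tokenize.COMMENT and DIRECTIVE.search(tok.string):
                return True
            if tok.type not in SKIPPABLE:
                return False  # reached code (or the end of the file) first
    except (tokenize.TokenError, SyntaxError):
        pass  # malformed source: treat as "no directive"
    return False

print(has_ignore_directive("# Auto-generated file\n# stubpy: ignore\nx = 1\n"))
print(has_ignore_directive("x = 1\n# stubpy: ignore\n"))
```

Stopping at the first non-comment token is what enforces the "before any code" condition: a directive placed after a statement is never reached.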