API Reference

This section is auto-generated from the source code using mkdocstrings.

Package

Pyntagma: PDF document processing package.

Document

Bases: BaseModel

A document consisting of multiple PDF files.

n_pages `cached` `property`

n_pages

Get the number of pages in the document.

Line

Bases: TextAnchor

words `cached` `property`

words

Extract words from the line.

chars `cached` `property`

chars

Extract chars from the line.

Page

Bases: BaseModel

A single page within a Document.

Indices are available both relative to the file (file_page_number) and relative to the full document (page_number).
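The relationship between the two indices can be sketched as below. This is only an illustration: it assumes 0-based indices and a known page count per file, and `locate_page` is a hypothetical helper, not part of Pyntagma.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_page(page_number: int, file_lengths: list[int]) -> tuple[int, int]:
    """Map a document-level page index to (file index, file-level page index).

    Assumes 0-based indices throughout; the library's convention may differ.
    """
    if page_number < 0 or page_number >= sum(file_lengths):
        raise IndexError(f"page_number {page_number} out of range")
    # Cumulative page counts mark where each file ends in the document.
    ends = list(accumulate(file_lengths))
    # The first file whose end lies beyond page_number contains the page.
    file_index = bisect_right(ends, page_number)
    start = ends[file_index - 1] if file_index > 0 else 0
    return file_index, page_number - start
```

With three files of 3, 2 and 4 pages, document page 4 would be the second page of the second file under these assumptions.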

im `property`

im

Get the image of the page.

plot_on

plot_on(items, colors, **kwargs)

Plot the given items on the page.

Word

Bases: TextAnchor

line `cached` `property`

line

Find the line that contains the word.

chars `property`

chars

Extract chars from the word.

Crop

Bases: BaseModel

Represents a rectangular crop on a PDF page and utilities to render it.

im `property`

im

Return the PIL image for this crop (respects padding/resolution).

buffer `property`

buffer

Get the cropped image as a BytesIO object.

bytes `property`

bytes

Get the cropped image as bytes.

save

save(path=Path('crop.png'))

Save the cropped image to a file.

HorizontalCoordinate

Bases: BaseModel

A horizontal coordinate (x) bound to a specific page.

relative `property`

relative

Return the x position normalized to page width (0..1).

page_number `property`

page_number

Return the document-level page index for this coordinate.

shift

shift(delta)

Return a new coordinate shifted by delta within page bounds.
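A minimal sketch of how `relative` and `shift` might behave, using a hypothetical stand-in class rather than the real model. It assumes that "within page bounds" means clamping to the page; the library may instead validate the result and raise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XCoordinate:
    """Hypothetical stand-in for a horizontal coordinate on one page."""

    value: float       # absolute x position in page units
    page_width: float  # width of the page the coordinate belongs to

    @property
    def relative(self) -> float:
        # Normalize to the 0..1 range by dividing by the page width.
        return self.value / self.page_width

    def shift(self, delta: float) -> "XCoordinate":
        # Return a new object; clamp so the result stays on the page
        # (clamping is an assumption made for this sketch).
        clamped = min(max(self.value + delta, 0.0), self.page_width)
        return XCoordinate(clamped, self.page_width)
```

Because the stand-in is frozen, `shift` never mutates the original coordinate, which matches the "return a new coordinate" wording above.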

HorizontalPosition

Bases: BaseModel

Horizontal span (x0..x1) on a page.

PdfAnchor

Bases: BaseModel

A base class for anchors in a PDF document.

position `property`

position

Get the position of the anchor.

horizontal `property`

horizontal

Get the horizontal position of the anchor.

vertical `property`

vertical

Get the vertical position of the anchor.

show `property`

show

Show the crop of this anchor.

binary_content `property`

binary_content

Binary content of this anchor's crop for multimodal prompts.

Uses the cropped PNG bytes of the anchor's position.

plot_on_page

plot_on_page(color='red')

Plot this anchor on the page.

Position

Bases: BaseModel

A rectangular region on a page, expressed by four coordinates.

vertical `property`

vertical

Return the vertical span of this position.

horizontal `property`

horizontal

Return the horizontal span of this position.

crop `property`

crop

Return a Crop representing this position on a single page.

show `property`

show

Display the position as a crop.

bbox `property`

bbox

Get the bounding box of the position.

plot_on_page

plot_on_page(color='red')

Plot this position on its page and return the page image object.

contains

contains(other)

Return True if other is fully inside this position.
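The containment test can be illustrated with a plain rectangle. The `Box` class below is a hypothetical stand-in; it assumes both regions are on the same page, a top-left origin (as in pdfplumber's `top`/`bottom` fields), and that touching edges count as inside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a rectangular region (x0, top, x1, bottom)."""

    x0: float
    top: float
    x1: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        # Every edge of `other` must lie on or inside the matching edge of self.
        return (
            self.x0 <= other.x0
            and self.top <= other.top
            and other.x1 <= self.x1
            and other.bottom <= self.bottom
        )
```

A region that only overlaps the outer box, for example one that extends past its right edge, is not contained under this definition.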

VerticalCoordinate

Bases: BaseModel

A vertical coordinate (y) bound to a specific page.

relative `property`

relative

Return the y position normalized to page height (0..1).

page_number `property`

page_number

Return the document-level page index for this coordinate.

shift

shift(delta)

Shift the vertical coordinate by a given delta.

VerticalPosition

Bases: BaseModel

Vertical span (top..bottom) on a page sequence.

silent_pdfplumber

silent_pdfplumber(path_or_fp, **kwargs)

Open a pdfplumber PDF while suppressing output within the block.

get_position

get_position(item)

Return a Position from an item or raise if unavailable.

left_position_join

left_position_join(x, y, after=True, uniquely=True, keep_empty_x=False, max_distance=None)

Bind two lists together based on their vertical positions.

position_union

position_union(items)

Return the minimal Position enclosing all items' positions.
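The idea of a minimal enclosing region can be sketched on plain bounding-box tuples. `bbox_union` is a hypothetical helper that assumes all boxes are on the same page and use the `(x0, top, x1, bottom)` order; how Pyntagma handles items on different pages is not covered here.

```python
def bbox_union(boxes):
    """Return the smallest (x0, top, x1, bottom) box enclosing all boxes."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("bbox_union() needs at least one box")
    # Transpose the boxes into four columns, one per edge.
    x0s, tops, x1s, bottoms = zip(*boxes)
    # The union takes the outermost value of each edge.
    return (min(x0s), min(tops), max(x1s), max(bottoms))
```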

Modules

Document Model

High-level document model built on top of pdfplumber.

Provides Document, Page and text primitives (Line, Word, Char) with geometric positions, plus helpers to navigate between them.

Document

Bases: BaseModel

A document consisting of multiple PDF files.

n_pages `cached` `property`

n_pages

Get the number of pages in the document.
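The `cached` `property` labels suggest the value is computed once per instance and then reused. A minimal sketch of that pattern with `functools.cached_property` follows; `MultiFileDocument` and its `file_lengths` mapping are hypothetical and only stand in for the real model.

```python
from functools import cached_property


class MultiFileDocument:
    """Hypothetical stand-in for a document made of several PDF files."""

    def __init__(self, file_lengths: dict[str, int]):
        # Maps a file name to its page count (supplied directly here).
        self.file_lengths = file_lengths
        self.computations = 0

    @cached_property
    def n_pages(self) -> int:
        # Runs once; later accesses read the value stored on the instance.
        self.computations += 1
        return sum(self.file_lengths.values())
```

The `computations` counter exists only to make the caching visible: it stays at 1 no matter how often `n_pages` is read.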

Page

Bases: BaseModel

A single page within a Document.

Indices are available both relative to the file (file_page_number) and relative to the full document (page_number).

im `property`

im

Get the image of the page.

plot_on

plot_on(items, colors, **kwargs)

Plot the given items on the page.

TextAnchor

Bases: PdfAnchor

Base class for textual anchors with absolute coordinates on a page.

Word

Bases: TextAnchor

line `cached` `property`

line

Find the line that contains the word.

chars `property`

chars

Extract chars from the word.

Char

Bases: TextAnchor

word `property`

word

Find the word that contains the char.

line `property`

line

Find the line that contains the char.

Line

Bases: TextAnchor

words `cached` `property`

words

Extract words from the line.

chars `cached` `property`

chars

Extract chars from the line.

get_filelength `cached`

get_filelength(file)

Return the number of pages for a PDF file.

words_of_line

words_of_line(line)

Extract words from a line.

line_of_word

line_of_word(word)

Find the line that contains the word.

chars_of_word

chars_of_word(word)

Extract chars belonging to a given Word.

word_of_char

word_of_char(char)

Return the Word that contains the given Char.

Positions & Coordinates

Coordinate and position primitives for PDF pages.

Defines horizontal/vertical coordinates and composite positions with helpers for arithmetic, comparison and visualization.

VerticalCoordinate

Bases: BaseModel

A vertical coordinate (y) bound to a specific page.

relative `property`

relative

Return the y position normalized to page height (0..1).

page_number `property`

page_number

Return the document-level page index for this coordinate.

shift

shift(delta)

Shift the vertical coordinate by a given delta.

HorizontalCoordinate

Bases: BaseModel

A horizontal coordinate (x) bound to a specific page.

relative `property`

relative

Return the x position normalized to page width (0..1).

page_number `property`

page_number

Return the document-level page index for this coordinate.

shift

shift(delta)

Return a new coordinate shifted by delta within page bounds.

VerticalPosition

Bases: BaseModel

Vertical span (top..bottom) on a page sequence.

HorizontalPosition

Bases: BaseModel

Horizontal span (x0..x1) on a page.

Position

Bases: BaseModel

A rectangular region on a page, expressed by four coordinates.

vertical `property`

vertical

Return the vertical span of this position.

horizontal `property`

horizontal

Return the horizontal span of this position.

crop `property`

crop

Return a Crop representing this position on a single page.

show `property`

show

Display the position as a crop.

bbox `property`

bbox

Get the bounding box of the position.

plot_on_page

plot_on_page(color='red')

Plot this position on its page and return the page image object.

contains

contains(other)

Return True if other is fully inside this position.

PdfAnchor

Bases: BaseModel

A base class for anchors in a PDF document.

position `property`

position

Get the position of the anchor.

horizontal `property`

horizontal

Get the horizontal position of the anchor.

vertical `property`

vertical

Get the vertical position of the anchor.

show `property`

show

Show the crop of this anchor.

binary_content `property`

binary_content

Binary content of this anchor's crop for multimodal prompts.

Uses the cropped PNG bytes of the anchor's position.

plot_on_page

plot_on_page(color='red')

Plot this anchor on the page.

get_position

get_position(item)

Return a Position from an item or raise if unavailable.

position_union

position_union(items)

Return the minimal Position enclosing all items' positions.

left_position_join

left_position_join(x, y, after=True, uniquely=True, keep_empty_x=False, max_distance=None)

Bind two lists together based on their vertical positions.
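One plausible reading of a vertical-position join is sketched below: each item is attached to the closest header at or above it. This is an assumption for illustration, not the library's algorithm. `join_below` is hypothetical, works on plain `(top, label)` tuples, and models only a `max_distance` cut-off; the `after`, `uniquely` and `keep_empty_x` options are not reproduced.

```python
from bisect import bisect_right


def join_below(headers, items, max_distance=None):
    """Attach each item to the closest header located at or above it.

    headers and items are lists of (top, label) tuples; a smaller top means
    higher on the page. Returns {header_label: [item_label, ...]}.
    """
    headers = sorted(headers)
    header_tops = [top for top, _ in headers]
    joined = {label: [] for _, label in headers}
    for top, label in sorted(items):
        # Index of the last header whose top is <= the item's top.
        i = bisect_right(header_tops, top) - 1
        if i < 0:
            continue  # the item sits above every header
        header_top, header_label = headers[i]
        if max_distance is not None and top - header_top > max_distance:
            continue  # too far below the header to count as a match
        joined[header_label].append(label)
    return joined
```

In this sketch, headers without any matching item keep an empty list, and items above the first header are dropped.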

PDF IO Helpers

Thin wrappers around pdfplumber for quiet PDF IO and cropping.

Includes a silent context manager to suppress stdout/stderr and a Crop utility for extracting regions from a page as images/bytes.

Crop

Bases: BaseModel

Represents a rectangular crop on a PDF page and utilities to render it.

im `property`

im

Return the PIL image for this crop (respects padding/resolution).
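How padding and resolution could affect the rendered region is sketched below. `crop_pixel_box` is a hypothetical helper; it assumes the padding is given in PDF points, the box is clamped to the page, and the resolution is in dots per inch. The actual `Crop` fields and defaults may differ.

```python
def crop_pixel_box(bbox, page_size, padding=0.0, resolution=72):
    """Convert a bbox in PDF points to a padded pixel box at a given DPI."""
    x0, top, x1, bottom = bbox
    width, height = page_size
    # Grow the box by the padding, but keep it on the page.
    x0, top = max(x0 - padding, 0.0), max(top - padding, 0.0)
    x1, bottom = min(x1 + padding, width), min(bottom + padding, height)
    # PDF user space uses 72 points per inch, so pixels = points * dpi / 72.
    scale = resolution / 72
    return tuple(round(v * scale) for v in (x0, top, x1, bottom))
```

Doubling the resolution doubles every pixel coordinate, while the padding only changes how much of the page surrounds the original box.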

buffer `property`

buffer

Get the cropped image as a BytesIO object.

bytes `property`

bytes

Get the cropped image as bytes.

save

save(path=Path('crop.png'))

Save the cropped image to a file.

silent

silent()

Temporarily silence stdout/stderr and lower logging during a block.
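A context manager of this kind can be built from the standard library alone. The sketch below is not the package's implementation: `quiet` is a hypothetical name, and it assumes that "lower logging" means raising the root logger's threshold so fewer records pass. It only redirects Python-level `sys.stdout` and `sys.stderr`, so output written directly by native code would still appear.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def quiet(level: int = logging.ERROR):
    """Suppress stdout/stderr and raise the root log level inside the block."""
    root = logging.getLogger()
    previous_level = root.level
    sink = io.StringIO()  # swallowed output ends up here and is discarded
    root.setLevel(level)
    try:
        with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
            yield
    finally:
        # Restore the log level even if the block raised.
        root.setLevel(previous_level)
```

Usage would look like `with quiet(): noisy_call()`, where `noisy_call` stands for any function that prints warnings you want to hide.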

silent_pdfplumber

silent_pdfplumber(path_or_fp, **kwargs)

Open a pdfplumber PDF while suppressing output within the block.

Agent Utilities

Agent utilities for reasoning over PDF anchors.

This module wires PydanticAI's Agent to work with Pyntagma PDF anchors, optionally attaching cropped image bytes to prompts for multimodal models.

DocumentAgent

Bases: BaseModel

Small wrapper around a PydanticAI Agent bound to a PDF anchor.

Attaches an anchor crop as BinaryContent to the first user prompt when include_image=True (default on first run), enabling multimodal context.
Allows specifying an output_type, which is wrapped with NativeOutput for certain models (e.g. Gemma on Ollama) to keep parsing consistent.

anchor_content `property`

anchor_content

Return the anchor's BinaryContent (PNG bytes of its crop).

model_post_init

model_post_init(_)

Create the underlying PydanticAI agent after model init.

run_sync

run_sync(user_prompt, anchor=None, output_type=None, include_image=None, **kwargs)

Run the agent synchronously with optional image context.

If include_image is True, append the anchor crop as BinaryContent to the prompt. When anchor is provided, that anchor is used; otherwise self.anchor is used.
user_prompt can be a string or a list of content items; the image is appended appropriately.
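The prompt handling described above can be sketched without any agent framework. `ImagePart` and `build_prompt` are hypothetical stand-ins, not Pyntagma or PydanticAI names; the sketch only shows how a string or a list might be normalized before the image is appended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagePart:
    """Hypothetical stand-in for a binary image attachment."""

    data: bytes
    media_type: str = "image/png"


def build_prompt(user_prompt, image=None):
    """Return the prompt as a list, with the image appended when given."""
    # A bare string becomes a one-element list; a list is copied, not mutated.
    parts = [user_prompt] if isinstance(user_prompt, str) else list(user_prompt)
    if image is not None:
        parts.append(image)
    return parts
```

Copying the incoming list keeps the caller's prompt unchanged, so the same list can be reused across runs with or without the image.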
