Pyntagma
Welcome to the documentation for the Pyntagma package!
Pyntagma is a Python library for building and managing complex data extraction pipelines. Its name is derived from the Greek word 'syntagma', meaning 'composition', which reflects the package's focus on semi-structured documents.
Pyntagma aims to bring modern document-processing tools together into a single, standardized, and convenient library. It lets practitioners and researchers compose precise, testable rules to extract complex data from large archives.
Installation
Install Pyntagma using:
pip install pyntagma
Features
- Structured PDF parsing with clear geometry
- Composable algebra on positions and regions
- Bidirectional navigation (pages ⇄ lines ⇄ words ⇄ chars)
- Multimodal AI integration for crop-aware prompts
Get started with the Overview. Then see Concepts for how the algebra on positions and regions differs from bidirectional navigation, and AI Tools for model-assisted workflows.