Sparrow is an open-source Python library and API system for extracting structured data from documents using Vision Language Models (VLMs). It processes invoices, receipts, forms, bank statements, tables, and other document types and returns the results as structured JSON. Beyond document extraction, Sparrow supports custom text instruction processing for other AI tasks, including data analysis, summarization, decision making, and general text processing workflows.
Sparrow extracts text and data from images (PNG, JPG) and multi-page PDFs, validating the output against a JSON schema so that the extracted data matches the expected structure. The system can process complex documents, including tables, forms, and multi-page financial reports. It also handles custom instruction-based requests, such as arithmetic operations, text analysis, and content summarization, without requiring document input.
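To illustrate what schema validation of extracted data can look like, here is a minimal sketch using only the Python standard library. It is not Sparrow's implementation or API: the schema format (`INVOICE_SCHEMA`, a mapping from field name to Python type) and the function `validate_extraction` are hypothetical names chosen for this example.

```python
import json

# Hypothetical schema: field name -> expected Python type.
# This format is an assumption for illustration, not Sparrow's schema syntax.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "total": float,
    "line_items": list,
}


def validate_extraction(raw_json: str, schema: dict) -> dict:
    """Parse model output and check that each schema field exists with the expected type."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    errors = []
    for field, expected_type in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
            continue
        value = data[field]
        # JSON does not distinguish ints from floats, so accept ints for float fields.
        if expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            continue
        if not isinstance(value, expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, got {type(value).__name__}"
            )

    if errors:
        raise ValueError("; ".join(errors))
    return data
```

A caller would pass the raw JSON string produced by the model and either receive the parsed dictionary or a `ValueError` listing every field that is missing or has the wrong type. A production system would more likely rely on a full JSON Schema validator than on a hand-rolled type check like this one.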
The platform supports multiple inference backends, including MLX for Apple Silicon, Ollama, vLLM, PyTorch, and Hugging Face Cloud GPU. It provides RESTful API endpoints for integration, an interactive web interface for document upload and processing, a command-line interface for batch processing, and a built-in analytics dashboard with workflow monitoring.
Sparrow features a pluggable pipeline system: Sparrow Parse for vision processing, Sparrow Instructor for text instruction processing, and Sparrow Agents for complex workflows. The system uses schema-based extraction with automatic validation and supports on-device processing for data privacy. Agent-based workflow orchestration includes visual monitoring powered by Prefect.
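The idea of a pluggable pipeline system can be sketched with a simple registry that maps a pipeline name to a handler. This is an illustrative pattern only, not Sparrow's actual code: `PIPELINES`, `register`, `dispatch`, and the handler functions are hypothetical, and the handlers return placeholder dictionaries instead of calling a model.

```python
from typing import Callable, Dict

# Hypothetical registry: pipeline name -> handler function.
PIPELINES: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Decorator that adds a handler to the registry under the given name."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        PIPELINES[name] = func
        return func
    return decorator


@register("parse")
def run_parse(request: dict) -> dict:
    # Placeholder for vision-based document extraction.
    return {"pipeline": "parse", "document": request.get("document")}


@register("instructor")
def run_instructor(request: dict) -> dict:
    # Placeholder for text instruction processing (no document required).
    return {"pipeline": "instructor", "instruction": request.get("instruction")}


def dispatch(name: str, request: dict) -> dict:
    """Look up the named pipeline and run it, failing clearly on unknown names."""
    try:
        handler = PIPELINES[name]
    except KeyError:
        raise ValueError(f"unknown pipeline: {name}") from None
    return handler(request)
```

With this shape, adding a new pipeline means registering one more function; the dispatching code does not change. For example, `dispatch("parse", {"document": "invoice.pdf"})` routes the request to the vision handler, while `dispatch("instructor", {"instruction": "Summarize this text"})` routes it to the text handler.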
Common use cases include automated invoice and receipt processing, financial document analysis, form data extraction, table structure recognition, custom AI instruction processing, text analysis and summarization, and multi-step document processing workflows. Sparrow runs on Python 3.10+ and is available under both open-source (GPL 3.0) and commercial licenses.