
v4.0 API (GA) – A unified, stable REST API with updated SDKs across Python, .NET, Java, etc., improving consistency and performance.
Improved Read & Layout Models – Better OCR accuracy, hierarchical document structure, and support for generating searchable PDFs.
Enhanced Prebuilt Models – Stronger extraction for invoices, receipts, bank statements, tax forms, pay stubs, and other business documents.
Advanced Classification & Model Compose – Automatically classify documents, split multi-document PDFs, and route them to the right extraction model.
Batch Processing & Containers – Batch APIs now support all models, plus containerized Read/Layout models for on-prem or hybrid deployments. Review collected by and hosted on G2.com.
Higher accuracy on complex layouts
It still struggles with heavily nested tables, multi-column PDFs, handwritten notes mixed with printed text, and low-quality scans.
Better multilingual & regional support
Accuracy on Indian regional languages, mixed-language documents, and non-Latin scripts could be improved, especially within tables.
Easier custom model training
Training custom models still requires careful labeling and data preparation; more low-code or auto-labeling capabilities would help.




