CATALYTIC OCR — DOCUMENT INTELLIGENCE PIPELINE LIVE

168 PDFs. Zero Failures. One Pipeline.

Catalytic OCR transforms scanned vendor invoices into structured, queryable data. It is built for real-world document reconciliation, not demos: production OCR that runs on local compute, with no cloud APIs and no per-page fees.

Documents Processed: 168 (PDF invoices, 23 vendors)
OCR Success Rate: 100% (0 failures, 300 dpi, tesseract)
Data Extracted: 3,225 records across 4 tables
Pipeline
01 Scan vendor invoice PDFs (scanned images, no text layer) · poppler-utils
02 Render pages to PNG at 300 dpi · pdftoppm
03 OCR each page to structured text · tesseract 5.5.2
04 Parse invoice headers: vendor, date, invoice #, PO #, total · Python stdlib
05 Extract line items: part #, description, qty, cost · regex + heuristics
06 Load into SQLite with Tekmetric CSV cross-reference · SQLite 3
07 Reconcile vendor invoices vs system-of-record purchases · SQL views
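
The steps above can be sketched in Python using only the standard library plus the pdftoppm and tesseract command-line tools. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the regex patterns, the invoice text layout, the table names (ocr_invoices, sor_purchases), the view name (reconciliation), and the half-cent matching tolerance are all hypothetical. The real Tekmetric CSV columns are not described here, so a bare (PO #, total) pair stands in for them, and vendor detection is omitted.

```python
import re
import sqlite3
import subprocess
from pathlib import Path


def ocr_pdf(pdf_path: Path, work_dir: Path) -> str:
    """Steps 02-03: render each page to PNG at 300 dpi, then OCR it."""
    prefix = work_dir / pdf_path.stem
    subprocess.run(
        ["pdftoppm", "-r", "300", "-png", str(pdf_path), str(prefix)], check=True
    )
    pages = sorted(work_dir.glob(f"{pdf_path.stem}-*.png"))
    return "\n".join(
        subprocess.run(
            ["tesseract", str(page), "stdout"],
            check=True, capture_output=True, text=True,
        ).stdout
        for page in pages
    )


# Step 04: hypothetical header patterns; real invoices vary by vendor.
HEADER_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*(?:#|No\.?|Number)\s*:?\s*(\S+)", re.I),
    "po_no": re.compile(r"\bP\.?O\.?\s*(?:#|No\.?|Number)\s*:?\s*(\S+)", re.I),
    "date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b"),
    "total": re.compile(r"\bTotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}

# Step 05: assumed line layout "<part#> <description> <qty> <cost>".
LINE_ITEM = re.compile(r"(\S+)\s+(.+?)\s+(\d+)\s+\$?([\d,]+\.\d{2})")


def parse_header(text: str) -> dict:
    header = {}
    for field, pattern in HEADER_PATTERNS.items():
        match = pattern.search(text)
        header[field] = match.group(1) if match else None
    if header["total"] is not None:
        header["total"] = float(header["total"].replace(",", ""))
    return header


def parse_line_items(text: str) -> list:
    items = []
    for line in text.splitlines():
        match = LINE_ITEM.fullmatch(line.strip())
        if match:
            part, desc, qty, cost = match.groups()
            items.append((part, desc, int(qty), float(cost.replace(",", ""))))
    return items


# Steps 06-07: hypothetical schema; the view flags each OCR'd invoice.
SCHEMA = """
CREATE TABLE ocr_invoices (
    invoice_no TEXT PRIMARY KEY, po_no TEXT, invoice_date TEXT, total REAL);
CREATE TABLE sor_purchases (po_no TEXT PRIMARY KEY, total REAL);
CREATE VIEW reconciliation AS
SELECT i.invoice_no,
       CASE WHEN p.po_no IS NULL THEN 'missing_in_sor'
            WHEN ABS(i.total - p.total) < 0.005 THEN 'match'
            ELSE 'mismatch' END AS status
FROM ocr_invoices AS i
LEFT JOIN sor_purchases AS p ON p.po_no = i.po_no;
"""


def reconcile(headers: list, purchases: list) -> list:
    """Load parsed headers and system-of-record rows, then read the view."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO ocr_invoices VALUES (:invoice_no, :po_no, :date, :total)",
        headers,
    )
    con.executemany("INSERT INTO sor_purchases VALUES (?, ?)", purchases)
    rows = con.execute(
        "SELECT invoice_no, status FROM reconciliation ORDER BY invoice_no"
    ).fetchall()
    con.close()
    return rows
```

In this sketch, text returned by ocr_pdf would be passed to parse_header and parse_line_items, and the parsed headers handed to reconcile together with rows read from the system-of-record export. Keeping the comparison in a SQL view means the reconciliation logic can be queried and adjusted without re-running OCR.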
Publications
Temple Parts Reconciliation — March 2026 LIVE
6 tabs · 20 vendors · $50K gross · 50.8% GP · 146 OCR'd PDFs · Sieve format