168 PDFs. Zero Failures. One Pipeline.
Catalytic OCR transforms scanned vendor invoices into structured, queryable data. Built for real-world document reconciliation, not demos. It runs in production on local compute, with no cloud APIs and no per-page fees.
Step | Task | Tool
01 | Scan vendor invoice PDFs (scanned images, no text layer) | poppler-utils
02 | Render pages to PNG at 300 dpi | pdftoppm
03 | OCR each page to structured text | tesseract 5.5.2
04 | Parse invoice headers: vendor, date, invoice#, PO#, total | Python stdlib
05 | Extract line items: part#, description, qty, cost | regex + heuristics
06 | Load into SQLite with Tekmetric CSV cross-reference | SQLite 3
07 | Reconcile vendor invoices vs system-of-record purchases | SQL views
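
Steps 04 through 07 can be sketched in a few dozen lines of stdlib Python. This is a minimal illustration, not the pipeline's actual code: the sample OCR text, the regex patterns, the table and column names (`vendor_invoices`, `system_purchases`), and the `reconciliation` view are all assumptions made for the example, and the Tekmetric CSV export is stood in for by plain `(po_no, total_cents)` tuples.

```python
import re
import sqlite3
from decimal import Decimal

# Hypothetical OCR text for one invoice page; real vendor layouts will differ.
SAMPLE_TEXT = """\
ACME AUTO PARTS
Invoice # INV-1001   Date: 03/14/2024
PO# 5521
BRK-204 Front brake pad set 2 45.50
FLT-118 Oil filter 4 6.25
Total: $116.00
"""

# Assumed header patterns, one per field named in step 04.
HEADER_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*#\s*(\S+)", re.I),
    "date": re.compile(r"Date:\s*(\d{2}/\d{2}/\d{4})", re.I),
    "po_no": re.compile(r"PO\s*#\s*(\S+)", re.I),
    "total": re.compile(r"Total:\s*\$?([\d,]+\.\d{2})", re.I),
}

# Assumed line-item layout for step 05: part#, description, qty, unit cost.
LINE_ITEM = re.compile(
    r"^(?P<part>[A-Z]{2,4}-\d+)\s+(?P<desc>.+?)\s+(?P<qty>\d+)\s+(?P<cost>\d+\.\d{2})$",
    re.M,
)


def to_cents(amount: str) -> int:
    """Convert a string such as '1,116.00' to integer cents."""
    return int(Decimal(amount.replace(",", "")) * 100)


def parse_invoice(text: str):
    """Return (header dict, list of line-item dicts) for one OCR'd page."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Heuristic: treat the first non-empty line as the vendor name.
    header = {"vendor": lines[0] if lines else None}
    for field, pattern in HEADER_PATTERNS.items():
        match = pattern.search(text)
        header[field] = match.group(1) if match else None
    items = [
        {
            "part_no": m["part"],
            "description": m["desc"],
            "qty": int(m["qty"]),
            "cost_cents": to_cents(m["cost"]),
        }
        for m in LINE_ITEM.finditer(text)
    ]
    return header, items


def reconcile(headers, purchases):
    """Load invoices and system purchases into SQLite and query a view.

    `purchases` is a list of (po_no, total_cents) tuples standing in for
    rows read from the system-of-record CSV export.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vendor_invoices (invoice_no TEXT, po_no TEXT, total_cents INTEGER);
        CREATE TABLE system_purchases (po_no TEXT, total_cents INTEGER);
        CREATE VIEW reconciliation AS
        SELECT v.invoice_no, v.po_no,
               CASE WHEN s.po_no IS NULL THEN 'missing'
                    WHEN v.total_cents = s.total_cents THEN 'match'
                    ELSE 'mismatch' END AS status
        FROM vendor_invoices v
        LEFT JOIN system_purchases s ON s.po_no = v.po_no;
        """
    )
    conn.executemany(
        "INSERT INTO vendor_invoices VALUES (?, ?, ?)",
        [(h["invoice_no"], h["po_no"], to_cents(h["total"])) for h in headers],
    )
    conn.executemany("INSERT INTO system_purchases VALUES (?, ?)", purchases)
    rows = conn.execute(
        "SELECT invoice_no, po_no, status FROM reconciliation ORDER BY invoice_no"
    ).fetchall()
    conn.close()
    return rows


header, items = parse_invoice(SAMPLE_TEXT)
rows = reconcile([header], [("5521", 11600)])
```

Totals are stored as integer cents so the view compares exact values rather than floats. The `LEFT JOIN` keeps every vendor invoice in the result, so an invoice with no matching purchase order surfaces as `missing` instead of silently dropping out.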