See the release notes for details on the latest changes. OCRmyPDF uses Tesseract for OCR, and relies on its language packs. For Linux users, you can often find packages that provide language packs: ...
This project is a Python pipeline that uses Optical Character Recognition (OCR) to extract text and structured data from scanned PDF documents. It processes each page, cleans the recognized text, ...