Document Processing Workflow Cheatsheet
Performing OCR on Images and/or PDFs
Combine Images and Convert to PDF
combine image-01.png image-02.png output.pdf
OCR a PDF File
ocrmypdf -l eng+ben input.pdf output.pdf
-l: Language flag, multiple may be set.
--force-ocr: May be necessary for wrongly-encoded PDF source files, as is often the case with Bengali documents.
A number of image processing commands may also be used:
--deskew: will correct pages were scanned at a skewed angle by rotating them back into place.
--remove-background: attempts to detect and remove a noisy background from grayscale or color images. Monochrome images are ignored. This should not be used on documents that contain color photos as it may remove them.
Click here for full documentation on ocrmypdf.