OCR Preparation Guide
- 14 Aug 2025
This guide helps users prepare PDF and Word documents for OCR (Optical Character Recognition) in CobbleStone Software. Following these steps improves recognition accuracy and reduces errors during processing.
PDF Document Checklist for OCR
- Scan at High Resolution - Use 300 DPI for best text recognition.
- Use Clear, Straight Scans - Avoid skewed, rotated, or blurry pages.
- Ensure the PDF Contains Images - OCR works on image-based (scanned) PDFs, not on text-based or digitally created ones.
- Avoid Password Protection - Remove any encryption or password from the PDF.
- Use Standard Image Formats - Embedded images should be JPEG, PNG, or TIFF.
- Keep File Size Reasonable - Large files may slow OCR processing or cause it to fail.
- Avoid Annotations or Overlays - Remove sticky notes, highlights, or stamps that may interfere with text detection.
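Some of the PDF checks above can be roughed out before upload. The following is a minimal sketch, not part of CobbleStone Software: the function names, the 25 MB size limit, and the byte-level heuristics are all assumptions made for illustration. It looks for plain-text markers in the raw file, so a PDF that stores its objects in compressed streams may need a real PDF library to inspect reliably.

```python
import re


def scan_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Effective scan resolution: pixel width divided by physical page width."""
    return pixel_width / page_width_inches


def preflight_pdf(data: bytes, max_bytes: int = 25 * 1024 * 1024) -> list[str]:
    """Return a list of warnings for a PDF supplied as raw bytes.

    max_bytes is a hypothetical limit chosen for this example; the guide
    does not state an actual size limit.
    """
    warnings = []
    if not data.startswith(b"%PDF-"):
        return ["not a PDF (missing %PDF- header)"]
    if len(data) > max_bytes:
        warnings.append("file exceeds the assumed size limit")
    # An /Encrypt entry in the trailer signals password protection or encryption.
    if re.search(rb"/Encrypt\b", data):
        warnings.append("encryption found; remove password protection")
    # Scanned pages are stored as image XObjects.
    if not re.search(rb"/Subtype\s*/Image\b", data):
        warnings.append("no image objects found; PDF may not be a scan")
    # Sticky notes, highlights, and stamps are stored as page annotations.
    if re.search(rb"/Annots\b", data):
        warnings.append("annotations found; consider removing them")
    return warnings
```

For example, a US Letter page (8.5 inches wide) scanned to 2550 pixels across gives `scan_dpi(2550, 8.5)`, which is 300 DPI. An empty list from `preflight_pdf` only means that none of these heuristics fired, not that OCR is guaranteed to succeed.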
Word Document Checklist for OCR
- Use Common Fonts - Stick to fonts like Arial, Times New Roman, or Calibri.
- Embed Fonts in the Document - In Word: File > Options > Save > Embed fonts in the file.
- Avoid Complex Layouts - Tables, columns, and floating images can confuse OCR engines.
- Check for Missing Text or Placeholders - Ensure all fields are filled and content is visible before conversion.
- Avoid Watermarks and Backgrounds - These can interfere with text recognition.
- Use High Contrast - Black text on a white background is ideal.
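The font items in the Word checklist can be spot-checked in a script. The sketch below assumes a .docx file (a ZIP archive of XML parts) and reads two parts defined by the Office Open XML format: `word/fontTable.xml` for declared font names and `word/settings.xml` for the `embedTrueTypeFonts` setting. The function name `inspect_docx` and the `COMMON_FONTS` set are hypothetical, and the font table may list fonts that the document body never uses, so treat the result as a hint rather than a verdict.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx XML parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Fonts named in the checklist; extend as needed (assumption for this sketch).
COMMON_FONTS = {"Arial", "Times New Roman", "Calibri"}


def inspect_docx(docx_file) -> dict:
    """Report declared fonts outside COMMON_FONTS and whether font embedding is on.

    docx_file may be a path or a binary file-like object.
    """
    fonts = set()
    embedded = False
    with zipfile.ZipFile(docx_file) as zf:
        names = set(zf.namelist())
        if "word/fontTable.xml" in names:
            root = ET.fromstring(zf.read("word/fontTable.xml"))
            for font in root.iter(f"{{{W_NS}}}font"):
                name = font.get(f"{{{W_NS}}}name")
                if name:
                    fonts.add(name)
        if "word/settings.xml" in names:
            root = ET.fromstring(zf.read("word/settings.xml"))
            flag = root.find(f"{{{W_NS}}}embedTrueTypeFonts")
            if flag is not None:
                # A missing w:val attribute means the setting is enabled.
                val = flag.get(f"{{{W_NS}}}val", "true")
                embedded = val not in ("0", "false", "off")
    return {
        "uncommon_fonts": sorted(fonts - COMMON_FONTS),
        "fonts_embedded": embedded,
    }
```

Calling `inspect_docx("contract.docx")` (the file name is a placeholder) returns a dictionary such as `{"uncommon_fonts": [...], "fonts_embedded": True}`. Any entry in `uncommon_fonts` is a candidate for replacement with one of the common fonts.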