Running OCR Validation

Using OCR Validation to improve the accuracy of your output file

Written by PDF2XL Support
Updated over a week ago

When you use OCR, the engine may not capture every character accurately. Bleeding, broken, or speckled text can affect how the OCR engine matches characters to its pre-programmed character set.


For this reason, we have the Validation option.

When you run validation, the OCR engine flags any word containing a character that it doesn't recognize and asks you to verify it. If the word is correct, click the "Accept" button; otherwise, make the necessary adjustments in the "Suggested word" field.

In this example, the word in the document is VENDOR (1). It appears as broken text in the Suspect Word section (2), so the engine recognizes it as VFNPOR (3).


To correct it, type the proper word (VENDOR) in the Suggested word field and click "Accept" to save the change. Validation then moves automatically to the next suspect word.
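For readers who like to see the idea in code, the accept-or-correct loop can be sketched roughly as follows. This is only an illustration of the concept, not how PDF2XL works internally: the `Word` class, the `confidence` score, the `threshold` value, and the `validate` function are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    text: str          # what the OCR engine recognized
    confidence: float  # hypothetical recognition score from 0.0 to 1.0


def validate(words: List[Word],
             review: Callable[[str], str],
             threshold: float = 0.8) -> List[str]:
    """Pass each suspect word to `review`; keep all other words unchanged."""
    result = []
    for word in words:
        if word.confidence < threshold:
            # Suspect word: the reviewer accepts it or returns a correction.
            result.append(review(word.text))
        else:
            result.append(word.text)
    return result


# Stand-in for the person typing into the "Suggested word" field.
corrections = {"VFNPOR": "VENDOR"}
words = [Word("VFNPOR", 0.42), Word("NAME", 0.97)]
validated = validate(words, lambda text: corrections.get(text, text))
```

Here only VFNPOR falls below the assumed threshold, so it is the only word sent for review, and the reviewer replaces it with VENDOR.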

If your document is poorly scanned or contains a lot of bleeding or speckled text, expect the OCR engine to flag most of the words for validation.

For the easiest conversion, we recommend scanning at 300 DPI, in Black & White if possible.
