The PDF2XL Business and Enterprise editions both come equipped with an OCR engine that can convert scanned files.
*Note that the Business plan includes a single basic OCR engine, while the Enterprise plan offers the same basic engine plus the option of selecting a more advanced OCR engine with additional features.
For the best results, we recommend scanning at 300 DPI in Black & White (if possible). Note that the accuracy of the conversion depends on the quality of your document.
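To give a sense of what a 300 DPI scan means in practice, the short calculation below estimates the pixel dimensions and uncompressed size of a single page. The page size (US Letter, 8.5 x 11 inches) and the function name are assumptions chosen for illustration; they are not PDF2XL settings.

```python
def scan_dimensions(width_in, height_in, dpi, bits_per_pixel=1):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page.

    Ignores compression and any per-row padding a real image format may add.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    uncompressed_bytes = (total_bits + 7) // 8  # round up to whole bytes
    return width_px, height_px, uncompressed_bytes


# Assumed example: US Letter page, 300 DPI, black & white (1 bit per pixel).
print(scan_dimensions(8.5, 11, 300))  # (2550, 3300, 1051875)
```

Passing `bits_per_pixel=8` shows why Black & White is suggested where possible: the same page in 8-bit grayscale holds eight times as much raw data.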
Converting a Scanned PDF File
- As usual, open your PDF using one of the "Open File" options.
- If the document has not been previously run through an OCR engine, you will see a prompt notifying you that this is a scanned PDF and OCR Mode is going into effect. Just click "OK" and let it do its thing.
- If your page is rotated, use the rotation buttons on the Source pane toolbar.
- Create your layout, splitting and merging columns where necessary.
- When you are ready to convert, make sure you have selected the number of pages you want to convert in the Convert Pages field of the "Convert" menu. By default, this is set to convert all the pages in your document.
- Click the "Convert Document" button.
- You will see a prompt asking you to validate your document. This is optional, but it allows you to correct any errors before you convert it.
- When you agree to validate, another prompt will appear. This one shows the words that the OCR could not recognize with certainty. If the word in the "Suggested word" field is correct, click "Accept". If it is incorrect, type in the correct word and then click "Accept". You can click "Done" at any time to close the validation prompt. Note that a poorly scanned document can result in the majority of the data being flagged as suspect.
- Once validation is complete, the application will complete the conversion.
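The validation step above can be pictured as a pass over the words the OCR is unsure about. The sketch below is only an illustration of that idea: the per-word confidence scores, the 0.90 cutoff, and every name in it are hypothetical, and it does not describe how PDF2XL works internally.

```python
def validate(words, corrections, cutoff=0.90):
    """Collect suspect words and apply the user's corrections to them.

    words       -- list of (text, confidence) pairs (hypothetical OCR output)
    corrections -- dict mapping a suspect word to what the user typed instead;
                   a suspect word with no entry is accepted as suggested
    """
    result = []
    suspects = []
    for text, confidence in words:
        if confidence < cutoff:
            suspects.append(text)               # would be shown in the prompt
            text = corrections.get(text, text)  # typed fix, or "Accept" as-is
        result.append(text)
    return result, suspects


ocr_words = [("Invoice", 0.99), ("T0tal", 0.62), ("1,250.00", 0.95), ("Arnount", 0.71)]
user_fixes = {"T0tal": "Total", "Arnount": "Amount"}

final, flagged = validate(ocr_words, user_fixes)
print(flagged)  # ['T0tal', 'Arnount']
print(final)    # ['Invoice', 'Total', '1,250.00', 'Amount']
```

Under this simplified model, a poor scan lowers the confidence of most words, which is why most of the data can end up in the validation prompt.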
Enterprise "Advanced OCR" Features
If you are subscribed to the Enterprise plan, you have an additional OCR engine that can help produce a better result.
To use the following tools, please be sure to select "Use Advanced OCR" from the Options menu.
1) If your document is slightly skewed, use the Fine Rotation options in the "OCR" menu. These are available when Advanced OCR is selected.
2) If your document is in another language, you can select it from the dropdown or use the "Add Languages" button to import it.
3) The OCR Tweaking settings at the bottom allow you to adjust how the OCR recognizes the data.
- Threshold is used when the page is too light or too dark.
- Despeckle is used if there happens to be a lot of noise on the page.
- Remove Lines will try to clear out any vertical and horizontal lines.
- Force DPI sets the dots per inch (DPI) value used for the page to try to provide more clarity.
Since the right values depend entirely on the quality of your document, there is no recommended adjustment. You simply need to experiment with these settings until you get the best possible outcome.
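To make the Threshold and Despeckle ideas more concrete, here is a minimal sketch of the generic image-processing operations those names usually refer to. It is not PDF2XL's implementation: the function names, the 0-255 grayscale scale, and the cutoff value are all assumptions for illustration.

```python
def threshold(gray, cutoff):
    """Binarize a grayscale grid: pixels darker than cutoff become ink (1)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0  # isolated speck: treat as noise
    return out


# Tiny assumed "page": 0 = black, 255 = white.
# A 2x2 stroke of ink plus one stray dark pixel in the bottom-right corner.
page = [
    [250, 250, 250, 250, 250],
    [250,  40,  35, 250, 250],
    [250,  45,  30, 250, 250],
    [250, 250, 250, 250, 250],
    [250, 250, 250, 250,  60],
]

binary = threshold(page, 128)
clean = despeckle(binary)
for row in clean:
    print(row)
```

With a cutoff of 128 the stroke survives and the stray corner pixel is removed. Lowering the cutoff to 38 would drop half of the stroke as well, which mirrors why a page that is too light or too dark needs the threshold adjusted by trial and error.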
If your conversion isn't accurate and you've done all you can using the OCR tools, there are a few things you can look at to determine why you ended up with this result.
Open the PDF in Adobe Acrobat Reader, view it at high zoom (800-1000%), and check the following:
- Is the background clean?
- Are there different background colors?
- Are there pixels around the text?
- Are there any watermarks or handwritten text?
- Are there any vertical or horizontal dividers in the table?
- Is any of the text in a non-European language? The current OCR can only recognize Latin text. (Go here to see a list of supported languages)
Any of the above issues can affect your result.