Converting a Scanned PDF File

Information and tips to help get the best possible output from a scanned PDF file.

Written by PDF2XL Support
Updated over a week ago

The PDF2XL Business and Enterprise plans and PDF2XL Pro each include an OCR engine that can convert scanned files.


*Note:

  • The Business plan and PDF2XL Pro include a single basic OCR engine.

  • The Enterprise plan offers the basic OCR engine, plus the option of selecting a more advanced OCR engine that may be better at converting your file accurately.


For the best result, we recommend scanning at 300 DPI in black and white (if possible). Note that the accuracy of the conversion depends on the quality of your scanned document.
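
To give a feel for what the recommended 300 DPI means in practice, here is a minimal sketch of the underlying arithmetic. It is a generic illustration and not part of PDF2XL: the function names are hypothetical, and the US Letter page size and 10 pt text size are assumed example values.

```python
def scan_pixel_size(width_in, height_in, dpi=300):
    """Return (width_px, height_px) for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def glyph_height_px(point_size, dpi=300):
    """Approximate pixel height of text of a given point size (1 pt = 1/72 inch)."""
    return point_size / 72 * dpi


if __name__ == "__main__":
    # Assumed example: a US Letter page (8.5 x 11 inches) scanned at 300 DPI.
    print(scan_pixel_size(8.5, 11))
    # Assumed example: 10 pt text scanned at 300 DPI and at 150 DPI.
    print(glyph_height_px(10))
    print(glyph_height_px(10, dpi=150))
```

Halving the DPI halves the number of pixels available for each character, which is one reason low-resolution scans tend to convert less accurately.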

! We cannot guarantee accuracy on any scanned PDF file !


Converting a Scanned PDF File

  1. As usual, open your PDF using one of the "Open File" options.

    • You may see a pop-up indicating that the OCR is running. Let it finish - it can take a few extra seconds, depending on the size of your PDF file.

    • If the OCR does not engage automatically, you can start it manually by going to the OCR tab and clicking the Start button. Possible reasons it did not engage:

      1. The document may already have been OCR'd. (A previously OCR'd document can still engage the OCR again, so this is not always the cause.)

      2. The automatic OCR setting is disabled. Go to Options > OCR and make sure "Automatically OCR scanned PDF files" is selected.

  2. If your page is rotated, use the rotation buttons on the Source pane toolbar.

  3. Create your layout, splitting and merging columns where necessary.

  4. When you are ready to convert, make sure the Convert Pages field of the "Convert" menu is set to the pages you want to convert.

    • By default, this is set to convert all the pages in your document.

  5. Click the "Convert Document" button.

  6. You should see a prompt asking you to validate your document. This is optional, but it allows you to correct any errors before you convert it.

    • When you agree to validate, another prompt will appear. This one steps through the words that the OCR could not recognize with confidence.

      • If the word in the "Suggested word" field is correct, you can click "Accept".

      • If it is incorrect, you can type in the correct word and click "Accept".

      • You can click "Done" at any time to close the validation prompt.

      • Note that a poorly scanned document can result in the majority of the data being flagged as suspect.

    • If the prompt does not automatically appear, you can manually select it from your OCR tab.

  7. Once validation is complete, the application will complete the conversion.


Enterprise "Advanced OCR" Features

If you are subscribed to the Enterprise plan, you have an additional OCR engine that may produce a better result.
To use the following tools, select "Use Advanced OCR" from the Options menu.

  1. If your document is slightly skewed, use the Fine Rotation options, which appear in the "OCR" menu when Advanced OCR is selected.

  2. If your document is in another language, you can select that language from the dropdown or use the "Add Languages" button to import it.

  3. Force DPI changes the dots-per-inch (DPI) value to try to provide more clarity.


Default OCR

The OCR Tweaking settings are a feature of the default OCR and can be found at the bottom of the screen. These allow you to adjust how the OCR recognizes the data. 

  • Threshold is used when the page is too light or too dark. 

  • Despeckle is used if there happens to be a lot of noise on the page.

  • Remove Lines will try to clear out any vertical and horizontal lines.

Since the right settings depend entirely on the quality of your document, there is no recommended adjustment - you simply need to experiment with them until you get the best possible outcome.
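
For readers curious about what adjustments of this kind generally do to a scanned image, here is a minimal, generic sketch. It is not PDF2XL's implementation: the function names, the tiny example "page", and the parameter values are all assumptions chosen for illustration, with 0 meaning black and 255 meaning white.

```python
def threshold(gray, cutoff):
    """Binarize: pixels darker than cutoff become ink (1), the rest background (0)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 neighbours (isolated noise)."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


def remove_horizontal_lines(binary, min_run):
    """Clear horizontal runs of ink that are at least min_run pixels long."""
    out = [row[:] for row in binary]
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x] == 1:
                start = x
                while x < len(row) and row[x] == 1:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        out[y][i] = 0
            else:
                x += 1
    return out


if __name__ == "__main__":
    # Hypothetical 6x5 grayscale page: a small glyph, one speck, one ruling line.
    page = [
        [250, 250, 250, 250, 250, 40],   # isolated dark speck at the right edge
        [250, 30, 35, 250, 250, 250],    # part of a glyph
        [250, 32, 180, 250, 250, 250],   # 180 is a faint grey pixel
        [250, 250, 250, 250, 250, 250],
        [20, 20, 20, 20, 20, 20],        # a horizontal table line
    ]
    cleaned = remove_horizontal_lines(despeckle(threshold(page, 128)), min_run=5)
    for row in cleaned:
        print(row)
```

Raising the cutoff (for example to 200) would also turn the faint grey pixel into ink, which illustrates why the best value depends on how light or dark the scan is.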


Poor Conversion

If your conversion isn't accurate and you've done all you can using the OCR tools, there are a few things you can look at to determine why you ended up with this result.

Open the PDF in Adobe Acrobat Reader, view it at high zoom (800-1000%), and check the following:

  • Is the background clean?

  • Are there different background colors?

  • Are there pixels around the text?

  • Are there any watermarks or handwritten text?

  • Are there any vertical or horizontal dividers in the table?

  • Is any of the text in a non-European language? The current OCR can only recognize Latin-script text. (Go here to see a list of supported languages)

Any of the above issues can affect your result.
