Native vs. Scanned PDF Files
Written by PDF2XL Support
Updated over a week ago

Scroll through each section to learn:

  • the difference between a Native and Scanned PDF

  • how to use the two OCR engines integrated into PDF2XL

  • what to do when you have poor conversion results on a scanned PDF

The first thing to consider before attempting a conversion is which type of PDF file you have.

A Native PDF file is the easiest to convert. Its text is stored in the file itself, so it converts with a 100% accuracy rate and requires no manipulation for character recognition.

A Scanned PDF file is a little more complex. It requires an OCR engine to convert, and the accuracy depends heavily on the scan quality of the PDF file. A PDF that has been scanned at a high resolution with little to no background noise or speckling will result in a more accurate conversion.

When you open your PDF file in PDF2XL, it will let you know when your file is scanned by displaying the following dialog:

*Note that if you are running the PDF2XL Home edition, it will be necessary to upgrade to Business or Enterprise to complete your conversion.

Clicking the “OK” button will automatically start the OCR engine for you.
If you ever need to engage the OCR manually, you can do so by going to the OCR tab and clicking the “Start” button.

Regular or Advanced OCR?

PDF2XL has two OCR engines installed. The default “regular” OCR engine can easily handle high-quality PDF files without additional manipulation, and it also includes OCR Tweaking settings to help improve results with low-quality documents.

If, however, you have a poorly scanned PDF file, or the default OCR isn’t recognizing your data properly, you can switch to the “advanced” OCR engine, which is available only in the Enterprise plan.

To do so, go to “OCR Options” on the OCR tab:

Place a check mark in the “Use advanced OCR” selection box.

Once you click “OK”, the program will ask to restart and the advanced OCR engine will be engaged when the program reopens.

If you are still getting poor results from your conversion, the default OCR allows you to make some adjustments in the “OCR Tweaking” section of the OCR options.

Each PDF needs its own settings, so keep adjusting these sliders until you see the best possible result.
Bear in mind that some PDF files are so poorly scanned that no setting will help. In these extreme cases, we recommend that you try to obtain a copy with a better scan resolution.
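
For readers who like to see the choices in this article in one place, here is a small decision sketch in Python. It is only an illustration: the function name `recommend_ocr_steps` and its parameters are hypothetical and are not part of PDF2XL, and the order in which the two remedies for poor results are listed is an assumption.

```python
from typing import List


def recommend_ocr_steps(is_scanned: bool, edition: str,
                        poor_results: bool = False) -> List[str]:
    """Hypothetical helper that summarizes this article; not a PDF2XL API."""
    edition = edition.strip().lower()
    if edition not in {"home", "business", "enterprise"}:
        raise ValueError("edition must be Home, Business, or Enterprise")

    # A Native PDF needs no character recognition at all.
    if not is_scanned:
        return ["Convert directly; a Native PDF needs no OCR."]

    # Converting a Scanned PDF is not possible in the Home edition.
    if edition == "home":
        return ["Upgrade to Business or Enterprise to convert a Scanned PDF."]

    steps = ["Run the regular OCR engine (OCR tab, Start)."]
    if poor_results:
        # Assumption: tweaking is listed before the advanced engine here.
        steps.append("Adjust the sliders under OCR Tweaking in OCR Options.")
        if edition == "enterprise":
            # The advanced OCR engine is only available in the Enterprise plan.
            steps.append("Enable 'Use advanced OCR' in OCR Options and restart.")
        steps.append("If nothing helps, obtain a copy with a better scan resolution.")
    return steps


if __name__ == "__main__":
    for step in recommend_ocr_steps(True, "Enterprise", poor_results=True):
        print("-", step)
```

Calling the function with a scanned file, the Enterprise edition, and poor results lists every remedy covered above; with the Business edition, the advanced OCR step is left out.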
