There are a few distinct types of PDF documents:
Native (text-based) PDF documents are usually created using Adobe Acrobat or a special printer driver that prints to a PDF file. These files contain actual text and are the easiest to convert with 100% accuracy.
Scanned documents, on the other hand, are created by scanning a hard copy (paper) document into the computer, and therefore contain only an image of the text. The scanned text is often broken or speckled, which can make accurate conversion more difficult.
The OCR module attempts to read the text inside the images so it can be converted to Excel properly.
When you open a scanned document, PDF2XL will usually recognize it and suggest turning on OCR Mode. A message box will be displayed, saying that the document is scanned and that it will be displayed in OCR Mode.
You can also check a "Don't ask me again" box, which will make any scanned document you open in the future use OCR Mode automatically without notifying you.
If you are using the PDF2XL Home edition, you should see a pop-up that lets you know the file cannot be converted with this version.
If you're unsure whether your PDF is scanned, you can send it to [email protected] so our team can test it for you.
To learn more about converting a scanned PDF file, click here.
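If you want a rough check of your own before opening a file, the idea behind telling scanned and text-based PDFs apart can be sketched in a few lines of Python. This is only an illustration, not how PDF2XL detects scanned documents: the function name `looks_scanned`, the `min_chars_per_page` threshold, and the "more than half the pages" rule are all assumptions, and the per-page text is assumed to come from whatever PDF text extractor you already use.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is scanned from the text extracted per page.

    page_texts: list of strings (or None), one entry per page, produced by
    any PDF text extractor. A scanned page holds only an image, so it
    normally yields little or no text.
    Both the threshold and the majority rule are illustrative assumptions.
    """
    if not page_texts:
        # Nothing to judge; do not claim the file is scanned.
        return False

    # Count pages whose extracted text is (nearly) empty.
    sparse_pages = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )

    # Call the document scanned when most pages have no usable text.
    return sparse_pages / len(page_texts) > 0.5
```

A file for which this returns `True` is a likely candidate for OCR Mode. Mixed documents (some scanned pages, some text pages) can fall on either side of the threshold, so treat the result as a hint only.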
A PDF file may contain embedded fonts.
An embedded font is a typeface that is stored within the PDF file itself. These fonts contain additional information that indicates how the text is displayed in the file, and can contain hidden data.
Because of this, a document with embedded fonts often shows random characters when you attempt to convert it.
To convert a file with embedded fonts, you need to use the OCR functionality and treat it as an image file.
To learn more about converting a file with embedded text, click here.
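One way to spot the "random characters" problem before converting is to measure how much of the extracted text falls outside normal printable characters. The sketch below is a heuristic under stated assumptions, not PDF2XL's behavior: `garbled_ratio`, `needs_ocr`, and the 0.3 threshold are hypothetical names and values. It only catches text that extracts as private-use, unassigned, control, or replacement characters; an embedded font that maps to ordinary but wrong letters will not be flagged.

```python
import unicodedata

# Unicode general categories that rarely appear in genuine document text:
# Co = private use, Cn = unassigned, Cc = control, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc", "Cs"}
REPLACEMENT_CHAR = "\ufffd"


def garbled_ratio(text):
    """Return the share of non-whitespace characters that look garbled."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = sum(
        1 for c in chars
        if c == REPLACEMENT_CHAR
        or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return suspicious / len(chars)


def needs_ocr(text, threshold=0.3):
    """Suggest OCR when too much of the extracted text looks garbled.

    The threshold is an illustrative assumption; tune it on your own files.
    """
    return garbled_ratio(text) > threshold
```

If `needs_ocr` returns `True` for text extracted from a page, converting that page with OCR, as described above, is likely to give better results than using the embedded text directly.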