Choosing PDF Conversion Models in Fido AI

Fido supports a variety of PDF conversion tools, including both locally hosted solutions and cloud services. The notes below offer guidance on when to choose some of these tools.

MuPDF is a fast, open-source library for working with PDFs. It processes files locally, so it doesn’t need an internet connection or an API key, which makes it a secure and private option. MuPDF is excellent at extracting text and images from well-structured PDFs while preserving their original formatting, including headings, paragraphs, lists, and tables.

Suggestion: use MuPDF when you have searchable PDF files that you know are well structured.
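To make "searchable" concrete, here is a minimal Python sketch, separate from Fido itself, that uses PyMuPDF (the Python binding for MuPDF) to check whether most pages of a PDF carry a text layer. The function names and the thresholds (`min_chars_per_page`, `min_fraction`) are illustrative assumptions, not Fido settings, and the check says nothing about how well structured the file is.

```python
from typing import Iterable, List


def looks_searchable(page_texts: Iterable[str],
                     min_chars_per_page: int = 20,
                     min_fraction: float = 0.8) -> bool:
    """Heuristic: True if most pages have a usable text layer.

    The thresholds are illustrative assumptions, not values used by Fido.
    """
    texts = list(page_texts)
    if not texts:
        return False
    # Count pages whose extracted text is long enough to be real content.
    pages_with_text = sum(
        1 for text in texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text / len(texts) >= min_fraction


def extract_page_texts(pdf_path: str) -> List[str]:
    """Read each page's text layer with PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF; imported lazily so the heuristic works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```

With PyMuPDF installed, `looks_searchable(extract_page_texts("book.pdf"))` returning `False` suggests an image-only scan, in which case an OCR-based option is likely a better starting point than MuPDF.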

The Adobe Acrobat Pro option uses a licensed copy of Acrobat Pro installed on your Windows computer to convert the PDF. Fido then adds the page markup as specified in the Fido dialog. Math equations are converted to images.

Suggestion: use Acrobat Pro when you have a well-structured PDF that you wish to convert with page numbering.

Mistral OCR is a paid online service that uses a specialized AI model for Optical Character Recognition (OCR), converting PDFs into structured data. It performs well with a variety of PDFs, including those that are image-only. There is a generous free allowance and costs for additional pages are very reasonable.

Suggestion: use Mistral OCR when you are not content with the results from MuPDF.

Marker is a pipeline of technologies for converting PDFs into structured documents. While it’s available as an open-source model, Fido can also use the paid-for service hosted by Datalab. A key component of Marker is the Surya OCR model, which for some scripts extracts text more accurately than the other converters. Additionally, Marker can optionally use large language models (LLMs) to further improve the quality of the converted document.

Suggestion: use Marker when Mistral’s OCR results are poor.

Google Gemini and Anthropic Claude differ from the OCR services above in that the conversion is driven by a prompt. Beyond simply asking for the PDF to be converted to Markdown (which Fido can then process further), you can edit the prompt to adjust what is returned, for example to exclude reference numbers and footnotes, or to indicate the presence of text boxes.

Suggestion: if you want to modify the content as it is converted, experiment with Gemini and Claude.

Mathpix is an online AI service that specializes in the accurate conversion of STEM content, which includes complex math, chemistry, and tables. While Mistral and Marker also handle some scientific content, Mathpix is specifically designed for this purpose and is often considered a leader in the field for its accuracy.

Suggestion: if your content has scientific expressions, use Mathpix to check whether you get more reliable results than Mistral or Marker.

Tensorlake is a later addition to Fido. It shows superior structure and data-extraction performance in public benchmarks (OCRBench, OmniDocBench), so we’re interested to learn whether it offers better results across real-world PDFs converted by DAISY members.

Suggestion: use Tensorlake when you want to experiment with a new model.

Doctly is a paid online AI service that specializes in processing difficult documents. It uses an intelligent, multi-model approach to accurately parse complex PDFs, including those with mixed content, multi-column layouts, and poorly scanned pages. Doctly does not extract the images from the PDF.

Suggestion: try Doctly when other services like MuPDF, Mistral, and Marker struggle with complex, non-standard documents.

PaddleOCR is an open-source solution that has garnered impressive reviews and test scores. Whilst it can be hosted locally, Fido connects to a hosted service. At the time of writing, it was possible to sign up for a free account. The data retention policy for this service is unclear.

Suggestion: try PaddleOCR to check out one of the latest models without charge.
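The suggestions above can be read as a rough order in which to try the converters. The Python sketch below is one possible reading, not part of Fido: the `PdfTraits` fields, the `suggest_converters` function, and the exact ordering are illustrative assumptions drawn from the suggestions in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PdfTraits:
    """Hypothetical description of a PDF and what you want from it."""
    searchable: bool = True             # has a text layer (not image-only)
    well_structured: bool = True        # clean headings, lists, tables
    needs_page_numbering: bool = False  # page markup wanted in the output
    acrobat_available: bool = False     # licensed Acrobat Pro on Windows
    has_stem_content: bool = False      # math, chemistry, complex tables
    wants_prompt_control: bool = False  # content adjusted during conversion
    complex_layout: bool = False        # multi-column, mixed, poor scans


def suggest_converters(traits: PdfTraits) -> List[str]:
    """Return converter names in a suggested order of trial."""
    order: List[str] = []
    if traits.wants_prompt_control:
        order += ["Google Gemini", "Anthropic Claude"]
    if traits.has_stem_content:
        order.append("Mathpix")
    if traits.searchable and traits.well_structured:
        if traits.needs_page_numbering and traits.acrobat_available:
            order.append("Adobe Acrobat Pro")
        order.append("MuPDF")
    # General-purpose OCR fallbacks: Mistral first, then Marker.
    order += ["Mistral OCR", "Marker"]
    if traits.complex_layout:
        order.append("Doctly")
    # Newer options that are worth experimenting with.
    order += ["Tensorlake", "PaddleOCR"]
    return order
```

For example, `suggest_converters(PdfTraits(searchable=False))` starts with Mistral OCR, because MuPDF is only suggested for searchable, well-structured files. Treat the output as a starting point; results on real documents should decide which converter you keep.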