Converting Print to Digital

Printed material has to be converted to digital form before it can be distributed in accessible formats such as Braille, DAISY or EPUB.

Printed material can be converted to a Microsoft Word document using the two-step process described below. Once the content is in Microsoft Word, it is easy to convert it to other accessible formats.

  1. Scanning: Creating a digital image of every page of the book
  2. OCR (Optical Character Recognition): Converting the image of each page to editable text and exporting it in Microsoft Word format

Scanning

Scanning is the process of creating a digital image of a printed page. It is performed using a device called a scanner, which is connected to a computer, and the digital image of the page is saved on the computer. Some scanners can also connect to a mobile phone through a cable, Bluetooth, or Wi-Fi. A wide variety of scanners is available, including:

  • Flatbed scanners: Suitable for individuals or small organizations, these typically cost USD 50 or more. Each page of the book has to be turned by hand. Scanning one page takes about 30 to 40 seconds, so a book of 200 to 250 pages typically takes 2 to 3 hours. A disadvantage of flatbed scanners is that the pages of a thick book cannot be pressed flat against the glass. For this reason, you may prefer a flatbed scanner whose scanning glass extends right to the edge, as shown in the picture below. The spine of the open book can then rest along that edge, so the page being scanned lies flat on the glass.
    Picture: Flatbed scanner with the scanning area extending to the edge, showing how a thick book can be placed on it
  • Automatic document feeder (ADF) scanners: These are priced from about USD 400 to USD 1,000. To use one, the spine of the book must be cut off and the loose pages stacked in the scanner's paper tray. The scanner feeds one page at a time and scans the whole stack by itself. A full-duplex ADF scanner scans both sides of each page simultaneously. A 200-page book can be scanned in 5 to 7 minutes. Because the spine has been cut, the pages must be bound, filed, or stitched again after scanning; otherwise the book will be unusable for reading.
    Picture: Automatic document feeder (ADF) scanner
  • Camera scanners: These consist of a digital camera mounted on a stand, with the open book placed underneath. The camera takes photos of the open pages. The curve that forms at the center of an open book, where the two pages meet, often makes it difficult to capture good-quality photos of the pages.
    Picture: Camera scanner
  • High-speed scanners: If a large number of books must be scanned without cutting their spines, as in a university library, a professional high-speed book scanner can be used. These scanners turn the pages themselves and can scan 1,500 to 2,000 pages per hour. They cost in the range of USD 100,000 to USD 120,000, and well-trained technical staff are also required to operate and maintain them.
    Picture: Fully automatic robotic book scanner
  • Smartphone apps: At the other end of the spectrum, a smartphone can also work as a scanner, using dedicated apps that capture pages with the phone's camera. In addition, cloud storage apps such as Google Drive, Dropbox, and OneDrive have OCR features.
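To compare the options, the throughput figures quoted above can be turned into a rough time estimate. The Python sketch below is only an illustration: the function names are hypothetical, and the rates are this section's ballpark numbers, not measured benchmarks. The ADF rate is derived from "200 pages in 5 to 7 minutes", which works out to 1.5 to 2.1 seconds per page. For the flatbed, scan time alone for 200 to 250 pages at 30 to 40 seconds per page comes to about 1.7 to 2.8 hours, which is consistent with the 2 to 3 hours quoted above.

```python
# Rough scanning-time arithmetic using the ballpark figures from this section.
# Function names are hypothetical; rates are estimates, not benchmarks.


def hours_at_seconds_per_page(pages: int, seconds_per_page: float) -> float:
    """Time in hours when each page takes a fixed number of seconds."""
    return pages * seconds_per_page / 3600


def hours_at_pages_per_hour(pages: int, pages_per_hour: float) -> float:
    """Time in hours for a scanner rated in pages per hour."""
    return pages / pages_per_hour


def estimate_hours(pages: int) -> dict[str, tuple[float, float]]:
    """(fastest, slowest) estimate in hours for each scanner type."""
    return {
        # Flatbed: 30 to 40 seconds per page, pages turned by hand.
        "flatbed": (hours_at_seconds_per_page(pages, 30),
                    hours_at_seconds_per_page(pages, 40)),
        # ADF: 200 pages in 5 to 7 minutes = 1.5 to 2.1 seconds per page.
        # Excludes the time needed to cut the spine and rebind the book.
        "adf": (hours_at_seconds_per_page(pages, 1.5),
                hours_at_seconds_per_page(pages, 2.1)),
        # High-speed book scanner: 1,500 to 2,000 pages per hour
        # (the higher rate gives the shorter time).
        "high_speed": (hours_at_pages_per_hour(pages, 2000),
                       hours_at_pages_per_hour(pages, 1500)),
    }


if __name__ == "__main__":
    for name, (fastest, slowest) in estimate_hours(250).items():
        print(f"{name}: {fastest * 60:.0f} to {slowest * 60:.0f} minutes")
```

For a 250-page book this gives roughly 2.1 to 2.8 hours on a flatbed, compared with a few minutes on an ADF or high-speed scanner. The ADF figure leaves out the time spent cutting and rebinding the book.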

Different types of scanners can be seen in the video below.

Video transcript for Scanners

Optical Character Recognition (OCR)

Once a scanner is connected to the computer, scanning is done using OCR software such as ABBYY FineReader, OmniPage, Kurzweil, or OpenBook. After a page image is captured, the software offers the option to extract text from it. Rather than extracting text after each page, you can scan multiple pages first and then run text extraction on all of them in one go. If an ADF scanner is used, images of the whole book are captured before text extraction begins. Text extraction is possible only if the language of the book is supported by the OCR software being used. After the text is extracted, it can be exported as a Microsoft Word document.
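The software listed above is mostly commercial, and each product has its own interface. To illustrate the same batch workflow (scan every page first, extract text in one go, then export to Word), here is a minimal Python sketch that swaps in tools not mentioned above: the open-source Tesseract OCR engine through the `pytesseract` wrapper, Pillow for opening images, and `python-docx` for writing the Word file. It assumes the scanned pages were saved as numbered PNG files in one folder (for example `page1.png`, `page2.png`, and so on) and that the Tesseract language data for the book's language is installed. This mirrors the point that extraction only works for supported languages. The function names and folder layout are hypothetical.

```python
import re
from pathlib import Path


def page_number_key(path: Path) -> tuple[int, str]:
    """Sort key giving natural page order, so 'page2.png' sorts before 'page10.png'.

    Uses the last number in the file name; files without a number sort first.
    """
    numbers = re.findall(r"\d+", path.stem)
    return (int(numbers[-1]) if numbers else -1, path.name)


def collect_page_images(folder: str, pattern: str = "*.png") -> list[Path]:
    """Return all scanned page images in the folder, in page order."""
    return sorted(Path(folder).glob(pattern), key=page_number_key)


def ocr_book_to_word(folder: str, output_docx: str, lang: str = "eng") -> None:
    """Run OCR on every page image in one batch and save the text as a .docx file.

    `lang` is a Tesseract language code (e.g. "eng"); the matching language
    data must be installed, or Tesseract reports an error.
    """
    # Third-party imports are kept inside the function so the helpers above
    # work even when these packages are not installed.
    import pytesseract         # wrapper around the Tesseract OCR engine
    from PIL import Image      # Pillow, for opening the scanned images
    from docx import Document  # python-docx, for writing Word files

    document = Document()
    for image_path in collect_page_images(folder):
        with Image.open(image_path) as page:
            text = pytesseract.image_to_string(page, lang=lang)
        # Tesseract usually separates text blocks with blank lines;
        # treat each non-empty block as one Word paragraph.
        for block in text.split("\n\n"):
            if block.strip():
                document.add_paragraph(block.strip())
        # Start each scanned page on a new page in the Word document.
        document.add_page_break()
    document.save(output_docx)
```

A call such as `ocr_book_to_word("scans", "book.docx")` would process the whole folder at once. OCR output still needs proofreading before it is converted to Braille, DAISY, or EPUB.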

Watch the video below for a demonstration of the process.

Video transcript for OCR

Although ABBYY FineReader supports more than 200 languages, many Asian and African languages are not yet supported. In that case, check whether your language is supported by another OCR solution, such as Google's OCR. If it is, scan the book using FineReader or any other scanning utility, save it as an image PDF file, and then run that PDF through the OCR solution that supports your language.
