Converting images of text to editable format

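Documents such as agreements and meeting notes are often scanned and saved as image-only PDFs, and books from publishers are frequently received as image files too. The text in such files cannot be edited or searched until it has been converted with optical character recognition (OCR).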

One way to convert such an image file to a Word file is to use OCR software such as ABBYY FineReader, a popular tool you may already know. However, ABBYY FineReader does not support many Asian and African languages, such as Hindi and Amharic.

Another way to extract text from images is to use the OCR built into Google Drive. Google Drive is Google's file storage and synchronization service; it lets users store files in the cloud and share and edit documents. It can convert PDF files into editable documents and save them in Microsoft Word format in three simple steps:

  1. Go to https://drive.google.com and sign in with your Google/Gmail account. Click My Drive, then click Upload Files and select the PDF files you want to convert. Note that each file is limited to 2 MB or 10 pages, so choose your files accordingly.
    [Screenshot: uploading files to Google Drive]
  2. Once the file is uploaded, locate it in the file list in your browser. Right-click the file and select Open with > Google Docs. Google Drive runs OCR on the file and opens it as an editable Google Docs document.
    [Screenshot: opening the uploaded file with Google Docs]

  3. Download the converted Google Docs document as a Microsoft Word file (File > Download > Microsoft Word (.docx)) and save it to your local drive. This process is explained in the video below.

Video transcript for Google Docs OCR
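Readers who convert many files may prefer to script the same steps. As far as we understand the Google Drive API (v3), uploading a PDF with its target type set to a Google Docs document asks Drive to run OCR, and the converted document can then be exported as a .docx file. The Python sketch below assumes the google-api-python-client package is installed and that `creds` is an already-authorized credentials object with Drive access (obtaining it is not shown); the function names `upload_metadata` and `pdf_to_docx` are hypothetical.

```python
"""Sketch: upload a scanned PDF to Google Drive with OCR, then export it as .docx.

Assumptions: google-api-python-client is installed, and `creds` is an
authorized credentials object with a Drive scope (not shown here).
Function names are hypothetical, chosen for this example.
"""
import io
import os

GOOGLE_DOC_MIME = "application/vnd.google-apps.document"
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def upload_metadata(name: str) -> dict:
    # Asking Drive to store the upload as a Google Docs document triggers OCR.
    return {"name": name, "mimeType": GOOGLE_DOC_MIME}


def pdf_to_docx(creds, pdf_path: str, docx_path: str, ocr_language: str = "en") -> str:
    # Imported here so upload_metadata() works without the client library.
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload

    service = build("drive", "v3", credentials=creds)
    media = MediaFileUpload(pdf_path, mimetype="application/pdf")
    created = service.files().create(
        body=upload_metadata(os.path.basename(pdf_path)),
        media_body=media,
        ocrLanguage=ocr_language,  # ISO 639-1 language hint, e.g. "hi" for Hindi
        fields="id",
    ).execute()

    # Export the converted Google Docs document as a Word file.
    request = service.files().export_media(fileId=created["id"], mimeType=DOCX_MIME)
    with io.FileIO(docx_path, "wb") as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            _, done = downloader.next_chunk()
    return created["id"]
```

For example, `pdf_to_docx(creds, "agreement.pdf", "agreement.docx", ocr_language="hi")` would pass Hindi as the OCR language hint. The client library is imported inside the function so that the metadata helper can be reused or tested on its own.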

Google Drive OCR can also handle multilingual and multi-font documents. If a page contains text in several languages, the OCR will often detect them and convert all of the text into editable form.

Free Google accounts have a restriction: a file can have at most 10 pages and be at most 2 MB in size. Google also offers its OCR service to businesses, which can build products that embed Google OCR without these limits. One such product is Accessital, which provides workflow management for accessible book production, in which many volunteers proofread scanned books. Accessital offers Google OCR without page or size restrictions as part of its subscription. More information is available at
https://iaccessible.net/accessital
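To stay within the free-account limits described above, a long scanned book can be uploaded as several smaller files. The Python sketch below only plans the split: given an estimated size in bytes for each page, it groups consecutive pages into batches of at most 10 pages and 2 MB. The function name `page_batches`, the per-page size estimates, and reading 2 MB as 2 × 1024 × 1024 bytes are assumptions for illustration; actually cutting the PDF into files would need a separate PDF tool.

```python
"""Sketch: plan how to split a long scanned book into batches that respect
the free-account limits (at most 10 pages and 2 MB per file).

Assumptions: per-page sizes are estimates supplied by the caller, PDF
overhead is ignored, and 2 MB is taken to mean 2 * 1024 * 1024 bytes.
"""
MAX_PAGES = 10
MAX_BYTES = 2 * 1024 * 1024  # 2 MB (assumed binary megabytes)


def page_batches(page_sizes: list[int], max_pages: int = MAX_PAGES,
                 max_bytes: int = MAX_BYTES) -> list[range]:
    """Group consecutive pages (0-based indices) into batches within both limits.

    page_sizes[i] is the estimated size in bytes of page i.
    Raises ValueError if a single page is larger than max_bytes.
    """
    batches: list[range] = []
    start, total = 0, 0
    for i, size in enumerate(page_sizes):
        if size > max_bytes:
            raise ValueError(f"page {i + 1} alone exceeds the size limit")
        # Close the current batch if it is full or this page would push it over the size limit.
        if i - start == max_pages or total + size > max_bytes:
            batches.append(range(start, i))
            start, total = i, 0
        total += size
    if start < len(page_sizes):
        batches.append(range(start, len(page_sizes)))
    return batches
```

For example, 25 scanned pages of about 100 KB each would be planned as pages 1 to 10, 11 to 20 and 21 to 25, while larger pages produce smaller batches because the 2 MB limit is reached first.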

Tags: DAISY / EPUB / PDF / Word