Why Apply OCR to Historic Documents?

OCR Opens Up Our Cultural Heritage

  • Text-based search technologies can only access old documents after OCR has been applied
    • OCR allows much better access to historic documents and books
  • OCRed historic text is easier to read than the original print, which is often set in hard-to-read fonts
  • OCRed text can be converted to “modern” digital formats, such as:
    • XML, with meta information such as layout information
    • Searchable PDFs
    • E-books
  • OCRed text can be re-used, for example for:
    • reprints
    • online access

Historical Knowledge Is Needed for Modern Science

  • Scientists, librarians and researchers can point to and reference material in information retrieval systems at a much more granular level, for example:
    • paragraphs, sentences or even individual words can be accessed directly, instead of “just” citing an issue number, page or paragraph
    • the required text can be found via full-text search
  • Electronic side-by-side comparison of books, documents and articles becomes possible, which benefits scientific work
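To illustrate what word-level referencing can mean in practice, here is a minimal sketch of a positional full-text index over OCRed text. It assumes the OCR output is already available as one plain-text string per page, with paragraphs separated by blank lines; the function names `build_index` and `search` are hypothetical and not part of any ABBYY product or API. Each hit is a (page, paragraph, word) triple, so a researcher can cite a single word rather than only a page.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to its (page, paragraph, word) positions, all 1-based.

    Hypothetical helper: assumes each item in `pages` is the plain OCR text
    of one page and that paragraphs are separated by blank lines.
    """
    index = defaultdict(list)
    for page_no, page_text in enumerate(pages, start=1):
        # Split the page into paragraphs on blank lines, dropping empty chunks.
        paragraphs = [p for p in re.split(r"\n\s*\n", page_text) if p.strip()]
        for para_no, para in enumerate(paragraphs, start=1):
            # \w+ is Unicode-aware for str, so umlauts and similar letters are kept.
            for word_no, word in enumerate(re.findall(r"\w+", para), start=1):
                index[word.lower()].append((page_no, para_no, word_no))
    return index


def search(index, term):
    """Return every (page, paragraph, word) position of `term` (case-insensitive)."""
    # .get() avoids inserting empty entries into the defaultdict on a miss.
    return index.get(term.lower(), [])
```

For example, with `pages = ["The old town hall.\n\nIn 1850 the town grew.", "Town records survive."]`, `search(build_index(pages), "town")` returns `[(1, 1, 3), (1, 2, 4), (2, 1, 1)]`. A production system would more likely rely on a search engine and on the layout information stored in formats such as XML, but the principle of addressing text below the page level is the same.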

Differences in Font Types

The following diagram shows the differences between “round” and “broken” fonts. Documents printed in “old” fonts clearly look very different and are hard to read, even for humans.

Image Source: http://de.wikipedia.org/wiki/Gebrochene_Schrift

More about this topic can be found in the Wikipedia article on Blackletter fonts.

Further Information

More technical details about optical character recognition (OCR) can be found on the ABBYY Developer Portal.