
The Challenge: Digitising Old Texts

  • Blackletter typefaces, also known as “Gebrochene Schriften” (broken scripts), first emerged as early as the 12th century and evolved over the years into a variety of derived styles and typefaces.
  • The Fraktur typeface, dominant in Germany, was created on behalf of the German Emperor Maximilian and soon became popular in many parts of Europe.

  • Common characteristics and peculiarities of the typeface include the long s (ſ) and ligatures, or “joined” letters, for certain letter combinations. Because Fraktur was so widely used, understanding it is essential for studying texts and developing recognition technologies for the period between 1800 and 1938.
  • Now that the worldwide flow of information is becoming digital and digital library collections are being created, it is important to start making historic documents available online.
  • Scanning is only the first step: Optical Character Recognition (OCR) is just as important to “open” the content for human readers, for search and for other analysis technologies.
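
The long s and ligatures mentioned above matter for search as well as for recognition. As a minimal sketch, assume an OCR engine emits these glyphs as their Unicode characters (an assumption for illustration, not a description of ABBYY's output). Python's standard unicodedata module can then fold them into modern letters, so that a query for "Wasser" also finds "Waſſer". The function name normalize_fraktur_text is hypothetical.

```python
import unicodedata


def normalize_fraktur_text(text: str) -> str:
    """Fold the long s and Unicode ligatures into modern letters.

    Assumes the OCR output encodes the long s as U+017F and ligatures
    as Unicode presentation forms (for example U+FB01 for "fi").
    NFKC compatibility normalization replaces them with plain letters.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "Wa\u017f\u017fer" is "Wasser" spelled with two long s characters.
    print(normalize_fraktur_text("Wa\u017f\u017fer"))
```

NFKC leaves ß unchanged and also folds other compatibility characters, so an archive would normally keep the original transcription alongside the normalized text used for search.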

A Solution from ABBYY: Standard OCR vs. "Gothic/Fraktur" OCR

ABBYY began developing Fraktur OCR technologies in 2003 because, at that time, the market offered none of the following:

  • Sophisticated “old font” OCR technologies
  • Historical (computer) dictionaries suitable for OCR
  • Language models for analysing and verifying printed historic texts


*Processed with ABBYY Recognition Server: Gothic/Fraktur enabled/disabled

Summary:

  • The sample clearly shows that tuned and optimized recognition technologies have to be used when processing historic documents printed in old fonts.
  • The same, of course, applies when “old” and “modern” fonts are mixed.

Further Information

IMPACT Centre of Competence
… is a new, non-profit organisation with the mission to make the digitisation of historical printed text “better, faster, cheaper”. It will provide tools, services and facilities to further advance the state of the art in document imaging, language technology and the processing of historic text.