Project name:

A tech company is looking for partners to apply for HORIZON-CL2-2023-HERITAGE-01-03: Re-visiting the digitisation of cultural heritage: What, how and why?

Status: Idea
Creation date: 03-02-2023

Project objectives:

Short summary
A startup founded by two PhDs at Istanbul University-Cerrahpasa is developing technologies for converting Ottoman documents to modern Turkish using NLP and deep learning techniques. Its projects include Ottoman OCR, Ottoman-Turkish transliteration, and Ottoman-Turkish translation. The team includes computer scientists, historians, and language experts. The company is looking for new projects in cultural heritage and is eager to cooperate with other parties on projects in the broader field of digital humanities.

Full description
Six hundred years of Ottoman history is rich with all kinds of legacies inherited from lands conquered on three continents. Today these lands geographically encompass more than 20 countries, including Turkey, many Arab countries, Iran, the Balkans and Eastern Europe, and North Africa. The Ottoman sources were originally written in at least 25 languages and at least 8 alphabets, although the great majority were written in the Ottoman language using the Ottoman alphabet. The Ottoman heritage includes innumerable works of architecture, painting, and other arts, and many types of crafts in textile, metal, wood, and other materials from a variety of historical, geographic, and political backgrounds, which account for an important part of European cultural heritage today.

The Ottoman Empire left behind many libraries and state archives in Turkey, Europe, and around the world. These resources contain thousands of books, magazines, and newspapers, and millions of pages of documents in the form of financial and commercial records, census records, court records, military, educational, and tax records, land records, diplomatic records, and so on, written mainly in the Ottoman language and script.

We are looking for partners to develop a digital heritage project titled “End-to-End conversion of Ottoman documents to modern Turkish”, which aims to convert documents in Ottoman libraries and archives into modern Turkish and possibly other European languages. We take a three-step holistic approach to this century-old conversion problem: (i) Ottoman transcription via OCR and HTR, (ii) Ottoman-Turkish transliteration, and (iii) translation from Ottoman to Turkish (and possibly other languages). An example of this conversion process is as follows: IMAGE → transcription (via OCR/HTR) → دولت علي عثماني → Devlet-i Âl-i Osmâni → Büyük Osmanlı Devleti (or Great Ottoman Empire).
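The three-step flow described above can be sketched in Python as a chain of interchangeable stages. This is only an illustration under stated assumptions: the names `convert`, `ConversionResult`, and the `toy_*` functions are hypothetical, and the stages are lookup stand-ins built from the single example in this profile, not the company's actual OCR/HTR, transliteration, or translation models.

```python
from dataclasses import dataclass
from typing import Callable

# The one worked example given in this profile (used here as toy data).
OTTOMAN_SAMPLE = "دولت علي عثماني"
LATIN_SAMPLE = "Devlet-i Âl-i Osmâni"
TURKISH_SAMPLE = "Büyük Osmanlı Devleti"


@dataclass
class ConversionResult:
    """Output of each stage, kept so that every step can be inspected."""
    transcription: str    # Ottoman text in Ottoman script (after OCR/HTR)
    transliteration: str  # same words in the Latin-based Turkish alphabet
    translation: str      # modern Turkish (or another target language)


def convert(
    image: bytes,
    transcribe: Callable[[bytes], str],
    transliterate: Callable[[str], str],
    translate: Callable[[str], str],
) -> ConversionResult:
    """Run the three stages in order: image -> script -> Latin -> translation."""
    ottoman_text = transcribe(image)
    latin_text = transliterate(ottoman_text)
    modern_text = translate(latin_text)
    return ConversionResult(ottoman_text, latin_text, modern_text)


# Hypothetical stand-ins: real stages would be trained models, not lookups.
def toy_transcribe(image: bytes) -> str:
    return OTTOMAN_SAMPLE


def toy_transliterate(text: str) -> str:
    return {OTTOMAN_SAMPLE: LATIN_SAMPLE}[text]


def toy_translate(text: str) -> str:
    return {LATIN_SAMPLE: TURKISH_SAMPLE}[text]


result = convert(b"", toy_transcribe, toy_transliterate, toy_translate)
print(result.translation)
```

Passing the stages in as functions means an OCR model for printed text and an HTR model for handwriting could be swapped in at the first step without touching the rest of the chain.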
This project follows in the footsteps of the Transkribus project, previously developed for the transcription of medieval European manuscripts in European languages. Transkribus uses transcription as its main method for making manuscripts searchable; our project adds two further steps, transliteration and translation. Transliteration makes manuscripts readable by citizens, and translation makes them understandable to citizens.

OCR or HTR is the process of converting documents to editable text by recognizing letters in digital images. We have already developed an OCR tool for printed documents in the naskh typeface. However, HTR is required for the millions of handwritten documents in Ottoman archives. Together, OCR and HTR will enable us to transcribe documents very efficiently.

Transliteration is simply the process of converting text from one alphabet to another. We have developed a phrase-based Ottoman-Turkish transliteration tool which currently operates at 75% word-transliteration accuracy. To reach accuracies acceptable in practice, sequence-to-sequence deep learning methods should be implemented.

Once transliterated to the Latin-based Turkish alphabet, the Ottoman text can be translated to both modern Turkish and other European languages. Translation from Ottoman to modern Turkish can be performed using traditional NLP methods, since the two have similar lexical, morphological, syntactic, and semantic structures; translation from Ottoman to other European languages can be performed using sequence-to-sequence deep learning methods.
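The profile does not define how word-transliteration accuracy is measured. One common reading, assumed here, is the share of words whose transliteration exactly matches a reference transliteration at the same position. The sketch below uses that assumption; the function name `word_accuracy` and the sample word lists are made up for illustration and are not project data.

```python
def word_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of words transliterated exactly as in the reference.

    Assumes the two lists are aligned word by word, which is an
    assumption of this sketch rather than something stated in the profile.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Made-up sample: one of four words is transliterated incorrectly.
reference = ["Devlet-i", "Âl-i", "Osmâni", "tarihi"]
predicted = ["Devlet-i", "Ali", "Osmâni", "tarihi"]

accuracy = word_accuracy(predicted, reference)
print(f"{accuracy:.0%}")  # prints "75%"
```

Under this definition, a tool at 75% accuracy gets on average one word in four wrong, which is why a reader would still need to correct most sentences by hand.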

Advantages and innovations
  • Currently, citizens can’t read or understand historical Ottoman documents because of alphabet and language barriers. The alphabet barrier is that citizens can’t read the Arabic-based Ottoman alphabet, which was replaced with the Latin-based Turkish alphabet in 1928. The language barrier is that people today largely can’t understand the Ottoman language, which is laced with Arabic and Persian words and structures.
  • The End-to-End Ottoman-Turkish Conversion project will enable citizens to search, read, and understand Ottoman manuscripts, books, and newspapers, whether printed or handwritten. The conversion tool will be used in many fields, including history, language, literature, arts, and humanities, for studying Ottoman documents that are otherwise accessible only to experts. It will greatly change the way studies are conducted. It will enable teachers and students alike to read and study Ottoman books, magazines, and newspapers in libraries. The project will also enable people to translate historical Ottoman documents into many European languages.

Technical Specification or Expertise Sought
We are looking for partners from academia, libraries, archives, government agencies, and IT companies to take part in this project in different roles. Academics from arts and humanities departments can take part as research partners. Libraries and archives can work with us to convert the historical documents they own into digital objects through the three-step conversion, and can help share these digital objects with the public. Educators, faculty, and staff members in universities, schools, and other institutions can participate in choosing historical texts and using them in their curricula. Government agencies working in cultural heritage can play an important part in preserving the output of this project and integrating it into similar projects. We are also open to working with IT companies that develop or market similar projects.

Contact / source: NEXT EEN Widgets (europa.eu)
