The team ran into two major problems: hyphens and abbreviations. Medieval scribes often saved valuable parchment by abbreviating words – sometimes dramatically. They would also write up to the very border of the script area before arbitrarily hyphenating whatever word they were on when they ran out of space. Since Transkribus “reads” whole words rather than individual letters, it had to learn to recognize words even when abbreviated or hyphenated.
If an algorithm can accurately transcribe 13th-century Latin handwriting without help, I'm impressed.
Clearing those hurdles is now paying off. The new Latin-reading Transkribus is capable of precisely transcribing the peculiar handwriting found in 13th-century Latin legal documents.
But wait. This article concludes with something closer to home for PaleoJudaica:
Gervers notes that Transkribus would be an ideal program for Ge’ez, an Ethiopic script he has worked with alongside Latin since the 1990s. Largely unchanged over its 2,000-year history, the Ge’ez script was used in one of the earliest known complete Gospel manuscripts and is still used in Ethiopia today.
Regular readers will recall that the University of Toronto has an impressive program in Ethiopic (Ge'ez) studies.
Gervers says the script is “perfect for machine transcription.” Why? Ge’ez has no abbreviations and conveniently puts colons at the ends of words and sentences.
For past posts on algorithms being applied to the study of antiquity, see here and links. Cross-file under Ethiopic Watch, Algorithm Watch, and The Singularity is Near.
Visit PaleoJudaica daily for the latest news on ancient Judaism and the biblical world.