Tuesday, April 16, 2024

Using AI to reconstruct damaged Hebrew & Aramaic inscriptions?

TECHNOLOGY WATCH: Beersheba researchers use AI to read illegible words in ancient Hebrew, Aramaic. This study is the first attempt to apply a masked language modeling approach to corrupted inscriptions in Hebrew and Aramaic languages (Judy Siegel-Itzkovich, Jerusalem Post).
Now, students in the software and information systems engineering department at Ben-Gurion University of the Negev (BGU) in Beersheba have approached this challenge as an extended masked language modeling task where the damaged content can comprise single characters, character n-grams (partial words), single complete words, and multi-word n-grams.

This study is the first attempt to apply the masked language modeling approach to corrupted inscriptions in Hebrew and Aramaic languages, both using the Hebrew alphabet consisting mostly of consonant symbols.

Just to be clear, this project did not analyze any actual ancient inscriptions. It used passages in the Hebrew Bible, with parts randomly masked, to test in principle how well it worked in reconstructing the missing bits. It worked pretty well.
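For readers who want to see what "randomly masked" means in practice, here is a minimal sketch in Python of that kind of test setup. It is not the Embible code. The function names, the "?" placeholder, and the character-level scoring are my own assumptions for illustration, and the real study also masked partial words, whole words, and multi-word spans.

```python
import random

MASK = "?"  # hypothetical placeholder standing in for an illegible character


def mask_text(text, fraction, seed=0):
    """Hide a fraction of the non-space characters.

    Returns the masked text and the sorted list of hidden positions.
    """
    rng = random.Random(seed)  # seeded so the same "damage" can be reproduced
    candidates = [i for i, ch in enumerate(text) if not ch.isspace()]
    k = round(len(candidates) * fraction)
    hidden = sorted(rng.sample(candidates, k))
    chars = list(text)
    for i in hidden:
        chars[i] = MASK
    return "".join(chars), hidden


def completion_accuracy(original, restored, hidden):
    """Share of hidden positions where the restored character matches the original."""
    if not hidden:
        return 1.0
    hits = sum(original[i] == restored[i] for i in hidden)
    return hits / len(hidden)


if __name__ == "__main__":
    verse = "בראשית ברא אלהים את השמים ואת הארץ"  # Genesis 1:1, consonants only
    damaged, hidden = mask_text(verse, fraction=0.25)
    print(damaged)
    # A reconstruction model would fill in the masked positions here;
    # scoring the untouched masked text gives the floor of zero accuracy.
    print(completion_accuracy(verse, damaged, hidden))
```

Because the original text is known, any proposed reconstruction can be scored automatically. That is exactly what is missing for a real damaged inscription, where nobody has the answer key.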

Will it work as well on damaged ancient inscriptions outside the Bible? Maybe. That would be pretty hard to test. You would need multiple copies of the same inscription with damage in different places. Possible in principle, but very rare.

What about the technology's promise in principle?

On the one hand, used judiciously, it could well serve as a useful tool for scholars working on deciphering damaged ancient inscriptions. So all respect to the researchers who developed this technology. They are doing good and constructive work.

But on the other hand, its usefulness is limited. Overuse of it could even harm the field. The so-called (and I would say, mis-named) "AI" that has come into vogue in the last few years is just glorified autocorrect. It can catalogue and compare what we already know, which can be very helpful, but it can't add anything new.

The danger with regard to ancient Hebrew and Aramaic inscriptions is that the reconstructions could make them over in the image of the Bible, just because the comparison corpus is the Bible.

Human judgment and creativity are still required to make sense of any results a computer algorithm produces. And AI technology is nowhere near replicating human judgment and creativity. If it ever does, it won't be through the "AI" that we have now.

A fair counterpoint (I've run out of hands) is that human scholars, using those "time-consuming manual procedures to estimate the missing content," can also remake the inscription in the image of the Bible. I've seen it happen and I've also seen it called out when it did. (I'm going to be nice and not give examples.)

But the danger remains that results from AI will be received as somehow more infallible because they are computer-generated and we tend, naively, to trust computers not to make mistakes. A final critical assessment of the results by human judgment is still essential.

The underlying article is available for free in the ACL Anthology, March 2024:

Embible: Reconstruction of Ancient Hebrew and Aramaic Texts Using Transformers
Niv Fono, Harel Moshayof, Eldar Karol, Itai Assraf, Mark Last

Abstract

Hebrew and Aramaic inscriptions serve as an essential source of information on the ancient history of the Near East. Unfortunately, some parts of the inscribed texts become illegible over time. Special experts, called epigraphists, use time-consuming manual procedures to estimate the missing content. This problem can be considered an extended masked language modeling task, where the damaged content can comprise single characters, character n-grams (partial words), single complete words, and multi-word n-grams. This study is the first attempt to apply the masked language modeling approach to corrupted inscriptions in Hebrew and Aramaic languages, both using the Hebrew alphabet consisting mostly of consonant symbols. In our experiments, we evaluate several transformer-based models, which are fine-tuned on the Biblical texts and tested on three different percentages of randomly masked parts in the testing corpus. For any masking percentage, the highest text completion accuracy is obtained with a novel ensemble of word and character prediction models.

Visit PaleoJudaica daily for the latest news on ancient Judaism and the biblical world.