Tuesday, April 10, 2012

'There is no new Hebrew without ancient Hebrew'

THE ACADEMY OF THE HEBREW LANGUAGE is making progress on its Historical Dictionary of the Hebrew Language, but there is still much work to be done.

Historical dictionary aims for greater comprehensiveness than researchers say computers can yield.


By Nir Hasson (Haaretz)
This is a rich article that is difficult to excerpt (read it all), but here are a few highlights:
The digitization of old texts has become fairly common - for instance, the Hebrew University of Jerusalem recently announced it was expanding the digital version of its Einstein archives - but, in something of a switch for the Internet era, the Historical Dictionary of the Hebrew Language is aiming for greater comprehensiveness than researchers say computers can yield.

[...]

The dictionary already contains more than 20,000 entries and has been under construction for 58 years. But it could take another generation until the dictionary is complete, said the president of the language academy, Moshe Bar-Asher.

The labor-intensive project requires many hours spent reading, defining and breaking down word after word in an effort to create the most comprehensive map of the Hebrew language. There is something seemingly Sisyphean about the snail's pace of fastidiously compiling a massive database of Hebrew words.

The 7,919 texts that have been entered into the database include the Mishna and Talmud - but not the Bible, for which concordances already exist. Other ancient texts include the Dead Sea Scrolls, Gaonic literature from the 6th to 11th centuries and medieval poetry. The most recent writer is S.Y. Agnon, who died in 1970, and other modern writers whose work is going in the database include Chaim Nachman Bialik, Ahad Ha'am, Vladimir Jabotinsky and Mendele Mocher Sefarim. Haim Be'er, A.B. Yehoshua and Amos Oz could also make it in, according to Bar-Asher.

[...]

The problem starts after the year 1050. At that time the language began to expand significantly, and it is not possible to enter all the texts ever written in Hebrew into the database. Bar-Asher estimates that 500 million to 1 billion words have been written in Hebrew from then to today. In order to decide whether to include a given text in the database, scholars assess the influence the text has had on later Hebrew writing, or whether it affords a look at unique Hebrew words, as scientific literature from the Middle Ages does.

[...]
Prediction: Technological advances in AI in the next decade or two will mean that this project is completed well ahead of schedule and more comprehensively than is now planned.