How The Voynich Manuscript Secrets Are Revealed: An Investigation - Alternative View


What lies behind the sensational news about the Voynich manuscript and Russian scientists? Can a language be reliably identified from a text alone? And how well do mathematicians fare when they work in the "field" of linguistics?

On April 19, the Russian media spread the news of an "epoch-making" discovery by Russian mathematicians: using a new method, the scientists had not only proved that the famous "Voynich manuscript" is meaningful, but had also determined that it was written in two languages, with the letters for vowels removed.

The Voynich manuscript is a medieval illustrated manuscript purchased in 1912 by the antiquarian Wilfrid Voynich. Created in the 15th century (the dating rests on radiocarbon analysis of the parchment, and most scholars currently do not consider the text itself a later forgery), it is written in an unknown language using an unknown alphabet. Judging by the illustrations, the text consists of thematic blocks: botanical, astronomical, pharmacological and others. The difficulty of decoding the text has made the Voynich manuscript a "holy grail" for cryptographers and the subject of many studies, including ones that use Big Data methods.

The news about the manuscript was reported as something sensational, and that alone raised some concern. “Before that, every attempt to decipher the unique document, or even to establish whether it is a meaningful text, had failed. 600 years of wasted effort! Cryptographers of the CIA and NSA, supercomputers and even doctors of ‘occult sciences’ admitted complete defeat. The latest post by cryptologist Gordon Rugg of Keele University in the UK reads: ‘The Voynich manuscript is bogus. Such a “complex text” is easy to construct for anyone who is familiar with simple copying methods,’” the article said.

First, the meaningfulness of the text was recognized back in the 1970s and confirmed several times in studies of the 2010s, which were covered in sufficient detail even in the Russian media. Second, the discovery reported in the news exists only as an institute preprint, not as an article in an international peer-reviewed journal (and the preprint itself was published back in 2016).

These oddities in the presentation of the material prompted us to seek clarification, first from the author of the study and then from independent experts: linguists who work with statistical and mathematical methods, as well as with the decipherment of ancient scripts.

It is easy to write a formula, but it is very expensive to carry out numerical analysis

First, briefly about the essence of the study. The authors of the preprint, mathematicians from the Moscow Institute of Physics and Technology and the Institute of Applied Mathematics of the Russian Academy of Sciences, rely on their earlier work, according to which "the frequency distribution of text symbols is a stable characteristic not of the author or the subject of the text, but of the language." That is, given a text, mathematical tools can determine which language it is written in, because each language has its own characteristic "profile" (the distribution of the Hurst exponent). Taking these methods as a basis, the scientists concluded that the text of the manuscript was written in a mixture of several languages, that false spaces were added to it, and that the symbols denoting vowel sounds were removed.

The lead author of the study, Yuri Orlov (IPM RAS and MIPT), stressed that the Voynich manuscript is not at all the main goal of their work. "The 'sensational' manuscript is just an illustration of the mathematical method of recognizing languages from text - a problem, in fact, for machine learning," Orlov said.

The manuscript itself is of absolutely no interest to us. The science here is specifically about the statistics of languages. Through it, we can understand what language this manuscript is written in. But not what is written there, and this is an important point. - Yuri Orlov. MIPT and the Institute of Applied Mathematics named after M. V. Keldysh

Regarding the linguistic method used in the work, Orlov notes that analyzing the frequency of letter combinations in texts is in itself a well-known technique. The Hurst exponent, however, is little known to linguists, since it is difficult to calculate even in mathematical terms. The formula itself is easy to write down, but the numerical analysis is very costly. For this, the researchers use the supercomputer at the Institute named after M. V. Keldysh, the mathematician emphasizes.
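To make the term concrete, here is a minimal sketch of one classical way to estimate a Hurst exponent, rescaled-range (R/S) analysis. It is not the preprint's procedure: the article does not say how the authors turn a text into a numeric series or how they obtain a distribution of exponents, so the function names (`rescaled_range`, `hurst_rs`), the window-doubling scheme and the `min_window` default are all assumptions made for illustration.

```python
import math
import random

def rescaled_range(chunk):
    """R/S statistic of one window, or None if the window is constant."""
    mean = sum(chunk) / len(chunk)
    cumulative = lowest = highest = squares = 0.0
    for value in chunk:
        deviation = value - mean
        cumulative += deviation              # running sum of deviations
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
        squares += deviation * deviation
    spread = math.sqrt(squares / len(chunk))  # standard deviation of the window
    return (highest - lowest) / spread if spread > 0 else None

def hurst_rs(series, min_window=8):
    """Slope of log(mean R/S) against log(window size), by least squares."""
    xs, ys = [], []
    window = min_window
    while window <= len(series) // 2:
        stats = [rescaled_range(series[i:i + window])
                 for i in range(0, len(series) - window + 1, window)]
        stats = [s for s in stats if s]       # drop constant windows
        if stats:
            xs.append(math.log(window))
            ys.append(math.log(sum(stats) / len(stats)))
        window *= 2                           # assumed scheme: doubling windows
    if len(xs) < 2:
        raise ValueError("series too short for a slope estimate")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))

# Usage: uncorrelated noise should give an estimate in the vicinity of 0.5.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
print(hurst_rs(noise))
```

A toy estimate like this runs in a fraction of a second; the cost Orlov describes presumably comes from repeating such calculations over many texts and text fragments, which the sketch does not attempt.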

The choice of Indo-European languages for the analysis is explained by the fact that they are all very similar, Orlov says. The indicators developed by the mathematicians make it easy to distinguish languages within the same language group, but not between families. In theory, the same work could of course be carried out for other families (Uralic, Altaic or others), but the value of the analysis lies in its completeness, Orlov is sure. For Indo-European languages it is not difficult to assemble a corpus of texts for each language; for other families it is harder.

Returning to the Voynich manuscript, Orlov noted that he and his colleagues offered five pieces of evidence (the logarithmic profile of the frequency ordering of letters in texts in one and in several languages, the distribution of the Hurst exponent, the spectral portrait of the matrix of conditional probabilities, and others) for the hypothesis that the manuscript mixes languages and that the letters for vowels were deleted. They emphatically distance themselves from the hype around the manuscript, but say they have presented a unique result: an open method and a statistical analysis with an assessment of reliability that can be independently verified.

The conclusion is devalued by the fact that we do not understand what material they derived their formula from and what they tested it on

The very assumption that the text of the Voynich manuscript lacks letters for vowels and has spaces inserted in the wrong places is elegant and sound, notes the linguist Evgenia Korovina, who works on the mathematical statistics of language (Institute of Linguistics, Russian Academy of Sciences). No one had put forward such a hypothesis before. For example, it neatly explains why the manuscript has fewer letters than would be expected for a European text. The problem is that the authors of the study did not even indicate which texts in the different languages they compared or how large those texts were, although a huge number of languages are mentioned in the preprint. The study is therefore not reproducible: if you take arbitrary texts in the same languages, there is no guarantee that the same patterns will emerge.

Maria Molina, a specialist in corpus methods in the study of ancient languages (Institute of Linguistics, RAS), agrees with Korovina. New methods of processing linguistic data, in her opinion, help to obtain information about what was previously closed to language researchers. However, insufficiently well-prepared input material often discredits even the finest data processing techniques.

The conclusion is devalued by the fact that we do not understand what material they derived their formula from and what they tested it on. From my own material I know for certain: one small methodological error, and I get critically different numbers. - Maria Molina. Institute of Linguistics RAS

“Garbage in, garbage out,” adds Molina (GIGO is a principle in computer science: incorrect input data will produce incorrect results even if the algorithm itself is correct. - Note by Indicator.Ru).

Statistical methods are still hints of results, not results

Albert Davletshin (a researcher at the Center for Linguistic Comparative Studies of the Institute for Comparative Studies of the Russian State Humanitarian University, who studies the Maya and Polynesian languages) spoke even more sharply. If the authors of the preprint were not going to decipher the Voynich manuscript, why take it on at all? And if we talk specifically about deciphering an unknown script, question after question arises: “There are no initial data on the writing system. What type of script is it? How were the different transcriptions obtained? How many characters are there? What underlies the existing assumptions about the nature of the script? What is the length of a word, with the spaces taken into account and without them? What do the spaces mean? How large is the vocabulary? What is the ratio of captions to drawings?

At first it turns out that the text is Danish and only Danish (which is historically impossible, and the work says not a word about that). Then it turns out that the text is in two unknown languages (verification at this stage is impossible, so the claim is taken on faith). Moreover, there are many conventional ways to show that two (large) pages are written in one script but in different languages, without resorting to complex mathematical models. Finally, if vowels were removed from the text, to what extent is this confirmed by standard, long-known methods (for example, those of Sukhotin, Shevoroshkin and Ventris)?”
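As an illustration of what such a standard method looks like, here is a minimal sketch of Sukhotin's algorithm, which separates vowel symbols from consonant symbols using only the text itself. It rests on the assumption that vowels and consonants tend to alternate, so a vowel is adjacent mostly to consonants. The function name `sukhotin_vowels` and the toy word list are illustrative; nothing here reproduces an analysis actually run on the manuscript.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return the set of symbols Sukhotin's algorithm classifies as vowels."""
    # Symmetric counts of adjacent symbol pairs inside words.
    # Pairs of identical symbols are ignored (zero diagonal).
    adjacency = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for left, right in zip(word, word[1:]):
            if left != right:
                adjacency[left][right] += 1
                adjacency[right][left] += 1

    # Every symbol starts as a consonant, scored by its total adjacency count.
    sums = {s: sum(adjacency[s].values()) for s in symbols}
    vowels = set()
    while True:
        consonants = [s for s in symbols if s not in vowels]
        if not consonants:
            break
        # Ties are broken alphabetically only to keep the result deterministic.
        best = max(consonants, key=lambda s: (sums[s], s))
        if sums[best] <= 0:
            break
        vowels.add(best)
        # Contacts with the new vowel now count against the remaining symbols.
        for s in consonants:
            if s != best:
                sums[s] -= 2 * adjacency[s][best]
    return vowels

print(sorted(sukhotin_vowels(["banana", "tomato", "potato"])))  # ['a', 'o']
```

On a text whose vowel symbols really had been deleted, the alternation pattern this procedure relies on would be disturbed, which is why running it is a natural check of the hypothesis.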

Davletshin also criticizes the insensitivity to philology and history characteristic of this kind of research:

What I see in the text is something that happens often: people want to take source X, forget that it is a source that exists in a historical, and also linguistic, context, and somehow count something in it. The hypothesis that there is more than one language in the manuscript is interesting. But it could have been demonstrated properly. Statistical methods are still hints of results, not results. - Albert Davletshin. Center for Linguistic Comparative Studies IVKA RSUH

There is no criterion for distinguishing interesting results from terrible ones

A more balanced position was taken by Georgy Starostin, an expert on comparative historical linguistics (RSUH). He was more interested in how useful the new mathematical methods are for solving the problems linguists face. “The model presented in the article makes a strange impression. On the one hand, it seems to belong to the category of ‘blind’ models, which analyze text data without any preliminary judgments about the structure of the alphabet (for example, digraphs such as the English ch and sh have to be treated as combinations of two letters, although each actually represents one sound). On the other hand, vowels are thrown out of the compared strings because, according to the authors of the text, they contain less information and mostly add noise. In general, the test base is clearly very small, and it is impossible to say anything fundamental about so many languages.”

The results of the comparison of Indo-European and Uralic languages, presented in Table 3 of the preprint, do not inspire particular optimism in Starostin. Some indicators of how close languages are come out well (for example, intra-Germanic or intra-Romance connections), others poorly (for example, the method fails to detect the Indo-European family as a whole). Above all, there is no criterion for distinguishing interesting results from terrible ones. In the best case, the method makes it possible to single out small linguistic groups (although even here it fails for the closely related Finnish and Estonian), but all these groups can be reliably identified without it.

Table 3 from the preprint, which presents the results of comparing Indo-European and Uralic languages. In Table 3, the same color marks groups of languages that are pairwise close (in the sense of the L1 norm of the distributions of ordered frequencies in texts without vowels). Some unexpectedly close language pairs are marked in red, such as German / Hungarian, English / Estonian, Latin / Basque, and Greek / Finnish. Preprint authors: Arutyunov A. A., Borisov L. A., Zenyuk D. A., Ivchenko A. Yu., Kirina-Lilinskaya E. P., Orlov Yu. N., Osminin K. P., Fedorov S. L., Shilin S. A.
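The measure named in the caption can be sketched in a few lines: drop the vowels, count the remaining letters, sort the relative frequencies in descending order, and take the L1 distance between two such profiles. This is only a reading of the caption under stated assumptions, not the preprint's code: the vowel list `VOWELS` is a crude Latin-alphabet guess, and the names `ordered_profile` and `l1_distance` are made up for the example.

```python
from collections import Counter

# Assumption: a crude vowel list for Latin-alphabet text, for illustration only.
VOWELS = set("aeiouy")

def ordered_profile(text, drop=VOWELS):
    """Relative letter frequencies, vowels dropped, sorted from most to least frequent."""
    letters = [ch for ch in text.lower() if ch.isalpha() and ch not in drop]
    counts = Counter(letters)
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)

def l1_distance(p, q):
    """L1 distance between two profiles; the shorter one is padded with zeros."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))

# Usage: compare the consonant profiles of two short strings.
english = ordered_profile("the quick brown fox jumps over the lazy dog")
german = ordered_profile("der schnelle braune fuchs springt ueber den faulen hund")
print(l1_distance(english, german))
```

Because the profile keeps only the sorted frequencies and forgets which letter each one belongs to, two unrelated languages can end up close by this measure, which is consistent with the unexpected pairs marked in red in the table.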

Finally, determining the genetic affiliation of a language from the distribution of the Hurst exponent is an interesting idea, and it could perhaps even be developed into something scientifically sound. But that would require processing a large number of texts in different languages. And a problem arises immediately: many languages are unwritten, and it remains unclear how legitimate it is to compare alphabetic writing systems with phonetic transcriptions. The idea will have very little practical use, Starostin is sure. At best, it really can be applied to cases like the Voynich manuscript, where there is a hypothesis that some language with a standard alphabetic script has been encrypted according to certain principles (for example, by deleting vowels). However, there are very few such cases in the world.

Summing up

What is the bottom line? The discussion around the IPM and MIPT research revealed a deep rift between the linguistic community (including those who use statistical methods) and specialists who are "outsiders" to linguistics but decided to apply their mathematical tools to linguistic material.

The mathematicians' unwillingness to work together with linguists does more than produce gross blunders that then migrate to the media (for example, the preprint calls Basque an Indo-European language and uses the phrase "vowel letters"). The beauty of the models and the computational power of supercomputers are in effect devalued by errors in the input. Yet with some willingness and openness to contact with colleagues from another discipline, these mistakes could easily have been avoided.
