Why Languages and Dialects - These Are Completely Different Things - Alternative View


Simple questions often have complex answers. For example: what is the difference between a language and a dialect? If you decide to ask a linguist about this, then sit back. Although at first glance there is nothing difficult in this question, it can be answered in many different ways.

The distinction between language and dialect can depend on the point of view taken. From a political point of view, a language is what a people usually speaks as a nation. For example, from 1850 to 1992 there existed a so-called Serbo-Croatian language, which included several dialects, among them Serbian, Croatian and Bosnian. But after Yugoslavia split into several independent states in the early 1990s, these dialects were recognized as separate languages. This political definition partly works, although it creates more problems than it solves: some languages are spoken in several countries, a striking example being Spanish in Latin America. No one would argue that Mexican Spanish and Colombian Spanish are different languages. Possibly the Spanish spoken in parts of Spain is so different from its Latin American varieties that it deserves to be called a separate language, but this is not entirely clear.

But what if we draw the line between language and dialect using the criterion of mutual understanding? Unfortunately, this approach is not ideal either. A Dane will understand Swedish somewhat better than a Swede will understand Danish. Likewise, a person who speaks a peculiar rural dialect of British English will understand an American from Los Angeles much better than that American will understand them. Mutual understanding often depends on external factors, which are a rather uncontrollable variable, and not only on inherent properties of the language itself.

So maybe we should adopt a purely linguistic approach. Imagine that we can systematically measure the difference (D) between two speech variants. A specific value of D could then serve as the cutoff between two dialects and two languages. Finding such a measure should not be very difficult, since two speech variants can be compared according to many criteria, such as sound inventory, grammatical characteristics or vocabulary.

But what if the differences between speech variants are gradual, so that any given value of D is as likely to be found as any other? Then we would have to choose an arbitrary value of D as the cutoff, and an arbitrary value would throw us back to the political or practical considerations that we wanted to avoid. Do we want our cutoff to be at the level where Serbian and Croatian are two different languages, or one? In cataloging the languages of the world, how many thousands of languages do we want to end up with: five? Or seven? Maybe ten thousand?

In recent years, two main obstacles to distinguishing between language and dialect have been overcome. The first is how to measure the differences between speech variants - that is, how to find a value for D. In 2008, a group of linguists created the Automated Similarity Judgment Program (ASJP), of which I am a founder and the daily curator. The ASJP has painstakingly prepared a systematic language comparison dataset that currently contains 7655 word lists. They cover two-thirds of the world's languages, assuming that, for our purposes, languages are defined by ISO 639-3 codes. Since each word list contains a fixed set of 40 concepts that are interpreted consistently, it is easy to compare the lists and thus obtain a measure of difference. The most commonly used measure of the difference between two words is the Levenshtein distance, named after Vladimir Levenshtein, a Soviet scientist who in 1965 developed an algorithm for comparing two character strings. He defined "distance" as the minimum number of substitutions, insertions and deletions required to transform one string into another. The Levenshtein distance can usefully be divided by the length of the longer of the two strings, which puts all distances on a scale from 0 to 1. This measure is known as the normalized Levenshtein distance, or LDN.
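The definitions above can be illustrated with a short Python sketch. It is not the ASJP software: the function names are hypothetical, and averaging the LDN over aligned word pairs (`mean_ldn`) is an assumption about how a single value of D might be obtained from two word lists.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string (0 to 1)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_ldn(words_a: list[str], words_b: list[str]) -> float:
    """Average LDN over word pairs expressing the same concept (assumed aggregation)."""
    pairs = list(zip(words_a, words_b))
    if not pairs:
        raise ValueError("word lists must not be empty")
    return sum(ldn(x, y) for x, y in pairs) / len(pairs)
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), so the LDN of that pair is 3/7. The English words are only a stand-in for the transcribed entries of real word lists.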

The second obstacle is that perhaps "language" and "dialect" are concepts that can only be defined arbitrarily. Here the news is very encouraging. If we look at all the language families in the ASJP database for which the project participants have included a significant proportion of closely related varieties, we can begin to look for differences in the behavior of languages and dialects. An intriguing picture emerges: the distances tend to cluster either around a relatively small value or around a relatively large one, with a trough between them. As it turns out, this trough most often lies in a narrow range, averaging about 0.48 LDN. It seems that speech variants tend not to be halfway similar to each other in their basic vocabulary. Either there is a tendency towards greater similarity, in which case the speech variants can be defined as different dialects, or towards less similarity, in which case they can be regarded as different languages. This is where the borderline between language and dialect lies.

This phenomenon is probably the result of social circumstances. Dialects diverge as people settle in new territories and form new identities, but if there is still some contact between them, convergence can take place, so that the speech variants remain more than 50 percent similar (and therefore still count as one language). However, a slight push towards divergence may cause the variants to drift apart relatively quickly, increasing the Levenshtein distance, in which case we can qualify them as separate languages. There may also be a relationship between the threshold for word distances in the standard list used by ASJP and the corresponding distances in other parts of the language structure, beyond which understanding is seriously impaired. In other words, the threshold of understanding may correlate with the threshold between languages and dialects. We have not yet studied this issue, but it is an extremely interesting subject for research.


Having arrived at an objective rather than arbitrary criterion for separating languages from dialects, we can apply it to the languages of the world. Some pairs of speech varieties that are considered national languages, such as Bosnian and Croatian, are well below the LDN threshold of 0.48 (which means they are the same language regardless of the existence of Yugoslavia). Not far from this threshold are Hindi and Urdu (they can hardly be called two different languages). And the varieties of Arabic and of Chinese, each of which is often viewed as a single language, soar above LDN = 0.48 (these varieties are separate languages). In fact, there are several pairs of varieties that are usually considered different languages but hover on the border: for example, Danish and Swedish, which have an LDN of 0.4921.
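Applying the criterion amounts to comparing a measured LDN with the threshold. The following minimal Python sketch shows this; the function name `classify` and the optional `margin` for flagging borderline pairs are hypothetical additions, while the threshold of 0.48 and the Danish-Swedish value of 0.4921 come from the text.

```python
LDN_THRESHOLD = 0.48  # average position of the trough, as reported in the text


def classify(pair_ldn: float, threshold: float = LDN_THRESHOLD,
             margin: float = 0.0) -> str:
    """Label a pair of speech varieties by its LDN.

    `margin` is a hypothetical tolerance: pairs within `margin` of the
    threshold are reported as borderline instead of being forced to one side.
    """
    if margin > 0 and abs(pair_ldn - threshold) <= margin:
        return "borderline"
    if pair_ldn < threshold:
        return "dialects of one language"
    return "separate languages"


# Danish and Swedish, using the LDN quoted in the text.
print(classify(0.4921))               # separate languages
print(classify(0.4921, margin=0.02))  # borderline
```

A hard cutoff places Danish and Swedish just on the "separate languages" side; the margin of 0.02 is an arbitrary illustration of how close to the border that pair sits.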

Finally, a technique derived from the dataset, called ASJP chronology, can be applied to establish how long it takes for dialects to drift far enough apart to be considered separate languages. The result we found is 1059 years, give or take some margin of error. This finding is supported by how long it usually takes for the ancestral language of a language family to split into daughter languages, which later become the ancestors of subfamilies. Establishing that requires other methods of analysis, but the results are similar: it takes about a millennium to turn dialects into languages. We know this because we can now distinguish one from the other.

Søren Wichmann is a Danish linguist affiliated with Leiden University (Netherlands), Kazan Federal University (Russia) and Beijing Language University (China). His latest book, Temporal Stability of Linguistic Typological Features (2009), was co-authored with Eric W. Holman.
