Tuesday, October 25, 2016

Sound change as systemic evolution

I have been discussing the peculiarities of sound change in linguistics in a range of blog posts in the past (see Alignments and Phylogenetic Reconstruction, Directional Processes in Language Change, Productive and Unproductive Analogies). My core message was that it is really difficult to find an analogy with biology, as sound change is not the simple mutation of one sound in a certain word, but the regular modification of all sounds of all words in the lexicon which occur in a specific contextual slot.

Scholars have tried to model this as concerted evolution (Hruschka et al. 2015). But the analogy with biology does not sound very convincing, as the change concerns the production of speech rather than its product. By this, I mean that sound change concerns the abstract system by which speakers produce the words of their language. Think of characters in comic books who lose a tooth in a fight. To show how their speech suffers from this loss, writers often replace certain "s" sounds in the victims' speech with a "th" (in German, it would be an "f"), illustrating that with a lost tooth it is "very difficult to thpeak". In the same way, writers imitate the speech of people with speech impediments like sigmatism (lisp). The loss of a tooth changes all "s"es in a person's language. Sound change, or at least one type of sound change, works in the same way.

In a recent talk I gave with Nathan Hill at a conference in Poznań, we found a way to demonstrate this on actual language data. In this talk, we used data from eight Burmish languages (a language family spoken mainly in the South-West of China and in Myanmar), which we coded for partial cognates (as these languages contain many compounds). We aligned these cognate sets automatically, and then searched for recurring patterns in the alignments. One needs to keep in mind that our words in linguistics are extremely short, and we have no more than five sounds per alignment in our data, which translates to five sites in an alignment in biology.

While biology knows certain contextual patterns like hydrophilic stretches in alignments (as already demonstrated in the famous ClustalW software, compare Thompson et al. 1994), the context in which a sound occurs in language evolution is even more important. We can, for example, say that the beginning of a word or morpheme is usually the most stable part, where sounds change much more slowly than in the other parts (at the end of a word or of a syllable). We thus concentrated only on the first sound of each word and looked at the patterns of sounds we could find there.

Those patterns in our data usually look like this:

Cognate set L1 L2 L3 L4 L5 L6 L7 L8
word 1 p p p Ø f f Ø p
word 2 p Ø p p Ø f p p
word 3 k Ø k s k Ø k
word 4 Ø k Ø s Ø s k
... ... ... ... ... ... ... ... ...

Note that the symbol "Ø" in this context denotes missing data, as we did not find a cognate set in the given language. As always, most of our data is patchy, and we have to deal with that. Even when looking only at the first sound in each alignment, we find quite a degree of variation; and if you look at all the data, you can see some things that seem to be structured, but the amount of complexity is still immense. You may see this from the following plot, which shows only some 100 of the more than 300 patterns we created (coloured cells do not necessarily represent the same sound, but one of ten different sound classes to which the more than 50 different sounds in our data belong):

Sound patterns (initial consonant) in the aligned cognate sets of the Burmish languages

Interestingly, however, most of the variation can be reduced quite efficiently with the help of network techniques. Since we are dealing with systemic evolution, it is straightforward to group our more than 300 alignments into groups that evolve in an identical manner. At least this is what our linguistic theory predicts, and what linguists have been studying for the last 200 years. When looking at the patterns I gave above, you can see that we can easily sort the four cognate sets into two groups:
Cognate set L1 L2 L3 L4 L5 L6 L7 L8
word 1 p p p Ø f f Ø p
word 2 p Ø p p Ø f p p
- - - - - - - - -
word 3 k Ø k s k Ø k
word 4 Ø k Ø s Ø s k

Essentially, the two groups reflect only two patterns, if we disregard the gaps and merge them into one row each:
Cognate set L1 L2 L3 L4 L5 L6 L7 L8
word 1 / word 2 p p p p f f p p
- - - - - - - - -
word 3 / word 4 k k k s k s k
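
The merging step can be sketched in a few lines of Python. This is only a minimal illustration, not the code we used in the study: the function names `compatible` and `merge` are made up for this example, and the four rows are cut down to the seven values that the tables above list for all four words, so that every row has the same length.

```python
MISSING = "Ø"  # symbol used in the tables above for missing data


def compatible(a, b):
    """Return True if two patterns never show different sounds in the same column."""
    return all(x == y or MISSING in (x, y) for x, y in zip(a, b))


def merge(a, b):
    """Merge two compatible patterns, filling the gaps of one with the sounds of the other."""
    if not compatible(a, b):
        raise ValueError("patterns conflict and cannot be merged")
    return [y if x == MISSING else x for x, y in zip(a, b)]


# Example rows from the tables above (first seven columns only; an assumption
# made for this sketch so that all rows have equal length).
word1 = "p p p Ø f f Ø".split()
word2 = "p Ø p p Ø f p".split()
word3 = "k Ø k s k Ø k".split()
word4 = "Ø k Ø s Ø s k".split()

print(" ".join(merge(word1, word2)))  # p p p p f f p
print(" ".join(merge(word3, word4)))  # k k k s k s k
print(compatible(word1, word3))       # False
```

Treating "Ø" as compatible with everything is the simplest possible choice; a stricter variant could also require that two patterns share at least one attested sound before they are merged.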

What is important when grouping two alignments into one pattern is to make sure that they do not contain any conflicting positions. This can be checked in a rather straightforward manner by constructing a network from the data. In this network, the nodes are the alignment sites (word 1, word 2, etc. in our examples), and links are drawn between nodes if two sites are not in conflict with each other. If we use this criterion of compatibility on our data, we obtain the following network:

Compatibility network of the sites in our aligned cognate sets

In the network, I further coloured the nodes according to the overall similarity of sounds present in them. The legend gives capital letters for major sound classes, in order to facilitate seeing the structure.
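
To make the construction concrete, here is a small Python sketch of how such a compatibility network could be built. It is a toy version under my own assumptions: the network is stored as a plain dictionary of neighbour sets, the name `compatibility_network` is hypothetical, and the input is again the four example rows from above, cut down to their first seven columns.

```python
from itertools import combinations

MISSING = "Ø"  # symbol for missing data, as in the tables above


def compatible(a, b):
    """Two patterns are compatible if no column shows two different sounds."""
    return all(x == y or MISSING in (x, y) for x, y in zip(a, b))


def compatibility_network(patterns):
    """Build an undirected graph (dict: node -> set of neighbours).

    Nodes are the pattern names; an edge links two patterns that
    do not conflict in any column.
    """
    graph = {name: set() for name in patterns}
    for (n1, p1), (n2, p2) in combinations(patterns.items(), 2):
        if compatible(p1, p2):
            graph[n1].add(n2)
            graph[n2].add(n1)
    return graph


# Toy input: the example rows from above, first seven columns only.
patterns = {
    "word 1": "p p p Ø f f Ø".split(),
    "word 2": "p Ø p p Ø f p".split(),
    "word 3": "k Ø k s k Ø k".split(),
    "word 4": "Ø k Ø s Ø s k".split(),
}

network = compatibility_network(patterns)
for node in sorted(network):
    print(node, "->", sorted(network[node]))
```

In this toy case the network falls apart into two unconnected pairs. With real, patchy data, a pattern with many gaps can be linked to two patterns that conflict with each other, which is why the network alone does not yet give us the groups.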

This network itself, however, does not tell us how to group the data into classes that correspond to one identical process of systemic evolution, as we can still see many conflicts. In order to solve this, we need to carry out a specific partitioning analysis that cuts the network into an ideally minimal number of cliques. Why cliques? Because a clique will represent patterns in our data that do not show any conflicts in their sounds, and this is exactly what we want to see: those patterns that behave identically, without exceptions.

The problem of finding the minimal clique partition of a network is, unfortunately, a hard one (see Bhasker and Samad 1991), so we needed to use some approximate shortcuts. Nevertheless, with a very simple procedure of clique partitioning, we succeeded in reducing the 317 cognate sets that we selected for our study down to 35 groups that covered 74% of the data (234 cognate sets), with a minimal size of 2 alignments per group. The "manual" inspection by the Burmish expert in our team (that is Nathan Hill) showed that many of these patterns correspond to what experts assume was one single sound in the ancestral Proto-Burmish language.
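
To give an idea of what an approximate shortcut can look like, here is one possible greedy heuristic in Python. I should stress that this is not necessarily the procedure we used, only a generic sketch: the function name `greedy_clique_partition` and the toy network (four hypothetical patterns w1 to w4) are made up for illustration, and a greedy pass like this does not guarantee a minimal partition.

```python
def greedy_clique_partition(graph):
    """Split the nodes of `graph` (dict: node -> set of neighbours) into cliques.

    Greedy heuristic: start from the best-connected unassigned node and add
    every unassigned neighbour that is linked to all members collected so far.
    """
    unassigned = set(graph)
    cliques = []
    # Visit well-connected nodes first; ties are broken by name for reproducibility.
    for seed in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        if seed not in unassigned:
            continue
        clique = [seed]
        unassigned.remove(seed)
        for cand in sorted(graph[seed] & unassigned):
            # A candidate joins only if it is compatible with every current member.
            if all(cand in graph[member] for member in clique):
                clique.append(cand)
                unassigned.remove(cand)
        cliques.append(clique)
    return cliques


# Hypothetical toy network: w1 is compatible with w2 and with w3,
# but w2 and w3 conflict with each other; w4 is compatible with nothing.
graph = {
    "w1": {"w2", "w3"},
    "w2": {"w1"},
    "w3": {"w1"},
    "w4": set(),
}

cliques = greedy_clique_partition(graph)
groups = [c for c in cliques if len(c) >= 2]  # keep groups of at least 2 alignments
coverage = sum(len(c) for c in groups) / len(graph)

print(cliques)   # [['w1', 'w2'], ['w3'], ['w4']]
print(coverage)  # 0.5
```

The coverage figure is computed in the same way as the one reported above: 234 of 317 cognate sets fall into groups of size two or larger, which gives roughly 74%.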

To illustrate more closely what I mean by reducing patterns to unique groups, look at the following pattern, which shows different nasal sounds in the data:

Nasal sounds in the Burmish data

And then at another pattern, showing s-sounds:

S-sounds in the Burmish data

I think (at least I hope) that the amount of regularity we find here is enough to demonstrate what is meant by the regularity of sound change in linguistics: sound change is in some sense just like losing a tooth, but for a complete population of speakers, not just one speaker, as the population starts to change all sounds occurring in a certain environment to some other sound.

Our results are not perfect: the 26% of unique patterns, for example, are something we will need to look into in more detail in the near future. A quick check showed that they may result from errors in the cognate annotation, but also from peculiarities in the data, and even simply from sounds that are rare in the languages under investigation.

We are currently looking into these issues, trying to refine our approach. I realized, for example, that the minimal clique coverage problem has been studied before by other researchers, and I found a rather large amount of Russian literature on the topic (see, for example, Bratceva and Čerenin 1994 and Ryzhkov 1975), but those approaches do not seem to have been thoroughly studied in the Western literature. We also know that at some point we need to relax our approach and allow for some exceptions, since systemic sound change processes are easily overridden by language-specific factors, be it lateral transfer, or pragmatics in a larger sense (think of Bob Dylan, talking of "the words I never KNOWED" in order to make sure the word rhymes with "ROAD", or the form "wanna" as a shortcut for "want to").

Not all cases in which speakers changed the pronunciation of sounds have systemic reasons, and we are still far from actually understanding the systemic reasons that lead to the regular aspects of sound change. What we can show, however, is that sound change is really something peculiar in language evolution, with no real counterpart in biology. At least, I do not know of any case where a set of 300 alignments could be reduced to some 35 largely identical patterns. This shows, on the other hand, that the classical biological approaches that try to model each site of an alignment independently are definitely not what we need in order to model sound change realistically. The assumption of independence of sites in an alignment is already problematic in biology. In linguistics, at least in the cases illustrated above, it seems to be just as useless as tossing a coin to predict the weather in a desert: it is too much of an effort with very poor results to be expected.

  • Bhasker, J. and T. Samad (1991): The clique-partitioning problem. Computers & Mathematics with Applications 22.6. 1-11.
  • Bratceva, E. and V. Čerenin (1994): Otyskanie vsex naimen’šix pokrytij grafa klikami [Searching all minimal clique coverages of a graph]. Žurnal Vyčislitel’noj Matematiki i Matematičeskoj Fiziki [Journal of Computational Mathematics and Mathematical Physics] 34.8-9. 1272-1292.
  • Hruschka, D., S. Branford, E. Smith, J. Wilkins, A. Meade, M. Pagel, and T. Bhattacharya (2015): Detecting regular sound changes in linguistics as events of concerted evolution. Curr. Biol. 25.1. 1-9.
  • Ryzhkov, A. (1975): Partitioning a graph into the minimal number of complete subgraphs. Cybernetics 11.6. 939-943. Original article: Рыжков А. П., Разбиение графа на минимальное число полных подграфов .. 90-96. Kybernetika 1975. 6.
  • Thompson, J., D. Higgins, and T. Gibson (1994): CLUSTAL W. Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22.22. 4673–4680.