How robust are these findings?

The strength of any model-based inference also depends on an assessment of the robustness of the findings. One concern may be that our choice of ancient or contemporary languages somehow biased results. The 20 ancient languages might provide more reliable location information, being the earliest representatives of the main Indo-European lineages. Conversely, the position of the ancient languages in the tree, particularly the three Anatolian varieties (Hittite, Lycian and Luvian), might have unduly biased our results in favour of an Anatolian origin. We investigated both possibilities by repeating our analyses separately on only the ancient languages and only the contemporary languages (which excludes Anatolian). Consistent with our findings based on the full data set, both these analyses still supported an Anatolian origin.

Another possible concern is that our initial model did not differentiate between movement across land and water (although it did rule out solutions that placed ancestral languages in water). This may be particularly important for testing between an Anatolian and Steppe homeland due to their placement next to the Caspian Sea and Black Sea. We therefore developed a novel methodology to allow for different rates of movement onto and across land and water and then looked at the impact that varying these rates had on support for an Anatolian origin. We allowed rates of movement into water to be 10-fold and 100-fold less likely than rates of movement across land. We also tried a model in which there was no aversion to water and rapid movement across water (we called this the ‘sailor’ model). In each case, we continued to find strong support for the Anatolian theory.
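To make the idea of a water penalty concrete, here is a minimal sketch of how down-weighting moves into water changes the chance of stepping off land. It is an illustration only, not the model used in the paper: the function name `move_probabilities`, the four-neighbour setup and the simple weighting scheme are all assumptions, and the "sailor" case is reduced here to "no aversion to water", without the faster movement across water.

```python
def move_probabilities(neighbours, water_penalty):
    """Return normalised step probabilities for a list of neighbouring cells.

    neighbours: list of "land" / "water" labels (hypothetical toy landscape).
    water_penalty: how many times less likely a step into water is than a
    step onto land (1 = no aversion, 10 = 10-fold, 100 = 100-fold).
    """
    weights = [1.0 if kind == "land" else 1.0 / water_penalty for kind in neighbours]
    total = sum(weights)
    return [w / total for w in weights]


def water_share(neighbours, water_penalty):
    """Total probability that the next step lands in a water cell."""
    probs = move_probabilities(neighbours, water_penalty)
    return sum(p for p, kind in zip(probs, neighbours) if kind == "water")


# A coastal location with two land neighbours and two water neighbours.
coast = ["land", "land", "water", "water"]
for label, penalty in [("no aversion", 1), ("10-fold", 10), ("100-fold", 100)]:
    print(f"{label}: P(step into water) = {water_share(coast, penalty):.3f}")
```

In this toy setting the chance of stepping into water falls from one half with no aversion to under one in ten with a 10-fold penalty, which is the kind of contrast the sensitivity analysis explores.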

Finally, to build our language phylogeny we rely on vocabulary data, but other sources of data can also be used. Phonological data (the sounds present in languages) and morphological data (the way languages structure words and sentences) also provide information about language ancestry. Whilst our results incorporate uncertainty in the inferred family tree relationships, phonological and morphological data have been interpreted to support an Indo-European branching structure that differs slightly from the pattern we find, particularly near the base of the tree. We therefore repeated our analyses, constraining the tree to fit this alternative branching pattern. This analysis produced even stronger support for an Anatolian origin.

What about the potential for borrowing words?

One objection to modelling language evolution along the branches of a tree might be that languages aren’t like species because, whereas species cannot swap their genes, languages can and do borrow words from other languages. Wouldn’t borrowing between languages produce misleading results?

It’s true that languages borrow words from other languages, but it is also worth noting that across much of the tree of life (certainly among plants and bacteria) species also swap genes – known as horizontal gene transfer in biology. The more interesting question, then, is how frequently languages borrow from one another, and whether this is a problem for inferring language family trees.

The first point to note is that rates of borrowing are much lower in the kinds of basic vocabulary terms that we base our analyses on than in the rest of the lexicon. In English, for example, over 50% of the Oxford English Dictionary comprises words from French, borrowed following the Norman conquest. However, among basic vocabulary terms, this number falls to just 5%.

Second, the cognate data we use excludes known cases of borrowing, such as English mountain borrowed from French montagne. So, to the extent that we can identify these cases, there should be very few borrowings in the data.

And third, the sensitivity of the phylogenetic methods to borrowing can be tested by simulating data on a tree with varying levels of borrowing. When this simulated data is then analysed, we find the methods perform well at recovering the ‘true tree’ for all realistic levels of borrowing.
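The logic of such a simulation test can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the procedure used in the study: it uses a fixed four-language tree ((A,B),(C,D)), binary cognate presence/absence characters, one-way borrowing from C into B, and a simple distance-based quartet method in place of the Bayesian phylogenetic methods used in the paper. All function names and parameter values are hypothetical.

```python
import random


def evolve(state, p_change, rng):
    """Flip each binary cognate character with probability p_change."""
    return [1 - s if rng.random() < p_change else s for s in state]


def simulate(n_chars, p_change, p_borrow, rng):
    """Simulate data on the true tree ((A,B),(C,D)), then let B borrow from C."""
    root = [rng.randint(0, 1) for _ in range(n_chars)]
    ab = evolve(root, p_change, rng)
    cd = evolve(root, p_change, rng)
    langs = {
        "A": evolve(ab, p_change, rng),
        "B": evolve(ab, p_change, rng),
        "C": evolve(cd, p_change, rng),
        "D": evolve(cd, p_change, rng),
    }
    # Borrowing: for a random fraction p_borrow of characters, B copies C.
    langs["B"] = [c if rng.random() < p_borrow else b
                  for b, c in zip(langs["B"], langs["C"])]
    return langs


def hamming(x, y):
    """Number of characters on which two languages differ."""
    return sum(a != b for a, b in zip(x, y))


def inferred_split(langs):
    """Choose the pairing with the smallest summed within-pair distance."""
    d = lambda p, q: hamming(langs[p], langs[q])
    sums = {
        "AB|CD": d("A", "B") + d("C", "D"),
        "AC|BD": d("A", "C") + d("B", "D"),
        "AD|BC": d("A", "D") + d("B", "C"),
    }
    return min(sums, key=sums.get)


def recovery_rate(p_borrow, n_reps=200, n_chars=400, p_change=0.1, seed=0):
    """Fraction of simulated data sets for which the true split is recovered."""
    rng = random.Random(seed)
    hits = sum(
        inferred_split(simulate(n_chars, p_change, p_borrow, rng)) == "AB|CD"
        for _ in range(n_reps)
    )
    return hits / n_reps


for p_borrow in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"borrowing {p_borrow:.0%}: true tree recovered in "
          f"{recovery_rate(p_borrow):.0%} of runs")
```

Sweeping `p_borrow` from zero upwards shows how much borrowing this simple method can absorb before it starts grouping B with C instead of A; the same question, with far more realistic models, is what the simulation tests described above address.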

How does this study build on our 2003 paper in Nature?

The two main competing theories for the origin of the Indo-European languages imply very different age ranges – 5000 to 6000 years BP for the Steppe hypothesis versus 8000 to 9500 years BP for the Anatolian farming hypothesis. In 2003, we published a paper in Nature in which we used phylogenetic methods to estimate the age of the Indo-European language family and thereby test between the two hypotheses. We inferred an age range for the family of between 7800 and 9800 years, consistent with the Anatolian farming theory and outside the range implied by the Steppe hypothesis.

However, some scholars remained unconvinced. One criticism raised about our initial paper was that we did not include information from ancient languages, deep in the family tree. A second criticism was that whilst we examined the timing of the origin of Indo-European, the question of the location of the origin, which is crucial to both theories, remained untested.

Our most recent paper addresses both of these criticisms. First, we analyse a newly compiled dataset of 103 languages that includes 20 ancient languages, providing more certainty about relationships that are deep in the tree. Second, we apply new phylogeographic methods that allow us to trace the history of the language family in space as well as time. As we report in the paper, both the inferred timing and the inferred location of the Indo-European origin strongly support the Anatolian farming theory.

As the languages expanded, did people move with them?

This is a fascinating question that we don’t yet know the answer to. Languages can expand with the movement of the people who speak them. Sometimes, however, language expansion involves the adoption of a language without the movement of people per se – for example, when the majority adopts a high-status language, perhaps that of a ruling elite. In fact, there are a myriad of possibilities here and a lot of excellent research in anthropology and linguistics on the processes that lead one language to win out over another. An interesting line of evidence in Europe combines our findings with recent work in population genetics using ancient human DNA. Our results point to a spread of Indo-European languages with the spread of farming 8000 to 9500 years ago. New genetic studies using ancient DNA also find evidence of an expansion of genes (and hence the people carrying them) from Anatolia commensurate with the expansion of agriculture. That would suggest that in this case people and languages were moving together to some extent. However, it also seems plausible that at least some of the previous inhabitants of Europe might have found the agricultural way of life appealing and adopted the language of the Indo-Europeans along with their way of life.

Why do people think Indo-European languages came from the Steppes?

Advocates of the Steppe origin theory claim that there is a compelling reason why the Indo-European language family fits with a 5000 to 6000 year old Steppe origin – evidence from “linguistic palaeontology”, an approach in which terms reconstructed in the ancestral “proto-language” are used to make inferences about its speakers’ culture and environment. Scholars have noted that words for technological innovations, such as terms for ‘wheel’, ‘axle’, ‘yoke’, ‘horse’ and ‘to go, transport in a vehicle’, are consistent across many Indo-European sub-families. On the basis of this, it is argued that these words and the associated technologies must have been present in the Proto-Indo-European culture. However, wheeled vehicles do not appear in the archaeological record until about 5000 years BP. Hence, the argument goes, the Indo-European languages must have expanded at around this time.

We are sceptical of these claims because inferences based on linguistic palaeontology have thus far failed to satisfy the following three requirements:

  1. In order to reconstruct a term to Proto-Indo-European, the common ancestor of all Indo-European languages, it must be present in those languages that are first to branch off from the base of the tree. It is not enough to point to similar terms in some sub-groups of the family. Thus, in the case of Indo-European, if a word is not present in the Anatolian languages at the base of the tree, there is no reason to think it was present in Proto-Indo-European.
  2. The putative shared forms across the family cannot be the result of more recent borrowing. However, terms for new technologies are highly likely to be borrowed along with the technology itself, and wheeled vehicles appear to be a prime example. It is true that linguists can sometimes identify borrowed words (particularly more recent borrowings) on the basis of the presence or absence of certain systematic sound correspondences. However, not all borrowings can be identified in this way. In the case of wheeled vehicles, borrowed terms are unlikely to be identifiable as such – if terms associated with wheeled transport were borrowed 5000 to 6000 years ago, as we would expect, then the terms in each of the major Indo-European lineages will have undergone all of the sound changes that characterize each lineage. This would make the words appear native to the lineage and thus inherited from Proto-Indo-European when in fact they could be early borrowings.
  3. Whilst linguists can reconstruct the sound of words in proto-languages with some degree of certainty (the above caveats aside), reconstructed meanings are much less certain. Arguments for linguistic palaeontology also need to rule out the possibility of independent semantic innovations from a common root, which can produce apparently related words with meanings that were not present in the common ancestral language. For example, upon the development of wheeled transport, words derived from the Proto-Indo-European (PIE) term *kwel- (meaning ‘to turn, rotate’) may have been independently co-opted to describe the wheel (*kwekwlo-).

We have not yet seen any compelling evidence that meets these requirements.

The historical linguist Larry Trask captures most of the above arguments more succinctly:
“There is a PIE word *ekwo- ‘horse’, as well as *wegh- ‘convey, go in a vehicle’, *kwekwlo- ‘wheel’, *aks- ‘axle’, and *nobh- ‘hub of a wheel’. This has led some scholars to conclude that the PIE-speakers not only rode horses but had wagons and chariots as well. This is debateable, however, since everyone places PIE at least 6000 years in the past, while hard evidence for wheeled vehicles is perhaps no earlier than 5000 years ago. Watkins (1969) considers that these terms pertaining to wheeled vehicles were chiefly metaphorical extensions of older IE words with different senses (*nobh-, for example, meant ‘navel’). The word *kwekwlo- ‘wheel’ itself is derived from the root *kwel- ‘turn, revolve’.  Nevertheless, the vision of fierce IE warriors, riding horses and driving chariots, sweeping down on their neighbours brandishing bloody swords, has proven to be an enduring one, and scholars have found it difficult to dislodge from the popular consciousness the idea of the PIE-speakers as warlike conquerors in chariots.”  (Trask, 1996).