Indians as hybrids (a.k.a. Aryan invasion in the house!)
A few months ago a friend tipped me off that David Reich was going to publish a paper on the genetics of Indians which, as far as he had ascertained, would model these populations as hybrids between “Europeans and Andaman Islanders.” The paper is out, and my friend was roughly right. Reconstructing Indian population history:
India has been underrepresented in genome-wide surveys of human variation.
We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the ‘Ancestral North Indians’ (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the ‘Ancestral South Indians’ (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations,
we show that ANI ancestry ranges from 39-71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.
The paper itself is relatively tight and concise; a lot of the sausage-making is thrown into the supplementary information. This is freely available online, and in fact I would suggest that the first half of supplement 1 has more meat than the paper itself.
As for the paper, the text is not as bold as the abstract, or the press summations which have appeared in its wake. For example, they say:
We warn that ‘models’ in population genetics should be treated with caution. Although they provide an important framework for testing historical hypotheses, they are oversimplifications. For example, the true ancestral populations of India were probably not homogeneous as we assume in our model, but instead were probably formed by clusters of related groups that mixed at different times. However, modelling them as homogeneous fits the data and seems to capture meaningful features of history.
I generally agree with the gist of this. The main issue I would also highlight is that these results only clarify and solidify what was likely from previous analyses of worldwide genetic variation. That is, the populations of Northwest India are closer to those of the Middle East & Europe than those of Southeast India are. It was rather awesome that they confirm that the Onge, who are almost extinct, are a relatively unadmixed ancient population. The Onge branch seems to descend from an ancestral population which also gave rise to what is termed in the paper “Ancestral South Indian” (ASI). They exhibit no admixture with “Ancestral North Indians” (ANI). This paper also confirmed and clarified that the proportion of West Eurasian related lineages increases as a function of both geography and caste. That is, there is a SE-NW and lower-to-upper caste gradient whereby West Eurasian related lineages become more prevalent. This has long been known, but this paper did it with more SNPs across the genome.
Here is a table which shows the proportion of ANI in a range of populations:
All you really need to know about the Z-score is that negative scores indicate high levels of admixture. Here is a table which tells you a bit more about the populations above:
The following figure illustrates the general model which looms in the background of this paper:
Note that the Andaman Islanders, the Onge, aren’t really the ancestors of Indians on the mainland. Rather, they’re a branch of the ancient population which presumably first settled South Asia, and close to the ASI. Who were the ASI? Since they aren’t really around, we can only generate conjectures and inferences. In this paper the ANI are actually represented in some ways by Europeans, even though presumably the assumption is that both these are daughter populations of another group. Though not pushed very hard, they do mention proto-Indo-Europeans as the candidate for the ANI.
At this point, let’s look at the PCA chart (I’ve reedited and labelled as usual):
This should not surprise; previous work shows that South Asians distribute along an axis away from Europeans. One of the points in the paper is that there is both geographic and caste stratification. I added some labels, but I thought drilling down was probably useful. I don’t know all these groups off the top of my head, and I assume few readers do either. So I zoomed in:
I think some of the shortcomings of a sample size on the order of the low hundreds are rather clear. They couldn’t even use all their samples, or some of the samples were not relevant to the question at hand. The Siddis are an Indian-African mix which emerged during the period of Muslim domination when that group imported black slaves. The Tibeto-Burman groups of Northeast India are interesting, but outliers. The general trends are clear: North Indian groups have more ANI than South Indian groups, and upper caste groups have more ANI than lower caste groups, but that is only with “all things equal.” Note that upper caste South Indian groups clearly have more ANI than lower caste South Indians, but they have a lower proportion than some North Indian lower castes, and are in the range of one North Indian tribal group. Some of the outliers are also interesting; the lower caste individual similar to Austro-Asiatic tribals is from a group which resides in a region with many Austro-Asiatic peoples. Clearly there has been identity switching, so you have aberrations such as one North Indian tribal who clusters with Kashmiri Pandit Brahmins! The Austro-Asiatic group is also interesting, because they speak languages related to those of Southeast Asia. Here is a map of the Austro-Asiatic languages:
We know with near 100% certainty that much of Burma & Thailand was dominated by Mon-Khmer languages before the arrival of the Shan, Bamar (Burmans) and Thai peoples (to mention a few). This is a matter of historical record; the rise of modern Burma and Thailand was largely a story of the eclipse of Mon and Khmer societies, who transmitted to the newcomers much of the Indic character which they have (e.g., the northern populations often arrived as Mahayana Buddhists, but the Theravada Buddhism of the Mon and Khmer was adopted as the dominant religion in the new states). The position of the Munda languages is more confused, as some posit that they arrived from the east, while others argue that the Austro-Asiatic languages expanded east from India. This is not going to be resolved in this blog post, but let me note that the genetic data above, which show an “eastern” affinity of the Munda, can be combined with cultural data such as the arrival of rice farming from the east and historical records which document the migration of populations from Burma, to construct a plausible east-to-west narrative. In contrast, it seems an almost default position by many that the Austro-Asiatics are the most ancient South Asians, marginalized by Dravidians, and later Indo-Europeans. I would not be surprised if it was actually first Dravidians, then Austro-Asiatics and finally Indo-Europeans. Dravidian languages are found in every corner of the subcontinent (Brahui in Pakistan, a few groups in Bengal, and scattered through the center) while the Austro-Asiatics exhibit a more restricted northeastern range.
As I noted above, supplement 1 has a lot of gems. For example, the authors note that previous work which found little regional differentiation in Indian Americans might have been problematic because there is a great deal of intraregional variance, and collapsing it loses essential information. This chart shows South Asians + Utah Whites + 85 American Gujaratis in light blue:
Note that about half of Gujaratis form their own unexplained cluster! Throwing them together in one pool would mask this phenomenon. Here’s their possible explanation:
Interestingly, one of the GIH subgroups fall outside the main gradient of Indian groups, suggesting that they harbor substantial ancestry that is not a simple mixture of ASI and ANI. A speculative hypothesis is that some Gujarati groups descend from the founders of the “Gurjara Pratihara” empire, which is thought to have been founded by Central Asian invaders in the 7th century A.D. and to have ruled parts of northwest India from the 7-12th centuries. I. Karve noted that endogamous groups with names like “Gurjar” are now distributed throughout the northwest of the subcontinent, and hypothesized that they likely trace their names to this invading group.
I don’t know if this is plausible; perhaps a Gujarati reader would immediately recognize what this cryptic substructure is.
Next are two charts which show Indians, Europeans, and Chinese. In the first the PCA was originally constructed with Europeans & Chinese, and the Indians were projected onto it using the variation found in the first two groups. In the second case, Indians and Chinese were used to construct the PCA, and Europeans projected.
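For readers who want to see what "projecting" a group onto a PCA means mechanically, here is a minimal sketch. It is not the authors' pipeline: the genotypes are simulated, the 50/50 mixture is an arbitrary assumption, and the function names (fit_pca, project, simulate) are made up for illustration. The axes are computed from two reference groups only, and a third group is then placed on those axes without influencing them.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Fit PCA on reference samples (rows = individuals, columns = SNPs)."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes that were fitted without them."""
    return (samples - mean) @ axes.T

def simulate(freqs, n, rng):
    """Draw diploid genotypes (0, 1 or 2 copies) from allele frequencies."""
    return rng.binomial(2, freqs, size=(n, freqs.size)).astype(float)

rng = np.random.default_rng(0)
n_snps = 2000

# Hypothetical allele frequencies for two reference groups, and a third
# group simulated as an even mixture of them (an assumption, not data).
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
freq_mix = 0.5 * freq_a + 0.5 * freq_b

group_a = simulate(freq_a, 40, rng)
group_b = simulate(freq_b, 40, rng)
group_mix = simulate(freq_mix, 40, rng)

# The axes come from groups A and B only; the mixed group is projected.
mean, axes = fit_pca(np.vstack([group_a, group_b]))
pc_a = project(group_a, mean, axes)
pc_b = project(group_b, mean, axes)
pc_mix = project(group_mix, mean, axes)

print("mean PC1 of A, B, projected group:",
      pc_a[:, 0].mean(), pc_b[:, 0].mean(), pc_mix[:, 0].mean())
```

In this toy setup the projected group lands between the two reference groups on the first axis, because the only variation the axes can "see" is the variation that separates A from B. Swapping which groups define the axes, as in the second chart, changes what the plot is able to show.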