Insights into Population Origins
The suggested population origins (table 1) can now be considered in the light of these Y results. Information is provided by haplogroup frequencies, which can be used to produce admixture estimates, and these are easy to interpret if populations are large and isolated and the source populations have different frequencies. When these conditions are not met, the presence of distinct Y lineages can still be informative.
The origins of the Parsis are well-documented (Nanavutty 1997) and thus provide a useful test case. The Parsis are followers of the Iranian prophet Zoroaster and migrated to India after the collapse of the Sassanian empire in the 7th century a.d. They settled in 900 a.d. in Gujarat, India, where they were called the “Parsi” (meaning “from Iran”). Eventually, they moved to Mumbai in India and Karachi in Pakistan, from where the present population was sampled (fig. 7). Their frequencies for haplogroups 3 (8%) and 9 (39%) do indeed resemble those in Iran more than those of their current neighbors in Pakistan. They show the lowest frequency for haplogroup 3 in Pakistan (apart from the Hazaras; fig. 1C). The mean for eight Iranian populations was 14% (n=401) (Quintana-Murci et al. 2001), whereas that for Pakistan, excluding the Parsis, was 36%. The corresponding figures for haplogroup 9 were 39% in the Parsis, 40% in Iran, and 15% in Pakistan excluding the Parsis. These figures lead to an admixture estimate of 100% from Iran (table 3). Given the small effective population size of the Parsis, the closeness of their match to the Iranian data may be fortuitous, and the presence of haplogroup 28 chromosomes at 18% (4% in Iran; Wells et al. 2001) suggests some gene flow from the surrounding populations. The TMRCA for the Parsi-specific cluster in the haplogroup 28 networks was 1,800 (600–4,500) years (table 8), consistent with the migration of a small number of lineages from Iran. Overall, these results demonstrate a close match between the historical records and the Y data, and thus suggest that the Y data will be useful when less historical information is available.
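To illustrate how haplogroup frequencies translate into an admixture estimate, the following minimal sketch applies an unweighted least-squares estimator to the two haplogroup frequencies quoted above for the Parsis, Iran, and Pakistan. It is an assumption-laden illustration and not necessarily the estimator behind table 3: the function name, the restriction to haplogroups 3 and 9, and the equal weighting of haplogroups are all choices made here for clarity.

```python
def least_squares_admixture(hybrid, source_a, source_b):
    """Unweighted least-squares estimate of the fraction of `hybrid`
    derived from `source_a` (the remainder is attributed to `source_b`).

    Each argument maps haplogroup labels to frequencies (0..1).
    Returns (raw_estimate, estimate_clipped_to_0_1).
    Illustrative only: hypothetical helper, not the paper's estimator.
    """
    numerator = 0.0
    denominator = 0.0
    for hg in hybrid:
        source_diff = source_a[hg] - source_b[hg]
        numerator += (hybrid[hg] - source_b[hg]) * source_diff
        denominator += source_diff ** 2
    if denominator == 0:
        raise ValueError("source populations have identical frequencies")
    raw = numerator / denominator
    # Sampling noise can push the raw value outside [0, 1]; clip it.
    return raw, min(1.0, max(0.0, raw))


# Frequencies quoted in the text (haplogroups 3 and 9 only).
parsi = {"hg3": 0.08, "hg9": 0.39}
iran = {"hg3": 0.14, "hg9": 0.40}
pakistan = {"hg3": 0.36, "hg9": 0.15}  # Pakistan excluding the Parsis

raw, clipped = least_squares_admixture(parsi, iran, pakistan)
print(f"raw estimate: {raw:.2f}, clipped to [0, 1]: {clipped:.2f}")
```

With these two haplogroups alone, the raw value slightly exceeds 1 and is clipped to 1, which agrees with the 100% Iranian contribution reported in table 3. A raw value above 1 is itself a reminder that sampling error and drift in a small population can affect such estimates.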
The population that is genetically most distinct, the Hazaras, claims descent from Genghis Khan’s army; their name is derived from the Persian word “hazar,” meaning “thousand,” because troops were left behind in detachments of a thousand. Toward the end of the 19th century, some Hazaras moved from Afghanistan to the Khurram Valley in Pakistan, the source of the samples investigated here. Thus, their oral history identifies an origin in Mongolia and population bottlenecks ∼800 and ∼100 years ago. Of the two predominant Y haplogroups present in this population, haplogroup 1 is widespread in Pakistan, much of Asia, Europe, and the Americas, and so provides little information about the place of origin. Haplogroup 10, in contrast, is rare in most Pakistani populations (1.4%, when the Hazaras are excluded) but is common in East Asia, including Mongolia, where it makes up over half of the population (unpublished results). Admixture estimates (table 3) are consistent with a substantial contribution from Mongolia. BATWING analysis of the Hazara-specific haplotype clusters in haplogroups 1 and 10 suggested TMRCAs of 400 (120–1,200) and 100 (6–600) years (table 8), respectively. Thus, the genetic evidence is consistent with the oral tradition and, in view of its independent nature, provides strong support for it (fig. 7).
Some other suggested origins receive more limited support from the Y data. The Negroid Makrani, with a postulated origin in Africa, carry the highest frequency of haplogroup 8 chromosomes found in any Pakistani population, as noted elsewhere (Qamar et al. 1999). This haplogroup is largely confined to sub-Saharan Africa, where it constitutes about half of the population (Hammer et al. 2001) and can thus be regarded as a marker of African Y chromosomes. Nevertheless, it makes up only 9% of the Negroid Makrani sample, and haplogroup 28 (along with other typical Pakistani haplogroups) is present in this population. If the Y chromosomes were initially African (fig. 7), most have subsequently been replaced: the overall estimate of the African contribution is ∼12% (table 3).
The Balti are thought to have originated in Tibet, where the predominant haplogroups are 4 and 26. Neither was present in the sample from this study, which provides no support for a Tibetan origin of the Y chromosome lineages and yields an admixture estimate of zero (table 3). However, this result must be interpreted with caution, because of the small sample size.
Three populations have possible origins from the armies of Alexander the Great: the Burusho, the Kalash, and the Pathans. Modern Greeks show a moderately high frequency of haplogroup 21 (28%; Rosser et al. 2000), but this haplogroup was not seen in either the Burusho or the Kalash sample and was found in only 2% of the Pathans, whereas the local haplogroup 28 was present at 17%, 25%, and 13%, respectively. Greek-admixture estimates of 0% were obtained for the Burusho and the Pathans, but figures of 20%–40% were observed for the Kalash (table 3). In view of the absence of haplogroup 21, we ascribe this result either to drift in the frequencies of the other haplogroups, particularly haplogroups 2 and 1, or to the poor resolution of lineages within these haplogroups, resulting in distinct lineages being classified into the same paraphyletic haplogroups. Overall, no support for a Greek origin of their Y chromosomes was found, but this conclusion does require the assumption that modern Greeks are representative of Alexander’s armies. Two populations, the Kashmiris and the Pathans, also lay claim to a possible Jewish origin. Jewish populations commonly have a moderate frequency of haplogroup 21 (e.g., 20%) and a high frequency of haplogroup 9 (e.g., 36%; Hammer et al. 2000). The frequencies of both of these haplogroups are low in the Kashmiris and Pathans, and haplogroup 28 is present at 13% in the Pathans, so no support for a Jewish origin is found, and the admixture estimate was 0% (table 3), although, again, this conclusion is limited both by the small sample size available from Kashmir and by the assumption that the modern samples are representative of ancient populations.
The suggested origin of the Baluch is in Syria. Syrians, like Iranians, are characterized by a low frequency of haplogroup 3 and a high frequency of haplogroup 9 (9% and 57%, respectively; Hammer et al. 2000), whereas the corresponding frequencies in the Baluch are 29% and 12%. This difference and the high frequency of haplogroup 28 in the Baluch (29%) make a predominantly Syrian origin for their Y chromosomes unlikely, and the admixture estimate was 0% (table 3), although the 8% frequency for haplogroup 21, the highest identified in Pakistan thus far, does indicate some western contribution to their Y lineages. The Brahuis have a possible origin in West Asia (Hughes-Buller 1991), and it has been suggested that a spread of haplogroup 9 Y chromosomes was associated with the expansion of Dravidian-speaking farmers (Quintana-Murci et al. 2001). Brahuis have the highest frequency of haplogroup 9 chromosomes in Pakistan (28%) after the Parsis, providing some support for this hypothesis, but their higher frequency of haplogroup 3 (39%) is not typical of the Fertile Crescent (Quintana-Murci et al. 2001) and suggests a more complex origin, possibly with admixture from later migrations, such as those of Indo-Iranian speakers from the steppes of Central Asia and others from farther east. This possibility is supported by the detection of low frequencies of haplogroups 10, 12, and 13 in the Brahuis, all rare in Pakistan and typical of East Asia, East and northern Asia, and Southeast Asia, respectively.
The failure to find a Y link with a suggested population of origin does not disprove a historical association, but it does demonstrate that the Y chromosomes derived from such historical events have been lost or replaced. Analyses of mitochondrial DNA and other loci would help to elucidate the population histories and would be particularly interesting in populations like the Negroid Makrani and the Balti, in which there is a contrast between the phenotype and the typical Pakistani Y haplotypes.