2.4.3 | Case-study population genetics datasets
We investigate, as two separate case studies, the admixture histories of the African American (ASW) and Barbadian (ACB) population samples from the 1000 Genomes Project Phase 3 (1000 Genomes Project Consortium, 2015). Previous studies identified, within the same database, the West European British (GBR) and West African Yoruba (YRI) populations as reasonable proxies for the sources of both ACB and ASW, consistent with the macro-history of the Transatlantic Slave Trade (Baharian et al., 2016; Martin et al., 2017; Verdu et al., 2017).
Individuals in the 1000 Genomes Project were sampled to be unrelated a priori. To avoid confounding due to cryptic relatedness when comparing our sample with MetHis simulations, we excluded, in each of the four populations separately, individuals more closely related than first-degree cousins using RELPAIR (Epstein, Duren, & Boehnke, 2002), as previously done (Verdu et al., 2017). We also excluded the three ASW individuals showing traces of Native American or East Asian admixture, as reported in previous studies (Martin et al., 2017). Among the remaining individuals, we randomly drew 50 individuals from each of the admixed ACB and ASW populations, and included all remaining 90 YRI and 89 GBR individuals.
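The random draw of admixed individuals can be illustrated with a minimal Python sketch. It is not the code used for this study: the function name `draw_individuals`, the seed, the sample identifiers, and the ACB and ASW counts after filtering are hypothetical placeholders. Only the draw of 50 individuals per admixed population and the 90 YRI and 89 GBR counts come from the text above.

```python
import random


def draw_individuals(ids_by_pop, admixed_pops=("ACB", "ASW"), n=50, seed=0):
    """Randomly keep n individuals per admixed population; keep all others.

    ids_by_pop maps a population label to the list of sample IDs that
    passed the relatedness and admixture filters (hypothetical input).
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    kept = {}
    for pop, ids in ids_by_pop.items():
        if pop in admixed_pops:
            # Sort first so the draw does not depend on input order.
            kept[pop] = sorted(rng.sample(sorted(ids), n))
        else:
            kept[pop] = sorted(ids)
    return kept


# Placeholder IDs; the ACB and ASW counts (80 and 55) are made up.
filtered = {
    "ACB": [f"ACB_{i:03d}" for i in range(80)],
    "ASW": [f"ASW_{i:03d}" for i in range(55)],
    "YRI": [f"YRI_{i:03d}" for i in range(90)],
    "GBR": [f"GBR_{i:03d}" for i in range(89)],
}
selected = draw_individuals(filtered)
print({pop: len(ids) for pop, ids in selected.items()})
```

Fixing the seed makes the draw repeatable, which matters here because all downstream summary statistics depend on which 50 individuals are retained.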
We extracted biallelic polymorphic sites (SNPs as defined by the 1000 Genomes Project Phase 3) from the merged ACB+ASW+GBR+YRI data set, excluding singletons. Since MetHis only simulates independent markers, we LD-pruned the ACB and ASW SNP sets using the PLINK (Purcell et al., 2007) --indep-pairwise option with a sliding window of 100 SNPs, moving in increments of 10 SNPs, and an r² threshold of 0.1. Finally, we randomly drew 100,000 SNPs from the remaining SNP set.
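The pruning and SNP-sampling steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline used here: the file prefixes, helper names, and seed are hypothetical, and the sketch only builds the PLINK argument list instead of running PLINK. The window (100 SNPs), step (10 SNPs), r² threshold (0.1), and the 100,000-SNP draw are taken from the text.

```python
import random


def plink_prune_command(bfile, out, window=100, step=10, r2=0.1):
    """Build a PLINK call for --indep-pairwise <window> <step> <r^2>.

    bfile and out are hypothetical file prefixes. PLINK lists the
    retained variants in <out>.prune.in.
    """
    return [
        "plink",
        "--bfile", bfile,
        "--indep-pairwise", str(window), str(step), str(r2),
        "--out", out,
    ]


def draw_snps(pruned_ids, n=100_000, seed=0):
    """Randomly draw n SNP IDs from the LD-pruned set, without replacement."""
    pruned_ids = list(pruned_ids)
    if len(pruned_ids) < n:
        raise ValueError("fewer pruned SNPs than requested")
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    return rng.sample(pruned_ids, n)


# Hypothetical prefixes; pass the list to subprocess.run to execute PLINK.
cmd = plink_prune_command("merged_noSingletons", "merged_pruned")
print(" ".join(cmd))

# Toy stand-in for the IDs that would be read from merged_pruned.prune.in.
toy_ids = [f"rs{i}" for i in range(1_000)]
subset = draw_snps(toy_ids, n=100)
print(len(subset))
```

Sampling without replacement from the pruned list ensures each retained marker appears once, matching the requirement that simulated markers be independent.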