Tuesday, May 31, 2011

Troubles with StepPCO, Part II: The Solution

Now I would like to return to the previously discussed problems with instantiating the StepPCO model.
I was able to contact one of the software developers, and it appears that my troubles stem from the fact that StepPCO cannot find a window that separates my parental groups.
Indeed, if one looks at the PCA-based plot (see below), the samples I chose as parental groups (pop1 (Lithuanians)=colnames(Chr1[,27:32]), pop2 (Belarusians)=colnames(Chr1[,72:78])) sit right in the middle of the "cloud", so there is no separation between them, no matter how big a window you take. StepPCO uses an extension of principal component analysis (PCA) to obtain a signal of admixture from an individual genome: starting from the center of each window, it increases the window until the mean PC1 coordinates of the parental populations are separated by three standard deviations ("3 sigmas") from each mean. The goal is to achieve a complete separation of the parental populations within each window, so that there is no ambiguity in assigning chromosomal segments in an admixed genome to either ancestral population.
Thus, in our case, there is no signal of separation (either complete or partial) between pop1 and pop2, and this lack of signal implies that the parental populations (or individuals) were not chosen correctly.
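
To make the window-growing idea concrete, here is a minimal sketch in Python. It is not StepPCO's R code. The function names (pc1_scores, separated, find_window), the toy genotypes, and the exact reading of the "3 sigmas" rule (the distance between the two PC1 means must exceed three standard deviations of each group added together) are my assumptions, and PCA is run on the parental samples only.

```python
import math
import statistics


def pc1_scores(rows):
    """PC1 scores for a samples x SNPs matrix, via power iteration on the Gram matrix."""
    n, m = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - col_means[j] for j in range(m)] for r in rows]
    # Gram matrix of centered samples; its leading eigenvector gives PC1 up to scale.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(200):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no variation at all in this window
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(g * x for g, x in zip(row, v)) for vi, row in zip(v, gram))
    return [x * math.sqrt(max(eigenvalue, 0.0)) for x in v]


def separated(scores_a, scores_b, n_sigma=3.0):
    """Assumed criterion: |mean_a - mean_b| > n_sigma * (sd_a + sd_b)."""
    gap = abs(statistics.mean(scores_a) - statistics.mean(scores_b))
    spread = statistics.stdev(scores_a) + statistics.stdev(scores_b)
    return gap > n_sigma * spread


def find_window(genotypes, pop_a, pop_b, center, max_half_width, n_sigma=3.0):
    """Grow a window around `center` until the parental groups separate on PC1.

    Returns (start, end) column indices (end exclusive), or None if no window works.
    """
    n_snps = len(genotypes[0])
    parents = list(pop_a) + list(pop_b)
    for half in range(1, max_half_width + 1):
        lo, hi = max(0, center - half), min(n_snps, center + half + 1)
        scores = pc1_scores([genotypes[i][lo:hi] for i in parents])
        if separated(scores[:len(pop_a)], scores[len(pop_a):], n_sigma):
            return lo, hi
    return None


# Toy data (hypothetical): SNPs 2 and 6 differ between the groups, the rest do not.
group_a = [[1, 1, 2, 1, 1, 1, 2, 1, 1] for _ in range(3)]
group_b = [[1, 1, 0, 1, 1, 1, 0, 1, 1] for _ in range(3)]
result = find_window(group_a + group_b, [0, 1, 2], [3, 4, 5], center=4, max_half_width=4)
print(result)
```

When the two parental groups are drawn from the same "cloud", as my Lithuanians and Belarusians were, no window ever satisfies the criterion and the function returns None, which is the situation described above.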

Yesterday I decided to give StepPCO a second chance (this excellent piece of R code really deserves it). This time I limited the scope of my analysis to a thinned set of high-quality SNPs (2212 independent SNPs in linkage equilibrium) on Chromosome 22.
Then I extracted 96 reference persons from my dataset (HGDP+Behar dataset: Orcadians, Romanians, Russians, Belarusians and Lithuanians) and recalculated the mean PCA coordinates for the populations in this new pruned dataset, omitting the "real" participants of the MDL project. Instead of the default window sizing based on 3 SDs, I set the window.size parameter to 100 SNPs.


PCA plot. Orcadians=forestgreen, Romanians=blue, Hungarians=brown, Lithuanians=red, Belarusians=orange, Russians=purple.








Then I calculated the StepPCO parameters for the selected genotypes, using HGDP's Orcadians and Russians as parental populations (or, to phrase it more properly, distinct populations), and ran a wavelet transform analysis on chosen individuals from the Hungarian, Romanian, Belarusian and Lithuanian populations.





The real advantage of StepPCO is that the calculated WT coefficients can be used to obtain accurate estimates of the time of admixture from suitable genome-wide SNP data, provided the ancestral populations can be statistically differentiated:
 "The spectral analysis of the StepPCO signal revealed that the average dominant frequency for the African-Americans is located at level 1.8, which would correspond to an abundance of low frequency wavelets (that is, wider ancestry blocks), while for the Fijians and the Polynesians the average dominant frequency is at level 3.06 and 3.63 respectively, which is indicative of much narrower ancestry blocks (Figure 7). Based on simulations, the WT center of 1.8 corresponds to an admixture time of 6 generations ago (95% CI: 4-8 generations) for the African Americans. Assuming a generation time of 30 years [33], our results indicate that the admixture in the African Americans started about 180 years ago. Similarly, the simulations indicate that the WT center of 3.63 for the Polynesians corresponds to an admixture time of 90 generations (95% CI: 77-131 generations), or about 2,700 years ago (Figure 8). The time estimation for Fiji is based on simulated data with a 40% admixture rate (to match the higher admixture rate of Fiji), and here the WT center of 3.06 corresponds to an admixture time of 37 generations (95% CI: 29-39) or about 1,100 years ago. "
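
To illustrate why wider ancestry blocks push the WT center toward lower levels, here is a rough Python sketch using a plain Haar wavelet transform. This is not StepPCO's implementation: the functions haar_level_energies and wt_center, the choice of the Haar wavelet, and the definition of the center as an energy-weighted mean level (with level 1 as the coarsest) are all my assumptions, so the absolute numbers are not comparable with those in the quote.

```python
import math


def haar_level_energies(signal):
    """Energy of the Haar detail coefficients per level (level 1 = coarsest)."""
    n = len(signal)
    n_levels = n.bit_length() - 1
    if n < 2 or 2 ** n_levels != n:
        raise ValueError("signal length must be a power of two, at least 2")
    approx = [float(x) for x in signal]
    energies = {}
    for level in range(n_levels, 0, -1):  # finest level is processed first
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies[level] = sum(d * d for d in details)
    return energies


def wt_center(signal):
    """Energy-weighted mean level of the Haar details (an assumed definition)."""
    energies = haar_level_energies(signal)
    total = sum(energies.values())
    if total == 0.0:
        raise ValueError("signal has no variation, so the center is undefined")
    return sum(level * e for level, e in energies.items()) / total


# Hypothetical ancestry signals along a chromosome (1 = population A, 0 = population B).
wide_blocks = [1] * 8 + [0] * 8   # one long block of each ancestry
narrow_blocks = [1, 0] * 8        # ancestry switches at every SNP
print(wt_center(wide_blocks), wt_center(narrow_blocks))
```

In this toy setup the signal with long blocks has a lower center than the signal with short blocks, which matches the direction of the effect described in the quote: recent admixture leaves wide blocks and a low WT center, older admixture leaves narrow blocks and a high one.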

Using the same method, StepPCO's authors estimated an average of 19% European ancestry in African-Americans, with a wide range across individuals, from less than 5% to more than 40% European ancestry. Both the average and the wide range of individual admixture estimates are in keeping with previous studies. The estimated time of admixture is about 180 years ago (95% CI: 120-240 years ago), which is probably an underestimate, since admixture in the African-American population is ongoing.
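
The conversion from generations to years in these estimates is simple arithmetic, and it can be checked with a few lines of Python. The generation counts and the 30-year generation time are taken from the quoted text; the function name generations_to_years and the dictionary layout are just mine.

```python
GENERATION_YEARS = 30  # generation time assumed in the quoted paper


def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert an admixture time in generations to years."""
    return generations * generation_years


# (point estimate, 95% CI low, 95% CI high) in generations, from the quote above.
estimates = {
    "African Americans": (6, 4, 8),
    "Polynesians": (90, 77, 131),
    "Fijians": (37, 29, 39),
}

years = {
    pop: tuple(generations_to_years(g) for g in gens)
    for pop, gens in estimates.items()
}
for pop, (point, low, high) in years.items():
    print(f"{pop}: about {point} years ago (95% CI: {low}-{high})")
```

For the African Americans this reproduces the 180 years (120-240) given above; the 2,700 and 1,100 years quoted for the Polynesians and Fijians are the rounded results of the same multiplication.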

Returning to our own sample of 96 reference individuals, we tested the performance of the StepPCO method on Orcadians and Russians. We calculated WT centers for the statistically differentiated individuals (Orcadians: HGDP00806, HGDP00805, HGDP00801, HGDP00804, HGDP00797, HGDP00810; Russians: HGDP00879). The most common WT center is 2.44, which probably corresponds to an admixture time of 15-20 generations (the correct value can be obtained from simulations, but I haven't run them yet).

