
Thursday, June 2, 2011

Graphoanalytical approach to the analysis of genetic relationships

It goes without saying that admixture analysis and MDS/PCA (statistical) calculations are the most typical methods for inferring the underlying genetic structure of human populations. But sometimes it is very useful to visualize the shared genetic components (or, rather, segments) as graphs. Graph visualization can help to better understand the underlying patterns of genetic effects (such as genetic drift, the founder effect, etc.).
Earlier I attempted to reformulate the basic aspects of the graphoanalytical approach, proposing graph visualization techniques that can easily be applied to the analysis of IBD (identical by descent) segments. I even succeeded in producing a neat graphical output for IBD files, but the method itself required more amending and refining.

Since I was willing to invest more time and effort into turning my preliminary thoughts on graph visualization into something meaningful, this morning I decided to repeat my previous experiment. Before running the routine PLINK commands for detecting IBD segments, I removed almost all individuals lying 3 or more standard deviations from the mean Z-score. Another subset of individuals (c. 9% of the whole dataset) was removed due to surprisingly significant values of the inbreeding coefficient F (in PLINK, the calculation of F is based on the observed versus expected number of homozygous genotypes). The champions of the inbreeding coefficient F are the Orcadians and some Lithuanians/Belarusians, whose F-scores are significantly higher than the European average. As some of the project's participants might be interested in re-evaluating the Z-scores and inbreeding coefficients, I've uploaded the corresponding NEAREST and HET files.
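For anyone who wants to re-check the two filters from the uploaded files, here is a minimal sketch in Python. It assumes the usual homozygosity-based estimate F = (O - E) / (N - E), where O is the observed and E the expected number of homozygous genotypes and N is the number of non-missing genotypes. The function names, the dictionary layout and the use of the sample standard deviation are illustrative assumptions, not the exact commands used for this post.

```python
from statistics import mean, stdev

def inbreeding_f(obs_hom, exp_hom, n_nonmissing):
    """Homozygosity-based inbreeding estimate: F = (O - E) / (N - E)."""
    return (obs_hom - exp_hom) / (n_nonmissing - exp_hom)

def flag_outliers(scores, n_sd=3.0):
    """Return the IDs whose score lies n_sd or more sample SDs from the mean.

    `scores` maps an individual ID to a number (a Z-score or an F value).
    The threshold of 3 SD mirrors the text; the dictionary layout is hypothetical.
    """
    mu = mean(scores.values())
    sd = stdev(scores.values())
    return {ind for ind, x in scores.items() if abs(x - mu) >= n_sd * sd}

# Toy data: 30 unremarkable individuals and one extreme one (made-up values).
toy_scores = {f"ind{i}": 0.0 for i in range(30)}
toy_scores["odd_one"] = 10.0
to_remove = flag_outliers(toy_scores)
```

Note that with very small samples no individual can ever reach 3 standard deviations, so this kind of filter only makes sense on a dataset of reasonable size.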

Then I reduced the dataset by performing linkage-disequilibrium-based SNP pruning and removing low-quality SNPs. Next I created the genetic map (in cM) for the remaining set of circa 72,000 SNPs (this time I sorted and joined the interpolated data from HapMap) and ran the segmental sharing analysis in PLINK (it took c. 70 minutes to complete this task, since the scope of the analysis was restricted to the low sharing 1 cM/1 Mb SNPs). The resulting file (available for downloading here) was converted into a delimited table format and then read into R for network analysis with igraph.
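The interpolation step can be illustrated roughly as follows. This is only a sketch under stated assumptions: the reference map is given as two parallel lists (physical positions in bp, sorted ascending, and the matching genetic positions in cM), interpolation is linear between the two flanking reference markers, and SNPs outside the reference range are clamped to the nearest end. The function name and the toy numbers are hypothetical and not taken from HapMap.

```python
from bisect import bisect_left

def interpolate_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) for a SNP at pos_bp.

    map_bp: reference physical positions, sorted ascending (assumed input).
    map_cm: genetic positions of the same reference markers.
    Positions outside the reference range are clamped (an assumption).
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_left(map_bp, pos_bp)   # first reference marker at or after pos_bp
    if map_bp[i] == pos_bp:
        return map_cm[i]              # SNP is itself a reference marker
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)

# Toy reference map for one chromosome (made-up numbers).
ref_bp = [1000, 2000, 4000]
ref_cm = [0.0, 1.0, 2.0]
snp_cm = [interpolate_cm(p, ref_bp, ref_cm) for p in (1500, 3000)]
```

In practice this would be run per chromosome, with the SNPs sorted by position before the results are joined back to the marker file.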

I used the values of NSEG (the number of pairwise shared segments; see the INDIV.SUMMARY file) to create a weighted undirected graph. The graph was then analyzed using custom igraph methods for community detection by extended modularity and for the calculation of weighted betweenness. Vertex and edge betweenness are defined by the number of geodesics (shortest paths) going through a vertex or an edge. With all the graph preprocessing done in R, I converted the igraph object into GraphML and pulled it into Gephi to apply some minor visual and aesthetic tweaks.
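To make the betweenness definition concrete, here is a small standard-library Python sketch (not the R/igraph code used for this post). It builds an undirected graph from (individual, individual, NSEG) triples and computes weighted vertex betweenness with Brandes' algorithm. One assumption needs to be stated loudly: shortest-path routines treat edge weights as lengths, so the sketch uses 1/NSEG as the edge length, meaning that pairs sharing more segments are "closer". Whether the original analysis transformed NSEG this way is not stated in the post. The function names and toy data are hypothetical.

```python
import heapq
from collections import defaultdict
from fractions import Fraction
from itertools import count

def build_graph(pairs):
    """Adjacency dict from (id1, id2, nseg); edge length = 1/nseg (assumption)."""
    adj = defaultdict(dict)
    for a, b, nseg in pairs:
        if a != b and nseg > 0:
            adj[a][b] = adj[b][a] = Fraction(1, nseg)  # exact, so ties are detected
    return dict(adj)

def weighted_betweenness(adj):
    """Vertex betweenness: fraction of geodesics between other pairs through v."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        dist = {s: Fraction(0)}
        sigma = defaultdict(int)      # number of shortest paths from s
        sigma[s] = 1
        preds = defaultdict(list)     # predecessors on shortest paths
        done, finished, tie = [], set(), count()
        heap = [(Fraction(0), next(tie), s)]
        while heap:                   # Dijkstra from s
            d, _, v = heapq.heappop(heap)
            if v in finished:
                continue
            finished.add(v)
            done.append(v)
            for w, length in adj[v].items():
                nd = d + length
                if w not in dist or nd < dist[w]:
                    dist[w], sigma[w], preds[w] = nd, sigma[v], [v]
                    heapq.heappush(heap, (nd, next(tie), w))
                elif nd == dist[w] and w not in finished:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(done, 0.0)
        for w in reversed(done):      # back-propagate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: b / 2.0 for v, b in bc.items()}

# Toy data: A-C has two equally short routes (direct, and via B).
toy_pairs = [("A", "B", 2), ("B", "C", 2), ("A", "C", 1)]
toy_bc = weighted_betweenness(build_graph(toy_pairs))
```

In the toy graph, half of the geodesics between A and C pass through B, so B gets a betweenness of 0.5 and the other two vertices get 0.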







You can see the final graph in its full glory in PDF format (with lossless, invariant scaling), although a better representation of the graph is achieved in Gephi (compare the pictures uploaded above).







 
