To form the reference population needed for genomic selection of dairy cattle, it is important to acquire information on the genetic diversity of the original population. Our report is the first among studies on the breeding of farm animals to apply Wright's F-statistics for this purpose. Animals were genotyped using the BovineSNP50 chip. In total, we genotyped 499 heifers from 13 breeding farms in the Leningrad oblast. We calculated Weir and Cockerham's F(st) estimate for all pairwise combinations of herds; the values obtained ranged from 0.016 to 0.115, with a mean of 0.076 ± 0.002. Theoretical F(st) values for the same pairwise combinations of herds were calculated using the ADMIXTURE program. These values were significantly (p
The use of genome-wide single nucleotide polymorphism (SNP) data has recently proven useful in the study of human population structure. We have studied the internal genetic structure of the Swedish population using more than 350,000 SNPs from 1525 Swedes from all over the country genotyped on the Illumina HumanHap550 array. We have also compared them to 3212 worldwide reference samples, including Finns, northern Germans, British and Russians, based on the more than 29,000 SNPs that overlap between the Illumina and Affymetrix 250K Sty arrays. The Swedes, especially southern Swedes, were genetically close to the Germans and British, while their genetic distance to Finns was substantially longer. The overall structure within Sweden appeared clinal, and the substructure in the southern and middle parts was subtle. In contrast, the northern part of Sweden, Norrland, exhibited pronounced genetic differences both within the area and relative to the rest of the country. These distinctive genetic features of Norrland probably result mainly from isolation by distance and genetic drift caused by low population density. The internal structure within Sweden (F(ST) = 0.0005 between provinces) was stronger than that in many Central European populations, although smaller than what has been observed for instance in Finland; importantly, it is of the magnitude that may hamper association studies with a moderate number of markers if cases and controls are not properly matched geographically. Overall, our results underline the potential of genome-wide data in analyzing substructure in populations that might otherwise appear relatively homogeneous, such as the Swedes.
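As an illustration of what an F(ST) value measures, the following minimal sketch computes Wright's fixation index for a single biallelic SNP from subpopulation allele frequencies. It assumes equally sized subpopulations and known frequencies, and it is not the estimator used in the study; the function name `wright_fst` and the example frequencies are hypothetical.

```python
from typing import Sequence


def wright_fst(allele_freqs: Sequence[float]) -> float:
    """Wright's F(ST) for one biallelic locus.

    allele_freqs holds the frequency of one allele in each subpopulation.
    Assumption: all subpopulations are weighted equally.
    """
    if not allele_freqs:
        raise ValueError("at least one subpopulation is required")
    n = len(allele_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    # H_T: expected heterozygosity of the pooled population.
    p_bar = sum(allele_freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Locus is monomorphic overall; no differentiation can be measured.
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical frequencies in two provinces.
    print(wright_fst([0.4, 0.6]))
```

Identical frequencies in all subpopulations give a value of 0, and fixation of alternative alleles gives 1; values close to 0, as reported between provinces, indicate that almost all heterozygosity is found within rather than between groups.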
We tested for associations between single nucleotide polymorphisms (SNPs) in five candidate genes allied with the growth hormone axis and the age-specific growth rate of Arctic charr (Salvelinus alpinus L.: Salmonidae). Two large full sib families (N=217 and 95) were created by backcrossing males that were hybrids between two phenotypically divergent populations from Labrador, Canada and from Nauyuk Lake, Canada to females that were from Nauyuk Lake. Measures of individual growth rate (wet weight and fork length) were made three times during a 420-day period after the juveniles were transferred from 4 to 11 degrees C. We then identified SNP markers in 10 proposed candidate genes known to be related to the growth hormone axis. Comparative alignments of amino-acid sequences and nucleotide sequences from other fish species were used to design PCR primers that would amplify 0.5-3 kb DNA regions of the candidate genes. All the individuals in the two backcross families were genotyped for these SNP markers using either polymerase chain reaction-restriction fragment length polymorphisms (PCR-RFLP) or bidirectional amplification of specific alleles (Bi-PASA) approaches. A significant association between a particular SNP allele and early growth was found for the locus containing the growth hormone-releasing hormone and pituitary adenylate cyclase-activating polypeptide genes (GHRH/PACAP2, P=0.00001). We argue that using comparative sequence information to design PCR primers for candidate genes is an efficient method for locating quantitative trait loci in nonmodel organisms.
The results of genome-wide association studies of complex traits, such as life span or age at onset of chronic disease, suggest that such traits are typically affected by a large number of small-effect alleles. Individually, such alleles have little predictive value and have therefore usually been excluded from further analyses. The results of our study strongly suggest that alleles with small individual effects on longevity may jointly influence life span, and that this joint influence can be both substantial and significant. We show that this joint influence can be described by a relatively simple "genetic dose - phenotypic response" relationship.
Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit.
Genes with a possible role for the development of the insulin resistance syndrome (IRS) were scanned for novel single-nucleotide polymorphisms (SNPs) using bioinformatics.
GenBank mRNA sequences were compared to the human EST database using gapped BLAST, software that is available on the internet. Mismatches between the search and the EST sequences indicated potential SNPs. Thirty-two SNPs in 13 genes were randomly chosen for experimental verification. PCR and direct sequencing were used to determine the 'true' SNPs. A random sample of 30 Swedish men with slightly elevated diastolic blood pressure (85-94 mmHg) obtained from a population-based study was selected for the sequencing. After completion of these stages, the potential SNPs were checked against the large and rapidly expanding SNP databases HGBASE and NCBI.
EST searches of 146 genes revealed 106 potential SNPs in 44 genes. Experimental analysis of 32 of these potential SNPs verified two SNPs: endothelin receptor A 1471 G/C (3' UTR) and PAI-1 Trp514Arg from a T/C exchange. These two SNPs were also identified in the NCBI and HGBASE databases, together with two polymorphisms that were not experimentally identified in our homogeneous Swedish population. Overall, the HGBASE and NCBI databases contained entries for 22% (23 out of 106) of the SNPs identified through our EST searches.
In the search for genetic variations causing complex diseases like IRS in homogeneous populations (such as the Swedish one used here), important information can be obtained through bioinformatic searches of human genome databases and experimental verification.
By analyzing more next-generation sequencing data, researchers have affirmed that rare genetic variants are widespread among populations and likely play an important role in complex phenotypes. Recently, a handful of statistical models have been developed to analyze rare variant (RV) association in different study designs. However, due to the scarce occurrence of minor alleles in data, appropriate statistical methods for detecting RV interaction effects are still difficult to develop. We propose a hierarchical Bayesian latent variable collapsing method (BLVCM), which circumvents the obstacles by parameterizing the signals of RVs with latent variables in a Bayesian framework and is parameterized for twin data. The BLVCM can tackle nonassociated variants, allow both protective and deleterious effects, capture SNP-SNP synergistic effect, provide estimates for the gene level and individual SNP contributions, and can be applied to both independent and various twin designs. We assessed the statistical properties of the BLVCM using simulated data, and found that it achieved better performance in terms of power for interaction effect detection compared to the Granvil and the SKAT. As proof of practical application, the BLVCM was then applied to a twin study analysis of more than 20,000 gene regions to identify significant RVs associated with low-density lipoprotein cholesterol level. The results show that some of the findings are consistent with previous studies, and we identified some novel gene regions with significant SNP-SNP synergistic effects.
The consensus approach to genome-wide association studies (GWAS) has been to assign equal prior probability of association to all sequence variants tested. However, some sequence variants, such as loss-of-function and missense variants, are more likely than others to affect protein function and are therefore more likely to be causative. Using data from whole-genome sequencing of 2,636 Icelanders and the association results for 96 quantitative and 123 binary phenotypes, we estimated the enrichment of association signals by sequence annotation. We propose a weighted Bonferroni adjustment that controls for the family-wise error rate (FWER), using as weights the enrichment of sequence annotations among association signals. We show that this weighted adjustment increases the power to detect association over the standard Bonferroni correction. We use the enrichment of associations by sequence annotation we have estimated in Iceland to derive significance thresholds for other populations with different numbers and combinations of sequence variants.
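The idea of a weighted Bonferroni adjustment can be sketched as follows. This is a generic illustration, not the authors' implementation: each variant i receives a non-negative weight w_i and is tested at threshold alpha * w_i / sum(w), so the thresholds sum to alpha and the family-wise error rate remains controlled. The function name, weights, and p-values below are hypothetical.

```python
from typing import List, Sequence


def weighted_bonferroni(
    pvalues: Sequence[float],
    weights: Sequence[float],
    alpha: float = 0.05,
) -> List[bool]:
    """Return one rejection decision per test.

    Test i is rejected when p_i <= alpha * w_i / sum(w). Equal weights
    reduce this to the standard Bonferroni threshold alpha / m.
    """
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    # Per-test thresholds; by construction they sum to alpha.
    thresholds = [alpha * w / total for w in weights]
    return [p <= t for p, t in zip(pvalues, thresholds)]


if __name__ == "__main__":
    # Hypothetical example: the third variant belongs to an annotation
    # class (e.g. loss-of-function) that is given eight times the weight.
    pvals = [0.001, 0.02, 0.03]
    print(weighted_bonferroni(pvals, [1, 1, 8]))  # weighted
    print(weighted_bonferroni(pvals, [1, 1, 1]))  # standard Bonferroni
```

In this toy example the up-weighted third variant passes its threshold under the weighted scheme but not under the standard correction, which is the mechanism by which annotation-based weights can increase power.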
Alpha 1-antitrypsin (A1AT) deficiency, one of the most common inborn errors of metabolism in Caucasians, is characterized by a low serum concentration of A1AT and a high risk of pulmonary emphysema and liver disease. The allelic frequency of the most common protease inhibitor (PI) Z mutation in the SERPINA1 gene is 2-5% in Caucasians of European descent. The objective of our study was to estimate the PI Z mutation age using molecular analysis in the Latvian and Swedish populations, which have the highest frequency of the PI Z mutation. DNA samples of heterozygous and homozygous PI Z allele carriers from Latvia (n = 21) and Sweden (n = 65) were analysed; 113 unrelated healthy donors from Latvia were used as a control group. MALDI-TOF analysis was performed on all samples. Pairwise Fst was computed to compare the PI Z mutation ages between the two populations and controls. A p value less than 0.05 was considered significant. Analysis of non-recombinant SNPs revealed that the PI Z mutation age was 2902 years in Latvia (SD 1983) and 2362 years in Sweden (SD 1614), which is consistent with previous studies based on microsatellite analysis.