There is a good understanding of the functional impact of protein-coding variants owing to historical studies of Mendelian disorders, the predictable consequences of amino acid changes and the recent availability of exome sequencing data. However, protein-coding regions represent less than 2% of the total genome, and relatively little is known about the functional consequence of variation in the remaining 98% of the genome. The non-protein-coding sequences of the genome have been annotated through the ENCODE project, which relies on identification of biochemically active elements in the human genome, with attention paid to regulatory elements that control gene activity.
Understanding the significance of genetic variants in the noncoding genome is emerging as the next challenge in human genomics. This paper, published on the Nature Genetics website, used the power of 11,257 whole-genome sequences and 16,384 heptamers (7-nt motifs) to build a map of sequence constraint for the human species.
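The figure of 16,384 heptamers matches the number of distinct 7-nucleotide sequences that can be written with the four bases A, C, G and T, that is 4^7. The short sketch below only illustrates that count. It is not the paper's method, and the function name `enumerate_kmers` is a hypothetical helper introduced here for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # the four DNA bases
K = 7                 # heptamer length (7-nt motif)


def enumerate_kmers(k, alphabet=NUCLEOTIDES):
    """Return every possible sequence of length k over the alphabet.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    return ["".join(bases) for bases in product(alphabet, repeat=k)]


heptamers = enumerate_kmers(K)

# 4 choices at each of 7 positions gives 4**7 = 16,384 motifs.
print(len(heptamers))  # 16384
```

How the study scores each heptamer for constraint is not described in this article, so the sketch stops at the enumeration.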
DirectorsTalk caught up with Alexandre Akoulitchev, the CEO of Oxford BioDynamics plc (LON:OBD), for his thoughts on this paper:
“When it comes to sequencing and analysis of the human genome, Craig Venter, the founder of Celera Genomics, the J. Craig Venter Institute and Human Longevity Inc., and one of Time magazine’s 100 most influential people in the world in 2007 and 2008, hardly needs an introduction.
In this latest milestone paper in Nature Genetics, a collaboration between Craig Venter’s team and the Scripps Institute has put forward the results of an elegant systemic analysis, followed by a striking conclusion. The researchers tackled the ever-elusive relationship between genetic variants associated with pathologies, genome regulation and the manifestation of clinical conditions. The puzzling paradigm is well known: the majority of the more than 16,000 common genetic variants known today are located in non-coding regions of the genome, outside the protein-coding regions, which occupy only 2% of the genome.
In this study, analysis of over 11,000 human genomes for genetic variability, selective constraints and patterns of conservation reveals that genes essential for the manifestation of pathological phenotypes are controlled through highly constrained proximal and distal cis-elements within non-coding regions. Those elements show over 52-fold enrichment of disease-associated genetic variants in the top 1% of the most constrained regulatory non-coding elements. And to make it even more exciting, they show consistency with the pattern of three-dimensional genome organisation and topologically associating domains (TADs). This has been done with traditional low-resolution Hi-C and promoter-associated Hi-C chromosome conformation capture studies.
One of the conclusions drawn by the researchers is that sequence-based searches for the regulatory targets associated with clinical outcomes should move beyond the current trend of exome sequencing and focus on the constrained non-coding part of the genome. Importantly, the regulatory role of genome architecture has been noted, loud and clear, in the context of this exciting study.
In the past several years, the field of genome architecture and chromosome conformation signatures has been accumulating many individual examples of rare genetic variants mapping to the anchoring sites of regulatory chromosome conformations, a fundamental link between genetic and epigenetic regulation over the same genetic loci. This study brings forward clear systemic evidence for such cases.
With Oxford BioDynamics Plc’s EpiSwitch technology consistently building its successful stratifications of patients on the basis of over 1 million identified sites of regulatory conditional chromosome conformations, this paper is very timely. It discusses and reinforces the same fundamental premise: understanding the framework of constrained regulatory non-coding sites, associated with genetic variants and genome architecture, is the shortest way to build successful stratifications of pathological manifestations and clinical outcomes.”