Workshop themes
Mutation Databases and Metadata: Design, Content, Accuracy
More than 400 mutation databases have been produced (as determined by a Google search). Many are no longer maintained and cover very specific data sets. Collectively, these repositories have been designed to support a wide range of features, including listings of SNPs, point mutations, insertions, deletions and observed phenotypes. Their annotations to the mutation descriptions also incorporate a wide range of modified protein features and metrics. Most of these databases are manually curated, yet mutation annotations are frequently inaccurate; in the PDB, for example, they are inaccurate in as many as 40% of all records. In addition to assessing content and coverage, this session will explore the storage and representation of mutation information, showcasing a spectrum of mutation repository types, from traditional databases to RDF triple stores, semantic knowledgebases and mutation ontologies.
Extraction of mutations and annotations from literature
AI techniques such as text mining and natural language processing have been used in BioNLP to extract and ground named entities (mutations, proteins, organisms) and impact annotations (protein properties, direction and scale of impact) from the mutation literature with high precision and recall, albeit only at prototype scale. To facilitate the adoption of these tools, it is necessary to measure how accurately they can recreate and update existing mutation databases, and to incorporate them into semi-manual annotation pipelines, which is the next milestone. In addition, the appropriate metrics for individual tasks within these systems are still under discussion, which requires community involvement. This emerging technology now needs standardization. For the workshop, we will solicit presentations, posters and demos of NLP tools, evaluations of mutation pipelines and work on mutation ontology population, and we invite suggestions for a database reconstruction challenge to illustrate state-of-the-art performance.
Impacts of Mutations: Prediction and Bootstrapping
The ability to predict the impact of a mutation, or the consequence of a sequence variant, is central to the diagnosis of genetic diseases. Non-synonymous mutations may affect translational regulation, mRNA stability, mRNA splicing and rates of translation. Proteins affected by non-synonymous SNPs (nsSNPs) may have altered catalytic sites, stability, ability to aggregate and/or post-translational modifications. Moving from SNP to sequence to structure and function has been addressed, with varying degrees of accuracy, by sequence- and structure-based methods (molecular mechanism, empirical energy functions or machine learning). Applying such techniques at genome scale requires identifying robust approaches and benchmarking them with standard metrics, so that valid significance can be assigned to non-synonymous mutations. Reusing existing mutation databases and text-extracted data to train prediction algorithms and to check the quality of their predictions is pivotal.
Mutation Data Integration and Reuse
Our current infrastructures and techniques for knowledge translation are insufficient for scientists to make rapid advances in understanding living systems. Hypothesis generation based on the reuse of extracted information and in silico predictions remains a distant capability for most scientists. Furthermore, building the insights derived from mutational studies into robust models of a specific biological domain also seems far off. A multi-level approach to biology must be accompanied by integrated infrastructures built from a diverse toolset. Integrating information from different systems will require the adoption of rich metadata for semantic knowledge integration, such as that provided by existing phenotype ontologies and by ontologies specific to impacts, sequence rearrangements and the in vitro methods used to construct mutants. For bioinformatics data integration, discoverable semantic web services and mutation integration workflows are emerging paradigms; this session will host examples of reusable mutation extraction and data integration workflows. Semantic assistant clients that integrate mutation annotation into desktop applications in real time, for example when browsing PubMed abstracts, will also be showcased.