Research Overview
SABOC conducts research related to the structural analysis of biomedical ontologies and terminologies.
Our core research topics include ontology quality assurance methodologies, ontology summarization techniques, ontology change analysis, software tools for browsing and visualizing ontologies and summaries of ontologies, and family-based structural analysis of biomedical ontologies. Our current research is funded by the NCI/NIH (R01 CA190779 "A family-based framework of quality assurance for biomedical ontologies"). Between 2006 and 2011 our research was funded by the NLM/NIH (R01 LM008445 "Partitioning to Support Auditing and Extending the UMLS" and R01 LM008912 "Taxonomies Supporting Orientation, Navigation and Auditing of Terminologies").
1. Ontology Quality Assurance Methodologies
For the last 25 years we have conducted extensive research into structural and lexical techniques for uncovering errors in biomedical ontologies and terminologies. An overarching theme of our research is the automatic identification of sets of concepts that are statistically more likely to have errors. These sets of concepts are identified using various structural (and/or lexical) criteria. We have applied our ontology QA techniques on many important ontologies, including SNOMED CT, NCIt, GO, ChEBI, NDF-RT, Uberon, the UMLS, and Columbia's MED, among others. In our research we have made extensive use of abstraction-network-based ontology quality assurance methodologies, which we will describe below.
2. Abstraction Networks for Summarization and Quality Assurance
An ontology is organized as a complex graph structure, where each node represents a concept and the edges between nodes represent the relationships between concepts. Ontologies tend to be quite large and complex. For example, SNOMED CT currently contains more than 300,000 active concepts and over 1.5 million relationships. Diagrammatic presentations have long been used to display large, complex knowledge structures, including ontologies. However, the size of most ontologies makes understanding and visualizing their structure difficult. To support comprehension of ontology content, and to enable quality assurance of ontologies, we have developed structural summaries called abstraction networks.
What is an Abstraction Network?
We define an abstraction network as an algorithmically derived summary of an ontology's structure and content. We have developed different types of abstraction networks to summarize different aspects of ontology structure (e.g., according to its semantic relationships or its hierarchy). An abstraction network consists of a hierarchy of nodes, where each node represents a set of structurally similar concepts.
The general process of creating an abstraction network from an ontology. We have developed different types of abstraction networks that capture different aspects of an ontology's structure. These abstraction networks are applicable to families of structurally similar ontologies. (a) Represents a subhierarchy of concepts (classes) from an ontology. (b) Represents the abstraction network summarizing (a).
Types of Abstraction Networks
Below we list some examples of abstraction networks we have created and applied to the ontologies in the NCBO BioPortal.
- Area Taxonomies and Partial-area Taxonomies: Summarize subhierarchies of concepts that are modeled with the same types of defining relationships (e.g., properties in OWL, relationships in OBO, and attribute relationships in SNOMED CT). Concepts are separated into disjoint sets called areas, each of which contains the concepts with the exact same set of defining relationships. Areas are separated into partial-areas, which summarize subhierarchies of semantically similar concepts in each area.
- Disjoint Partial-area Taxonomies: When an ontology allows concepts to have multiple parents, a given concept may be summarized by multiple partial-areas, according to which subhierarchies it belongs to. A disjoint partial-area taxonomy separates concepts into disjoint sets called disjoint partial-areas, which visually identify points of intersection between partial-areas.
- Tribal Abstraction Network (TAN): Summarizes the intersections between user-selected subhierarchies in an ontology. Concepts are partitioned into disjoint sets called bands, which identify the subhierarchy/subhierarchies a set of concepts belongs to. Bands are partitioned into clusters, which identify subhierarchies of concepts that exist at a specific point of intersection between two or more subhierarchies.
- Disjoint Tribal Abstraction Networks: As with partial-areas, a given concept may be summarized by multiple clusters. A Disjoint TAN separates concepts into disjoint units called disjoint clusters, which summarize the points of intersection between clusters in a band.
- Target Abstraction Network (Target AbN): Also called Range AbN (when applied to OWL/OBO ontologies) or Ingredient AbN (when applied to NDF-RT). Summarizes subhierarchies of concepts that serve as targets for defining relationships, along with the source concepts that have relationships pointing to the target concepts.
An example of an area taxonomy and a partial-area taxonomy created from NCIt's Disease, Disorder, or Finding subhierarchy.
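The grouping behind areas and partial-areas can be illustrated with a minimal Python sketch. It is not the OAF implementation: the function names, the toy concepts, and the relationship types `has_agent` and `has_site` are all hypothetical. The sketch also assumes that a partial-area is rooted at a concept that has no parent inside its own area, and that it contains that root together with its descendants within the area.

```python
from collections import defaultdict


def build_areas(rel_types):
    """Group concepts by their exact set of defining relationship types."""
    areas = defaultdict(set)
    for concept, types in rel_types.items():
        areas[frozenset(types)].add(concept)
    return dict(areas)


def build_partial_areas(area, parents):
    """Split one area into partial-areas (assumed rule: one per area root)."""
    # Child links restricted to concepts that lie inside this area.
    children = defaultdict(set)
    for concept in area:
        for parent in parents.get(concept, ()):
            if parent in area:
                children[parent].add(concept)

    # Assumption: a root is a concept with no parent in the same area.
    roots = [c for c in area
             if not any(p in area for p in parents.get(c, ()))]

    partial_areas = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:  # depth-first walk over in-area descendants
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children[node])
        partial_areas[root] = members
    return partial_areas


# Hypothetical toy ontology: concept -> parents, concept -> relationship types.
parents = {
    "Disease": [],
    "Infection": ["Disease"],
    "Viral infection": ["Infection"],
    "Bacterial infection": ["Infection"],
    "Poisoning": ["Disease"],
    "Neoplasm": ["Disease"],
    "Lung neoplasm": ["Neoplasm"],
}
rel_types = {
    "Disease": set(),
    "Infection": {"has_agent"},
    "Viral infection": {"has_agent"},
    "Bacterial infection": {"has_agent"},
    "Poisoning": {"has_agent"},
    "Neoplasm": {"has_site"},
    "Lung neoplasm": {"has_site"},
}

areas = build_areas(rel_types)
partial_areas = build_partial_areas(areas[frozenset({"has_agent"})], parents)
```

In this toy data the `has_agent` area holds four concepts and splits into two partial-areas, rooted at "Infection" and "Poisoning". Because each root is walked independently, a concept with parents in two different partial-areas would appear in both, which is the overlap that disjoint partial-area taxonomies are meant to resolve.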
How do we use Abstraction Networks?
By far our most extensive use of abstraction networks has been in the context of ontology quality assurance (QA). Fundamentally, an abstraction network serves to capture the essence of an underlying ontology while ignoring its minutiae. The nodes of an abstraction network identify sets of concepts that meet the same structural criteria. Among these concepts, we can identify categorizations that indicate complex or uncommon modelling, relative to other concepts in the ontology. Beyond quality assurance, we have also utilized abstraction networks to support comprehension of ontology content, enable automatic identification of major subject areas in an ontology, summarize and analyze changes to an ontology's structure, and investigate the structural complexity of ontologies.
3. Family-based Analysis and Quality Assurance of Ontologies
In our current research we are investigating quality assurance methodologies that can be applied to families of structurally similar ontologies. Our test bed for this research is the set of ontologies hosted on the NCBO BioPortal. The development of abstraction-network-based QA methodologies is labor intensive. For each ontology, we have to design a proper abstraction network, a process that involves reviewing the ontology’s structure manually or semi-automatically, and identifying structural elements that can be used to automatically summarize the ontology in a way that is useful for QA. In our family-based project we are developing abstraction networks that can be applied to entire families of structurally similar ontologies in the NCBO BioPortal. When a family of structurally similar ontologies is discovered, we create an abstraction network that is applicable to all of the ontologies in the family. For example, we have created various types of partial-area taxonomy abstraction networks that can be applied to a wide variety of biomedical ontologies.
To identify families of biomedical ontologies in the NCBO BioPortal we developed a methodology of creating structural meta-ontologies to classify ontologies according to their structure. The structural meta-ontology shown above categorizes, at several levels of granularity, a total of 373 BioPortal ontologies according to their use of object properties and data properties.
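As a rough illustration of classifying ontologies by structure, the Python sketch below groups ontologies by whether they use object properties, data properties, both, or neither. This is a single-level simplification of the multi-level meta-ontology described above. The family labels, the function names, and the ontology names and counts are hypothetical and are not taken from BioPortal.

```python
from collections import defaultdict


def structural_family(object_property_count, data_property_count):
    """Return a hypothetical family label based on which property kinds are used."""
    uses_object = object_property_count > 0
    uses_data = data_property_count > 0
    if uses_object and uses_data:
        return "object and data properties"
    if uses_object:
        return "object properties only"
    if uses_data:
        return "data properties only"
    return "no properties"


def group_into_families(ontology_stats):
    """Map each family label to the ontologies that fall under it."""
    families = defaultdict(list)
    for name, (object_count, data_count) in ontology_stats.items():
        families[structural_family(object_count, data_count)].append(name)
    return dict(families)


# Hypothetical input: ontology name -> (object property count, data property count).
ontology_stats = {
    "ONT_A": (12, 0),
    "ONT_B": (0, 0),
    "ONT_C": (5, 3),
    "ONT_D": (40, 0),
}
families = group_into_families(ontology_stats)
```

Here "ONT_A" and "ONT_D" land in the same family, so under this simplified view a single abstraction network design based on object properties would be a candidate for both.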
We are currently conducting several studies to investigate the properties of abstraction networks derived for the BioPortal ontologies. In more recent research we are also analyzing the extent of content reuse among the BioPortal ontologies, along with how content reuse affects the quality of biomedical ontologies.
4. Ontology Change Analysis
Understanding how ontologies change over time is important for identifying, and understanding the cause of, errors and inconsistencies in their content. In our research we have developed several techniques to summarize structural changes in ontologies. Specifically, we have developed diff abstraction networks to capture the global picture of change between two ontology releases. Rather than displaying a list of thousands of changes, diff abstraction networks display a graphical view that highlights important areas of change.
An example of a diff partial-area taxonomy (a type of diff abstraction network) created from two NCIt releases. Changes to the introduction and inheritance of restrictions, according to their property types, are highlighted.
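The idea of summarizing change at the level of groups rather than individual edits can be sketched in Python as follows. This is a simplified area-level comparison, not the diff partial-area taxonomy algorithm itself. The state names ("introduced", "removed", "modified", "unchanged"), the function names, and the toy releases are assumptions made for illustration.

```python
from collections import defaultdict


def areas_of(rel_types):
    """Group concepts by their exact set of defining relationship types."""
    areas = defaultdict(set)
    for concept, types in rel_types.items():
        areas[frozenset(types)].add(concept)
    return areas


def diff_areas(old_rel_types, new_rel_types):
    """Compare the areas of two releases and label each area with a change state."""
    old, new = areas_of(old_rel_types), areas_of(new_rel_types)
    report = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key, set()), new.get(key, set())
        if not before:
            state = "introduced"   # area exists only in the new release
        elif not after:
            state = "removed"      # area exists only in the old release
        elif before != after:
            state = "modified"     # same area, different membership
        else:
            state = "unchanged"
        report[key] = {
            "state": state,
            "gained": after - before,
            "lost": before - after,
        }
    return report


# Hypothetical releases: concept -> set of defining relationship types.
old_release = {"A": {"r1"}, "B": {"r1"}, "C": {"r2"}}
new_release = {"A": {"r1"}, "B": {"r1", "r2"}, "C": {"r2"}, "D": {"r2"}}

report = diff_areas(old_release, new_release)
```

In this example concept "B" gains relationship type `r2`, so it leaves the `{r1}` area and a new `{r1, r2}` area is introduced, while the newly added concept "D" shows up as a gain in the `{r2}` area. A reviewer sees three area-level changes rather than a flat list of concept edits.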
We have also developed a visual semantic delta technique for analyzing changes in SNOMED CT. The visual semantic delta combines diff partial-area taxonomies with a concept-level change analysis technique called a descriptive delta. Within a visual semantic delta, a user obtains a high-level view of change via a diff partial-area taxonomy, and then "drills down" to view specific changes to concepts of interest.
In a visual semantic delta a user can review individual changes to SNOMED CT concepts.
5. Software
A major aspect of our research is the development of software for creating and visualizing abstraction networks. To support our current research grant we developed the Ontology Abstraction Framework (OAF), an open source software system that implements most of the research we describe above.
For more details on our software, see our Software page, which describes the OAF.