Medicine has a large and rapidly growing vocabulary even before you get to the complex scientific names. Is your pain shooting, or stabbing, or throbbing? Is your cough dry, wet, brassy, or barking? There is a huge difference between a “broken leg” that is a “greenstick fracture of the femur” and one that is a “displaced transverse fracture of the patella”. There are dozens of brand names for plain aspirin, including, in some countries, the term “aspirin” itself. (To avoid registered trademarks entirely, you must say “acetylsalicylic acid”.)
Furthermore, there is a lot to know about each term, and what you need to know varies by context. For example, acetylsalicylic acid is an analgesic (pain fighter), an antipyretic (fever fighter), and an NSAID (that is, a non-steroidal anti-inflammatory drug, a kind of inflammation fighter).
There are a lot of concepts in medicine (analgesic, acetylsalicylic acid, pain) and even more terms corresponding to those concepts (for example, “acetaminophen”, “paracetamol”, and “para-acetylaminophen” are all names for the same drug, sold under the brand name Tylenol). When different care providers use different terms for the same thing, there is a possibility of miscommunication. The more variation in terminology in health care records, the harder it is to analyze those records (for example, to monitor potential medical errors or to find candidates for clinical trials).
A core challenge in medical informatics is managing the huge, evolving terminologies that permeate all aspects of medicine. Most of these terminologies have a complex hierarchical structure, e.g., breast cancer is a kind of cancer, which is a kind of disease, which is a kind of pathology. The problem is that there are hundreds of thousands of terms in any reasonable terminology, with a rat's nest of connections between them. And these terminologies grow very fast. For example, the NCI Cancer Thesaurus grew from around 20,000 terms in 2004 to over 50,000 terms today. Each term corresponds to a potentially complex concept of specialized medical knowledge which is related to many more concepts in a variety of ways. Various problems emerge with manual curation of such terminologies: sometimes there are wrong connections between terms, or missing connections, or the text defining the term is out of date, confused, or just garbled. None of these errors are detectable from natural language or from simple, graph-based representations of the terminology. These are semantic errors, that is, gaps between what the curators wrote down and what is true.
One way to improve terminology development is to write down the meanings of terms in a language that a program can understand. That way, we can run a program (an automated reasoner) to sanity check what we wrote and to find new connections that a person would recognize if they read all the definitions and didn’t get tired. Such a language is an ontology language, and a representation of a terminology with the definitions written so that a program can reason with them is called an ontology. Ontologies and ontology languages have a rich history in computer science, artificial intelligence, and bio-medical informatics. A popular family of ontology languages is built on so-called description logics, which allow people to express the definitions of their concepts reasonably well while still being amenable to state-of-the-art automated reasoning techniques. Description logics form the basis of the standardized ontology language OWL and its latest version, OWL 2.
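To make the idea of "finding new connections from definitions" concrete, here is a minimal sketch in Python. It is a toy, not a description logic reasoner: it assumes each concept is defined simply as a set of required features, and infers that one concept is a kind of another when its definition includes every feature of the other. The concept names, the feature names, and the function `inferred_subsumptions` are all hypothetical and chosen only for illustration.

```python
# Toy definitions: each concept is described by the set of features it requires.
# These names are illustrative assumptions, not taken from a real terminology.
definitions = {
    "fracture": {"break_in_bone"},
    "femur_fracture": {"break_in_bone", "located_on_femur"},
    "greenstick_femur_fracture": {
        "break_in_bone",
        "located_on_femur",
        "incomplete_break",
    },
}


def inferred_subsumptions(definitions):
    """Return pairs (sub, sup) where sub's definition includes all of sup's features.

    This is a simple structural check: if everything required of `sup` is also
    required of `sub`, then every `sub` is a kind of `sup`.
    """
    return {
        (sub, sup)
        for sub, sub_features in definitions.items()
        for sup, sup_features in definitions.items()
        if sub != sup and sup_features <= sub_features
    }


links = inferred_subsumptions(definitions)
for sub, sup in sorted(links):
    print(f"{sub} is a kind of {sup}")
```

Nobody wrote down that a greenstick femur fracture is a kind of fracture; the program derives it from the definitions. Real reasoners for OWL handle far richer definitions, but the principle of deriving connections from written-down meanings is the same.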
The first version of OWL (the “Web Ontology Language”) was standardized by the W3C in 2004 and quickly became the common default language for ontology development. Key bio-medical ontologies, such as SNOMED-CT and the NCI Thesaurus, migrated to OWL and to an OWL-based toolchain, allowing them to move away from proprietary languages and their vendor-locked toolchains.
In 2009, the W3C announced the finalization of the next generation of OWL, OWL 2. OWL 2 is based on continuing research from the Universities of Manchester and Oxford into all aspects of ontology engineering. Professor Ian Horrocks, whose early work in reasoning with description logics at Manchester made them a feasible technology, co-chaired the OWL 2 working group.
OWL 2 addresses key expressive and computational limitations of OWL 1. By adding new constructs to the language, OWL 2 more directly supports medical applications. For example, so-called “role chains” allow ontologists to express the connection between spatial relations and part-whole relations, e.g., that if a fracture is located on a bone which is part of a leg, that fracture is a fracture of that leg. General reasoning with such constructs in the presence of other OWL features was an open problem solved by Ian Horrocks and Professor Uli Sattler (of Manchester).
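The effect of a role chain can be sketched with a few lines of Python. This is only an illustration of the inference pattern, not how an OWL 2 reasoner is implemented: it assumes facts are stored as simple triples and repeatedly applies one rule, "located_on followed by part_of implies located_on", until nothing new is derived. The individuals (`fracture1`, `femur1`, `leg1`, `body1`), the relation names, and the function `apply_role_chain` are hypothetical.

```python
def apply_role_chain(facts, first, second, implied):
    """Saturate `facts` under the rule:
    first(x, y) and second(y, z)  implies  implied(x, z).

    `facts` is a set of (subject, relation, object) triples.
    """
    facts = set(facts)
    while True:
        new = {
            (x, implied, z)
            for (x, p, y) in facts if p == first
            for (y2, q, z) in facts if q == second and y2 == y
        } - facts
        if not new:
            return facts
        facts |= new


# Hypothetical example data: a fracture on a femur that is part of a leg.
facts = {
    ("fracture1", "located_on", "femur1"),
    ("femur1", "part_of", "leg1"),
    ("leg1", "part_of", "body1"),
}

closed = apply_role_chain(facts, "located_on", "part_of", "located_on")
for triple in sorted(closed - facts):
    print("inferred:", triple)
```

Because the implied relation is the same as the first relation in the chain, the rule keeps applying up the part-whole hierarchy: the fracture is inferred to be located on the leg, and then on the body. The hard part, which this sketch ignores entirely, is keeping such reasoning sound and terminating when role chains interact with the other features of OWL.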