OWL research at the University of Manchester

Joint research by members of the Information Management Group and the Bio-Health Informatics Group.

Simple SNOMED Module Extraction

Notes as of January 2011

The OWL-ME Module Extractor is a GUI for the standard module extraction facilities in the OWL API (version 3), adapted to the format of the UMLS Core Problem List Subset. However, it can also be used to extract a “module” for any arbitrary ontology and signature (including SNOMED signatures). The program itself can be found here.

We expect to provide more convenient forms of this tool and to integrate it with Protege 4.1, at which point we hope this page will become redundant. Alternatively, this is all open source, so please feel free to improve on it, but please acknowledge the author and do contribute your work back.

However, for the time being, here is the information you need.

Basic information

Obtaining the SNOMED CT Files

The package operates on the OWL conversion of the SNOMED Stated Form. The Stated Form and a Perl script to convert it to OWL are provided in the standard SNOMED distribution under “OtherResources”. How you access the SNOMED release depends on where you live and on your national and academic status. If you don’t already have the resources, contact the SNOMED organisation, the IHTSDO.

Before you start, you must have a copy of the Stated Form converted to OWL. The Perl script contains instructions for running it with the other files provided.

What is a “module” and a “signature”

A “signature” is a set of OWL class and individual names.  Since SNOMED contains no individuals, for SNOMED it is simply a set of SNOMED concept IDs.

A “module” is a subset of SNOMED that contains everything in SNOMED relevant to the classes in the signature – i.e. all the axioms necessary to classify them as they would be classified in the complete SNOMED OWL ontology.

Therefore, you can investigate classes in the module, investigate anomalies, and modify their definitions, confident that the results will be the same as if the changes were applied to the whole of SNOMED. Since the module is likely to be at least an order of magnitude smaller than SNOMED, this can be a major advantage, speeding up classification by a corresponding amount.

For details on the theory and mechanisms for module extraction see:
Grau BC, Horrocks I, Kazakov Y, Sattler U. Modular reuse of ontologies: Theory and practice. Journal of Artificial Intelligence Research. 2008;31:273-318.
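To make the idea of a module concrete, here is a deliberately simplified Python sketch of a syntactic ⊥-locality ("bottom") module, one of the module notions studied in the reference above. It assumes every axiom is an atomic subsumption "A SubClassOf B", which SNOMED's definitions are not; real extraction is done by the OWL API (for example its `SyntacticLocalityModuleExtractor` class), and the module type the program requests is not specified here. The function name `bottom_module` and the class names in the test are hypothetical. What it shows is the core loop: pull in every axiom that is not local with respect to the signature, add that axiom's symbols to the signature, and repeat until nothing changes.

```python
from __future__ import annotations

from typing import Iterable


def bottom_module(axioms: Iterable[tuple[str, str]],
                  signature: set[str]) -> set[tuple[str, str]]:
    """Toy bottom-locality module for atomic axioms (sub, sup),
    each read as "sub SubClassOf sup". Not the OWL API algorithm."""
    remaining = set(axioms)
    sig = set(signature)  # grows as axioms are added to the module
    module: set[tuple[str, str]] = set()
    changed = True
    while changed:
        changed = False
        for sub, sup in list(remaining):
            # An atomic axiom is bottom-local exactly when its subclass is
            # outside the signature: replacing that class by owl:Nothing
            # gives the tautology "Nothing SubClassOf sup".
            # Non-local axioms must go into the module.
            if sub in sig:
                module.add((sub, sup))
                remaining.discard((sub, sup))
                sig.add(sup)  # the module's signature now includes sup
                changed = True
    return module
```

With atomic axioms only, the module for a class is just the chain of axioms leading to all of its superclasses, which is exactly what is needed to reproduce where it sits in the classified hierarchy; axioms about unrelated classes are left out.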

File Formats

The program asks for the SNOMED stated form file in OWL format, which is simply the file produced by the Perl script provided by IHTSDO.

It also asks for the signature file.  The basic format is that provided by UMLS, which may be used as is.

However, if you want to create your own signature file, it need only be a list of SNOMED concept IDs, one per line, each followed by a bar (“|”) before the line terminator. The first line is a header and is ignored, e.g.

SNOMED_CID|SNOMED_FSN|SNOMED_CONCEPT_STATUS|UMLS_CUI|OCCURRENCE|USAGE|FIRST_IN_SUBSET|IS_RETIRED_FROM_SUBSET|LAST_IN_SUBSET|REPLACED_BY_SNOMED_CI|
38341003|
10725009|
48146000|
69909000|
59621000|
429198000|
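If you build a signature file by hand, it can be worth checking it before loading it into the program. Below is a minimal Python sketch assuming the format just described; `read_signature` is a hypothetical helper, not part of OWL-ME. It skips the first (header) line, keeps the field before the first bar, and rejects anything that is not all digits, since SNOMED concept IDs are numeric. Taking only the first field also means rows in the UMLS format, whose first column is `SNOMED_CID`, are read the same way.

```python
from __future__ import annotations

import re


def read_signature(text: str) -> list[str]:
    """Return the SNOMED concept IDs listed in a signature file's text."""
    ids: list[str] = []
    # The first line is a header (or blank) and is ignored.
    for line in text.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        # Keep the field before the first "|": the concept ID.
        cid = line.split("|", 1)[0].strip()
        if not re.fullmatch(r"[0-9]+", cid):
            raise ValueError(f"not a SNOMED concept ID: {cid!r}")
        ids.append(cid)
    return ids
```

For example, `read_signature(open("signature.txt").read())` on the sample above would return the six IDs as strings; a leftover `SCT_` prefix raises a `ValueError` naming the offending entry.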

Running the Program

The program directory comes with brief instructions. It assumes Apache Ant is available to build the Java code. We recommend that you put all files in the same directory and then just type “ant”. This should bring up a GUI window that lets you browse for the various files required.

Supplements as of January 2011:

Slightly modified version of Protege 4.0

As of January 2011, SNOROCKET only runs with Protege 4.0, which, unlike Protege 4.1, puts up lists of classes in the hierarchy in random order.  A slightly modified version of Protege 4.0 that keeps lists of classes in alphabetic order is available here. Note that this is an unsupported version distributed entirely as is without any guarantees.  Protege is subject to the license arrangements listed at http://protege.stanford.edu.

SNOROCKET is distributed separately. To use SNOROCKET with Protege 4.0, you must download the SNOROCKET plugin from http://aehrc.com/hie/snorocket.html and unzip the plugin into the Protege plugins folder.

Extracting a SNOMED signature using Protege 4

The following works for the common case where you want to extract a signature consisting of all of the subclasses of a given class.  There are several alternatives, including the use of OPPL, but this is relatively simple.  It can easily be adapted to other cases.

You must install the query view if you have not already done so. To do so, go to View->Misc views->Query. When the cursor changes to a dot, position it in the center or in one quadrant of an existing pane and click. The pane will be installed. If it is on top of another pane, a tab will be placed at the top of the pane to select which subpane is required.

Once the query view is installed:

  • Load the SNOMED stated form file into Protege
  • Classify it using any of the classifiers. If in doubt, use FaCT++ or Pellet, although they will take slightly longer.
  • In the query view type a fully specified name enclosed in single quotes, for example ‘Hypertensive disorder, systemic arterial (disorder)’.  You must include the opening and closing single quotes.  The easiest way is to use the autocomplete – tab or ctrl-space (cmd-space on a Mac).
  • Tick the subclass and descendants boxes in the query response.
  • Execute the query
  • Go to Protege->Preferences->Renderer and select “Render using URI fragment”. This will cause all results to be shown in the form “SCT_concept_id”.
  • Select all of the query results and copy them
  • Paste them into the text editor of your choice. Use the editor, Perl, or similar to remove the “SCT_” prefixes and to add a “|” at the end of every line.
  • Add the header line from the example above as the first line (or leave the first line blank).
  • Save the result with the suffix .txt.  This file should now work in the program as a signature file.
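The text-editor step in the list above can also be scripted. Here is a minimal Python sketch; `to_signature_file` is a hypothetical name, and it assumes the copied query results arrive as one `SCT_<concept id>` name per line (the exact layout Protege puts on the clipboard may differ). It strips the `SCT_` prefix, appends the bar, and puts a header line first, blank by default, since the program ignores that line.

```python
PREFIX = "SCT_"  # prefix shown when rendering by URI fragment


def to_signature_file(pasted: str, header: str = "") -> str:
    """Turn pasted query results (one 'SCT_<id>' per line) into
    signature-file text: a header line, then '<id>|' on each line."""
    out = [header]  # first line is ignored by the program; may be blank
    for line in pasted.splitlines():
        name = line.strip()
        if not name:
            continue  # skip empty lines left over from pasting
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        out.append(name + "|")
    return "\n".join(out) + "\n"
```

To save it with the .txt suffix, something like `open("signature.txt", "w").write(to_signature_file(pasted))` is enough; pass `header=` the header line from the example if you prefer to keep it.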

Yes, we know it is clumsy, but it works. A smoother solution is coming “Real Soon Now” – or join the open source Protege community, produce one yourself, and share it with us all.