Anatomy

The Anatomy track of the 2008 campaign consists of finding alignments (4 specific sub tasks) between the Adult Mouse Anatomy and a part of the NCI Thesaurus (describing the human anatomy). The task is placed in a domain where we find large, carefully designed ontologies that are described in technical terms. Besides their large size and a conceptualization that is only to a limited degree based on the use of natural language, they also differ from other ontologies with respect to the use of specific annotations and roles, e.g. the extensive use of the partOf relation. The manual harmonization of the ontologies leads to a situation, where we have a high number of rather trivial mappings that can be found by simple string comparison techniques. At the same time, we have a good share of non-trivial mappings that require a careful analysis and sometimes also medical background knowledge.

Data sets

Use the following owl-ontologies as input for your matching system. All of the four subtasks are based on matching these ontologies.

If you take a look at some of the concepts you will find out, that the names (ids) of the concepts do not describe the concepts, they are like MA_000016 or like NCI_29394 and there is no relation between the numbers of these ids. That means that you have to go for the labels instead of using names. You will see that there are some similarities between the labels. Notice that the obo-conversion resulted in some oboInOwl:tags that contain additional information that can be interpreted similar to labels.

Subtask #4 is about matching two ontologies based on a partial reference alignment that has been e.g. generated by domain expert. This mapping has to be specified as additional input to your system with respect to task #4. See section 'Modalities' for detailed information on this task.

partial_reference.rdf (NEW, added at 9th of June, 2008)

This alignment contains all 'trivial correspondences' as well as a small subset of non trivial correspondences.

Versions

Due to a difficulties with different versions of ontologies and reference mappings, a minor mistake in the 2007 evaluation caused small deviations with respect to the recall values of the submitted alignments. The evaluation was based on a reference mapping that contained approx. 20 correspondences (of 1544) between non existing concepts. Therefore, recall (and also f-values) presented in the evaluation for all participants has been too low (about 0.01). Since all participants have been affected in the same way (in a very limited degree), all comparisons between different matching systems were meaningful and correct.

In this year the results of 2007 will be compared with the 2008 evaluation to see in how far the systems have been optimized. This evaluation will also include a corrected 2007 evaluation, for all system participating a second time.

Modalities

Subtracks

The anatomy track consists of four subtracks. Substrack #1 is obligatory for all participants of the anatomy track, while subtrack #2, #3, and #4 are optional.

Subtrack #1, #2, and #3 are standard matching tasks with respect to the input (two ontologies to be matched). For all of these subtracks your matching system should generate an alignment between the mouse-anatomy and the human-anatomy that differs with respect to recall and precision. For subtrack #1 the generated alignment should be an optimal solution to the matching problem with respect to both recall and precision as far as possible (= precision and recall are evenly weighted). In the evaluation we will focus on the f-value. You should apply your system with standard parameters or at least with standard parameters for the biomedical domain. For subtrack #2 your matching system should generate a result that is optimized for precisison. Think of an scenario where the result of your system is directly used without verification of a domain expert. For subtrack #3 your matching system should generate a result that is optimized for recall. Think of a scenario where the result of your system is used afterwards as a comprehensive candidate mapping, that will be revised by a domain expert afterwards, by removing the incorrect correspondences manually. Comparing the results of subtrack #1, #2, and #3 will show in how far you system can be adjusted / parameterised for certain requirements.

Subtrack #4 has been added to the anatomy track for the first time. While we expect most systems to solve tasks #1, #2, and #3, we expect only few systems to solve this task. For this subtrack a part of the reference mapping is available as additional input.

#1, #2, #3: Matcher(Mouse, Human) => Mapping
#4: Matcher(Mouse, Human, PartialReferenceMapping) => ExtendedReferenceMapping

Suppose that this part of the reference mapping has been generated by e.g. a group of domain experts. You job is to use the information encoded in the mapping to imrprove the matching process. We believe that the information that certain correspondences are definitely correct can be used in some way within the matching process. In the evaluation we will compare the results of subtrack #1 with the results of #4, in particular we will compare Mapping \ PartialReferenceMapping to ExtendedReferenceMapping \ PartialReferenceMapping to see wether or not the additional information had positive effects.

Research Questions

Within the evaluation we try to focus on the following aspects:

Which system performed best (mainly with respect to #1)?
What about the runtime of your system (mainly with respect to #1)?
Can your system be adjusted for certain requirements (comparing results for #1, #2, and #3)?
Which system is best in finding non trivial correspondences (based on subtask #1 and #3)?
Can your system solve subtrack #4? How strong are the positive effects of exploiting the partial reference mapping?

Participation Conditions

Due to our 2007 experiences we know that certain correspondences in the partial reference mapping are hard to detect by a matching system. If some of these correspondences are part of the submissions for #1, #2 or #3, we will ask the authors of the matching system to explain how these correspondences could be detected by the implemented algorithms. If it cannot be shown or at least be suggested how these correspondences have been generated automatically, we will exclude the system from taking part in the anatomy track!

We will choose a small sample of matching systems, install these systems, perform some of the matching tasks, and reproduce the results. In case your system has been chosen, we expect support to get your system running!

Format of submission

Your submission should contain the following folders and files:

+- anatomy
|  +- 1
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt
|  +- 2
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt
|  +- 3
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt
|  +- 4
|  |  +- participant.rdf
|  |  +- configuration-runtime.txt

The files participant.rdf (replace 'partcipant' by the name of your system) contain the mappings generated by your system. These files have to follow the format described here (standard format for submissions to the OAEI). The files configuration-runtime.txt should contain a few lines describing the parameter setting, as well as the runtime specfied in seconds and a short description of the used machine (CPU + RAM). There is no specific format for these files. If you do not participate in all subtask, do not include the corresponding folders in your submission. Submission are only accepted for a certain subtrack, if the corresponding folders contains both files!

The reference mapping contains only equivalence correspondences between concepts of the ontologies. No correspondences between properties (roles) are specified. If your system also creates correspondences between properties, or correspondences that describe subsumption relations, these results will not influence the evaluation (but can nevertheless be part of your submitted results)

Please submit the files (preliminary and final results) directly to the email address given under 'Contact' (below). Please send the results zipped in a file participant.zip or participant.rar and let the name of your matching systems occur somewhere in the subject heading of the mail.

Schedule

The schedule available at http://oaei.ontologymatching.org/2008/ is obligatory. It contains deadlines for sending preliminary and final results etc.

Acknowledgements

We would like to gratefully thank Martin Ringwald and Terry Hayamizu (Mouse Genome Informatics - http://www.informatics.jax.org/), who provided us with a reference mapping for the matching task of this track.

In addition, we would like to thank all of the participants of the OAEI 07 anatomy track for hints and discussions with respect to the realization and evaluation of the last year.

Contacts

This track is organized by Christian Meilicke and Heiner Stuckenschmidt. If you have any problems working with the ontologies, any questions, or any suggestions related to the anatomy track, feel free to write an email to christian [at] informatik [.] uni-mannheim [.] de.