Ontology Alignment Evaluation Initiative - OAEI-2012 Campaign

Results for OAEI 2012 - Library Track

The following content is (mainly) based on the final version of the library section in the OAEI results paper.
If you notice any kind of error (wrong numbers, incorrect information on a matching system) do not hesitate to contact us.

Reference Alignment

The reference alignment we used for the evaluation is now available. Download Reference Alignment


Libraries play an important role in the linked data web, and they widely agree that linked data technologies are ideal for integrating the data of libraries around the world and for fostering collaboration on cataloguing among libraries. Library data does not only consist of the vast amount of cataloguing data, but especially -- and probably more interesting for other communities -- also of authority data, i.e., normed descriptions of locations, events, persons, corporate bodies, and subject concepts. The subject concepts are usually organized in more or less hierarchical knowledge organization systems, together with semantic relations between the concepts. A thesaurus is such a knowledge organization system that is used for indexing purposes and that provides quasi-synonymous, describing labels for each concept. Thesauri are sometimes referred to as lightweight ontologies; however, we will see that this definition can be misleading.

Thesauri, and authority data in general, have a long history in libraries and are actively used and maintained by information professionals and domain experts. Due to their high quality and their long-term development, they could function as a "backbone of the Semantic Web".

Most thesauri are domain-dependent and specialized for use within a certain field, e.g., to index publications with an economic focus. During previous experiments, we examined the topical overlap between the two thesauri used in this challenge: TheSoz (social sciences) and STW (economics). Not only do they share many concepts, there is also a manually created alignment that can be used as a reference. Many thesauri exist that cover the same or overlapping domains, often in different languages. Multilingual thesauri are an important means to bridge the gap between catalogs in different languages, so that users can search for relevant literature using their own language. Another possibility is the creation of links between concepts across different thesauri, possibly in different languages. Such alignments -- or correspondences or cross-concordances -- can be exploited to mutually add further information to both thesauri and subsequently improve retrieval. Therefore, alignments manually created by domain experts exist for many selected thesauri. Nevertheless, the automatic identification of alignments is strongly desired, mainly for two reasons. First, the manual creation of alignments between all existing thesauri is not feasible, so additional alignments have to be created, possibly by exploiting existing alignments (e.g., their transitivity). Second, automatically created alignments can be used to improve and enhance existing alignments, after approval by a domain expert. This is necessary because most existing alignments are not complete, and even if they are supposed to be complete, they have to be maintained just like the thesauri themselves, i.e., a constant effort is required to keep them up to date.

This library track is a new track within OAEI. However, there was already a library track from 2007 to 2009 using different thesauri, as well as other thesaurus tracks like the food track and the environment track. A common motivation is that these tracks use a real-world scenario, i.e., real thesauri. For us, it is still a motivation to develop a better understanding of how thesauri differ from ontologies and how these differences affect state-of-the-art ontology matchers. We hope that the community accepts the challenge and that significant improvements will subsequently be seen that push the quality of automatic alignments between thesauri. Furthermore, we will pass the matching results to the maintainers of the reference alignment as input for improving it. While a full manual evaluation of all matching results is certainly not feasible, this way we constantly improve the reference alignment and mitigate possible weaknesses and incompleteness.

Test data

The library track uses two real-world thesauri that are comparable in many respects. They have roughly the same size, were both originally developed in German, are both multilingual today, both have English translations, and, most importantly, despite being from two different domains, they have large overlapping areas. Not least, both are freely available in RDF using SKOS.


The STW Thesaurus for Economics provides vocabulary on any economic subject: more than 6,000 standardized subject headings (skos:Concepts, with preferred labels in English and German) and 19,000 additional keywords (skos:altLabels) in both languages. The vocabulary was developed for indexing purposes in libraries and economic research institutions and includes technical terms used in law, sociology, or politics, and geographic names. The entries are richly interconnected by 16,000 skos:broader/narrower and 10,000 skos:related relations. An additional hierarchy of main categories provides a high level overview. The vocabulary is maintained on a regular basis by ZBW German National Library of Economics - Leibniz Centre for Economics and has been translated into SKOS.


The Thesaurus for the Social Sciences (TheSoz) serves as a crucial instrument for indexing documents and research information in the social sciences. In total, it contains about 12,000 keywords, of which 8,000 are standardized subject headings (in English and German) and 4,000 are additional keywords. The thesaurus covers all topics and sub-disciplines of the social sciences. Additionally, terms from associated and related disciplines are included in order to support accurate and adequate indexing of interdisciplinary, practice-oriented and multi-cultural documents. The thesaurus is owned and maintained by GESIS - Leibniz Institute for the Social Sciences and is available in SKOS.

Reference Alignment

A mapping between STW and TheSoz already exists and has been manually created by domain experts in the KoMoHe project [Mayr2008]. However, it does not cover the changes and enhancements in both thesauri since 2006. It is available in SKOS with the different matching types skos:exactMatch, skos:broadMatch and skos:narrowMatch. Within the reference alignment, concepts of one thesaurus can be aligned to more than one concept of the other thesaurus. Thus, we face an n:m mapping of the concepts. All in all, 4,285 TheSoz concepts and 2,320 STW concepts are aligned with 2,839 exact matches, 34 broader matches and 1,416 narrower matches. It is important to note that the reference alignment only contains alignments between the descriptors of both thesauri, i.e., the concepts that are actually used for document indexing. The upper part of the hierarchy consists of non-descriptor concepts (or categories) that are only used to organize the descriptors below them. We take this particularity into account by assessing only the generated alignments between descriptors and ignoring alignments between non-descriptors. However, this might change in the future, as the results of this track could be used to extend the reference alignment to the upper part of the hierarchy.


Ontology matching systems taking part in the OAEI only work on OWL ontologies and are not (yet) ready to deal with the specialties of a thesaurus. To get first results and to lower the barrier of taking part in this challenge, we provide OWL versions of the thesauri, generated as follows:

skos:Concept ➔ owl:Class
skos:prefLabel, skos:altLabel ➔ rdfs:label
skos:scopeNote, skos:notation ➔ rdfs:comment
skos:narrower ➔ rdfs:superClassOf
skos:broader ➔ rdfs:subClassOf
skos:related ➔ rdfs:seeAlso

This transformation is obviously not lossless. First and foremost, within the ontology, it is no longer recognizable which label is the preferred one and which ones are alternative labels. Since matching systems mostly have to focus on the labels, this transformation might lead to suboptimal results. There are, however, more fundamental differences between ontologies and thesauri that we show in the next section.
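The information loss can be illustrated with a minimal sketch of the property mapping listed above. It is not the conversion tool actually used for the track: triples are represented as plain Python tuples, and the concept `ex:c1` and its labels are made up for illustration.

```python
# Property mapping copied from the list above. Representing triples as
# plain tuples is an illustrative simplification (no RDF library is used).
SKOS_TO_OWL = {
    "skos:prefLabel": "rdfs:label",
    "skos:altLabel": "rdfs:label",
    "skos:scopeNote": "rdfs:comment",
    "skos:notation": "rdfs:comment",
    "skos:narrower": "rdfs:superClassOf",
    "skos:broader": "rdfs:subClassOf",
    "skos:related": "rdfs:seeAlso",
}


def skos_to_owl(triples):
    """Rewrite (subject, predicate, object) triples according to the mapping."""
    result = []
    for s, p, o in triples:
        if p == "rdf:type" and o == "skos:Concept":
            result.append((s, "rdf:type", "owl:Class"))
        elif p in SKOS_TO_OWL:
            result.append((s, SKOS_TO_OWL[p], o))
        # Any other triple is dropped in this sketch.
    return result


# Hypothetical concept, not taken from STW or TheSoz.
concept = [
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c1", "skos:prefLabel", "Labour market"),
    ("ex:c1", "skos:altLabel", "Job market"),
]

converted = skos_to_owl(concept)
labels = [o for s, p, o in converted if p == "rdfs:label"]
print(labels)  # ['Labour market', 'Job market']
```

After the conversion both labels hang off the same `rdfs:label` property, so a matcher that only sees the OWL version cannot tell which of them was the preferred label.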


Thesauri -- and other, similar knowledge structures like classifications or taxonomies -- are often called lightweight ontologies. However, ontologies and thesauri fundamentally differ. This is also reflected by the fact that with SKOS a specific model for thesauri exists that is formulated in OWL. There, a skos:Concept is not an owl:Class. Concepts sometimes represent classes, for example the STW concept Commodities. However, this is not true for every skos:Concept, e.g., the STW concept Germany is an instance, not a class. Looking at the subordinate concepts of Commodities, most of them do indeed represent classes, like Metals -- Metal Products -- Razor. Nevertheless, the relation in SKOS between these concepts is skos:broader, not rdfs:subClassOf. A subclass relationship states that if a class B is a subclass of a class A, then all instances of B are also instances of A. Here, all metals are commodities, but not all metal products are metals: the razor consists partly of metal, but it is not a metal.

Thesauri are created for a very specific purpose and are used in a predetermined way. This is reflected, among other things, by the distinction between descriptors and non-descriptors. Only descriptors are assigned to publications during indexing or classification. All non-descriptors serve as additional information to provide the correct context or to build up a proper hierarchy. Such a distinction typically does not exist in an ontology. Particularly difficult for ontology matchers (not necessarily only automatic ones) is the quasi-synonymy of the describing labels for a concept. A skos:altLabel is often used to indicate subconcepts that should be subsumed under the concept in question to avoid extensive subclassing. As an example, the STW descriptor 14117-2 with the preferred English label Tropical fruit has German alternative labels like pineapple, avocado, and kiwi. In an (OWL) ontology, these alternative labels should be modeled as instances of the class Tropical fruit. In contrast, other alternative labels might really indicate alternative, synonymous terms for the preferred label.

Finally, instead of the arbitrary semantic relations that are part of an ontology, thesauri contain relations like skos:related or compoundEquivalence in TheSoz. These often carry information for the (manual) use of the thesaurus for indexing, i.e., which descriptor should be used in which case or how combinations of descriptors are to be used. Transferring them to ontological relations is not always possible and often depends on the individual case. It can be seen that the development of a thesaurus matcher is indeed a challenge that differs from ontology matching. Nevertheless, the commonalities between thesauri and ontologies are large enough to pave the way for further developments by means of current ontology matchers.

Experimental Setting

To compare the created alignments with the reference alignment, we use the Alignment API. For this first evaluation, we only included equivalence relations (skos:exactMatch).

The generated alignments are available here.

All matching processes have been performed on a Debian machine with one 2.4 GHz core and 7 GB RAM allocated to each system. The evaluation has been executed using SEALS technologies. For ServOMap, ServOMapLt and Optima, we used slightly adapted ontologies as input, since these systems cannot handle URIs whose last part consists only of numbers, as is the case in the official version. Each participating system uses the OWL version. We computed precision, recall and F-measure (beta=1) for each matcher. We only consider equivalence correspondences between two descriptors, as non-descriptors are not included in the reference alignment. This filtering improves the precision (~8%) as well as the F-measure (~4%) for all systems. Moreover, we measured the runtime and the size of the created alignment, and checked whether a 1:1 alignment has been created. To assess the results of the matchers, we developed three straightforward matching strategies, using the original SKOS version of the thesauri:


All systems listed in the table above are sorted according to their F-measure values. Altogether 13 of the 21 submitted matching systems were able to create an alignment. Three matching systems (MaasMatch, MEDLEY, Wmatch) did not finish within the time frame of one week while five threw an exception.

Of all these systems, GOMMA performs best in terms of F-measure, closely followed by ServOMapLt and LogMap. However, the precision and recall measures vary a lot across the top three systems. Depending on the application, an alignment achieving either high precision or high recall is preferable. If recall is the focus, the alignment created by GOMMA is probably the best choice, with a recall of about 90%. Other systems generate alignments with higher precision, e.g., ServOMap with over 70% precision, while mostly having significantly lower recall values (except for Hertuda).
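For readers who want to reproduce this kind of comparison, the following is a minimal sketch of how precision, recall and F-measure (beta=1) can be computed when only correspondences between descriptors are scored, as described in the experimental setting. The actual evaluation was done with the Alignment API; the function name and the toy identifiers below are assumptions made for illustration.

```python
def evaluate(found, reference, descriptors):
    """Return (precision, recall, F1) of `found` against `reference`.

    Correspondences are (source, target) pairs. Pairs in which either
    concept is not a descriptor are removed before scoring.
    """
    kept = {(s, t) for s, t in found if s in descriptors and t in descriptors}
    correct = kept & reference
    precision = len(correct) / len(kept) if kept else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with made-up identifiers.
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
descriptors = {"a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4", "b9"}
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("n1", "n2")}

scores = evaluate(found, reference, descriptors)
print(tuple(round(v, 3) for v in scores))  # (0.667, 0.5, 0.571)
```

In this toy case the pair `("n1", "n2")` links two non-descriptors and is ignored. Without the filter it would count as a wrong correspondence and precision would drop from 2/3 to 2/4, which is the direction of the effect reported for the filtering.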

From the results obtained by the matching strategies that take the different types of labels into account, we can see that matching based on preferred labels only outperforms the other matching strategies. MatcherPref achieves the highest F-measure in these tests. The results of MatcherPrefDE and MatcherPrefEN provide an insight into the language characteristics of both thesauri and the reference alignment. MatcherPrefDE achieves the highest precision value (nearly 90%), albeit with a recall of only 60%. Both thesauri as well as the reference alignment have been developed in Germany and focus on German terms. From the results of MatcherPrefEN, we can see the difference: precision and especially recall decrease significantly when only the preferred English labels are used. On the one hand, only about 80% of the found correspondences are correct, and on the other hand, less than half of all correspondences can be found this way. This can be a disadvantage for systems that use NLP techniques on English labels or rely on language-specific background knowledge like WordNet.

The high precision values of the pref matchers reflect the fact that the preferred labels are chosen specifically to unambiguously identify the concepts. Our interpretation is that the English translations are partly not as precise as the original German terms (drop in precision) and not consistent regarding the English terminology (drop in recall).

In contrast, MatcherAllLabels achieves a quite high recall (90%) but a rather low precision (54%). This means that most but not all of the correspondences can be found by only looking at equivalent labels. However, when following this idea, nearly half of the found correspondences are incorrect. The rather high F-measure of MatcherAllLabels is therefore misleading: at least if the results were used unchecked in a retrieval system, a higher precision would clearly be preferred over a higher recall. In this respect, matchers like ServOMap show better results. In any case, it can be seen that a matching system using the original SKOS version could achieve a better result. The information loss when converting SKOS to OWL really matters.
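The idea behind these label-based strategies can be sketched in a few lines. The sketch assumes that two labels count as equivalent when they are identical after trimming and lowercasing; the exact comparison used by the baseline matchers is not specified here, and the concept identifiers and labels below are made up.

```python
def normalize(label):
    """Assumed notion of label equivalence: trimmed, case-insensitive."""
    return label.strip().lower()


def match_labels(source, target, use_alt=False):
    """Pair concepts that share at least one normalized label.

    `source` and `target` map a concept id to {"pref": [...], "alt": [...]}.
    With use_alt=False only preferred labels are compared.
    """
    def labels(entry):
        collected = list(entry["pref"])
        if use_alt:
            collected += entry["alt"]
        return {normalize(label) for label in collected}

    # Index the target thesaurus by label for a single pass over the source.
    index = {}
    for tid, entry in target.items():
        for label in labels(entry):
            index.setdefault(label, set()).add(tid)

    pairs = set()
    for sid, entry in source.items():
        for label in labels(entry):
            for tid in index.get(label, ()):
                pairs.add((sid, tid))
    return pairs


# Hypothetical miniature thesauri.
first = {
    "s1": {"pref": ["Unemployment"], "alt": ["Joblessness"]},
    "s2": {"pref": ["Tropical fruit"], "alt": ["Pineapple"]},
}
second = {
    "t1": {"pref": ["unemployment"], "alt": []},
    "t2": {"pref": ["Pineapple"], "alt": []},
}

print(match_labels(first, second))                # {('s1', 't1')}
print(sorted(match_labels(first, second, True)))  # [('s1', 't1'), ('s2', 't2')]
```

Including alternative labels finds one more pair here, but that pair links a broad concept to a narrower one, which is the kind of error that lowers precision for an exact-match evaluation.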

Concerning the runtime, LogMap as well as ServOMap are quite fast, with a runtime below 50 seconds. These values are comparable to, or in the case of LogMapLt even better than, those of both strategies computing the equivalence between preferred labels. Thus, these systems are very efficient in matching large ontologies while achieving very good results. Other matchers take several hours or even days and do not produce better alignments in terms of F-measure. By computing the correlation between F-measure and runtime, we notice a slightly negative correlation (-0.085), but the small number of samples is not sufficient to make a significant statement. However, we can say for certain that a longer runtime does not necessarily lead to better results.
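A correlation of this kind can be computed as sketched below. The sketch assumes Pearson's correlation coefficient, which is not stated explicitly above, and the runtime and F-measure values are invented placeholders rather than the measured results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values only: runtimes in seconds and F-measures.
runtimes = [40, 200, 3600, 90000]
f_measures = [0.66, 0.67, 0.60, 0.62]

r = pearson(runtimes, f_measures)
print(round(r, 3))
```

With only a handful of systems, a coefficient close to zero says little on its own, which is why no significance claim is attached to it.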

We further observe that the n:m reference alignment affects the results because some matching systems (ServOMap, WeSeE, HotMatch, CODI, MapSSS) only create 1:1 alignments and discard correspondences with entities that already occur in another correspondence. Whenever a system creates a lot of n:m correspondences, e.g., Hertuda and GOMMA, the recall increases significantly. This difference becomes clear when comparing ServOMapLt and ServOMap. Both systems are largely based on the same methods, but ServOMapLt does not use the 1:1 filtering. Consequently, its recall is higher and its precision lower.
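The effect of such a 1:1 restriction can be sketched with a simple greedy filter. This is only one possible way to enforce a 1:1 alignment and is an assumption made for illustration; it is not the filtering implemented by ServOMap or any of the other systems, and the confidence values are made up.

```python
def one_to_one(correspondences):
    """Greedy 1:1 filter over (source, target, confidence) triples.

    Correspondences are visited in order of decreasing confidence; one is
    discarded if its source or target already occurs in a kept one.
    """
    used_sources, used_targets, kept = set(), set(), []
    ranked = sorted(correspondences, key=lambda c: c[2], reverse=True)
    for source, target, confidence in ranked:
        if source in used_sources or target in used_targets:
            continue
        kept.append((source, target, confidence))
        used_sources.add(source)
        used_targets.add(target)
    return kept


# Hypothetical n:m output of a matcher.
candidates = [
    ("a", "x", 0.9),
    ("a", "y", 0.8),
    ("b", "y", 0.7),
    ("c", "x", 0.6),
]

print(one_to_one(candidates))  # [('a', 'x', 0.9), ('b', 'y', 0.7)]
```

If the reference alignment contains both `("a", "x")` and `("a", "y")`, the filtered result can never reach full recall, which matches the pattern observed between ServOMap and ServOMapLt.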

Since the reference alignment has not been updated for about six years, it does not reflect the updates made to both thesauri. Thus, new correct correspondences might be found by matching systems but are counted as incorrect because they are not included in the reference alignment. Therefore, we applied a manual evaluation to check whether the matching systems found correct correspondences that are not included in the reference alignment at all. In turn, this information can help to improve the reference alignment.

The manual evaluation has been conducted by domain experts. All newly detected correspondences that were not yet contained in the reference alignment have been considered. Because exact matches have to be 1:1 relationships, only those correspondences have been examined whose terms are descriptors and not yet involved in an existing correspondence. The other correspondences are considered wrong, as they contain a term to or from which a correspondence already exists.
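The selection of correspondences for manual review can be expressed as a small filter. The sketch below follows the rule just described; the function name and identifiers are hypothetical, and it assumes that concept identifiers of the two thesauri do not collide.

```python
def candidates_for_review(found, reference, descriptors):
    """Split new correspondences into (to_review, skipped).

    A new (source, target) pair is reviewed only if both concepts are
    descriptors and neither already takes part in a reference pair.
    """
    already_aligned = {concept for pair in reference for concept in pair}
    to_review, skipped = [], []
    for source, target in sorted(found - reference):
        both_descriptors = source in descriptors and target in descriptors
        both_unaligned = (source not in already_aligned
                          and target not in already_aligned)
        if both_descriptors and both_unaligned:
            to_review.append((source, target))
        else:
            skipped.append((source, target))
    return to_review, skipped


# Toy data with made-up identifiers.
reference = {("s1", "t1")}
descriptors = {"s1", "s2", "s3", "t1", "t2", "t3"}
found = {("s1", "t1"), ("s1", "t2"), ("s2", "t2"), ("s3", "n9")}

to_review, skipped = candidates_for_review(found, reference, descriptors)
print(to_review)  # [('s2', 't2')]
print(skipped)    # [('s1', 't2'), ('s3', 'n9')]
```

Here `("s1", "t2")` is skipped because `s1` is already aligned in the reference, and `("s3", "n9")` because `n9` is not a descriptor.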

Since all matching systems delivered correspondences representing exact matches, the correspondences have been judged in this specific regard. That means that, for now, correspondences have been considered wrong if their terms cannot be seen as equivalent but perhaps as related, broader or narrower.

The matchers detected between 38 and 251 correspondences that had not been in the reference alignment before. These especially include terms that are syntactically very similar or identical. However, some matching systems even detected difficult correspondences, e.g., between the German label for "automated production" ("Automatische Produktion") and "CAM", which has been identified through their associated non-preferred labels. Furthermore, correspondences of geographical terms have been detected, but some of the matchers have not been able to distinguish between the terms for the citizens of a country, their language and the country itself, although these differences can be derived from the structure of the thesauri.

However, the manual evaluation exposed several issues, which can be explained either by the typical behavior of matching systems or by domain-specific differences between the thesauri. There are similar terms in TheSoz and STW that are used in totally different contexts, e.g., the term "self-assessment". Even when considering the structure of both thesauri, these differences are difficult to identify. In general, term similarities often led to wrong correspondences, which is not surprising. Conversely, syntactically equal terms have in some cases not been detected. So far, we have not had the possibility to evaluate the matching systems with the improved reference alignment, but we plan to perform this additional evaluation soon.


It is the first time this track takes place, so we cannot compare the results with previous ones. As it is also the first time for the matching systems participating in this track, they do not have any experience with the data. This has to be kept in mind if the results are compared to other tracks.

Nevertheless, the newly detected correspondences already constitute a useful result for the maintainers of the two thesauri. The correct correspondences can be added to the existing reference alignment, which is already applied in information portals to support search term recommendation and query expansion services among differently indexed databases. As all matching systems delivered exact matches, some of the wrong correspondences will be examined again in the future to see whether other relationships like broader, narrower or related matches can be considered for them.

We expect further improvements if the matchers are tailored more specifically to the library track, i.e., if they exploit the information found in the original SKOS version. Another promising approach is the use of additional knowledge, e.g., instance data -- resources that are indexed with different thesauri.

This time, we collected the results of the matchers as a first survey and compared them to our simple string-matching strategy that takes advantage of the different types of labels. In future evaluations, we assume that better results can be achieved and that these strategies simply form a baseline.


We would like to thank Andreas Oskar Kempf from GESIS for the manual evaluation of the newly detected correspondences.



Dominique Ritze (Mannheim University Library) dominique[.]ritze[at]bib[.]uni-mannheim[.]de
Kai Eckert (Mannheim University Library)
Benjamin Zapilko (GESIS)
Joachim Neubert (ZBW)

Original page: http://web.informatik.uni-mannheim.de/oaei-library/2012/