Ontology Alignment Evaluation Initiative - OAEI-2013 Campaign



Results for OAEI 2013 - Library Track



The following content is (mainly) based on the final version of the library section in the OAEI results paper.
If you notice any kind of error (wrong numbers, incorrect information on a matching system), do not hesitate to contact us.

Reference Alignment

The reference alignment we used for the evaluation is now available. Download Reference Alignment

Description

Libraries play an important role in the linked data web, and they widely agree that linked data technologies are ideal to integrate the data of libraries around the world and to foster the collaboration on cataloguing among the libraries. Library data does not only consist of the vast amount of cataloguing data, but especially -- and probably more interesting for other communities -- also of authority data, i.e., normed descriptions of locations, events, persons, corporate bodies, and subject concepts. The subject concepts are usually organized in more or less hierarchical knowledge organization systems, together with semantic relations between the concepts. A thesaurus is such a knowledge organization system that is used for indexing purposes and that provides quasi-synonymous, describing labels for each concept. Thesauri are sometimes referred to as lightweight ontologies; however, we will see that this definition can be misleading.

Thesauri, and authority data in general, have a long history in libraries and are actively used and maintained by information professionals and domain experts. Due to their high quality and their long-term development, they could function as a "backbone of the Semantic Web".

Most thesauri are domain-dependent and specialized for use within a certain field, e.g., to index publications with an economic focus. During previous experiments, we examined the topical overlap between the two thesauri used in this challenge: TheSoz (social sciences) and STW (economics). Not only do they share a lot of concepts, there is also a manually created alignment that can be used as reference. Many thesauri exist that cover the same or overlapping domains, often in different languages. Multilingual thesauri are an important means to bridge the gap between catalogs in different languages, so that users can search for relevant literature using their own language. Another possibility is the creation of links between concepts across different thesauri, possibly in different languages. Such alignments -- or correspondences or cross-concordances -- can be exploited to mutually add further information to both thesauri and subsequently improve the retrieval. Therefore, for many selected thesauri, alignments exist that were manually created by domain experts. Nevertheless, the automatic identification of alignments is strongly desired, mainly for two reasons: First, the manual creation of alignments between all existing thesauri is not feasible, so additional alignments have to be created, possibly by exploiting existing alignments (e.g., their transitivity). Second, automatically created alignments can be used to improve and enhance existing alignments, after approval by a domain expert. This is necessary, as most existing alignments are not complete, and even if they are supposed to be complete, they have to be maintained just like the thesauri themselves, i.e., a constant effort is required to keep them up-to-date.

This library track is a new track within OAEI. However, there has already been a library track from 2007 to 2009 using different thesauri, as well as other thesaurus tracks like the food track and the environment track. A common motivation is that these tracks use a real-world scenario, i.e., real thesauri. For us, it is still a motivation to develop a better understanding of how thesauri differ from ontologies and how these differences affect state-of-the-art ontology matchers. We hope that the community accepts the challenge and that, subsequently, significant improvements can be seen that push the quality of automatic alignments between thesauri. Furthermore, we will use the matching results as input for the maintainers of the reference alignment to improve the alignment. While a full manual evaluation of all matching results is certainly not feasible, this way we constantly improve the reference alignment and mitigate possible weaknesses and incompleteness.

Test data

The library track uses two real-world thesauri that are comparable in many aspects. They have roughly the same size, were both originally developed in German, are today both multilingual, both have English translations, and, most importantly, despite being from two different domains, they have huge overlapping areas. Not least, both are freely available in RDF using SKOS.

STW

The STW Thesaurus for Economics provides vocabulary on any economic subject: more than 6,000 standardized subject headings (skos:Concepts, with preferred labels in English and German) and 19,000 additional keywords (skos:altLabels) in both languages. The vocabulary was developed for indexing purposes in libraries and economic research institutions and includes technical terms used in law, sociology, or politics, and geographic names. The entries are richly interconnected by 16,000 skos:broader/narrower and 10,000 skos:related relations. An additional hierarchy of main categories provides a high level overview. The vocabulary is maintained on a regular basis by ZBW German National Library of Economics - Leibniz Centre for Economics and has been translated into SKOS.

TheSoz

The Thesaurus for the Social Sciences (TheSoz) serves as a crucial instrument for indexing documents and research information in the social sciences. It contains about 12,000 keywords overall, of which 8,000 are standardized subject headings (in English and German) and 4,000 are additional keywords. The thesaurus covers all topics and sub-disciplines of the social sciences. Additionally, terms from associated and related disciplines are included in order to support an accurate and adequate indexing process of interdisciplinary, practical-oriented and multi-cultural documents. The thesaurus is owned and maintained by GESIS - Leibniz Institute for the Social Sciences and is available in SKOS.

Reference Alignment

A mapping between STW and TheSoz already exists and has been manually created by domain experts in the KoMoHe project [Mayr2008]. However, it does not cover the changes and enhancements in both thesauri since 2006. It is available in SKOS with the different matching types skos:exactMatch, skos:broaderMatch and skos:narrowerMatch. Within the reference alignment, a concept of one thesaurus may be aligned to more than one concept of the other thesaurus. Thus, we face an n:m mapping of the concepts. All in all, 4,285 TheSoz concepts and 2,320 STW concepts are aligned with 2,839 exact matches, 34 broader matches and 1,416 narrower matches. It is important to note that the reference alignment only contains alignments between the descriptors of both thesauri, i.e., the concepts that are actually used for document indexing. The upper part of the hierarchy consists of non-descriptor concepts (or categories) that are only used to organize the descriptors below them. We take this specialty into account as we only assess the generated alignments between descriptors and ignore alignments between non-descriptors. However, this might change in the future, as the results of this track could be used to extend the reference alignment to the upper part of the hierarchy.

Transformation

Ontology matching systems taking part in the OAEI only work on OWL ontologies and are not (yet) ready to deal with the specialties of a thesaurus. To get first results and to lower the barrier of taking part in this challenge, we provide OWL versions of the thesauri, generated as follows:

skos:Concept ➔ owl:Class
skos:prefLabel, skos:altLabel ➔ rdfs:label
skos:scopeNote, skos:notation ➔ rdfs:comment
skos:narrower ➔ rdfs:superClassOf
skos:broader ➔ rdfs:subClassOf
skos:related ➔ rdfs:seeAlso

This transformation is obviously not lossless. First and foremost, within the ontology, it is not recognizable which label is the preferred one and which ones are alternative labels. Since matching systems mostly have to focus on the labels, this transformation might lead to suboptimal results. There are, however, more fundamental differences between ontologies and thesauri that we show in the next section.
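
The loss of the preferred/alternative distinction can be illustrated with a minimal Python sketch. It is not the converter used for the track: the triple representation (plain tuples of prefixed names), the function name `skos_to_owl` and the example concept `ex:fruit` are assumptions made for illustration, and the property names are copied from the list above as written.

```python
# Mapping table copied from the transformation list of this track.
# The tuple-based triple format is an assumption for illustration only.
PROPERTY_MAP = {
    "skos:prefLabel": "rdfs:label",
    "skos:altLabel": "rdfs:label",
    "skos:scopeNote": "rdfs:comment",
    "skos:notation": "rdfs:comment",
    "skos:narrower": "rdfs:superClassOf",
    "skos:broader": "rdfs:subClassOf",
    "skos:related": "rdfs:seeAlso",
}


def skos_to_owl(triples):
    """Rewrite (subject, predicate, object) triples according to the mapping."""
    converted = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj == "skos:Concept":
            converted.append((subject, "rdf:type", "owl:Class"))
        elif predicate in PROPERTY_MAP:
            converted.append((subject, PROPERTY_MAP[predicate], obj))
        else:
            # Anything not covered by the list is passed through unchanged.
            converted.append((subject, predicate, obj))
    return converted


# Hypothetical concept with one preferred and one alternative label.
example = [
    ("ex:fruit", "rdf:type", "skos:Concept"),
    ("ex:fruit", "skos:prefLabel", "Tropical fruit"),
    ("ex:fruit", "skos:altLabel", "pineapple"),
]

converted = skos_to_owl(example)
labels = [obj for _, predicate, obj in converted if predicate == "rdfs:label"]
print(labels)
```

After the conversion, both labels are plain rdfs:label values, so a matcher reading the OWL version can no longer tell which one was the preferred label.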

SKOS vs. OWL

Thesauri -- and other, similar knowledge structures like classifications or taxonomies -- are often called lightweight ontologies. However, ontologies and thesauri fundamentally differ. This is also reflected by the fact that with SKOS a specific model for thesauri exists that is formulated in OWL. There, a skos:Concept is not an owl:Class. Concepts sometimes represent classes, for example the STW concept Commodities. However, this is not true for every skos:Concept, e.g., the STW concept Germany is an instance, not a class. Having a look at the subordinate concepts of Commodities, they mostly indeed represent classes, like Metals -- Metal Products -- Razor. Nevertheless, the relation in SKOS between these concepts is skos:broader, not rdfs:subClassOf. A subclass relationship states that if a class B is a subclass of a class A, then all instances of B will also be instances of A. Here, all metals are commodities, but not all metal products are metals: the razor consists partly of metal, but it is no metal. Thesauri are created for a very specific purpose and are used in a predetermined way. This is inter alia reflected by the distinction of descriptors and non-descriptors. Only descriptors are assigned to publications during the indexation or classification. All non-descriptors serve as additional information to provide the correct context or to build up a proper hierarchy. Such a distinction typically does not exist in an ontology. Very difficult for ontology matchers (not necessarily only automatic ones) is the quasi-synonymy of the describing labels for a concept. A skos:altLabel is often used to indicate subconcepts that should be subsumed under the concept in question to avoid extensive subclassing. As an example, the STW descriptor 14117-2 with the preferred English label Tropical fruit has German alternative labels like pineapple, avocado, and kiwi. In an (OWL) ontology, these alternative labels should be modeled as instances of the class Tropical fruit. 
In contrast, other alternative labels might really indicate alternative, synonymous terms for the preferred label. Finally, instead of arbitrary semantic relations that are part of an ontology, in thesauri, relations like skos:related or compoundEquivalence in TheSoz exist. They often contain information for the (manual) use of the thesaurus for indexing, i.e., which descriptor should be used in which case or how combinations of descriptors are to be used. Transferring them to ontological relations is not always possible and often depends on the single case. It can be seen that the development of a thesaurus matcher is indeed a challenge that differs from ontology matching. Nevertheless, the commonalities between thesauri and ontologies are large enough to pave the way for further developments by means of current ontology matchers.
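
The razor example can be made concrete with a small Python sketch. It assumes that the chain Commodities, Metals, Metal Products, Razor is read as a strict subclass hierarchy, which is exactly the reading the transformation produces; the dictionary `BROADER` and the function `ancestors` are hypothetical names introduced for illustration.

```python
# Broader chain from the STW example, written as child -> parent.
# Treating it as rdfs:subClassOf is the assumption being illustrated.
BROADER = {
    "Razor": "Metal Products",
    "Metal Products": "Metals",
    "Metals": "Commodities",
}


def ancestors(concept, parent_of):
    """Follow the parent relation upwards and collect every ancestor."""
    found = []
    while concept in parent_of:
        concept = parent_of[concept]
        found.append(concept)
    return found


razor_ancestors = ancestors("Razor", BROADER)
print(razor_ancestors)
```

Under subclass semantics, every ancestor in this list is a class that contains all razors, including Metals. With skos:broader no such inference is implied, which is why the two relations cannot simply be equated.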

Experimental Setting

To compare the created alignments with the reference alignment, we use the Alignment API. We only included equivalence relations (skos:exactMatch).

The generated alignments are available here.

All matching processes have been performed on a Debian machine with one 2.4GHz core and 7GB RAM allocated to each system. The evaluation has been executed using SEALS technologies. Each participating system uses the OWL version. We computed precision, recall and F-measure (beta=1) for each matcher. Moreover, we measured the runtime and the size of the created alignment, and checked whether a 1:1 alignment has been created. To assess the results of the matchers, we developed three straightforward matching strategies, using the original SKOS version of the thesauri:

Results

Of all 21 participating matchers (or variants), 12 were able to generate an alignment within 12 hours. CroMatcher, MaasMatch, RiMOM2013, WeSeE and WikiMatch did not finish in the time frame, OntoK had heap space problems, and CiderCL, MapSSS and Synthesis threw an exception. The results can be found in the table above.

The best systems in terms of F-measure are ODGOMS and YAM++. These matchers also have a higher F-measure than MatcherPref. ServOMap and AML are below this baseline but better than MatcherPrefDE and MatcherAllLabels. A group of matchers including LogMap, LogMapLite, HerTUDA and HotMatch are above the MatcherPrefEN baseline. Compared to last year's evaluation with the updated reference alignment, the matchers clearly improved: In 2012, no matcher was able to beat MatcherPref and MatcherPrefDE, and only ServOMapLt was better than MatcherAllLabels. This year, two matchers outperformed all baselines; two further matchers outperformed all baselines but MatcherPref. This is remarkable, as the matchers are still not able to consume SKOS and therefore neglect the distinction between preferred and alternative labels. The baselines are tailored for very high precision by design, while the matchers usually have a higher recall. This is reflected in the F-measure, where the highest value increased from 0.723 to 0.758, i.e., by almost 5 percent, since last year. The recall mostly increased, e.g., for YAM++ from 0.758 to 0.808 (without affecting the precision negatively, which also increased from 0.680 to 0.692).
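
For readers who want to retrace these figures, the following Python sketch shows how precision, recall and F-measure (beta=1) relate to each other. It is not the Alignment API implementation used for the evaluation: correspondences are modeled as plain pairs of identifiers, the toy identifiers are hypothetical, and the function names are assumptions. The last two lines recompute the F-measure from the rounded YAM++ precision and recall quoted above, so small deviations from the reported values are expected.

```python
def evaluate(found, reference):
    """Return (precision, recall, f1) for two sets of correspondences."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def f1_from(precision, recall):
    """F-measure with beta=1, i.e. the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy data with hypothetical identifiers: 4 reference pairs, 3 found, 2 correct.
reference = {("stw:a", "thesoz:1"), ("stw:b", "thesoz:2"),
             ("stw:c", "thesoz:3"), ("stw:d", "thesoz:4")}
found = {("stw:a", "thesoz:1"), ("stw:b", "thesoz:2"), ("stw:c", "thesoz:9")}
precision, recall, f1 = evaluate(found, reference)
print(precision, recall, f1)

# F-measure recomputed from the rounded YAM++ values quoted in the text.
yam_2012 = f1_from(0.680, 0.758)
yam_2013 = f1_from(0.692, 0.808)
print(yam_2012, yam_2013)
```

With the rounded inputs, the YAM++ F-measure comes out at roughly 0.72 for 2012 and roughly 0.75 for 2013, which is consistent with the improvement described above.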

As in the previous year, an additional intellectual evaluation of the alignments established automatically was done by a domain expert to further improve the reference alignment. Unsurprisingly, the matching tools predominantly detected matches based on the character string. This included the term alone as well as the term's context. Especially in the case of short terms, this could easily lead to wrong correspondences (e.g. "tea" != "tea"; "sheep" != "sleep"). Except for its sequence of letters, the term's context was not taken into account. This sole attention to the character string was a main source of error in cases in which similar terminological entities appeared on the term level as well as on the context level (e.g. "Green revolution" subject category: "Development Polic" != "permanent revolution" subject category: "Political Developments and Processes"). Additionally, identical components of a compound frequently led to incorrect correspondences (e.g. "prohibition of interest" != "prohibition of the use of force"). Moreover, terms in different domains might look similar, but in fact have very different meanings. An illustrative example is "Chicago Antitrust Theory" != "Chicago School", where indeed the same Chicago is referenced, but without any effect on the (dis-)similarity of both concepts.
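
Why short terms are so error-prone for string-based matching can be sketched in Python. The text does not say which similarity measures the matchers used; the normalized edit distance below is an assumption chosen for illustration, and `levenshtein` and `similarity` are hypothetical helper names.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance normalized to [0, 1] by the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


sheep_sleep = similarity("sheep", "sleep")  # one substitution in five letters
tea_tea = similarity("tea", "tea")          # identical strings
print(levenshtein("sheep", "sleep"), sheep_sleep, tea_tea)
```

A single substituted letter in a five-letter term still leaves a similarity of about 0.8, which many thresholds would accept. For identical strings such as "tea" and "tea", no string measure can separate the two concepts at all, so only context can.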

Conclusion

The overall improvement of the performance is encouraging in this challenge. While it might not look impressive at first sight to beat simple baselines such as ours, it is actually a notable achievement. The baselines are not only tailored for very high precision, benefitting from the fact that in many cases a consistent terminology is used, they also exploit additional knowledge about the labels. The matchers are general-purpose matchers that have to perform well in all challenges of the OAEI. Nonetheless, we are still waiting for matchers that understand SKOS in order to make use of the many concept hierarchies provided on the Web. Generally, matchers still rely too much on the character string of the labels and the labels of the concepts in the immediate vicinity. During the intellectual evaluation process, it became obvious that a multitude of incorrect matches could be prevented if the subject categories, i.e., the thesauri's classification schemes, were matched beforehand. In many cases, misleading candidate correspondences could be discarded by taking these higher levels of the hierarchy into account. For example, correspondences between personal names and subject headings could be prevented. A thesaurus, however, is not a classification system. The disjointness of two subthesauri is therefore not easy to establish, let alone to detect by automatic means. Nonetheless, thesauri oftentimes have their own classification schemes which partly follow classification principles. We believe that further exploiting this context knowledge could be worthwhile.
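
The idea of matching the subject categories first and using them to discard candidates can be sketched as follows. This is only an illustration of the suggestion, not a method used by any participating system; all identifiers, category names and the function `filter_by_category` are hypothetical.

```python
def filter_by_category(candidates, category_a, category_b, aligned_categories):
    """Keep only candidates whose subject categories were matched beforehand."""
    kept = []
    for concept_a, concept_b in candidates:
        pair = (category_a.get(concept_a), category_b.get(concept_b))
        if pair in aligned_categories:
            kept.append((concept_a, concept_b))
    return kept


# Hypothetical category assignments in two thesauri A and B.
category_a = {"a:ford_person": "a:persons", "a:fordism": "a:economic_theory"}
category_b = {"b:fordism": "b:economics"}

# Hypothetical result of matching the classification schemes beforehand.
aligned_categories = {("a:economic_theory", "b:economics")}

# Candidates as a purely string-based matcher might propose them.
candidates = [("a:ford_person", "b:fordism"), ("a:fordism", "b:fordism")]

kept = filter_by_category(candidates, category_a, category_b, aligned_categories)
print(kept)
```

The candidate linking a personal name to a subject heading is dropped because its categories were never aligned. As noted above, such a filter is only as reliable as the category alignment itself, since subthesauri are not guaranteed to be disjoint.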

Organizers:

Dominique Ritze (Research Group Data and Web Science, University of Mannheim) dominique[.][at]informatik[.]uni-mannheim[.]de
Kai Eckert (Research Group Data and Web Science, University of Mannheim)
Benjamin Zapilko (GESIS)
Andreas Oskar Kempf (GESIS)
Joachim Neubert (ZBW)

Original page: http://web.informatik.uni-mannheim.de/oaei-library/2013/