This page presents the OAEI 2011.5 results for the MultiFarm dataset. If you notice any kind of error (wrong numbers, incorrect information about a matching system, etc.), do not hesitate to contact us (see the contact addresses in the last paragraph of this page).
The alignments generated during our evaluation experiments can be found here.
For this first comprehensive evaluation based on the MultiFarm dataset, we use a subset of the whole dataset. In this subset (a) we omitted the ontologies edas and ekaw, and (b) we suppressed the test cases involving Russian or Chinese. The reason for (b) is that most participating systems still use no specific multilingual technique. Such systems might work to some limited degree when matching German to English, but will fail when matching, for instance, French to Russian or Chinese. Moreover, to allow for a blind evaluation in the future, we do not release the translations of the edas and ekaw ontologies in this campaign.
The MultiFarm dataset contains two types of test cases: (i) test cases where two different ontologies have been translated into different languages, and (ii) test cases where the same ontology has been translated into different languages. The main motivation for constructing MultiFarm was to design a rich set of such test cases.
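The subset selection and the two test case types can be made concrete with a small enumeration. This is a minimal sketch under our own assumptions: the lists of seven Conference ontologies and nine languages reflect our understanding of MultiFarm (the codes "ru" and "cn" for Russian and Chinese are assumed), and we assume that every pair of different languages combined with every ordered pair of ontologies forms one test case. It is not the SEALS test generator.

```python
from itertools import combinations

# Assumption: ontologies and language codes of the full MultiFarm dataset.
ONTOLOGIES = ["cmt", "conference", "confOf", "edas", "ekaw", "iasted", "sigkdd"]
LANGUAGES = ["cn", "cz", "de", "en", "es", "fr", "nl", "pt", "ru"]

# Subset used in this campaign: no edas/ekaw, no Russian or Chinese.
EXCLUDED_ONTOLOGIES = {"edas", "ekaw"}
EXCLUDED_LANGUAGES = {"ru", "cn"}


def test_cases():
    """Yield (onto1, onto2, lang1, lang2, type) for every test case in the subset.

    type "i":  two different ontologies in two different languages
    type "ii": the same ontology in two different languages
    """
    ontos = [o for o in ONTOLOGIES if o not in EXCLUDED_ONTOLOGIES]
    langs = [l for l in LANGUAGES if l not in EXCLUDED_LANGUAGES]
    # One entry per unordered language pair, e.g. ("cz", "de").
    for lang1, lang2 in combinations(langs, 2):
        # Ordered ontology pairs: cmt-cz vs confOf-de differs from confOf-cz vs cmt-de.
        for onto1 in ontos:
            for onto2 in ontos:
                kind = "ii" if onto1 == onto2 else "i"
                yield onto1, onto2, lang1, lang2, kind


cases = list(test_cases())
type_i = [c for c in cases if c[4] == "i"]
type_ii = [c for c in cases if c[4] == "ii"]
```

Under these assumptions the subset has 21 language pairs with 25 ontology pairs each, of which 5 per language pair are of type (ii).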
In test cases of type (ii), two translations of the same ontology have to be matched. Good results for these test cases might depend less on multilingual methods than on the ability to exploit the fact that both ontologies have an identical structure and that the reference alignment covers all entities described in the ontologies. We expect these test cases to be dominated by techniques designed specifically for matching different versions of the same ontology.
To our knowledge, three participating systems use specific multilingual methods: WeSeE, AUTOMSv2 and YAM++. The other systems are not specifically designed to match ontologies in different languages, nor do they use a component that could be utilized for that purpose. WeSeE-Match and YAM++ use Microsoft Bing to translate the labels contained in the input ontologies into English. The translated English ontologies are then matched using the standard matching procedures of WeSeE and YAM++. AUTOMSv2 re-uses a free Java API named WebTranslator to translate the ontologies into English. This translation is performed before the profiling, configuration and matching methods of AUTOMSv2 are executed, so these methods only see English-labeled copies of the ontologies.
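The translate-then-match strategy described above can be sketched as follows. This is not the code of WeSeE-Match, YAM++ or AUTOMSv2: the dictionary-based `translate` function is a hypothetical stand-in for an external service such as Bing or WebTranslator, an ontology is reduced to a map from entity URI to label, and the "standard matching procedure" is replaced by a comparison of lower-cased English labels. All names and example entities are illustrative.

```python
# Hypothetical stand-in for an external translation service; a real system
# would send the label to the service instead of looking it up here.
TOY_DICTIONARY = {
    ("de", "Konferenz"): "Conference",
    ("de", "Autor"): "Author",
    ("fr", "conférence"): "conference",
    ("fr", "auteur"): "author",
}


def translate(label, lang):
    """Translate a label into English; English labels pass through unchanged."""
    if lang == "en":
        return label
    return TOY_DICTIONARY.get((lang, label), label)


def to_english(ontology, lang):
    """Return an English-labeled copy of an ontology given as {uri: label}."""
    return {uri: translate(label, lang) for uri, label in ontology.items()}


def match(onto1, lang1, onto2, lang2):
    """Translate both ontologies into English, then match on equal labels."""
    en1, en2 = to_english(onto1, lang1), to_english(onto2, lang2)
    # Index the second ontology by normalized label for fast lookup.
    index = {}
    for uri2, label2 in en2.items():
        index.setdefault(label2.lower(), []).append(uri2)
    return {(uri1, uri2)
            for uri1, label1 in en1.items()
            for uri2 in index.get(label1.lower(), [])}


# Hypothetical example ontologies (entity URI -> label).
DE_EXAMPLE = {"de#Konferenz": "Konferenz", "de#Autor": "Autor"}
FR_EXAMPLE = {"fr#conference": "conférence", "fr#auteur": "auteur"}
alignment = match(DE_EXAMPLE, "de", FR_EXAMPLE, "fr")
```

Translating both sides into a pivot language lets an otherwise monolingual matcher run unchanged, which is also why the quality of the translation step directly limits the quality of the final alignment.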
You can download the complete set of all generated alignments. These alignments have been generated by executing the tools with the help of the SEALS infrastructure. All results presented in the following are based on analyzing these alignments.
First, we aggregated the results separately over all test cases of type (i) and over all test cases of type (ii). The results are listed in the following table. Systems not listed in this table either generated empty alignments for the test cases of type (i) or threw exceptions.
[Table: aggregated results per matching system, in two groups: different ontologies (type i) and same ontologies (type ii).]
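Precision, recall and F-measure compare a generated alignment with the reference alignment. The page does not state how results were aggregated over test cases; the sketch below assumes micro-averaging (summing the counts of all test cases before dividing), which is one common choice. It also simplifies each correspondence to a pair of entities, ignoring relation and confidence values.

```python
def _prf(correct, found, expected):
    """Precision, recall and F-measure (harmonic mean) from raw counts."""
    p = correct / found if found else 0.0
    r = correct / expected if expected else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def precision_recall_f(alignment, reference):
    """Evaluate one test case; both arguments are sets of (entity1, entity2) pairs."""
    return _prf(len(alignment & reference), len(alignment), len(reference))


def aggregate(results):
    """Micro-average over test cases: sum the counts first, then divide.

    `results` is an iterable of (alignment, reference) pairs of sets.
    """
    results = list(results)
    correct = sum(len(a & r) for a, r in results)
    found = sum(len(a) for a, _ in results)
    expected = sum(len(r) for _, r in results)
    return _prf(correct, found, expected)
```

Micro-averaging gives test cases with larger reference alignments more weight; averaging per-test-case scores (macro-averaging) would weight every test case equally and can yield different numbers.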
Significant differences can be observed between the results for test cases of type (i) and type (ii). While the three systems that implement specific multilingual techniques clearly generate the best results for test cases of type (i), only one of them is among the top systems for test cases of type (ii). This subset is dominated by YAM++, CODI, and MapSSS. Note that these systems also generated very good results on the Benchmark track of OAEI 2011. On the one hand, there is a strong correlation between the ranking on Benchmark and the ranking for MultiFarm test cases of type (ii); on the other hand, there is no (or only a very weak) correlation between the results for test cases of type (i) and type (ii). For that reason, in the following we analyze only the results for test cases of type (i). In particular, we do not include the type (ii) results in the aggregated results presented below.
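A correlation between two rankings of the same systems can be quantified, for example, with Spearman's rank correlation coefficient. The page does not say which measure underlies the statement above, so this is only an illustration; the system names in the example are hypothetical placeholders, and the formula used assumes rankings without ties.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Each ranking is a list of system names, best first. Returns a value in
    [-1, 1]: 1 for identical rankings, -1 for exactly reversed rankings.
    """
    if sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("rankings must contain the same items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items")
    pos_b = {name: i for i, name in enumerate(ranking_b)}
    # Sum of squared differences between the two positions of each item.
    d_squared = sum((i - pos_b[name]) ** 2 for i, name in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rankings, e.g. one per track, best system first.
rho = spearman_rho(["A", "B", "C", "D"], ["B", "A", "C", "D"])
```

Swapping the top two of four systems still gives a high correlation (0.8 here), while unrelated orderings drive the coefficient towards 0.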
So far, we can conclude that specific multilingual methods work much better on MultiFarm test cases than state-of-the-art techniques without such methods. This is the result we expected. However, the absolute results are still not very good compared to the top results on the Conference dataset (~0.65 F-measure). Among the systems with specific multilingual methods, YAM++ generates the best alignments in terms of F-measure, followed by AUTOMSv2 and WeSeE. It is also interesting to see that CIDER clearly generates the best results among the systems without specific multilingual techniques.
The table below shows the results aggregated for each combination of matcher and language pair. Each cell shows the F-measure, followed by precision and recall (in smaller font).
As expected and as reported above, the systems that apply specific strategies to deal with multilingual ontology labels outperform the other systems: YAM++, followed by AUTOMSv2 and WeSeE, outperforms all other systems. For these three systems, looking at each pair of languages, the five best F-measures are obtained for en-fr (0.61), cz-en (0.58), cz-fr (0.57), en-pt (0.56), and en-nl/cz-pt/fr-pt (0.55). Apart from differences in ontology structure, most of these language pairs do not have overlapping vocabularies (cz-pt or cz-fr, for instance). Hence, the translation step helps to reconcile the differences between languages. However, as expected, it is not the only factor: YAM++ and WeSeE are based on the same translator, yet YAM++ outperforms WeSeE for most of the pairs. Looking at the average of these three systems, we obtain the following ranking of pairs: en-fr (0.47), en-pt (0.46), en-nl (0.44), de-en (0.43) and fr-pt (0.40). English is common to most of these pairs, due to the matchers' strategy of translating into English.
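The per-pair averages above (for example, averaging the F-measures of YAM++, AUTOMSv2 and WeSeE for each language pair) amount to grouping the scores by language pair and ranking the means. A minimal sketch; the system names and scores in the usage example are hypothetical placeholders, not OAEI results.

```python
from collections import defaultdict


def rank_pairs(f_measures, k=5):
    """Rank language pairs by their mean F-measure across systems.

    `f_measures` maps (system, language_pair) -> F-measure.
    Returns the k best (language_pair, mean F-measure) tuples, best first.
    """
    per_pair = defaultdict(list)
    for (_system, pair), f in f_measures.items():
        per_pair[pair].append(f)
    averages = {pair: sum(fs) / len(fs) for pair, fs in per_pair.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical scores for two hypothetical systems.
scores = {
    ("sys1", "en-fr"): 0.6, ("sys2", "en-fr"): 0.4,
    ("sys1", "cz-pt"): 0.3, ("sys2", "cz-pt"): 0.1,
}
top_pairs = rank_pairs(scores, k=2)
```

Restricting `f_measures` to a group of systems (the three multilingual ones, or all systems) yields the group-specific rankings discussed in this section.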
Among the other systems, CIDER is ahead of the rest, with its best scores for de-en (0.33), es-pt (0.30), es-fr (0.29), de-es (0.28) and en-es (0.25). MapSSS, LogMap, and CODI follow. For all four of these systems, the pairs es-pt and de-en are among their best F-measures. These two pairs contain languages whose vocabularies share similar terms. Since most of the systems rely on label similarity, it is likely harder for them to find correspondences for cz-pt than for es-pt. However, for some systems the five best scores include such pairs (cz-pt for CODI and LogMapLt, or de-es for LogMap).
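Why overlapping vocabularies favour label-based matchers can be illustrated with a normalized edit-distance similarity. The measure below is our choice for illustration, not the measure used by CIDER, MapSSS, LogMap or CODI, and the example words are dictionary translations of "conference" that we picked ourselves, not labels taken from MultiFarm.

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def label_similarity(a, b):
    """Similarity in [0, 1]: 1 - edit distance / length of the longer label."""
    # NFC normalization so that accented letters count as single characters.
    a = unicodedata.normalize("NFC", a).lower()
    b = unicodedata.normalize("NFC", b).lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


es_pt = label_similarity("conferencia", "conferência")  # Spanish vs Portuguese
cz_pt = label_similarity("konference", "conferência")   # Czech vs Portuguese
```

The Spanish and Portuguese words differ in a single character, so a purely string-based matcher finds them easily, while the Czech and Portuguese words need several edits and score lower.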
Finally, looking at the average over all systems, the best scores are again obtained for the de-en (0.29) and es-pt (0.26) pairs. In conclusion, certain language features (like overlapping vocabularies) cannot be neglected in the matching process. In our evaluation, the best average F-measures were observed for pairs of languages that have some degree of overlap in their vocabularies (de-en, fr-pt, es-pt). This is somewhat expected; however, we also found exceptions to this behavior. In fact, MultiFarm requires systems that exploit more sophisticated matching strategies than label similarity, and this holds for many of the ontologies in MultiFarm. This has to be investigated further with a detailed analysis of the individual pairs of ontologies. Furthermore, the way the MultiFarm ontologies were translated by the different human experts may affect how faithfully the translations reflect the original ontologies.
Christian Meilicke, Raul Garcia-Castro, Fred Freitas, Willem Robert van Hage, Elena Montiel-Ponsoda, Ryan Ribeiro de Azevedo, Heiner Stuckenschmidt, Ondrej Svab-Zamazal, Vojtech Svatek, Andrei Tamilin, Cassia Trojahn, Shenghui Wang. MultiFarm: A Benchmark for Multilingual Ontology Matching. Accepted for publication in the Journal of Web Semantics.
An author's version of the paper, which describes the dataset in detail, can be found on the MultiFarm homepage.
29.04.2012: The developer of YAM++ informed us that YAM++ and YAM are two different systems. YAM++ is an ontology matcher developed by Ngo Duy Hoa, whereas YAM is an XML schema matcher developed by Fabien Duchateau. We had used "YAM" instead of "YAM++" in several places; this has since been corrected.
This track is organized by Christian Meilicke and Cassia Trojahn dos Santos. If you have any problems working with the ontologies, any questions or suggestions, feel free to write an email to christian [at] informatik [.] uni-mannheim [.] de or cassia [.] trojahn [at] inria [.] fr.