MultiFarm Homepage

This page describes the MultiFarm dataset, a comprehensive dataset for multilingual ontology matching. The dataset can be downloaded and used for any kind of scientific purpose. Its generation and structure are briefly explained on this webpage; more details can be found in the following paper.

Christian Meilicke, Raúl García Castro, Fred Freitas, Willem Robert van Hage, Elena Montiel-Ponsoda, Ryan Ribeiro de Azevedo, Heiner Stuckenschmidt, Ondrej Svab-Zamazal, Vojtech Svatek, Andrei Tamilin, Cássia Trojahn, Shenghui Wang. MultiFarm: A Benchmark for Multilingual Ontology Matching. Web Semantics: Science, Services and Agents on the World Wide Web (15), Elsevier, Amsterdam, 2012.

Download the authors' version of the paper

Modifications

The following enumeration describes modifications that have been applied to the dataset after its first publication.

Evaluation campaigns

The dataset has been used in the following experiments:

It would be nice if you could inform us (see contact below) if you use the dataset in an experimental evaluation.

Translations in raw format

The dataset has been generated by translating the existing OntoFarm dataset. The results of this first step are available as simply structured text files and can be downloaded from the following table. Please note that all files are UTF-8 encoded. Some letters might be displayed incorrectly by your browser if it does not detect the encoding correctly.
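
Because a browser may guess the encoding wrongly, it can be safer to save a file locally and decode it explicitly. The following is a minimal Python sketch of that idea. The function name `read_translation_lines` is hypothetical, and the internal line structure of the files is not interpreted here.

```python
from pathlib import Path


def read_translation_lines(path):
    """Return the lines of a raw translation file, decoded explicitly as UTF-8."""
    # Passing the encoding explicitly avoids relying on the platform default,
    # which may not be UTF-8 and would then garble non-ASCII letters.
    text = Path(path).read_text(encoding="utf-8")
    return text.splitlines()
```

Calling `read_translation_lines("some-local-copy.txt")` on a downloaded file (the file name is only a placeholder) should then return the Czech, Russian, or Chinese labels intact.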

Ontology    Spanish  German  French  Russian  Portuguese  Czech  Dutch  Chinese
CMT         link     link    link    link     link        link   link   link
CONFERENCE  link     link    link    link     link        link   link   link
CONFOF      link     link    link    link     link        link   link   link
EDAS        -        -       -       -        -           -      -      -
EKAW        -        -       -       -        -           -      -      -
IASTED      link     link    link    link     link        link   link   link
SIGKDD      link     link    link    link     link        link   link   link

Complete bundle with ontologies and reference alignments

The results of the translation have been used to generate language-specific variants of the existing ontologies and reference alignments for all pairs of ontologies. These files are bundled in a single zip-file. They can be downloaded and used in any kind of scenario/experiment.

The zip-file is structured as follows:

ont/ 
   cn/
      cmt-cn.owl
      conference-cn.owl
      [for each ontology cmt, conference, confOf, edas, ekaw, iasted, sigkdd]
   cz/ (contains 7 files)
      cmt-cz.owl
      conference-cz.owl
      ...
   de/ (contains 7 files)
      cmt-de.owl
      conference-de.owl
      ...
   [a directory for each language cn, cz, de, en, es, fr, nl, pt, ru]
ref/
   cn-cz/
      cmt-cmt-cn-cz.rdf
      cmt-conference-cn-cz.rdf
      cmt-conference-cz-cn.rdf
      cmt-confOf-cn-cz.rdf
      cmt-confOf-cz-cn.rdf
      ...
      conference-conference-cn-cz.rdf
      ...
      [overall 21*2 + 7*1 = 49 files]
   [a directory for each language pair cn-cz, cn-de, ...]
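
The naming scheme of the ref/ directory can be sketched in a few lines of Python. This is only an illustration inferred from the listing above: the function name `reference_files` is hypothetical, and the order of the two ontology names within a file name is assumed to follow the order in which the ontologies are listed (cmt, conference, confOf, ...).

```python
from itertools import combinations

# Ontology names as they appear in the file names of the bundle.
ONTOLOGIES = ["cmt", "conference", "confOf", "edas", "ekaw", "iasted", "sigkdd"]


def reference_files(lang1, lang2):
    """List the expected reference alignment paths for one language pair."""
    directory = f"ref/{lang1}-{lang2}"
    paths = []
    # Same ontology in both languages: one file per ontology (7 * 1 files).
    for onto in ONTOLOGIES:
        paths.append(f"{directory}/{onto}-{onto}-{lang1}-{lang2}.rdf")
    # Two different ontologies: one file per language order (21 * 2 files).
    for onto1, onto2 in combinations(ONTOLOGIES, 2):
        paths.append(f"{directory}/{onto1}-{onto2}-{lang1}-{lang2}.rdf")
        paths.append(f"{directory}/{onto1}-{onto2}-{lang2}-{lang1}.rdf")
    return paths
```

For the pair cn-cz this yields 7 + 42 = 49 paths, including `ref/cn-cz/cmt-conference-cn-cz.rdf` and `ref/cn-cz/cmt-conference-cz-cn.rdf`. Comparing this list against the unpacked bundle is a quick way to check that nothing is missing.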

>>> Download the zipped bundle (old version)

>>> Download the zipped bundle (new version, used in OAEI 2012)

SEALS Testsuites

The dataset can also be used via the SEALS platform, where we have prepared and stored a testsuite for each language pair, resulting in 36 testsuites. You need an account for the SEALS platform to search and retrieve them from the test data repository.

>>> Link to the SEALS platform

You can, for example, find the testsuite for the language pair Czech-German if you just type 'cz-de' in the search field of the test data repository.
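
The 36 search keys can be enumerated with a short Python sketch. It assumes that every key joins the two language codes in alphabetical order, as in 'cz-de' and in the directory names 'cn-cz' and 'cn-de'; the function name `language_pairs` is hypothetical.

```python
from itertools import combinations

# The nine language codes used by the dataset.
LANGUAGES = ["cn", "cz", "de", "en", "es", "fr", "nl", "pt", "ru"]


def language_pairs():
    """Return one key per unordered pair of distinct languages."""
    # combinations() over the sorted codes yields each pair exactly once,
    # with the alphabetically smaller code first.
    return [f"{a}-{b}" for a, b in combinations(sorted(LANGUAGES), 2)]
```

Nine languages give 9 * 8 / 2 = 36 pairs, which matches the number of testsuites mentioned above.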

Involved people

The dataset has been generated by a collaborative initiative of the following people.

Contact

Contact Cassia Trojahn or Christian Meilicke for further information related to this dataset.

Known Bugs and Updates

Some users of the dataset have already detected some small bugs. We will fix these bugs in the future; for the moment we just list them:

Colophon

The logo at the top of this page is a modified version of a logo often used to refer to the Semantic Web. We have added the Chinese characters for 'many' and 'language' to the original logo.