LD4IE 2016 Linked Data for Information Extraction

ISWC 2016 Workshop, October 18, 2016, Kobe, Japan

LD4IE 2016 is the fourth international workshop on Linked Data for Information Extraction, following successful past editions.

The World Wide Web provides access to tens of billions of pages, most of which contain largely unstructured information intended only for human readers. Linked Data, on the other hand, provide billions of pieces of information that are linked together and made available for automated processing. However, there is a lack of interconnection between the information in Web pages and Linked Data. A number of initiatives, such as RDFa (supported by the W3C) or Microformats (used by schema.org and supported by major search engines), are trying to enable machines to make sense of the information contained in human-readable pages by providing the ability to annotate webpage content with Linked Data.

This creates a large knowledge base of entities and concepts, connected by semantic relations. Such resources can be valuable seed data for IE tasks. Furthermore, the annotated web pages can be considered as training data in the traditional machine learning paradigm.

However, powering Web-scale IE using Linked Data faces major challenges, including discovering relevant learning materials, which is non-trivial due to the heterogeneity of vocabularies, the imbalanced coverage of different domains and the presence of noise, errors, imprecision and spam.

Addressing these challenges requires a collaborative research effort across multiple fields, covering topics such as modeling IE tasks with respect to LD; efficient and robust learning algorithms that scale and cope with noise; measures for assessing the quality of learning material; and methods for selecting and optimizing training seeds.

**Workshop at a glance**

Invited Talk by Valentina Presutti

LD4IE2016 program

LD4IE2016 proceedings

LD4IE2016 bibtex