Disambiguated Distributional Semantic-based Sense Inventories


Highlights


2017/August:
  • A new online word sense disambiguation (WSD) application is coming soon.
  • Substantial updates to the project page.
 

Project description


Disambiguated Distributional Semantic-based Sense Inventories are hybrid knowledge bases that combine the contextual information of distributional models with the conciseness and precision of manually constructed lexical networks. In contrast to dense vector representations, our resource is human-readable and interpretable (Table 1) and can easily be embedded in the Semantic Web ecosystem. A manual evaluation based on human judgments indicates the high quality of the resource, as well as the benefits of enriching top-down lexical knowledge resources with bottom-up distributional information from text.

 

Our approach consists of three main phases (Figure 1):

  • 1) Learning a JoBimText model: initially, we automatically create a sense inventory from a large text collection using the pipeline of the JoBimText project;
  • 2) Disambiguation of related words: we fully disambiguate all lexical information associated with a proto-concept (i.e., its similar terms and hypernyms), building on the partial disambiguation obtained in step 1. The result is a proto-conceptualization (PCZ). In contrast to a term-based distributional thesaurus (DT), a PCZ consists of sense-disambiguated entries, i.e., every term carries a sense identifier (Table 1);
  • 3) Linking to a lexical resource: we align the PCZ with an existing lexical resource (LR). That is, we create a mapping between the two sense inventories and then combine them into a new, extended sense inventory: our hybrid aligned resource. Finally, to obtain a truly unified resource, we link the "orphan" PCZ senses, for which no corresponding LR sense could be found, by inferring their type in the LR.
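To make phases 2 and 3 more concrete, here is a toy sketch in Python. It is not the project's implementation, and several choices are assumptions made only for illustration: the "term#n" sense-id notation, scoring candidate senses in phase 2 by counting shared words, aligning senses in phase 3 by cosine similarity of bags of words with an arbitrary threshold of 0.1, and all example data, including the hypothetical LR sense ids such as "mouse-device". The final step of phase 3 (inferring a type for orphan senses) is omitted; orphans are simply marked with None.

```python
"""Toy sketch of phases 2 and 3 (hypothetical data and scoring)."""
from collections import Counter
from math import sqrt

# Hypothetical phase-1 output: sense id -> related, still ambiguous words.
# The "term#n" sense-id notation is an assumption for this sketch.
SENSES = {
    "mouse#0": ["rat", "hamster", "rodent"],
    "mouse#1": ["keyboard", "joystick", "monitor"],
    "keyboard#0": ["mouse", "monitor", "joystick"],
    "keyboard#1": ["piano", "organ", "synthesizer"],
}

# Hypothetical lexical resource: term -> {LR sense id: bag of descriptive words}.
LR = {
    "mouse": {
        "mouse-rodent": ["rodent", "rat", "animal", "tail"],
        "mouse-device": ["device", "cursor", "computer", "keyboard"],
    },
    "keyboard": {
        "keyboard-device": ["keys", "computer", "input", "mouse"],
    },
}


def term(sense_id):
    """Strip the trailing sense identifier: 'mouse#1' -> 'mouse'."""
    return sense_id.split("#")[0]


def disambiguate(senses):
    """Phase 2: replace each related word by its best-matching sense id.

    A candidate sense scores by how many words of its cluster also occur in
    the current sense's context (its term plus its related words). Words
    without any known sense are kept as plain terms.
    """
    pcz = {}
    for sid, related in senses.items():
        context = set(related) | {term(sid)}
        resolved = []
        for word in related:
            candidates = [c for c in senses if term(c) == word]
            if candidates:
                word = max(candidates, key=lambda c: len(context & set(senses[c])))
            resolved.append(word)
        pcz[sid] = resolved
    return pcz


def cosine(a, b):
    """Cosine similarity of two bags of words."""
    va, vb = Counter(a), Counter(b)
    dot = sum(count * vb[w] for w, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def align(pcz, lr, threshold=0.1):
    """Phase 3: map each PCZ sense to the most similar LR sense of the same term.

    Senses scoring below `threshold` are marked as orphans (None); inferring
    their type in the LR is left out of this sketch.
    """
    mapping = {}
    for sid, related in pcz.items():
        bag = [term(sid)] + [term(r) for r in related]
        scored = [(cosine(bag, words), lr_id)
                  for lr_id, words in lr.get(term(sid), {}).items()]
        score, best = max(scored, default=(0.0, None))
        mapping[sid] = best if score >= threshold else None
    return mapping


if __name__ == "__main__":
    pcz = disambiguate(SENSES)
    print(pcz["mouse#1"])    # ['keyboard#0', 'joystick', 'monitor']
    print(align(pcz, LR))
    # prints (wrapped here):
    # {'mouse#0': 'mouse-rodent', 'mouse#1': 'mouse-device',
    #  'keyboard#0': 'keyboard-device', 'keyboard#1': None}
```

In this toy run, "keyboard" in the cluster of mouse#1 resolves to the computer sense keyboard#0 because their clusters share "joystick", "monitor" and "mouse", while the musical sense keyboard#1 has no counterpart in the toy LR and ends up as an orphan. Plain overlap counting is only one plausible scoring choice; weighting related words by the similarity scores that Table 1 omits would be a natural refinement.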

Table 1: Examples of entries for "mouse:NN" and "keyboard:NN". Trailing numbers indicate sense identifiers. Similarity and context clue scores are not shown for brevity.

 

Figure 1: Overview of our method for constructing a hybrid aligned resource.



DFG project JOIN-T