Datasets

Triframes: Unsupervised Semantic Frame Induction using Triclustering (ACL 2018)

The datasets produced during the experiments in the Triframes paper are available: https://github.com/uhh-lt/triframes/releases.

DepCC, a Web-Scale Dependency-Parsed Corpus from CommonCrawl (LREC 2018)

The files are currently distributed from a server at the University of Hamburg: http://ltdata1.informatik.uni-hamburg.de/depcc/.

LOaDing: Framester Extension to Distributional-Based Senses (LREC 2018)

We release a dataset for the sense inventory ddt-wiki-n30-1400k and for each of the Framester methodologies for linking to the BabelNet sense inventory, i.e., Base, DirectX, FrameBase, Fprofile, TransX, and XWFN: ddt-wiki-n30-1400k-loading.zip.

Format Specification

Preamble
@prefix joint: <http://joint.uni-mannheim.de/SENSE-INVENTORY-NAME/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> . 
@prefix frame: <http://www.ontologydesignpatterns.org/ont/framenet/abox/frame/> . 

joint:RelatedSenses a rdf:Seq .
joint:hasRelatedSenses a owl:ObjectProperty .
Frame Entry
frame:Reading_aloud a owl:Class .
frame:Reading_aloud joint:hasRelatedSenses joint:Reading_aloud__rs .
joint:Reading_aloud__rs rdf:_1 joint:Reading_aloud__rs_1 .
joint:Reading_aloud__rs_1 
   skos:related joint:jread__NN_0 ;
   skos:notation "12.0"^^joint:absoluteConfidence .
joint:Reading_aloud__rs rdf:_2 joint:Reading_aloud__rs_2 .
joint:Reading_aloud__rs_2 
   skos:related joint:jRead__NP_0 ;
   skos:notation "2.0"^^joint:absoluteConfidence .
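
To illustrate how the frame entry above can be read, here is a minimal Python sketch that pulls the ranked sense links and their confidence values out of the Turtle text with a regular expression. The function name `frame_links` and the regex are illustrative assumptions, not part of the released tooling, and the pattern only covers the layout shown in this sample. A full RDF parser would be the more robust choice for the real files.

```python
import re

# Sample copied from the frame entry above (trailing spaces removed).
SAMPLE = """
frame:Reading_aloud a owl:Class .
frame:Reading_aloud joint:hasRelatedSenses joint:Reading_aloud__rs .
joint:Reading_aloud__rs rdf:_1 joint:Reading_aloud__rs_1 .
joint:Reading_aloud__rs_1
   skos:related joint:jread__NN_0 ;
   skos:notation "12.0"^^joint:absoluteConfidence .
joint:Reading_aloud__rs rdf:_2 joint:Reading_aloud__rs_2 .
joint:Reading_aloud__rs_2
   skos:related joint:jRead__NP_0 ;
   skos:notation "2.0"^^joint:absoluteConfidence .
"""

# Hypothetical pattern for one sequence item:
#   joint:<frame>__rs_<rank> skos:related joint:<sense> ;
#                            skos:notation "<score>"^^joint:absoluteConfidence .
ITEM = re.compile(
    r'joint:(?P<frame>\w+?)__rs_(?P<rank>\d+)\s+'
    r'skos:related\s+joint:(?P<sense>\S+)\s*;\s*'
    r'skos:notation\s+"(?P<score>[^"]+)"\^\^joint:absoluteConfidence\s*\.'
)


def frame_links(turtle_text):
    """Return (frame, rank, sense, confidence) tuples found in the text."""
    return [
        (m["frame"], int(m["rank"]), m["sense"], float(m["score"]))
        for m in ITEM.finditer(turtle_text)
    ]


if __name__ == "__main__":
    for link in frame_links(SAMPLE):
        print(link)
```

Each tuple keeps the position in the `rdf:Seq` (the rank) next to the confidence, so the original ordering of the related senses can be restored after filtering.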

Watset Synsets (ACL 2017)

The datasets produced during the experiments in the Watset paper are available: watset-acl2017.tar.xz.

Sense Inventories (ISWC 2016)

Here you can download the sense inventories that result from two steps: Learning a JoBimText Model and Disambiguation of Related Words (see the JOIN-T 1 section on the About page).

We provide a Turtle/Lemon specification for four different sense inventories, extracted from a 100 million sentence news corpus (news), namely Gigaword and LCC, and from a 35 million sentence Wikipedia corpus (wiki). For convenience, we split the specifications into separate datasets for nouns and verbs. Features such as context clues and links to WordNet and BabelNet are also provided in separate files.

We also release the 35 million sentence Wikipedia corpus that we used to induce our wiki sense inventories, and we provide the disambiguation evaluation of the wiki-p.1.6 JoBimText model on 17 words. To inspect the dependency representations used in our experiments, we suggest visiting the JoBimViz demo (please remember to select the Stanford (English) model).

DDTs (Disambiguated Distributional Thesauri)

Format Specification
Preamble
# The preamble for sense inventories
@prefix joint: <http://joint.uni-mannheim.de/SENSE-INVENTORY-NAME/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> . 

# definition of the classes and properties
joint:Sense a owl:Class .
joint:RelatedSenses a rdf:Seq .
joint:HypernymySenses a rdf:Seq .
joint:hasRelatedSenses a owl:ObjectProperty .
joint:hasHypernymySenses a owl:ObjectProperty .
Sense Inventory Entry
# sense "mouse#NN#0" specification
joint:jmouse__NN
    a    lemon:LexicalEntry ; 
    lemon:canonicalForm   <http://joint.uni-mannheim.de/SENSE-INVENTORY-NAME/rdf/jmouse__NN/canonicalForm> ;
    lemon:language  "EN" ; 
    lexinfo:partOfSpeech lexinfo:noun ;
    rdfs:label "mouse"@en .
# related senses specification for "mouse#NN#0" 

joint:jmouse__NN_0__rel rdf:_1 joint:jmouse__NN_0__rel_1 .
joint:jmouse__NN_0__rel_1 
   skos:related joint:jrat__NN_0 ;
   skos:notation "1.000"^^joint:relatedness .

joint:jmouse__NN_0__rel rdf:_2 joint:jmouse__NN_0__rel_2 .
joint:jmouse__NN_0__rel_2 
   skos:related joint:janimal__NN_0 ;
   skos:notation "0.796"^^joint:relatedness .
...

# hypernymy senses specification for "mouse#NN#0" 

joint:jmouse__NN_0__hyp rdf:_1 joint:jmouse__NN_0__hyp_1 .
joint:jmouse__NN_0__hyp_1 
   skos:broader joint:janimal__NN_0 ;
   skos:notation "1.000"^^joint:relatedness .
joint:jmouse__NN_0__hyp rdf:_2 joint:jmouse__NN_0__hyp_2 .
joint:jmouse__NN_0__hyp_2 
   skos:broader joint:jspecies__NN_1 ;
   skos:notation "0.161"^^joint:relatedness .

...
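
The related and hypernymy lists above share one layout: a ranked item, a target sense, and a `joint:relatedness` score. The following Python sketch, written under that assumption, collects both kinds of items from the sample text. The names `ITEM` and `sense_relations` are hypothetical, and the regex is tailored to the snippet shown here rather than to Turtle in general.

```python
import re

# Sample copied from the sense inventory entry above (trailing spaces removed).
SAMPLE = """
joint:jmouse__NN_0__rel rdf:_1 joint:jmouse__NN_0__rel_1 .
joint:jmouse__NN_0__rel_1
   skos:related joint:jrat__NN_0 ;
   skos:notation "1.000"^^joint:relatedness .

joint:jmouse__NN_0__rel rdf:_2 joint:jmouse__NN_0__rel_2 .
joint:jmouse__NN_0__rel_2
   skos:related joint:janimal__NN_0 ;
   skos:notation "0.796"^^joint:relatedness .

joint:jmouse__NN_0__hyp rdf:_1 joint:jmouse__NN_0__hyp_1 .
joint:jmouse__NN_0__hyp_1
   skos:broader joint:janimal__NN_0 ;
   skos:notation "1.000"^^joint:relatedness .
joint:jmouse__NN_0__hyp rdf:_2 joint:jmouse__NN_0__hyp_2 .
joint:jmouse__NN_0__hyp_2
   skos:broader joint:jspecies__NN_1 ;
   skos:notation "0.161"^^joint:relatedness .
"""

# Hypothetical pattern: "rel" items carry skos:related, "hyp" items skos:broader.
ITEM = re.compile(
    r'joint:(?P<source>\S+?)__(?P<kind>rel|hyp)_(?P<rank>\d+)\s+'
    r'skos:(?:related|broader)\s+joint:(?P<target>\S+)\s*;\s*'
    r'skos:notation\s+"(?P<score>[^"]+)"\^\^joint:relatedness\s*\.'
)


def sense_relations(turtle_text):
    """Return (source, kind, rank, target, score) tuples in document order."""
    return [
        (m["source"], m["kind"], int(m["rank"]), m["target"], float(m["score"]))
        for m in ITEM.finditer(turtle_text)
    ]


if __name__ == "__main__":
    for relation in sense_relations(SAMPLE):
        print(relation)
```

Filtering the result on `kind == "hyp"` yields only the hypernymy candidates, which is convenient when the two lists need to be evaluated separately.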

Context Clues

Format Specification
Preamble
# The preamble for sense inventories
@prefix : <http://joint.uni-mannheim.de/SENSE-INVENTORY-NAME/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

:hasCC a owl:DatatypeProperty .
Context Clue Entry
# sense "mouse#NN#0" context clues specification

:jmouse__NN_0  a owl:Class .
:jmouse__NN_0 :hasCC :jmouse__NN_0_cc .
:jmouse__NN_0_cc rdf:_1 :jmouse__NN_0_cc_1 .
:jmouse__NN_0_cc_1
   rdfs:label "encode__VB__-prep_in" ;
   rdf:value 14453.82 .
:jmouse__NN_0_cc rdf:_2 :jmouse__NN_0_cc_2 .
:jmouse__NN_0_cc_2
   rdfs:label "mollusk__NN__appos" ;
   rdf:value 9552.70 .
:jmouse__NN_0_cc rdf:_3 :jmouse__NN_0_cc_3 .
:jmouse__NN_0_cc_3
   rdfs:label "passerine__NN__nn" ;
   rdf:value 2576.45 .
...
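
Judging from the sample above, each context clue label appears to consist of three fields separated by double underscores (a word, a part-of-speech tag, and a dependency relation), followed by a numeric weight in `rdf:value`. The Python sketch below relies on that reading, which is an assumption drawn from the three example labels and not a documented specification. `ContextClue` and `parse_clues` are illustrative names.

```python
import re
from typing import NamedTuple


class ContextClue(NamedTuple):
    word: str
    pos: str
    dependency: str
    weight: float


# Sample copied from the context clue entry above.
SAMPLE = """
:jmouse__NN_0_cc_1
   rdfs:label "encode__VB__-prep_in" ;
   rdf:value 14453.82 .
:jmouse__NN_0_cc_2
   rdfs:label "mollusk__NN__appos" ;
   rdf:value 9552.70 .
:jmouse__NN_0_cc_3
   rdfs:label "passerine__NN__nn" ;
   rdf:value 2576.45 .
"""

# Hypothetical pattern for one clue: a quoted label followed by a plain number.
CLUE = re.compile(
    r'rdfs:label\s+"(?P<label>[^"]+)"\s*;\s*'
    r'rdf:value\s+(?P<value>\d+(?:\.\d+)?)\s*\.'
)


def parse_clues(turtle_text):
    """Return the context clues in document order."""
    clues = []
    for m in CLUE.finditer(turtle_text):
        # Assumed label layout: word__POS__dependency.
        word, pos, dependency = m["label"].split("__", 2)
        clues.append(ContextClue(word, pos, dependency, float(m["value"])))
    return clues


if __name__ == "__main__":
    for clue in parse_clues(SAMPLE):
        print(clue)
```

Splitting with a limit of two keeps any further double underscores inside the dependency field, so an unusual relation name does not break the unpacking.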
Links to BabelNet

Format Specification
Preamble
# The preamble for sense inventories
@prefix joint: <http://joint.uni-mannheim.de/SENSE-INVENTORY-NAME/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix lemon: <http://lemon-model.net/lemon#> .
@prefix bn:    <http://babelnet.org/rdf/> . 
# sense "mouse#NN#0" and "mouse#NN#1" links to BabelNet

joint:jmouse__NN_0 skos:related bn:s00056119n .
joint:jmouse__NN_1 skos:related bn:s00021487n .

...
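
The comments in these samples write senses as `mouse#NN#0`, while the identifiers use the form `jmouse__NN_0`. Assuming the identifier is always `j` + lemma + `__` + POS tag + `_` + sense number (an inference from the examples on this page), the following Python sketch converts between the two notations and builds a lookup table of BabelNet links. `to_hash_notation` and `babelnet_links` are hypothetical helper names.

```python
import re

# Sample copied from the BabelNet link entry above.
SAMPLE = """
joint:jmouse__NN_0 skos:related bn:s00056119n .
joint:jmouse__NN_1 skos:related bn:s00021487n .
"""

# Assumed identifier layout: j<lemma>__<POS>_<sense number>.
SENSE_ID = re.compile(r"j(?P<lemma>.+)__(?P<pos>[A-Z]+)_(?P<index>\d+)")
LINK = re.compile(
    r"joint:(?P<sense>\S+)\s+skos:related\s+bn:(?P<synset>\S+)\s*\."
)


def to_hash_notation(local_name):
    """Convert e.g. 'jmouse__NN_0' to 'mouse#NN#0'."""
    m = SENSE_ID.fullmatch(local_name)
    if m is None:
        raise ValueError(f"unexpected sense identifier: {local_name!r}")
    return f"{m['lemma']}#{m['pos']}#{m['index']}"


def babelnet_links(turtle_text):
    """Map each sense (hash notation) to its linked BabelNet synset ID."""
    return {
        to_hash_notation(m["sense"]): m["synset"]
        for m in LINK.finditer(turtle_text)
    }


if __name__ == "__main__":
    print(babelnet_links(SAMPLE))
```

A dictionary fits the sample, where each sense has one link. If a sense can link to several synsets in the full files, a mapping from sense to a list of synset IDs would be the safer structure.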

License

All the datasets are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license unless stated otherwise.