
Contact

Prof. Dr.-Ing. Laura Dietz


Bio

I am an assistant professor in the Department of Computer Science at UNH, with a focus on text-based machine learning and information retrieval.

Previously, I was a Post-doctoral Research Scientist at the Data and Web Science Group of Mannheim University, working with Prof. Simone Paolo Ponzetto. Before that, I was a Research Scientist at the Center for Intelligent Information Retrieval (CIIR), working with Bruce Croft at the University of Massachusetts. Before that, I did a post-doc with Andrew McCallum. I graduated from the Max Planck Institute for Informatics in Saarbruecken, Germany in January 2011.

Teaching

Fall 2017: “CS 753/853 Topics/Information Retrieval”

Spring 2017: “CS 980.02 Adv Top/Data Science for Knowledge Graphs and Text”

Activities

I review for venues in information retrieval (SIGIR, CIKM, ICTIR), natural language processing (ACL, KDD, EMNLP, NAACL), machine learning (ICML, NIPS, UAI), and data mining (KDD, CIKM).

I am a founding co-chair of the Women in IR network of female researchers in information retrieval. - Sign up for the next meeting at SIGIR 2016 here

I have been an organizer of the SIGIR Student Party since 2015.

Research Interests

I am interested in Text Retrieval, Extraction, Machine Learning, and Analysis (TREMA).

I also cover the research areas of biomedical NLP/IR, question answering and answer-passage retrieval, and topic models for graph-structured data.

My research sits at the intersection of Information Retrieval and Information Extraction, where I am striving towards a deep integration rather than a pipelined combination. My tools of choice are graphical models, often generative probabilistic models. This pattern underlies all the different facets of my research, some of which are detailed below:

Queripidia: Automatic Wikipedia Construction

Together with Simone Paolo Ponzetto and Michael Schuhmacher, I am working on methods to automatically, and in a query-driven manner, retrieve materials from the Web and compose Wikipedia-like articles. The web search paradigm of 10 blue hyperlinks is not sufficient, especially for information needs about which the user has very little prior expert knowledge. Instead, we envision providing a synthesis of the Web materials that strives to mimic the comprehensiveness of Wikipedia articles. We limit ourselves to a content-only setting where query-log, click, or session information is not available. Consequently, we aim to maximize the utility of information retrieval models in combination with methods from natural language processing. A particular emphasis is on utilizing information from structured knowledge resources such as Wikipedia, Freebase, or DBpedia together with text-based reasoning on general document and Web corpora.

An early feasibility study was presented at AKBC 2014, and a later demo was presented at the ESAIR workshop at CIKM 2015 (demo). The method paper for the demo is under submission (information available on request).

Closely related work on reranking entities for web queries was presented at CIKM 2015 (appendix).

The project was awarded an Amazon AWS in Education research grant and a stipend from the Eliteprogramm for Postdoktorandinnen und Postdoktoranden of the Baden-Württemberg Stiftung.

Entity-based Enrichment for Document Retrieval

Together with Jeff Dalton, I am studying how to effectively leverage knowledge bases such as Wikipedia and Freebase in ad hoc document retrieval. In a first step, documents and queries are enriched with links to the knowledge base. During the retrieval stage, these links can be used as an additional vocabulary as well as in feedback-based query expansions. For instance, entities that are linked from the query are expected to also be linked in relevant documents. However, we may compensate for errors in the entity linking stage by also considering terms from the entities’ article text, as well as name variants. Feedback methods are an additional option: documents retrieved in a preliminary pass are inspected for entity links to update the belief about which entities are relevant for the query. We also use the feedback documents to build an entity-context model to understand how each entity is related to the query.
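The feedback step described above can be illustrated with a small sketch. It is not the model from the SIGIR 2014 paper; it assumes a simple relevance-model-style estimate in which each feedback document spreads its (normalized) retrieval score over the entities linked in it. The function name `entity_feedback`, the scores, and the entity identifiers are all made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One feedback document: (retrieval score, list of entity links found in it).
FeedbackDoc = Tuple[float, List[str]]


def entity_feedback(docs: List[FeedbackDoc]) -> Dict[str, float]:
    """Estimate a distribution over entities from feedback documents.

    Assumption (for illustration only): an entity's weight is the sum over
    documents of (normalized document score) * (share of that document's
    links that point to the entity).
    """
    total_score = sum(score for score, _ in docs)
    if total_score <= 0:
        raise ValueError("feedback documents need positive total score")

    weights: Dict[str, float] = defaultdict(float)
    for score, links in docs:
        if not links:
            continue  # a document without entity links contributes nothing
        for entity, count in Counter(links).items():
            weights[entity] += (score / total_score) * (count / len(links))

    mass = sum(weights.values())
    if mass == 0:
        return {}
    # Renormalize in case some documents had no links.
    return {entity: w / mass for entity, w in weights.items()}


# Hypothetical first-pass retrieval result with entity links per document.
feedback_docs = [(0.6, ["entity_a", "entity_b"]), (0.4, ["entity_a"])]
entity_belief = entity_feedback(feedback_docs)
```

The resulting weights could then serve as expansion terms in a second retrieval pass; how they are combined with the original query is left open here.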

This work was presented at SIGIR 2014.

Knowledge Sketches

Assuming the existence of a large corpus and a large general-purpose knowledge base, we want to support a user in exploring a question in terms of three facets: entities, pertinent relationships, and relevant text passages. We devise a solution that reasons about distributions over entities, relations, and documents in a unified manner. For instance, we can arrive at a prior distribution over entities by issuing a query against the knowledge base. The distribution over entities helps to identify relevant document passages. Applying Bayes' rule, we can update the distribution over entities, given retrieved document passages. This is formalized in a generative model, which includes factors comprising probabilistic retrieval models.
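The Bayes-rule update mentioned above can be sketched in a few lines. This is only a toy illustration, not the generative model from the paper: it assumes the prior comes from knowledge base retrieval scores and that a likelihood of the retrieved passages under each entity is already available. The entity names and all numbers are invented.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn non-negative scores into a probability distribution."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have positive total mass")
    return {key: value / total for key, value in scores.items()}


def posterior_over_entities(
    prior: Dict[str, float], likelihood: Dict[str, float]
) -> Dict[str, float]:
    """Bayes' rule: P(e | passages) is proportional to P(passages | e) * P(e)."""
    unnormalized = {e: prior[e] * likelihood.get(e, 0.0) for e in prior}
    return normalize(unnormalized)


# Hypothetical knowledge base retrieval scores, used as the prior P(e).
prior = normalize({"entity_a": 3.0, "entity_b": 1.0})
# Hypothetical passage likelihoods P(passages | e) from a retrieval model.
likelihood = {"entity_a": 0.1, "entity_b": 0.9}

posterior = posterior_over_entities(prior, likelihood)
```

In this toy case the retrieved passages favor `entity_b` strongly enough to overturn the prior, which is the kind of belief revision the unified model is meant to support.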

This work was presented at AKBC 2013.

Entity Linking

Entity linking refers to a problem setting where the algorithm is given a string in a document and has to predict which Wikipedia entity it refers to. Our solution involves a retrieval model that incorporates the string itself and surrounding entity mentions to predict entity candidates as a ranking. We show that this model is an approximation to state-of-the-art models which optimize a joint assignment of mentions to entities. This solution can be further refined with supervised re-rankers but also provides reasonable performance “out-of-the-box”.
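To make the idea of ranking entity candidates from the mention string and its surrounding mentions concrete, here is a minimal sketch. It is not the KB-Bridge implementation: the scoring functions (token overlap for names, hit rate for context mentions), the mixing weight `name_weight`, and the toy knowledge base are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a string."""
    return set(text.lower().split())


def name_score(mention: str, name: str) -> float:
    """Jaccard overlap between mention tokens and entity name tokens."""
    a, b = tokens(mention), tokens(name)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_score(context_mentions: List[str], related: Set[str]) -> float:
    """Fraction of surrounding mentions that appear among the entity's related terms."""
    if not context_mentions:
        return 0.0
    hits = sum(1 for m in context_mentions if m.lower() in related)
    return hits / len(context_mentions)


def rank_candidates(
    mention: str,
    context_mentions: List[str],
    kb: Dict[str, Tuple[str, Set[str]]],
    name_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank all knowledge base entries for a mention, best first."""
    ranked = []
    for entity_id, (name, related) in kb.items():
        score = name_weight * name_score(mention, name) + (
            1.0 - name_weight
        ) * context_score(context_mentions, related)
        ranked.append((entity_id, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy knowledge base: entity id -> (name, lowercased related terms).
toy_kb = {
    "jordan_basketball": ("Michael Jordan", {"chicago bulls", "nba"}),
    "jordan_ml": ("Michael I. Jordan", {"machine learning", "berkeley"}),
}
ranking = rank_candidates("Michael Jordan", ["Berkeley", "machine learning"], toy_kb)
```

Here the exact name match favors the first entry, but the surrounding mentions pull the second entry to the top of the ranking, which is the effect the context component is meant to have.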

We participated with this solution in TAC KBP 2012 and TAC KBP 2013 (talk, poster). Also see our publication at OAIR 2012 (general talk, tech talk).

The code is available as part of the KB-Bridge project.

Entity Tracking and Retrieval

To monitor a stream of news and social documents for stories involving one or more target entities, we exploit the symmetry in our Entity Linking approach to both retrieve relevant documents (KB to text) and entity-link them (text to KB) with the same underlying model. This requires integrating low-level NLP algorithms into a retrieval framework.

We participated with this solution in TREC KBA 2012 and TREC KBA 2013. A paper on time-aware IR-based evaluation is published at TAIA 2013 (streameval/index.html). The time-aware evaluation methods are used to analyze our KBA 2013 results, which were presented in our 2013 talk at TREC.

… and more …

I further work on “senti-PRF”, a pseudo relevance feedback approach to optimize retrieval for opinionated questions. Published at CIKM 2013.

Relatedly, I am interested in “vague” Question Answering, such as questions asking for opinions or advice, or research questions. Here I work with both general-purpose data sets and bio-medical question answering.

I am still interested in unsupervised algorithms for identifying shared aspects and quantifying influence in social networks. Work on symmetric networks is published at ICWSM 2012 (code & supplement) and on asymmetric networks at ICML 2007 (talk, supplement).

Other work revolved around localizing bugs in software, published at NIPS 2009 (supplement, project page).

Further, I am working on a scalable MCMC inference framework “bayes-stack”, available on GitHub.

My PhD thesis was mainly focused on topic models and other generative models for data with link structure.

Students

PhD Students

Master Students

Serving on PhD Committees

Organized Shared Tasks

Laura Dietz, Ben Gamari, Manisha Verma, Prasenjit Mitra, Nick Craswell. TREC Complex Answer Retrieval at the Text REtrieval Conference. 2016–2018. - www - dataset - Mailinglist - TREC homepage

Organized Workshops, Keynotes, and Tutorials

Women in IR

Women in IR Mailinglist

Selected Publications & Talks

Recent Positions

March 2015 - present: Post-doctoral Research Scientist at Data and Web Science Group (DWS), Mannheim University (DWS, Simone Paolo Ponzetto)

August 2012 - March 2015: Research Scientist at Center for Intelligent Information Retrieval (CIIR), University of Massachusetts (CIIR, Bruce Croft)

October 2010 - August 2012: Post-doctoral researcher at University of Massachusetts (IESL, Andrew McCallum).

January 2008 - January 2011: PhD Student at Max-Planck-Institute for Informatics (Databases and Information Systems, Prof Gerhard Weikum), Saarbruecken

January 2007 - December 2008: PhD Student at Max-Planck-Institute for Informatics (Machine Learning, Prof. Tobias Scheffer), Saarbruecken

October 2006 - December 2006: PhD Scholarship at Knowledge Management Group (Prof. Tobias Scheffer), Humboldt University, Berlin

December 2002 - September 2006: Research Associate at Concert Division and I-Info Division, Fraunhofer Institute for Publication and Information Systems (IPSI), Darmstadt

Open Source Releases

Hobbies