CrumbTrail algorithm

Stefano Faralli, University of Mannheim, Germany
Irene Finocchi, Sapienza, University of Rome, Italy
Simone Paolo Ponzetto, University of Mannheim, Germany
Paola Velardi, Sapienza, University of Rome, Italy


We present CrumbTrail, an algorithm to clean large and dense knowledge graphs. CrumbTrail removes cycles, out-of-domain nodes and non-essential nodes, i.e., those that can be safely removed without breaking the knowledge graph’s connectivity. It achieves this through a bottom-up topological pruning on the basis of a set of input concepts that, for instance, a user can select in order to identify a domain of interest. Our technique can be applied to both noisy hypernymy graphs – typically generated by ontology learning algorithms as intermediate representations – as well as crowdsourced resources like Wikipedia, in order to obtain clean, domainfocused concept hierarchies. CrumbTrail overcomes the time and space complexity limitations of current state-of-art algorithms.


- Interactive examples:


We provide 6 examples of the application of CrumbTrail to toy noisy graphs:

example 1, example 2, example 3, example 4, example 5, example 6.

Once you open an example, click on a row of the table (on the right side of the screen) to visualize the corresponding steps of the algorithm.


- Source code:

NetBeans Java application project

- License

All the datasets and software are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License:

DFG project JOIN-T