Python kgextension package

Abstract

Python is currently the most used platform for data science and machine learning. At the same time, public knowledge graphs have been identified as a valuable source of background knowledge in many data science tasks. In this paper, we introduce the kgextension package for Python, which allows for using knowledge graph in data science pipelines built in Python. The demo shows how data from public knowledge graphs such as DBpedia and Wikidata can be used in data mining pipelines based on the popular Python package scikit-learn. We demonstrate the package's utility by showing that the prediction accuracy on a popular Kaggle task can be significantly increased by using background knoweldge from DBpedia.

Package Functionalities

Linking and Link Exploration
Feature Generation
Feature Filtering
Matching and Fusion

Demonstration Contents

prediction task from Kaggle: Amazon Top 50 Bestselling Books 2009 - 2019 to classify books in fiction and non-fiction
extend the dataset with information from different public knowledge graphs
accuracy increases significantly from 0.86 to 0.94

Future Developments

integrate novel methods for linking and libraries for creating knowledge graph embedding vectors
extend the dataset with information from different public knowledge graphs
integrate efficient generators for Triple Pattern Fragment endpoints and HDT files

Link to the demo

Link to Github Repository kgextension and Read the Docs

scikit-learn Pipelines meet Knowledge Graphs

Abstract

Package Functionalities

Demonstration Contents

Future Developments

Link to the demo