AnyBURL

This is the home of the rule learner AnyBURL (Anytime Bottom Up Rule Learning). AnyBURL has been designed for the use case of knowledge base completion, however, it can also be applied to any other use case where rules are helpful. You can use it to (i) learn rules, (ii) apply them to create candidate rankings, and (iii) evaluate the created rankings.

Since June 2021 there is a new AnyBURL version called AnyBURL-JUNO. Compared to the previous version only minor changes have been applied. A new type of rule has been added: a rule with an empty body. It predicts entities with respect to the frequency with which they appear in the subject or object position of a certain relation. The idea of this rule is to fill up rankings; its predictions are weighted low to achieve this purpose. Aside from this minor change the new release is more or less the same as the RE release.
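As an illustration (in simplified notation with a made-up relation, not necessarily the exact serialization used in the rule files), such an empty-body rule could be written as

hasGender(X, male) <=

It would predict male as an object candidate for any completion task of the form hasGender(x, ?), scored according to how frequently male appears in the object position of hasGender in the training set.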

Since December 2020 there exists a faster and probably better alternative for applying the rules learned by AnyBURL, which is called SAFRAN. See the Extensions section below for more details.

Alternative approaches to knowledge base completion, which currently dominate the research field in terms of number of publications, embed a given graph into a low-dimensional vector space. If you want to compare AnyBURL to these approaches, we recommend using LibKGE.

Results

These are the results of the newest AnyBURL version, AnyBURL-JUNO, in comparison to the previous AnyBURL versions. The time used for learning the rules was restricted to 1000 seconds (~17 minutes). The application of these rules required less than 5 minutes for each dataset, with the exception of CODEX-L, where the final rankings were created in less than 15 minutes. The IJCAI results were computed on a laptop; the other results were computed on a compute server with 24 Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz cores, using between 20 and 22 cores.

For YAGO we reduced the learning time to 100 seconds. This resulted in 172 seconds application time for YAGO, less than 5 minutes in total. The resulting scores are hits@1=49.5, hits@10=67.9, MRR=0.555. Please send me a mail if you know any model that can achieve these scores in 5 minutes including the hyperparameter search, which is not required by AnyBURL.

Dataset      IJCAI-19            AnyBURL-RE          AnyBURL-JUNO (current version)
Metric       hits@1    hits@10   hits@1    hits@10   hits@1    hits@10
WN18         93.9      95.6      94.8      96.2      94.8      96.1
WN18RR       44.6      55.5      45.7      57.7      45.7      57.6
FB15         80.4      89.0      81.4      89.4      80.9      89.4
FB15-237     23.0      47.9      27.3*     52.2*     24.5      50.6
YAGO03-10    42.9      63.9      49.2      68.9      49.8      69.2
CODEX-L      -         -         -         -         25.6      42.6

(* see the special remark below)

IMPORTANT NOTE: All results have been created with the default parameter settings of AnyBURL. The only exceptions are the results for WN18 and WN18RR. For these datasets it is possible to increase the maximal length of the cyclic rules from three (the default value) to five, which gives a small additional plus. To do this, you have to add the line MAX_LENGTH_CYCLIC = 5 to the configuration file that describes the input and output of rule learning.

SPECIAL REMARK: With the current release we are correcting what might have been a mistake. In AnyBURL-RE we implemented a specific technique to learn from the validation set what can be expected to work well for the test set. Sounds okay, ... well, decide on your own. It is described in Section 4.4 of the corresponding paper. Its impact is restricted to the FB15-237 dataset; for that reason the affected cells are marked with * in the table above. This technique has been deactivated in the current release. It is still an open question whether it is fair to apply this technique.

Download (and build) AnyBURL

The current version of AnyBURL uses reinforcement learning to learn rules of different lengths, sampled from different path types, in parallel. In each time span AnyBURL computes how well the sampled rules allow it to reconstruct the training set, assigning more computational resources to those path profiles that score best. For more details we have to point to the paper, which is unfortunately not yet available.

AnyBURL is packaged as a jar file and requires no external resources. You can download the jar file here.

If you have problems running the jar due to, e.g., a Java version conflict, you can build an AnyBURL jar on your own. If you want (or need) to do this, continue as follows; otherwise skip the following lines. Download the source code and unzip it. Then compile the code and create the jar as follows.
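First create a folder build; from the directory that contains the unzipped sources (the folder de) this is simply:

mkdir build

Then compile with the following command.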

javac de/unima/ki/anyburl/*.java -d build

Package in a jar:

jar cfv AnyBURL-JUNO.jar -C build .

There is a dot . at the end of the line; it is required. Afterwards you can delete the build folder.

Datasets

You can use AnyBURL on any dataset that comes as a set of triples. The supported format is rather simple and should look like this (the separator can be a blank or a tab):

anne loves bob
bob marriedTo charly
bob hasGender male
...

So far AnyBURL has been tested on the well-known datasets FB15, FB15-237, WN18, WN18RR, and YAGO03-10. We have zipped FB15, FB15-237, and WN18 into one file. Please download and unzip it. YAGO03-10 and WN18RR are available at the ConvE webpage.

Run AnyBURL

AnyBURL can be used (i) to learn rules and (ii) to apply the learned rules to solve prediction tasks. These are two distinct processes that have to be started independently.

Learning

Download and open the file config-learn.properties and modify the line that points to the training file, choosing the dataset that you want to apply AnyBURL to. Create the output folder rules, then run AnyBURL with this command.

java -Xmx3G -cp AnyBURL-JUNO.jar de.unima.ki.anyburl.LearnReinforced config-learn.properties

If you have been using the previous version of AnyBURL, you might have noticed that the only difference is the change from Learn to LearnReinforced; everything else stays the same. The parameter -Xmx3G specifies how much memory should be available to Java. Here we specified 3 gigabytes; it might be required to increase this for larger datasets.

It will create three files alpha-10, alpha-50, and alpha-100 in the rules folder. These files contain the rules learned after 10, 50, and 100 seconds. While AnyBURL is running you can see how many rules have been found so far and how the saturation rate for cyclic and acyclic rules changes over time.
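These file names and their location are determined by the configuration file. As a rough sketch (parameter names as used, to the best of our knowledge, in the shipped config-learn.properties; the paths are placeholders you have to adapt to your dataset), the relevant lines might look like this:

PATH_TRAINING = data/FB15-237/train.txt
PATH_OUTPUT   = rules/alpha
SNAPSHOTS_AT  = 10,50,100

Here PATH_OUTPUT is the prefix of the created rule files and SNAPSHOTS_AT determines after how many seconds of learning a snapshot is written.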

Learning Parameters

You can change the following parameters to modify the standard learning behaviour of AnyBURL. Any changes have to be made by adding a line to (or changing a line in) the config-learn.properties file.

In some scenarios there is only one specific target relation; suppose it is called relation17. In these scenarios you should add the line SINGLE_RELATIONS = relation17 to config-learn.properties. If there are several target relations, you can list them separated by commas; do not use blanks in between.
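For example, using the relation names from the toy triples shown in the Datasets section (purely illustrative), such a line could look like this:

SINGLE_RELATIONS = marriedTo,hasGender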

Predicting

Download and open the file config-apply.properties and modify it according to your needs (if required). Create the output folder predictions, then run AnyBURL with this command. Note that you have to specify the rules that have been learned previously.

java -Xmx3G -cp AnyBURL-JUNO.jar de.unima.ki.anyburl.Apply config-apply.properties

If you have been using the previous version of AnyBURL, you might have noticed that nothing has changed with respect to the method call.

This will create two files alpha-10 and alpha-100 in the predictions folder. Each contains the top-k rankings for the completion tasks. These are already filtered rankings (this is the only reason why the validation and test sets must be specified in the apply config file).
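As for learning, the in- and output of this step are specified in the configuration file. A minimal sketch (parameter names as, to the best of our knowledge, in the shipped config-apply.properties; the paths are placeholders, and the shipped file also documents how to process several rule files in one run, which is how the two prediction files above are created):

PATH_TRAINING = data/FB15-237/train.txt
PATH_VALID    = data/FB15-237/valid.txt
PATH_TEST     = data/FB15-237/test.txt
PATH_RULES    = rules/alpha-100
PATH_OUTPUT   = predictions/alpha-100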

Prediction Parameters

You can change the following parameters to modify the standard prediction (= rule application) behaviour of AnyBURL.

Evaluating Results

To evaluate these results, use this command after modifying config-eval.properties (if required). The evaluation result is printed to standard output.

java -Xmx3G -cp AnyBURL-JUNO.jar de.unima.ki.anyburl.Eval config-eval.properties

If you follow the whole workflow using the referenced config-files, the evaluation program should print results similar to the following output:

...
-----
10 0.1997 0.3965 0.234
50 0.2197 0.4337 0.284
100 0.2299 0.4517 0.333

The first column refers to the time used for learning, the second column shows the hits@1 score, and the third column the hits@10 score. The last column is the MRR (approximated, as it is based on the top-k only).
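For reference, the MRR over a set Q of completion tasks is the mean of the reciprocal ranks of the correct answers:

MRR = (1/|Q|) * sum over all tasks of 1/rank

Since only the top-k candidates are stored in the prediction files, a correct answer that does not appear in the top-k presumably contributes 0 to this sum, which makes the reported value a lower-bound approximation of the true MRR.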

The evaluation command line interface is only intended for demonstration purposes, not for use in a large-scale experimental setting. We might have modified it a bit in the meantime, so the output might look slightly different.

Light Setup

If you want to run AnyBURL in a kind-of light set up, you can do this by adding the following parameter setting to the configuration for learning the rules.

THRESHOLD_CORRECT_PREDICTIONS = 10

MAX_LENGTH_ACYCLIC = 0
MAX_LENGTH_GROUNDED_CYCLIC = 0
ZERO_RULES_ACTIVE = false

The first line ensures that only rules are stored for which at least 10 correct predictions can be made against the training set, i.e., there are at least 10 different groundings in the training set that make both the head and the body of the rule true. The next two lines ensure that no rules with constants are learned. The last line ensures that a special kind of default rule is also suppressed. As a result, only a relatively small set of cyclic rules without constants is learned.
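To illustrate the distinction (simplified notation with made-up relations, purely for illustration):

speaks(X, Y) <= livesIn(X, A), hasLanguage(A, Y)

is a cyclic rule without constants and is still learned in this setup, while

speaks(X, dutch) <= livesIn(X, netherlands)

contains constants and is suppressed, as are the empty-body zero rules discussed above.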

We are currently conducting experiments with this setting. Results will be available within the next days.

Extensions

SAFRAN (Scalable and fast non-redundant rule application) is a framework for fast inference of groundings and aggregation of logical rules on large heterogeneous knowledge graphs. It requires a rule set learned by AnyBURL as input, which is used to make predictions for the standard KBC task. This means that it can be used as an alternative to the rule application method that is built into AnyBURL. In most cases it is significantly faster and slightly better in terms of hits@k and MRR.

Publications

The main publication is shown in bold letters. Please cite this paper unless you are using AnyBURL for a specific purpose that is better reflected in one of the other papers.

The third paper is not about AnyBURL but about a simple rule-based baseline called RuleN and its comparison against some state-of-the-art embedding methods. The good results of RuleN motivated us to develop AnyBURL.

Previous and Special Versions

22.11.2021: Added a paragraph on running AnyBURL in a light mode to the webpage. Further experiments with this setting are planned (UPDATE on 23.11.2021: while adding more results we detected an inconsistency in the evaluation code, which will be resolved within the next days).

14.09.2021: Added the AKBC paper to the list of related publications.

08.07.2021: Fixed some minor issues in the descriptions and updated the webpage with a hint on the parameter SINGLE_RELATIONS.

10.06.2021: Updated webpage with new version of AnyBURL called AnyBURL-JUNO. Only minor modifications compared to the RE version.

02.02.2021: Added license information at the end of this page and extended the remark below the results table with the WN18/WN18RR-specific setting.

06.11.2020: Updated webpage with a hint on how to achieve the results shown in the results table. See the paragraph below the table. Thanks to Simon Ott for pointing out that slightly worse results are achieved when run with the default setting.

23.03.2020: Updated webpage with new version of AnyBURL using Reinforcement Learning (RE version).

12.06.2019: Some minor issues in the sources of the 2019-05 version have been fixed (thanks to the feedback of Andrea Rossi). There should now be no build problems related to the encoding of German umlauts in some comments (most of the comments are in English) or to the reference to an outdated (and unused) package.

Contact

If you have any questions, feel free to write a mail at any time. We are also interested in hearing about applications where AnyBURL was useful.

License

AnyBURL is available under the 3-clause BSD license, sometimes referred to as the modified BSD license:

Copyright (c) University Mannheim, Data and Web Science Group

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Colophon

Wikipedia: "A burl [...] is a tree growth in which the grain has grown in a deformed manner. It is commonly found in the form of a rounded outgrowth on a tree trunk or branch that is filled with small knots from dormant buds." If you cut it you get what is shown in the background. The small knots, which are also called burls, can be associated with constants and the regularities that are associated with the forms and structures that surround them.