AnyBURL Parameter

Parameter

Some of the parameters can only be used in the context of learning (L), some in the context of predicting (P), some can be used for both (L+P). A bold X means that you have to set this parameter.

| Parameter | L | P | Default | Comment |
|---|---|---|---|---|
| PATH_TRAINING | X | X | | File path to the training data. |
| PATH_VALID | | X | | File path to the validation data (used by AnyBURL to filter rankings). Point to an empty file if there is no validation set. |
| PATH_TEST | | X | | File path to the test data (used by AnyBURL to filter rankings). Point to an empty file if there is no test set. |
| PATH_RULES | | X | | File path to previously learned rules. |
| PATH_OUTPUT | X | X | | File path where the rules / predictions are stored. The snapshot second is appended to it. |
| SATURATION | X | | 0.99 | The higher the value, the more rules are found before the algorithm moves on to the next higher path length. |
| SAMPLE_SIZE | X | | 500 | Size of the sample used to compute the approximated confidence value. |
| BATCH_TIME | X | | 1000 | Batch time in milliseconds. Should not be set to a lower value. |
| THRESHOLD_CONFIDENCE | X | X | 0.01 | Only rules above this threshold AND above the following one (THRESHOLD_CORRECT_PREDICTIONS) are stored / used for prediction. In the prediction phase you may have to use THRESHOLD_APPLIED_CONFIDENCE instead of THRESHOLD_CONFIDENCE; if in doubt, specify both parameters with the same value. |
| THRESHOLD_CORRECT_PREDICTIONS | X | X | 2 | Only rules with at least this many correct predictions are stored / used for prediction. |
| MAX_LENGTH_CYCLIC | X | X | 3 | The maximum number of body atoms in cyclic rules (inclusive). Once this number is exceeded, only the other type of rule is searched for. |
| MAX_LENGTH_ACYCLIC | X | X | 2 | The maximum number of body atoms in acyclic rules (inclusive). Once this number is exceeded, only the other type of rule is searched for. |
| SNAPSHOTS_AT | X | X | 10,100 | The default stores the rules learned after 10 and 100 seconds. AnyBURL terminates after the last snapshot. Change as you want. |
| WORKER_THREADS | X | X | 3 | It is recommended to set this to n-1 if you have n cores. |
| UNSEEN_NEGATIVE_EXAMPLES | | X | 5 | Number of negative examples added to the denominator in the confidence computation, as a pessimistic variant of Laplace smoothing. This number affects prediction (Apply) only! |
| AGGREGATION_TYPE | | X | maxplus | Choose between noisyor and maxplus. We strongly recommend maxplus. |
| TOP_K_OUTPUT | | X | 10 | The top k candidates are written as output. Choose at least 10 if you want to compute hits@10. |

Examples of minimal configuration files can be found here and here.
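
As a rough sketch of what such minimal files could contain: AnyBURL configurations are plain `KEY = value` properties files. All paths below are hypothetical placeholders, and the `-100` suffix in PATH_RULES assumes that the snapshot second is appended with a hyphen, so adjust both to your setup. A learning configuration sets at least the two path parameters that have no default in the table:

```properties
# Learning: sketch with hypothetical paths
PATH_TRAINING  = data/my-kg/train.txt
PATH_OUTPUT    = rules/my-rules
# Optional, shown here with their default values
SNAPSHOTS_AT   = 10,100
WORKER_THREADS = 3
```

A matching prediction configuration additionally points to the validation set, the test set, and a previously learned rule file:

```properties
# Prediction: sketch with hypothetical paths
PATH_TRAINING = data/my-kg/train.txt
PATH_VALID    = data/my-kg/valid.txt
PATH_TEST     = data/my-kg/test.txt
# Rule file written at the 100-second snapshot (suffix format is an assumption)
PATH_RULES    = rules/my-rules-100
PATH_OUTPUT   = predictions/my-predictions
TOP_K_OUTPUT  = 10
```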

This is another example of learning only high-quality rules on a machine with many cores.
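
A configuration in that spirit might raise the two rule thresholds and use most of the available cores. The values below are illustrative assumptions for a hypothetical 32-core machine, not recommended settings, and the paths are placeholders:

```properties
# Learning high-quality rules: sketch with illustrative values
PATH_TRAINING  = data/my-kg/train.txt
PATH_OUTPUT    = rules/my-rules-hq
SNAPSHOTS_AT   = 100,1000

# Keep only rules with high confidence and solid support
THRESHOLD_CONFIDENCE          = 0.5
THRESHOLD_CORRECT_PREDICTIONS = 10

# n-1 worker threads for n = 32 cores
WORKER_THREADS = 31
```

Higher thresholds yield fewer rules, so it is worth checking how many rules survive before using them for prediction.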

In both the learning and the prediction phase, the results (rules or predictions) are stored with the snapshot second as a suffix. If learning terminates before the final snapshot is reached (because saturation has been reached at the maximum rule length for both rule types), the rules are stored with the suffix 0.
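
To make the naming concrete, here is a sketch with a hypothetical output path. How exactly the suffix is attached is not specified above (a hyphen is assumed here), so check the files AnyBURL actually writes:

```properties
PATH_OUTPUT  = rules/my-rules
SNAPSHOTS_AT = 10,50,100

# Assumed resulting rule files, one per snapshot:
#   rules/my-rules-10
#   rules/my-rules-50
#   rules/my-rules-100
#
# If learning saturates before the 100-second snapshot, the rules
# would instead be stored with suffix 0:
#   rules/my-rules-0
```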