Some of the parameters can only be used in the context of learning (L), some in the context of predicting (P), some can be used for both (L+P). A bold X means that you have to set this parameter.
| Parameter | L | P | Default | Comment |
|---|---|---|---|---|
| PATH_TRAINING | X | X | | Filepath to training data. |
| PATH_VALID | | X | | Filepath to validation data (used by AnyBURL to filter rankings); point to an empty file if there is no validation set. |
| PATH_TEST | | X | | Filepath to test data (used by AnyBURL to filter rankings); point to an empty file if there is no test set. |
| PATH_RULES | | X | | Filepath to previously learned rules. |
| PATH_OUTPUT | X | X | | Filepath where the rules / predictions are stored; the snapshot time in seconds is appended. |
| SATURATION | X | | 0.99 | The higher the value, the more rules are found at the current path length before the algorithm continues with the next higher path length. |
| SAMPLE_SIZE | X | | 500 | Size of the sample used for computing the approximated confidence value. |
| BATCH_TIME | X | | 1000 | Batch time in milliseconds; should not be set to a lower value. |
| THRESHOLD_CONFIDENCE | X | X | 0.01 | All rules above this threshold AND above the following one (THRESHOLD_CORRECT_PREDICTIONS) are stored / used for prediction. In the prediction phase you may have to use THRESHOLD_APPLIED_CONFIDENCE instead of THRESHOLD_CONFIDENCE; if in doubt, specify both parameters with the same value. |
| THRESHOLD_CORRECT_PREDICTIONS | X | X | 2 | Only rules with at least n correct predictions are stored / used for prediction. |
| MAX_LENGTH_CYCLIC | X | X | 3 | The maximal number of body atoms in cyclic rules (inclusive). Once this number is exceeded, only the other type of rules is searched for. |
| MAX_LENGTH_ACYCLIC | X | X | 2 | The maximal number of body atoms in acyclic rules (inclusive). Once this number is exceeded, only the other type of rules is searched for. |
| SNAPSHOTS_AT | X | X | 10,100 | Comma-separated snapshot times in seconds. The default stores the rules learned after 10 and 100 seconds. AnyBURL terminates after the last snapshot. |
| WORKER_THREADS | X | X | 3 | It is recommended to set this to n-1 if you have n cores. |
| UNSEEN_NEGATIVE_EXAMPLES | | X | 5 | Number of negative examples that are added to the denominator as a pessimistic variant of Laplace smoothing within the confidence computation. This number affects prediction (Apply) only! |
| AGGREGATION_TYPE | | X | maxplus | Choose between noisyor and maxplus; we strongly recommend maxplus. |
| TOP_K_OUTPUT | | X | 10 | The top k candidates are created as output; choose at least 10 if you want to compute hits@10 ... |
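The interplay of UNSEEN_NEGATIVE_EXAMPLES and the two thresholds can be illustrated with a small Python sketch. It is not AnyBURL's implementation: the function names are hypothetical, and the formula only follows the table's description (extra negative examples added to the denominator of the confidence).

```python
def smoothed_confidence(correct, predicted, unseen_negative_examples=5):
    """Confidence with extra negative examples added to the denominator."""
    denominator = predicted + unseen_negative_examples
    if denominator == 0:
        return 0.0
    return correct / denominator


def passes_thresholds(confidence, correct,
                      threshold_confidence=0.01,
                      threshold_correct_predictions=2):
    """Keep a rule only if it clears both thresholds from the table."""
    return (confidence > threshold_confidence
            and correct >= threshold_correct_predictions)


# A rule with little support is penalised much more than a well-supported one.
low_support = smoothed_confidence(correct=2, predicted=2)       # 2 / 7
high_support = smoothed_confidence(correct=200, predicted=250)  # 200 / 255
print(round(low_support, 3), round(high_support, 3))
```

With the default of 5, a rule that was correct in 2 of 2 cases drops from a raw confidence of 1.0 to roughly 0.29, while a rule with 200 of 250 correct predictions barely changes (0.8 versus roughly 0.78).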
Examples of minimal configuration files can be found here and here.
This is another example of learning only high-quality rules on a machine with many cores.
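For orientation, a learning configuration can also be generated programmatically. The following Python sketch assumes that the configuration file uses one `KEY = value` pair per line (Java properties style) and that the parameters without a default value are the mandatory ones; the paths and the helper `render_properties` are hypothetical, and all other values are the defaults from the table above.

```python
# Parameters assumed mandatory for learning (they have no default in the table).
REQUIRED_FOR_LEARNING = ("PATH_TRAINING", "PATH_OUTPUT")


def render_properties(params):
    """Render a dict as 'KEY = value' lines, one parameter per line."""
    missing = [key for key in REQUIRED_FOR_LEARNING if key not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return "".join(f"{key} = {value}\n" for key, value in params.items())


learn_params = {
    "PATH_TRAINING": "data/train.txt",  # hypothetical path
    "PATH_OUTPUT": "rules/example",     # hypothetical path
    "SNAPSHOTS_AT": "10,100",
    "WORKER_THREADS": 3,
    "THRESHOLD_CONFIDENCE": 0.01,
    "THRESHOLD_CORRECT_PREDICTIONS": 2,
}

config_text = render_properties(learn_params)
print(config_text)
```

The resulting text can be saved to a file and passed to the learning step. Parameters that are omitted fall back to their defaults, so only the paths strictly need to be listed.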
In both learning and prediction, the results (rules or predictions) are stored with the snapshot second as a suffix. If learning terminates before the final snapshot is reached (because saturation at the maximum rule length has been reached for both rule types), the rules are stored with suffix 0.
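The naming scheme can be sketched in Python as follows. The separator between PATH_OUTPUT and the snapshot second is an assumption (a hyphen is used here), as are the function names and the example path; check the files AnyBURL actually writes before relying on it.

```python
from typing import List


def snapshot_outputs(path_output: str, snapshots_at: str, sep: str = "-") -> List[str]:
    """Expected output names, one per snapshot second in SNAPSHOTS_AT."""
    seconds = [int(part) for part in snapshots_at.split(",")]
    return [f"{path_output}{sep}{second}" for second in seconds]


def early_termination_output(path_output: str, sep: str = "-") -> str:
    """Output name used when learning stops before the final snapshot."""
    return f"{path_output}{sep}0"


print(snapshot_outputs("rules/example", "10,100"))
# ['rules/example-10', 'rules/example-100']
print(early_termination_output("rules/example"))
# rules/example-0
```

When chaining learning and prediction, PATH_RULES in the prediction configuration would then point to one of these suffixed files rather than to the bare PATH_OUTPUT value.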