Metrics are used to evaluate how well models perform on the datasets given the chosen hyperparameters. The same metrics can also be used to monitor live performance by passing feedback once the true labels for past predictions become available.

Experiments in the user interface show the metrics that were measured during the different trials and how they relate to the hyperparameters. Experiment metrics can also be used to decide, manually or automatically, which models to train in full and make available for deployment.

Figure: Experiment results in the user interface


Similar to the algorithm definitions, metrics are based on the scikit-learn metrics API. In its standard form, a custom metric takes two parameters, y_true and y_pred. The y_true argument receives a set of targets in the shape of the data created by your DataLoader. The y_pred parameter receives the corresponding predictions generated by the trained model being evaluated. Here is an example of a skewed mean squared error in which predicting too high is punished more severely than predicting too low:

import numpy as np

def skewed_mean_squared_error(y_true, y_pred):
    # Negative difference means the prediction was too high.
    difference = y_true - y_pred
    squared = np.square(difference)
    # Double the squared error wherever the model over-predicted.
    skewed_squared = np.where(difference < 0, squared * 2, squared)
    return np.mean(skewed_squared)
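To see the asymmetry in action, here is a small self-contained check (the sample values are illustrative, not from the platform): an over-prediction of 1 contributes twice the error of an under-prediction of 1.

```python
import numpy as np

def skewed_mean_squared_error(y_true, y_pred):
    difference = y_true - y_pred
    squared = np.square(difference)
    skewed_squared = np.where(difference < 0, squared * 2, squared)
    return np.mean(skewed_squared)

y_true = np.array([3.0, 3.0])
# Predicting 1 too high: squared error 1 is doubled to 2.
print(skewed_mean_squared_error(y_true, np.array([4.0, 4.0])))  # 2.0
# Predicting 1 too low: plain squared error of 1.
print(skewed_mean_squared_error(y_true, np.array([2.0, 2.0])))  # 1.0
```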

Each metric lives in its own file in the metrics folder of the codebase structure. The filename should match the function name; this allows supporting helper methods to live in the same file.
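As a sketch of that convention, a metric file could pull the skew logic out into a private helper that lives alongside the metric (the path and helper name below are illustrative assumptions, not prescribed by the platform):

```python
# metrics/skewed_mean_squared_error.py -- filename matches the metric function

import numpy as np

def _skew_penalty(difference, squared):
    # Support helper in the same file: double the penalty
    # wherever the model over-predicted (negative difference).
    return np.where(difference < 0, squared * 2, squared)

def skewed_mean_squared_error(y_true, y_pred):
    difference = y_true - y_pred
    return np.mean(_skew_penalty(difference, np.square(difference)))
```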