
Getting Started With Cubonacci

This is a Getting Started guide for the Cubonacci cloud platform: cloud.cubonacci.com

Cubonacci is a code-first platform to streamline the machine learning workflow from development to deployment. A video with a quick overview of the platform can be found here.

Introduction

In this guide you will learn how to set up a Cubonacci project, run a hyperparameter search (called an experiment in Cubonacci), train, test, and deploy a model, and interact with it using the integrated notebook feature.

If you would like a quickstart, or some help with the first steps of setting up a project in Cubonacci, this getting started guide is for you. If you have experience with developing machine learning models in Python, the rest of the documentation should get you up and running quickly.

There is a Slack workspace for Cubonacci users that you can join if you would like to share feedback or have questions about how to use the platform.

Create an account

Cubonacci uses GitHub as an identity provider and as the place to store the model code you want to use. If you don’t have a GitHub account, please sign up with GitHub before creating a Cubonacci account.

Authorization

Cubonacci will ask your permission to perform certain tasks within your GitHub account on your behalf. Because of the way GitHub permissions work, this is only possible by granting full read and write access to the repositories in your account. Cubonacci will use this permission only to list your repositories, create new repositories from within Cubonacci, and update repositories that you manage from Cubonacci. Cubonacci will never read from or write to repositories that you are not using within Cubonacci. You can revoke these permissions at any time under Settings → Applications → Authorized OAuth Apps on the GitHub website.

Create an account

Project overview

After creating your Cubonacci account, you will see the project overview. This is the default landing page for logged-in users. Press the + create project button to create your first project and give it a name.

Add project

Cubonacci Console

Click on the new project to open the Cubonacci Console. The overview is still empty; it will fill up with status overviews and lists of recent actions as your project progresses through the machine learning lifecycle. The collapsible sidebar on the left is used for navigating the platform.

Choose to add a repository to let Cubonacci create a new empty repository for you.

Add repository

The wizard guides you through the process and requests permission for the Cubonacci GitHub app to modify selected repositories in your GitHub account. This happens in a new browser tab that you can close after confirming the app connection. After this is done, two new buttons appear: quickstart and example project. The quickstart wizard is useful if you want Cubonacci to create the skeleton of a project that fits your use case. Here we choose to start with an example project, and on the next screen we pick the iris project. If you prefer to follow along with the mnist project instead, keep in mind that a training run can take up to 15 minutes of wall-clock time.

quickstart or example project

Cubonacci commits the example project to the repository that you created in the previous step. The overview will now show the first commit with a green checkmark. We'll come back to that later, but first let's take a look at the new project.

Notebooks

To understand the structure of a Cubonacci project, we recommend inspecting the repository that was just created using a Cubonacci notebook. You can do this by selecting the Notebook feature from the sidebar menu. Create a new notebook and check the box to create the notebook with cells based on commit code. This prepopulates the notebook with code that shows how you can interact with the repository, similar to how the platform itself will interact with the model code.
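For example, a prepopulated cell might look something like the sketch below. The class and method names are assumptions based on the repository structure described later in this guide, not the exact code Cubonacci generates:

```python
# Hypothetical notebook cell: these names are illustrative, not the
# exact code that Cubonacci generates for your commit.
from data_loader import DataLoader

# Instantiate the project's data loader and fetch the training data.
loader = DataLoader()
X, y = loader.load()

print(X.shape, y.shape)  # quick sanity check on the loaded data
```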

The notebook editor consists of two parts: at the top you see the model repository, with the folder structure on the left and a text editor on the right. Below it you will find a notebook that allows you to interact directly with your code and the Cubonacci platform through a Python backend.

Code editor and Notebook combined

If you make changes to the files in the code editor at the top, the updated code is reloaded directly in the notebook. However, it is not committed back to GitHub until you press the commit button on the top left. The notebook itself is saved when you click the save notebook button.

When you do commit the changes to your repository, the new version of the code is pushed to GitHub and picked back up by the Cubonacci platform. The commit is validated, and a window appears in your notebook asking whether you want to connect the notebook to the new commit.

There are two reasons why Cubonacci pushes the code to Git first and then picks up the new commit from the Git repository, instead of storing it directly as a new commit internally. The first is that it is important to store all versions of your code, and that is exactly what Git is for. The second is that Git allows you or your organization to work with policies: often a new version of the model code cannot be committed directly to a Git repository, but has to pass a peer review first.

Structure of the repository

The main folder of the code repository contains three files and three subfolders. The files are:

| File | Description |
| --- | --- |
| README.md | Every Git repository contains a Markdown readme file by default. |
| cubonacci.yaml | A YAML file with the basic configuration for your model. YAML is a serialization language used for human-readable configuration files that are quickly parsed by computers. |
| requirements.txt | The list of Python requirements for your project. |
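To give a feel for the format, the snippet below sketches what a YAML configuration file looks like. The keys are illustrative assumptions, not the documented cubonacci.yaml schema:

```yaml
# Hypothetical configuration file: the keys are illustrative
# assumptions, not the documented cubonacci.yaml schema.
name: iris-example
language: python
algorithms:
  - logistic-regression
  - random-forest
```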

The subfolders are:

| Folder | Description |
| --- | --- |
| data_loader | Contains the code to load the training data. |
| algorithms | Contains the code and configuration for the machine learning models you want to use; each model has its own subfolder. |
| metrics | Contains the code for the metrics used to evaluate the performance of the models. |

Coding Best Practice

The code is organized in a structure that follows best practices in machine learning development and resembles the structure of the well-known scikit-learn library. The data loader and the models are defined as classes, and the metrics are defined as functions. The docs contain detailed information on the definition of the data loader, the models, and the metrics.
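As a rough sketch of that structure, the three kinds of components could look like the code below. The method names and signatures are illustrative assumptions; the docs define the exact interfaces Cubonacci expects:

```python
# Illustrative skeletons only: the docs define the exact interfaces
# that Cubonacci expects from these components.
import numpy as np


class DataLoader:
    """Loads the training data (lives in the data_loader folder)."""

    def load(self):
        # A real project would read from a file, database, or API.
        X = np.random.rand(150, 4)
        y = np.random.randint(0, 3, size=150)
        return X, y


class Model:
    """A model class in the scikit-learn style (one subfolder per model)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def fit(self, X, y):
        ...  # training logic goes here
        return self

    def predict(self, X):
        ...  # return predictions for X


def accuracy(y_true, y_pred):
    """A metric defined as a plain function (lives in the metrics folder)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```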

The models also require a bit of configuration: the hyperparameters that you would like to try in a hyperparameter search need to be defined in a YAML file, where you set the values, or the range of values, that they can take.
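For example, such a configuration might look roughly like the sketch below; the key names are illustrative assumptions, not the exact Cubonacci schema:

```yaml
# Hypothetical hyperparameter configuration: the key names are
# illustrative assumptions, not the exact Cubonacci schema.
hyperparameters:
  learning_rate:
    type: float        # sample from a continuous range
    min: 0.001
    max: 0.1
  n_estimators:
    type: integer      # sample from a discrete range
    min: 50
    max: 500
  kernel:
    type: choice       # pick from a fixed set of values
    values: [linear, rbf]
```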

Back to the Cubonacci Console

Return to the overview. It still shows the first commit, with empty boxes for the notebooks, experiments, models, and deployments that belong to this commit.

First Commit

The green checkmark indicates that the repository has passed the first validation, which means the code is structured the way Cubonacci needs in order to interact with it.

Click on the commit to see the details of the version.

Commit details

The commit hash (in this example: dc6aeacb) uniquely identifies the version of the code on GitHub. A validated version of your code allows you to run an experiment, and you can connect it to a notebook to develop your code and interact with the Cubonacci platform. Three tabs are present, for the experiments, models, and deployments that belong to this commit.

From the commit details, we will start a distributed hyperparameter search to find out what the best settings are for our model. Click on the plus sign to start the experiment. A menu pops up with a number of choices.

Experiment Settings

For this getting started guide, we leave the options at their default settings and start the experiment. A new record appears in the experiments tab. Click on it to inspect the experiment's progress and results.

Experiment results

In the experiment details you find the results and the progress of the hyperparameter search. Metadata about the run is shown at the top; below it are a summary of the trial runs and the model that showed the best result. All the metrics that you see, as well as the leading metric used to suggest the best-performing model, are based on what has been defined and configured in the code of the repository.

A visual presentation of the experiment results allows you to inspect the relationship between the hyperparameter settings of each run and the performance metrics. Below the visual, the same information is shown in table form, from which you can pick the settings that best fit your objectives and start the final training run of the model.

Experiment results

The individual logs of each trial run can be found by clicking on the trial result in the table overview. The logs show all stderr and stdout output from the Python session, allowing you to inspect issues and any logging that is printed to stdout.
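Because anything written to stdout or stderr ends up in these logs, plain print calls or the standard logging module are enough for custom diagnostics. A generic Python sketch, not a Cubonacci-specific API:

```python
# Generic Python logging sketch: anything written to stdout or stderr
# during a trial run shows up in that trial's log view.
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger("training")

logger.info("starting trial with learning_rate=%s", 0.01)
print("epoch 1 finished")  # plain prints are captured as well
```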

Final Model Training

When you have decided on the best hyperparameters, you can start the final model training. This run uses the entire dataset, as opposed to the train/test split that was used during the hyperparameter search.

Model Training Settings
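Conceptually, the difference between the two phases looks like this in scikit-learn terms; this is a sketch of the idea, not Cubonacci's internal implementation:

```python
# Conceptual sketch in scikit-learn terms; not Cubonacci's internal
# implementation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# During the hyperparameter search, each trial is scored on held-out data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
trial = LogisticRegression(C=0.5, max_iter=1000).fit(X_train, y_train)
print("trial accuracy:", accuracy_score(y_test, trial.predict(X_test)))

# The final training run fits the chosen settings on the entire dataset.
final_model = LogisticRegression(C=0.5, max_iter=1000).fit(X, y)
```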

Deployment

Every model that has completed a final training run is available to deploy. The + button to add deployments appears on the model details page when training has completed successfully.

For deployment there are also a number of options, as can be seen in the image:

Deployment Settings

Testing a deployment

Every deployment gets its own unique endpoint, and you can connect it to a static endpoint in the endpoints section. Cubonacci infers the schema of the model's input and output data, so the platform can automatically define the correct API contract and set up a user-friendly test form. This form allows you to enter data and judge whether the result of the API call matches expectations. There is also a code generator that shows how to make calls to the API in a number of languages.

Deployment Test Form
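As an illustration, calling a deployment endpoint from Python might look like the sketch below. The URL and payload are placeholders; copy the exact endpoint and schema from the test form or the built-in code generator:

```python
# Illustrative API call: the URL and payload are placeholders, not a
# real Cubonacci endpoint.
import requests

endpoint = "https://example.invalid/your-deployment-endpoint"
payload = {"sepal_length": 5.1, "sepal_width": 3.5,
           "petal_length": 1.4, "petal_width": 0.2}

response = requests.post(endpoint, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # the model's prediction for this input
```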

Wrap-up

You have now successfully walked through all the steps in the machine learning lifecycle: you set up a Git repository, committed code, ran an experiment, trained a model, deployed it, and provisioned an API.