Expanding on my previous post, F# and ML.NET Regression, this post will take a look at performing classification using Microsoft’s new ML.NET framework. The task at hand will be to use biomechanical attributes to classify patient vertebra conditions into normal (NO), disk hernia (DH), or spondylolisthesis (SL) categories.
As I mentioned in the previous post, there is a disclaimer: ML.NET is in its early stages. I found a couple of implementation and interface idiosyncrasies that I suspect will change over time. Just keep that in mind moving forward. But knowing that, I’ve been pleased with what I’ve seen so far.
Update: This post was written using Microsoft.ML v0.1.0, and v0.2.0 has since been released. I have noted the interface changes below; for this example, the only change is to TextLoader.
Make sure you have .NET Core version 2.1 installed. If you don’t, head out to the .NET Core Downloads page and select the SDK for your platform. Tangentially, you can also get there by going to dot.net and navigating to the Downloads page.
First, create a console F# project, then add the ML.NET package. Also create a data directory to hold the dataset.

```shell
dotnet new console --language F# --name MLNet-Vertebral
cd MLNet-Vertebral
dotnet add package Microsoft.ML
mkdir data && cd data
```
Here is a sample of what the data looks like. There is no header row. The file has six feature columns and one classification column:
# Data Rows
Now that the project is set up and the data is local, we can get to the code. Time to open up the already created Program.fs. First, add the necessary namespaces.
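For reference, the opens I used with v0.1.0 looked roughly like the following; the namespace layout shifted between early releases, so treat this as a sketch:

```fsharp
open System
open Microsoft.ML                // LearningPipeline
open Microsoft.ML.Data           // TextLoader
open Microsoft.ML.Models         // ClassificationEvaluator
open Microsoft.ML.Runtime.Api    // Column / ColumnName attributes
open Microsoft.ML.Trainers       // StochasticDualCoordinateAscentClassifier
open Microsoft.ML.Transforms     // Dictionarizer, ColumnConcatenator
```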
The ML.NET pipeline expects the data in a specific format. In the C# world this is a class; in F# we can use a type. Below are the required types: VertebralData is the input data, and VertebralPrediction is the output prediction. VertebralData is basically a map of columns to member variables. There are a couple of notable points to ensure the pipeline can properly consume the data. Each attribute must be mutable public, and it requires the [<Column("#")>] attribute to specify its column position, as well as the [<DefaultValue>] attribute. For VertebralPrediction, a single attribute is required: the prediction value. For the input data, the label variable must be named Label. For the prediction type, the variable must be labeled PredictedLabel.
type VertebralData() =
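The snippet above is truncated in this rendering; filled out, the two types might look like the following. The feature member names are my own choice, mirroring the attribute names of the UCI vertebral column dataset:

```fsharp
type VertebralData() =
    [<Column("0"); DefaultValue>] val mutable public PelvicIncidence : float32
    [<Column("1"); DefaultValue>] val mutable public PelvicTilt : float32
    [<Column("2"); DefaultValue>] val mutable public LumbarLordosisAngle : float32
    [<Column("3"); DefaultValue>] val mutable public SacralSlope : float32
    [<Column("4"); DefaultValue>] val mutable public PelvicRadius : float32
    [<Column("5"); DefaultValue>] val mutable public DegreeSpondylolisthesis : float32
    [<Column("6"); DefaultValue>] val mutable public Label : string

type VertebralPrediction() =
    [<ColumnName("PredictedLabel"); DefaultValue>]
    val mutable public PredictedLabel : string
```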
Building the pipeline structure is reasonably intuitive. First, create a pipeline. Then add components to the pipeline in the order they are to be executed. So first, load the data with a TextLoader. This data is comma delimited and has no header row.
let pipeline = new LearningPipeline()
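In v0.1.0 the loader was generic over the row type; v0.2.0 moved to a CreateFrom builder. From memory, the two variants look roughly like this (the data file path is hypothetical):

```fsharp
// v0.1.0: generic TextLoader; useHeader defaults to false
pipeline.Add(TextLoader<VertebralData>("data/vertebral.csv", separator = ","))

// v0.2.0 equivalent
pipeline.Add(TextLoader("data/vertebral.csv").CreateFrom<VertebralData>(separator = ','))
```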
After the data is loaded, feature columns need to be added to the pipeline. I’m going to use all the feature columns from the file, but I don’t have to. The classification model requires features to be numeric. The features don’t need anything special done to them, but the class column does need to be converted to numeric values.
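A sketch of what that looks like: as I recall, Dictionarizer is the transform that maps the string class labels to numeric keys, and ColumnConcatenator gathers the inputs into the Features column the trainer expects (the feature column names here are my assumption):

```fsharp
// Map the string class label (DH/SL/NO) to a numeric key
pipeline.Add(Dictionarizer("Label"))

// Gather the numeric feature columns into a single "Features" vector
pipeline.Add(ColumnConcatenator("Features",
                 "PelvicIncidence", "PelvicTilt", "LumbarLordosisAngle",
                 "SacralSlope", "PelvicRadius", "DegreeSpondylolisthesis"))
```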
Now that the features are defined, it is time to determine which training method to use. For this post, StochasticDualCoordinateAscentClassifier is used. Custom hyperparameters can also be defined; I have a commented-out example that changes bias and convergence tolerance.
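Adding the trainer is one line; the commented-out hyperparameter example below is a sketch, and the exact property names are from memory:

```fsharp
pipeline.Add(StochasticDualCoordinateAscentClassifier())

// Hypothetical hyperparameter tweaks:
// pipeline.Add(StochasticDualCoordinateAscentClassifier(
//                  BiasLearningRate = 0.1f,
//                  ConvergenceTolerance = 0.01f))
```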
For the dataset in question, the StochasticDualCoordinateAscentClassifier worked well, but I could have used a NaiveBayesClassifier as well. Since this is multiclass, I had fewer options, but ML.NET seems to have a fair number of binary classifiers when that is the desired use case.
The last part: train the model. Note the VertebralData and VertebralPrediction types as part of the Train call. I also need to define the prediction label column name. Unfortunately the function name is really long, but it is at least descriptive…
pipeline.Add(new PredictedLabelColumnOriginalValueConverter(PredictedLabelColumn = "PredictedLabel") )
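The training call itself, with both row types supplied as type parameters:

```fsharp
let model = pipeline.Train<VertebralData, VertebralPrediction>()
```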
Validation of any model is important. For a real case, I would train on one dataset and validate against a previously unseen dataset. Since this is just an example, I validate against the training data; as a result, I expect the results to be very good, and they are. ML.NET offers multiple Evaluator classes based on specific needs, which makes getting some of those crucial high-level numbers pretty easy. An evaluator takes a trained model and a dataset, and produces critical metrics. One specific call-out I will make is to TopKAccuracy. The evaluator can report an additional accuracy result based on whether the correct class was in the top k rankings for a prediction. Here I have chosen k = 2, which is a little silly in a 3-class problem, but in larger problems this is obviously a valuable analysis tool. The confusion matrix takes a bit of coercing to print nicely, but at least the data is there.
// Evaluate results
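A sketch of the evaluation step, assuming the same training file is reused for validation; OutputTopKAcc requests the top-k accuracy metric, and the exact metric property names here are from memory:

```fsharp
let testData = TextLoader<VertebralData>("data/vertebral.csv", separator = ",")
let evaluator = ClassificationEvaluator(OutputTopKAcc = Nullable 2)
let metrics = evaluator.Evaluate(model, testData)

printfn "Micro accuracy: %f" metrics.AccuracyMicro
printfn "Macro accuracy: %f" metrics.AccuracyMacro
printfn "Top-2 accuracy: %f" metrics.TopKAccuracy
```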
# Evaluator Results:
With the initial evaluation out of the way, here is an example of how individual predictions can be made. Create a VertebralData object and provide it to the Predict method. For this example, I pull one of the rows from the training data.
let test1 = VertebralData()
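Filled out, it might look like this; the feature values below are placeholders rather than a real dataset row, and the member names are my assumption:

```fsharp
let test1 = VertebralData()
test1.PelvicIncidence <- 63.0f          // placeholder values only;
test1.PelvicTilt <- 22.5f               // substitute a real row from
test1.LumbarLordosisAngle <- 39.6f      // the training file
test1.SacralSlope <- 40.5f
test1.PelvicRadius <- 98.7f
test1.DegreeSpondylolisthesis <- -0.25f

let result = model.Predict(test1)
printfn "Predicted class: %s" result.PredictedLabel
```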
# Prediction Result:
Once a model is trained, it can also be saved to a file and reloaded at a later time. This is supported by the WriteAsync and ReadAsync methods of a model.
// Save model to file
# Prediction Result: (model reloaded):
Throughout the post, portions of the output have been provided out of band. Here is how the whole thing looks when run with dotnet run:
Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.
This has been a brief look into training and using an ML.NET classification model. There were two interesting takeaways for me. The first is just how to interact with the framework for classification. The second, and more interesting, is how little is different between the regression and classification pipelines. I really appreciate a consistent framework where I can leverage a similar pipeline for most of my projects, and only need to change out the appropriate logic bits for my specific current problem. ML.NET has some really good components, and it will be interesting to see how it grows, hopefully with more F#-centric support as well. Until next time.