Today I look into performing linear regression using F#. The implementations of interest will be the MathNet and Accord.NET libraries. I assume you already know what linear regression is, but in case you need a refresher: Linear Regression. My goal is to provide a simple explanation of how to leverage some existing F#-accessible libraries. Once you know some of the basic calling functions, you can go crazy with some of the other options these libraries have to offer.
Using Paket, here is a sample paket.dependencies file.
Here is the library loading.
First I need to create some data. For this example, the formula is y = x^2 + noise + occasional outlier. The method is to create 3 arrays representing x, noise, and outliers. It is a bit convoluted, but it allows me to show off a couple bits of functionality from F# and MathNet. The MathNet library includes methods to generate datasets; for the noise dataset, Generate.Normal creates an array of numbers with a normal distribution. It is worth checking out the other diverse generation capabilities available. For outliers, I define an arbitrary 30% chance of a big spike in the data, as defined by the pct and range variables. Then I use Array.zip3 to combine the 3-element tuple of arrays into an array of 3-element tuples. Once in this format, a map is used to calculate the formula mentioned at the start.
Sidenote: If you’ve coded any F#, you know |>. But did you know there are other, similar operators? ||> passes a tuple as two arguments, and |||> passes a 3-tuple as 3 arguments.
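To make the sidenote concrete, here is a minimal sketch of the three pipe operators side by side. The helper functions add and add3 are hypothetical names made up for this illustration; they are not part of the original example.

```fsharp
// Hypothetical helper functions, used only to demonstrate the operators.
let add a b = a + b
let add3 a b c = a + b + c

// |> passes a single value as the last argument: add 1 5
let r1 = 5 |> add 1

// ||> unpacks a 2-tuple into two arguments: add 1 2
let r2 = (1, 2) ||> add

// |||> unpacks a 3-tuple into three arguments: add3 1 2 3
let r3 = (1, 2, 3) |||> add3

printfn "%d %d %d" r1 r2 r3 // prints "6 3 6"
```

The tuple-unpacking variants are handy whenever several collections are already bundled in a tuple and the target function is curried, which is the case for Array.zip3.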
let xData = [| 1. .. 0.25 .. 50. |]
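The data-generation steps described above could be sketched roughly as follows. This is not the original listing: the package version, the noise mean and standard deviation, the spike size in range, and the fixed random seed are all assumptions chosen for illustration. Only the x range, the 30% outlier chance, and the use of Generate.Normal and Array.zip3 come from the text. It assumes an F# script run with dotnet fsi, which can resolve NuGet references inline.

```fsharp
#r "nuget: MathNet.Numerics, 4.15.0" // assumed version, not from the original post
open System
open MathNet.Numerics

let xData = [| 1. .. 0.25 .. 50. |]

// Normally distributed noise; mean 0 and standard deviation 100 are assumed values.
let noise = Generate.Normal(xData.Length, 0.0, 100.0)

// Outliers: a 30% chance of a spike, otherwise zero.
let rng = Random(42)  // fixed seed is an assumption, for repeatability
let pct = 0.30
let range = 1000.0    // assumed maximum spike size
let outliers =
    Array.init xData.Length (fun _ ->
        if rng.NextDouble() < pct then rng.NextDouble() * range else 0.0)

// Combine the 3-tuple of arrays into an array of 3-tuples, then apply the formula.
let yData =
    (xData, noise, outliers)
    |||> Array.zip3
    |> Array.map (fun (x, n, o) -> x ** 2.0 + n + o)

printfn "Generated %d points" yData.Length
```

Array.zip3 requires all three arrays to have the same length, which is why the noise and outlier arrays are both sized from xData.Length.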
Starting with the MathNet implementation, it is time for the regression fitting. The first option is Fit.Line, which takes the x and y data, fits a line, and returns the associated intercept and slope that can be plugged into a y = mx + b formula. The second option is Fit.LineFunc. It also takes the x and y data to fit a line. The difference is that it creates a function delegate that can be called directly to calculate predictions.
let mathnetIntercept, mathnetSlope = Fit.Line (xData, yData)
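As a sanity check on the two fitting options, here is a small sketch on noise-free data where the answer is known in advance (y = 2x + 1). The toy arrays and variable names are hypothetical. It also assumes MathNet.Numerics 4.x, where Fit.Line returns a regular tuple that F# can destructure directly; newer major versions may return a struct tuple instead, which would need a slightly different pattern.

```fsharp
#r "nuget: MathNet.Numerics, 4.15.0" // assumed version, see note above
open MathNet.Numerics

// Hypothetical toy data lying exactly on y = 2x + 1.
let xs = [| 0.0; 1.0; 2.0; 3.0; 4.0 |]
let ys = xs |> Array.map (fun x -> 2.0 * x + 1.0)

// Option 1: get the intercept and slope back as values.
let intercept, slope = Fit.Line(xs, ys)

// Option 2: get a Func<float, float> delegate that evaluates the fitted line.
let lineFunc = Fit.LineFunc(xs, ys)

// Both routes should produce the same predictions.
let manual = xs |> Array.map (fun x -> slope * x + intercept)
let viaFunc = xs |> Array.map (fun x -> lineFunc.Invoke(x))

printfn "intercept=%.2f slope=%.2f" intercept slope // prints "intercept=1.00 slope=2.00"
```

Note the ordering of the returned pair: intercept first, then slope. Because lineFunc is a .NET delegate rather than an F# function, it has to be called through .Invoke.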
Now it’s time to generate data based on the regression result. pData1 is calculated manually using the slope and intercept from the Fit.Line call. pData2 leverages the function delegate from Fit.LineFunc, so I need to use .Invoke to perform the calculation.
let pData1 =
    xData |> Array.map (fun x -> mathnetSlope * x + mathnetIntercept)
The code below combines 3 charts: the original data plus the regression lines. Some of the effect is lost, since the lines are on top of each other, but hopefully you get the point.
// Chart actual versus linear regression predictions
Now it is time to look at the Accord.NET implementation. Here, OrdinaryLeastSquares and Learn are used to determine line fitting. The result is a SimpleLinearRegression object.
let ols = new OrdinaryLeastSquares();
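For comparison, an Accord.NET fit on noise-free data might look like the sketch below. The toy data and variable names are hypothetical. The package version is an assumption, and whether Accord loads cleanly in a dotnet fsi script may depend on your runtime. The members used (Learn, Slope, Intercept, Transform) are from Accord's SimpleLinearRegression API.

```fsharp
#r "nuget: Accord.Statistics, 3.8.0" // assumed version, not from the original post
open Accord.Statistics.Models.Regression.Linear

// Hypothetical toy data lying exactly on y = 2x + 1.
let xs = [| 0.0; 1.0; 2.0; 3.0; 4.0 |]
let ys = xs |> Array.map (fun x -> 2.0 * x + 1.0)

// Fit with ordinary least squares; one-dimensional input yields a SimpleLinearRegression.
let ols = OrdinaryLeastSquares()
let regression : SimpleLinearRegression = ols.Learn(xs, ys)

// Predict manually from the fitted coefficients...
let manual = xs |> Array.map (fun x -> regression.Slope * x + regression.Intercept)

// ...or let the regression object transform the whole input array.
let direct = regression.Transform(xs)

printfn "intercept=%.2f slope=%.2f" regression.Intercept regression.Slope
```

The type annotation on regression is optional, but it helps steer overload resolution, since Learn also has overloads for multi-dimensional input.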
Now it’s time to generate data based on the regression result. pData3 is calculated manually using the slope and intercept. pData4 is calculated directly by letting Accord’s regression object do the prediction.
let pData3 =
Again, the code below combines 3 charts: the original data plus the regression lines. As expected, these graphs are identical.
So, there it is. A couple ways to do linear regression, but there is more. What good is the regression model if you can’t perform scoring against the result? Luckily, MathNet and Accord.NET have several methods of comparing datasets. There are too many options to show them all here, but here are a couple of examples scoring predicted data (pData1) versus actual data (yData). For reference: MathNet Distances and Accord.NET Distances. I recommend digging deeper to find the scoring method appropriate for your specific needs.
printfn "Scoring: R2=%.2f R=%.2f PSE=%.2f SE=%.2f SAD=%.2f SSD=%.2f MAE=%.2f MSE=%.2f"
printfn "Scoring SE: %.2f E: %.2f E2:%.2f M:%.2f PC:%.2f"
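To show what a few of the MathNet scores actually measure, here is a small sketch on hand-checkable data. The observed and predicted arrays are hypothetical values, not the regression output from earlier, and the package version is an assumption. Each prediction is off by exactly 0.5, so the error-based scores are easy to verify by hand.

```fsharp
#r "nuget: MathNet.Numerics, 4.15.0" // assumed version, not from the original post
open MathNet.Numerics

// Hypothetical toy data: every prediction misses by 0.5.
let observed  = [| 1.0; 2.0; 3.0; 4.0 |]
let predicted = [| 1.5; 2.5; 2.5; 4.5 |]

let r2  = GoodnessOfFit.RSquared(predicted, observed) // modelled values first, then observed
let sad = Distance.SAD(predicted, observed)           // sum of absolute differences
let ssd = Distance.SSD(predicted, observed)           // sum of squared differences
let mae = Distance.MAE(predicted, observed)           // mean absolute error
let mse = Distance.MSE(predicted, observed)           // mean squared error

printfn "R2=%.2f SAD=%.2f SSD=%.2f MAE=%.2f MSE=%.2f" r2 sad ssd mae mse
```

With four errors of magnitude 0.5, SAD comes out to 2.0, SSD to 1.0, MAE to 0.5, and MSE to 0.25. The sum-based scores grow with the number of points, while the mean-based ones stay comparable across datasets of different sizes.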
I hope this has been helpful if you’re venturing into F# and regression.