2022-01-30

Taking Stock of More Anomalies with F# and ML.NET

It has been a while since I posted about anomaly detection using F# and ML.NET. The ML.NET framework continues to evolve, so it is worth a revisit to investigate changes. This also provides a good opportunity to dig deeper into the anomaly detection options it provides.

Setting up dependencies is a two-part process, and the second part depends on the operating system. First, add the necessary packages. Second, Mkl.Redist may require an additional library install. Microsoft’s documentation on installing extra dependencies covers the specific requirements; below I include an example of installing and loading the library on Ubuntu.

dotnet add package Microsoft.ML --version 1.7.1
dotnet add package Microsoft.ML.FastTree --version 1.7.1
dotnet add package Microsoft.ML.Mkl.Components --version 1.7.1
dotnet add package Microsoft.ML.Mkl.Redist --version 1.7.1
dotnet add package Microsoft.ML.TimeSeries --version 1.7.1
dotnet add package Plotly.NET --version 2.0.0

# Additional requirement installation (For Linux)
sudo bash
cd /tmp
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
exit
sudo apt-get update
sudo apt-get install intel-mkl-64bit-2020.4-912
sudo ldconfig /opt/intel/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin

Last time I used the Dow Jones stock index for my example. This time I’m going to shift a bit and use the NASDAQ index. I’m using the same data format as before; below is a snippet of the source data, which was exported from Yahoo! Finance. It is extensive stock price data, but I will only use the Date and Close price today.

# Data Rows
Date,Open,High,Low,Close,Adj Close,Volume
2010-01-04,2294.409912,2311.149902,2294.409912,2308.419922,2308.419922,1931380000
2010-01-05,2307.270020,2313.729980,2295.620117,2308.709961,2308.709961,2367860000
2010-01-06,2307.709961,2314.070068,2295.679932,2301.090088,2301.090088,2253340000
2010-01-07,2298.090088,2301.300049,2285.219971,2300.050049,2300.050049,2270050000
2010-01-08,2292.239990,2317.600098,2290.610107,2317.169922,2317.169922,2145390000
2010-01-11,2324.780029,2326.280029,2302.209961,2312.409912,2312.409912,2077890000
2010-01-12,2297.280029,2298.850098,2272.699951,2282.310059,2282.310059,2368320000
2010-01-13,2289.459961,2313.030029,2274.120117,2307.899902,2307.899902,2318350000
2010-01-14,2303.310059,2322.560059,2303.290039,2316.739990,2316.739990,2254170000

The setup is similar to before. I need to set up a PriceData type for data loading, and a PricePrediction type for the anomaly detection output.

open System
open Microsoft.ML
open Microsoft.ML.Data
open Microsoft.ML.Transforms.TimeSeries
open Plotly.NET

type PriceData () =
  [<DefaultValue>]
  [<LoadColumn(0)>]
  val mutable public Date: string

  [<DefaultValue>]
  [<LoadColumn(1)>]
  val mutable public Open: float32

  [<DefaultValue>]
  [<LoadColumn(2)>]
  val mutable public High: float32

  [<DefaultValue>]
  [<LoadColumn(3)>]
  val mutable public Low: float32

  [<DefaultValue>]
  [<LoadColumn(4)>]
  val mutable public Close: float32

  [<DefaultValue>]
  [<LoadColumn(5)>]
  val mutable public AdjClose: float32

  [<DefaultValue>]
  [<LoadColumn(6)>]
  val mutable public Volume: float32

type PricePrediction () =
  [<DefaultValue>]
  val mutable public Date: string

  [<DefaultValue>]
  val mutable public Prediction: double[]

Before I get into the detection, there is some work to be done. First is setting up the MLContext that will be used in the transformations and detections. I will also load the actual price data into its own sequence and chart, so I can use it in the final display phase later.

let dataPath = "nasdaq.csv"

let context = MLContext()

let data = 
  context
    .Data
    .LoadFromTextFile<PriceData> (
      path = dataPath,
      hasHeader = true,
      separatorChar = ',')

///////////////
// Pricing data

let priceData =
  context
    .Data
    .CreateEnumerable<PriceData>(data, false)
  |> Seq.map (fun x -> (x.Date, float (x.Close)))

let priceChart =
  Chart.Line(priceData, Name = "Price")

The first anomaly detection method to look at is IidSpike. This is the method used in the original post. Creating an anomaly detector hasn’t changed much between versions. There are a couple of small parameter changes, but the transition is pretty clean. As is often the case, the right values for confidence and pvalueHistoryLength depend on the situation, but for example purposes these work reasonably well. After determining the anomalies for the data, I pair the detected anomalies with the price data for a chart.

let iidSpikeData = 
  context
    .Transforms
    .DetectIidSpike(
      outputColumnName = "Prediction",
      inputColumnName = "Close",
      side = AnomalySide.TwoSided,
      confidence = 95., 
      pvalueHistoryLength = 50)
    .Fit(data)
    .Transform(data)

let iidSpikeAnomalies = 
  context
    .Data
    .CreateEnumerable<PricePrediction>(iidSpikeData, reuseRowObject = false)

let iidSpikeChartData = 
  (priceData, iidSpikeAnomalies)
  ||> Seq.zip
  |> Seq.map (fun (p, a) ->
      // For all anomalies, use closing price to show on the chart
      (a.Date, if (a.Prediction).[0] = 0. then None else Some (snd p)))
  |> Seq.filter (fun (_x, y) -> y.IsSome)
  |> Seq.map (fun (x, y) -> (x, y.Value))

let iidSpikeChart = 
  Chart.Scatter (iidSpikeChartData, StyleParam.Mode.Markers, Name = "iidSpike")

[ priceChart; iidSpikeChart ]
|> Chart.combine
|> Chart.withTitle "Close Price (IidSpike)"
|> Chart.show

Price Anomalies - IidSpike
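
The check on Prediction.[0] works because the Prediction column is a small vector, not a single flag. As I understand the ML.NET documentation, the spike detectors emit three doubles: an alert flag, a raw score, and a p-value. Here is a minimal sketch of giving those slots names. The SpikeResult record and toSpikeResult helper are hypothetical names of my own, the layout is an assumption worth checking against the docs for your version, and the sample vectors are made up.

```fsharp
// Hypothetical record naming the three slots of a spike detector's output vector.
// Assumed layout (check the ML.NET docs for your version): [| alert; rawScore; pValue |]
type SpikeResult =
  { IsAlert: bool
    RawScore: float
    PValue: float }

// Convert a raw prediction vector into the record; None if the shape is unexpected.
let toSpikeResult (prediction: float[]) : SpikeResult option =
  match prediction with
  | [| alert; score; pValue |] ->
      Some { IsAlert = alert <> 0.; RawScore = score; PValue = pValue }
  | _ -> None

// Made-up vectors for illustration only, not real detector output.
let flagged = toSpikeResult [| 1.; 2316.74; 0.01 |]
let quiet = toSpikeResult [| 0.; 2308.42; 0.48 |]
let malformed = toSpikeResult [| 1. |]
```

Mapping something like this over the Prediction values would make the raw score and p-value available for tooltips or filtering, instead of only the alert flag.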

One of the main goals of this post is to investigate additional anomaly detection methods that ML.NET provides. The second anomaly detection method to look at is SrCnn. Its methodology is based on leveraging Spectral Residual and a Convolutional Neural Network. You can read more about the underlying mechanisms and reasoning in Microsoft’s documentation for SrCnnAnomalyEstimator and in the whitepaper Time-Series Anomaly Detection Service at Microsoft. Below is a pipeline for detection using SrCnn. One takeaway is that the code is nearly identical to the IidSpike example; just replace the DetectIidSpike call with DetectAnomalyBySrCnn. Perhaps this isn’t surprising, but it makes experimentation as easy as snapping pieces in and out. Since the two calls support different parameters, there is a bit more work. I’m only using windowSize to define the sliding window, but it does have more knobs (like threshold) to tweak. Like before, I pair the detected anomalies with the price data for a chart.

let srCnnData = 
  context
    .Transforms
    .DetectAnomalyBySrCnn(
      outputColumnName = "Prediction",
      inputColumnName = "Close",
      windowSize = 50)
    .Fit(data)
    .Transform(data)

let srCnnAnomalies = 
  context
    .Data
    .CreateEnumerable<PricePrediction>(srCnnData, reuseRowObject = false)

let srCnnChartData = 
  (priceData, srCnnAnomalies)
  ||> Seq.zip
  |> Seq.map (fun (p, a) ->
      // For all anomalies, use closing price to show on the chart
      (a.Date, if (a.Prediction).[0] = 0. then None else Some (snd p)))
  |> Seq.filter (fun (_x, y) -> y.IsSome)
  |> Seq.map (fun (x, y) -> (x, y.Value))

let srCnnChart = 
  Chart.Scatter (srCnnChartData, StyleParam.Mode.Markers, Name = "srCnn")

[ priceChart; srCnnChart ]
|> Chart.combine
|> Chart.withTitle "Close Price (SrCnn)"
|> Chart.show

Price Anomalies - SrCnn
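
Since the pairing step is the same for every detector, it could be pulled into a small helper. This is only a sketch: anomalyPoints is a hypothetical name, it assumes slot 0 of the prediction vector is the alert flag (as the code in this post does), and the sample rows are made up to keep the snippet runnable without ML.NET.

```fsharp
// Hypothetical helper: keep the (date, close) pairs whose alert flag is non-zero.
// `prices` has the shape of priceData; `predictions` holds one Prediction vector per row.
let anomalyPoints (prices: seq<string * float>) (predictions: seq<float[]>) : seq<string * float> =
  Seq.zip prices predictions
  |> Seq.choose (fun ((date, close), prediction) ->
      // Slot 0 is assumed to be the alert flag; 0 means "no anomaly"
      if prediction.[0] = 0. then None else Some (date, close))

// Made-up rows for illustration only.
let samplePrices =
  [ ("2010-01-04", 2308.42); ("2010-01-05", 2308.71); ("2010-01-06", 2301.09) ]

let samplePredictions =
  [ [| 0.; 0.0; 0.50 |]; [| 1.; 5.2; 0.01 |]; [| 0.; 0.1; 0.40 |] ]

let flaggedPoints = anomalyPoints samplePrices samplePredictions |> Seq.toList
```

With the values from this post, the call would look something like anomalyPoints priceData (srCnnAnomalies |> Seq.map (fun a -> a.Prediction)), and the result could be passed straight to Chart.Scatter.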

The next anomaly detection method up for experimentation is SpikeSsa. This method uses Singular Spectrum Analysis to detect anomalies. Microsoft has more details regarding its methodology in the documentation for SsaSpikeEstimator and in the whitepaper Basic Singular Spectrum Analysis and Forecasting with R. As with SrCnn, I can snap the DetectSpikeBySsa estimator into the pipeline. This does have more knobs to tweak, so some experimentation is worthwhile to determine the best settings for your particular situation. Once detected, I link the anomalies with the data to make a nice chart.

let spikeSsaData = 
  context
    .Transforms
    .DetectSpikeBySsa(
      outputColumnName = "Prediction",
      inputColumnName = "Close",
      confidence = 95.,
      pvalueHistoryLength = 100,
      trainingWindowSize = 1000,
      seasonalityWindowSize = 100,
      side = AnomalySide.TwoSided)
    .Fit(data)
    .Transform(data)

let spikeSsaAnomalies = 
  context
    .Data
    .CreateEnumerable<PricePrediction>(spikeSsaData, reuseRowObject = false)

let spikeSsaChartData = 
  (priceData, spikeSsaAnomalies)
  ||> Seq.zip
  |> Seq.map (fun (p, a) ->
      // For all anomalies, use closing price to show on the chart
      (a.Date, if (a.Prediction).[0] = 0. then None else Some (snd p)))
  |> Seq.filter (fun (_x, y) -> y.IsSome)
  |> Seq.map (fun (x, y) -> (x, y.Value))

let spikeSsaChart = 
  Chart.Scatter (spikeSsaChartData, StyleParam.Mode.Markers, Name = "spikeSsa")

[ priceChart; spikeSsaChart ]
|> Chart.combine
|> Chart.withTitle "Close Price (SpikeSsa)"
|> Chart.show

Price Anomalies - SpikeSsa
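
Beyond eyeballing the charts, one rough way to compare detectors is to count how many rows each one flags. The sketch below uses a hypothetical countAlerts helper and made-up prediction vectors; it again assumes slot 0 of each vector is the alert flag.

```fsharp
// Hypothetical helper: count the rows whose alert flag (slot 0) is set.
let countAlerts (predictions: seq<float[]>) : int =
  predictions
  |> Seq.filter (fun p -> p.Length > 0 && p.[0] <> 0.)
  |> Seq.length

// Made-up prediction vectors for illustration only, not real detector output.
let sampleIidSpike = [ [| 1.; 3.1; 0.02 |]; [| 0.; 0.4; 0.61 |]; [| 1.; 2.8; 0.03 |] ]
let sampleSrCnn = [ [| 0.; 0.1; 0.2 |]; [| 0.; 0.2; 0.1 |]; [| 1.; 0.9; 0.7 |] ]

// One (name, alert count) pair per detector
let alertSummary =
  [ "iidSpike", countAlerts sampleIidSpike
    "srCnn", countAlerts sampleSrCnn ]

alertSummary
|> List.iter (fun (name, count) -> printfn "%s flagged %d rows" name count)
```

For the real data, the inputs would be the Prediction arrays from iidSpikeAnomalies, srCnnAnomalies, and spikeSsaAnomalies. A raw count says nothing about which alerts are meaningful, but it gives a quick feel for how a parameter change shifts sensitivity.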

Now that we’ve gone over these three methods, you can see from their charts that they have differing sensitivities to anomalies. This doesn’t even take into account hyperparameter optimization for options such as windowSize, confidence, and pvalueHistoryLength. As always, it’s great to have options, and I’ve found that depending on my needs I gravitate to different methods. This has been a short foray into additional anomaly detection methods provided by ML.NET. Examples can go a long way toward helping you wrap your head around the possibilities. I hope you found this useful in your ML.NET projects, or that it has perhaps intrigued you enough to try it. Until next time.