2019-10-11

FastDtw, an F# Implementation

Read Time: 8 minutes

Today’s post is a short introduction into using FastDtw for Dynamic Time Warping analysis. Specifically it is a quick introduction to using my newly released FastDtw package.

Dynamic Time Warping (DTW) can be a useful tool when comparing signals or series while adjusting for frequency variance. One downside to the standard algorithm is it can be expensive. There are many ways to mitigate the cost, FastDtw is one of them. While other methods typically hard-cap parameters on the search space, FastDtw provides a dynamic approach to reducing the search space while maintaining a higher level of accuracy. In this post I’ll discuss the usage of my F# package implementation of the FastDtw: Toward Accurate Dynamic Time Warping in Linear Time and Space paper. For those interested in the details, it is a pretty accessible paper. FastDtw is technically an approximation, but its flexible pathing strategy provides good results with some significant performance improvements over basic DTW.

To get started, you will need to have .NET Core installed. Once this is complete the application can be setup.

1
2
3

dotnet new console -lang F# -n BitCoinTrends 
cd BitCoinThrends
dotnet add package FastDtw

As always, there is some basic setup. I’ll include FastDtw (obviously), and a charting library for some series comparison visualizations.

open System
open System.IO
open FastDtw
open XPlot.GoogleCharts

The example will take a csv of Bitcoin/USD conversion values and transform it into a list of datasets broken down by month. The file is a 2010-2019 data download from Yahoo Finance. The format can be seen below, for today’s purposes I only care about the Data and Close fields.

1 2	Date,Open,High,Low,Close,Adj Close,Volume 2010-07-17,0.049510,0.049510,0.049510,0.049510,0.049510,0

I want to break the file into datasets by month. Since this is just a small script I’ll make a mini file processing pipeline to group and normalize the data by converting to a percentage of the previous day. This improves comparisons, especially with something as volatile as bitcoin.

[<EntryPoint>]
let main argv =
  // Convert file to month-group datasets
  let dataSets =
    File.ReadLines("data/BTC-USD.csv")
    |> Seq.skip 1 // header
    |> Seq.map (fun line ->
      // Convert line to desired tuple
      let columns = line.Split ','
      let month = DateTime.Parse(columns.[0]).ToString("yyyy-MM")
      let closing = float (columns.[4])
      (month, closing))
    |> Seq.groupBy fst // groupby month 
    |> Seq.map (fun (month, closingData) -> 
      // Aggregate grouped rows into their respective arrays
      let data = 
        closingData 
        |> Seq.map snd // extract closing
        |> Seq.mapFold (fun (a: float option) x -> 
          // Convert closing values to % change from previous day
          if a.IsSome then (x / a.Value, Some x) else (0., Some x)) None
        |> fst
        |> Seq.toArray

      (month, data))

Once the datasets are loaded, it is time to see which month most closely mirrors the trend of 9/2019. For all the other code in this post, this is really the highlight: let distance = FastDtw.Distance targetData data radius. This is where the comparison happens. Radius allows a configurable level of accuracy. It controls the per point search space as the series are compared. In most cases, distance is all that matters, but there are times when how the series match up can be useful. The DistanceWithPath function provides the series’ indexes that pair together.

let targetMonth = "2019-09"
let radius = 2

let targetData =
  dataSets
  |> Seq.filter (fun (month, _) -> month = targetMonth)
  |> Seq.head
  |> snd

let compares = 
  dataSets
  |> Seq.filter (fun (month, _) -> month <> targetMonth) // skip target
  |> Seq.map (fun (month, data)->
    let distance = FastDtw.Distance targetData data radius
    (month, distance, data))
  |> Seq.sortBy snd3 // Sort by compare result

let randomData =
  compares
  |> Seq.skip (Random().Next(Seq.length compares))
  |> Seq.head

Once the comparisons have been completed, it is time to show some comparison charts. I print out the top couple matches as well as a random match.

let showChart targetData compareData title seriesName =
  let dataToChartData data =
    data 
    |> Array.skip 1 // skip first entry (always 0)
    |> Array.mapi (fun i x -> (i, x)) 

  let targetChartData = targetData |> dataToChartData
  let compareChartData = compareData |> thd3 |> dataToChartData

  [targetChartData; compareChartData]
  |> Chart.Combo
  |> Chart.WithOptions (Options(title = sprintf "%s (%s)" title (fst3 compareData), 
                                series = [| Series("line"); Series("line") |]))
  |> Chart.WithLabels ["Target"; seriesName ]
  |> Chart.WithLegend true
  |> Chart.WithSize (700, 250)
  |> Chart.Show

showChart targetData randomData "% Daily Change Comparison" "Random Match"

compares
|> Seq.take 5
|> Seq.iteri (fun i x ->
  showChart targetData x "% Daily Change Comparison" (sprintf "Match %d" i))

0

Once the code is in place, it is time to look at some of the results. It is useful to not only see a good match, but an average match. This helps with the contrast. There is no guarantee there will be a good match, but of the months provided, 2012-10 isn’t a bad match to 2019-09, especially considering what a random match can produce. If you squint a bit, you even can see the portions of similar trends over a month, if only the elapsed time and amplitude is different.

Best Match

Second-Best Match

Random Match

Beyond the example, it is useful to see some of the performance differences with the stock DTW algorithm. For this I setup a quick BenchmarkDotNet test. The below results show the performance benefits. FastDtw is a multi-pass algorithm, so it’s not surprising that on very short series it’s overhead makes it slower. Comparisons for larger series get anywhere from 65% - 70% faster. One possible downside of the current implementation is more allocations, thus the extra GC events. This is one of those cases where the allocations can be reduced with a bit of refactoring, so this looks like a good place for future optimizations.

|  Method | SeriesSize |             Mean |         Error |        StdDev |     Gen 0 |     Gen 1 |     Gen 2 |  Allocated |
|-------- |----------- |-----------------:|--------------:|--------------:|----------:|----------:|----------:|-----------:|
|     dtw |         10 |         3.555 us |     0.0198 us |     0.0175 us |    1.8654 |         - |         - |    2.86 KB |
| fastDtw |         10 |         7.936 us |     0.0357 us |     0.0334 us |    3.9215 |         - |         - |    6.01 KB |
|     dtw |        100 |       210.675 us |     0.5531 us |     0.5174 us |   59.0820 |         - |         - |   94.99 KB |
| fastDtw |        100 |       145.968 us |     0.6435 us |     0.5704 us |   94.9707 |         - |         - |  149.76 KB |
|     dtw |       1000 |    20,856.656 us |    66.5105 us |    58.9598 us |  593.7500 |  500.0000 |  500.0000 |  148.99 KB |
| fastDtw |       1000 |     7,392.059 us |   137.4944 us |   135.0379 us | 1109.3750 |  828.1250 |  820.3125 |  446.05 KB |
|     dtw |      10000 | 2,082,283.166 us | 3,328.7961 us | 2,950.8911 us |         - |         - |         - | 1483.73 KB |
| fastDtw |      10000 |   653,825.530 us | 4,111.5875 us | 3,845.9815 us | 3000.0000 | 3000.0000 | 3000.0000 |  4134.2 KB |

This has been a quick introduction into the FastDtw package. I hope you’ve enjoyed this. Until next time.