2020-11-05

Rust and Dynamic Time Warping

Read Time: 7 minutes

I’m back to a familar topic, dynamic time warping. This time it’s my excuse to play with Rust.

As a bit of refresher, Dynamic Time Warping (DTW) is a useful mechanism to compare signals, or series, while adjusting for frequency variance. It has turned into one of my go-to algorithms when trying out a new language, Rust is no exception. Today I’m going to explore implementing the DTW algorithm in Rust. My goal is to be as idiomatic as possible, but since this is a learning process, I assume there will be some gaps. Obviously for all this to work I need Rust, installation instructions can be found here.

There is a little setup first. The plan is to read data from csv files, and use gnuplot for some sequence visualizations. As a sidenote, the gnuplot crate assumes gnuplot is installed. So I need to ensure that is installed.

extern crate csv;

use csv::ReaderBuilder;
use gnuplot::{Figure, Caption, Color};
use std::env;

Getting right into the meat of it, here is the dtw implementation. The cost function takes 2 float vectors (and a debug flag) and returns a float as the cost, or difference, between the two sequences. The internal structure that is traversed is a two-dimensional vector, the cost matrix. After the matrix initialization, I iterate through both sequences, comparing and storing cummulative “best” distances. As an initial pass I’m not interested in implementing some of the typical dtw optimizations. That makes this implementation straight-forward, and a good first step.

/// Calculate distance between values
fn distance(a: f32, b: f32) -> f32 {
  return f32::abs(a - b);
}

/// Calculate cost difference between two sequences (dtw-style)
fn cost(debug: bool, a: &[f32], b: &[f32]) -> f32 {
  // Init matrix
  let matrix_a_len = a.len() + 1;
  let matrix_b_len = b.len() + 1;
  let mut data = vec![vec![f32::MAX; matrix_b_len]; matrix_a_len];
  data[0][0] = 0.;

  // Calculate cost matrix
  // NOTE: ai and bi represent index in data (do -1 when referencing into the a and b vectors)
  for ai in 1..matrix_a_len {
    for bi in 1..matrix_b_len {
      let cost = distance(a[ai - 1], b[bi - 1]);
      data[ai][bi] = cost + f32::min(f32::min(data[ai - 1][bi], data[ai][bi - 1]), data[ai - 1][bi - 1]);
    }
  }

  // Dump cost matrix
  if debug {
    for ai in 0..matrix_a_len {
      for bi in 0..matrix_b_len {
        print!("{:5} ", (if data[ai][bi] == f32::MAX { String::from("-") } else { format!("{:.*}", 2, data[ai][bi]) }));
      }
      println!("");
    }
  }

  // Return final cost
  return data[a.len()][b.len()];
}

The algorithm doesn’t do much if it doesn’t have data. I use this an excuse to learn how to use a csv parser. I’m still getting a handle on the best way to deal with errors in Rust. For my purposes printing an error, but filtering out bad data works for me. The goal is to read a csv, and transform the first column into a vector. I then return the resulting vector to eventually be consumed by the previously defined cost function. The csv::ReaderBuilder works well for me out of the box, and it has several more options than I choose to use here.

/// Load a file for processing
/// Assumes rows with at least one column
fn load_file(debug: bool, filename: &String) -> Result<Vec<f32>, csv::Error> {
  if debug {
    println!("loading: {}", filename);
  }

  let reader =
    ReaderBuilder::new()
    .has_headers(false)
    .delimiter(b',')
    .from_path(&filename);

  match reader {
    Ok(mut r) => {
      let data: Vec<f32> =
        r
        .records()
        .filter_map(|x| x.ok())
        .map(|x| (x[0]).parse::<f32>())
        .filter_map(|x| { match &x {
                            Ok(_) => { () }
                            Err(e) => { println!("{}", &e) }
                        }
                        x.ok() })
        .collect();

      return Ok(data);
    }
    Err(e) => {
      Err(e)
    }
  }
}

To aid in the development process it is useful to visualize the data. For that I’ll use gnuplot, and thankfully there is a crate for me to use. As mentioned previously, I need to also have gnuplot installed, beyond just the crate. The comparison chart I put together here is pretty basic, but it meets my exploratory needs.

/// Make a chart containing 2 sequences (for comparison)
fn make_chart(debug: bool, filename: &str, a: &[f32], b: &[f32]) {
  let x_max = a.len().max(b.len()) as i32;
  let x: Vec<i32> = (0..x_max).collect();

  let mut chart = Figure::new();
  chart
    .set_terminal("pngcairo", filename)
    .axes2d()
    .lines(&x, a, &[Caption("A"), Color("red")])
    .lines(&x, b, &[Caption("B"), Color("blue")]);
  chart.show();
}

Now, I’m finally to the point where I can put it all together. To facilitate flexiblity, I take two csv files at the command line and compare their results. After calculating the cost I create a combined chart of the sequences.

/// main
fn main() {
  let args: Vec<String> = env::args().collect();

  let debug =
    args
      .iter()
      .map(|x| x == "--debug")
      .fold(false, |acc, x| acc || x);
  println!("{}", debug);

  let filename1 = &args[1];
  let filename2 = &args[2];
  
  let file1 = load_file(debug, filename1);
  let file2 = load_file(debug, filename2);

  match (file1, file2) {
    (Ok(file1), Ok(file2)) => {
      let cost = cost(debug, &file1, &file2);
      println!("cost: {}", cost);

      make_chart(debug, "chart1.png", &file1, &file2);
    }
    (Err(e), _) => { println!("Error: {} {}", filename1, e); }
    (_, Err(e)) => { println!("Error: {} {}", filename2, e); }
  }
}

With the code complete, it is time to test it out. For that I grabbed some cpu logs from different workloads. To simplify the post I’ll just focus on cpu; although memory, disk, and network are important to workload profiling. The existing algorithm can easily have a modified distance calculation to handle multiple dimensions. Now I just run the app to measure profile similarities and build charts for comparison.

1	$ cargo run cpu1.csv cpu2.csv

Workload 1
Workload 2
Workload 3
Workload 4
Workload 5

From the chart below, we can see Workloads 3, 4, and 5 have the closest profile. Workloads 1 and 5 have the highest distance difference.

	W2	W3	W4	W5
W1	1212	1613	2002	2218
W2		1572	1899	1928
W3			890	585
W4				1075

Using Rust for implementing DTW went relatively smoothly. My own code review exposes room for improvement and leveraging Rust better. So I already have some avenues to follow for more learnings. With that said, I’m pleased with how it all came together. Honestly, it’s too early to judge, but I’m thrilled with what I see from Rust. Even though I ran into the typical Rustism snags, coming from a functional background, I felt right at home with many of the idioms. And those snags are just room for growth and part of the learning process. Rust’s approach is solid and I like what I get out of some extra manual effort. Expect more posts as I continue the experiments. Until next time.