Dynamic time warping

From Free net encyclopedia

Dynamic time warping is a technique applied in automatic speech recognition to cope with different speaking speeds. In general, it is a method that allows to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.

A typical example of the restrictions imposed on the matching of the sequences are the following:

  • continuity (no large gaps in the sequences should occur, e.g. only one item of a sequence may be skipped)
  • monotonicity (the order of the elements in the sequence should not be inverted)

The optimization process is carried out using dynamic programming, hence the name.

The extension of the problem for two-dimensional "series" like images (planar warping) is NP-complete, while the problem for one-dimensional signals like time series can be solved in polynomial time.

Example of one of the many forms of the algorithm

int DTWDistance(char s[1..n], char t[1..m], int d[1..n,1..m]) {
    declare int DTW[0..n,0..m]
    declare int i, j, cost

    for i := 1 to m
        DTW[0,i] := infinity
    for i := 1 to n
        DTW[i,0] := infinity
    DTW[0,0] := 0

    for i := 1 to n
        for j := 1 to m
            cost:= d[s[i],t[j]]
            DTW[i,j] := minimum(DTW[i-1,j  ] + cost,    // insertion
                                DTW[i  ,j-1] + cost,    // deletion
                                DTW[i-1,j-1] + cost)    // match

    return DTW[n,m]
}

Reference

  • C. S. Myers and L. R. Rabiner.
    A comparative study of several dynamic time-warping algorithms for connected word recognition.
    The Bell System Technical Journal, 60(7):1389-1409, September 1981.