DTW with Global Invariances

Note. This work is part of Titouan Vayer's PhD thesis. We are co-supervising Titouan together with Laetitia Chapel and Nicolas Courty.

In this work, we address the problem of comparing time series while taking into account both feature space transformation and temporal variability. The proposed framework combines a latent global transformation of the feature space with the widely used Dynamic Time Warping (DTW). This work is available as a preprint (Vayer et al., 2020).

Definition

Let $\mathbf{x}$ and $\mathbf{x}^\prime$ be two time series of respective lengths $n$ and $m$. The features of the two time series are not assumed to lie in the same ambient space: features from $\mathbf{x}$ lie in $\mathbb{R}^{p}$ while features from $\mathbf{x}^\prime$ lie in $\mathbb{R}^{p^\prime}$. In the following, we assume $p \geq p^\prime$ without loss of generality. In order to allow comparison between time series $\mathbf{x}$ and $\mathbf{x}^\prime$, we optimize over a family of functions $\mathcal{F}$ that map features from $\mathbf{x}^\prime$ onto the feature space in which features from $\mathbf{x}$ lie; $\mathcal{F}$ is hence the family of registration functions. More formally, we define Dynamic Time Warping with Global Invariances (DTW-GI) as the solution of the following joint optimization problem:

$$\text{DTW-GI}(\mathbf{x}, \mathbf{x}^\prime) = \min_{f \in \mathcal{F}, \, \pi \in \mathcal{A}(\mathbf{x}, \mathbf{x}^\prime)} \sum_{(i, j) \in \pi} d\left(x_i, f(x^\prime_j)\right)^2 \label{eq:dtwgi}\tag{1}$$

where $\mathcal{A}(\mathbf{x}, \mathbf{x}^\prime)$ is the set of admissible alignment paths between series of lengths $n$ and $m$ (as in standard DTW), $d(\cdot, \cdot)$ is a ground metric on $\mathbb{R}^{p}$ (typically the Euclidean distance), and $\mathcal{F}$ is a family of functions from $\mathbb{R}^{p^\prime}$ to $\mathbb{R}^{p}$.

This similarity measure estimates both temporal alignment and feature space transformation between time series simultaneously, allowing the alignment of time series when the similarity should be defined up to a global transformation. Time series do not have to lie in the same ambient space, as presented in the following Figure:

DTW-GI aligns time series by optimizing on temporal alignment (through Dynamic Time Warping) and feature space transformation (denoted f here). Time series represented here are color-coded trajectories, whose starting (resp. end) point is depicted in blue (resp. red).

Optimization

Optimization of the quantity in Equation $\eqref{eq:dtwgi}$ can be performed via Block Coordinate Descent (BCD). In a nutshell, the optimization process alternates between the following two steps:

  1. for a fixed $f$, the optimal alignment path $\pi$ is obtained through the DTW algorithm;
  2. for a fixed path $\pi$, the optimal map $f$ (when $\mathcal{F}$ is the Stiefel manifold) is obtained through Singular Value Decomposition, as made explicit in the closed-form expression below.
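
For a fixed path $\pi$, step 2 boils down to an orthogonal Procrustes problem. As a brief sketch (the matrix notations used here are ours, not from the original text): stacking the features of $\mathbf{x}$ and $\mathbf{x}^\prime$ as rows of $X \in \mathbb{R}^{n \times p}$ and $X^\prime \in \mathbb{R}^{m \times p^\prime}$, and denoting by $W_\pi \in \{0, 1\}^{n \times m}$ the binary warping matrix associated with $\pi$, the optimal linear map is

$$f(x^\prime) = P x^\prime \quad \text{with} \quad P = U V^T, \quad \text{where} \quad U \Sigma V^T \text{ is the singular value decomposition of } X^T W_\pi X^\prime .$$

This is exactly the update performed in the code below (where the variable `m` plays the role of $X^T W_\pi X^\prime$).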

Interestingly, this optimization strategy where we alternate between time series alignment, i.e. time correspondences between both time series, and feature space transform optimization can be seen as a variant of the Iterative Closest Point (ICP) method in image registration (Chen & Medioni, 1992), in which nearest neighbors are replaced by matches resulting from DTW alignment.

The resulting algorithm is detailed in the following code:

import numpy
import scipy.linalg

from tslearn.metrics import dtw_path
from tslearn.utils import to_time_series


def path2mat(path):
    """Convert a DTW path (list of (i, j) index pairs) into a binary warping
    matrix of shape (sz0, sz1)."""
    sz0 = path[-1][0] + 1
    sz1 = path[-1][1] + 1
    w_pi = numpy.zeros((sz0, sz1))
    for i, j in path:
        w_pi[i, j] = 1.
    return w_pi


def dtw_gi(ts0, ts1, init_p=None, max_iter=20, verbose=False, use_bias=False):
    r"""Compute Dynamic Time Warping with Global Invariance (DTW-GI) similarity
    measure between (possibly multidimensional) time series and return it.

    Parameters
    ----------
    ts0: array of shape (sz0, d0)
        A time series.

    ts1: array of shape (sz1, d1)
        A time series.

    init_p : array of shape (d0, d1) (default: None)
        Initial p matrix for the Stiefel linear map. If None, identity matrix
        is used.

    max_iter : int (default: 20)
        Number of iterations for the iterative optimization algorithm.

    verbose: boolean (default: False)
        Whether information should be printed during optimization.

    use_bias: boolean (default: False)
        If True, the feature space map is affine, otherwise it is linear.

    Returns
    -------
    w_pi
        Warping matrix (binary matrix of shape (sz0, sz1)).

    p
        Stiefel matrix of shape (d0, d1).

    bias
        Bias vector of shape (d0, 1), returned only if `use_bias` is True.

    cost
        Similarity score (DTW cost between ts0 and the registered ts1).
    """
    ts0_ = to_time_series(ts0, remove_nans=True)
    ts1_ = to_time_series(ts1, remove_nans=True)

    sz0, d0 = ts0_.shape
    sz1, d1 = ts1_.shape

    ts0_m = ts0_.mean(axis=0).reshape((-1, 1))
    ts1_m = ts1_.mean(axis=0).reshape((-1, 1))

    w_pi = numpy.zeros((sz0, sz1))
    if init_p is None:
        p = numpy.eye(d0, d1)
    else:
        p = init_p
    bias = numpy.zeros((d0, 1))

    # BCD loop: alternate between temporal alignment and feature space registration
    for it in range(1, max_iter + 1):
        w_pi_old = w_pi
        # Temporal alignment for the current feature space map
        path, cost = dtw_path(ts0_, ts1_.dot(p.T) + bias.T)
        if verbose:
            print("Iteration {}: DTW cost: {:.3f}".format(it, cost))
        w_pi = path2mat(path)
        if numpy.allclose(w_pi, w_pi_old):
            break
        # Feature space registration
        if use_bias:
            m = (ts0_.T - ts0_m).dot(w_pi).dot(ts1_ - ts1_m.T)
        else:
            m = (ts0_.T).dot(w_pi).dot(ts1_)
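        # Optimal Stiefel matrix for the fixed alignment: orthogonal Procrustes
        # solution, obtained from the SVD of the cross-correlation matrix m
        # (see the closed-form expression in the "Optimization" section above)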
        u, sigma, vt = scipy.linalg.svd(m, full_matrices=False)
        p = u.dot(vt)
        if use_bias:
            bias = ts0_m - ts1_m.T.dot(p.T).T
    path, cost = dtw_path(ts0_, ts1_.dot(p.T) + bias.T)
    if verbose:
        print("After optimization: DTW cost: {:.3f}".format(cost))
    if use_bias:
        return w_pi, p, bias, cost
    else:
        return w_pi, p, cost
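
As a quick sanity check, here is a minimal usage sketch (not part of the original post; it assumes numpy and tslearn are installed and the dtw_gi function above is defined): we align a 2D spiral with a rotated copy of it embedded in 3D, so that the two series do not lie in the same ambient space.

import numpy

# Toy data: a 2D spiral and a rotated copy of it embedded in 3D
t = numpy.linspace(0., 4. * numpy.pi, 50)
spiral_2d = numpy.stack([t * numpy.cos(t), t * numpy.sin(t)], axis=1)  # shape (50, 2)

theta = numpy.pi / 3.
rotation = numpy.array([[numpy.cos(theta), -numpy.sin(theta)],
                        [numpy.sin(theta), numpy.cos(theta)]])
spiral_3d = numpy.zeros((50, 3))
spiral_3d[:, :2] = spiral_2d.dot(rotation.T)  # rotate, then embed in 3D

# ts0 lives in R^3 and ts1 in R^2, so the Stiefel matrix p has shape (3, 2)
w_pi, p, cost = dtw_gi(spiral_3d, spiral_2d, verbose=True)
print("DTW-GI cost: {:.3f}".format(cost))
print("Stiefel matrix shape:", p.shape)

If the optimization behaves well, the reported cost should be close to zero and the columns of p should span the plane in which the rotated spiral lies.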

Results

We show that one can compute barycenters for this metric, using an approach similar to DTW Barycenter Averaging (DBA, Petitjean, Ketterlin, & Gançarski, 2011), even when the time series to be averaged do not lie in the same ambient space.
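
To give an intuition of how such barycenters can be obtained, here is a rough DBA-style sketch; the dtw_gi_barycenter helper below is hypothetical (it is not the implementation used in the preprint) and merely alternates between DTW-GI alignments of each series onto the current barycenter and a per-timestamp average of the registered features.

def dtw_gi_barycenter(dataset, barycenter_size, d, max_iter=10, random_state=0):
    """DBA-like barycenter sketch under DTW-GI (hypothetical helper).

    `dataset` is a list of time series given as arrays of shape (sz_k, d_k),
    and the barycenter is sought in R^d with `barycenter_size` timestamps.
    """
    rng = numpy.random.RandomState(random_state)
    barycenter = rng.randn(barycenter_size, d)
    for _ in range(max_iter):
        num = numpy.zeros((barycenter_size, d))
        den = numpy.zeros((barycenter_size, 1))
        for ts in dataset:
            # Align and register each series onto the current barycenter
            w_pi, p, _ = dtw_gi(barycenter, ts)
            registered = ts.dot(p.T)  # map features into the barycenter space
            num += w_pi.dot(registered)
            den += w_pi.sum(axis=1, keepdims=True)
        # Per-timestamp average of all matched (registered) samples
        barycenter = num / den
    return barycenter

In practice, initializing the barycenter from one of the series (rather than at random) and monitoring the DTW-GI costs across iterations would make this scheme more robust.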

We also introduce soft counterparts, following the definition of soft-DTW from (Cuturi & Blondel, 2017). In this case, the optimization relies on gradient descent, which allows a wider variety of feature space transformation families to be considered.

We validate the utility of these similarity measures on real-world datasets for the tasks of human motion prediction (where motion is captured under different points of view) and cover song identification (where song similarity is defined up to a key transposition). In both settings, we observe that jointly optimizing the feature space transformation and the temporal alignment improves over standard approaches that treat these as two independent steps.

References

  1. Vayer, T., Chapel, L., Courty, N., Flamary, R., Soullard, Y., & Tavenard, R. (2020). Time Series Alignment with Global Invariances.
  2. Chen, Y., & Medioni, G. (1992). Object modelling by registration of multiple range images. Image and Vision Computing, 10(3), 145–155.
  3. Petitjean, F., Ketterlin, A., & Gançarski, P. (2011). A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44(3), 678–693.
  4. Cuturi, M., & Blondel, M. (2017). Soft-DTW: a differentiable loss function for time-series. In Proceedings of the International Conference on Machine Learning (pp. 894–903).