I am attending a short workshop from 30th April to 4th May on Nonlinear Time Series Analysis, organized by the Indian Institute of Science, Bangalore. The course is being taught by Professor Marco Bittelli of the University of Bologna. I intend to keep (very brief) summaries of what is covered each day, since the material is interesting and writing it up will help my understanding.
Many of the ideas that Prof. Marco introduced today were related to a post I made some time back while preparing for the Fusion fest. But he also introduced the idea of modeling data using the methods of Nonlinear Time Series Analysis (NLTS). NLTS is especially useful when the collected data, analyzed with traditional statistical tools such as autocorrelation, appear to be generated by a stochastic process. Indeed, NLTS can reveal hidden patterns that often escape detection by methods designed for random datasets. An interesting application of NLTS is Prof. Marco’s study of wind-velocity patterns to examine the feasibility of a windmill farm on a sugarcane farm in Florida, USA.
A fascinating result (actually a simplified version of Takens’ theorem), and a key step in NLTS, is the (approximate) reconstruction of the phase portrait of a multi-variable deterministic dynamical system through the embedding of a single variable with an appropriate dimension and time delay! Embedding basically involves creating vectors (of a certain dimension) out of the data for a single variable by picking points separated by carefully chosen time delays, and plotting them. What is meant by “carefully chosen” will be covered in a later session. This means that for a large class of dynamical systems, a single variable suffices to describe the state evolution. For example, the state of an ideal pendulum undergoing small oscillations, which is physically governed by its position and velocity at any instant, can be described almost entirely using its positions alone.
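To make the embedding step concrete for myself, here is a minimal Python sketch (the course itself used R, so this is my own illustration, not the course code). The function name `delay_embed`, the sine-wave "pendulum" data, and the delay of 16 samples are all assumptions chosen for the example.

```python
import math

def delay_embed(series, dim, tau):
    """Return delay vectors (x(t), x(t + tau), ..., x(t + (dim - 1) * tau))."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [
        tuple(series[t + k * tau] for k in range(dim))
        for t in range(n_vectors)
    ]

# Pendulum-like example: only the position x(t) = sin(t) is "measured".
dt = 0.1
positions = [math.sin(i * dt) for i in range(200)]

# Assumed delay: about a quarter period (pi / 2 is roughly 16 samples of 0.1).
vectors = delay_embed(positions, dim=2, tau=16)
print(len(vectors))  # number of reconstructed state vectors
print(vectors[0])    # first vector: (x(0), x(16 * dt))
```

Plotting the two components of `vectors` against each other should trace out a closed loop, much like the position-velocity portrait of the pendulum, even though the velocity was never recorded.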
We started by learning about the False Nearest Neighbor (FNN) algorithm, which determines the appropriate embedding dimension for our delay vectors. I found the basic idea behind FNN intuitive if one considers the analogy of viewing players on a field from the side and from a drone's point of view: players who look close together from the side may turn out to be far apart when seen from above. Then Prof. Marco introduced the mutual information function, which quantifies the correlation between values of a variable at different times. It is therefore reasonable to choose a time delay for which the mutual information is neither too high (redundant information) nor too low (totally random values).
The highlight was Singular Spectrum Analysis (SSA), which is used at the very first step of time series analysis: data pre-processing, i.e., removal of the noise from the signal. Not only does SSA give the clean signal, it also determines all the dominant cycles in the signal, which is almost magical. The code for all the algorithms was executed in R. SSA draws on a large variety of fields, the most important being linear algebra, signal processing, ODEs and functional analysis. Moreover, SSA also finds how much of the signal's variance each cycle accounts for, so one can discard the cycles that don’t contribute much to the signal. Suffice it to say it is an extremely sophisticated noise filter. An accessible introduction to the SSA algorithm can be found in the book Nonlinear Time Series Analysis with R, of which Prof. Marco is an author.
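The delay and dimension choices from this session can be sketched in plain Python as follows. This is my own simplified illustration, not the R code from the course: the mutual information is a basic histogram estimate, the delay is picked as the lag with the smallest mutual information in a window (a crude stand-in for the common first-minimum rule), and the false-neighbour test uses a single distance-ratio threshold `r_tol`, whose value of 10 is an assumption.

```python
import math

def mutual_information(series, lag, bins=16):
    """Histogram estimate (in nats) of the mutual information of x(t) and x(t + lag)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0

    def bin_of(v):
        return min(int((v - lo) / width), bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series, series[lag:])]
    n = len(pairs)
    joint, px, py = {}, {}, {}
    for i, j in pairs:
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    # Sum of p(i, j) * log(p(i, j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )

def false_neighbor_fraction(series, dim, tau, r_tol=10.0):
    """Fraction of nearest neighbours in `dim` dimensions that fly apart in dim + 1."""
    n = len(series) - dim * tau  # leave room for the extra coordinate
    vectors = [[series[t + k * tau] for k in range(dim)] for t in range(n)]
    false_count = 0
    counted = 0
    for i in range(n):
        best_j, best_d = -1, float("inf")
        for j in range(n):
            if j == i:
                continue
            d = math.dist(vectors[i], vectors[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_d == 0.0:
            continue  # skip exact duplicates to avoid dividing by zero
        extra = abs(series[i + dim * tau] - series[best_j + dim * tau])
        counted += 1
        if extra / best_d > r_tol:
            false_count += 1
    return false_count / counted

# Test signal (an assumption for illustration): a sampled sine wave.
signal = [math.sin(0.1 * i) for i in range(600)]

# Crude delay choice: the lag with the smallest mutual information in a window.
lag_choice = min(range(1, 31), key=lambda lag: mutual_information(signal, lag))
print("chosen delay:", lag_choice)

# The embedding dimension is the smallest one where false neighbours (nearly) vanish.
for dim in (1, 2, 3):
    print(dim, round(false_neighbor_fraction(signal, dim, lag_choice), 3))
```

In the field analogy, dimension 1 is the side view: points on the rising and falling parts of the wave look like neighbours but separate once a second coordinate is added.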
The crux of today’s session was to determine whether the deterministic structure that we detect in the time series after SSA and single-variable embedding is really due to a deterministic dynamical system or is produced by a linear stochastic process, in which case we have only stochastic analysis to fall back on. The technique used to test this null hypothesis is surrogate data testing. In the session it was used together with statistical methods like Convergent Cross Mapping (CCM). To be honest, I didn’t completely understand the mechanics of CCM. If you have a nice way to explain CCM, I would appreciate it if you commented below!
Surrogate data testing seemed like a really critical step towards building a model for the dynamical system, because you don’t want to find out later that the deterministic structure you were detecting was due to the darn noise. This is really diligent methodology!
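The logic of a surrogate test can be sketched in a few lines of Python. This is a deliberately simplified illustration of my own: it uses random-shuffle surrogates, which only test the weaker null hypothesis of independent noise. The linear stochastic null discussed in the session would need surrogates that also preserve the autocorrelation (for example phase-randomized ones). The test statistic (a nearest-neighbour one-step prediction error), the logistic-map data and all function names are assumptions chosen for the example.

```python
import random

def nn_prediction_error(series):
    """Mean error when x(t + 1) is predicted by the successor of x(t)'s nearest neighbour."""
    n = len(series) - 1  # only points that have a successor
    total = 0.0
    for i in range(n):
        best_j = min(
            (j for j in range(n) if j != i),
            key=lambda j: abs(series[j] - series[i]),
        )
        total += abs(series[best_j + 1] - series[i + 1])
    return total / n

def shuffle_surrogates(series, count, rng):
    """Return `count` random permutations of the series (same values, order destroyed)."""
    surrogates = []
    for _ in range(count):
        shuffled = list(series)
        rng.shuffle(shuffled)
        surrogates.append(shuffled)
    return surrogates

# Deterministic but irregular-looking data: the logistic map x -> 4x(1 - x).
x = [0.3]
for _ in range(199):
    x.append(4.0 * x[-1] * (1.0 - x[-1]))

rng = random.Random(0)
observed = nn_prediction_error(x)
surrogate_errors = [nn_prediction_error(s) for s in shuffle_surrogates(x, 19, rng)]

# One-sided rank test: with 19 surrogates, being the smallest of 20 values
# corresponds to a 5% significance level.
null_rejected = observed < min(surrogate_errors)
print(observed, min(surrogate_errors), null_rejected)
```

If the data were really just noise, shuffling would not make prediction any harder, and the observed error would sit somewhere in the middle of the surrogate errors.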
Today we came to the final step of NLTS: phenomenological modeling. I found it really fascinating because the modeling technique that Prof. Marco introduced relied on both analytical and statistical tools. First off, one could simply try fitting a curve to the time series, and if one is lucky, voila! A model is found. But the truth is that dynamical systems are rarely so well behaved that a relatively simple combination of sinusoidal, exponential and/or logarithmic functions can model them.
Here, we tried modeling the data in terms of a system of ODEs of first order only. I am not sure if first order will always suffice, but perhaps it does for most time series. The number of variables is, quite fascinatingly, determined by the single-variable embedding dimension. This has an important consequence: choose too big a dimension and you are going to be dealing with a monstrous ODE system! The form of the ODE system for 3 variables looks like this:
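As a hedged reconstruction (the exact form shown in the session may well differ), a generic first-order system in three variables can be written with low-order polynomial right-hand sides. The choice of a second-order polynomial, the coefficient names $a_{k,\cdot}$ and the identification of $x_1, x_2, x_3$ with the three embedding coordinates are my assumptions:

```latex
\frac{dx_k}{dt} \;=\; a_{k,0}
  \;+\; \sum_{i=1}^{3} a_{k,i}\, x_i
  \;+\; \sum_{i=1}^{3} \sum_{j=i}^{3} a_{k,ij}\, x_i x_j ,
\qquad k = 1, 2, 3 .
```

Even in this modest form, each of the three equations already carries 10 coefficients (1 constant, 3 linear, 6 quadratic), which shows how quickly the system grows with the embedding dimension.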
Now, the coefficients are determined using the ordinary least squares method. Here one has to ensure that the resulting model isn’t overfitting, which would mean that the model is only imitating the data without capturing the macro trends, and so ultimately has no predictive capability.
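Here is a minimal Python sketch of the least-squares step, under strong simplifying assumptions of my own: two variables instead of three, purely linear right-hand sides, derivatives estimated by central differences, and synthetic data from a harmonic oscillator (dx/dt = y, dy/dt = -x). The function name `fit_linear_rhs` is hypothetical.

```python
import math

def fit_linear_rhs(xs, ys, targets):
    """Least-squares fit of targets ~ a * x + b * y via the 2x2 normal equations."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    sxt = sum(x * t for x, t in zip(xs, targets))
    syt = sum(y * t for y, t in zip(ys, targets))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("regressors are collinear")
    # Cramer's rule for the symmetric 2x2 system.
    a = (sxt * syy - syt * sxy) / det
    b = (syt * sxx - sxt * sxy) / det
    return a, b

# Synthetic "measurements" of a harmonic oscillator.
dt = 0.01
x = [math.sin(i * dt) for i in range(2000)]
y = [math.cos(i * dt) for i in range(2000)]

# Central-difference estimates of the derivatives at interior points.
dx = [(x[i + 1] - x[i - 1]) / (2 * dt) for i in range(1, len(x) - 1)]
dy = [(y[i + 1] - y[i - 1]) / (2 * dt) for i in range(1, len(y) - 1)]
x_mid, y_mid = x[1:-1], y[1:-1]

a11, a12 = fit_linear_rhs(x_mid, y_mid, dx)  # dx/dt ~ a11 * x + a12 * y
a21, a22 = fit_linear_rhs(x_mid, y_mid, dy)  # dy/dt ~ a21 * x + a22 * y
print(a11, a12)
print(a21, a22)
```

The fitted coefficients should come out close to (0, 1) and (-1, 0). One simple guard against overfitting is to keep the library of candidate terms small and to check the fitted model on data that were not used for the fit.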
We applied the steps of NLTS to analyze time series for pertussis, scarlet fever and measles in a certain region. It was extremely interesting to see that the initial time series, which seemed to be stochastic, was in fact confined to a very well-defined region of phase space.
I am convinced that investing the time to learn the techniques of NLTS will pay off for anyone studying real-world phenomena. One does encounter time series everywhere! You can read a very nice introduction to this field in Prof. Marco’s text: Nonlinear Time Series Analysis with R. (I happen to have a soft copy of the book, so if you are interested in reading it, you can drop a comment below or email me.)