Main Flavors of Reservoir Computing


The core idea of reservoir computing has been found independently several times, couched in different scientific contexts, leading to a number of "flavors" (adapted from [1]):

Echo State Networks

Echo State Networks (ESNs) [2] represent one of the two pioneering Reservoir Computing (RC) methods. The approach is based on the observation that if a random recurrent neural network (RNN) possesses certain algebraic properties, training only a linear readout from it is often sufficient to achieve excellent performance in practical applications. The untrained RNN part of an ESN is called a dynamical reservoir, and its states are termed echoes of its input history [3].

ESNs standardly use simulated analog-valued neurons of the "weighted sum and nonlinearity" type, most often with a tanh() nonlinearity. Leaky integration of the neurons' states has recently become standard practice in ESNs [4]. Classical recipes for, and conditions on, producing the ESN reservoir were outlined in the original introduction of ESNs [3]. The readout from the reservoir is usually linear. The original and most popular batch method for training the output weights is linear regression. For online training settings, the computationally cheap least mean squares algorithm is recommended [5].
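
As an illustration of these ingredients, the following minimal sketch (Python/NumPy; the network sizes, scaling factors, and the toy task are illustrative choices, not values prescribed by the cited works) combines a leaky-integrator tanh reservoir with a batch linear readout computed by regularized linear regression:

    import numpy as np

    rng = np.random.default_rng(42)

    # Illustrative dimensions and hyperparameters
    n_in, n_res = 1, 200
    leak_rate = 0.3        # leaky-integration rate
    ridge = 1e-8           # regularization strength for the linear readout

    # Random input and reservoir weights; the reservoir weights are rescaled
    # to a spectral radius below 1, a common recipe in the ESN literature.
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))

    def run_reservoir(U):
        """Collect reservoir states for an input sequence U of shape (T, n_in)."""
        x = np.zeros(n_res)
        states = []
        for u in U:
            x_new = np.tanh(W_in @ u + W @ x)            # "weighted sum and nonlinearity"
            x = (1 - leak_rate) * x + leak_rate * x_new  # leaky integration
            states.append(x.copy())
        return np.array(states)

    # Toy task: reproduce the input as it was 5 steps earlier
    U_train = rng.uniform(-1, 1, (1000, n_in))
    Y_train = np.roll(U_train, 5, axis=0)

    # Batch training of the linear readout by regularized linear regression
    X = run_reservoir(U_train)
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y_train).T

    Y_pred = run_reservoir(U_train) @ W_out.T            # the readout is a linear map of the states

In an online setting, the batch regression above would be replaced by an incremental update of W_out after every time step, for instance with the least mean squares rule mentioned above.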

The initial ESN publications [3][6][7][8][5] were framed in the setting of machine learning and nonlinear signal processing applications. The original theoretical contributions of early ESN research concerned the algebraic properties of the reservoir that make this approach work in the first place (the echo state property [3]), as well as analytical results characterizing the dynamical short-term memory capacity [7] of reservoirs.
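
To make the latter notion concrete: the short-term memory capacity of a reservoir can be estimated empirically as the sum, over delays, of the squared correlation between the delayed input and a linear readout trained to reconstruct it. The sketch below (Python/NumPy; all sizes and scalings are arbitrary illustrative choices) computes such an estimate; the analytical results in the cited work bound this quantity by the number of reservoir units for i.i.d. input.

    import numpy as np

    rng = np.random.default_rng(0)
    n_res, T, max_delay, washout = 50, 4000, 40, 100   # illustrative sizes

    # Small ESN driven by an i.i.d. scalar input signal
    W_in = rng.uniform(-0.5, 0.5, n_res)
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= 0.9 / max(abs(np.linalg.eigvals(W)))

    u = rng.uniform(-0.5, 0.5, T)
    x = np.zeros(n_res)
    X = np.zeros((T, n_res))
    for t in range(T):
        x = np.tanh(W_in * u[t] + W @ x)
        X[t] = x

    # Memory capacity: summed squared correlation between u(t - k) and a
    # linear readout trained to reconstruct u(t - k) from the state x(t).
    mc = 0.0
    for k in range(1, max_delay + 1):
        target = u[washout - k : T - k]
        states = X[washout:]
        w = np.linalg.lstsq(states, target, rcond=None)[0]
        mc += np.corrcoef(states @ w, target)[0, 1] ** 2
    print(f"estimated memory capacity: {mc:.2f} (reservoir size: {n_res})")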

Liquid State Machines

Liquid State Machines (LSMs) [9] are the other pioneering reservoir method, developed independently from, and simultaneously with, ESNs. LSMs were developed from a computational neuroscience background, aiming at elucidating the principal computational properties of neural microcircuits [9][10][11][12]. Thus LSMs use more sophisticated and biologically realistic models in the reservoir: spiking integrate-and-fire neurons and dynamic synaptic connections. The connectivity among the neurons often follows topological and metric constraints that are biologically motivated. In the LSM literature the reservoir is often referred to as the liquid, following an intuitive metaphor of the excited states as ripples on the surface of a pool of water. Inputs to LSMs also usually consist of spike trains. In their readouts, LSMs originally used multilayer feedforward neural networks (of either spiking or sigmoid neurons), or linear readouts similar to those of ESNs [9]. Additional mechanisms for averaging spike trains to obtain real-valued outputs are often employed.
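
For contrast with the analog ESN neurons, the sketch below simulates a single textbook leaky integrate-and-fire neuron driven by a random input spike train and low-pass filters its output spikes into a real-valued rate. This is a generic Euler-discretized model with illustrative parameters, not the specific neuron and synapse models used in the LSM literature.

    import numpy as np

    rng = np.random.default_rng(1)

    dt = 1e-3            # time step [s]
    tau_m = 20e-3        # membrane time constant [s]
    v_rest, v_thresh, v_reset = 0.0, 1.0, 0.0
    w_syn = 0.5          # weight of the single input synapse

    T = 1000
    in_spikes = rng.random(T) < 0.05     # random input spike train, roughly 50 Hz
    v = v_rest
    out_spikes = np.zeros(T, dtype=bool)

    for t in range(T):
        # The membrane potential leaks toward rest and is kicked by input spikes.
        v += dt / tau_m * (v_rest - v) + w_syn * in_spikes[t]
        if v >= v_thresh:                # threshold crossing: emit a spike and reset
            out_spikes[t] = True
            v = v_reset

    # A real-valued output can be obtained by averaging (low-pass filtering) the spikes.
    rate = np.convolve(out_spikes.astype(float), np.ones(50) / (50 * dt), mode="same")
    print(f"mean output rate: {out_spikes.mean() / dt:.1f} spikes/s")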

Compared to simple "weighted sum and nonlinearity" RNNs, LSM-type RNNs with spiking neurons and more sophisticated synaptic models are usually more difficult to implement, to set up correctly, and to tune, and are typically more expensive to emulate on digital computers (a possible exception being event-driven spiking NN simulations, where the computational load depends on the amount of activity in the network). They are therefore less widespread in engineering applications of RNNs than ESNs. However, while ESN-type neurons only emulate the mean firing rates of biological neurons, spiking neurons can perform more complicated information processing, because information is also carried in the timing of their signals (i.e., the exact timing of each firing matters). In addition, findings on various mechanisms in natural neural circuits transfer more readily to these more biologically realistic models.

The main theoretical contributions of the LSM brand to RC consist in analytical characterizations of the computational power of such systems [9][13].

Backpropagation-Decorrelation

The idea of separating a reservoir from a readout function has also been arrived at from the point of view of optimizing the performance of classical RNN training algorithms that use error backpropagation. An analysis of the weight dynamics of an RNN trained with the Atiya-Parlos recurrent learning (APRL) algorithm [14] revealed that the output weights of the network change quickly, while the hidden weights change slowly and, in the case of a single output, the changes are column-wise coupled [15]. Thus, in effect, APRL decouples the RNN into a quickly adapting output and a slowly adapting reservoir. Inspired by these findings, a new iterative/online RNN training method, called BackPropagation-DeCorrelation (BPDC), was introduced [16]. It approximates and significantly simplifies the APRL method and applies it only to the output weights, turning it into an online RC method. BPDC uses the same type of neurons as ESNs. BPDC learning is claimed to be insensitive to the parameters of the fixed reservoir. BPDC boasts fast convergence times and is thus capable of tracking quickly changing signals.

Temporal Recurrent Networks

This summary of RC brands would be incomplete without a spotlight directed at Peter F. Dominey's decade-long research program on cortico-striatal circuits in the human brain (e.g., [17][18][19], and many more). Although this research is rooted in empirical cognitive neuroscience and functional neuroanatomy and aims at elucidating complex neural structures rather than theoretical computational principles, it is probably Dominey who first clearly spelled out the RC principle: "... there is no learning in the recurrent connections [within a subnetwork corresponding to a reservoir], only between the State [i.e., reservoir] units and the Output units. Second, adaptation is based on a simple associative learning mechanism ..." [20]. It is also in this article that Dominey brands the neural reservoir module as a Temporal Recurrent Network. The learning algorithm to which Dominey alludes can be seen as a version of the least mean squares algorithm. Elsewhere, Dominey emphasizes the randomness of the connectivity in the reservoir: "... It is worth noting that the simulated recurrent prefrontal network relies on fixed randomized recurrent connections, ..." [21]. Only in early 2008 did Dominey and "computational" RC researchers become aware of each other.
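
For reference, a generic least mean squares (delta-rule) update for a linear readout looks as follows; this is a sketch of the standard LMS rule, not of Dominey's specific implementation.

    import numpy as np

    def lms_readout_update(w_out, x, y_target, learning_rate=1e-3):
        """One least mean squares step for a linear readout y = w_out @ x.

        w_out: (n_out, n_state) readout weights, x: (n_state,) state/reservoir units,
        y_target: (n_out,) desired output at the current time step.
        """
        error = y_target - w_out @ x                  # instantaneous output error
        w_out += learning_rate * np.outer(error, x)   # simple associative update
        return w_out

    # Toy usage: learn to sum the first 10 of 100 state units from streaming data
    rng = np.random.default_rng(3)
    w = np.zeros((1, 100))
    for _ in range(5000):
        x = rng.uniform(-1, 1, 100)
        w = lms_readout_update(w, x, np.array([x[:10].sum()]))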


References

  1. Lukoševičius, M., and H. Jaeger,
    "Reservoir computing approaches to recurrent neural network training",
    Computer Science Review, vol. 3, no. 3, pp. 127-149, August 2009.
  2. Jaeger, H.,
    "Echo state network",
    Scholarpedia, vol. 2, no. 9, p. 2330, 2007.
  3. Jaeger, H.,
    "The ``echo state'' approach to analysing and training recurrent neural networks - with an Erratum note",
    GMD Report 148: German National Research Center for Information Technology, 2001.
  4. Jaeger, H., M. Lukoševičius, D. Popovici, and U. Siewert,
    "Optimization and applications of echo state networks with leaky-integrator neurons",
    Neural Networks, vol. 20, no. 3, pp. 335-352, 2007.
  5. Jaeger, H., and H. Haas,
    "Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless telecommunication",
    Science, vol. 304, no. 5667, pp. 78–80, April 2, 2004.
  6. Jaeger, H.,
    "Short term memory in echo state networks",
    GMD Report 152: German National Research Center for Information Technology, 2001.
  7. Jaeger, H.,
    "Tutorial on training recurrent neural networks, covering {BPTT}, {RTRL}, {EKF} and the ``echo state network'' approach",
    GMD Report 159: German National Research Center for Information Technology, pp. 48 pp., 2002.
  8. Jaeger, H.,
    "Adaptive nonlinear system identification with echo state networks",
    Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, pp. 593-600, 2003.
  9. Maass, W., T. Natschlaeger, and H. Markram,
    "Real-time Computing without stable states: A New Framework for Neural Computation Based on Perturbations",
    Neural Computation, vol. 14, no. 11, pp. 2531–2560, 2002.
  10. Maass, W., T. Natschlaeger, and H. Markram,
    "A model for real-time computation in generic neural microcircuits",
    Advances in Neural Information Processing Systems (NIPS 2002), vol. 15, MIT Press, pp. 213-220, 2003.
  11. Natschläger, T., H. Markram, and W. Maass,
    "Computer models and analysis tools for neural microcircuits",
    A Practical Guide to Neuroscience Databases and Associated Tools, Kluwer Academic Publishers, Boston, 2002.
  12. Maass, W., T. Natschläger, and H. Markram,
    "Computational models for generic cortical microcircuits",
    Computational Neuroscience: A Comprehensive Approach: CRC-Press, 2004.
  13. Maass, W., P. Joshi, and E. D. Sontag,
    "Principles of real-time computing with feedback applied to cortical microcircuit models",
    Advances in Neural Information Processing Systems (NIPS 2005), vol. 18, MIT Press, Cambridge, MA, pp. 835-842, 2006.
  14. Atiya, A. F., and A. G. Parlos,
    "New Results on Recurrent Network Training: unifying the Algorithms and Accelerating Convergence",
    IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 697–709, 2000.
  15. Schiller, U. D., and J. J. Steil,
    "Analyzing the weight dynamics of recurrent learning algorithms",
    Neurocomputing, vol. 63C, pp. 5-23, 2005.
  16. Steil, J. J.,
    "Backpropagation-Decorrelation: online recurrent learning with O(N) complexity,",
    IJCNN, 2004.
  17. Dominey, P. F.,
    "Complex sensory-motor sequence learning based on recurrent state representation and reinforcement learning",
    Biological Cybernetics, vol. 73, no. 3, pp. 265-274, 1995.
  18. Dominey, P. F., M. Hoen, J. - M. Blanc, and T. Lelekov-Boissard,
    "Neurological basis of language and sequential cognition: evidence from simulation, aphasia, and ERP studies",
    Brain and Language, vol. 86, no. 2, pp. 207-225, 2003.
  19. Dominey, P. F., M. Hoen, and T. Inui,
    "A Neurolinguistic Model of Grammatical Construction Processing",
    Journal of Cognitive Neuroscience, vol. 18, no. 12, pp. 2088-2107, 2006.
  20. Dominey, P. F., and F. Ramus,
    "Neural network processing of natural language: I. Sensitivity to serial, temporal and abstract structure of language in the infant",
    Language and Cognitive Processes, vol. 15, no. 1, pp. 87-127, 2000.
  21. Dominey, P. F., 2005 (full bibliographic details not available).