.. _learning_rules:

==============
Learning rules
==============

Reservoir Computing techniques allow the use of a great variety of learning
mechanisms to solve specific tasks. In ReservoirPy, these learning rules are
sorted into two categories: **offline** learning rules and **online** learning
rules. Nodes can be equipped with such learning rules, and learning can be
triggered by using their :py:meth:`~.Node.fit` (offline learning) and
:py:meth:`~.Node.train` (online learning) methods.

Offline learning rules - Linear regression
-------------------------------------------

Offline learning rules are the most common learning rules in machine learning.
They include gradient descent and linear regression, amongst others.

Within the Reservoir Computing field, linear regression is probably the
simplest and most widely used way of training an artificial neural network.
Linear regression is said to be an *offline* learning rule because the
parameters of the linear regression model are learned given all available
samples of data and all available samples of target values. Once the model is
learned, it cannot be updated without training the model on the whole dataset
another time. Training and data gathering happen in two separate phases.

Linear regression is implemented in ReservoirPy through the :py:class:`~.Ridge`
node. The Ridge node is equipped with a regularized linear regression learning
rule, of the form :eq:`ridge`:

.. math::
    :label: ridge

    W_{out} = YX^\intercal (XX^\intercal + \lambda Id)^{-1}

Where :math:`X` is a series of inputs, and :math:`Y` is a series of target
values that the network must learn to predict. :math:`\lambda` is a
regularization parameter used to avoid overfitting.

In most cases, as the :py:class:`~.Ridge` node will be used within an Echo
State Network (ESN), :math:`X` will represent the series of activations of a
:py:class:`~.Reservoir` node over a timeseries. The algorithm will therefore
compute a matrix of neuronal weights :math:`W_{out}` (and a bias term) such
that predictions can be computed using equation :eq:`ridgeforward`.
:math:`W_{out}` (and the bias) is stored in the node's :py:attr:`Node.params`
attribute.

.. math::
    :label: ridgeforward

    y[t] = W_{out}^\intercal x[t] + bias

which is the forward function of the Ridge node. :math:`y[t]` represents the
state of the Ridge neurons at step :math:`t`, and also the predicted value
given the input :math:`x[t]`.

Offline learning with :py:meth:`~.Node.fit`
--------------------------------------------

Offline learning can be performed using the :py:meth:`~.Node.fit` method. In
the following example, we will use the :py:class:`~.Ridge` node. We start by
creating some input data ``X`` and some target data ``Y`` that the model has
to predict.

.. ipython:: python

    import numpy as np

    X = np.arange(100)[:, np.newaxis]
    Y = np.arange(100)[:, np.newaxis]

Then, we create a :py:class:`~.Ridge` node. Notice that it is not necessary to
indicate the number of neurons in that node. ReservoirPy will infer it from
the shape of the target data.

.. ipython:: python

    from reservoirpy.nodes import Ridge

    ridge = Ridge().fit(X, Y)

We can access the learned parameters by looking at the ``Wout`` and ``bias``
parameters of the node.

.. ipython:: python

    print(ridge.Wout, ridge.bias)

As ``X`` and ``Y`` were the same timeseries, we can see that learning was
successful: the node has learned the identity function, with a weight of 1
and a bias of 0.
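For reference, the closed-form solution of equation :eq:`ridge` can be written
in a few lines of plain NumPy. The sketch below is only an illustration: it
uses the equivalent transposed data layout (timesteps as rows, as in the
arrays above), ignores the bias term for simplicity, and the names ``lam`` and
``Wout_manual`` are introduced here purely for the example.

.. code-block:: python

    import numpy as np

    # Same toy data as above: learn the identity mapping.
    X = np.arange(100)[:, np.newaxis].astype(float)
    Y = np.arange(100)[:, np.newaxis].astype(float)

    lam = 1e-6  # regularization parameter (lambda in equation (ridge))

    # Regularized linear regression solved in the transposed layout,
    # equivalent to W_out = Y X^T (X X^T + lambda Id)^{-1} of equation (ridge).
    Wout_manual = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

    print(Wout_manual)  # close to 1, like the weight learned by the Ridge node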
Ridge regression can obviously handle much more complex tasks, such as chaotic
attractor modeling or timeseries forecasting, when coupled with a reservoir
inside an ESN.

Offline learning with :py:meth:`~.Model.fit`
---------------------------------------------

Models also have a :py:meth:`~.Model.fit` method, working similarly to that of
the Node class presented above. The :py:meth:`~.Model.fit` method can only be
used if all nodes in the model are offline nodes, or are not trainable. If all
nodes are offline, then the :py:meth:`~.Node.fit` method of every offline node
in the model will be called as soon as all of its input data is available. If
the input data of an offline node B comes from another offline node A, then
the model will fit A on all available data, then run it, and finally fit B.

As an example, we will train the readout layer of an ESN using linear
regression. We first create some toy dataset: the task we need the ESN to
perform is to predict the cosine form of a wave given its sine form.

.. ipython:: python

    X = np.sin(np.linspace(0, 20, 100))[:, np.newaxis]
    Y = np.cos(np.linspace(0, 20, 100))[:, np.newaxis]

Then, we create an ESN model by linking a :py:class:`~.Reservoir` node with a
:py:class:`~.Ridge` node. The :py:class:`~.Ridge` node will be used as readout
and trained to learn a mapping between reservoir states and targeted outputs.
We will regularize its activity using a ridge parameter of :math:`10^{-3}`. We
will also tune some of the reservoir hyperparameters to obtain better results.
We can then train the model using :py:meth:`~.Model.fit`.

.. ipython:: python

    from reservoirpy.nodes import Reservoir, Ridge

    reservoir, readout = Reservoir(100, lr=0.2, sr=1.0), Ridge(ridge=1e-3)

    esn = reservoir >> readout

    esn.fit(X, Y)

During that step, the reservoir has been run on the whole timeseries, and the
resulting internal states have been used to perform a linear regression
between these states and the target values, learning the connection weights
between the reservoir and the readout.

We can then run the model to evaluate its predictions:

.. ipython:: python

    X_test = np.sin(np.linspace(20, 40, 100))[:, np.newaxis]

    predictions = esn.run(X_test)

.. plot::

    from reservoirpy.nodes import Reservoir, Ridge

    reservoir, readout = Reservoir(100, lr=0.2, sr=1.0), Ridge(ridge=1e-3)
    esn = reservoir >> readout

    X = np.sin(np.linspace(0, 20, 100))[:, np.newaxis]
    Y = np.cos(np.linspace(0, 20, 100))[:, np.newaxis]

    esn.fit(X, Y)

    X_test = np.sin(np.linspace(20, 40, 100))[:, np.newaxis]
    Y_test = np.cos(np.linspace(20, 40, 100))[:, np.newaxis]

    S = esn.run(X_test)

    plt.plot(Y_test, label="Ground truth cosine")
    plt.plot(S, label="Predicted cosine")
    plt.ylabel("ESN output")
    plt.xlabel("Timestep $t$")
    plt.legend()
    plt.show()

Online learning rules
---------------------

As opposed to offline learning, online learning makes it possible to learn a
task using only **local information in time**. Examples of online learning
rules are Hebbian learning rules, the Least Mean Squares (LMS) algorithm and
the Recursive Least Squares (RLS) algorithm.

These rules can update the parameters of a model one sample of data at a time,
or one episode at a time, to borrow vocabulary used in the Reinforcement
Learning field.

While most deep learning algorithms cannot use such rules to update their
parameters, as gradient descent requires several samples of data at a time to
reach convergence, Reservoir Computing algorithms can use this kind of rule.
Indeed, only the readout connections need to be trained: a single layer of
neurons can be trained using only local information (no need for gradients
coming from upper layers of the model and averaged over several runs).
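To give an intuition of what such a rule looks like, here is a minimal sketch
of an LMS-like update written in plain NumPy, under the assumption of a fixed
linear target mapping. It only illustrates the principle of updating a readout
one timestep at a time; it is not ReservoirPy's implementation, and all names
(``Wout``, ``learning_rate``, ...) are introduced here for the example.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=42)

    states = rng.normal(size=(200, 5))          # 200 timesteps of a 5-unit "state"
    targets = states @ rng.normal(size=(5, 1))  # targets follow a fixed linear mapping

    Wout = np.zeros((5, 1))   # readout weights, updated online
    learning_rate = 0.05

    for x, y in zip(states, targets):
        x = x[:, None]                       # column vector at timestep t
        error = y - Wout.T @ x               # local error at timestep t
        Wout += learning_rate * x @ error.T  # update uses only the current sample

    # After the loop, Wout approaches the mapping used to generate the targets.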
Online learning with :py:meth:`~.Node.train`
---------------------------------------------

Online learning can be performed using the :py:meth:`~.Node.train` method. In
the following example, we will use the :py:class:`~.FORCE` node, a single
layer of neurons equipped with an online learning rule called the FORCE
algorithm. We start by creating some input data ``X`` and some target data
``Y`` that the model has to predict.

.. ipython:: python

    X = np.arange(100)[:, np.newaxis]
    Y = np.arange(100)[:, np.newaxis]

Then, we create a :py:class:`~.FORCE` node. Notice that it is not necessary to
indicate the number of neurons in that node. ReservoirPy will infer it from
the shape of the target data.

.. ipython:: python

    from reservoirpy.nodes import FORCE

    force = FORCE()

The :py:meth:`~.Node.train` method can be used like the call method of a Node.
Every time the method is called, it updates the parameters of the node along
with its internal state, and returns the state.

.. ipython:: python

    s_t1 = force.train(X[0], Y[0])

    print("Parameters after first update:", force.Wout, force.bias)

    s_t1 = force.train(X[1], Y[1])

    print("Parameters after second update:", force.Wout, force.bias)

The :py:meth:`~.Node.train` method can also be called on a timeseries of
variables and targets, in a similar way to what can be done with the
:py:meth:`~.Node.run` method. All states computed during the training will be
returned by the node.

.. ipython:: python

    force = FORCE()

    S = force.train(X, Y)

As the parameters are updated incrementally, we can see the convergence of the
model throughout training, as opposed to offline learning where parameters can
only be updated once and evaluated at the end of the training phase. We can
see that convergence is very fast: only the first timesteps of output display
visible errors.

.. plot::

    from reservoirpy.nodes import FORCE

    X = np.arange(100)[:, np.newaxis]
    Y = np.arange(100)[:, np.newaxis]

    force = FORCE()

    S = force.train(X, Y)

    plt.plot(S, label="Predicted")
    plt.plot(Y, label="Training targets")
    plt.title("Activation of FORCE readout during training")
    plt.xlabel("Timestep $t$")
    plt.legend()
    plt.show()

We can access the learned parameters by looking at the ``Wout`` and ``bias``
parameters of the node.

.. ipython:: python

    print(force.Wout, force.bias)

As ``X`` and ``Y`` were the same timeseries, we can see that learning was
successful: the node has learned the identity function, with a weight of 1
and a bias close to 0.
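Because the parameters are updated sample by sample, training can simply be
resumed later by calling :py:meth:`~.Node.train` again on new data. The
snippet below is a small illustrative sketch of this, reusing the identity
task above; the names ``X_more`` and ``Y_more`` are introduced here for the
example.

.. code-block:: python

    import numpy as np
    from reservoirpy.nodes import FORCE

    force = FORCE()

    # First training phase, as above.
    X = np.arange(100)[:, np.newaxis]
    Y = np.arange(100)[:, np.newaxis]
    force.train(X, Y)

    # New data becomes available later: training resumes from the current Wout.
    X_more = np.arange(100, 200)[:, np.newaxis]
    Y_more = np.arange(100, 200)[:, np.newaxis]
    force.train(X_more, Y_more)

    # The node can then be run without further adaptation of its parameters.
    predictions = force.run(X_more)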
Online learning with :py:meth:`~.Model.train`
----------------------------------------------

Models also have a :py:meth:`~.Model.train` method, working similarly to that
of the Node class presented above. The :py:meth:`~.Model.train` method can
only be used if all nodes in the model are online nodes, or are not trainable.
If all nodes are online, then the :py:meth:`~.Node.train` methods of all
online nodes in the model will be called in the topological order of the graph
defined by the model. At each timestep, online nodes are trained, called, and
their updated states are given to the next nodes in the graph.

As an example, we will train the readout layer of an ESN using FORCE learning.
We first create some toy dataset: the task we need the ESN to perform is to
predict the cosine form of a wave given its sine form.

.. ipython:: python

    X = np.sin(np.linspace(0, 20, 100))[:, np.newaxis]
    Y = np.cos(np.linspace(0, 20, 100))[:, np.newaxis]

Then, we create an ESN model by linking a :py:class:`~.Reservoir` node with a
:py:class:`~.FORCE` node. The :py:class:`~.FORCE` node will be used as readout
and trained to learn a mapping between reservoir states and targeted outputs.
We will tune some of the reservoir hyperparameters to obtain better results.
We can then train the model using :py:meth:`~.Model.train`.

.. ipython:: python

    from reservoirpy.nodes import Reservoir, FORCE

    reservoir, readout = Reservoir(100, lr=0.2, sr=1.0), FORCE()

    esn = reservoir >> readout

    predictions = esn.train(X, Y)

During that step, the model has been trained on the whole timeseries using
online learning: the reservoir was run one timestep at a time, and the readout
parameters were updated at each step. We can have a look at the outputs
produced by the model during training to evaluate convergence:

.. plot::

    X = np.sin(np.linspace(0, 20, 100))[:, np.newaxis]
    Y = np.cos(np.linspace(0, 20, 100))[:, np.newaxis]

    from reservoirpy.nodes import Reservoir, FORCE

    reservoir, readout = Reservoir(100, lr=0.2, sr=1.0), FORCE()

    esn = reservoir >> readout

    S = esn.train(X, Y)

    plt.plot(S, label="Predicted")
    plt.plot(Y, label="Training targets")
    plt.title("Activation of FORCE readout during training")
    plt.xlabel("Timestep $t$")
    plt.legend()
    plt.show()

We can then run the model to evaluate its predictions:

.. ipython:: python

    X_test = np.sin(np.linspace(20, 40, 100))[:, np.newaxis]

    predictions = esn.run(X_test)

.. plot::

    from reservoirpy.nodes import Reservoir, FORCE

    reservoir, readout = Reservoir(100, lr=0.2, sr=1.0), FORCE()
    esn = reservoir >> readout

    X = np.sin(np.linspace(0, 20, 100))[:, np.newaxis]
    Y = np.cos(np.linspace(0, 20, 100))[:, np.newaxis]

    esn.train(X, Y)

    X_test = np.sin(np.linspace(20, 40, 100))[:, np.newaxis]
    Y_test = np.cos(np.linspace(20, 40, 100))[:, np.newaxis]

    S = esn.run(X_test)

    plt.plot(Y_test, label="Ground truth cosine")
    plt.plot(S, label="Predicted cosine")
    plt.ylabel("ESN output")
    plt.xlabel("Timestep $t$")
    plt.legend()
    plt.show()
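Finally, the quality of the predictions can be quantified numerically. The
sketch below rebuilds the same model and computes a root mean squared error
with plain NumPy; it is illustrative only, and the names ``rmse`` and
``Y_test`` are introduced here for the example (error metrics from
``reservoirpy.observables`` could also be used).

.. code-block:: python

    import numpy as np
    from reservoirpy.nodes import Reservoir, FORCE

    # Rebuild and train the same model as above.
    reservoir, readout = Reservoir(100, lr=0.2, sr=1.0), FORCE()
    esn = reservoir >> readout

    X = np.sin(np.linspace(0, 20, 100))[:, np.newaxis]
    Y = np.cos(np.linspace(0, 20, 100))[:, np.newaxis]
    esn.train(X, Y)

    # Evaluate on the unseen test segment.
    X_test = np.sin(np.linspace(20, 40, 100))[:, np.newaxis]
    Y_test = np.cos(np.linspace(20, 40, 100))[:, np.newaxis]
    predictions = esn.run(X_test)

    rmse = np.sqrt(np.mean((Y_test - predictions) ** 2))
    print(f"RMSE on the test cosine: {rmse:.4f}")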