The Resilient Propagation (RProp) algorithm

The RProp algorithm is a supervised learning method for training multilayer neural networks, first published in 1994 by Martin Riedmiller. The idea behind it is that the magnitudes of the partial derivatives can have harmful effects on the weight updates. RProp therefore uses an internal adaptive scheme that considers only the signs of the derivatives and completely ignores their magnitudes. The size of each weight update is determined by a per-weight update value, which is independent of the magnitude of the gradient.

\Delta w_{i,j}^{(t)}=\begin{cases} -\Delta_{i,j}^{(t)} & ,\text{if} \ \frac{\partial E^{(t)}}{\partial w_{i,j}} > 0 \\ +\Delta_{i,j}^{(t)} & ,\text{if} \ \frac{\partial E^{(t)}}{\partial w_{i,j}} < 0 \\ 0 & , \text{otherwise}\end{cases}
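This sign-only rule is easy to state in code. The sketch below is a minimal illustration (the function name weight_step is invented here, not taken from the paper): it shows that a tiny and a huge gradient component produce steps of exactly the same magnitude.

```python
import numpy as np

def weight_step(grad, delta):
    """Sign-based step: move against the gradient's sign, with the
    magnitude taken entirely from delta, never from the gradient."""
    return -np.sign(grad) * delta

# A tiny and a huge derivative yield the same step size; a zero
# derivative yields no step at all.
steps = weight_step(np.array([0.001, -2000.0, 0.0]), np.full(3, 0.1))
```

Whether the derivative is 0.001 or −2000, the weight moves by the same Δ_{i,j}; only the direction of the step differs.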

Here ∂E^(t)/∂w_{i,j} is the gradient summed over the whole pattern set, obtained from batch backpropagation. The second step of the RProp algorithm is to determine the update values Δ_{i,j}^(t).

\Delta_{i,j}^{(t)}= \begin{cases} \eta^{+}\cdot\Delta_{i,j}^{(t-1)} & ,\text{if} \ \frac{\partial E^{(t-1)}}{\partial w_{i,j}} \cdot \frac{\partial E^{(t)}}{\partial w_{i,j}} > 0\\ \eta^{-}\cdot\Delta_{i,j}^{(t-1)} & ,\text{if} \ \frac{\partial E^{(t-1)}}{\partial w_{i,j}} \cdot \frac{\partial E^{(t)}}{\partial w_{i,j}} < 0\\ \Delta_{i,j}^{(t-1)} & , \text{otherwise}\end{cases}

When the derivative with respect to w_{i,j} changes its sign, it means that the last update was too big and the algorithm has jumped over a local minimum. In this case the update value Δ_{i,j}^(t) is decreased by the factor η⁻. If the derivatives of consecutive steps have the same sign, Δ_{i,j}^(t) is increased to speed up convergence. The values of η⁻ and η⁺ are constant; several tests showed that the choice of η⁻ = 0.5 and η⁺ = 1.2 gives very good results for almost all problems [Riedmiller, M. 1994].
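The adaptation of the update values can be sketched as follows. This is an illustrative fragment using the constants from the text; the function name adapt_delta and the bounds Δ_max = 50 and Δ_min = 10⁻⁶ are commonly quoted defaults assumed here, not taken from this text.

```python
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5     # η⁺ and η⁻ from the text
DELTA_MAX, DELTA_MIN = 50.0, 1e-6  # assumed bounds on the update value

def adapt_delta(grad_prev, grad, delta):
    """Grow delta where the gradient kept its sign, shrink it where
    the sign flipped, and leave it unchanged otherwise."""
    prod = grad_prev * grad
    delta = np.where(prod > 0, np.minimum(delta * ETA_PLUS, DELTA_MAX), delta)
    delta = np.where(prod < 0, np.maximum(delta * ETA_MINUS, DELTA_MIN), delta)
    return delta

# Three weights: same sign, flipped sign, and a zero previous gradient.
new_delta = adapt_delta(np.array([1.0, -1.0, 0.0]),
                        np.array([2.0, 3.0, 5.0]),
                        np.full(3, 0.1))
```

The first component is accelerated by η⁺, the second is slowed down by η⁻, and the third keeps its old update value.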

The algorithm below shows the adaptation method of RProp. Its first part is a plain batch backpropagation pass, which has already been discussed.

\forall i,j : \Delta_{i,j}^{(t)}=0.1
\forall i,j : \frac{\partial E^{(t-1)}}{\partial w_{i,j}}=0
\text{Repeat}
\text{\{}
\text{Compute gradient} \frac{\partial E^{(t)}}{\partial w_{i,j}} \text{(backpropagation)}
\text{For (all weights and biases)}
\text{\{}
\text{if} \left(\frac{\partial E^{(t-1)}}{\partial w_{i,j}}\cdot\frac{\partial E^{(t)}}{\partial w_{i,j}} > 0\right) \text{then}
\text{\{}
\Delta_{i,j}^{(t)}=\min(\Delta_{i,j}^{(t-1)}\cdot\eta^{+}, \Delta_{max})
\Delta w_{i,j}^{(t)}=-\text{sign}\left(\frac{\partial E^{(t)}}{\partial w_{i,j}}\right)\cdot\Delta_{i,j}^{(t)}
w_{i,j}^{(t+1)}=w_{i,j}^{(t)}+\Delta w_{i,j}^{(t)}
\frac{\partial E^{(t-1)}}{\partial w_{i,j}}=\frac{\partial E^{(t)}}{\partial w_{i,j}}
\text{\}}
\text{else if}\left(\frac{\partial E^{(t-1)}}{\partial w_{i,j}}\cdot\frac{\partial E^{(t)}}{\partial w_{i,j}}<0\right) \text{then}
\text{\{}
\Delta_{i,j}^{(t)}=\max(\Delta_{i,j}^{(t-1)}\cdot\eta^{-}, \Delta_{min})
\frac{\partial E^{(t-1)}}{\partial w_{i,j}}=0
\text{\}}
\text{else if}\left(\frac{\partial E^{(t-1)}}{\partial w_{i,j}}\cdot\frac{\partial E^{(t)}}{\partial w_{i,j}}=0\right)\text{then}
\text{\{}
\Delta w_{i,j}^{(t)}=-\text{sign}\left(\frac{\partial E^{(t)}}{\partial w_{i,j}}\right)\cdot\Delta_{i,j}^{(t)}
w_{i,j}^{(t+1)}=w_{i,j}^{(t)}+\Delta w_{i,j}^{(t)}
\frac{\partial E^{(t-1)}}{\partial w_{i,j}}=\frac{\partial E^{(t)}}{\partial w_{i,j}}
\text{\}}
\text{\}}
\text{\}}
\text{Until (converged)}
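Putting the pieces together, the loop above can be sketched in Python. This is a minimal reconstruction under stated assumptions, not the original implementation: grad_fn stands in for the batch-backpropagation gradient, and the toy error below is a simple quadratic rather than a network's error function.

```python
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5     # growth/shrink factors from the text
DELTA_MAX, DELTA_MIN = 50.0, 1e-6  # commonly used bounds (assumed here)

def rprop(grad_fn, w, delta0=0.1, epochs=100):
    """RProp adaptation loop following the pseudocode above.

    grad_fn(w) plays the role of batch backpropagation: it must return
    dE/dw summed over the whole pattern set."""
    w = w.astype(float).copy()
    delta = np.full_like(w, delta0)
    grad_prev = np.zeros_like(w)
    for _ in range(epochs):
        grad = grad_fn(w)
        prod = grad_prev * grad
        # Same sign as last time: accelerate, bounded by DELTA_MAX.
        inc = prod > 0
        delta[inc] = np.minimum(delta[inc] * ETA_PLUS, DELTA_MAX)
        # Sign flipped: slow down and suppress this step (grad := 0).
        dec = prod < 0
        delta[dec] = np.maximum(delta[dec] * ETA_MINUS, DELTA_MIN)
        grad = np.where(dec, 0.0, grad)
        # Step against the sign of the (possibly zeroed) gradient.
        w -= np.sign(grad) * delta
        grad_prev = grad
    return w

# Toy problem: E(w) = ||w - t||^2, so dE/dw = 2(w - t).
target = np.array([3.0, -1.5])
w_final = rprop(lambda w: 2.0 * (w - target), np.zeros(2))
```

Because the update values grow geometrically while the sign stays constant and are halved on each sign flip, the minimizer is located in a bisection-like manner even on this toy quadratic.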
