The matrix form of the RProp algorithm

Since the RProp algorithm relies on if/else conditional statements to determine the update values, some special helper matrix functions and helper matrices must be introduced. These functions allow the conditional statements to be expressed more elegantly, using matrix operations only.
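
As a quick illustration of the idea (a hypothetical NumPy snippet; the actual helper functions are defined below), an element-wise if/elif/else over a matrix can be replaced by binary masks that are folded back into ordinary arithmetic:

    import numpy as np

    G = np.array([-2.0, 0.0, 3.0])

    # The three branches of "if g < 0 / elif g == 0 / else" become masks...
    neg, zero, pos = G < 0, G == 0, G > 0

    # ...and the per-branch results are combined element-wise:
    result = neg * (-0.5) + zero * 0.0 + pos * 1.2   # -> [-0.5, 0.0, 1.2]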

Some matrices are needed to make the helper functions below work:

  • D: decision matrix,
  • M: meta-gradient matrix,
  • U: update matrix,
  • U−, U0, U+: separated update matrices,
  • M−, M0, M+: separated meta-gradient matrices.

The required functions (a sketch of possible implementations follows the list):

  • [A−, A0, A+] = SeparateBySign(A): Produces three binary matrices from the matrix A. The first contains ones where A is negative, the second contains ones where A is zero, and the third contains ones where A is positive; all other entries are zeros.
  • A = MinimumWhereNotZero(B, λ): Takes the element-wise minimum of a matrix and a scalar, considering only the non-zero entries. Zero entries of B are left untouched; everywhere else the result contains the smaller of the two values.
  • A = MaximumWhereNotZero(B, λ): Takes the element-wise maximum of a matrix and a scalar, considering only the non-zero entries. Zero entries of B are left untouched; everywhere else the result contains the greater of the two values.
  • A = ZeroWhereNegative(B): The resulting matrix A contains zeros where B is negative and the values of B everywhere else.
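
A minimal NumPy sketch of these four helpers (the snake_case names are mine; the post itself gives no implementation):

    import numpy as np

    def separate_by_sign(A):
        """[A-, A0, A+]: binary masks of the negative, zero and positive entries of A."""
        return ((A < 0).astype(float), (A == 0).astype(float), (A > 0).astype(float))

    def minimum_where_not_zero(B, lam):
        """Element-wise min(B, lam), except that zero entries of B stay zero."""
        return np.where(B == 0, 0.0, np.minimum(B, lam))

    def maximum_where_not_zero(B, lam):
        """Element-wise max(B, lam), except that zero entries of B stay zero."""
        return np.where(B == 0, 0.0, np.maximum(B, lam))

    def zero_where_negative(B):
        """Copy of B with its negative entries replaced by zeros."""
        return np.where(B < 0, 0.0, B)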

Some constants also have to be defined for the algorithm. Fortunately, the algorithm is not too sensitive to the values of these constants, so they can be fixed globally:

  • λ+ = 1.2: increment of updates,
  • λ− = 0.5: decrement of updates,
  • Δmax = 50: maximal update,
  • Δmin = 0.000001: minimal update.
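
In code form (hypothetical names, reused by the sketch after the algorithm):

    # Constants of the matrix-form RProp algorithm.
    LAMBDA_PLUS = 1.2     # λ+, increment of updates
    LAMBDA_MINUS = 0.5    # λ−, decrement of updates
    DELTA_MAX = 50.0      # Δmax, maximal update
    DELTA_MIN = 1e-6      # Δmin, minimal update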

Based on the above definitions, the RProp algorithm is as follows (i denotes the current iteration):

[Figure: the RProp algorithm in matrix form, expressed with the matrices, functions, and constants defined above.]
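
Since only the figure carries the exact update rules, here is a reconstruction of one iteration as a NumPy sketch, assuming the common RProp variant without weight backtracking and the helpers and constants from the sketches above; the precise roles of D, M, and U in the figure may differ from this reading:

    import numpy as np

    def rprop_step(W, G, G_prev, U):
        """One matrix-form RProp iteration (variant without weight backtracking).

        W      -- weight matrix at iteration i
        G      -- gradient matrix dE/dW at iteration i
        G_prev -- gradient matrix from iteration i - 1
        U      -- update (step-size) matrix from iteration i - 1
        """
        # Decision matrix: the element-wise product of consecutive gradients is
        # positive where the gradient kept its sign, negative where it flipped,
        # and zero where either gradient is zero.
        D = G_prev * G
        D_neg, D_zero, D_pos = separate_by_sign(D)

        # Separated update matrices: each entry of U is routed to one case.
        U_neg, U_zero, U_pos = D_neg * U, D_zero * U, D_pos * U

        # Grow the updates where the sign was kept, shrink them where it
        # flipped, keep them where the decision was zero, and clamp the
        # result into [DELTA_MIN, DELTA_MAX].
        U_new = (minimum_where_not_zero(LAMBDA_PLUS * U_pos, DELTA_MAX)
                 + maximum_where_not_zero(LAMBDA_MINUS * U_neg, DELTA_MIN)
                 + U_zero)

        # Where the sign flipped, zero the stored gradient so that no step is
        # taken now and the next iteration falls into the "decision is zero" case.
        G_kept = G * (1.0 - D_neg)

        # Step against the sign of the (masked) gradient with step sizes U_new.
        W_new = W - np.sign(G_kept) * U_new
        return W_new, G_kept, U_new

A training loop would initialize every entry of U to a small constant (0.1 is a common choice), set G_prev to all zeros, and then call rprop_step once per epoch with the freshly computed gradient.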

The main advantage of the RProp algorithm is its fast convergence. Because the weight updates depend only on the signs of the gradients, they are not disturbed by the gradients' unpredictable magnitudes. Another advantage is the simplicity of the algorithm: no constants have to be fine-tuned for different training patterns.
