Exploring Gated Recurrent Unit (GRU) Activation Functions

Introduction to Gated Recurrent Unit (GRU)

The Gated Recurrent Unit (GRU) is a type of Recurrent Neural Network (RNN) architecture that is widely used in natural language processing tasks and time-series prediction problems. The GRU was introduced by Cho et al. in 2014 as an alternative to the popular Long Short-Term Memory (LSTM) architecture. Its primary advantage is a simpler structure than the LSTM, with two gates instead of three and no separate cell state, which typically leads to faster training and lower computational cost.

In this blog post, we will explore the use of different activation functions in GRU and their impact on performance. We will first provide an overview of the mathematical formulation of GRU and then discuss how to implement GRU with custom activation functions using TensorFlow and Keras in Python.

GRU Formulation

The GRU is represented by three main equations, which describe the update gate (z), reset gate (r), and new hidden state (h). The equations are as follows:

z = σ(Wz * x + Uz * h_prev + bz)

r = σ(Wr * x + Ur * h_prev + br)

h = (1 - z) * h_prev + z * tanh(Wh * x + Uh * (r * h_prev) + bh)

Here W and U are the input and recurrent weight matrices for each gate, b are the corresponding bias vectors, σ is the sigmoid activation function, and tanh is the hyperbolic tangent applied to the candidate hidden state. The products Wx and Uh are matrix-vector products, while the products involving the gates, such as r * h_prev and z * h_prev, are element-wise.
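To make these equations concrete, here is a minimal NumPy sketch of a single GRU step. The function name gru_step and the toy dimensions are illustrative choices for this post, not part of any library API:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state (tanh can be swapped here)
    return (1.0 - z) * h_prev + z * h_cand             # new hidden state

# Toy dimensions: 5 input features, 10 hidden units
rng = np.random.default_rng(0)
input_dim, units = 5, 10
x, h_prev = rng.normal(size=input_dim), np.zeros(units)
Wz, Wr, Wh = (rng.normal(size=(units, input_dim)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(units, units)) for _ in range(3))
bz = br = bh = np.zeros(units)
print(gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh).shape)  # (10,)

Library implementations may differ in small details, such as the ordering of the update-gate interpolation or where the reset gate is applied, but the overall structure is the same.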

Now let's dive into implementing GRU with different activation functions.

Implementing GRU with Custom Activation Functions in TensorFlow and Keras

We will now demonstrate how to implement a GRU with a custom activation function in TensorFlow and Keras. For this example, we replace the tanh in the candidate hidden state computation with relu. The following Python snippet defines a custom GRU cell and wraps it in a Keras RNN layer:

import tensorflow as tf

class CustomGRUCell(tf.keras.layers.Layer):
    """GRU cell that applies relu instead of tanh to the candidate hidden state."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units

    def build(self, input_shape):
        input_dim = input_shape[-1]
        # One input kernel, one recurrent kernel, and one bias per gate (z, r, h), concatenated
        self.kernel = self.add_weight(shape=(input_dim, 3 * self.units),
                                      initializer="glorot_uniform", name="kernel")
        self.recurrent_kernel = self.add_weight(shape=(self.units, 3 * self.units),
                                                initializer="orthogonal", name="recurrent_kernel")
        self.bias = self.add_weight(shape=(3 * self.units,),
                                    initializer="zeros", name="bias")

    def call(self, inputs, states):
        h_prev = states[0]  # previous hidden state
        # Input contributions for the three gates
        x_z, x_r, x_h = tf.split(tf.matmul(inputs, self.kernel) + self.bias, 3, axis=-1)
        # Recurrent weight matrices for the three gates
        u_z, u_r, u_h = tf.split(self.recurrent_kernel, 3, axis=-1)
        z = tf.sigmoid(x_z + tf.matmul(h_prev, u_z))            # update gate
        r = tf.sigmoid(x_r + tf.matmul(h_prev, u_r))            # reset gate
        h_cand = tf.nn.relu(x_h + tf.matmul(r * h_prev, u_h))   # relu replaces tanh here
        h = (1.0 - z) * h_prev + z * h_cand                     # new hidden state
        return h, [h]

# Usage example: wrap the cell in a Keras RNN layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 5)),  # variable-length sequences of 5 features
    tf.keras.layers.RNN(CustomGRUCell(10), return_sequences=True),
])

In the code above, we define a custom cell called CustomGRUCell that implements the three GRU equations directly. The input kernel, recurrent kernel, and biases for the three gates are created in build, and call computes the update gate, reset gate, and candidate hidden state, applying relu where the standard formulation uses tanh. Wrapping the cell in tf.keras.layers.RNN turns it into a recurrent layer that can be used like the built-in GRU.
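For the common case of simply swapping tanh for another standard activation, it is worth noting that the built-in Keras GRU layer already exposes an activation argument (and recurrent_activation for the gates), so the same effect can be achieved without a custom cell:

import numpy as np
import tensorflow as tf

# Built-in GRU with relu applied to the candidate hidden state instead of tanh
builtin_model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 5)),
    tf.keras.layers.GRU(10, activation="relu", return_sequences=True),
])

# Smoke test on random data: 4 sequences, 7 timesteps, 5 features each
x = np.random.rand(4, 7, 5).astype("float32")
print(builtin_model(x).shape)  # (4, 7, 10)

A custom cell like the one above is still the way to go when the change cannot be expressed through these arguments, for example when you want to modify the gate equations themselves.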

Conclusion

In this post, we have explored the use of custom activation functions in the GRU architecture. We reviewed the mathematical formulation of the GRU and showed how to implement a custom GRU cell in TensorFlow and Keras that uses the relu activation in place of tanh. Experimenting with different activation functions in GRU layers can sometimes improve performance and lead to more robust models for various machine learning tasks, so feel free to try other activation functions in your own projects!