The Gated Recurrent Unit (GRU) is a type of Recurrent Neural Network (RNN) architecture that is widely used in natural language processing and time-series prediction tasks. The GRU was introduced by Cho et al. in 2014 as an alternative to the popular Long Short-Term Memory (LSTM) architecture. Its primary advantage is a simpler structure than LSTM, which means fewer parameters, faster training, and lower computational cost.
In this blog post, we will explore the use of different activation functions in GRU and their impact on performance. We will first provide an overview of the mathematical formulation of GRU and then discuss how to implement GRU with custom activation functions using TensorFlow and Keras in Python.
The GRU is represented by three main equations, which describe the update gate (z), the reset gate (r), and the new hidden state (h):
z = σ(Wz * x + Uz * h_prev + bz)
r = σ(Wr * x + Ur * h_prev + br)
h = (1 - z) * h_prev + z * tanh(Wh * x + Uh * (r * h_prev) + bh)
where Wz, Wr, Wh, Uz, Ur, Uh are weight matrices and bz, br, bh are bias vectors for the update gate, reset gate, and candidate state, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function. Products with the weight matrices (e.g., Wz * x) are matrix-vector products, while products involving the gates (e.g., z * h_prev and r * h_prev) are element-wise.
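To make the formulation concrete, here is a minimal NumPy sketch of a single GRU step that follows the equations above. The sizes (3 input features, 4 hidden units), the random weights, and the zero initial state are arbitrary choices for illustration only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
input_dim, units = 3, 4                      # arbitrary sizes for illustration

# Randomly initialised parameters: W* act on the input, U* on the previous state.
Wz, Wr, Wh = (rng.normal(size=(units, input_dim)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(units, units)) for _ in range(3))
bz, br, bh = (np.zeros(units) for _ in range(3))

x = rng.normal(size=input_dim)               # current input
h_prev = np.zeros(units)                     # previous hidden state

# One GRU step, exactly as written in the equations above.
z = sigmoid(Wz @ x + Uz @ h_prev + bz)                               # update gate
r = sigmoid(Wr @ x + Ur @ h_prev + br)                               # reset gate
h = (1 - z) * h_prev + z * np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # new hidden state
print(h.shape)  # (4,)
```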
Now let's dive into implementing a GRU with a different activation function. We will build a custom GRU cell in TensorFlow and Keras and, for this example, replace `tanh` with `relu` in the candidate hidden state computation. The following Python code snippet shows how to achieve this:
```python
import tensorflow as tf


class CustomGRUCell(tf.keras.layers.Layer):
    """GRU cell with a configurable candidate-state activation (relu by default here)."""

    def __init__(self, units, activation=tf.nn.relu, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = activation
        self.state_size = units  # required by tf.keras.layers.RNN

    def build(self, input_shape):
        input_dim = input_shape[-1]
        # Input and recurrent kernels hold the z, r and h blocks side by side.
        self.kernel = self.add_weight(
            shape=(input_dim, 3 * self.units), initializer="glorot_uniform", name="kernel")
        self.recurrent_kernel = self.add_weight(
            shape=(self.units, 3 * self.units), initializer="orthogonal", name="recurrent_kernel")
        self.bias = self.add_weight(
            shape=(3 * self.units,), initializer="zeros", name="bias")

    def call(self, inputs, states):
        h_prev = states[0]  # previous hidden state

        # Input projections for the update gate, reset gate and candidate state.
        x_z, x_r, x_h = tf.split(
            tf.matmul(inputs, self.kernel) + self.bias, 3, axis=-1)

        # Update and reset gates (sigmoid, as in the equations above).
        z = tf.sigmoid(x_z + tf.matmul(h_prev, self.recurrent_kernel[:, :self.units]))
        r = tf.sigmoid(x_r + tf.matmul(h_prev, self.recurrent_kernel[:, self.units:2 * self.units]))

        # Candidate state: reset gate applied to h_prev, tanh replaced by relu.
        h_candidate = self.activation(
            x_h + tf.matmul(r * h_prev, self.recurrent_kernel[:, 2 * self.units:]))

        h = (1.0 - z) * h_prev + z * h_candidate
        return h, [h]


# Usage example: wrap the cell in a generic RNN layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 5)),  # (timesteps, features)
    tf.keras.layers.RNN(CustomGRUCell(units=10), return_sequences=True),
])
model.summary()
```
In the code above, we define a custom GRU cell called `CustomGRUCell` that subclasses `tf.keras.layers.Layer`. The `build` method creates the input kernel, recurrent kernel, and bias, each holding the z, r, and h blocks side by side, and the `call` method implements the three GRU equations from earlier, applying the configurable `activation` (here `relu`) in place of `tanh` to the candidate state. Wrapping the cell in `tf.keras.layers.RNN` turns it into a recurrent layer that runs this step over the time dimension. Note that if all you want is to swap `tanh` for another standard activation, the built-in `tf.keras.layers.GRU` layer also accepts an `activation` argument; a custom cell is useful when you want full control over the recurrence itself.
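As a quick sanity check, we can push a batch of random sequences through the model defined above and confirm the output shape; the batch size and sequence length below are arbitrary.

```python
import numpy as np

# Random batch: 2 sequences, 7 timesteps, 5 features each (matching the Input shape above).
dummy = np.random.rand(2, 7, 5).astype("float32")
out = model(dummy)
print(out.shape)  # (2, 7, 10): a 10-dimensional hidden state per timestep
```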
In this post, we have explored the use of custom activation functions in the GRU architecture. We went over the mathematical formulation of GRU and implemented a custom GRU cell in TensorFlow and Keras that uses the `relu` activation for the candidate state. Exploring different activation functions in GRU layers can yield performance improvements and more robust models for various machine learning tasks, so feel free to experiment with other activation functions in your projects!
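As a starting point for such experiments, here is a small sketch that reuses the `CustomGRUCell` defined above and swaps in a few alternative activations; which one works best is task-dependent and should be checked on your own validation data.

```python
import tensorflow as tf

# Compare a few candidate-state activations (assumes CustomGRUCell from the example above).
for name, activation in [("relu", tf.nn.relu), ("gelu", tf.nn.gelu),
                         ("swish", tf.nn.silu), ("tanh", tf.math.tanh)]:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 5)),
        tf.keras.layers.RNN(CustomGRUCell(units=10, activation=activation)),
    ])
    print(name, model(tf.zeros((1, 7, 5))).shape)  # e.g. relu (1, 10)
```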