CNN and ANN performance with different Activation Functions like ReLU, SELU, ELU, Sigmoid, GELU, etc.

Thanga Sami
5 min read · Jun 25, 2021

Activation functions are what make a neural network non-linear, and they help the network learn better with each epoch. In this article, we will see how the performance of different activation functions varies between ANN and CNN models.

A few commonly used activation functions and their plotted characteristics are given below.

Sigmoid:

Sigmoid Function

Plotted Sigmoid Function with Python Code:
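The plot image is not reproduced here, so below is a minimal NumPy/Matplotlib sketch of the sigmoid function, 1 / (1 + e^-x), which squashes any real input into the range (0, 1):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    # Sigmoid: 1 / (1 + e^-x), output always between 0 and 1
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 200)
plt.plot(x, sigmoid(x))
plt.title('Sigmoid')
plt.grid(True)
plt.show()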

TanH:

TanH Function

Plotted Tanh Function with Python Code:
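A sketch in the same style for tanh, which maps inputs into (-1, 1) and is zero-centred, unlike the sigmoid:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 200)
plt.plot(x, np.tanh(x))  # tanh(x) = (e^x - e^-x) / (e^x + e^-x)
plt.title('Tanh')
plt.grid(True)
plt.show()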

ReLU:

Rectified Linear Unit Function

Plotted ReLU Function with Python Code:
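A minimal sketch of ReLU, which is simply max(0, x):

import numpy as np
import matplotlib.pyplot as plt

def relu(x):
    # ReLU: identity for positive inputs, zero for negative inputs
    return np.maximum(0, x)

x = np.linspace(-10, 10, 200)
plt.plot(x, relu(x))
plt.title('ReLU')
plt.grid(True)
plt.show()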

ELU:

Exponential Linear Unit Function

Plotted ELU Function with Python Code:
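A sketch of ELU with the common default alpha = 1.0: it behaves like ReLU for positive inputs but saturates smoothly toward -alpha for negative inputs:

import numpy as np
import matplotlib.pyplot as plt

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (e^x - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-10, 10, 200)
plt.plot(x, elu(x))
plt.title('ELU (alpha = 1.0)')
plt.grid(True)
plt.show()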

Leaky ReLU:

Leaky Rectified Linear Unit Function

Plotted Leaky ReLU Function with Python Code:

Leaky ReLU plotted with alpha = 0.1
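A sketch of Leaky ReLU with alpha = 0.1, matching the caption above; instead of zeroing negative inputs it keeps a small slope:

import numpy as np
import matplotlib.pyplot as plt

def leaky_relu(x, alpha=0.1):
    # Leaky ReLU: x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-10, 10, 200)
plt.plot(x, leaky_relu(x))
plt.title('Leaky ReLU (alpha = 0.1)')
plt.grid(True)
plt.show()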

SELU:

Scaled Exponential Linear Unit

Plotted SELU Function with Python Code:
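A sketch of SELU, which is ELU scaled by fixed constants (lambda ≈ 1.0507, alpha ≈ 1.6733) chosen so that activations self-normalize in deep networks:

import numpy as np
import matplotlib.pyplot as plt

def selu(x, lam=1.0507, alpha=1.6733):
    # SELU: lambda * (x for x > 0, alpha * (e^x - 1) otherwise)
    return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-10, 10, 200)
plt.plot(x, selu(x))
plt.title('SELU')
plt.grid(True)
plt.show()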

GELU:

Gaussian Error Linear Unit

Plotted GELU Function with Python Code:
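A sketch of GELU using the same tanh approximation that the Keras snippet later in this article registers as a custom activation:

import numpy as np
import matplotlib.pyplot as plt

def gelu(x):
    # GELU (tanh approximation): 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-10, 10, 200)
plt.plot(x, gelu(x))
plt.title('GELU')
plt.grid(True)
plt.show()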

Let us compare activation function performance on ANN and CNN models. We need to import the set of libraries below in Python for our analysis. The entire notebook is available on GitHub at the link below.

https://github.com/Thangasami/Activation-Functions-Analysis

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, Activation, LeakyReLU, AlphaDropout
from tensorflow.keras.utils import get_custom_objects
from tensorflow.keras import backend as K

We are going to use the Fashion-MNIST dataset for our analysis.

fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
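A quick sanity check on the shapes: Fashion-MNIST ships with 60,000 training images and 10,000 test images of 28x28 grayscale pixels.

print(train_images.shape, train_labels.shape)  # (60000, 28, 28) (60000,)
print(test_images.shape, test_labels.shape)    # (10000, 28, 28) (10000,)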

The preprocessing steps for the Fashion-MNIST dataset are given below.

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

train_images_scaled = train_images / 255.0
test_images_scaled = test_images / 255.0

input_shape = (28, 28, 1)
train_images_scaled = tf.expand_dims(train_images_scaled, axis=-1)
test_images_scaled = tf.expand_dims(test_images_scaled, axis=-1)

Since the GELU activation function is not built into the version of Keras used here, we need to register it as a custom activation.

# Add the GELU function to Keras
def gelu(x):
    return 0.5 * x * (1 + tf.tanh(tf.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))

get_custom_objects().update({'gelu': Activation(gelu)})

# Add leaky-relu so we can use it as a string
get_custom_objects().update({'leaky-relu': Activation(LeakyReLU(alpha=0.2))})

act_func = ['sigmoid', 'relu', 'elu', 'leaky-relu', 'selu', 'gelu', 'tanh']

CNN Model building:

def build_cnn(activation, dropout_rate, optimizer):
    model = Sequential()

    if activation == 'selu':
        # SELU is paired with lecun_normal initialization and AlphaDropout
        model.add(Conv2D(32, kernel_size=(3, 3),
                         activation=activation,
                         input_shape=input_shape,
                         kernel_initializer='lecun_normal'))
        model.add(Conv2D(64, (3, 3), activation=activation,
                         kernel_initializer='lecun_normal'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(AlphaDropout(0.25))
        model.add(Flatten())
        model.add(Dense(128, activation=activation,
                        kernel_initializer='lecun_normal'))
        model.add(Dense(10, activation='sigmoid'))
    else:
        model.add(Conv2D(32, kernel_size=(3, 3),
                         activation=activation,
                         input_shape=input_shape))
        model.add(Conv2D(64, (3, 3), activation=activation))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(128, activation=activation))
        model.add(Dense(10, activation='sigmoid'))

    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=optimizer,
        metrics=['accuracy']
    )

    return model

ANN Model building:

def build_ann(activation, dropout_rate, optimizer):
    model = Sequential()

    if activation == 'selu':
        model.add(Flatten())
        model.add(Dense(5000, activation=activation,
                        kernel_initializer='lecun_normal'))
        model.add(AlphaDropout(0.5))
        model.add(Dense(1000, activation=activation,
                        kernel_initializer='lecun_normal'))
        model.add(Dense(10, activation='sigmoid'))
    else:
        model.add(Flatten())
        model.add(Dense(5000, activation=activation))
        model.add(Dropout(0.5))
        model.add(Dense(1000, activation=activation))
        model.add(Dense(10, activation='sigmoid'))

    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=optimizer,
        metrics=['accuracy']
    )

    return model

Building and training the CNN model with each activation function:

result = []

for activation in act_func:
    print('\nTraining with →{0}← activation function\n'.format(activation))

    model = build_cnn(activation=activation,
                      dropout_rate=0.2,
                      optimizer='adam')

    history = model.fit(train_images_scaled, train_labels,
                        batch_size=32,  # 128 is faster, but less accurate. 16/32 recommended
                        epochs=10,
                        verbose=1,
                        validation_data=(test_images_scaled, test_labels))  # evaluate on the test set each epoch

    result.append(history)

    K.clear_session()
    del model

print(result)
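Printing the list of History objects directly is not very readable; the sketch below (assuming the loop above has finished) reports the final validation accuracy for each activation function:

for history, activation in zip(result, act_func):
    # Last entry in the per-epoch validation accuracy curve
    print('{:>12}: final val_accuracy = {:.4f}'.format(activation, history.history['val_accuracy'][-1]))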

Building and training the ANN model with each activation function:

result = []

for activation in act_func:
    print('\nTraining with →{0}← activation function\n'.format(activation))

    model1 = build_ann(activation=activation,
                       dropout_rate=0.2,
                       optimizer='adam')

    history = model1.fit(train_images_scaled, train_labels,
                         batch_size=32,  # 128 is faster, but less accurate. 16/32 recommended
                         epochs=10,
                         verbose=1,
                         validation_data=(test_images_scaled, test_labels))  # evaluate on the test set each epoch

    result.append(history)

    K.clear_session()
    del model1

print(result)

Comparing the accuracy and loss of the different activation functions:

CNN Results:

Accuracy Comparison
Loss curve
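The comparison figures are not reproduced here; below is a minimal sketch of how such accuracy and loss curves can be plotted from the collected histories (assuming `result` still holds the CNN histories, in the same order as act_func):

import matplotlib.pyplot as plt

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(14, 5))
for history, activation in zip(result, act_func):
    ax_acc.plot(history.history['val_accuracy'], label=activation)
    ax_loss.plot(history.history['val_loss'], label=activation)

ax_acc.set_title('Accuracy Comparison')
ax_acc.set_xlabel('Epoch')
ax_acc.set_ylabel('Validation accuracy')
ax_acc.legend()

ax_loss.set_title('Loss curve')
ax_loss.set_xlabel('Epoch')
ax_loss.set_ylabel('Validation loss')
ax_loss.legend()

plt.show()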

ANN Results:

Conclusion:

Though the selection of an activation function depends on the data and the use case, our observations from the above tests are given below.

For the CNN, the sigmoid and tanh functions perform poorly, while ReLU outperforms the others. Newer functions like ELU, SELU, and GELU give similar results. For CNNs, it is better to avoid sigmoid and tanh.

For the ANN, all activation functions perform well except SELU, so it is better to avoid SELU for dense artificial neural network problems. Interestingly, sigmoid and tanh also perform on par with the other activation functions in our ANN case.

However, we need to run a similar analysis on RNNs to understand activation functions better. We will update those test results soon.


Thanga Sami

I am a data science and machine learning enthusiast with hands-on experience in Python. I graduated from MIT Chennai and have 13+ years of IT experience.