Abstract:
Neural Networks (NNs) allow complex nonlinear relationships between response
variables and their predictors. Deep NNs have made notable contributions across
computer vision, reinforcement learning, speech recognition and natural language
processing. Previous studies have obtained the parameters of NNs through the classical approach using Homogeneous Activation Functions (HOMAFs). However, a
major setback of NNs fitted with the classical approach is their tendency to over-fit. Therefore, this study aimed to develop a Bayesian NN (BNN) model that ameliorates
over-fitting using Heterogeneous Activation Functions (HETAFs).
A BNN model was developed with a Gaussian error distribution for the likelihood
function, and with inverse gamma and inverse Wishart priors for the parameters, to obtain
the BNN estimators.
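Read literally, this implies a hierarchy along the following lines; it is only an illustrative sketch, since the abstract does not state which parameters receive the inverse gamma and inverse Wishart priors (here assumed to be the error variance and the weight covariance, respectively):

```latex
% Illustrative reading of the model hierarchy (requires amsmath);
% the exact prior placement follows the main text.
\begin{align*}
  y_i \mid \beta, \sigma^2 &\sim N\big(f(x_i; \beta),\, \sigma^2\big)
      && \text{Gaussian error likelihood} \\
  \sigma^2 &\sim \text{Inverse-Gamma}(a, b)
      && \text{inverse gamma prior} \\
  \beta \mid \Sigma &\sim N(0, \Sigma), \qquad
  \Sigma \sim \text{Inverse-Wishart}(\nu, \Psi)
      && \text{inverse Wishart prior}
\end{align*}
```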
The HOMAFs (Rectified Linear Unit (ReLU), Sigmoid and Hyperbolic Tangent Sigmoid (TANSIG)) and the HETAFs (Symmetric Saturated Linear Hyperbolic Tangent (SSLHT) and Symmetric Saturated Linear Hyperbolic Tangent Sigmoid (SSLHTS)) were used to activate the model parameters.
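For reference, the three HOMAFs have standard closed forms; a minimal NumPy sketch is given below. The HETAFs (SSLHT and SSLHTS) are defined in the main text rather than the abstract, so no definition is attempted here.

```python
import numpy as np

# Standard forms of the three HOMAFs named above. SSLHT and SSLHTS
# are defined in the main text, so they are deliberately omitted.

def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x)), mapping into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    """Hyperbolic tangent sigmoid (TANSIG), i.e. tanh, mapping into (-1, 1)."""
    return np.tanh(x)
```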
The Bayesian approach was used to ameliorate the problem of over-fitting, while the Posterior
Mean (PM), Posterior Standard Deviation (PSD) and Numerical Standard Error
(NSE) were used to determine the estimators’ sensitivity. The performance of the
Bayesian estimators from each of the activation functions was evaluated in a
Monte Carlo experiment using the Mean Square Error (MSE), Mean Absolute Error (MAE) and training error as metrics. The proximity of the MSE and training-error
values was used to judge generalisation and, hence, over-fitting.
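As a sketch of how these summaries could be computed from M posterior draws and from model predictions: PM and PSD are the sample mean and standard deviation of the draws, and the NSE values reported below would be consistent with the simple rule NSE = PSD/√M for roughly 1,000 draws. The function names are hypothetical, and the main text may define NSE with a correction for Markov chain autocorrelation, and training error other than as training-set MSE (assumed here).

```python
import numpy as np

def posterior_summaries(draws):
    """PM, PSD and NSE for M draws of a scalar parameter.

    NSE is taken here as PSD / sqrt(M), the independent-draw rule;
    the main text may instead correct for chain autocorrelation.
    """
    draws = np.asarray(draws, dtype=float)
    pm = draws.mean()
    psd = draws.std(ddof=1)
    nse = psd / np.sqrt(draws.size)
    return pm, psd, nse

def fit_metrics(y_train, yhat_train, y_test, yhat_test):
    """MSE and MAE on held-out data, plus training error (assumed here
    to be training-set MSE); a small MSE-to-training-error gap
    suggests little over-fitting."""
    mse = np.mean((y_test - yhat_test) ** 2)
    mae = np.mean(np.abs(y_test - yhat_test))
    train_err = np.mean((y_train - yhat_train) ** 2)
    return mse, mae, train_err
```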
The derived Bayesian estimators were β ∼ N(K_β, H_β) and γ ∼ exp(−½{F_γ + M_γ}),
where K_β is the derived mean of β, H_β is the derived standard deviation of β, and F_γ and M_γ
are the derived components of the posterior of γ.
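In practice, the Gaussian form means β can be sampled directly once K_β and H_β are computed, while the γ posterior is only available up to proportionality; a minimal sketch, with hypothetical values standing in for the derived quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for K_beta and H_beta; their closed forms
# are derived in the main text, not the abstract.
K_beta, H_beta = 0.5, 0.1

# beta ~ N(K_beta, H_beta): the Gaussian posterior admits direct draws.
beta_draws = rng.normal(loc=K_beta, scale=H_beta, size=1000)

# gamma's posterior is known only up to exp(-0.5 * (F_gamma + M_gamma)),
# so drawing gamma would typically need a Metropolis-Hastings or
# accept/reject step rather than a direct sampler.
```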
For ReLU, the PM, PSD and NSE values for β and γ were 0.4755, 0.0646, 0.0020; and 0.2370, 0.0642, 0.0020, respectively; for
Sigmoid: 0.4476, 0.2734, 0.0087; and 1.0269, 0.2732, 0.0086, respectively; and for TANSIG: 0.4718, 0.0826, 0.0026; and 1.0239, 0.0822, 0.0026, respectively. For SSLHT,
the PM, PSD and NSE values for β and γ were 0.8344, 0.0567, 0.0018; and 1.0242,
0.0566, 0.0016, respectively; and for SSLHTS: 0.89825, 0.01278, 0.0004; and 1.0236,
0.0127, 0.0003, respectively. The MSE, MAE and training error values for the activation functions were ReLU: 0.1631, 0.2465, 0.1522; Sigmoid:
0.1834, 0.2074, 0.1862; TANSIG: 0.1943, 0.269, 0.1813; SSLHT: 0.0714, 0.0131,
0.0667; and SSLHTS: 0.0322, 0.0339, 0.0328, respectively. The HETAFs showed
closer proximity between the MSE and training-error values than the HOMAFs, implying amelioration of over-fitting, and also yielded smaller error values.
The derived Bayesian neural network estimators ameliorated the problem of over-fitting, as reflected in the close Mean Square Error and training-error values, making
them more appropriate for handling Neural Network models. They could be applied
to solving problems in machine learning.