The Gaussian process, also known as Kriging, is a collection of random variables, any finite number of which follows a multivariate normal distribution [14, 25-27]. The Gaussian process model is composed of a global model f(x) and a stochastic Gaussian process Z(x) representing the deviation from the global model:

y(x) = f(x) + Z(x),

where f(x) is the polynomial regression function with regression coefficients β, Z(x) is a stationary stochastic Gaussian process N(0, K) with zero mean and covariance function K, and x is an m-dimensional vector of input parameters.
For the stochastic process Z(x), suppose that the training data consist of the computer output y at the input data X = {x1, x2, .., xn}, and that f* is the value to be predicted at a new input x*. The Gaussian process model then follows the multivariate normal distribution:

[y, f*]^T ~ N(0, [[K(X, X), K(X, x*)], [K(x*, X), K(x*, x*)]]).
Therefore, the prediction for f* is the mean of this multivariate normal distribution, and its variance V(f*) is the uncertainty of the prediction:

E(f*) = K(x*, X) K(X, X)^-1 y,

V(f*) = K(x*, x*) - K(x*, X) K(X, X)^-1 K(X, x*).
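These two formulas translate directly into a few lines of linear algebra. The following is a minimal sketch in Python/NumPy, assuming noise-free interpolation with a small diagonal jitter for numerical stability; the helper names (`sq_exp_kernel`, `gp_predict`) are illustrative and not taken from the original implementation:

```python
import numpy as np

def sq_exp_kernel(X1, X2, w=1.0, lam=1.0):
    """Squared-exponential covariance k(x1, x2) = w * exp(-||x1 - x2||^2 / (2 lam^2))."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return w * np.exp(-0.5 * d2 / lam**2)

def gp_predict(X, y, X_star, kernel=sq_exp_kernel, jitter=1e-8):
    """Posterior mean and variance of f* at X_star, conditioned on (X, y)."""
    K = kernel(X, X) + jitter * np.eye(len(X))   # K(X, X)
    K_s = kernel(X, X_star)                      # K(X, x*)
    K_ss = kernel(X_star, X_star)                # K(x*, x*)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s.T @ alpha                         # E(f*) = K(x*,X) K(X,X)^-1 y
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)   # V(f*) = K(x*,x*) - K(x*,X) K^-1 K(X,x*)
    return mean, var
```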
The covariance function (K = R) is the crucial ingredient in a Gaussian process predictor, as it encodes the assumptions about the interpolation function between x1 and x2. Therefore, several well-known stationary covariance functions [27] have been implemented (see the sketch after this list):
· Squared Exponential
· Exponential
· Gamma-Exponential
· Matern Class 3/2
· Matern Class 5/2
· Periodic
· Rational Quadratic
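The exact parameterizations used in the original implementation are not shown here, so the following sketch uses the standard forms from Rasmussen and Williams [27], written as functions of the distance r = ||x1 - x2||; the hyper-parameter names mirror those introduced below (w for the signal variance, λ for the length scale, α for the rational-quadratic shape):

```python
import numpy as np

# Standard stationary covariance functions of r = ||x1 - x2||
# (textbook forms; the original parameterization may differ).
def exponential(r, w=1.0, lam=1.0):
    return w * np.exp(-r / lam)

def gamma_exponential(r, w=1.0, lam=1.0, gamma=1.5):   # 0 < gamma <= 2
    return w * np.exp(-(r / lam) ** gamma)

def matern32(r, w=1.0, lam=1.0):
    s = np.sqrt(3.0) * r / lam
    return w * (1.0 + s) * np.exp(-s)

def matern52(r, w=1.0, lam=1.0):
    s = np.sqrt(5.0) * r / lam
    return w * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def periodic(r, w=1.0, lam=1.0, p=1.0):
    return w * np.exp(-2.0 * np.sin(np.pi * r / p) ** 2 / lam**2)

def rational_quadratic(r, w=1.0, lam=1.0, alpha=1.0):
    return w * (1.0 + r**2 / (2.0 * alpha * lam**2)) ** (-alpha)
```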
where w, λ and α are hyper-parameters, which have to be found by an optimization procedure maximizing the likelihood function of the multivariate normal distribution:

log p(y|X) = -1/2 y^T K^-1 y - 1/2 log|K| - (n/2) log 2π.
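In practice the negative log-likelihood is minimized numerically. A minimal sketch with SciPy, reusing the `sq_exp_kernel` helper from the earlier sketch and assuming a log-parameterization to keep the hyper-parameters positive (all names illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(theta, X, y):
    """-log p(y|X) for the squared-exponential kernel; theta = (log w, log lam)."""
    w, lam = np.exp(theta)                       # log-parameterization keeps w, lam > 0
    K = sq_exp_kernel(X, X, w, lam) + 1e-8 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha                      # 0.5 * y^T K^-1 y
            + np.sum(np.log(np.diag(L)))         # 0.5 * log|K|
            + 0.5 * len(X) * np.log(2 * np.pi))

# result = minimize(neg_log_marginal_likelihood, x0=np.zeros(2), args=(X, y))
# w_opt, lam_opt = np.exp(result.x)
```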
The Gaussian process returns not only the predicted value but also the uncertainty of the prediction for the regression function. Thus, a confidence interval is available over the whole regression function space. This enables an efficient approach, called the adaptive Gaussian process, which makes the regression function more accurate with a small number of data points. The adaptive Gaussian process is an iterative procedure. From the initial data points, an initial Gaussian process is generated. Based on the calculated uncertainties, the point in the parameter space with the maximal uncertainty is sampled for the next generation of the Gaussian process. This adaptive process is carried out iteratively until the maximal uncertainty falls below a user-defined limit or a given number of sampled data points has been reached.
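A sketch of this loop, assuming an expensive simulator `simulate`, a fixed pool of candidate points, and the `gp_predict` helper from the earlier sketch (all names illustrative):

```python
import numpy as np

def adaptive_gp(simulate, X_init, candidates, tol=1e-3, max_points=50):
    """Iteratively add the candidate point with maximal predictive uncertainty."""
    X = np.array(X_init, dtype=float)
    y = np.array([simulate(x) for x in X])
    while len(X) < max_points:
        _, var = gp_predict(X, y, candidates)    # uncertainty over the parameter space
        i = int(np.argmax(var))                  # point with maximal uncertainty
        if var[i] < tol:                         # stop below the user-defined limit
            break
        x_new = candidates[i]
        X = np.vstack([X, x_new])
        y = np.append(y, simulate(x_new))        # sample the simulator and refit
    return X, y
```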
Regression is considered when the output is a continuous real value. Classification is another important problem, where the output is a discrete value, i.e., states or classes [60-76]. While regression represents an analog simulation, classification deals with a digital simulation: the output is assigned to one of C classes (C > 1). For Gaussian process classification, a latent function f(x) is introduced. The input data x are approximated by this latent function using Gaussian process regression with the Laplace approximation. Then, the latent function f is squashed through a response function to obtain the probability π of assignment to one of the C classes. We consider two cases: binary classification (C = 2) and multi-class classification (C > 2).
The Laplace method utilizes a Gaussian approximation to the posterior over the latent function f:

q(f | X, y) = N(f̂, (K^-1 + W)^-1),

where f̂ is the mode of the posterior and W = -∇∇ log p(y|f̂) is the negative Hessian of the log-likelihood at the mode. The posterior mean for f* under the Laplace approximation can be expressed as:

E(f* | X, y, x*) = K(x*, X) ∇ log p(y|f̂),

and the variance of the prediction is:

V(f*) = K(x*, x*) - K(x*, X) (K + W^-1)^-1 K(X, x*).
The latent function f is then squashed to one of the two binary classes using a sigmoid response function, here the logistic function:

π(f*) = 1 / (1 + exp(-f*)).
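A compact sketch of this procedure, following the standard Laplace-approximation mode-finding scheme for binary labels y in {-1, +1} with a logistic likelihood, and reusing the `sq_exp_kernel` helper from the earlier sketch (all names illustrative):

```python
import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def laplace_binary_predict(X, y, X_star, kernel=sq_exp_kernel, n_iter=20):
    """Binary GP classification: Newton iteration for the Laplace mode,
    then a sigmoid squash. Labels y must be in {-1, +1}."""
    n = len(X)
    K = kernel(X, X) + 1e-8 * np.eye(n)
    t = (y + 1) / 2.0                             # targets mapped to {0, 1}
    f = np.zeros(n)
    for _ in range(n_iter):                       # Newton steps toward the mode f_hat
        pi = sigmoid(f)
        grad = t - pi                             # gradient of log p(y|f)
        W = pi * (1.0 - pi)                       # negative Hessian of log p(y|f)
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f = K @ a
    pi = sigmoid(f)                               # recompute quantities at the mode
    grad = t - pi
    sW = np.sqrt(pi * (1.0 - pi))
    L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
    K_s = kernel(X, X_star)
    mean = K_s.T @ grad                           # posterior mean of the latent f*
    v = np.linalg.solve(L, sW[:, None] * K_s)
    var = np.diag(kernel(X_star, X_star)) - np.sum(v**2, axis=0)
    return sigmoid(mean), mean, var               # squashed class probability, f* mean/variance
```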
The posterior for the multi-class classification problem is more complex. Thus, the posterior mean for f is given below:

and the variance of the prediction:
The latent function f is assigned to one of the C classes using the softmax function:

π_c(f) = exp(f_c) / Σ_{c'} exp(f_{c'}).
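For illustration, a numerically stable implementation of this squashing step:

```python
import numpy as np

def softmax(f):
    """Map latent values f = (f_1, ..., f_C) to class probabilities pi_c."""
    e = np.exp(f - np.max(f))      # subtracting the max avoids overflow
    return e / e.sum()

# e.g. softmax(np.array([2.0, 0.5, -1.0])) -> probabilities summing to 1
```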
The representation of the input data X = {x1, x2, .., xn} is the covariance matrix K with n eigenvalues in descending order: λ1, λ2, ..., λr, ..., λn. If the number of input data points is extremely large, computing the Gaussian process via matrix operations becomes very expensive. Matrix approximation is an efficient way to reduce the input data to a low-rank matrix of rank r (r < n) for cheaper computation with little loss of quality. The smallest eigenvalues of the covariance matrix (λ_{r+1}, λ_{r+2}, ..., λ_n) are cut off. The approximated matrix then has only rank r but represents the initial covariance matrix with minimal difference. This approach is called the sparse Gaussian process.
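The eigenvalue truncation described above can be sketched with a symmetric eigendecomposition. Note that many practical sparse Gaussian process implementations instead use inducing-point or Nyström approximations; the following only illustrates the truncation view given in the text (names illustrative):

```python
import numpy as np

def low_rank_approx(K, r):
    """Rank-r approximation of the covariance matrix K by eigenvalue truncation.
    Keeps the r largest eigenvalues lam_1 >= ... >= lam_r and cuts off the rest."""
    lam, U = np.linalg.eigh(K)          # eigenvalues of the symmetric K, ascending
    lam, U = lam[::-1], U[:, ::-1]      # reorder to descending
    return (U[:, :r] * lam[:r]) @ U[:, :r].T

# By the Eckart-Young theorem this rank-r matrix is the best rank-r
# approximation of K, so discarding lam_{r+1}, ..., lam_n loses minimal information.
```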