Loss Functions
The choice of loss function can make a significant difference to the training outcome, and the most suitable choice depends primarily on the dataset targets. To give you some freedom here, in particular regarding how strongly outliers are weighted, Fortnet implements the following loss functions:
mean squared error (mse)
root mean square error (rms)
mean absolute error (mae)
mean absolute percentage error (mape)
The loss function used during training is selected in the Training block
of the HSD input. Each function is listed below together with an exemplary
training block, assuming a dataset of \(N\) targets \(y_i^\mathrm{ref}\)
and network predictions \(y_i^\mathrm{nn}\).
Note
The default loss function is the mean squared error (mse).
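For orientation, the four loss functions can be sketched in plain Python as follows. These are the common textbook definitions, not Fortnet's source code. The function names simply mirror the Loss keywords, and the factor of 100 in `mape` as well as the requirement of non-zero targets are assumptions that may differ from Fortnet's convention.

```python
import math


def mse(y_ref, y_nn):
    """Mean squared error: average of the squared deviations."""
    return sum((r - p) ** 2 for r, p in zip(y_ref, y_nn)) / len(y_ref)


def rms(y_ref, y_nn):
    """Root mean square error: square root of the mean squared error."""
    return math.sqrt(mse(y_ref, y_nn))


def mae(y_ref, y_nn):
    """Mean absolute error: average of the absolute deviations."""
    return sum(abs(r - p) for r, p in zip(y_ref, y_nn)) / len(y_ref)


def mape(y_ref, y_nn):
    """Mean absolute percentage error.

    Assumption: expressed in percent (factor 100) and all targets non-zero.
    """
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(y_ref, y_nn)) / len(y_ref)


# Hypothetical toy data: two exact predictions and one outlier.
y_ref = [1.0, 2.0, 4.0]
y_nn = [1.0, 2.0, 8.0]

for loss in (mse, rms, mae, mape):
    print(loss.__name__, loss(y_ref, y_nn))
```

With a single large deviation, the squared losses grow with the square of the deviation while the absolute losses grow only linearly, which is the difference in outlier weighting mentioned above.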
Mean Squared Error
Training = LBFGS {
  .
  .
  .
  Loss = mse
}
Root Mean Square Error
Training = LBFGS {
  .
  .
  .
  Loss = rms
}
Mean Absolute Error
Training = LBFGS {
  .
  .
  .
  Loss = mae
}
Mean Absolute Percentage Error
Training = LBFGS {
  .
  .
  .
  Loss = mape
}
Regularization
A simple yet effective method to reduce overfitting while training a neural network is loss-based regularization. By adding a penalty term \(\tilde{C}\) to the base loss \(C_0\), the formation of a spiky hypersurface caused by large connection weights can be mitigated:
\[ C = C_0 + \lambda \tilde{C} \]
The strength of the penalty is controlled by the parameter \(\lambda\).
Fortnet supports \(L_1\) (lasso) and \(L_2\) (ridge) regularization as
well as a mixture of both (elastic net), each serving a different purpose.
The regularization is specified in the Training block of the HSD input, e.g.:
Training = LBFGS {
  .
  .
  .
  Regularization = Ridge {
    Strength = 1.0
  }
}
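As a rough illustration of how the Strength value enters the total loss, the following Python sketch adds a scaled penalty to a base loss. The names `regularized_loss` and `sum_of_squares`, the toy weights, and the unnormalized sum of squared weights used as the penalty are assumptions made for this example; Fortnet's internal implementation and normalization may differ.

```python
def regularized_loss(base_loss, weights, strength, penalty):
    """Return base_loss + strength * penalty(weights)."""
    return base_loss + strength * penalty(weights)


def sum_of_squares(weights):
    # Illustrative penalty (assumption): plain sum of squared weights.
    return sum(w * w for w in weights)


# Hypothetical base loss and connection weights.
base_loss = 0.25
weights = [0.5, -2.0, 0.0]

for strength in (0.0, 0.1, 1.0):
    total = regularized_loss(base_loss, weights, strength, sum_of_squares)
    print(f"Strength = {strength}: total loss = {total}")
```

A strength of zero recovers the unregularized base loss, while larger values make large weights increasingly expensive.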
More detailed descriptions of each variant are given below.
L1 - Lasso
Lasso regression adds a penalty based on the absolute values of the weight coefficients. It is often referred to as \(L_1\) regularization and can act as a form of feature selection, since it tends to drive some of the weights to exactly zero.
The specification takes place in the Training block of the HSD input, e.g.:
Training = LBFGS {
  .
  .
  .
  Regularization = Lasso {
    Strength = 1.0
  }
}
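The tendency of the \(L_1\) penalty to zero out weights can be made plausible with a small Python sketch. It assumes the penalty is the plain sum of absolute weights, without any normalization Fortnet might apply, and the function names are hypothetical. The point is that the penalty's (sub)gradient has the same magnitude for every non-zero weight, so small weights are pushed towards zero just as strongly as large ones.

```python
def l1_penalty(weights):
    """Sum of absolute weight values (assumed, unnormalized form)."""
    return sum(abs(w) for w in weights)


def l1_subgradient(weights):
    """Sign of each weight; 0 is chosen as the subgradient at w == 0."""
    return [(w > 0) - (w < 0) for w in weights]


# Hypothetical connection weights.
weights = [0.5, -2.0, 0.0]

print("penalty:    ", l1_penalty(weights))
print("subgradient:", l1_subgradient(weights))
```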
L2 - Ridge
Ridge regression adds a penalty that weighs particularly large weight coefficients most heavily. It is often referred to as \(L_2\) regularization and, analogous to lasso regression, shrinks the weights and reduces model complexity.
The specification takes place in the Training block of the HSD input, e.g.:
Training = LBFGS {
  .
  .
  .
  Regularization = Ridge {
    Strength = 1.0
  }
}
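For comparison with the lasso case, the following Python sketch evaluates an \(L_2\) penalty and its gradient. It assumes the penalty is the plain sum of squared weights; conventions such as a prefactor of 1/2 or a normalization by the number of weights vary, and the one used by Fortnet is not stated here. The function names are hypothetical. The gradient is proportional to the weight itself, so large weights are shrunk strongly while small ones are barely affected.

```python
def l2_penalty(weights):
    """Sum of squared weight values (assumed, unnormalized form)."""
    return sum(w * w for w in weights)


def l2_gradient(weights):
    """Gradient of the penalty above with respect to each weight."""
    return [2.0 * w for w in weights]


# Hypothetical connection weights.
weights = [0.5, -2.0, 0.0]

print("penalty: ", l2_penalty(weights))
print("gradient:", l2_gradient(weights))
```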
Elastic Net
Elastic net regularization mixes \(L_1\) and \(L_2\) contributions in a ratio determined by the parameter \(\alpha\).
For \(\alpha = 0\) or \(\alpha = 1\) this reduces to pure \(L_2\) or
\(L_1\) regularization, respectively. The specification takes place in the
Training block of the HSD input, e.g.:
Training = LBFGS {
  .
  .
  .
  Regularization = ElasticNet {
    Strength = 1.0
    Alpha = 0.5
  }
}
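The role of Alpha can be sketched in Python as a linear mix of the two penalties. This is a minimal illustration under stated assumptions: the penalties are taken as plain sums of absolute and squared weights, the mixing is assumed to be a convex combination, and the function names are hypothetical. Only the limiting cases (\(\alpha = 0\) gives \(L_2\), \(\alpha = 1\) gives \(L_1\)) are taken from the description above.

```python
def l1_penalty(weights):
    """Sum of absolute weight values (assumed form)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weight values (assumed form)."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, alpha):
    """Mix both penalties: alpha * L1 + (1 - alpha) * L2 (assumed mixing rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the interval [0, 1]")
    return alpha * l1_penalty(weights) + (1.0 - alpha) * l2_penalty(weights)


# Hypothetical connection weights.
weights = [0.5, -2.0, 0.0]

for alpha in (0.0, 0.5, 1.0):
    print(f"Alpha = {alpha}: penalty = {elastic_net_penalty(weights, alpha)}")
```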