Optimizer#
To iteratively optimize the weight and bias parameters of the network during training, Fortnet provides several algorithms. Depending on the problem and dataset, the choice of optimizer can have a major impact on convergence and on the overall training behavior. Currently, the following choices are available:
- Steepest Descent (SD) [11]
- Conjugate Gradient (CG) [12]
- Fast Inertial Relaxation Engine (FIRE) [13]
- Limited-Memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) [14]
General Optimizer Settings#
Some parameters of the `Training` block are universally valid across all optimizers listed above. The table below lists these entries:
| Setting | Type | Default | Note |
|---|---|---|---|
| NIterations | Integer | Huge() | Max. number of training iterations |
| Threshold | Float | Tiny() | Gradient termination criterion |
| NPrintout | Integer | 10 | Standard output print interval |
| NSaveNet | Integer | 100 | Netstat output save interval |
| MinDisplacement | Float | 1e-06 | Min. displacement in parameters |
| MaxDisplacement | Float | 1e+04 | Max. displacement in parameters |
| Shuffle | Logical | No | Randomly shuffle order of gradient calculations |
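The defaults Huge() and Tiny() presumably refer to the Fortran intrinsics of the same name, i.e. the largest and smallest positive representable values of the respective types, so that by default neither the iteration limit nor the gradient criterion terminates the training on its own.

To illustrate how these settings could interact in a generic gradient-based training loop, consider the following rough Python sketch. All names are hypothetical, the displacement limits are interpreted here as bounds on the overall step length, and none of this reflects Fortnet's actual implementation:

```python
import numpy as np

def train(params, grad_fn, data, step_fn,
          n_iterations=10**9, threshold=1e-30,
          min_disp=1e-06, max_disp=1e+04, shuffle=False):
    """Hypothetical driver: NIterations caps the loop, Threshold stops it
    early, Min-/MaxDisplacement bound the step length, and Shuffle
    randomizes the order in which per-datapoint gradients are computed."""
    rng = np.random.default_rng()
    for _ in range(n_iterations):
        order = rng.permutation(len(data)) if shuffle else range(len(data))
        grad = sum(grad_fn(params, data[i]) for i in order)
        # Threshold: terminate once the gradient norm is small enough.
        if np.linalg.norm(grad) < threshold:
            break
        step = step_fn(params, grad)  # optimizer-specific step proposal
        # Rescale the step into the allowed displacement window.
        length = np.linalg.norm(step)
        if length > max_disp:
            step *= max_disp / length
        elif 0.0 < length < min_disp:
            step *= min_disp / length
        params = params + step
    return params
```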
Optimizer Specific Settings#
In addition to the universal parameters, there are also optimizer-specific options. These are the subject of the following sections.
Steepest Descent#
An example HSD `Training` block of the `fortnet_in.hsd` user input:
```
Training = SD {
  Threshold = 1e-08
  NIterations = 10000
  NPrintout = 10
  NSaveNet = 100
  MinDisplacement = 1e-10
  MaxDisplacement = 5e-02
  LearningRate = 0.01
  Shuffle = No
}
```
Optimizer-specific settings:
| Setting | Type | Default | Note |
|---|---|---|---|
| LearningRate | Float | 0.01 | Uniform weight of the gradient components |
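Steepest descent simply follows the negative gradient, with LearningRate acting as a uniform weight on all gradient components. A minimal sketch of one update step (hypothetical Python, matching the `step_fn` interface of the driver sketched above; not Fortnet's code):

```python
import numpy as np

def sd_step(params, grad, learning_rate=0.01):
    """One steepest-descent step: move against the gradient, every
    component scaled by the same uniform LearningRate."""
    # params is not needed for plain SD; kept for a uniform interface.
    return -learning_rate * np.asarray(grad)
```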
Conjugate Gradient#
An example HSD `Training` block of the `fortnet_in.hsd` user input:
```
Training = CG {
  Threshold = 1e-08
  NIterations = 10000
  NPrintout = 10
  NSaveNet = 100
  MinDisplacement = 1e-10
  MaxDisplacement = 5e-02
  Shuffle = No
}
```
Currently, there are no specific parameters for the conjugate gradient method.
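The method needs no extra parameters because its search direction is built from the gradient history alone. For orientation, here is a Fletcher-Reeves sketch of the direction update in Python (one common nonlinear CG variant; Fortnet may well use a different one):

```python
import numpy as np

def cg_direction(grad, prev_grad=None, prev_dir=None):
    """Nonlinear conjugate-gradient direction (Fletcher-Reeves): mix the
    new negative gradient with the previous search direction."""
    grad = np.asarray(grad)
    if prev_grad is None:
        return -grad  # first iteration: plain steepest descent
    beta = (grad @ grad) / (prev_grad @ prev_grad)
    return -grad + beta * prev_dir
```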
FIRE#
An example HSD `Training` block of the `fortnet_in.hsd` user input:
```
Training = FIRE {
  Threshold = 1e-08
  NIterations = 10000
  NPrintout = 10
  NSaveNet = 100
  MinDisplacement = 1e-10
  MaxDisplacement = 5e-02
  Shuffle = No
}
```
Currently, there are no specific parameters for the FIRE algorithm.
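FIRE runs on the universal parameters only: it treats the minimization as damped molecular dynamics, steering an artificial velocity toward the downhill direction and resetting it whenever the system starts moving uphill. A simplified Python sketch along the lines of the FIRE scheme [13] (the internal parameters shown are illustrative, the usual latency of a few downhill steps before accelerating is omitted, and this is not Fortnet's implementation):

```python
import numpy as np

def fire_step(x, v, grad, dt, alpha,
              dt_max=0.1, f_inc=1.1, f_dec=0.5,
              alpha_start=0.1, f_alpha=0.99):
    """One simplified FIRE update on parameters x with velocity v."""
    force = -np.asarray(grad)
    power = force @ v
    # Steer the velocity toward the force (downhill) direction.
    fnorm = np.linalg.norm(force)
    if fnorm > 0.0:
        v = (1.0 - alpha) * v + alpha * np.linalg.norm(v) * force / fnorm
    if power > 0.0:
        # Downhill: speed up and reduce the damping.
        dt = min(dt * f_inc, dt_max)
        alpha *= f_alpha
    else:
        # Uphill: freeze the motion and restore strong damping.
        v = np.zeros_like(v)
        dt *= f_dec
        alpha = alpha_start
    v = v + dt * force  # explicit Euler velocity update
    x = x + dt * v      # position (parameter) update
    return x, v, dt, alpha
```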
L-BFGS#
An example HSD `Training` block of the `fortnet_in.hsd` user input:
```
Training = LBFGS {
  Threshold = 1e-08
  NIterations = 10000
  NPrintout = 10
  NSaveNet = 100
  MinDisplacement = 1e-10
  MaxDisplacement = 5e-02
  MaxForQNDisplacement = No
  LineMin = Yes
  Memory = 1000
  Shuffle = No
}
```
Optimizer-specific settings:
| Setting | Type | Default | Note |
|---|---|---|---|
| MaxForQNDisplacement | Logical | False | Consider max. step for quasi-Newton direction |
| LineMin | Logical | True | Use a line search |
| Memory | Integer | 1000 | Number of past iterations to save |
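Here, Memory bounds the number of past parameter and gradient differences from which the inverse-Hessian approximation is built, LineMin toggles a line search along the resulting quasi-Newton direction, and MaxForQNDisplacement controls whether the MaxDisplacement bound is also applied to that direction. For orientation, a minimal sketch of the standard L-BFGS two-loop recursion (hypothetical Python, not Fortnet's code):

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate -H^(-1) @ grad from the stored
    parameter differences (s) and gradient differences (y); the length
    of s_hist/y_hist corresponds to the Memory setting."""
    q = np.asarray(grad, dtype=float).copy()
    alphas = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_hist:
        # Scale by an initial diagonal Hessian guess (most recent pair).
        q *= (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ q)
        q += (a - b) * s
    return -q  # quasi-Newton search direction
```

With an empty history the recursion reduces to the plain negative gradient, so the first iteration behaves like steepest descent.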