First Training with Fortnet#
[Input: recipes/basics/firsttrain/]
This chapter is a tutorial that guides you through your first network optimization using Fortnet. The example dataset is the \(E\)-\(V\) scan of a primitive silicon unit cell in the diamond phase. The procedure is split into three major steps:
providing an appropriate input to Fortnet,
actually running Fortnet,
finally, analysing the results.
After this tutorial, you will be familiar with all of the basic features of Fortnet and ready to start your own project.
Providing the Input#
Fortnet accepts the input in the Human-readable Structured Data (HSD) format. The input file must be called fortnet_in.hsd and in this example looks as follows:
Network = BPNN {
Hidden = 2 2
Activation = tanh
}
Features {
Mapping = ACSF {
Reduce = Yes
Standardization = Yes
Function = Auto {
RCut = 4.0
NRadial = 5
NAngular = 4
}
}
}
Training = LBFGS {
Threshold = 1e-08
NIterations = 5000
NPrintout = 1000
NSaveNet = 1000
MinDisplacement = 1e-10
MaxDisplacement = 5e-02
LineMin = Yes
Memory = 1000
Loss = mse
}
Data {
Dataset = fnetdata.hdf5
NetstatFile = 'fortnet.hdf5'
}
Options {
Mode = train
ReadNetStats = No
RandomSeed = 123456
}
The order of the specified blocks in the HSD input is arbitrary. Keywords are
case-insensitive, so you are free to capitalise them as you like. String
values, however, are case-sensitive, which matters especially when they specify
file names. Furthermore, it is possible to put arbitrary comments in the HSD
input after a hash mark (#) character. Everything between this character and
the end of the current line is ignored by the parser.
So let’s have a look at the input blocks, one by one.
Network#
Network = BPNN {
Hidden = 2 2
Activation = tanh
}
The Network block specifies the neural network architecture to use.
Currently, only the Behler-Parrinello-Neural-Network (BPNN) [1] type
is implemented. It is assumed that all sub-nn’s have the same internal
structure, i.e. the same number of hidden layers and neurons per layer. The
Hidden child node controls these parameters: it expects a list of positive
integer values, where each value corresponds to one hidden layer with the
specified number of neurons. The activation (or transfer) function determines
the output of each neuron; its type is controlled by the Activation entry. In
this case, let’s use the hyperbolic tangent. For a complete list of activation
functions, please consult the corresponding section.
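To make the architecture more concrete, the following is a minimal Python sketch of a BPNN-style forward pass. It is not Fortnet's implementation: the function names, the random weight initialization, the linear output layer and the made-up feature values are all assumptions for illustration. It only shows how a `Hidden = 2 2` layout with `tanh` activation maps per-atom feature vectors to one summed prediction.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_subnn(n_inputs, hidden, n_outputs, seed):
    """Stack dense layers: inputs -> hidden layers -> outputs."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, n_outputs]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(subnn, features):
    """Propagate one atomic feature vector through a sub-network."""
    values = list(features)
    for index, (weights, biases) in enumerate(subnn):
        values = [
            sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)
        ]
        # Assumption: tanh on hidden layers, linear output layer.
        if index < len(subnn) - 1:
            values = [math.tanh(v) for v in values]
    return values


def bpnn_output(subnn, atomic_features):
    """Sum the atomic contributions to obtain the system-wide prediction."""
    return sum(forward(subnn, feats)[0] for feats in atomic_features)


# Two atoms of the same element share one sub-network.
# Each atom has 9 made-up input features.
subnn = build_subnn(n_inputs=9, hidden=[2, 2], n_outputs=1, seed=123456)
atoms = [[0.1 * k for k in range(9)], [0.05 * k for k in range(9)]]
total = bpnn_output(subnn, atoms)
print(total)
```

Because all atoms of one element share the same sub-network, the prediction does not depend on the order in which the atoms are listed.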
Features#
Features {
Mapping = ACSF {
Reduce = Yes
Standardization = Yes
Function = Auto {
RCut = 4.0
NRadial = 5
NAngular = 4
}
}
}
Fortnet tries to infer physical or chemical properties of your systems based on
structural information, i.e. the atom types and coordinates. Since these raw
values are unsuitable as network inputs, they have to be mapped to values that
are invariant under translation, rotation and permutation of atoms of the same
type. One well-known set of functions that fulfills this purpose are the
so-called atom-centered symmetry functions (ACSF) by J. Behler [2]. Fortnet
currently implements radial \(G_1, G_2, G_3\) and angular \(G_4, G_5\)
functions, as denoted in the original ACSF paper. In this case Fortnet’s
automatic parameter generation scheme is used to achieve a decent coverage of
the cutoff sphere by utilizing \(G_2\) and \(G_5\) functions. Therefore,
only the number of radial (NRadial) and angular (NAngular) functions, as well
as the cutoff radius (RCut), need to be specified. The unit of the cutoff
radius is Angstrom. Due to the nature of the ACSF, the input values are likely
to span very different orders of magnitude. To compensate for this and to
improve convergence and overall stability, a simple z-score standardization can
be applied in the background before the values are fed into the network.
This behavior is controlled via the Standardization option. The Reduce
entry determines whether the ACSF are element-resolved or unresolved. In the
latter case (Reduce = Yes) the calculated neighbor lists contain all atoms
regardless of their type, which leads to a significant reduction of the number
of input features. However, this requires weighting the individual summands
with atomic prefactors, since otherwise contradictory input features would
arise. Since the dataset at hand only contains silicon atoms, this parameter
may be ignored for now.
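As an illustration of what a radial symmetry function computes, here is a small Python sketch of the \(G_2\) function with the cosine cutoff from the ACSF literature, followed by a z-score standardization of one feature over several structures. The helper names, the value of eta, the neighbor distances and the use of the population standard deviation are assumptions for this example; Fortnet's automatic parameter generation and internal conventions may differ.

```python
import math

BOHR_PER_ANGSTROM = 1.0 / 0.529177210903  # CODATA 2018 Bohr radius in Angstrom


def cutoff(r, rcut):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = rcut."""
    if r > rcut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rcut) + 1.0)


def g2(distances, eta, rs, rcut):
    """Radial G2 value of one atom, given the distances to its neighbors."""
    return sum(
        math.exp(-eta * (r - rs) ** 2) * cutoff(r, rcut) for r in distances
    )


def zscore(values):
    """Standardize one feature over a dataset to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]


rcut = 4.0  # Angstrom, as in the RCut entry
# Made-up example: four equidistant neighbors, isotropically scaled structures.
scales = (0.9, 1.0, 1.1)
feature = [g2([2.35 * s] * 4, eta=1.0, rs=2.0, rcut=rcut) for s in scales]
standardized = zscore(feature)

rcut_bohr = rcut * BOHR_PER_ANGSTROM
print(f"{rcut_bohr:.6f}")
```

The conversion at the end gives roughly 7.5589 Bohr for a cutoff of 4.0 Angstrom. This agrees with the rc value in the standard output shown later in this chapter, which suggests that the ACSF parameters are printed in atomic units.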
Training#
Training = LBFGS {
Threshold = 1e-08
NIterations = 5000
NPrintout = 1000
NSaveNet = 1000
MinDisplacement = 1e-10
MaxDisplacement = 5e-02
LineMin = Yes
Memory = 1000
Loss = mse
}
Fortnet provides different algorithms to successively optimize the weight and
bias parameters of the network during the training iterations. In this example
a limited-memory implementation of the Broyden–Fletcher–Goldfarb–Shanno
algorithm (L-BFGS) is used. For a complete list of the available optimizers,
please consult the corresponding optimizer section. Every optimizer provides
two options to control when to end the training process: the Threshold and the
maximum number of iterations (NIterations). The training is terminated as soon
as one of the two conditions is fulfilled. Furthermore, you must specify the
number of training iterations after which the current loss value and gradient
are printed to stdout (NPrintout) and the current network status is written
out (NSaveNet). For a list of available loss functions, consult the dedicated
Loss Functions section. The remaining settings of the example above are
optional and described in the corresponding L-BFGS optimizer subsection.
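The interplay of the two stop criteria and the printout interval can be sketched in a few lines of Python. This is a toy loop, not Fortnet's optimizer: plain gradient descent stands in for the L-BFGS update, the quadratic loss is hypothetical, and comparing the threshold with the loss value (rather than, say, the gradient) is an assumption made for this example.

```python
def train(loss_and_grad, params, n_iterations, threshold, n_printout, step=0.1):
    """Toy optimizer loop with two stop criteria and periodic printouts."""
    history = []
    reason = "max. iterations reached"
    for i_train in range(1, n_iterations + 1):
        loss, grad = loss_and_grad(params)
        # Plain gradient descent step (stand-in for the L-BFGS update).
        params = [p - step * g for p, g in zip(params, grad)]
        # Record the loss every n_printout iterations.
        if i_train % n_printout == 0:
            history.append((i_train, loss))
        # Assumption: the threshold is compared with the loss value.
        if loss < threshold:
            reason = "threshold reached"
            break
    return params, history, reason


def quadratic(params):
    """Hypothetical loss: sum of squares, with its analytic gradient."""
    loss = sum(p * p for p in params)
    grad = [2.0 * p for p in params]
    return loss, grad


params, history, reason = train(
    quadratic, [1.0, -2.0], n_iterations=5000, threshold=1e-8, n_printout=10
)
print(reason, [i for i, _ in history])
```

In this toy problem the threshold is reached long before the iteration limit, so the loop ends early. In the Fortnet run shown later in this chapter the opposite happens and the iteration limit ends the training.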
Data#
Data {
Dataset = fnetdata.hdf5
NetstatFile = fortnet.hdf5
}
Since the provision of high quality data is key when dealing with neural
networks in general, let’s have a look at the Data block and how to hand over a
dataset. Most importantly, the Dataset entry must be a string pointing to a
compatible HDF5 dataset file (in this case fnetdata.hdf5). A fundamental
design decision of Fortnet is not to support the output files of popular
simulation packages directly. Instead, a separate input format is used,
together with a corresponding Python class that builds on the Atomic
Simulation Environment (ASE), which is also implemented in Python, and makes it
easy to generate a dataset. To see how you get from the output files of your
simulation package of choice to a Fortnet compatible dataset, please consult
the Fnetdata: Generating a Dataset section.
Another useful feature is that the loss function of an external validation
dataset, which is not included in the optimization process, can be monitored
during training. To utilize this so-called validation monitoring, e.g. for
early stopping purposes, provide an additional file path via the Validset
entry:
Data {
.
Validset = fnetvdata.hdf5
}
In this case a dataset file named fnetdata.hdf5 is present in the same folder
as the fortnet_in.hsd input. Feel free to have a look at its content using
your HDF5 viewer of choice.
In addition, the Data block also sets the filename of the so-called netstat
files of the Fortnet world. They define the whole network status and are
needed for a later restart of the training process or for predictions based on
the created potential.
Options#
Options {
Mode = train
ReadNetStats = No
RandomSeed = 123456
}
The basic program behavior is defined in the Options block of the input,
starting with the running mode of Fortnet. There are three valid options:
train, validate and predict. As in this example, the train mode optimizes the
network with respect to the targets provided by the dataset. To resume the
training process based on an existing netstat file, set the ReadNetStats entry
to Yes. To validate the resulting networks or to predict the properties of
structures for which they are unknown, the other two modes are used; they are
explained in the First Predictions with Fortnet section.
The reproducibility of results is particularly important in scientific fields of
application. To meet this requirement, Fortnet provides a RandomSeed entry.
By setting a seed you define the initial state of the luxury random number
generator [3, 4, 5] that works in the background and is responsible for the
outcome of the initialization of the sub-nn’s and therefore of the training
process in general. This entry is optional; if it is not set by the user, a
seed is generated randomly. Fortnet prints out the random seed of the current
run, so keep it in case you need to reproduce the results later.
Warning
A few words of warning about reproducibility: In theory, all the results you
obtain using Fortnet are reproducible, since the RandomSeed entry enables
the user to define the initial state of the random number generators used by
the project. However, due to the non-associativity of floating-point
operations, it has been observed that reproducibility holds for a fixed
machine, compiler and number of MPI processes, but as soon as one of these
parameters changes you will get different results.
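The floating-point effect behind this warning is easy to reproduce. The short Python example below shows that regrouping the same summands changes the rounded result, which is what happens when partial sums are formed in a different order, for instance with a different number of processes. The split into two "ranks" is purely illustrative and does not model Fortnet's actual parallelization.

```python
# Regrouping the same three numbers changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same effect with partial sums, as formed by two hypothetical ranks.
a, b, c, d = 1.0e16, 1.0, -1.0e16, 1.0
serial = ((a + b) + c) + d        # one process adds left to right
two_ranks = (a + b) + (c + d)     # each rank sums its half first
print(serial, two_ranks)  # 1.0 0.0
```

Differences of this kind start at the level of the last bits, but an iterative optimization can amplify them over thousands of training steps.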
Running Fortnet#
As soon as all files have been generated and are present in their correct
location, you are ready to execute Fortnet. To do so, invoke the fnet binary
without any arguments in the directory containing the fortnet_in.hsd file.
As mentioned above, Fortnet writes some information to the standard output.
Therefore it is recommended to tee this output for later investigation:
fnet | tee output
In most cases Fortnet will be compiled with MPI parallelism enabled. To make use of the associated speedup, issue:
mpirun -np 4 fnet | tee output
or something equivalent. Note: It may be necessary to provide the absolute path
to the fnet binary in this case.
Examining the Output#
Fortnet uses two output channels: 1) the standard output (which you should redirect into a file to keep for later evaluation) and 2) various output files. Both channels are outlined below in the context of a training scenario.
Standard Output#
In the following, the standard output is broken down and explained piece by piece, in the order in which it appears on the screen, starting with the header:
|==============================================================================|
| Fortnet - A BPNN Implementation, Version 0.3 |
| |
| Copyright (C) 2020 - 2021 T. W. van der Heide |
|==============================================================================|
date: 15.08.2021
time: 13:13:08, +0200
As you may have seen, nothing spectacular is happening here. Nevertheless, the version number as well as date and time of the binary execution can be important information in retrospect.
Interpreting input file 'fortnet_in.hsd'
Checking Input Consistency...passed
Processed input written as HSD to 'fortnet_pin.hsd'
--------------------------------------------------------------------------------
As the next step, Fortnet parses and interprets the fortnet_in.hsd input
file and carries out some basic consistency checks on the obtained parameters.
Additionally, the input as Fortnet sees and interprets it is stored in the
fortnet_pin.hsd file.
You will also see a list of information from the HSD input, as printed below:
Initialisation
running in training mode
random seed: 123456
read initial netstats: F
--------------------------------------------------------------------------------
Sub-NN Details
inputs: 9
hidden layers: 2 2
outputs: 1
activation: tanh
--------------------------------------------------------------------------------
ACSF Mappings
species-resolved: F
nr. of radial functions: 5
nr. of angular functions: 4
g2: rc = 7.558904, rs = .000000, eta = .805987,
atomId = 0
g2: rc = 7.558904, rs = 1.889726, eta = .805987,
atomId = 0
g2: rc = 7.558904, rs = 3.779452, eta = .805987,
atomId = 0
g2: rc = 7.558904, rs = 5.669178, eta = .805987,
atomId = 0
g2: rc = 7.558904, rs = 7.558904, eta = .805987,
atomId = 0
g5: rc = 7.558904, lambda = 1.000000, eta = .080599, xi = 1.000000
atomId = 0
g5: rc = 7.558904, lambda = -1.000000, eta = .080599, xi = 1.000000
atomId = 0
g5: rc = 7.558904, lambda = 1.000000, eta = .080599, xi = 16.000000
atomId = 0
g5: rc = 7.558904, lambda = -1.000000, eta = .080599, xi = 16.000000
atomId = 0
--------------------------------------------------------------------------------
Dataset Information
found: 25 datapoints (25 unique ones)
in file: fnetdata.hdf5
total sub-nn parameters: 29
targets per parameter: .8621
--------------------------------------------------------------------------------
The entry targets per parameter is of particular importance. Based on this
ratio you can roughly deduce whether the selected network size is suitable
for the dataset that was provided. It is calculated in terms of unique
datapoints, by solely considering the unweighted geometry-target pairs.
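The numbers in the printout above can be checked by hand. The Python sketch below assumes the usual counting for a fully connected network, one weight per connection and one bias per neuron; with 9 inputs, hidden layers of 2 and 2 neurons and 1 output this reproduces the 29 parameters reported, and 25 unique datapoints then give the printed ratio. The function name is made up for this example.

```python
def count_parameters(n_inputs, hidden, n_outputs):
    """Count weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden, n_outputs]
    return sum(
        n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])
    )


n_params = count_parameters(n_inputs=9, hidden=[2, 2], n_outputs=1)
ratio = 25 / n_params  # 25 unique datapoints, one target each

print(n_params)        # 29
print(f"{ratio:.4f}")  # 0.8621
```

A ratio below one means that there are fewer targets than adjustable parameters, which is worth keeping in mind when judging the risk of overfitting.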
Up to this stage of binary execution, the input was parsed and the dataset read.
The Calculating ACSF statement tells us that Fortnet has started to map the
structure information to input-suitable ACSF values. As soon as the word done
appears, this process is complete and the training process starts:
Calculating ACSF...done
Starting training...
iTrain MSE-Loss Gradients
--------------------------------------------------------------------
1000 0.187303E-04 0.548379E-03
2000 0.224850E-04 0.121142E-02
3000 0.411225E-05 0.315495E-03
4000 0.118531E-05 0.820748E-03
5000 0.142334E-06 0.205431E-03
--------------------------------------------------------------------
Training finished (max. Iterations reached)
--------------------------------------------------------------------
Loss Analysis (global min.)
iTrain: 5000, Loss: 1.423336E-07
--------------------------------------------------------------------
While the training process is running, the trajectory of the loss function and
the total gradient of the network parameters are printed regularly, depending on
the NPrintout setting of the Training block. In this case, the termination
criterion is the maximum number of training iterations. After completion of
the training, the iteration with the lowest loss value is written out.
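If the standard output was saved with tee, the loss table can be extracted for plotting or further analysis. The following Python sketch is one possible way to do this; the function name is hypothetical and the parser simply keeps every line that consists of one integer followed by two floating-point numbers, which matches the table layout shown above.

```python
def parse_loss_table(text):
    """Return (iteration, loss, gradient) tuples found in Fortnet's output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
        except ValueError:
            # Not a table row (e.g. the column header line), skip it.
            continue
    return rows


sample = """\
Starting training...
iTrain MSE-Loss Gradients
--------------------------------------------------------------------
1000 0.187303E-04 0.548379E-03
2000 0.224850E-04 0.121142E-02
3000 0.411225E-05 0.315495E-03
4000 0.118531E-05 0.820748E-03
5000 0.142334E-06 0.205431E-03
--------------------------------------------------------------------
Training finished (max. Iterations reached)
"""

rows = parse_loss_table(sample)
best = min(rows, key=lambda row: row[1])
print(best[0])  # 5000
```

To apply it to a real run, read the saved file first, for example with `parse_loss_table(open("output").read())`, where `output` is the file name used in the tee command above.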
Output Files#
Depending on the program behavior selected in the input file (i.e. the running mode), different output files are created. Running the current example, a single file is written to disk apart from the redirected standard output: fortnet.hdf5. The average user does not have to look into this file. It solely contains information about the program state, which is necessary for a later resumption of the training process or for predictions based on the resulting network potential.
In fact, the relevant output fnetout.hdf5 is only created in validation or
prediction mode and is introduced in the next section.
If the full trajectory of the loss function and total gradient is of interest,
it can be written out as iterout.dat by setting the corresponding entry
(default: No):
Options {
.
.
.
WriteIterationTrajectory = Yes
}
The column order of the output in iterout.dat is analogous to the standard
output.