First Training with Fortnet
[Input: recipes/basics/firsttrain/]
This chapter is a tutorial that guides you through your first network optimization using Fortnet. The example dataset is the \(E\)-\(V\) (energy-volume) scan of a primitive silicon unit cell in the diamond phase. The procedure is split into three major steps:
providing an appropriate input to Fortnet,
running Fortnet,
analysing the results.
After this tutorial, you will be familiar with the basic features of Fortnet and ready to start your own project.
Providing the Input
Fortnet accepts the input in the Human-readable Structured Data (HSD) format. The input file must be called fortnet_in.hsd and in this example looks as follows:
Network = BPNN {
  Hidden = 2 2
  Activation = 'tanh'
}
Mapping = ACSF {
  NRadial = 5
  NAngular = 4
  RCut = 4.0
  Standardization = Yes
}
Training = LBFGS {
  Threshold = 1e-08
  NIterations = 5000
  NPrintout = 1000
  NSaveNet = 1000
  MinDisplacement = 1e-10
  MaxDisplacement = 5e-02
  LineMin = Yes
  Memory = 1000
  Loss = 'mse'
}
Data {
  Dataset = 'training_data'
  Standardization = No
  NetstatFiles = Type2FileNames {
    Prefix = "./"
    Suffix = ".net"
    LowerCaseTypeName = No
  }
}
Options {
  Mode = 'train'
  ReadNetStats = No
  RandomSeed = 123456
}
The order of the blocks in the HSD input is arbitrary. Keywords are
case-insensitive, so you are free to capitalise them as you like. String
values, however, are case-sensitive, which matters especially when they specify
file names. Furthermore, you can put arbitrary comments in the HSD input
after a hash mark (#) character. Everything between this character and the
end of the current line is ignored by the parser.
So let’s have a look at the input blocks, one by one.
Network
Network = BPNN {
  Hidden = 2 2
  Activation = 'tanh'
}
The Network block specifies the neural network architecture to use.
Currently, only the Behler-Parrinello neural network (BPNN) [1] type
is implemented. All sub-nn’s are assumed to have the same internal
structure, i.e. the same number of hidden layers and neurons per layer. The
Hidden child node controls these parameters. It expects a list of positive
integer values, where each value corresponds to a hidden layer with the
specified number of neurons (here, two hidden layers with two neurons each).
The output of a neuron is determined by its activation (or transfer) function,
whose type is controlled by the Activation entry. In this case, let’s use the
hyperbolic tangent. For a complete list of activation functions, please consult
the corresponding section.
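To make the architecture concrete, the following minimal Python sketch evaluates a network with the layout used in this example: 9 inputs (the 5 radial plus 4 angular ACSF values requested in the Mapping block), two hidden layers with two neurons each, and one output. It is an illustration only and not Fortnet's implementation. The function names, the uniform weight initialization, the zero biases and the use of tanh on the output layer are all assumptions. In a BPNN the prediction for a structure is the sum of the sub-nn outputs of its atoms, which the last statement mimics for two atoms with made-up input values.

```python
import math
import random


def init_layers(sizes, seed=123456):
    """Create random weights and zero biases for consecutive layer sizes.

    The initialization scheme is an assumption chosen for illustration.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one input vector through all layers, applying tanh each time."""
    values = list(inputs)
    for weights, biases in layers:
        values = [math.tanh(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values


# Layout of this tutorial: 9 inputs, hidden layers "2 2", 1 output.
layers = init_layers([9, 2, 2, 1])

# Hypothetical input vectors for the two atoms of a primitive cell.
atom_inputs = [[0.1] * 9, [0.2] * 9]

# BPNN idea: the structure property is the sum of the atomic contributions.
total = sum(forward(layers, atom)[0] for atom in atom_inputs)
print(total)
```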
Mapping
Mapping = ACSF {
  NRadial = 5
  NAngular = 4
  RCut = 4.0
  Standardization = Yes
}
Fortnet tries to infer physical or chemical properties of your systems based on
structural information, i.e. the atom types and coordinates. These raw
values are unsuitable as network inputs for several reasons, so they have to be
mapped to values that are invariant under translation, rotation and permutation
of atoms of the same type. One well-known set of functions that fulfills this
purpose is the so-called atom-centered symmetry functions (ACSF) by
J. Behler [2]. Fortnet currently implements the radial \(G_2\) and angular
\(G_5\) functions, as denoted in the original ACSF paper. Fortnet calculates
their parameters automatically, so that a decent coverage of the sphere defined
by the cutoff radius is guaranteed. Therefore, only the number of radial
(NRadial) and angular (NAngular) functions, as well as the cutoff radius
(RCut), need to be specified. The unit of the cutoff radius is Angstrom.
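For orientation, the radial function has the form \(G_2 = \sum_j e^{-\eta (r_{ij} - r_s)^2} f_c(r_{ij})\) with the cosine cutoff \(f_c(r) = \tfrac{1}{2}[\cos(\pi r / r_c) + 1]\) for \(r < r_c\) and zero otherwise. The sketch below evaluates it for a list of neighbor distances. The values of eta and r_shift are hypothetical, since Fortnet chooses its own parameters automatically, and the neighbor distances are only approximate values for diamond-phase silicon.

```python
import math


def cutoff(r, r_cut):
    """Cosine cutoff function: decays smoothly to zero at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def g2(distances, eta, r_shift, r_cut):
    """Radial G2 value of one atom from its neighbor distances (Angstrom)."""
    return sum(math.exp(-eta * (r - r_shift) ** 2) * cutoff(r, r_cut)
               for r in distances)


# Approximate first and second neighbor shells of diamond-phase silicon.
neighbors = [2.35] * 4 + [3.84] * 12

# eta and r_shift are illustrative assumptions, not Fortnet's parameters.
value = g2(neighbors, eta=1.0, r_shift=2.0, r_cut=4.0)
print(value)
```

Because the function only depends on interatomic distances and sums over all neighbors, the result does not change when the structure is translated or rotated or when the neighbor list is reordered.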
Due to the nature of the ACSF, the input values are likely to differ by several
orders of magnitude. To compensate for this and improve convergence and overall
stability, a simple z-score standardization can be applied in the background
before the values are fed into the network. This behavior is
controlled via the Standardization option.
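A z-score standardization shifts every input feature to zero mean and scales it to unit standard deviation. The sketch below shows the idea for a small table of made-up feature values (one row per atom, one column per ACSF). The helper name and the use of the population standard deviation are assumptions, and Fortnet's internal handling may differ.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column of a table to zero mean and unit deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # Constant columns (zero deviation) are mapped to 0.0.
        [(x - m) / s if s > 0.0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two made-up features of very different magnitude.
features = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]]
standardized = zscore_columns(features)
for row in standardized:
    print(row)
```

After the transformation both columns cover the same range, so neither feature dominates the network input merely because of its scale.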
Training
Training = LBFGS {
  Threshold = 1e-08
  NIterations = 5000
  NPrintout = 1000
  NSaveNet = 1000
  MinDisplacement = 1e-10
  MaxDisplacement = 5e-02
  LineMin = Yes
  Memory = 1000
  Loss = 'mse'
}
Fortnet provides different algorithms to successively optimize the weight and
bias parameters of the network during the training iterations. This example
uses a limited-memory implementation of the Broyden–Fletcher–Goldfarb–Shanno
algorithm (L-BFGS). For a complete list of the available optimizers,
please consult the corresponding optimizer section. Every
optimizer provides two options to control when the training process ends: the
Threshold and the maximum number of iterations (NIterations). The training
terminates as soon as one of the two conditions is fulfilled. Furthermore,
you must specify the number of training iterations after which the current
loss value and gradient are printed to stdout (NPrintout) and the current
network status is written out (NSaveNet). For a list of available loss
functions, consult the dedicated Loss Functions section. The
remaining settings of the example above are optional and described in the
corresponding L-BFGS optimizer subsection.
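The interplay of the two stopping criteria and the printout interval can be sketched with a toy optimizer. The example below uses plain gradient descent on a one-dimensional function as a stand-in. It is not L-BFGS and not Fortnet's code. It also assumes that the threshold is compared against the loss value, which the text above does not specify. All names are hypothetical.

```python
def train(loss, gradient, x, n_iterations=5000, threshold=1e-8,
          n_printout=1000, step=0.1):
    """Minimize a 1D loss by gradient descent with two stopping criteria."""
    printed = []
    for i_train in range(1, n_iterations + 1):
        x = x - step * gradient(x)
        current = loss(x)
        # Record loss and gradient every n_printout iterations.
        if i_train % n_printout == 0:
            printed.append((i_train, current, abs(gradient(x))))
        # Criterion 1 (assumed to act on the loss): threshold reached.
        if current < threshold:
            return x, i_train, "threshold reached", printed
    # Criterion 2: the maximum number of iterations is exhausted.
    return x, n_iterations, "max. iterations reached", printed


# Toy problem: the loss (x - 3)^2 has its minimum at x = 3.
x_opt, n_done, reason, printed = train(
    loss=lambda x: (x - 3.0) ** 2,
    gradient=lambda x: 2.0 * (x - 3.0),
    x=0.0,
    n_printout=10,
)
print(reason, "after", n_done, "iterations")
for i_train, loss_value, grad_value in printed:
    print(i_train, loss_value, grad_value)
```

In the toy problem the threshold is reached long before the iteration limit. In the silicon example of this tutorial the opposite happens, and the training stops at the maximum number of iterations.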
Data
Data {
  Dataset = 'training_data'
  Standardization = No
  NetstatFiles = Type2FileNames {
    Prefix = "./"
    Suffix = ".net"
    LowerCaseTypeName = No
  }
}
Since the provision of high-quality data is key when dealing with neural
networks in general, let’s have a look at the Data block and how to hand over a
dataset. Most importantly, the Dataset entry must be a string pointing to a
file that lists the paths to the so-called fnetdata.xml files. Each
of those files defines a datapoint that consists of a geometry and the target
values to optimize the network for. A fundamental design decision of Fortnet is
not to support the output files of popular simulation packages natively.
Instead, a separate input format is used, and a corresponding Python class,
based on the Atomic Simulation Environment (ASE), makes it easy to generate a
dataset. To see how you get from the output
files of your simulation package of choice to a Fortnet-compatible dataset,
please consult the Generating a Dataset section.
Another useful feature is that the loss of an external validation
dataset, which is not included in the optimization process, can be monitored
during training. To use this so-called validation monitoring, e.g. for early
stopping purposes, provide an additional pathfile via the Validset entry:
Data {
  .
  Validset = 'validation_data'
}
In this example, the pathfile named training_data is located in the same folder
as the fortnet_in.hsd input:
25
./dataset/point_01
./dataset/point_02
./dataset/point_03
.
.
.
The first line contains an integer that specifies the number of fnetdata.xml
paths listed in the file. It is followed by the relative (or absolute)
paths to the directories containing the fnetdata.xml files. Note
that there is no ‘/’ at the end of each path, because Fortnet appends
/fnetdata.xml for you. Analogous to the Mapping block, there is an option
(Standardization) to perform a simple z-score standardization of the target
values.
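A pathfile of this layout is easy to generate with a few lines of Python. The sketch below writes the entry count followed by one directory per line and strips trailing slashes, as described above. The helper name write_pathfile is hypothetical and not part of Fortnet. The demonstration writes into a temporary directory so that no existing files are touched.

```python
import tempfile
from pathlib import Path


def write_pathfile(pathfile, datapoint_dirs):
    """Write a pathfile: the number of entries, then one directory per line."""
    # Strip trailing slashes, since "/fnetdata.xml" gets appended to each path.
    dirs = [str(d).rstrip("/") for d in datapoint_dirs]
    lines = [str(len(dirs))] + dirs
    Path(pathfile).write_text("\n".join(lines) + "\n")
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "training_data"
    write_pathfile(target, [
        "./dataset/point_01",
        "./dataset/point_02/",  # the trailing slash is removed on writing
        "./dataset/point_03",
    ])
    content = target.read_text()

print(content)
```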
In addition, the Data block handles the naming scheme of the files
containing all the properties of a single sub-nn of the BPNN, called netstat
files in the Fortnet world. The most convenient method, especially for datasets
with multiple atom types, is to use the Type2FileNames option. In this case
the only necessary entries are the prefix and suffix of the files and whether to
use lower case characters only (optional, default: No). The parser will then
build appropriate filenames (./Si.net, ./C.net, …) based on the atom types
found in the dataset at hand. Although not recommended, the output paths and
filenames can also be specified manually, e.g. if different folders are
desired:
NetstatFiles {
  Si = '/home/user/Silicon.net'
}
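The naming scheme of the Type2FileNames option can be pictured as a simple string concatenation of prefix, atom type and suffix. The following sketch reproduces the filenames mentioned above. It is an assumption about the resulting names based on this description, not Fortnet's parser, and the function name is hypothetical.

```python
def netstat_filenames(atom_types, prefix="./", suffix=".net", lower_case=False):
    """Map each atom type found in a dataset to a netstat filename."""
    names = {}
    for atom_type in sorted(set(atom_types)):
        label = atom_type.lower() if lower_case else atom_type
        names[atom_type] = f"{prefix}{label}{suffix}"
    return names


# Atom types as they might occur in a dataset with two species.
print(netstat_filenames(["Si", "C", "Si"]))
print(netstat_filenames(["Si"], lower_case=True))
```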
Options
Options {
  Mode = 'train'
  ReadNetStats = No
  RandomSeed = 123456
}
The basic program behavior is defined in the Options block of the input,
starting with the running mode of Fortnet. There are three valid options:
train, validate, predict. As in this example, the train mode
optimizes the network with respect to the targets provided by the dataset. To
resume a training process based on existing netstat files, set the
ReadNetStats entry to Yes. To validate the
resulting networks or to make predictions for structures with unknown
properties, the other two modes are used. They are explained in the
First Predictions with Fortnet section.
The reproducibility of results is particularly important in scientific fields of
application. To meet this requirement, Fortnet provides a RandomSeed entry.
By setting a seed you define the initial state of the luxury random number
generator [3, 4, 5] that works in the background and
determines the initialization of the sub-nn’s and
therefore the training process in general.
Warning
A few words of warning about reproducibility: In theory, all the results you
obtain using Fortnet are reproducible, since the RandomSeed entry enables
you to define the initial state of the random number generators used by
the project. However, because floating-point operations are not associative,
it has been observed that results are only reproducible for a fixed
machine, compiler and number of MPI processes. As soon as one of these
parameters changes, you will get different results.
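The effect is easy to demonstrate: floating-point addition depends on the order in which the terms are grouped, and a different number of MPI processes or a different compiler typically changes exactly this grouping. The following generic Python example is independent of Fortnet.

```python
# The same three numbers, added with different grouping.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)
print(right_to_left)
print(left_to_right == right_to_left)
```

The two results differ in the last digit. Such tiny deviations can add up over thousands of training iterations and lead to visibly different networks.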
Running Fortnet
As soon as all files have been generated and are present in their correct
location, you are ready to execute Fortnet. To do so, invoke the fnet binary
without any arguments in the directory containing the fortnet_in.hsd file.
As mentioned above, Fortnet writes some information to the standard output.
Therefore it is recommended to tee this output for later investigation:
fnet | tee output
In most cases Fortnet will be compiled with MPI parallelism enabled. To make use of the associated speedup, issue:
mpirun -np 4 fnet | tee output
or something equivalent. Note: It may be necessary to provide the absolute path
to the fnet binary in this case.
Examining the Output
Fortnet uses two output channels: 1) the standard output (which you should redirect into a file to keep for later evaluation) and 2) various output files. Both channels are outlined below in the context of a training scenario.
Standard Output
In the following, the standard output is broken down and explained piece by piece, in the order in which it appears on the screen, starting with the header:
|==============================================================================|
| Fortnet - A BPNN Implementation, Version 0.2 |
| |
| Copyright (C) 2020 - 2021 T. W. van der Heide |
|==============================================================================|
date: 21.06.2021
time: 09:13:06, +0200
As you may have seen, nothing spectacular is happening here. Nevertheless, the version number as well as date and time of the binary execution can be important information in retrospect.
Interpreting input file 'fortnet_in.hsd'
Checking Input Consistency...passed
Processed input written as HSD to 'fortnet_pin.hsd'
--------------------------------------------------------------------------------
As the next step, Fortnet parses and interprets the fortnet_in.hsd input
file and carries out some basic consistency checks on the obtained parameters.
In addition, the input as Fortnet interprets it is stored in the
fortnet_pin.hsd file.
You will also see a summary of the settings from the HSD input, as printed below:
Initialisation
running in training mode
random seed: 123456
read initial netstats: F
--------------------------------------------------------------------------------
Sub-NN Details
inputs: 9
hidden layers: 2 2
outputs: 1
activation: tanh
--------------------------------------------------------------------------------
ACSF Mappings
cutoff: 4.0000 Angstrom
nr. of radial functions: 5
nr. of angular functions: 4
species identifier:
Si: 1.000000
atom id index: /
Standardization: T
--------------------------------------------------------------------------------
Dataset Information
found: 25 geometries (25 unique ones)
in pathfile: training_data
total sub-nn parameters: 29
targets per parameter: .8621
--------------------------------------------------------------------------------
The entry targets per parameter is of particular importance. Based on this
ratio you can roughly judge whether the selected network size is suitable
for the dataset that was provided. It is calculated in terms of unique
datapoints, by solely considering the unweighted geometry-target pairs.
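The numbers in the printout can be retraced by hand. The sketch below assumes fully connected layers with one bias per neuron, which reproduces the 29 parameters reported above, and it assumes one target per geometry, which reproduces the printed ratio. The function name is hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


n_inputs = 5 + 4                # NRadial + NAngular
sizes = [n_inputs, 2, 2, 1]     # inputs, hidden layers "2 2", one output
n_params = count_parameters(sizes)

n_targets = 25                  # unique geometries found in the dataset
ratio = n_targets / n_params

print(n_params)
print(round(ratio, 4))
```

The count breaks down into 9·2 + 2 = 20 parameters for the first hidden layer, 2·2 + 2 = 6 for the second and 2·1 + 1 = 3 for the output layer. Only one sub-nn enters the total because silicon is the only atom type in the dataset.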
Up to this stage of the execution, the input has been parsed and the dataset read.
The Calculating ACSF statement tells us that Fortnet has started to map the
structure information to input-suitable ACSF values. As soon as the word done
appears, this mapping is complete and the training process starts:
Calculating ACSF...done
Starting training...
iTrain MSE-Loss Gradients
--------------------------------------------------------------------
1000 0.186609E-04 0.145044E+00
2000 0.770420E-05 0.668376E-03
3000 0.405024E-05 0.307614E-03
4000 0.240987E-05 0.134383E-03
5000 0.119669E-05 0.836493E-04
--------------------------------------------------------------------
Training finished (max. Iterations reached)
--------------------------------------------------------------------
Loss Analysis (global min.)
iTrain: 5000, Loss: 1.196695E-06
--------------------------------------------------------------------
While the training process is running, the trajectory of the loss function and
the total gradient of the network parameters are printed at the interval set by
the NPrintout entry of the Training block. In this case, the
termination criterion is the maximum number of training iterations. After
completion of the training, the iteration with the lowest loss value is written
out.
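If you saved the standard output to a file, the loss trajectory can be extracted for plotting or further analysis. The sketch below parses the three-column table with a regular expression. It assumes the column layout shown above (iTrain, loss, gradient) and uses an excerpt of the printout as input instead of reading a file. The names are hypothetical.

```python
import re

# A row consists of an integer and two numbers in Fortran E notation.
ROW = re.compile(r"^\s*(\d+)\s+([0-9.]+E[+-]\d+)\s+([0-9.]+E[+-]\d+)\s*$")


def parse_trajectory(text):
    """Extract (iteration, loss, gradient) tuples from the training printout."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)),
                         float(match.group(2)),
                         float(match.group(3))))
    return rows


output = """\
Starting training...
iTrain MSE-Loss Gradients
--------------------------------------------------------------------
1000 0.186609E-04 0.145044E+00
2000 0.770420E-05 0.668376E-03
3000 0.405024E-05 0.307614E-03
4000 0.240987E-05 0.134383E-03
5000 0.119669E-05 0.836493E-04
--------------------------------------------------------------------
Training finished (max. Iterations reached)
"""

rows = parse_trajectory(output)
best = min(rows, key=lambda row: row[1])
print(len(rows), "rows, lowest printed loss at iteration", best[0])
```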
Output Files
Depending on the running mode set in the input file, different output files are created. For the current example, two files are written to disk, apart from the redirected standard output: acsf.out and Si.net. The average user does not have to look into either of these files. They only contain information about the ACSF mappings and the status of the silicon network, which is necessary for a later resumption of the training process or for predictions based on the resulting network potential.
In fact, the relevant output fnetout.xml is only created in validation or
prediction mode and is introduced in the next section.
If the full trajectory of the loss function and total gradient is of interest,
it can be written out as iterout.dat by setting the corresponding entry
(default: No):
Options {
  .
  .
  .
  WriteIterationTrajectory = Yes
}
The column order of the output in iterout.dat is analogous to the standard
output.