Commit 3233046c authored by Massimiliano Bonomi's avatar Massimiliano Bonomi

add error analysis

parent 458b403b
@@ -11,6 +11,8 @@ Once this tutorial is completed students will be able to:
- Write the PLUMED input file to perform metadynamics simulations
- Calculate the free energy from a metadynamics run
- Monitor the behavior of variables in a metadynamics run
- Estimate the error in reconstructed free energies using block analysis
- Assess the convergence of a metadynamics simulation
\section master-ISDD-2-resources Resources
@@ -23,6 +25,7 @@ cd handson_2
The \tarball{master-ISDD-2} for this project contains the following files:
- diala.pdb: a PDB file for alanine dipeptide in vacuo
- topol.tpr: a GROMACS run file to perform MD of alanine dipeptide
- do_block_fes.py: a Python script to perform error analysis
The archive can be unpacked using the following command:
\verbatim
@@ -109,7 +112,7 @@ We will play with a toy system, alanine dipeptide simulated in vacuo using the A
force field (see Fig. \ref master-ISDD-2-ala-fig).
This rather simple molecule is useful to understand data analysis and free-energy methods.
This system is a nice example because it presents two metastable states separated by a high free-energy barrier.
It is customary to characterize the two states in terms of Ramachandran dihedral angles, which are denoted with \f$ \Phi \f$ and \f$ \Psi \f$ in Fig. \ref master-ISDD-2-transition-fig.
\anchor master-ISDD-2-ala-fig
\image html belfast-2-ala.png "The molecule of the day: alanine dipeptide."
@@ -227,7 +230,7 @@ plumed sum_hills --hills HILLS
The command above generates a file called `fes.dat` in which the free-energy surface as a function
of \f$ \phi \f$ is calculated on a regular grid. One can modify the default name of the free-energy file,
as well as the boundaries and bin size of the grid, by using the following options of \ref sum_hills:
\verbatim
--outfile - specify the outputfile for sumhills
@@ -269,9 +272,9 @@ of your metadynamics simulation!
\note The two observations above are necessary but only qualitative conditions for convergence.
For a quantitative analysis, please have a look at one of our tutorials about error analysis, such as
\ref trieste-4 , or wait for \ref master-ISDD-2-ex-4.
\subsection master-ISDD-2-ex-3 Exercise 3: reweighting
In the previous exercise we biased \f$\phi\f$ and computed the free energy as a function of
the same variable. Often you want to decide which variable to analyze _after_
@@ -342,13 +345,14 @@ of the file should look like this:
\endverbatim
The last column will give us, in energy units, the logarithm of the weight of each frame.
You can easily obtain the weight of each frame using the expression \f$w\propto\exp\left(\frac{V(s)}{k_BT}\right)\f$
(umbrella-sampling-like reweighting). You might want to read the `COLVAR` file in Python and compute a weighted histogram.
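Such a weighted histogram could be sketched in Python with numpy as follows. This is only an illustration, not one of the tutorial scripts: the function name and the synthetic double-well data are made up for the example.

```python
import numpy as np

def weighted_fes(cv, bias, kbt, nbins=50, cv_range=(-np.pi, np.pi)):
    """Free energy from a biased trajectory via a weighted histogram,
    using umbrella-sampling-like weights w proportional to exp(V/kBT)."""
    w = np.exp((bias - np.max(bias)) / kbt)  # shift by max(V) to avoid overflow
    hist, edges = np.histogram(cv, bins=nbins, range=cv_range, weights=w)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):       # empty bins give +inf free energy
        fes = -kbt * np.log(hist)
    return centers, fes - np.min(fes)        # set the global minimum to zero

# toy demo: synthetic double-well samples with zero bias (uniform weights)
rng = np.random.default_rng(1)
cv = np.concatenate((rng.normal(-1.5, 0.3, 5000), rng.normal(1.0, 0.3, 5000)))
centers, fes = weighted_fes(cv, np.zeros_like(cv), kbt=2.494339)
```

In the real exercise, `cv` and `bias` would instead come from the appropriate columns of the `COLVAR` file (e.g. via `np.loadtxt`).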
If you want PLUMED to do the histograms for you, you can just add the following
lines to the previous PLUMED input file:
\plumedfile
# use the metadynamics bias as argument
as: REWEIGHT_BIAS ARG=__FILL__
hhphi: HISTOGRAM ARG=phi STRIDE=50 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.1 LOGWEIGHTS=as
@@ -367,12 +371,83 @@ gnuplot> p "ffphi.dat"
gnuplot> p "ffpsi.dat"
\endverbatim
\subsection master-ISDD-2-ex-4 Exercise 4: estimating the error in free-energies using block-analysis
In the previous exercise, we calculated the final bias over the entire metadynamics simulation and used
this quantity to reweight the (biased) simulation. The same quantity is useful to calculate the error in the
reconstructed free energy and to assess whether our simulation is converged.
Let's calculate the "un-biasing weights" using the umbrella-sampling-like approach and the COLVAR file obtained
at the end of \ref master-ISDD-2-ex-3:
\verbatim
# find maximum value of bias
bmax=`awk 'BEGIN{max=0.}{if($1!="#!" && $4>max)max=$4}END{print max}' COLVAR`
# print phi values and weights
awk '{if($1!="#!") print $2,exp(($4-bmax)/kbt)}' kbt=2.494339 bmax=$bmax COLVAR > phi.weight
\endverbatim
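The same un-biasing weights can be computed in Python. Subtracting the maximum bias before exponentiating, which is the role of `bmax` above, keeps the exponentials in a safe numerical range. A minimal sketch on made-up bias values (the function name is invented for the example):

```python
import numpy as np

def unbias_weights(bias, kbt=2.494339):
    """Umbrella-sampling-like weights w proportional to exp(V/kBT), shifted by
    max(V) so that the largest weight is exactly 1 and no overflow can occur."""
    bias = np.asarray(bias, dtype=float)
    return np.exp((bias - np.max(bias)) / kbt)

# made-up bias values: a frame whose bias is lower by exactly kBT gets weight 1/e
kbt = 2.494339  # kBT in kJ/mol at 300 K
w = unbias_weights([10.0, 10.0 - kbt], kbt)
```

Applied to the fourth column of `COLVAR`, this reproduces the weights written to `phi.weight` by the awk command above.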
If you inspect the `phi.weight` file, you will see that each line contains the value of the
dihedral \f$ \phi \f$ along with the corresponding weight:
\verbatim
0.907347 0.0400579
0.814296 0.0169656
1.118951 0.0651276
1.040781 0.0714174
1.218571 0.0344903
1.090823 0.0700568
1.130800 0.0622998
\endverbatim
At this point we can apply the block-analysis technique (more info in the
\ref trieste-2 tutorial) to calculate, for different block sizes, the average free energy
and its error. For your convenience, you can use the `do_block_fes.py` Python
script to read the `phi.weight` file and produce the desired output.
We use a bash loop to scan block sizes from 1 to 1000 in strides of 10:
\verbatim
for i in `seq 1 10 1000`; do python3 do_block_fes.py phi.weight 1 -3.141593 3.018393 51 2.494339 $i; done
\endverbatim
For each value of block length `N`, you will obtain a separate `fes.N.dat` file, containing the value
of the \f$ \phi \f$ variable on a grid, the average free-energy, and the associated error (in kJ/mol):
\verbatim
-3.141593 23.184653 0.080659
-3.018393 17.264462 0.055181
-2.895194 13.360259 0.047751
-2.771994 10.772696 0.043548
-2.648794 9.403544 0.042022
\endverbatim
Finally, we can calculate the average error along the free-energy profile as a function of the block length:
\verbatim
for i in `seq 1 10 1000`; do a=`awk '{tot+=$3}END{print tot/NR}' fes.$i.dat`; echo $i $a; done > err.blocks
\endverbatim
and visualize it using `gnuplot`:
\verbatim
gnuplot> p "err.blocks" u 1:2 w lp
\endverbatim
As expected, the error increases with the block size until it reaches a plateau, which occurs once the
block size exceeds the correlation length between data points (Fig. \ref master-ISDD-2-block-phi).
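This plateau behaviour can be reproduced on synthetic data. The sketch below is illustrative only (the AR(1) process and its parameters are made up): block averaging applied to a correlated time series underestimates the error when blocks are shorter than the correlation time, and the estimate grows towards a plateau as the blocks become longer.

```python
import numpy as np

def block_error(x, bsize):
    """Standard error of the mean of x estimated from non-overlapping blocks."""
    nblock = len(x) // bsize
    blocks = np.reshape(x[:nblock * bsize], (nblock, bsize)).mean(axis=1)
    return np.std(blocks, ddof=1) / np.sqrt(nblock)

# synthetic correlated series: AR(1) noise with a correlation time of ~40 steps
rng = np.random.default_rng(0)
x = np.empty(20000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + rng.normal()

for bsize in (1, 10, 100, 1000):
    print(bsize, block_error(x, bsize))  # the error grows, then flattens out
```

The block size at which the estimate flattens out is a direct measure of the correlation length of the data.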
\anchor master-ISDD-2-block-phi
\image html trieste-4-block-phi.png "Block analysis of a metadynamics simulation using phi as CV"
What can we learn from this analysis about the convergence of the metadynamics simulation?
\section master-ISDD-2-conclusions Conclusions
In summary, in this tutorial you should have learned how to use PLUMED to:
- Setup and run a metadynamics calculation.
- Compute free energies from the metadynamics bias potential using the \ref sum_hills utility.
- Reweight a metadynamics simulation.
- Calculate errors and assess convergence.
*/
do_block_fes.py
import math
import sys

# read FILE with CVs and weights
FILENAME_ = sys.argv[1]
# number of CVs for FES
NCV_ = int(sys.argv[2])
# read minimum, maximum and number of bins for FES grid
gmin = []; gmax = []; nbin = []
for i in range(0, NCV_):
    i0 = 3*i + 3
    gmin.append(float(sys.argv[i0]))
    gmax.append(float(sys.argv[i0+1]))
    nbin.append(int(sys.argv[i0+2]))
# read KBT_
KBT_ = float(sys.argv[3*NCV_+3])
# block size
BSIZE_ = int(sys.argv[-1])

def get_indexes_from_index(index, nbin):
    # convert a flat grid index into multi-dimensional grid indexes
    indexes = []
    # get first index
    indexes.append(index % nbin[0])
    # loop over the remaining dimensions (integer division for Python 3)
    kk = index
    for i in range(1, len(nbin)-1):
        kk = (kk - indexes[i-1]) // nbin[i-1]
        indexes.append(kk % nbin[i])
    if len(nbin) >= 2:
        indexes.append((kk - indexes[len(nbin)-2]) // nbin[len(nbin)-2])
    return indexes

def get_indexes_from_cvs(cvs, gmin, dx):
    # map CV values onto the nearest grid point
    keys = []
    for i in range(0, len(cvs)):
        keys.append(int(round((cvs[i] - gmin[i]) / dx[i])))
    return tuple(keys)

def get_points(key, gmin, dx):
    # CV values corresponding to a grid point
    xs = []
    for i in range(0, len(key)):
        xs.append(gmin[i] + float(key[i]) * dx[i])
    return xs

# define bin size
dx = []
for i in range(0, NCV_):
    dx.append((gmax[i] - gmin[i]) / float(nbin[i] - 1))
# total number of bins
nbins = 1
for i in range(0, len(nbin)): nbins *= nbin[i]
# read file and store lists
cv_list = []; w_list = []
for line in open(FILENAME_, "r").readlines():
    fields = line.strip().split()
    # check format
    if len(fields) != NCV_ and len(fields) != NCV_ + 1:
        print(FILENAME_, "is in the wrong format!")
        sys.exit(1)
    # read CVs
    cvs = []
    for i in range(0, NCV_): cvs.append(float(fields[i]))
    # get grid indexes
    key = get_indexes_from_cvs(cvs, gmin, dx)
    # read weight, if present
    if len(fields) == NCV_ + 1:
        w = float(fields[NCV_])
    else:
        w = 1.0
    # store into lists
    cv_list.append(key)
    w_list.append(w)
# total number of data points
ndata = len(cv_list)
# number of blocks
nblock = ndata // BSIZE_
# prepare histo dictionaries
histo_ave = {}; histo_ave2 = {}
# cycle on blocks
for iblock in range(0, nblock):
    # define the range of data points in this block
    i0 = iblock * BSIZE_
    i1 = i0 + BSIZE_
    # build the weighted histogram of the block
    histo = {}
    for i in range(i0, i1):
        if cv_list[i] in histo: histo[cv_list[i]] += w_list[i]
        else: histo[cv_list[i]] = w_list[i]
    # calculate average histo in the block
    for key in histo: histo[key] /= float(BSIZE_)
    # add to global histo dictionaries
    for key in histo:
        if key in histo_ave:
            histo_ave[key] += histo[key]
            histo_ave2[key] += histo[key] * histo[key]
        else:
            histo_ave[key] = histo[key]
            histo_ave2[key] = histo[key] * histo[key]
# print out fes and error
log = open("fes." + str(BSIZE_) + ".dat", "w")
# this is needed to add a blank line between gnuplot data blocks
xs_old = []
for i in range(0, nbins):
    # get the indexes in the multi-dimensional grid
    key = tuple(get_indexes_from_index(i, nbin))
    # get CV values for that grid point
    xs = get_points(key, gmin, dx)
    # add a blank line for gnuplot
    if i == 0:
        xs_old = xs[:]
    else:
        flag = 0
        for j in range(1, len(xs)):
            if xs[j] != xs_old[j]:
                flag = 1
                xs_old = xs[:]
        if flag == 1: log.write("\n")
    # print value of CVs
    for x in xs:
        log.write("%12.6lf " % x)
    # calculate fes
    nb = float(nblock)
    if key in histo_ave:
        # average and variance of the histogram across blocks
        aveh = histo_ave[key] / nb
        s2h = (histo_ave2[key] / nb - aveh * aveh) * nb / (nb - 1.0)
        # error on the average
        errh = math.sqrt(s2h / nb)
        # free energy and propagated error
        fes = -KBT_ * math.log(aveh)
        errf = KBT_ / aveh * errh
        # printout
        log.write(" %12.6lf %12.6lf\n" % (fes, errf))
    else:
        log.write(" Infinity\n")
log.close()