Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
/**
\page lugano-5 Lugano tutorial: Dimensionality reduction
\section lugano-5-aim Aims
This tutorial will show you how to you can use PLUMED to perform dimensionality reduction. The tutorial will try not
to focus on the application of one particular algorithm but will instead try to show you the principles behind the
implementation of these algorithms that has been adopted within PLUMED. By the end of the tutorial you will thus be
able to design your own dimensionality reduction algorithm.
\section lugano-5-lo Objectives
Once this tutorial is completed students will
- Be able to use \ref COLLECT_FRAMES to store a trajectory for later analysis
- Be able to use \ref PCA to perform principal component analysis
- Be able to construct a dissimilarity matrix using \ref EUCLIDEAN_DISSIMILARITIES
- Be able to select a subset of landmark points to analyze with particular dimensionality reduction algorithm.
- Be able to construct low dimensional representations using \ref CLASSICAL_MDS and \ref SKETCH_MAP.
- Be able to generate projections of non-landmark points by using \ref PROJECT_ALL_ANALYSIS_DATA
\section lugano-5-resources Resources
The \tarball{lugano-4} for this project contains the following files:
- beta-hairpin.pdb : A pdb file containing the protein that we are going to study in this tutorial in a beta hairpin configuration. This input will be used as a template so that we can use the names of special groups in many of the inputs that follow.
In addition, you will also need to get a copy of the trajectory that we will be analyzing in this tutorial by executing the following command:
\verbatim
wget https://github.com/plumed/lugano2019/raw/master/handson_5/traj.dcd
\endverbatim
The trajectory we are analyzing is a smaller version of the trajectory that was analyzed in the following paper:
- https://www.frontiersin.org/articles/10.3389/fmolb.2019.00046/full
In this paper the trajectory was analyzed with a variety of different dimensionality reduction algorithms and the
results were compared. The paper may, therefore, be of interest.
This tutorial has been tested on v2.5 but it should also work with other versions of PLUMED.
Also notice that the `.solutions` direction of the tarball contains correct input files for the exercises.
Please only look at these files once you have tried to solve the problems yourself. Similarly the tutorial
below contains questions for you to answer that are shown in bold. You can reveal the answers to these questions
by clicking on links within the tutorial but you should obviously try to answer things yourself before reading these
answers.
\section lugano-5-intro Introduction
In all of the previous tutorials we have used functions that take the position of all the atoms in the system - a \f$3N\f$
dimensional vector, where \f$N\f$ is the number of atoms as input. This function then outputs a single number - the value of the collective variable -
that tells us where in a low dimensional space we should project that configuration. Problems can arise because this collective-variable function
is many-to-one and it may thus be difficult to distinguish between every different pair of structurally distinct conformers of our system.
In this tutorial we are going to introduce an alternative approach to this business of finding collective variables. In this alternative
approach we are going to stop trying to seek out a function that can take any configuration of the atoms (any \f$3N\f$-dimensional vector) and find its
low dimensional projection on the collective variable axis. Instead we are going to take a set of configurations of the atoms (a set of \f$3N\f$-dimensional
vectors of atom positions) and try to find a sensible set of projections for these configurations. We are going to find this low dimensional representation
by seeking out <a href="http://en.wikipedia.org/wiki/Isometry"> an isometry </a> between the space containing the \f$3N\f$-dimensional vectors of atom positions
and some lower-dimensional space. This idea is explained in more detail in the following <a href="https://www.youtube.com/watch?v=ofC2qz0_9_A&feature=youtu.be"> video </a>
and details on the various algorithms that we are using in the tutorial can be found in:
- https://arxiv.org/abs/1907.04170
As you will find out if you read the chapter that is linked above there are multiple ways to construct an isometric embedding of a trajectory. This tutorial will thus try to teach you a set of basic
ideas and will then encourage you to experiment and to develop your own strategy for representing the data set.
\section lugano-5-exercises Exercises
\subsection lugano-5-starting Collecting the trajectory
The first thing that we need to learn to do in order to run these dimensionality reduction algorithms is to store the trajectory so that we can analyze in later. The following input (once
the blanks are filled in) will take the positions of the non-hydrogen atoms in our protein and store them every 1 step in an object that we can refer to later in the input using the label data. All
the configurations stored in data will then be output to a pdb file once the whole trajectory is read in. Fill in the blanks in the input below now:
\plumedfile
# This reads in the template pdb file and thus allows us to use the @nonhydrogens
# special group later in the input
MOLINFO STRUCTURE=__FILL__ MOLTYPE=protein
# This stores the positions of all the nonhydrogen atoms for later analysis
cc: COLLECT_FRAMES __FILL__=@nonhydrogens
# This should output the atomic positions for the frames that were collected to a pdb file called traj.pdb
OUTPUT_ANALYSIS_DATA_TO_PDB USE_OUTPUT_DATA_FROM=__FILL__ FILE=traj.pdb
\endplumedfile
Then, once all the blanks are filled in, run the command using:
\verbatim
plumed driver --mf_dcd traj.dcd
\endverbatim
Notice that the above input stored the atomic positions of the atoms. We can use the atomic positions in many of the dimensionality reductions that will be discussed later in this tutorial or
we can use a high-dimensional vector of collective variables. The following input thus gives an example of which shows you can compute and store the values the Ramachandran angles of the protein
took in all the trajectory frames so that they can be analyzed using a dimensionality reduction algorithm. Try to fill in the blanks on this input and then run this form of analysis on the trajectory
using the command above once more:
\plumedfile
# This reads in the template pdb file and thus allows us to use the @nonhydrogens
# special group later in the input
MOLINFO STRUCTURE=__FILL__ MOLTYPE=protein
# The following commands compute all the Ramachandran angles of the protein for you
r2-phi: TORSION ATOMS=@phi-2
r2-psi: TORSION ATOMS=@psi-2
r3-phi: TORSION ATOMS=@phi-3
r3-psi: TORSION ATOMS=@psi-3
r4-phi: TORSION __FILL__
r4-psi: TORSION __FILL__
r5-phi: TORSION __FILL__
r5-psi: TORSION __FILL__
r6-phi: TORSION __FILL__
r6-psi: TORSION __FILL__
r7-phi: TORSION __FILL__
r7-psi: TORSION __FILL__
r8-phi: TORSION __FILL__
r8-psi: TORSION __FILL__
r9-phi: TORSION __FILL__
r9-psi: TORSION __FILL__
r10-phi: TORSION __FILL__
r10-psi: TORSION __FILL__
r11-phi: TORSION __FILL__
r11-psi: TORSION __FILL__
r12-phi: TORSION __FILL__
r12-psi: TORSION __FILL__
r13-phi: TORSION __FILL__
r13-psi: TORSION __FILL__
r14-phi: TORSION __FILL__
r14-psi: TORSION __FILL__
r15-phi: TORSION __FILL__
r15-psi: TORSION __FILL__
r16-phi: TORSION __FILL__
r16-psi: TORSION __FILL__
# This command stores all the Ramachandran angles that were computed
cc: COLLECT_FRAMES __FILL__=r2-phi,r2-psi,r3-phi,r3-psi,r4-phi,r4-psi,r5-phi,r5-psi,r6-phi,r6-psi,r7-phi,r7-psi,r8-phi,r8-psi,r9-phi,r9-psi,r10-phi,r10-psi,r11-phi,r11-psi,r12-phi,r12-psi,r13-phi,r13-psi,r14-phi,r14-psi,r15-phi,r15-psi,r16-phi,r16-psi
# This command outputs all the Ramachandran angles that were stored to a file called angles_data
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=__FILL__ ARG=cc.* FILE=angles_data
\endplumedfile
\subsection lugano-5-pca Performing PCA
Having learned how to store data for later analysis with a dimensionality reduction algorithm lets now apply principal component analysis (PCA) upon
our stored data. In principal component analysis a low dimensional projections for our trajectory are constructed by:
- Computing a covariance matrix from the trajectory data
- Diagonalizing the covariance matrix.
- Calculating the projection of each trajectory frame on a subset of the eigenvectors of the covariance matrix.
To perform PCA using PLUMED we are going to use the following input with the blanks filled in:
\plumedfile
# This reads in the template pdb file and thus allows us to use the @nonhydrogens
# special group later in the input
MOLINFO STRUCTURE=__FILL__ MOLTYPE=protein
# This stores the positions of all the nonhydrogen atoms for later analysis
cc: COLLECT_FRAMES __FILL__=@nonhydrogens
# This diagonalizes the covariance matrix
pca: PCA USE_OUTPUT_DATA_FROM=__FILL__ METRIC=OPTIMAL NLOW_DIM=2
# This projects each of the trajectory frames onto the low dimensional space that was
# identified by the PCA command
dat: PROJECT_ALL_ANALYSIS_DATA USE_OUTPUT_DATA_FROM=__FILL__ PROJECTION=__FILL__
# This should output the atomic positions for the frames that were collected and analyzed using PCA
OUTPUT_ANALYSIS_DATA_TO_PDB USE_OUTPUT_DATA_FROM=__FILL__ FILE=traj.pdb
# This should output the PCA projections of all the coordinates
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=__FILL__ ARG=dat.* FILE=pca_data
# These next three commands calculate the secondary structure variables. These
# variables measure how much of the structure resembles an alpha helix, an antiparallel beta sheet
# and a parallel beta sheet. Configurations that have different secondary structures should be projected
# in different parts of the low dimensional space.
alpha: ALPHARMSD RESIDUES=all
abeta: ANTIBETARMSD RESIDUES=all STRANDS_CUTOFF=1.0
pbeta: PARABETARMSD RESIDUES=all STRANDS_CUTOFF=1.0
# These commands collect and output the secondary structure variables so that we can use this information to
# determine how good our projection of the trajectory data is.
cc2: COLLECT_FRAMES ARG=alpha,abeta,pbeta
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=cc2 ARG=cc2.* FILE=secondary_structure_data
\endplumed
To generate the projection you run the command:
\verbatim
plumed driver --mf_dcd traj.dcd
\endverbatim
I would recommend visualizing this data using the GISMO plugin to VMD. You can find instructions on how to compile this code on the page below:
http://epfl-cosmo.github.io/sketchmap/index.html?page=code
(you don't need to compile the sketch-map code) Once GISMO is installed you should have an option to open it when you open vmd. The option
to open GISMO can be found under Extensions>Analysis>GISMO. To visualize the results from what we have just done you should need to follow
the following instructions:
- Open vmd and load the pdb file that was output: traj.pdb
- Open GISMO and load the pca projections file: pca_data
- Open GISMO and load the secondary structure variables: secondary_structure_data
- You can safely ignore the error message that GISMO will give at this stage.
- Now choose to plot the quantities dat.coord-1 and dat.coord-2 on the x and y axis respectively. Color the points using cc2.alpha.
If you follow the instructions above you should get an image like the one shown below:
\anchor lugano-5-gismo
\image html lugano-5-gismo.png "Figure created using GISMO that shows where each frame of the trajectory is projected in the low-dimensional space. Points are colored in accordance with the alpha helical content of the structure."
You can click on the various points in the plot and VMD will show you the structure in the corresponding trajectory frame. Furthermore, you can get a particularly useful representation of the structures by adding the following
text to your ~/.vmdrc file:
\verbatim
user add key m {
puts "Automatic update of secondary structure, and alignment to first frame"
trace variable vmd_frame w structure_trace
rmsdtt
rmsdtt::doAlign
destroy $::rmsdtt::w
clear_reps top
mol color Structure
mol selection backbone
mol representation NewCartoon
mol addrep top
}
\endverbatim
With this text in your ~/.vmdrc file VMD will align all the structures with the first frame and then show the cartoon representation of each structure when you press the m button on your keyboard
\subsection lugano-5-mds Performing MDS
In the previous section we performed PCA on the atomic positions directly. In the section before last, however, we also saw how we can store high-dimensional vectors of collective variables and then
use these vectors as input to a dimensionality reduction algorithm. We might legitimately ask, therefore, if we can do PCA using these high-dimensional vectors as input rather than atomic positions.
The answer to this question is yes as long as the CV is not periodic. If any of our CVs are not periodic we cannot analyze them using the \ref PCA action. We can, however, formulate the PCA algorithm
in a different way. In this alternative formulation, which is known as classical multidimensional scaling (MDS) we do the following:
- We calculate the matrix of distances between configurations
- We perform an operation known as centering the matrix.
- We diagonalize the centered matrix
- The eigenvectors multiplied by the square root of the corresponding eigenvalue can then be used as a set of projections for our input points.
This method is used less often the PCA as the matrix that we have to diagonalize here in the third step can be considerably larger than the matrix that we have to diagonalize when we perform PCA. In fact
in order to avoid this expensive diagonalization step we often select a subset of so called landmark points on which to run the algorithm directly. Projections for the remaining points are then found
by using a so-called out-of-sample procedure. This is what has been done in the following input:
\plumedfile
This reads in the template pdb file and thus allows us to use the @nonhydrogens
# special group later in the input
MOLINFO STRUCTURE=beta-hairpin.pdb MOLTYPE=protein
# This stores the positions of all the nonhydrogen atoms for later analysis
cc: COLLECT_FRAMES ATOMS=@nonhydrogens
# This should output the atomic positions for the frames that were collected and analyzed using MDS
OUTPUT_ANALYSIS_DATA_TO_PDB USE_OUTPUT_DATA_FROM=cc FILE=traj.pdb
# The following commands compute all the Ramachandran angles of the protein for you
r2-phi: TORSION ATOMS=@phi-2
r2-psi: TORSION ATOMS=@psi-2
r3-phi: TORSION ATOMS=@phi-3
r3-psi: TORSION ATOMS=@psi-3
r4-phi: TORSION __FILL__
r4-psi: TORSION __FILL__
r5-phi: TORSION __FILL__
r5-psi: TORSION __FILL__
r6-phi: TORSION __FILL__
r6-psi: TORSION __FILL__
r7-phi: TORSION __FILL__
r7-psi: TORSION __FILL__
r8-phi: TORSION __FILL__
r8-psi: TORSION __FILL__
r9-phi: TORSION __FILL__
r9-psi: TORSION __FILL__
r10-phi: TORSION __FILL__
r10-psi: TORSION __FILL__
r11-phi: TORSION __FILL__
r11-psi: TORSION __FILL__
r12-phi: TORSION __FILL__
r12-psi: TORSION __FILL__
r13-phi: TORSION __FILL__
r13-psi: TORSION __FILL__
r14-phi: TORSION __FILL__
r14-psi: TORSION __FILL__
r15-phi: TORSION __FILL__
r15-psi: TORSION __FILL__
r16-phi: TORSION __FILL__
r16-psi: TORSION __FILL__
# This command stores all the Ramachandran angles that were computed
angles: COLLECT_FRAMES __FILL__=r2-phi,r2-psi,r3-phi,r3-psi,r4-phi,r4-psi,r5-phi,r5-psi,r6-phi,r6-psi,r7-phi,r7-psi,r8-phi,r8-psi,r9-phi,r9-psi,r10-phi,r10-psi,r11-phi,r11-psi,r12-phi,r12-psi,r13-phi,r13-psi,r14-phi,r14-psi,r15-phi,r15-psi,r16-phi,r16-psi
# Lets now compute the matrix of distances between the frames in the space of the Ramachandran angles
distmat: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROM=__FILL__ METRIC=EUCLIDEAN
# Now select 500 landmark points to analyze
fps: LANDMARK_SELECT_FPS USE_OUTPUT_DATA_FROM=__FILL__ NLANDMARKS=500
# Run MDS on the landmarks
mds: CLASSICAL_MDS __FILL__=fps NLOW_DIM=2
# Project the remaining trajectory data
osample: PROJECT_ALL_ANALYSIS_DATA USE_OUTPUT_DATA_FROM=__FILL__ PROJECTION=__FILL__
# This command outputs all the projections of all the points in the low dimensional space
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=__FILL__ ARG=osample.* FILE=mds_data
# These next three commands calculate the secondary structure variables. These
# variables measure how much of the structure resembles an alpha helix, an antiparallel beta sheet
# and a parallel beta sheet. Configurations that have different secondary structures should be projected
# in different parts of the low dimensional space.
alpha: ALPHARMSD RESIDUES=all
abeta: ANTIBETARMSD RESIDUES=all STRANDS_CUTOFF=1.0
pbeta: PARABETARMSD RESIDUES=all STRANDS_CUTOFF=1.0
# These commands collect and output the secondary structure variables so that we can use this information to
# determine how good our projection of the trajectory data is.
cc2: COLLECT_FRAMES ARG=alpha,abeta,pbeta
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=cc2 ARG=cc2.* FILE=secondary_structure_data
\endplumedfile
This input collects all the torsional angles for the configurations in the trajectory. Then, at the end of the calculation, the matrix of distances between these points is computed and a set of landmark points
is selected using a method known as farthest point sampling. A matrix that contains only those distances between the landmarks is then constructed and diagonalized by the \ref CLASSICAL_MDS action so that
projections of the landmarks can be constructed. The final step is then to project the remainder of the trajectory using the \ref PROJECT_ALL_ANALYSIS_DATA action. Try to fill in the blanks in the input above
and run this calculation now using the command:
\verbatim
plumed driver --mf_dcd traj.dcd
\endverbatim
Once the calculation has completed you can, once again, visualize the data generated using the GISMO plugin.
\subsection lugano-5-smap Performing sketch-map
The two algorithms (PCA and MDS) that we have looked at thus far are both linear dimensionality reduction algorithms. In addition to these there are a whole class of non-linear dimensionality reduction
reduction algorithms which work by transforming the matrix of dissimilarities between configurations, calculating geodesic rather than Euclidean distances between configurations or by changing the form of the
loss function that is optimized. In this final exercise we are going to use an algorithm that uses the last of the these three strategies to construct a non-linear projection. The algorithm is known as sketch-map
and an input for sketch-map is provided below:
\plumedfile
# This reads in the template pdb file and thus allows us to use the @nonhydrogens
# special group later in the input
MOLINFO STRUCTURE=__FILL__ MOLTYPE=protein
# This stores the positions of all the nonhydrogen atoms for later analysis
cc: COLLECT_FRAMES __FILL__=@nonhydrogens
# This should output the atomic positions for the frames that were collected and analyzed using MDS
OUTPUT_ANALYSIS_DATA_TO_PDB USE_OUTPUT_DATA_FROM=__FILL__ FILE=traj.pdb
# The following commands compute all the Ramachandran angles of the protein for you
r2-phi: TORSION ATOMS=@phi-2
r2-psi: TORSION ATOMS=@psi-2
r3-phi: TORSION ATOMS=@phi-3
r3-psi: TORSION ATOMS=@psi-3
r4-phi: TORSION __FILL__
r4-psi: TORSION __FILL__
r5-phi: TORSION __FILL__
r5-psi: TORSION __FILL__
r6-phi: TORSION __FILL__
r6-psi: TORSION __FILL__
r7-phi: TORSION __FILL__
r7-psi: TORSION __FILL__
r8-phi: TORSION __FILL__
r8-psi: TORSION __FILL__
r9-phi: TORSION __FILL__
r9-psi: TORSION __FILL__
r10-phi: TORSION __FILL__
r10-psi: TORSION __FILL__
r11-phi: TORSION __FILL__
r11-psi: TORSION __FILL__
r12-phi: TORSION __FILL__
r12-psi: TORSION __FILL__
r13-phi: TORSION __FILL__
r13-psi: TORSION __FILL__
r14-phi: TORSION __FILL__
r14-psi: TORSION __FILL__
r15-phi: TORSION __FILL__
r15-psi: TORSION __FILL__
r16-phi: TORSION __FILL__
r16-psi: TORSION __FILL__
# This command stores all the Ramachandran angles that were computed
angles: COLLECT_FRAMES __FILL__=r2-phi,r2-psi,r3-phi,r3-psi,r4-phi,r4-psi,r5-phi,r5-psi,r6-phi,r6-psi,r7-phi,r7-psi,r8-phi,r8-psi,r9-phi,r9-psi,r10-phi,r10-psi,r11-phi,r11-psi,r12-phi,r12-psi,r13-phi,r13-psi,r14-phi,r14-psi,r15-phi,r15-psi,r16-phi,r16-psi
# Lets now compute the matrix of distances between the frames in the space of the Ramachandran angles
distmat: EUCLIDEAN_DISSIMILARITIES USE_OUTPUT_DATA_FROM=__FILL__ METRIC=EUCLIDEAN
# Now select 500 landmark points to analyze
fps: LANDMARK_SELECT_FPS USE_OUTPUT_DATA_FROM=__FILL__ NLANDMARKS=500
# Run sketch-map on the landmarks
smap: SKETCH_MAP __FILL__=fps NLOW_DIM=2 HIGH_DIM_FUNCTION={SMAP R_0=6 A=8 B=2} LOW_DIM_FUNCTION={SMAP R_0=6 A=2 B=2} CGTOL=1E-3 CGRID_SIZE=20 FGRID_SIZE=200 ANNEAL_STEPS=0
# Project the remaining trajectory data
osample: PROJECT_ALL_ANALYSIS_DATA USE_OUTPUT_DATA_FROM=__FILL__ PROJECTION=__FILL__
# This command outputs all the projections of all the points in the low dimensional space
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=__FILL__ ARG=osample.* FILE=smap_data
# These next three commands calculate the secondary structure variables. These
# variables measure how much of the structure resembles an alpha helix, an antiparallel beta sheet
# and a parallel beta sheet. Configurations that have different secondary structures should be projected
# in different parts of the low dimensional space.
alpha: ALPHARMSD RESIDUES=all
abeta: ANTIBETARMSD RESIDUES=all STRANDS_CUTOFF=1.0
pbeta: PARABETARMSD RESIDUES=all STRANDS_CUTOFF=1.0
# These commands collect and output the secondary structure variables so that we can use this information to
# determine how good our projection of the trajectory data is.
cc2: COLLECT_FRAMES ARG=alpha,abeta,pbeta
OUTPUT_ANALYSIS_DATA_TO_COLVAR USE_OUTPUT_DATA_FROM=cc2 ARG=cc2.* FILE=secondary_structure_data
\endplumedfile
This input collects all the torsional angles for the configurations in the trajectory. Then, at the end of the calculation, the matrix of distances between these points is computed and a set of landmark points
is selected using a method known as farthest point sampling. A matrix that contains only those distances between the landmarks is then constructed and diagonalized by the \ref CLASSICAL_MDS action and this
set of projections is used as the initial configuration for the various minimization algorithms that are then used to optimize the sketch-map stress function. As in the previous exercise once the projections of
the landmarks are found the projections for the remainder of the points in the trajectory are found by using the \ref PROJECT_ALL_ANALYSIS_DATA action. Try to fill in the blanks in the input above
and run this calculation now using the command:
\verbatim
plumed driver --mf_dcd traj.dcd
\endverbatim
Once the calculation has completed you can, once again, visualize the data generated using the GISMO plugin.
\section lugano-5-extensions Conclusions and extensions
This tutorial shown you that running dimensionality reduction algorithms using PLUMED involves the following stages:
- Data is collected from the trajectory using \ref COLLECT_FRAMES.
- Landmark points are selected using a \ref landmarks algorithm
- The distances between the trajectory frames are computed using \ref EUCLIDEAN_DISSIMILARITIES
- A loss function is optimized in order to generate projections of the landmarks.
- Projections of the non-landmark points are generated using \ref PROJECT_ALL_ANALYSIS_DATA.
There are multiple choices to be made in each of the various stages described above. For example, you can change the particular sort of data this is collected from the
trajectory, there are multiple different ways to select landmarks, you can use the distances directly or you can transform them, you can use various different loss function and you can
optimize the loss function using a variety of different algorithms. In this final exercise of the tutorial I thus want you to experiment with these various different choices that can
be made. Use the data set that we have been working with throughout this tutorial and try to construct an interesting representation of it using some combination of Actions that we have
not explored in the tutorial. Some things you can perhaps try:
- Try sketch-map with RMSD distances as input rather than angles
- Try using different \ref landmarks algorithms
- Try using different numbers of landmarks
- Try to use PCA followed by sketch-map
- See if you can work out how to draw contour plot showing the free energy as a function of the low-dimensional coordinates.
*/
link: @subpage lugano-5
description: How to perform dimensionality reduction using PLUMED
additional-files: lugano-5