This page describes how to run constant pH replica exchange MD (pH-REMD) in which replicas are run at the same temperature but different pH values. Note, this tutorial is for implicit (Generalized Born) solvent. After you learn how to run CpHMD in explicit solvent (click here for tutorial), you can set up each replica to run in explicit solvent instead.
Note: You must have at least version 14 of Amber and AmberTools to run pH-REMD.
Table of Contents
|
Creating the input files
Creating the input files for pH-REM simulations is identical to creating input files for normal constant pH MD (CpHMD) simulations.
Generating your structures
For this part, I will assume that you have a PDB file from a crystal structure or similar. For the purposes of this example, I will use the crystal structure of HEWL from the PDB 1AKI. You can download it here. You then have to modify the PDB file to get rid of all of the crystallographic waters and information at the top. The result is here
This step is handled with LEaP (particularly tleap) from AmberTools. Follow the steps below to generate your topology files
- Determine which residues you wish to titrate. Depending on which residues you wish to titrate, you must substitute titrating residues for their non-titrating versions. That is, Amber knows how to titrate the aspartate residue AS4 (an aspartate with 4 hydrogens — syn- and anti- positions on both carboxylate oxygens), but it does not know how to titrate ASP or ASH. The substitutions you must make are listed below (note some residues do not require a name change):
- ASP -> AS4
- GLU -> GL4
- HIS -> HIP
- LYS -> LYS
- TYR -> TYR
- CYS -> CYS
- Start tleap, loading the constant pH LEaP resource file: tleap -f leaprc.constph
- Save your topology file (loading any extra parameters if necessary)
Starting tleap with leaprc.constph
should look something like this:
bash$ tleap -f leaprc.constph
-I: Adding /home/swails/build_amber/amber/dat/leap/prep to search path.
-I: Adding /home/swails/build_amber/amber/dat/leap/lib to search path.
-I: Adding /home/swails/build_amber/amber/dat/leap/parm to search path.
-I: Adding /home/swails/build_amber/amber/dat/leap/cmd to search path.
-f: Source leaprc.constph.
Welcome to LEaP!
(no leaprc in search path)
Sourcing: /home/swails/build_amber/amber/dat/leap/cmd/leaprc.constph
----- Source: /home/swails/build_amber/amber/dat/leap/cmd/leaprc.ff10
----- Source of /home/swails/build_amber/amber/dat/leap/cmd/leaprc.ff10 done
Log file: ./leap.log
Loading parameters: /home/swails/build_amber/amber/dat/leap/parm/parm10.dat
Reading title:
PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA
Loading library: /home/swails/build_amber/amber/dat/leap/lib/amino10.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/aminoct10.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/aminont10.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/phosphoaa10.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/nucleic10.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/ions08.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/solvents.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/constph.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/all_prot_nucleic10.lib
Loading library: /home/swails/build_amber/amber/dat/leap/lib/cph_nucleic_caps.lib
Loading parameters: /home/swails/build_amber/amber/dat/leap/parm/frcmod.constph
Reading force field modification type file (frcmod)
Reading title:
Force field modifcations for titrations
Loading parameters: /home/swails/build_amber/amber/dat/leap/parm/frcmod.protonated_nucleic
Reading force field modification type file (frcmod)
Reading title:
Force field modifications for protonated nucleic acids
Using H(N)-modified Bondi radii
> [Insert leap commands here]
The resulting prmtop can be be created using the following tleap script (don't forget about the disulfides!):
source leaprc.constph
# Load the PDBs
1aki = loadPDB 1AKI_modified.pdb
# Load ion force field params
loadAmberParams frcmod.ionsjc_tip3p
# Make the disulfide bonds
bond 1aki.6.SG 1aki.127.SG
bond 1aki.30.SG 1aki.115.SG
bond 1aki.64.SG 1aki.80.SG
bond 1aki.76.SG 1aki.94.SG
# Save topology files
saveAmberParm 1aki 1AKI.dry.parm7 1AKI.dry.rst7
# Quit
quit
Creating the CPIN file
Now that you have a topology file and an input coordinate file, you must create an input file that tells sander what parts of your system you wish to titrate. Use the cpinutil.py script, included with AmberTools, to accomplish this. It is used with the following syntax:
bash$ $ cpinutil.py -h
usage: cpinutil.py [Options]
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-d, --debug Enable verbose tracebacks to debug this program
Output files:
-o FILE, --output FILE
Output file. Defaults to standard output
-op FILE, --output-prmtop FILE
For explicit solvent simulations, a custom set of
radii are necessary to obtain reasonable results for
carboxylate pKas (e.g., AS4 and GL4 residues). If
specified, this file will be the prmtop compatible
with the reference energies in the printed cpin file.
Required Arguments:
-p FILE Topology file to be used in constant pH simulation
Simulation Options:
-igb IGB Generalized Born model which you intend to use to
evaluate dynamics (or protonation state swaps).
Default is 2.
-intdiel DIEL Internal dielectric constant to use in the evaluation
of the GB potential. Default 1.0.
Residue Selection Options:
-resnames [RES [RES ...]]
Residue names to include in CPIN file
-notresnames [RES [RES ...]]
Residue names to exclude from CPIN file
-resnums [NUM [NUM ...]]
Residue numbers to include in CPIN file
-notresnums [NUM [NUM ...]]
Residue numbers to exclude from CPIN file
-minpKa pKa Minimum reference pKa to include in CPIN file
-maxpKa pKa Maximum reference pKa to include in CPIN file
System Information:
-states [NUM [NUM ...]]
List of default states to assign to titratable
residues
-system <system name>
Name of system to titrate. No effect on simulation.
Residue Information:
If any options here are used, no CPIN file will be written. These
arguments take precedence and are mutually exclusive with each other.
--describe [RESNAME [RESNAME ...]]
Print out the details of given residues
-l, --list List all titratable residues
--counter-ions Transform a water molecule into a chloride ion when
deprotonating (or a water into a chloride while
protonating) to maintain charge neutrality
This program will read a topology file and generate a cpin file for constant
pH simulations with sander
The only required flag is -p prmtop — the rest of the flags are there to give you control over which residues you allow to titrate. You can pick specific residue numbers (in which the first residue is residue 1) using the -resnum flag or specific residue names using the -resname flag. Likewise, -notresnum and -notresname can be used to omit certain residue names and numbers. We can also pre-define the protonation states for each residue using the -states flag.
The original implementation of CpHMD by Mongan, et. al. used the igb=2 Generalized Born model, so that is what we use here.
bash$ cpinutil.py -p 1AKI.prmtop -resname AS4,GL4,HIP -igb 2 -o 1AKI.cpin
CPin created!
bash$ cat 1AKI.cpin
&CNSTPH
CHRGDAT=-0.4157,0.2719,0.0145,0.0779,-0.0398,-0.0173,-0.0173,0.0136,-0.0425,
-0.0425,0.8054,-0.8188,-0.8188,0.0,0.5973,-0.5679,0.0,0.0,0.0,-0.4157,0.2719,
0.0145,0.0779,-0.0071,0.0256,0.0256,-0.0174,0.043,0.043,0.6801,-0.5838,-0.6511,
0.4641,0.5973,-0.5679,0.0,0.0,0.0,-0.4157,0.2719,0.0145,0.0779,-0.0071,0.0256,
0.0256,-0.0174,0.043,0.043,0.6801,-0.5838,-0.6511,0.0,0.5973,-0.5679,0.4641,
0.0,0.0,-0.4157,0.2719,0.0145,0.0779,-0.0071,0.0256,0.0256,-0.0174,0.043,0.043,
0.6801,-0.6511,-0.5838,0.0,0.5973,-0.5679,0.0,0.4641,0.0,-0.4157,0.2719,0.0145,
0.0779,-0.0071,0.0256,0.0256,-0.0174,0.043,0.043,0.6801,-0.6511,-0.5838,0.0,
0.5973,-0.5679,0.0,0.0,0.4641,-0.3479,0.2747,-0.1354,0.1212,-0.0414,0.081,
0.081,-0.0012,-0.1513,0.3866,-0.017,0.2681,-0.1718,0.3911,-0.1141,0.2317,
0.7341,-0.5894,-0.3479,0.2747,-0.1354,0.1212,-0.111,0.0402,0.0402,-0.0266,
-0.3811,0.3649,0.2057,0.1392,-0.5727,0.0,0.1292,0.1147,0.7341,-0.5894,-0.3479,
0.2747,-0.1354,0.1212,-0.1012,0.0367,0.0367,0.1868,-0.5432,0.0,0.1635,0.1435,
-0.2795,0.3339,-0.2207,0.1862,0.7341,-0.5894,-0.4157,0.2719,0.0341,0.0864,
-0.1783,-0.0122,-0.0122,0.7994,-0.8014,-0.8014,0.0,0.5973,-0.5679,0.0,0.0,0.0,
-0.4157,0.2719,0.0341,0.0864,-0.0316,0.0488,0.0488,0.6462,-0.5554,-0.6376,
0.4747,0.5973,-0.5679,0.0,0.0,0.0,-0.4157,0.2719,0.0341,0.0864,-0.0316,0.0488,
0.0488,0.6462,-0.5554,-0.6376,0.0,0.5973,-0.5679,0.4747,0.0,0.0,-0.4157,0.2719,
0.0341,0.0864,-0.0316,0.0488,0.0488,0.6462,-0.6376,-0.5554,0.0,0.5973,-0.5679,
0.0,0.4747,0.0,-0.4157,0.2719,0.0341,0.0864,-0.0316,0.0488,0.0488,0.6462,
-0.6376,-0.5554,0.0,0.5973,-0.5679,0.0,0.0,0.4747,
PROTCNT=0,1,1,1,1,2,1,1,0,1,1,1,1,
RESNAME='System: Unknown','Residue: GL4 7','Residue: HIP 15','Residue: AS4 18',
'Residue: GL4 35','Residue: AS4 48','Residue: AS4 52','Residue: AS4 66',
'Residue: AS4 87','Residue: AS4 101','Residue: AS4 119',
RESSTATE=0,0,0,0,0,0,0,0,0,0,
STATEINF(0)%FIRST_ATOM=102, STATEINF(0)%FIRST_CHARGE=0,
STATEINF(0)%FIRST_STATE=0, STATEINF(0)%NUM_ATOMS=19, STATEINF(0)%NUM_STATES=5,
STATEINF(1)%FIRST_ATOM=233, STATEINF(1)%FIRST_CHARGE=95,
STATEINF(1)%FIRST_STATE=5, STATEINF(1)%NUM_ATOMS=18, STATEINF(1)%NUM_STATES=3,
STATEINF(2)%FIRST_ATOM=277, STATEINF(2)%FIRST_CHARGE=149,
STATEINF(2)%FIRST_STATE=8, STATEINF(2)%NUM_ATOMS=16, STATEINF(2)%NUM_STATES=5,
STATEINF(3)%FIRST_ATOM=543, STATEINF(3)%FIRST_CHARGE=0,
STATEINF(3)%FIRST_STATE=0, STATEINF(3)%NUM_ATOMS=19, STATEINF(3)%NUM_STATES=5,
STATEINF(4)%FIRST_ATOM=742, STATEINF(4)%FIRST_CHARGE=149,
STATEINF(4)%FIRST_STATE=8, STATEINF(4)%NUM_ATOMS=16, STATEINF(4)%NUM_STATES=5,
STATEINF(5)%FIRST_ATOM=790, STATEINF(5)%FIRST_CHARGE=149,
STATEINF(5)%FIRST_STATE=8, STATEINF(5)%NUM_ATOMS=16, STATEINF(5)%NUM_STATES=5,
STATEINF(6)%FIRST_ATOM=1029, STATEINF(6)%FIRST_CHARGE=149,
STATEINF(6)%FIRST_STATE=8, STATEINF(6)%NUM_ATOMS=16, STATEINF(6)%NUM_STATES=5,
STATEINF(7)%FIRST_ATOM=1327, STATEINF(7)%FIRST_CHARGE=149,
STATEINF(7)%FIRST_STATE=8, STATEINF(7)%NUM_ATOMS=16, STATEINF(7)%NUM_STATES=5,
STATEINF(8)%FIRST_ATOM=1537, STATEINF(8)%FIRST_CHARGE=149,
STATEINF(8)%FIRST_STATE=8, STATEINF(8)%NUM_ATOMS=16, STATEINF(8)%NUM_STATES=5,
STATEINF(9)%FIRST_ATOM=1811, STATEINF(9)%FIRST_CHARGE=149,
STATEINF(9)%FIRST_STATE=8, STATEINF(9)%NUM_ATOMS=16, STATEINF(9)%NUM_STATES=5,
STATENE=0,14.4542090223,14.4542090223,14.4542090223,14.4542090223,0,
-11.7770114534,-16.3478974337,0,32.3880313021,32.3880313021,32.3880313021,
32.3880313021,
TRESCNT=10,
/
Simulation Setup
At this point, we have our topology file (prmtop), input coordinate file (inpcrd) and constant pH input file (cpin), so we can set up our simulation. We will follow the 3 standard steps: minimization, heating, and equilibration (note the apparent misnomer — we're not actually simulating to equilibrium, we're really relaxing our structure).
All steps here can be run using sander, pmemd, or pmemd.cuda (I implemented CpHMD in implicit solvent along with pH-REMD on GPUs)
Minimization
Our first step is to minimize our structure. I typically like to restrain the backbone of the system. A sample input that restrains the backbone with a force constant of 10 kcal/mol/A^2 is shown below:
Minimize for constant pH REM simulations
&cntrl
ntb=0, imin=1, igb=2, ntr=1,
maxcyc=1000, cut=1000.0,
restraint_wt=10.0,
restraintmask='@CA,C,O,N',
/
Heating
Our next step is to heat our structure. I typically keep restraints on the backbone like I did with the minimization, although the restraints are notably weaker. I also use the nmropt=1 facility to heat the system slowly and linearly throughout the first 2/3 of the simulation. A sample input file for the heating step is shown below:
Heating step for constant pH REM simulations
&cntrl
nstlim=500000, ntc=2, ig=-1, ntb=0,
igb=2, ntr=1, ntt=3, ntwr=10000,
cut=1000.0, dt=0.002, ntwx=1000,
ntf=2, restraint_wt=1.0, ntpr=1000,
gamma_ln=5.0, ioutfm=1, nmropt=1,
restraintmask='@CA,C,O,N',
/
&wt
TYPE='TEMP0', ISTEP1=0, ISTEP2=333333
VALUE1=10.00, VALUE2=300.00,
/
&wt TYPE='END' /
Equilibration (Relaxation)
Our next step is to 'equilibrate' our system. In this step, for the first time, we use our cpin file that we generated above. We use this file because it will make sander assign the appropriate charges for each protonation state. Whether or not you allow the protonation states to change (controlled by setting ntcnstph lower than nstlim to allow changes or setting ntcnstph greater than nstlim to prevent changes) is a matter of personal preference. It should not matter for your final answer. In this example, I do not allow exchanges, because it allows me to use the same ending structure for each pH I simulate at later. I still keep some restraints on the backbone, but in this step they are now very weak (0.1 kcal/mol/A^2). A sample input is shown below:
Equilibration (relaxation) mdin for pH-REM, no protonation changes
&cntrl
icnstph=1, dt=0.002, ioutfm=1,
nstlim=1000000, ig=-1, ntb=0,
ntwr=10000, ntwx=1000, irest=1,
cut=1000.0, ntcnstph=1000000,
ntpr=1000, ntx=5, saltcon=0.1,
restraint_wt=0.1, ntr=1, ntt=3, ntc=2,
ntf=2, restraintmask='@CA,C,O,N',
gamma_ln=10.0, igb=2,
/
Note that at this point, we only have a single structure. This will serve as the starting point for each of our replicas next.
Setting up for pH-REMD
At this point, we have a single, 'equilibrated' structure, and we wish to start a pH ladder of replicas. To run replica exchange simulations, we need a groupfile, the -rem and -remlog command-line flags, and the variables nstlim and numexchg in the &cntrl namelist of the input files.
Input (mdin) files
Each replica needs its own input file (because each replica runs with a different pH value). A couple notes about these input files:
- nstlim now means something different. In replica exchange, nstlim is the number of steps between exchange attempts.
- numexchg is the number of times to attempt exchanging. Therefore, the total number of steps in a simulation can be found by multiplying nstlim and numexchg together.
- icnstph is the flag to turn on constant pH simulations.
- solvph is the solvent pH — this should be different for each replica
- ntcnstph is the number of steps between attempting protonation state changes. While it should work for nstlim to be smaller than ntcnstph, I would suggest always having ntcnstph smaller than nstlim.
- saltcon is the salt concentration. Because the reference compounds were calculated using a salt concentration of 0.1 M, you should make sure that is present here as well.
An example input file (for one replica) is shown below:
REM for CpH
&cntrl
icnstph=1, dt=0.002, ioutfm=1, ntxo=2,
nstlim=100, ig=-1, ntb=0, numexchg=5000,
ntwr=10000, ntwx=1000, irest=1,
cut=1000.0, ntcnstph=5, ntpr=1000,
ntx=5, solvph=2, saltcon=0.1, ntt=3,
ntc=2, ntf=2, gamma_ln=10.0, igb=2,
/
In this step, we are attempting replica exchanges every 100 steps (200 fs) and attempting protonation state changes every 5 steps (10 fs). Note this is the sample input file for the replica running with pH set to 2.
The groupfile
This input file defines all of the replicas. In a groupfile, each line looks like a separate sequence of command-line flags for a separate instance of sander (or pmemd). A couple notes:
- Each line should contain the -rem 4 and -remlog rem.log options (where rem.log is the file you want the replica exchange information is stored).
- Each line should have a different input file, but the same topology file. The input coordinate file can differ (and will for restarts), or it can be the same.
- There must be an even number of replicas so that each replica has a partner
- The ordering of the replicas in the groupfile does not matter. A pH ladder (much like a temperature) ladder is defined, and exchange partners are chosen that way.
An example groupfile is shown below:
# pH 2
-O -i pH2/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH2/1AKI.dry.md1.rst7 -cpin pH2/1AKI.dry.md1.cpin -o pH2/1AKI.dry.md2.mdout -cpout pH2/1AKI.dry.md2.cpout -cprestrt pH2/1AKI.dry.md2.cpin -r pH2/1AKI.dry.md2.rst7 -inf pH2/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH2/1AKI.dry.md2.nc
# pH 2.5
-O -i pH2.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH2.5/1AKI.dry.md1.rst7 -cpin pH2.5/1AKI.dry.md1.cpin -o pH2.5/1AKI.dry.md2.mdout -cpout pH2.5/1AKI.dry.md2.cpout -cprestrt pH2.5/1AKI.dry.md2.cpin -r pH2.5/1AKI.dry.md2.rst7 -inf pH2.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH2.5/1AKI.dry.md2.nc
# pH 3
-O -i pH3/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH3/1AKI.dry.md1.rst7 -cpin pH3/1AKI.dry.md1.cpin -o pH3/1AKI.dry.md2.mdout -cpout pH3/1AKI.dry.md2.cpout -cprestrt pH3/1AKI.dry.md2.cpin -r pH3/1AKI.dry.md2.rst7 -inf pH3/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH3/1AKI.dry.md2.nc
# pH 3.5
-O -i pH3.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH3.5/1AKI.dry.md1.rst7 -cpin pH3.5/1AKI.dry.md1.cpin -o pH3.5/1AKI.dry.md2.mdout -cpout pH3.5/1AKI.dry.md2.cpout -cprestrt pH3.5/1AKI.dry.md2.cpin -r pH3.5/1AKI.dry.md2.rst7 -inf pH3.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH3.5/1AKI.dry.md2.nc
# pH 4
-O -i pH4/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH4/1AKI.dry.md1.rst7 -cpin pH4/1AKI.dry.md1.cpin -o pH4/1AKI.dry.md2.mdout -cpout pH4/1AKI.dry.md2.cpout -cprestrt pH4/1AKI.dry.md2.cpin -r pH4/1AKI.dry.md2.rst7 -inf pH4/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH4/1AKI.dry.md2.nc
# pH 4.5
-O -i pH4.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH4.5/1AKI.dry.md1.rst7 -cpin pH4.5/1AKI.dry.md1.cpin -o pH4.5/1AKI.dry.md2.mdout -cpout pH4.5/1AKI.dry.md2.cpout -cprestrt pH4.5/1AKI.dry.md2.cpin -r pH4.5/1AKI.dry.md2.rst7 -inf pH4.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH4.5/1AKI.dry.md2.nc
# pH 5
-O -i pH5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH5/1AKI.dry.md1.rst7 -cpin pH5/1AKI.dry.md1.cpin -o pH5/1AKI.dry.md2.mdout -cpout pH5/1AKI.dry.md2.cpout -cprestrt pH5/1AKI.dry.md2.cpin -r pH5/1AKI.dry.md2.rst7 -inf pH5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH5/1AKI.dry.md2.nc
# pH 5.5
-O -i pH5.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH5.5/1AKI.dry.md1.rst7 -cpin pH5.5/1AKI.dry.md1.cpin -o pH5.5/1AKI.dry.md2.mdout -cpout pH5.5/1AKI.dry.md2.cpout -cprestrt pH5.5/1AKI.dry.md2.cpin -r pH5.5/1AKI.dry.md2.rst7 -inf pH5.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH5.5/1AKI.dry.md2.nc
# pH 6
-O -i pH6/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH6/1AKI.dry.md1.rst7 -cpin pH6/1AKI.dry.md1.cpin -o pH6/1AKI.dry.md2.mdout -cpout pH6/1AKI.dry.md2.cpout -cprestrt pH6/1AKI.dry.md2.cpin -r pH6/1AKI.dry.md2.rst7 -inf pH6/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH6/1AKI.dry.md2.nc
# pH 6.5
-O -i pH6.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH6.5/1AKI.dry.md1.rst7 -cpin pH6.5/1AKI.dry.md1.cpin -o pH6.5/1AKI.dry.md2.mdout -cpout pH6.5/1AKI.dry.md2.cpout -cprestrt pH6.5/1AKI.dry.md2.cpin -r pH6.5/1AKI.dry.md2.rst7 -inf pH6.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH6.5/1AKI.dry.md2.nc
# pH 7
-O -i pH7/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH7/1AKI.dry.md1.rst7 -cpin pH7/1AKI.dry.md1.cpin -o pH7/1AKI.dry.md2.mdout -cpout pH7/1AKI.dry.md2.cpout -cprestrt pH7/1AKI.dry.md2.cpin -r pH7/1AKI.dry.md2.rst7 -inf pH7/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH7/1AKI.dry.md2.nc
# pH 7.5
-O -i pH7.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH7.5/1AKI.dry.md1.rst7 -cpin pH7.5/1AKI.dry.md1.cpin -o pH7.5/1AKI.dry.md2.mdout -cpout pH7.5/1AKI.dry.md2.cpout -cprestrt pH7.5/1AKI.dry.md2.cpin -r pH7.5/1AKI.dry.md2.rst7 -inf pH7.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH7.5/1AKI.dry.md2.nc
Restarting simulations
You often want to restart simulations. For instance, it's much easier to run 10 simulations of 2 ns each rather than a single 20 ns simulation. In this case, restarting simulations can be done by using the restart file from each replica as the input coordinate file for that same replica in the restart. Likewise, the cprestrt file from each replica becomes the cpin file for that same replica in the restart. The pH is automatically adjusted based on the value stored in the restart file (analogous to what happens for T-REMD). For example, the groupfile for the next simulation following the groupfile above is shown below:
# pH 2
-O -i pH2/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH2/1AKI.dry.md2.rst7 -cpin pH2/1AKI.dry.md2.cpin -o pH2/1AKI.dry.md3.mdout -cpout pH2/1AKI.dry.md3.cpout -cprestrt pH2/1AKI.dry.md3.cpin -r pH2/1AKI.dry.md3.rst7 -inf pH2/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH2/1AKI.dry.md3.nc
# pH 2.5
-O -i pH2.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH2.5/1AKI.dry.md2.rst7 -cpin pH2.5/1AKI.dry.md2.cpin -o pH2.5/1AKI.dry.md3.mdout -cpout pH2.5/1AKI.dry.md3.cpout -cprestrt pH2.5/1AKI.dry.md3.cpin -r pH2.5/1AKI.dry.md3.rst7 -inf pH2.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH2.5/1AKI.dry.md3.nc
# pH 3
-O -i pH3/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH3/1AKI.dry.md2.rst7 -cpin pH3/1AKI.dry.md2.cpin -o pH3/1AKI.dry.md3.mdout -cpout pH3/1AKI.dry.md3.cpout -cprestrt pH3/1AKI.dry.md3.cpin -r pH3/1AKI.dry.md3.rst7 -inf pH3/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH3/1AKI.dry.md3.nc
# pH 3.5
-O -i pH3.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH3.5/1AKI.dry.md2.rst7 -cpin pH3.5/1AKI.dry.md2.cpin -o pH3.5/1AKI.dry.md3.mdout -cpout pH3.5/1AKI.dry.md3.cpout -cprestrt pH3.5/1AKI.dry.md3.cpin -r pH3.5/1AKI.dry.md3.rst7 -inf pH3.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH3.5/1AKI.dry.md3.nc
# pH 4
-O -i pH4/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH4/1AKI.dry.md2.rst7 -cpin pH4/1AKI.dry.md2.cpin -o pH4/1AKI.dry.md3.mdout -cpout pH4/1AKI.dry.md3.cpout -cprestrt pH4/1AKI.dry.md3.cpin -r pH4/1AKI.dry.md3.rst7 -inf pH4/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH4/1AKI.dry.md3.nc
# pH 4.5
-O -i pH4.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH4.5/1AKI.dry.md2.rst7 -cpin pH4.5/1AKI.dry.md2.cpin -o pH4.5/1AKI.dry.md3.mdout -cpout pH4.5/1AKI.dry.md3.cpout -cprestrt pH4.5/1AKI.dry.md3.cpin -r pH4.5/1AKI.dry.md3.rst7 -inf pH4.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH4.5/1AKI.dry.md3.nc
# pH 5
-O -i pH5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH5/1AKI.dry.md2.rst7 -cpin pH5/1AKI.dry.md2.cpin -o pH5/1AKI.dry.md3.mdout -cpout pH5/1AKI.dry.md3.cpout -cprestrt pH5/1AKI.dry.md3.cpin -r pH5/1AKI.dry.md3.rst7 -inf pH5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH5/1AKI.dry.md3.nc
# pH 5.5
-O -i pH5.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH5.5/1AKI.dry.md2.rst7 -cpin pH5.5/1AKI.dry.md2.cpin -o pH5.5/1AKI.dry.md3.mdout -cpout pH5.5/1AKI.dry.md3.cpout -cprestrt pH5.5/1AKI.dry.md3.cpin -r pH5.5/1AKI.dry.md3.rst7 -inf pH5.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH5.5/1AKI.dry.md3.nc
# pH 6
-O -i pH6/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH6/1AKI.dry.md2.rst7 -cpin pH6/1AKI.dry.md2.cpin -o pH6/1AKI.dry.md3.mdout -cpout pH6/1AKI.dry.md3.cpout -cprestrt pH6/1AKI.dry.md3.cpin -r pH6/1AKI.dry.md3.rst7 -inf pH6/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH6/1AKI.dry.md3.nc
# pH 6.5
-O -i pH6.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH6.5/1AKI.dry.md2.rst7 -cpin pH6.5/1AKI.dry.md2.cpin -o pH6.5/1AKI.dry.md3.mdout -cpout pH6.5/1AKI.dry.md3.cpout -cprestrt pH6.5/1AKI.dry.md3.cpin -r pH6.5/1AKI.dry.md3.rst7 -inf pH6.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH6.5/1AKI.dry.md3.nc
# pH 7
-O -i pH7/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH7/1AKI.dry.md2.rst7 -cpin pH7/1AKI.dry.md2.cpin -o pH7/1AKI.dry.md3.mdout -cpout pH7/1AKI.dry.md3.cpout -cprestrt pH7/1AKI.dry.md3.cpin -r pH7/1AKI.dry.md3.rst7 -inf pH7/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH7/1AKI.dry.md3.nc
# pH 7.5
-O -i pH7.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH7.5/1AKI.dry.md2.rst7 -cpin pH7.5/1AKI.dry.md2.cpin -o pH7.5/1AKI.dry.md3.mdout -cpout pH7.5/1AKI.dry.md3.cpout -cprestrt pH7.5/1AKI.dry.md3.cpin -r pH7.5/1AKI.dry.md3.rst7 -inf pH7.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH7.5/1AKI.dry.md3.nc
Because replicas change pH (rather than changing structure), each output file will contain a combination of snapshots from different pH values, but the pH is automatically reassigned (based on what is in the restart file) at the beginning of the simulation. The pH ladder is automatically generated properly.
Analyzing the results
Because replicas periodically change their target pH when replica exchange attempts succeed, we need to have some way of extracting snapshots and titration data that correspond to a pH ensemble rather than a replica time series.
Reconstructing pH-based cpout files
When running replica exchange simulations, the format of the cpout file is slightly changed so that the current target pH value is printed next to each record. An example is shown below:
...
Residue 4 State: 0 pH: 3.000
Residue 5 State: 2 pH: 3.000
Residue 3 State: 3 pH: 3.000
Residue 9 State: 1 pH: 2.500
Residue 9 State: 1 pH: 2.500
Residue 7 State: 0 pH: 2.500
Residue 5 State: 2 pH: 2.500
Residue 2 State: 1 pH: 2.500
Residue 6 State: 0 pH: 2.500
...
I wrote a program in C++ that will parse the constant pH output files from Amber and compute different statistics from these output files. You can download the source code from here: https://github.com/swails/cphmd_tools (there is a "Download ZIP" option that will give you the files in a zipped folder). The README file has instructions for installing the program (it is similar to installing Amber).
After you have installed this program, cphstats, you will be able to use it like this:
bash$ cphstats --help
Usage: cphstats [-O] [-V] [-h] [-i <cpin>] [-t] [-o FILE] [-R FILE -r INT]
[--chunk INT --chunk-out FILE] [--cumulative --cumulative-out FILE]
[-v INT] [-n INT] [-p|-d] [--calcpka|--no-calcpka] [--fix-remd]
[--population FILE] [-c CONDITION -c CONDITION -c ...]
[--conditional-output FILE] [--chunk-conditional FILE]
cpout1 [cpout2 [cpout3 ...] ]
General Options:
-h, --help Print this help and exit.
-V, --version Print the version number and exit.
-O, --overwrite
Allow existing outputs to be overwritten.
--debug Print out information about the files that are
being read in and used for the calculations.
Input Files and Options:
-i FILE, --cpin FILE
Input cpin file (from sander) with titrating residue
information.
-t FLOAT, --time-step FLOAT
This is the time step in ps you used in your simulations.
It will be used to print data as a function of time.
Default is 2 fs (0.002)
Output Files:
-o FILE, --calcpka-output FILE
File to which the standard `calcpka'-type statistics
are written. Default is stdout
-R FILE, --running-avg-out FILE
Output file where the running averages for each
residue is printed. Default is [running_avgs.dat]
--chunk-out FILE
Output file where the protonated fraction calculated
over chunks of the simulation are printed.
Default is [chunk.dat]
--cumulative-out FILE
Output file where the cumulative protonated fraction
is printed. Default is [cumulative.dat]
--population FILE
Output file where protonation state populations are
printed for every state of every residue.
--conditional-output FILE
Output file with requested conditional probabilities.
Default is [conditional_prob.dat].
--chunk-conditional FILE
Prints a time series of the conditional probabilities over
a trajectory split up into chunks.
Output Options:
These options modify how the output files will appear
-v INT, --verbose INT
Controls how much information is printed to the
calcpka-style output file. Options are:
(0) Just print fraction protonated. [Default]
(1) Print everything calcpka prints.
-n INT, --interval INT
An interval between which to print out time series data
like `chunks', `cumulative' data, and running averages.
It is also used as the 'window' of the conditional
probability time series (--chunk-conditional).
Default [1000]
-p, --protonated
Print out protonation fraction instead of deprotonation
fraction in time series data (Default behavior).
-d, --deprotonated
Print out deprotonation fraction instead of protonation
fraction in time series data.
-a, --pKa Print predicted pKas (via Henderson-Hasselbalch) in place
of fraction (de)protonated. NOT default behavior.
Analysis Options:
These options control which analyses are done. By default, only
the original, calcpka-style analysis is done.
--calcpka Triggers the calcpka-style output [On by default]
--no-calcpka Turns off the calcpka-style output
-r WINDOW, --running-avg WINDOW
Defines a window size for a moving, running average
fraction protonated. <WINDOW> is the number of MD
steps (NOT the number of MC exchange attempts).
--chunk WINDOW
Computes the pKa's over a chunk of the simulation of
size <WINDOW> time steps.
--cumulative Computes the cumulative fraction protonated over the
course of the trajectory.
--fix-remd PREFIX
This option will trigger cphstats to reassemble the
titration data into pH-specific ensembles. This
is an exclusive mode of the program---no other
analyses will be done.
-c CONDITIONAL, --conditional CONDITIONAL
Evaluates conditional probabilities. CONDITIONAL should be a
string of the format:
<resid>:<state>,<resid>:<state>,...
Where <resid> is the residue number in the prmtop (NOT the
cpin) and <state> is either the state number or (p)rotonated
or (d)eprotonated, case-insensitive
This program analyzes constant pH output files (cpout) from Amber.
These output files can be compressed using gzip compression. The
compression will be detected automatically by the file name extension.
You must have the gzip headers for this functionality to work.
The first thing you must do before doing any kind of analysis is to extract the ensemble of protonation states for a single pH (since the pH is exchanged between replicas, each replica's cpout file has points from every pH we simulated). The --fix-remd flag in cphstats will re-order the cpout files.
The input cpout files should be each cpout file from a single pH-REMD simulation. If you have run 10 ns of simulation in 1 ns chunks, you will need to run this program 10 times! The number of cpout files you provide must be equal to the number of replicas you have. Each pH value will print a cpout file named prefix.pH_2.00 where prefix is the prefix string (it is the argument that follows the --fix-remd flag), and 2.00 is the pH. The pH_# suffix is added for each pH value.
For example:
cphstats --fix-remd reordered_cpouts cpout.md1.*
(where each replica has a different number following cpout.md1.).
After you create the pH-based cpout files, you can analyze these pH-specific cpout files using either cphstats or calcpka (the latter is distributed with AmberTools). cphstats is ca. twice as fast as calcpka for the same analyses, and is capable of doing everything calcpka does in addition to computing running averages with a flexible window size, cumulative averages, and 'chunk' averages (the simulation is split up into 'chunks' that are analyzed separately). (Like calcpka, cphstats also performs the same standard analysis and population dumps in exactly the same format that calcpka uses).
Python alternative to cphstats
Before writing cphstats, I wrote a Python script (remake_cpouts.py) to extract pH-based cpout files from replica-based cpout files. You can download it here. Since implementing CpHMD on GPUs, the cpout files have become so long that processing them with Python was becoming painfully slow. While there is no installation process (just download and run), it is at least 40 times slower than cphstats and does not do any analyses (it is the equivalent of just the --fix-remd flag.
Reconstructing pH-based mdcrd (trajectory) files
Like the cpout files, trajectory files will be filled with snapshots from each pH value in the ladder (unless of course you get severely limited state space mixing). To account for this, I have 'tricked' the trajectory files into thinking that the pH value is really the temperature. Therefore, you can use ptraj and/or cpptraj the same way you would for T-REMD, but using the pH value where you would normally put the temperature.
For instance, to load all of the structures from pH 2 from trajectories named 1AKI.dry.md1.mdcrd.00X (where X spans the range from 0 to NUM_REPS-1), use the following trajin line in your (cp)ptraj script:
trajin 1AKI.dry.md1.nc.000 1 2000 remdtraj remdtrajtemp 2.0
This will extract all of the snapshots with pH 2 (which is stored as a temperature of 2) from the trajectory files 1AKI.dry.md1.nc.000, 1AKI.dry.md1.nc.001, … etc.
At this point, if pH-trajectories are generated with a trajout command, then they will align with the reconstructed cpout files created in the previous section as though regular CpHMD simulations were run.
Analyzing the remlog file
The replica exchange log file has a lot of information regarding the exchange attempts for each replica. A section of a file is shown below.
# Replica Exchange log file
# numexchg is 500000
# REMD filenames:
# remlog= rem.md2.log
# remtype= rem.type
# Rep#, N_prot, old_pH, new_pH, Success rate (i,i+1)
# exchange 1
1 9 1.000 7.500 0.0000
2 3 1.500 2.000 2.0000
3 7 2.000 1.500 0.0000
4 3 2.500 3.000 2.0000
5 6 3.000 2.500 0.0000
6 2 3.500 4.000 2.0000
7 2 4.000 3.500 0.0000
8 2 4.500 4.500 0.0000
9 1 5.000 5.000 0.0000
10 6 5.500 5.500 0.0000
11 3 6.000 6.000 0.0000
12 5 6.500 6.500 0.0000
13 1 7.000 7.000 0.0000
14 11 7.500 1.000 2.0000
# exchange 2
1 9 7.500 7.000 1.0000
2 3 2.000 2.500 1.0000
3 7 1.500 1.500 1.0000
4 4 3.000 3.000 0.0000
5 6 2.500 2.000 1.0000
6 3 4.000 4.000 0.0000
7 2 3.500 3.500 1.0000
8 2 4.500 4.500 0.0000
9 1 5.000 5.500 1.0000
10 6 5.500 5.000 0.0000
11 3 6.000 6.500 1.0000
12 5 6.500 6.000 0.0000
13 1 7.000 7.500 1.0000
14 11 1.000 1.000 0.0000
[snip]
# exchange 500000
1 10 1.000 1.500 0.5211
2 3 5.500 5.000 0.5742
3 6 3.000 3.500 0.4758
4 1 7.500 7.000 0.0000
5 7 2.000 2.000 0.3579
6 5 3.500 3.000 0.5683
7 2 6.000 6.000 0.6781
8 1 6.500 6.500 0.8467
9 4 4.500 4.000 0.6855
10 1 7.000 7.500 0.9442
11 3 5.000 5.500 0.6438
12 6 2.500 2.500 0.4235
13 3 4.000 4.500 0.6458
14 10 1.500 1.000 0.3199
The columns are the replica number, followed by the number of protons present in the current set of protonation states, followed by the original pH, followed by the "new" pH (if the two pHs are the same, then the exchange attempt failed). The last column is the exchange success rate with the higher-pH replica (notice that the highest pH, 7.5, has a success rate of 0 in the final exchange because its 'higher' pH is the first replica, which has a pH of 1).
The Maths and Theory Behind pH-REM
This section will be fairly short, as it assumes a basic working knowledge of replica exchange simulations, detailed balance, and Mongan's original constant pH MD method. This will hopefully provide some insight about why things are implemented the way they are (i.e. why replicas change pH rather than change structures).
Each replica is defined within a semigrand canonical ensemble (subject to the restraints of constant temperature, pressure-or-volume, and chemical potential of hydronium). The chemical potential of hydronium is fixed by the solvent pH, and allows the number of protons on the system to vary. In this ensemble, the probability of each state being a member of the ensemble is proportional to:
(1)In Eq.(1), $x_i$ refers to the conformational degrees of freedom (the x-, y-, z-coordinates of each atom), and $N_i$ is the number of titratable protons on the system, which is directly related to the net charge of the system.
In an exchange attempt with only two replicas exchanging information independent of all other replicas, we want to evaluate the probability of accepting a proposed change in which two replicas swap pH values. Here, we define two replicas, $X$, each with the same temperature (and therefore same $\beta$), but different pH values, $pH_i$ and $pH_j$. We then want to evaluate the transition probability of
(2)In Eq.(2), notice that only the $pH$ has been exchanged. The probability of accepting this exchange must be calculated in a way that preserves detailed balance so that we continue to sample from the proper expanded ensemble. A Metropolized Monte Carlo criteria satisfies the detailed balance equation, and the transition probability becomes
(3)An important note — because we are not exchanging either the number of protons or coordinates, all terms involving the energies cancel out, so the energies do not appear in our exchange probability! Therefore, we don't have to calculate any energies, and this exchange attempt is as cheap as exchanges in T-REMD (effectively free).