pH Replica Exchange

This page describes how to run constant pH replica exchange MD (pH-REMD) in which replicas are run at the same temperature but different pH values. Note, this tutorial is for implicit (Generalized Born) solvent. After you learn how to run CpHMD in explicit solvent (click here for tutorial), you can set up each replica to run in explicit solvent instead.

Note: You must have at least version 14 of Amber and AmberTools to run pH-REMD.

# Creating the input files

Creating the input files for pH-REM simulations is identical to creating input files for normal constant pH MD (CpHMD) simulations.

For this part, I will assume that you have a PDB file from a crystal structure or similar. For the purposes of this example, I will use the crystal structure of HEWL from the PDB 1AKI. You can download it here. You then have to modify the PDB file to get rid of all of the crystallographic waters and information at the top. The result is here

This step is handled with LEaP (particularly tleap) from AmberTools. Follow the steps below to generate your topology files

1. Determine which residues you wish to titrate. Depending on which residues you wish to titrate, you must substitute titrating residues for their non-titrating versions. That is, Amber knows how to titrate the aspartate residue AS4 (an aspartate with 4 hydrogens — syn- and anti- positions on both carboxylate oxygens), but it does not know how to titrate ASP or ASH. The substitutions you must make are listed below (note some residues do not require a name change):
• ASP -> AS4
• GLU -> GL4
• HIS -> HIP
• LYS -> LYS
• TYR -> TYR
• CYS -> CYS
2. Start tleap, loading the constant pH LEaP resource file: tleap -f leaprc.constph

Starting tleap with leaprc.constph

should look something like this:

CPin created!
bash$cat 1AKI.cpin &CNSTPH CHRGDAT=-0.4157,0.2719,0.0145,0.0779,-0.0398,-0.0173,-0.0173,0.0136,-0.0425, -0.0425,0.8054,-0.8188,-0.8188,0.0,0.5973,-0.5679,0.0,0.0,0.0,-0.4157,0.2719, 0.0145,0.0779,-0.0071,0.0256,0.0256,-0.0174,0.043,0.043,0.6801,-0.5838,-0.6511, 0.4641,0.5973,-0.5679,0.0,0.0,0.0,-0.4157,0.2719,0.0145,0.0779,-0.0071,0.0256, 0.0256,-0.0174,0.043,0.043,0.6801,-0.5838,-0.6511,0.0,0.5973,-0.5679,0.4641, 0.0,0.0,-0.4157,0.2719,0.0145,0.0779,-0.0071,0.0256,0.0256,-0.0174,0.043,0.043, 0.6801,-0.6511,-0.5838,0.0,0.5973,-0.5679,0.0,0.4641,0.0,-0.4157,0.2719,0.0145, 0.0779,-0.0071,0.0256,0.0256,-0.0174,0.043,0.043,0.6801,-0.6511,-0.5838,0.0, 0.5973,-0.5679,0.0,0.0,0.4641,-0.3479,0.2747,-0.1354,0.1212,-0.0414,0.081, 0.081,-0.0012,-0.1513,0.3866,-0.017,0.2681,-0.1718,0.3911,-0.1141,0.2317, 0.7341,-0.5894,-0.3479,0.2747,-0.1354,0.1212,-0.111,0.0402,0.0402,-0.0266, -0.3811,0.3649,0.2057,0.1392,-0.5727,0.0,0.1292,0.1147,0.7341,-0.5894,-0.3479, 0.2747,-0.1354,0.1212,-0.1012,0.0367,0.0367,0.1868,-0.5432,0.0,0.1635,0.1435, -0.2795,0.3339,-0.2207,0.1862,0.7341,-0.5894,-0.4157,0.2719,0.0341,0.0864, -0.1783,-0.0122,-0.0122,0.7994,-0.8014,-0.8014,0.0,0.5973,-0.5679,0.0,0.0,0.0, -0.4157,0.2719,0.0341,0.0864,-0.0316,0.0488,0.0488,0.6462,-0.5554,-0.6376, 0.4747,0.5973,-0.5679,0.0,0.0,0.0,-0.4157,0.2719,0.0341,0.0864,-0.0316,0.0488, 0.0488,0.6462,-0.5554,-0.6376,0.0,0.5973,-0.5679,0.4747,0.0,0.0,-0.4157,0.2719, 0.0341,0.0864,-0.0316,0.0488,0.0488,0.6462,-0.6376,-0.5554,0.0,0.5973,-0.5679, 0.0,0.4747,0.0,-0.4157,0.2719,0.0341,0.0864,-0.0316,0.0488,0.0488,0.6462, -0.6376,-0.5554,0.0,0.5973,-0.5679,0.0,0.0,0.4747, PROTCNT=0,1,1,1,1,2,1,1,0,1,1,1,1, RESNAME='System: Unknown','Residue: GL4 7','Residue: HIP 15','Residue: AS4 18', 'Residue: GL4 35','Residue: AS4 48','Residue: AS4 52','Residue: AS4 66', 'Residue: AS4 87','Residue: AS4 101','Residue: AS4 119', RESSTATE=0,0,0,0,0,0,0,0,0,0, STATEINF(0)%FIRST_ATOM=102, STATEINF(0)%FIRST_CHARGE=0, STATEINF(0)%FIRST_STATE=0, STATEINF(0)%NUM_ATOMS=19, STATEINF(0)%NUM_STATES=5, STATEINF(1)%FIRST_ATOM=233, STATEINF(1)%FIRST_CHARGE=95, STATEINF(1)%FIRST_STATE=5, STATEINF(1)%NUM_ATOMS=18, STATEINF(1)%NUM_STATES=3, STATEINF(2)%FIRST_ATOM=277, STATEINF(2)%FIRST_CHARGE=149, STATEINF(2)%FIRST_STATE=8, STATEINF(2)%NUM_ATOMS=16, STATEINF(2)%NUM_STATES=5, STATEINF(3)%FIRST_ATOM=543, STATEINF(3)%FIRST_CHARGE=0, STATEINF(3)%FIRST_STATE=0, STATEINF(3)%NUM_ATOMS=19, STATEINF(3)%NUM_STATES=5, STATEINF(4)%FIRST_ATOM=742, STATEINF(4)%FIRST_CHARGE=149, STATEINF(4)%FIRST_STATE=8, STATEINF(4)%NUM_ATOMS=16, STATEINF(4)%NUM_STATES=5, STATEINF(5)%FIRST_ATOM=790, STATEINF(5)%FIRST_CHARGE=149, STATEINF(5)%FIRST_STATE=8, STATEINF(5)%NUM_ATOMS=16, STATEINF(5)%NUM_STATES=5, STATEINF(6)%FIRST_ATOM=1029, STATEINF(6)%FIRST_CHARGE=149, STATEINF(6)%FIRST_STATE=8, STATEINF(6)%NUM_ATOMS=16, STATEINF(6)%NUM_STATES=5, STATEINF(7)%FIRST_ATOM=1327, STATEINF(7)%FIRST_CHARGE=149, STATEINF(7)%FIRST_STATE=8, STATEINF(7)%NUM_ATOMS=16, STATEINF(7)%NUM_STATES=5, STATEINF(8)%FIRST_ATOM=1537, STATEINF(8)%FIRST_CHARGE=149, STATEINF(8)%FIRST_STATE=8, STATEINF(8)%NUM_ATOMS=16, STATEINF(8)%NUM_STATES=5, STATEINF(9)%FIRST_ATOM=1811, STATEINF(9)%FIRST_CHARGE=149, STATEINF(9)%FIRST_STATE=8, STATEINF(9)%NUM_ATOMS=16, STATEINF(9)%NUM_STATES=5, STATENE=0,14.4542090223,14.4542090223,14.4542090223,14.4542090223,0, -11.7770114534,-16.3478974337,0,32.3880313021,32.3880313021,32.3880313021, 32.3880313021, TRESCNT=10, /  # Simulation Setup At this point, we have our topology file (prmtop), input coordinate file (inpcrd) and constant pH input file (cpin), so we can set up our simulation. We will follow the 3 standard steps: minimization, heating, and equilibration (note the apparent misnomer — we're not actually simulating to equilibrium, we're really relaxing our structure). All steps here can be run using sander, pmemd, or pmemd.cuda (I implemented CpHMD in implicit solvent along with pH-REMD on GPUs) ## Minimization Our first step is to minimize our structure. I typically like to restrain the backbone of the system. A sample input that restrains the backbone with a force constant of 10 kcal/mol/A^2 is shown below: Minimize for constant pH REM simulations &cntrl ntb=0, imin=1, igb=2, ntr=1, maxcyc=1000, cut=1000.0, restraint_wt=10.0, restraintmask='@CA,C,O,N', /  ## Heating Our next step is to heat our structure. I typically keep restraints on the backbone like I did with the minimization, although the restraints are notably weaker. I also use the nmropt=1 facility to heat the system slowly and linearly throughout the first 2/3 of the simulation. A sample input file for the heating step is shown below: Heating step for constant pH REM simulations &cntrl nstlim=500000, ntc=2, ig=-1, ntb=0, igb=2, ntr=1, ntt=3, ntwr=10000, cut=1000.0, dt=0.002, ntwx=1000, ntf=2, restraint_wt=1.0, ntpr=1000, gamma_ln=5.0, ioutfm=1, nmropt=1, restraintmask='@CA,C,O,N', / &wt TYPE='TEMP0', ISTEP1=0, ISTEP2=333333 VALUE1=10.00, VALUE2=300.00, / &wt TYPE='END' /  ## Equilibration (Relaxation) Our next step is to 'equilibrate' our system. In this step, for the first time, we use our cpin file that we generated above. We use this file because it will make sander assign the appropriate charges for each protonation state. Whether or not you allow the protonation states to change (controlled by setting ntcnstph lower than nstlim to allow changes or setting ntcnstph greater than nstlim to prevent changes) is a matter of personal preference. It should not matter for your final answer. In this example, I do not allow exchanges, because it allows me to use the same ending structure for each pH I simulate at later. I still keep some restraints on the backbone, but in this step they are now very weak (0.1 kcal/mol/A^2). A sample input is shown below: Equilibration (relaxation) mdin for pH-REM, no protonation changes &cntrl icnstph=1, dt=0.002, ioutfm=1, nstlim=1000000, ig=-1, ntb=0, ntwr=10000, ntwx=1000, irest=1, cut=1000.0, ntcnstph=1000000, ntpr=1000, ntx=5, saltcon=0.1, restraint_wt=0.1, ntr=1, ntt=3, ntc=2, ntf=2, restraintmask='@CA,C,O,N', gamma_ln=10.0, igb=2, /  Note that at this point, we only have a single structure. This will serve as the starting point for each of our replicas next. # Setting up for pH-REMD At this point, we have a single, 'equilibrated' structure, and we wish to start a pH ladder of replicas. To run replica exchange simulations, we need a groupfile, the -rem and -remlog command-line flags, and the variables nstlim and numexchg in the &cntrl namelist of the input files. ## Input (mdin) files Each replica needs its own input file (because each replica runs with a different pH value). A couple notes about these input files: • nstlim now means something different. In replica exchange, nstlim is the number of steps between exchange attempts. • numexchg is the number of times to attempt exchanging. Therefore, the total number of steps in a simulation can be found by multiplying nstlim and numexchg together. • icnstph is the flag to turn on constant pH simulations. • solvph is the solvent pH — this should be different for each replica • ntcnstph is the number of steps between attempting protonation state changes. While it should work for nstlim to be smaller than ntcnstph, I would suggest always having ntcnstph smaller than nstlim. • saltcon is the salt concentration. Because the reference compounds were calculated using a salt concentration of 0.1 M, you should make sure that is present here as well. An example input file (for one replica) is shown below: REM for CpH &cntrl icnstph=1, dt=0.002, ioutfm=1, ntxo=2, nstlim=100, ig=-1, ntb=0, numexchg=5000, ntwr=10000, ntwx=1000, irest=1, cut=1000.0, ntcnstph=5, ntpr=1000, ntx=5, solvph=2, saltcon=0.1, ntt=3, ntc=2, ntf=2, gamma_ln=10.0, igb=2, /  In this step, we are attempting replica exchanges every 100 steps (200 fs) and attempting protonation state changes every 5 steps (10 fs). Note this is the sample input file for the replica running with pH set to 2. ## The groupfile This input file defines all of the replicas. In a groupfile, each line looks like a separate sequence of command-line flags for a separate instance of sander (or pmemd). A couple notes: • Each line should contain the -rem 4 and -remlog rem.log options (where rem.log is the file you want the replica exchange information is stored). • Each line should have a different input file, but the same topology file. The input coordinate file can differ (and will for restarts), or it can be the same. • There must be an even number of replicas so that each replica has a partner • The ordering of the replicas in the groupfile does not matter. A pH ladder (much like a temperature) ladder is defined, and exchange partners are chosen that way. An example groupfile is shown below: # pH 2 -O -i pH2/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH2/1AKI.dry.md1.rst7 -cpin pH2/1AKI.dry.md1.cpin -o pH2/1AKI.dry.md2.mdout -cpout pH2/1AKI.dry.md2.cpout -cprestrt pH2/1AKI.dry.md2.cpin -r pH2/1AKI.dry.md2.rst7 -inf pH2/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH2/1AKI.dry.md2.nc # pH 2.5 -O -i pH2.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH2.5/1AKI.dry.md1.rst7 -cpin pH2.5/1AKI.dry.md1.cpin -o pH2.5/1AKI.dry.md2.mdout -cpout pH2.5/1AKI.dry.md2.cpout -cprestrt pH2.5/1AKI.dry.md2.cpin -r pH2.5/1AKI.dry.md2.rst7 -inf pH2.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH2.5/1AKI.dry.md2.nc # pH 3 -O -i pH3/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH3/1AKI.dry.md1.rst7 -cpin pH3/1AKI.dry.md1.cpin -o pH3/1AKI.dry.md2.mdout -cpout pH3/1AKI.dry.md2.cpout -cprestrt pH3/1AKI.dry.md2.cpin -r pH3/1AKI.dry.md2.rst7 -inf pH3/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH3/1AKI.dry.md2.nc # pH 3.5 -O -i pH3.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH3.5/1AKI.dry.md1.rst7 -cpin pH3.5/1AKI.dry.md1.cpin -o pH3.5/1AKI.dry.md2.mdout -cpout pH3.5/1AKI.dry.md2.cpout -cprestrt pH3.5/1AKI.dry.md2.cpin -r pH3.5/1AKI.dry.md2.rst7 -inf pH3.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH3.5/1AKI.dry.md2.nc # pH 4 -O -i pH4/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH4/1AKI.dry.md1.rst7 -cpin pH4/1AKI.dry.md1.cpin -o pH4/1AKI.dry.md2.mdout -cpout pH4/1AKI.dry.md2.cpout -cprestrt pH4/1AKI.dry.md2.cpin -r pH4/1AKI.dry.md2.rst7 -inf pH4/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH4/1AKI.dry.md2.nc # pH 4.5 -O -i pH4.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH4.5/1AKI.dry.md1.rst7 -cpin pH4.5/1AKI.dry.md1.cpin -o pH4.5/1AKI.dry.md2.mdout -cpout pH4.5/1AKI.dry.md2.cpout -cprestrt pH4.5/1AKI.dry.md2.cpin -r pH4.5/1AKI.dry.md2.rst7 -inf pH4.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH4.5/1AKI.dry.md2.nc # pH 5 -O -i pH5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH5/1AKI.dry.md1.rst7 -cpin pH5/1AKI.dry.md1.cpin -o pH5/1AKI.dry.md2.mdout -cpout pH5/1AKI.dry.md2.cpout -cprestrt pH5/1AKI.dry.md2.cpin -r pH5/1AKI.dry.md2.rst7 -inf pH5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH5/1AKI.dry.md2.nc # pH 5.5 -O -i pH5.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH5.5/1AKI.dry.md1.rst7 -cpin pH5.5/1AKI.dry.md1.cpin -o pH5.5/1AKI.dry.md2.mdout -cpout pH5.5/1AKI.dry.md2.cpout -cprestrt pH5.5/1AKI.dry.md2.cpin -r pH5.5/1AKI.dry.md2.rst7 -inf pH5.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH5.5/1AKI.dry.md2.nc # pH 6 -O -i pH6/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH6/1AKI.dry.md1.rst7 -cpin pH6/1AKI.dry.md1.cpin -o pH6/1AKI.dry.md2.mdout -cpout pH6/1AKI.dry.md2.cpout -cprestrt pH6/1AKI.dry.md2.cpin -r pH6/1AKI.dry.md2.rst7 -inf pH6/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH6/1AKI.dry.md2.nc # pH 6.5 -O -i pH6.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH6.5/1AKI.dry.md1.rst7 -cpin pH6.5/1AKI.dry.md1.cpin -o pH6.5/1AKI.dry.md2.mdout -cpout pH6.5/1AKI.dry.md2.cpout -cprestrt pH6.5/1AKI.dry.md2.cpin -r pH6.5/1AKI.dry.md2.rst7 -inf pH6.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH6.5/1AKI.dry.md2.nc # pH 7 -O -i pH7/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH7/1AKI.dry.md1.rst7 -cpin pH7/1AKI.dry.md1.cpin -o pH7/1AKI.dry.md2.mdout -cpout pH7/1AKI.dry.md2.cpout -cprestrt pH7/1AKI.dry.md2.cpin -r pH7/1AKI.dry.md2.rst7 -inf pH7/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH7/1AKI.dry.md2.nc # pH 7.5 -O -i pH7.5/1AKI.dry.md2.mdin -p 1AKI.dry.parm7 -c pH7.5/1AKI.dry.md1.rst7 -cpin pH7.5/1AKI.dry.md1.cpin -o pH7.5/1AKI.dry.md2.mdout -cpout pH7.5/1AKI.dry.md2.cpout -cprestrt pH7.5/1AKI.dry.md2.cpin -r pH7.5/1AKI.dry.md2.rst7 -inf pH7.5/1AKI.dry.md2.mdinfo -rem 4 -remlog rem.md2.log -x pH7.5/1AKI.dry.md2.nc  # Restarting simulations You often want to restart simulations. For instance, it's much easier to run 10 simulations of 2 ns each rather than a single 20 ns simulation. In this case, restarting simulations can be done by using the restart file from each replica as the input coordinate file for that same replica in the restart. Likewise, the cprestrt file from each replica becomes the cpin file for that same replica in the restart. The pH is automatically adjusted based on the value stored in the restart file (analogous to what happens for T-REMD). For example, the groupfile for the next simulation following the groupfile above is shown below: # pH 2 -O -i pH2/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH2/1AKI.dry.md2.rst7 -cpin pH2/1AKI.dry.md2.cpin -o pH2/1AKI.dry.md3.mdout -cpout pH2/1AKI.dry.md3.cpout -cprestrt pH2/1AKI.dry.md3.cpin -r pH2/1AKI.dry.md3.rst7 -inf pH2/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH2/1AKI.dry.md3.nc # pH 2.5 -O -i pH2.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH2.5/1AKI.dry.md2.rst7 -cpin pH2.5/1AKI.dry.md2.cpin -o pH2.5/1AKI.dry.md3.mdout -cpout pH2.5/1AKI.dry.md3.cpout -cprestrt pH2.5/1AKI.dry.md3.cpin -r pH2.5/1AKI.dry.md3.rst7 -inf pH2.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH2.5/1AKI.dry.md3.nc # pH 3 -O -i pH3/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH3/1AKI.dry.md2.rst7 -cpin pH3/1AKI.dry.md2.cpin -o pH3/1AKI.dry.md3.mdout -cpout pH3/1AKI.dry.md3.cpout -cprestrt pH3/1AKI.dry.md3.cpin -r pH3/1AKI.dry.md3.rst7 -inf pH3/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH3/1AKI.dry.md3.nc # pH 3.5 -O -i pH3.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH3.5/1AKI.dry.md2.rst7 -cpin pH3.5/1AKI.dry.md2.cpin -o pH3.5/1AKI.dry.md3.mdout -cpout pH3.5/1AKI.dry.md3.cpout -cprestrt pH3.5/1AKI.dry.md3.cpin -r pH3.5/1AKI.dry.md3.rst7 -inf pH3.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH3.5/1AKI.dry.md3.nc # pH 4 -O -i pH4/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH4/1AKI.dry.md2.rst7 -cpin pH4/1AKI.dry.md2.cpin -o pH4/1AKI.dry.md3.mdout -cpout pH4/1AKI.dry.md3.cpout -cprestrt pH4/1AKI.dry.md3.cpin -r pH4/1AKI.dry.md3.rst7 -inf pH4/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH4/1AKI.dry.md3.nc # pH 4.5 -O -i pH4.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH4.5/1AKI.dry.md2.rst7 -cpin pH4.5/1AKI.dry.md2.cpin -o pH4.5/1AKI.dry.md3.mdout -cpout pH4.5/1AKI.dry.md3.cpout -cprestrt pH4.5/1AKI.dry.md3.cpin -r pH4.5/1AKI.dry.md3.rst7 -inf pH4.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH4.5/1AKI.dry.md3.nc # pH 5 -O -i pH5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH5/1AKI.dry.md2.rst7 -cpin pH5/1AKI.dry.md2.cpin -o pH5/1AKI.dry.md3.mdout -cpout pH5/1AKI.dry.md3.cpout -cprestrt pH5/1AKI.dry.md3.cpin -r pH5/1AKI.dry.md3.rst7 -inf pH5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH5/1AKI.dry.md3.nc # pH 5.5 -O -i pH5.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH5.5/1AKI.dry.md2.rst7 -cpin pH5.5/1AKI.dry.md2.cpin -o pH5.5/1AKI.dry.md3.mdout -cpout pH5.5/1AKI.dry.md3.cpout -cprestrt pH5.5/1AKI.dry.md3.cpin -r pH5.5/1AKI.dry.md3.rst7 -inf pH5.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH5.5/1AKI.dry.md3.nc # pH 6 -O -i pH6/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH6/1AKI.dry.md2.rst7 -cpin pH6/1AKI.dry.md2.cpin -o pH6/1AKI.dry.md3.mdout -cpout pH6/1AKI.dry.md3.cpout -cprestrt pH6/1AKI.dry.md3.cpin -r pH6/1AKI.dry.md3.rst7 -inf pH6/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH6/1AKI.dry.md3.nc # pH 6.5 -O -i pH6.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH6.5/1AKI.dry.md2.rst7 -cpin pH6.5/1AKI.dry.md2.cpin -o pH6.5/1AKI.dry.md3.mdout -cpout pH6.5/1AKI.dry.md3.cpout -cprestrt pH6.5/1AKI.dry.md3.cpin -r pH6.5/1AKI.dry.md3.rst7 -inf pH6.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH6.5/1AKI.dry.md3.nc # pH 7 -O -i pH7/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH7/1AKI.dry.md2.rst7 -cpin pH7/1AKI.dry.md2.cpin -o pH7/1AKI.dry.md3.mdout -cpout pH7/1AKI.dry.md3.cpout -cprestrt pH7/1AKI.dry.md3.cpin -r pH7/1AKI.dry.md3.rst7 -inf pH7/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH7/1AKI.dry.md3.nc # pH 7.5 -O -i pH7.5/1AKI.dry.md3.mdin -p 1AKI.dry.parm7 -c pH7.5/1AKI.dry.md2.rst7 -cpin pH7.5/1AKI.dry.md2.cpin -o pH7.5/1AKI.dry.md3.mdout -cpout pH7.5/1AKI.dry.md3.cpout -cprestrt pH7.5/1AKI.dry.md3.cpin -r pH7.5/1AKI.dry.md3.rst7 -inf pH7.5/1AKI.dry.md3.mdinfo -rem 4 -remlog rem.md3.log -x pH7.5/1AKI.dry.md3.nc  Because replicas change pH (rather than changing structure), each output file will contain a combination of snapshots from different pH values, but the pH is automatically reassigned (based on what is in the restart file) at the beginning of the simulation. The pH ladder is automatically generated properly. # Analyzing the results Because replicas periodically change their target pH when replica exchange attempts succeed, we need to have some way of extracting snapshots and titration data that correspond to a pH ensemble rather than a replica time series. ## Reconstructing pH-based cpout files When running replica exchange simulations, the format of the cpout file is slightly changed so that the current target pH value is printed next to each record. An example is shown below: ... Residue 4 State: 0 pH: 3.000 Residue 5 State: 2 pH: 3.000 Residue 3 State: 3 pH: 3.000 Residue 9 State: 1 pH: 2.500 Residue 9 State: 1 pH: 2.500 Residue 7 State: 0 pH: 2.500 Residue 5 State: 2 pH: 2.500 Residue 2 State: 1 pH: 2.500 Residue 6 State: 0 pH: 2.500 ...  I wrote a program in C++ that will parse the constant pH output files from Amber and compute different statistics from these output files. You can download the source code from here: https://github.com/swails/cphmd_tools (there is a "Download ZIP" option that will give you the files in a zipped folder). The README file has instructions for installing the program (it is similar to installing Amber). After you have installed this program, cphstats, you will be able to use it like this: bash$ cphstats --help
Usage: cphstats [-O] [-V] [-h] [-i <cpin>] [-t] [-o FILE] [-R FILE -r INT]
[--chunk INT --chunk-out FILE] [--cumulative --cumulative-out FILE]
[-v INT] [-n INT] [-p|-d] [--calcpka|--no-calcpka] [--fix-remd]
[--population FILE] [-c CONDITION -c CONDITION -c ...]
[--conditional-output FILE] [--chunk-conditional FILE]
cpout1 [cpout2 [cpout3 ...] ]

General Options:
-h, --help     Print this help and exit.
-V, --version  Print the version number and exit.
-O, --overwrite
Allow existing outputs to be overwritten.
--debug        Print out information about the files that are
being read in and used for the calculations.

Input Files and Options:
-i FILE, --cpin FILE
Input cpin file (from sander) with titrating residue
information.
-t FLOAT, --time-step FLOAT
This is the time step in ps you used in your simulations.
It will be used to print data as a function of time.
Default is 2 fs (0.002)

Output Files:
-o FILE, --calcpka-output FILE
File to which the standard calcpka'-type statistics
are written. Default is stdout
-R FILE, --running-avg-out FILE
Output file where the running averages for each
residue is printed. Default is [running_avgs.dat]
--chunk-out FILE
Output file where the protonated fraction calculated
over chunks of the simulation are printed.
Default is [chunk.dat]
--cumulative-out FILE
Output file where the cumulative protonated fraction
is printed. Default is [cumulative.dat]
--population FILE
Output file where protonation state populations are
printed for every state of every residue.
--conditional-output FILE
Output file with requested conditional probabilities.
Default is [conditional_prob.dat].
--chunk-conditional FILE
Prints a time series of the conditional probabilities over
a trajectory split up into chunks.

Output Options:
These options modify how the output files will appear

-v INT, --verbose INT
Controls how much information is printed to the
calcpka-style output file. Options are:
(0) Just print fraction protonated. [Default]
(1) Print everything calcpka prints.
-n INT, --interval INT
An interval between which to print out time series data
like chunks', cumulative' data, and running averages.
It is also used as the 'window' of the conditional
probability time series (--chunk-conditional).
Default [1000]
-p, --protonated
Print out protonation fraction instead of deprotonation
fraction in time series data (Default behavior).
-d, --deprotonated
Print out deprotonation fraction instead of protonation
fraction in time series data.
-a, --pKa      Print predicted pKas (via Henderson-Hasselbalch) in place
of fraction (de)protonated. NOT default behavior.

Analysis Options:
These options control which analyses are done. By default, only
the original, calcpka-style analysis is done.

--calcpka      Triggers the calcpka-style output [On by default]
--no-calcpka   Turns off the calcpka-style output
-r WINDOW, --running-avg WINDOW
Defines a window size for a moving, running average
fraction protonated. <WINDOW> is the number of MD
steps (NOT the number of MC exchange attempts).
--chunk WINDOW
Computes the pKa's over a chunk of the simulation of
size <WINDOW> time steps.
--cumulative   Computes the cumulative fraction protonated over the
course of the trajectory.
--fix-remd PREFIX
This option will trigger cphstats to reassemble the
titration data into pH-specific ensembles. This
is an exclusive mode of the program---no other
analyses will be done.
-c CONDITIONAL, --conditional CONDITIONAL
Evaluates conditional probabilities. CONDITIONAL should be a
string of the format:
<resid>:<state>,<resid>:<state>,...
Where <resid> is the residue number in the prmtop (NOT the
cpin) and <state> is either the state number or (p)rotonated
or (d)eprotonated, case-insensitive

This program analyzes constant pH output files (cpout) from Amber.
These output files can be compressed using gzip compression. The
compression will be detected automatically by the file name extension.
You must have the gzip headers for this functionality to work.


The first thing you must do before doing any kind of analysis is to extract the ensemble of protonation states for a single pH (since the pH is exchanged between replicas, each replica's cpout file has points from every pH we simulated). The --fix-remd flag in cphstats will re-order the cpout files.

The input cpout files should be each cpout file from a single pH-REMD simulation. If you have run 10 ns of simulation in 1 ns chunks, you will need to run this program 10 times! The number of cpout files you provide must be equal to the number of replicas you have. Each pH value will print a cpout file named prefix.pH_2.00 where prefix is the prefix string (it is the argument that follows the --fix-remd flag), and 2.00 is the pH. The pH_# suffix is added for each pH value.

For example:

cphstats --fix-remd reordered_cpouts cpout.md1.*

(where each replica has a different number following cpout.md1.).

After you create the pH-based cpout files, you can analyze these pH-specific cpout files using either cphstats or calcpka (the latter is distributed with AmberTools). cphstats is ca. twice as fast as calcpka for the same analyses, and is capable of doing everything calcpka does in addition to computing running averages with a flexible window size, cumulative averages, and 'chunk' averages (the simulation is split up into 'chunks' that are analyzed separately). (Like calcpka, cphstats also performs the same standard analysis and population dumps in exactly the same format that calcpka uses).

### Python alternative to cphstats

Before writing cphstats, I wrote a Python script (remake_cpouts.py) to extract pH-based cpout files from replica-based cpout files. You can download it here. Since implementing CpHMD on GPUs, the cpout files have become so long that processing them with Python was becoming painfully slow. While there is no installation process (just download and run), it is at least 40 times slower than cphstats and does not do any analyses (it is the equivalent of just the --fix-remd flag.

## Reconstructing pH-based mdcrd (trajectory) files

Like the cpout files, trajectory files will be filled with snapshots from each pH value in the ladder (unless of course you get severely limited state space mixing). To account for this, I have 'tricked' the trajectory files into thinking that the pH value is really the temperature. Therefore, you can use ptraj and/or cpptraj the same way you would for T-REMD, but using the pH value where you would normally put the temperature.

For instance, to load all of the structures from pH 2 from trajectories named 1AKI.dry.md1.mdcrd.00X (where X spans the range from 0 to NUM_REPS-1), use the following trajin line in your (cp)ptraj script:

trajin 1AKI.dry.md1.nc.000 1 2000 remdtraj remdtrajtemp 2.0

This will extract all of the snapshots with pH 2 (which is stored as a temperature of 2) from the trajectory files 1AKI.dry.md1.nc.000, 1AKI.dry.md1.nc.001, … etc.

At this point, if pH-trajectories are generated with a trajout command, then they will align with the reconstructed cpout files created in the previous section as though regular CpHMD simulations were run.

## Analyzing the remlog file

The replica exchange log file has a lot of information regarding the exchange attempts for each replica. A section of a file is shown below.

# Replica Exchange log file
# numexchg is     500000
# REMD filenames:
#   remlog= rem.md2.log
#   remtype= rem.type
# Rep#, N_prot, old_pH, new_pH, Success rate (i,i+1)
# exchange        1
1       9   1.000  7.500   0.0000
2       3   1.500  2.000   2.0000
3       7   2.000  1.500   0.0000
4       3   2.500  3.000   2.0000
5       6   3.000  2.500   0.0000
6       2   3.500  4.000   2.0000
7       2   4.000  3.500   0.0000
8       2   4.500  4.500   0.0000
9       1   5.000  5.000   0.0000
10       6   5.500  5.500   0.0000
11       3   6.000  6.000   0.0000
12       5   6.500  6.500   0.0000
13       1   7.000  7.000   0.0000
14      11   7.500  1.000   2.0000
# exchange        2
1       9   7.500  7.000   1.0000
2       3   2.000  2.500   1.0000
3       7   1.500  1.500   1.0000
4       4   3.000  3.000   0.0000
5       6   2.500  2.000   1.0000
6       3   4.000  4.000   0.0000
7       2   3.500  3.500   1.0000
8       2   4.500  4.500   0.0000
9       1   5.000  5.500   1.0000
10       6   5.500  5.000   0.0000
11       3   6.000  6.500   1.0000
12       5   6.500  6.000   0.0000
13       1   7.000  7.500   1.0000
14      11   1.000  1.000   0.0000

[snip]

# exchange   500000
1      10   1.000  1.500   0.5211
2       3   5.500  5.000   0.5742
3       6   3.000  3.500   0.4758
4       1   7.500  7.000   0.0000
5       7   2.000  2.000   0.3579
6       5   3.500  3.000   0.5683
7       2   6.000  6.000   0.6781
8       1   6.500  6.500   0.8467
9       4   4.500  4.000   0.6855
10       1   7.000  7.500   0.9442
11       3   5.000  5.500   0.6438
12       6   2.500  2.500   0.4235
13       3   4.000  4.500   0.6458
14      10   1.500  1.000   0.3199
`

The columns are the replica number, followed by the number of protons present in the current set of protonation states, followed by the original pH, followed by the "new" pH (if the two pHs are the same, then the exchange attempt failed). The last column is the exchange success rate with the higher-pH replica (notice that the highest pH, 7.5, has a success rate of 0 in the final exchange because its 'higher' pH is the first replica, which has a pH of 1).

# The Maths and Theory Behind pH-REM

This section will be fairly short, as it assumes a basic working knowledge of replica exchange simulations, detailed balance, and Mongan's original constant pH MD method. This will hopefully provide some insight about why things are implemented the way they are (i.e. why replicas change pH rather than change structures).

Each replica is defined within a semigrand canonical ensemble (subject to the restraints of constant temperature, pressure-or-volume, and chemical potential of hydronium). The chemical potential of hydronium is fixed by the solvent pH, and allows the number of protons on the system to vary. In this ensemble, the probability of each state being a member of the ensemble is proportional to:

(1)
\begin{align} P(x_i, N_i) \propto \exp \left[ -\beta \left ( H(x_i, N_i) - kT\ln(10) \cdot pH \cdot N_i \right) \right] \end{align}

In Eq.(1), $x_i$ refers to the conformational degrees of freedom (the x-, y-, z-coordinates of each atom), and $N_i$ is the number of titratable protons on the system, which is directly related to the net charge of the system.

In an exchange attempt with only two replicas exchanging information independent of all other replicas, we want to evaluate the probability of accepting a proposed change in which two replicas swap pH values. Here, we define two replicas, $X$, each with the same temperature (and therefore same $\beta$), but different pH values, $pH_i$ and $pH_j$. We then want to evaluate the transition probability of

(2)
\begin{align} \left( X(x_i, N_i, pH_i) , X(x_j, N_j, pH_j) \right) \rightarrow \left( X(x_i, N_i, pH_j), X(x_j, N_j, pH_i) \right) \end{align}

In Eq.(2), notice that only the $pH$ has been exchanged. The probability of accepting this exchange must be calculated in a way that preserves detailed balance so that we continue to sample from the proper expanded ensemble. A Metropolized Monte Carlo criteria satisfies the detailed balance equation, and the transition probability becomes

(3)
\begin{align} P_{i \rightarrow j} = \min \left\lbrace 1, \exp \left[ -\ln(10) \left( N_i - N_j \right) \left( pH_j - pH_i \right) \right] \right\rbrace \end{align}

An important note — because we are not exchanging either the number of protons or coordinates, all terms involving the energies cancel out, so the energies do not appear in our exchange probability! Therefore, we don't have to calculate any energies, and this exchange attempt is as cheap as exchanges in T-REMD (effectively free).