All of these scripts include a command-line help function that is accessible through the --help flag.
Please feel free to add scripts to this page if you think they are helpful (but please give each one a --help flag to explain its usage). Also list any prerequisites for using those scripts, including the lowest Python version required as well as any other Python modules that must be downloaded alongside it. Required Python modules should be placed either in the same directory as the main script or in a directory included in the PYTHONPATH environment variable.
Any of the scripts will either have to be executed with the interpreter directly (python or bash), or made executable (chmod +x <script_name>) before you can run them.
This script will take a data set and construct histograms from it. It selects "smart" default bin ranges and numbers of bins: the bin range is determined from the maximum and minimum of the data set, and the number of bins is chosen according to Scott's choice. The bin range runs from one integer value to another, so if you need finer graining than integer-to-integer, you will need to supply your own bin range or change
    binrange = math.floor(xmaxminholder)
    binrange = math.ceil(xmaxminholder)

to

    binrange = xmaxminholder
    binrange = xmaxminholder
in the script. You can also optionally normalize the histograms so that they integrate to 1.
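Scott's choice sets the bin width from the standard deviation and the number of samples (h = 3.5σ / n^(1/3)). The default described above can be sketched roughly as follows; names like scott_bins and xmaxminholder rounding are illustrative, not the script's own code:

```python
import math

def scott_bins(data):
    """Pick a bin range and bin count using Scott's choice.

    Bin width h = 3.5 * stddev / n**(1/3); the range is rounded
    out to whole integers, mirroring the integer-to-integer
    default described above.
    """
    n = len(data)
    mean = sum(data) / n
    stddev = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    h = 3.5 * stddev / n ** (1.0 / 3.0)
    lo = math.floor(min(data))   # round the range out to integers
    hi = math.ceil(max(data))
    if h == 0:                   # constant data: a single bin
        return (lo, hi), 1
    nbins = max(1, int(round((hi - lo) / h)))
    return (lo, hi), nbins
```

For example, scott_bins([0.5, 1.5, 2.5, 3.5]) picks the integer range (0, 4) and 2 bins.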
Required Modules: Download utilities.py
Python Version: 2.6 or 2.7
Usage: 1Dbinning.py [options]

Options:
  -h, --help            show this help message and exit
  -f DATA_FILE, --file=DATA_FILE
                        Input data file
  -o OUTPUT_FILE, --output=OUTPUT_FILE
                        Binned output file
  -b BINS, --bins=BINS  Number of bins to use
  -r BINRANGE, --binrange=BINRANGE
                        Range of bins to use
  -n, --normalize       Normalize the histograms
  -g SCRIPT_NAME, --gnuplot=SCRIPT_NAME
                        Gnuplot script name
  -d DELIMITER, --delimiter=DELIMITER
                        The column delimiter (defaults to any kind of
                        whitespace)
  -c COLUMN, --column=COLUMN
                        Which column to pull the data from
This script works just like 1Dbinning.py above, but creates bins in 2 dimensions. It selects "smart" default bin ranges and numbers of bins: the bin range is determined from the maximum and minimum of the data set in both dimensions, and the number of bins is chosen according to Scott's choice. You can also normalize the histograms.
Required Modules: Download utilities.py
Python Version: 2.4 – 2.7
Usage: 2Dbinning.py [options]

Options:
  -h, --help            show this help message and exit
  -f DATA_FILE, --file=DATA_FILE
                        Input data file
  -o OUTPUT_FILE, --output=OUTPUT_FILE
                        Binned output file
  -b BINS, --bins=BINS  Number of bins [NxM]
  -x XBINRANGE, --xbinrange=XBINRANGE
                        Range for bins in the x-direction
  -y YBINRANGE, --ybinrange=YBINRANGE
                        Range for bins in the y-direction
  -n, --normalize       Normalize the histograms
  -g GNUPLOT, --gnuplot=GNUPLOT
                        Name of gnuplot script to write.
  -c COLUMNS, --columns=COLUMNS
                        Which two columns to analyze [##,##] starting from 1
  -d DELIMITER, --delimiter=DELIMITER
                        The string/character that delimits fields in the
                        input data file
This script will load up all of the data in a series of Amber mdout files and allow you to immediately plot any of the data. It allows you to do a number of things with the data:
- Plot the raw data
- Save the raw data to a file
- Calculate the average and standard deviation
- Histogram the data
- Plot the normalized autocorrelation function
Prerequisites: numpy, matplotlib, Tkinter
Python Version: 2.4 – 2.7
mdout_analyzer.py is now a part of AmberTools. (It used to be called MdoutAnalyzer.py in AmberTools 13).
Launch the script from the command line with:
mdout_analyzer.py [mdout1] [mdout2] … [mdoutN]
If you provide no mdout files, you will be prompted for one as soon as you launch the program.
Every energy component printed to the mdout file is assigned to a button. Clicking that button 'activates' that data set for any action that you apply to it. After you have selected the data sets you want to analyze, you can click any of the 'Actions' ('Graph Them!', 'Save to File', 'Show Statistics', 'Histogram Data', 'Autocorrelation'). See the image below:
After selecting however many data sets you want, you can do one of 5 things with them — the image below shows the window with EPtot and EELEC chosen.
You can modify the graph appearance (e.g., add a title, legend, lines, points, etc.) via the "Graph Options" menu at the top of the GUI.
Note that the X-coordinate for all of the analyses is either the TIME(PS) data, if present, or a simple enumeration of the data points (except for histogramming, of course, where the X-coordinate is the data you are histogramming and the Y-coordinate is either the probability or the frequency).
The 5 actions are demonstrated in the sections below.
This graphs all of the selected data sets side-by-side. An example graph is shown below. You can add a title to the graph as well as axis labels via the "Graph Options" menu at the top of the main window. Use the buttons at the bottom of the graph window to save the image to a PNG file or zoom in on a portion of the plot.
Save to File
This brings up a window to allow you pick the file name to save your data to. The data will be printed with one data set in each column where the first column just enumerates the data. If the file name ends in .csv, a comma-separated-value file will be printed. Excel (and other spreadsheets) natively read in this data format. Otherwise, an ASCII file will be printed with 20-character columns per data set.
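The format selection described above (CSV when the name ends in .csv, otherwise fixed-width ASCII with 20-character columns) could look like the following sketch; save_datasets is a hypothetical helper, not the program's actual code:

```python
import csv

def save_datasets(fname, labels, datasets):
    """Write data sets column-wise, with the first column enumerating
    the data points.  A .csv extension selects comma-separated output;
    anything else gets 20-character fixed-width ASCII columns.

    NOTE: illustrative sketch only, not mdout_analyzer.py's own code.
    """
    rows = list(zip(*datasets))          # one row per data point
    if fname.endswith('.csv'):
        with open(fname, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Frame'] + list(labels))
            for i, row in enumerate(rows, start=1):
                writer.writerow([i] + list(row))
    else:
        with open(fname, 'w') as f:
            f.write(''.join('%-20s' % s for s in ['#Frame'] + list(labels))
                    + '\n')
            for i, row in enumerate(rows, start=1):
                f.write(''.join('%-20s' % v for v in (i,) + row) + '\n')
```

Either output opens directly in a spreadsheet (CSV) or lines up cleanly in a text editor (fixed-width).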
This brings up a text window in which the average and standard deviation are printed for each selected data set. You can save this to a text file using the "File" menu at the top of this window. See the image below for an example:
This option will histogram the data and plot those histograms. The default bin width comes from Scott's choice. Otherwise, you can control the bins manually with the "Histogram Bins" option under "Graph Options": a positive number is taken as the bin width, while a negative number is taken as the number of bins you want. (It is not currently possible to give different bin settings to different data sets in the same plot.)
You can choose to normalize the data or not via the option in "Graph Options".
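The sign convention for the "Histogram Bins" setting (positive = bin width, negative = number of bins) can be sketched as follows; interpret_binspec is an illustrative name, not taken from the program:

```python
import math

def interpret_binspec(spec, dmin, dmax):
    """Turn a bin setting into (nbins, width) for data in [dmin, dmax].

    Positive spec -> treated as the bin width.
    Negative spec -> treated as the desired number of bins.
    Sketch of the convention described above, not the program's code.
    """
    span = dmax - dmin
    if spec > 0:                      # bin width given directly
        width = float(spec)
        nbins = max(1, int(math.ceil(span / width)))
    elif spec < 0:                    # number of bins given
        nbins = int(-spec)
        width = span / nbins
    else:
        raise ValueError('bin spec must be nonzero')
    return nbins, width
```

For example, a setting of 0.5 over data spanning 0–2 yields 4 bins of width 0.5, while a setting of -10 yields 10 bins of width 0.2.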
An example figure is shown below:
This option will plot the normalized autocorrelation function of the given data sets. In this case, the X-coordinate effectively becomes the 'lag' parameter. By default, the autocorrelation is calculated out to the maximum lag (i.e., out to the total number of frames provided). It is probably best to only consider lags out to half the size of your data set.
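A normalized autocorrelation function of the kind plotted here can be sketched in pure Python (the program itself relies on numpy; this is only an illustration of the quantity being computed):

```python
def normalized_autocorr(x, max_lag=None):
    """C(k) = <(x_i - mean)(x_{i+k} - mean)> / variance, so C(0) == 1.

    By default computes out to the maximum lag (the full data length),
    though only lags up to about half the data length are statistically
    meaningful, as noted above.  Illustrative sketch only.
    """
    n = len(x)
    if max_lag is None:
        max_lag = n
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    dev = [v - mean for v in x]
    acf = []
    for k in range(min(max_lag, n)):
        c = sum(dev[i] * dev[i + k] for i in range(n - k)) / (n - k)
        acf.append(c / var)
    return acf
```

An alternating series like [1, 2, 1, 2] gives C(0) = 1 and C(1) = -1, as expected for perfectly anticorrelated neighbors.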
An example figure is shown below:
This script will take a NetCDF trajectory and extract restart files from it. If velocity information is present in the trajectory (because you set ntwv = -1), it will dump that velocity information into the restart files as well (something I don't think ptraj or cpptraj does yet).
Prerequisites: ScientificPython, numpy
Python Version: 2.4 – 2.7
These prerequisites are typically available from your package manager. On Ubuntu, the packages are python-scientific and python-numpy, although numpy is a dependency of ScientificPython. See the webpage for ScientificPython here
Usage: get_restarts.py [options]

Options:
  -h, --help            show this help message and exit
  -y FILE, --netcdf=FILE
                        Input NetCDF trajectory file to generate restarts
                        from
  -r FILE, --restrt=FILE
                        Name for output restart file. If multiple files are
                        created, a .# suffix will be added to this file name
                        where # is the frame number being dumped.

  Single Restart File:
    This option will print a single restart file from the desired frame of
    the trajectory

    -f INT, --frame=INT
                        Frame number to convert to restart file

  Multiple Restart Files:
    These options will print multiple restart files from the desired frames
    of the trajectory

    -s INT, --start-frame=INT
                        First frame to turn into a restart
    -e INT, --end-frame=INT
                        Last frame to turn into a restart
    -i INT, --interval=INT
                        Number of frames to skip between consecutive restart
                        files created.

This script will extract restart files from NetCDF trajectory files and preserve any velocity information that may be present.
This is technically a program, written in C++ (with a Fortran namelist parser written in Fortran 90), designed to provide many options for analyzing CpHMD and pH-REMD simulations from sander and pmemd.
It is roughly 2x faster than the calcpka.F90 currently built with AmberTools, and provides identical results. It also does much more, including conditional probabilities, running averages, cumulative averages, and chunk averages.
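To clarify the three kinds of time-series averages mentioned above, here is a small sketch (illustrative only, not cphstats' implementation):

```python
def running_avg(x, window):
    """Moving average over a sliding window of fixed size."""
    return [sum(x[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(x))]

def cumulative_avg(x):
    """Average of all points up to and including each point."""
    out, total = [], 0.0
    for i, v in enumerate(x, start=1):
        total += v
        out.append(total / i)
    return out

def chunk_avg(x, size):
    """Independent averages over consecutive, non-overlapping chunks."""
    return [sum(x[i:i + size]) / len(x[i:i + size])
            for i in range(0, len(x), size)]
```

A running average smooths while tracking drift, a cumulative average shows convergence of the overall mean, and chunk averages reveal how an estimate varies between independent segments of the trajectory.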
It is available on GitHub, complete with instructions: https://github.com/swails/cphmd_tools
If you have git on your computer, you can get the source code immediately via:

    git clone https://github.com/swails/cphmd_tools.git

which will create a directory cphmd_tools with the source code inside. Inside that directory, you can run

    git pull

periodically to download any updates that I may have pushed to GitHub. Build and use instructions are available on the GitHub website.
Beginning with AmberTools 14, cphstats is distributed alongside Amber and is built during the normal build process of AmberTools.
Prerequisites: Fortran 90 and C++ compilers capable of linking to each other (e.g., Intel compilers or GNU compilers)
You can obtain the source either from the command line with git or from the GitHub website here
Usage: cphstats [-O] [-V] [-h] [-i <cpin>] [-t] [-o FILE] [-R FILE -r INT]
                [--chunk INT --chunk-out FILE]
                [--cumulative --cumulative-out FILE] [-v INT] [-n INT]
                [-p|-d] [--calcpka|--no-calcpka] [--fix-remd]
                [--population FILE] [-c CONDITION -c CONDITION -c ...]
                [--conditional-output FILE] [--chunk-conditional FILE]
                cpout1 [cpout2 [cpout3 ...] ]

General Options:
  -h, --help            Print this help and exit.
  -V, --version         Print the version number and exit.
  -O, --overwrite       Allow existing outputs to be overwritten.
  --debug               Print out information about the files that are being
                        read in and used for the calculations.
  --expert              I will consider you an expert user and NOT warn you
                        if you try to compute statistics from REMD-based
                        files before using --fix-remd [NOT default behavior]
  --novice              I will warn you if you try to use REMD-based files
                        to compute statistics. [Default behavior]

Input Files and Options:
  -i FILE, --cpin FILE  Input cpin file (from sander) with titrating residue
                        information.
  -t FLOAT, --time-step FLOAT
                        This is the time step in ps you used in your
                        simulations. It will be used to print data as a
                        function of time. Default is 2 fs (0.002)

Output Files:
  -o FILE, --calcpka-output FILE
                        File to which the standard `calcpka'-type statistics
                        are written. Default is stdout
  -R FILE, --running-avg-out FILE
                        Output file where the running averages of time
                        series data for each residue is printed (see [Output
                        Options] below for details). Default is
                        [running_avgs.dat]
  --chunk-out FILE      Output file where the time series data calculated
                        over chunks of the simulation are printed (see
                        [Output Options] below for details). Default is
                        [chunk.dat]
  --cumulative-out FILE
                        Output file where the cumulative time series data is
                        printed (see [Output Options] below for details).
                        Default is [cumulative.dat]
  --population FILE     Output file where protonation state populations are
                        printed for every state of every residue.
  --conditional-output FILE
                        Output file with requested conditional
                        probabilities. Default is [conditional_prob.dat].
  --chunk-conditional FILE
                        Prints a time series of the conditional
                        probabilities over a trajectory split up into
                        chunks.

Output Options:
  These options modify how the output files will appear

  -v INT, --verbose INT
                        Controls how much information is printed to the
                        calcpka-style output file. Options are: (0) Just
                        print fraction protonated. [Default] (1) Print
                        everything calcpka prints.
  -n INT, --interval INT
                        An interval between which to print out time series
                        data like `chunks', `cumulative' data, and running
                        averages. It is also used as the 'window' of the
                        conditional probability time series
                        (--chunk-conditional). Default
  -p, --protonated      Print out protonation fraction instead of
                        deprotonation fraction in time series data (Default
                        behavior).
  -d, --deprotonated    Print out deprotonation fraction instead of
                        protonation fraction in time series data.
  -a, --pKa             Print predicted pKas (via Henderson-Hasselbalch) in
                        place of fraction (de)protonated. NOT default
                        behavior.

Analysis Options:
  These options control which analyses are done. By default, only the
  original, calcpka-style analysis is done.

  --calcpka             Triggers the calcpka-style output [On by default]
  --no-calcpka          Turns off the calcpka-style output
  -r WINDOW, --running-avg WINDOW
                        Defines a window size for a moving, running average
                        time series. <WINDOW> is the number of MD steps (NOT
                        the number of MC exchange attempts).
  --chunk WINDOW        Computes the time series data over a chunk of the
                        simulation of size <WINDOW> time steps. See above
                        for details.
  --cumulative          Computes the cumulative average time series data
                        (see above for options) over the course of the
                        trajectory.
  --fix-remd PREFIX     This option will trigger cphstats to reassemble the
                        titration data into pH-specific ensembles. This is
                        an exclusive mode of the program; no other analyses
                        will be done.
  -c CONDITIONAL, --conditional CONDITIONAL
                        Evaluates conditional probabilities. CONDITIONAL
                        should be a string of the format:
                          <resid>:<state>,<resid>:<state>,...
                        or
                          <resid>:PROT,<resid>:DEPROT,...
                        or
                          <resid>:<state1>;<state2>,<resid>:PROT,...
                        Where <resid> is the residue number in the prmtop
                        (NOT the cpin) and <state> is either the state
                        number or (p)rotonated or (d)eprotonated,
                        case-insensitive

This program analyzes constant pH output files (cpout) from Amber. These output files can be compressed using gzip compression; the compression will be detected automatically by the file name extension. You must have the gzip headers installed for this functionality to work.
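Parsing the CONDITIONAL string format described above might look like this sketch (parse_conditional is a hypothetical helper, not cphstats source):

```python
def parse_conditional(cond):
    """Parse '<resid>:<state>[;<state>...]' items separated by commas.

    <state> is a state number, or (case-insensitively) a word starting
    with 'p' (protonated) or 'd' (deprotonated).  Returns a dict mapping
    residue number -> list of state specs.  Illustrative sketch only.
    """
    result = {}
    for item in cond.split(','):
        resid_str, _, states = item.partition(':')
        if not states:
            raise ValueError('bad conditional item: %r' % item)
        resid = int(resid_str)
        specs = []
        for s in states.split(';'):
            s = s.strip()
            if s.isdigit():
                specs.append(int(s))          # explicit state number
            elif s.lower().startswith('p'):
                specs.append('PROT')
            elif s.lower().startswith('d'):
                specs.append('DEPROT')
            else:
                raise ValueError('bad state spec: %r' % s)
        result[resid] = specs
    return result
```

For example, '12:PROT,45:1;2' selects residue 12 protonated together with residue 45 in state 1 or 2.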