All of these scripts include a command-line help function that is accessible through the --help flag.
Please feel free to add scripts to this page if you think they are helpful (but please give each one a --help flag to explain its usage). Also list any prerequisites for using those scripts, including the lowest Python version required as well as any other Python modules that must be downloaded alongside it. Required Python modules should be placed either in the same directory as the main script or in a directory included in the PYTHONPATH environment variable.
Any of the scripts will either have to be executed with the interpreter directly (python or bash), or made executable (chmod +x <script_name>) before you can run them.
This script will take a data set and construct histograms from it. It selects "smart" default bin ranges and numbers of bins: the bin range is determined from the maximum and minimum of the data set, and the number of bins is chosen according to Scott's choice. The bin range runs from one integer value to another, so if you need finer graining than integer-to-integer, you will need to supply your own bin range or change
    binrange = math.floor(xmaxminholder)
    binrange = math.ceil(xmaxminholder)

to

    binrange = xmaxminholder
    binrange = xmaxminholder
in the script. You can also optionally normalize the histograms so that they integrate to 1.
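Scott's choice sets the bin width from the standard deviation and the number of samples (h = 3.5σ / n^(1/3)). The default described above can be sketched roughly as follows; names like scott_bins and xmaxminholder rounding are illustrative, not the script's own code:

```python
import math

def scott_bins(data):
    """Pick a bin range and bin count using Scott's choice.

    Bin width h = 3.5 * stddev / n**(1/3); the range is rounded
    out to whole integers, mirroring the integer-to-integer
    default described above.
    """
    n = len(data)
    mean = sum(data) / n
    stddev = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    h = 3.5 * stddev / n ** (1.0 / 3.0)
    lo = math.floor(min(data))   # round the range out to integers
    hi = math.ceil(max(data))
    if h == 0:                   # constant data: a single bin
        return (lo, hi), 1
    nbins = max(1, int(round((hi - lo) / h)))
    return (lo, hi), nbins
```

For example, scott_bins([0.5, 1.5, 2.5, 3.5]) picks the integer range (0, 4) and 2 bins.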
Required Modules: Download utilities.py
Python Version: 2.6 or 2.7
Usage: 1Dbinning.py [options]

Options:
  -h, --help            show this help message and exit
  -f DATA_FILE, --file=DATA_FILE
                        Input data file
  -o OUTPUT_FILE, --output=OUTPUT_FILE
                        Binned output file
  -b BINS, --bins=BINS  Number of bins to use
  -r BINRANGE, --binrange=BINRANGE
                        Range of bins to use
  -n, --normalize       Normalize the histograms
  -g SCRIPT_NAME, --gnuplot=SCRIPT_NAME
                        Gnuplot script name
  -d DELIMITER, --delimiter=DELIMITER
                        The column delimiter (defaults to any kind of
                        whitespace)
  -c COLUMN, --column=COLUMN
                        Which column to pull the data from
This script works just like 1Dbinning.py above, but creates bins in 2 dimensions. It selects "smart" default bin ranges and numbers of bins: the bin range is determined from the maximum and minimum of the data set in both dimensions, and the number of bins is chosen according to Scott's choice. You can also normalize the histograms.
Required Modules: Download utilities.py
Python Version: 2.4 – 2.7
Usage: 2Dbinning.py [options]

Options:
  -h, --help            show this help message and exit
  -f DATA_FILE, --file=DATA_FILE
                        Input data file
  -o OUTPUT_FILE, --output=OUTPUT_FILE
                        Binned output file
  -b BINS, --bins=BINS  Number of bins [NxM]
  -x XBINRANGE, --xbinrange=XBINRANGE
                        Range for bins in the x-direction
  -y YBINRANGE, --ybinrange=YBINRANGE
                        Range for bins in the y-direction
  -n, --normalize       Normalize the histograms
  -g GNUPLOT, --gnuplot=GNUPLOT
                        Name of gnuplot script to write.
  -c COLUMNS, --columns=COLUMNS
                        Which two columns to analyze [##,##] starting from 1
  -d DELIMITER, --delimiter=DELIMITER
                        The string/character that delimits fields in the
                        input data file
This script will load up all of the data in a series of Amber mdout files and allow you to immediately plot any of the data. It allows you to do a number of things with the data:
- Plot the raw data
- Save the raw data to a file
- Calculate the average and standard deviation
- Histogram the data
- Plot the normalized autocorrelation function
Prerequisites: numpy, matplotlib, Tkinter
Python Version: 2.4 – 2.7
mdout_analyzer.py is now a part of AmberTools. (It used to be called MdoutAnalyzer.py in AmberTools 13).
Launch the script from the command line with:
mdout_analyzer.py [mdout1] [mdout2] … [mdoutN]
If you provide no mdout files, you will be prompted for one as soon as you launch the program.
Every energy component printed to the mdout file is assigned to a button. Clicking that button 'activates' that data set for any action that you apply to it. After you have selected the data sets you want to analyze, you can click any of the 'Actions' ('Graph Them!', 'Save to File', 'Show Statistics', 'Histogram Data', 'Autocorrelation'). See the image below:
After selecting however many data sets you want, you can do one of 5 things with them — the image below shows the window with EPtot and EELEC chosen.
You can modify the graph appearance (e.g., add a title, legend, lines, points, etc.) via the "Graph Options" menu at the top of the GUI.
Note that the X-coordinate for all of the analyses is either the TIME(PS) data, if present, or a simple enumeration of the data points (except for histogramming, of course, where the X-coordinate is the data you are histogramming and the Y-coordinate is either the probability or the frequency).
The 5 actions are demonstrated in the sections below.
This graphs all of the selected data sets side-by-side. An example graph is shown below. You can add a title to the graph as well as axis labels via the "Graph Options" menu at the top of the main window. Use the buttons at the bottom of the graph window to save the image to a PNG file or zoom in on a portion of the plot.
Save to File
This brings up a window to allow you pick the file name to save your data to. The data will be printed with one data set in each column where the first column just enumerates the data. If the file name ends in .csv, a comma-separated-value file will be printed. Excel (and other spreadsheets) natively read in this data format. Otherwise, an ASCII file will be printed with 20-character columns per data set.
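The format selection described above (CSV when the name ends in .csv, otherwise fixed-width ASCII with 20-character columns) could look like the following sketch; save_datasets is a hypothetical helper, not the program's actual code:

```python
import csv

def save_datasets(fname, labels, datasets):
    """Write data sets column-wise, with the first column enumerating
    the data points.  A .csv extension selects comma-separated output;
    anything else gets 20-character fixed-width ASCII columns.

    NOTE: illustrative sketch only, not mdout_analyzer.py's own code.
    """
    rows = list(zip(*datasets))          # one row per data point
    if fname.endswith('.csv'):
        with open(fname, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(['Frame'] + list(labels))
            for i, row in enumerate(rows, start=1):
                writer.writerow([i] + list(row))
    else:
        with open(fname, 'w') as f:
            f.write(''.join('%-20s' % s for s in ['#Frame'] + list(labels))
                    + '\n')
            for i, row in enumerate(rows, start=1):
                f.write(''.join('%-20s' % v for v in (i,) + row) + '\n')
```

Either output opens directly in a spreadsheet (CSV) or lines up cleanly in a text editor (fixed-width).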
This brings up a text window in which the average and standard deviation are printed for each selected data set. You can save this to a text file using the "File" menu at the top of this window. See the image below for an example:
This option will histogram the data and plot those histograms. The default bin width comes from Scott's choice. Otherwise, you can control the bins manually with the "Histogram Bins" option under "Graph Options": a positive number is taken as the bin width, while a negative number is taken as the number of bins you want. (It is not currently possible to give different bin settings to different data sets in the same plot.)
You can choose to normalize the data or not via the option in "Graph Options".
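The sign convention for the "Histogram Bins" setting (positive = bin width, negative = number of bins) can be sketched as follows; interpret_binspec is an illustrative name, not taken from the program:

```python
import math

def interpret_binspec(spec, dmin, dmax):
    """Turn a bin setting into (nbins, width) for data in [dmin, dmax].

    Positive spec -> treated as the bin width.
    Negative spec -> treated as the desired number of bins.
    Sketch of the convention described above, not the program's code.
    """
    span = dmax - dmin
    if spec > 0:                      # bin width given directly
        width = float(spec)
        nbins = max(1, int(math.ceil(span / width)))
    elif spec < 0:                    # number of bins given
        nbins = int(-spec)
        width = span / nbins
    else:
        raise ValueError('bin spec must be nonzero')
    return nbins, width
```

For example, a setting of 0.5 over data spanning 0–2 yields 4 bins of width 0.5, while a setting of -10 yields 10 bins of width 0.2.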
An example figure is shown below:
This option will plot the normalized autocorrelation function of the given data sets. In this case, the X-coordinate effectively becomes the 'lag' parameter. By default, the autocorrelation is calculated out to the maximum lag (i.e., out to the total number of frames provided). It is probably best to only consider lags out to half the size of your data set.
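A normalized autocorrelation function of the kind plotted here can be sketched in pure Python (the program itself relies on numpy; this is only an illustration of the quantity being computed):

```python
def normalized_autocorr(x, max_lag=None):
    """C(k) = <(x_i - mean)(x_{i+k} - mean)> / variance, so C(0) == 1.

    By default computes out to the maximum lag (the full data length),
    though only lags up to about half the data length are statistically
    meaningful, as noted above.  Illustrative sketch only.
    """
    n = len(x)
    if max_lag is None:
        max_lag = n
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    dev = [v - mean for v in x]
    acf = []
    for k in range(min(max_lag, n)):
        c = sum(dev[i] * dev[i + k] for i in range(n - k)) / (n - k)
        acf.append(c / var)
    return acf
```

An alternating series like [1, 2, 1, 2] gives C(0) = 1 and C(1) = -1, as expected for perfectly anticorrelated neighbors.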
An example figure is shown below:
This script will take a NetCDF trajectory and extract restart files from it. If velocity information is present in the trajectory (because you set ntwv = -1), it will dump that velocity information into the restart files as well (something I don't think ptraj or cpptraj does yet).
Prerequisites: ScientificPython, numpy
Python Version: 2.4 – 2.7
These prerequisites are typically available from your package manager. On Ubuntu, the packages are python-scientific and python-numpy, although numpy is a dependency of ScientificPython. See the webpage for ScientificPython here
Usage: get_restarts.py [options]

Options:
  -h, --help            show this help message and exit
  -y FILE, --netcdf=FILE
                        Input NetCDF trajectory file to generate restarts
                        from
  -r FILE, --restrt=FILE
                        Name for output restart file. If multiple files are
                        created, a .# suffix will be added to this file name
                        where # is the frame number being dumped.

  Single Restart File:
    This option will print a single restart file from the desired frame of
    the trajectory

    -f INT, --frame=INT
                        Frame number to convert to restart file

  Multiple Restart Files:
    These options will print multiple restart files from the desired frames
    of the trajectory

    -s INT, --start-frame=INT
                        First frame to turn into a restart
    -e INT, --end-frame=INT
                        Last frame to turn into a restart
    -i INT, --interval=INT
                        Number of frames to skip between consecutive restart
                        files created.

This script will extract restart files from NetCDF trajectory files and preserve any velocity information that may be present.
This is technically a program, written in C++ (with a Fortran namelist parser written in Fortran 90), designed to provide many options for analyzing CpHMD and pH-REMD simulations from sander and pmemd.
It is roughly 2x faster than the calcpka.F90 currently built with AmberTools, and provides identical results. It also does much more, including conditional probabilities, running averages, cumulative averages, and chunk averages.
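To clarify the three kinds of time-series averages mentioned above, here is a small sketch (illustrative only, not cphstats' implementation):

```python
def running_avg(x, window):
    """Moving average over a sliding window of fixed size."""
    return [sum(x[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(x))]

def cumulative_avg(x):
    """Average of all points up to and including each point."""
    out, total = [], 0.0
    for i, v in enumerate(x, start=1):
        total += v
        out.append(total / i)
    return out

def chunk_avg(x, size):
    """Independent averages over consecutive, non-overlapping chunks."""
    return [sum(x[i:i + size]) / len(x[i:i + size])
            for i in range(0, len(x), size)]
```

A running average smooths while tracking drift, a cumulative average shows convergence of the overall mean, and chunk averages reveal how an estimate varies between independent segments of the trajectory.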
It is available on GitHub, complete with instructions: https://github.com/swails/cphmd_tools
If you have git on your computer, you can get the source code immediately via:

    git clone https://github.com/swails/cphmd_tools.git

which will create a directory cphmd_tools with the source code inside. Inside that directory, you can run

    git pull

periodically to download any updates that I may have pushed to GitHub. Build and use instructions are available on the GitHub website.
Beginning with AmberTools 14, cphstats is distributed alongside Amber and is built during the normal build process of AmberTools.
Prerequisites: Fortran 90 and C++ compilers capable of linking to each other (e.g., Intel compilers or GNU compilers)
You can obtain the source either from the command line with git or from the GitHub website here
Usage: cphstats [-O] [-V] [-h] [-i <cpin>] [-t] [-o FILE] [-R FILE -r INT]
                [--chunk INT --chunk-out FILE]
                [--cumulative --cumulative-out FILE] [-v INT] [-n INT]
                [-p|-d] [--calcpka|--no-calcpka] [--fix-remd]
                [--population FILE] [-c CONDITION -c CONDITION -c ...]
                [--conditional-output FILE] [--chunk-conditional FILE]
                cpout1 [cpout2 [cpout3 ...] ]

General Options:
  -h, --help            Print this help and exit.
  -V, --version         Print the version number and exit.
  -O, --overwrite       Allow existing outputs to be overwritten.
  --debug               Print out information about the files that are being
                        read in and used for the calculations.
  --expert              I will consider you an expert user and NOT warn you
                        if you try to compute statistics from REMD-based
                        files before using --fix-remd [NOT default behavior]
  --novice              I will warn you if you try to use REMD-based files
                        to compute statistics. [Default behavior]

Input Files and Options:
  -i FILE, --cpin FILE  Input cpin file (from sander) with titrating residue
                        information.
  -t FLOAT, --time-step FLOAT
                        This is the time step in ps you used in your
                        simulations. It will be used to print data as a
                        function of time. Default is 2 fs (0.002)

Output Files:
  -o FILE, --calcpka-output FILE
                        File to which the standard `calcpka'-type statistics
                        are written. Default is stdout
  -R FILE, --running-avg-out FILE
                        Output file where the running averages of time
                        series data for each residue is printed (see [Output
                        Options] below for details). Default is
                        [running_avgs.dat]
  --chunk-out FILE      Output file where the time series data calculated
                        over chunks of the simulation are printed (see
                        [Output Options] below for details). Default is
                        [chunk.dat]
  --cumulative-out FILE
                        Output file where the cumulative time series data is
                        printed (see [Output Options] below for details).
                        Default is [cumulative.dat]
  --population FILE     Output file where protonation state populations are
                        printed for every state of every residue.
  --conditional-output FILE
                        Output file with requested conditional
                        probabilities. Default is [conditional_prob.dat].
  --chunk-conditional FILE
                        Prints a time series of the conditional
                        probabilities over a trajectory split up into
                        chunks.

Output Options:
  These options modify how the output files will appear

  -v INT, --verbose INT
                        Controls how much information is printed to the
                        calcpka-style output file. Options are: (0) Just
                        print fraction protonated. [Default] (1) Print
                        everything calcpka prints.
  -n INT, --interval INT
                        An interval between which to print out time series
                        data like `chunks', `cumulative' data, and running
                        averages. It is also used as the 'window' of the
                        conditional probability time series
                        (--chunk-conditional). Default
  -p, --protonated      Print out protonation fraction instead of
                        deprotonation fraction in time series data (Default
                        behavior).
  -d, --deprotonated    Print out deprotonation fraction instead of
                        protonation fraction in time series data.
  -a, --pKa             Print predicted pKas (via Henderson-Hasselbalch) in
                        place of fraction (de)protonated. NOT default
                        behavior.

Analysis Options:
  These options control which analyses are done. By default, only the
  original, calcpka-style analysis is done.

  --calcpka             Triggers the calcpka-style output [On by default]
  --no-calcpka          Turns off the calcpka-style output
  -r WINDOW, --running-avg WINDOW
                        Defines a window size for a moving, running average
                        time series. <WINDOW> is the number of MD steps (NOT
                        the number of MC exchange attempts).
  --chunk WINDOW        Computes the time series data over a chunk of the
                        simulation of size <WINDOW> time steps. See above
                        for details.
  --cumulative          Computes the cumulative average time series data
                        (see above for options) over the course of the
                        trajectory.
  --fix-remd PREFIX     This option will trigger cphstats to reassemble the
                        titration data into pH-specific ensembles. This is
                        an exclusive mode of the program; no other analyses
                        will be done.
  -c CONDITIONAL, --conditional CONDITIONAL
                        Evaluates conditional probabilities. CONDITIONAL
                        should be a string of the format:
                          <resid>:<state>,<resid>:<state>,...
                        or
                          <resid>:PROT,<resid>:DEPROT,...
                        or
                          <resid>:<state1>;<state2>,<resid>:PROT,...
                        Where <resid> is the residue number in the prmtop
                        (NOT the cpin) and <state> is either the state
                        number or (p)rotonated or (d)eprotonated,
                        case-insensitive

This program analyzes constant pH output files (cpout) from Amber. These output files can be compressed using gzip compression; the compression will be detected automatically by the file name extension. You must have the gzip headers installed for this functionality to work.
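Parsing the CONDITIONAL string format described above might look like this sketch (parse_conditional is a hypothetical helper, not cphstats source):

```python
def parse_conditional(cond):
    """Parse '<resid>:<state>[;<state>...]' items separated by commas.

    <state> is a state number, or (case-insensitively) a word starting
    with 'p' (protonated) or 'd' (deprotonated).  Returns a dict mapping
    residue number -> list of state specs.  Illustrative sketch only.
    """
    result = {}
    for item in cond.split(','):
        resid_str, _, states = item.partition(':')
        if not states:
            raise ValueError('bad conditional item: %r' % item)
        resid = int(resid_str)
        specs = []
        for s in states.split(';'):
            s = s.strip()
            if s.isdigit():
                specs.append(int(s))          # explicit state number
            elif s.lower().startswith('p'):
                specs.append('PROT')
            elif s.lower().startswith('d'):
                specs.append('DEPROT')
            else:
                raise ValueError('bad state spec: %r' % s)
        result[resid] = specs
    return result
```

For example, '12:PROT,45:1;2' selects residue 12 protonated together with residue 45 in state 1 or 2.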