Helpful Scripts

All of these scripts include a command-line help function, accessible through

<script_name> --help

Please feel free to add scripts to this page if you think they are helpful (but please give each one a --help flag that explains its usage). Also list any prerequisites for using the scripts, including the lowest Python version required as well as any other Python modules that must be downloaded alongside them. Required Python modules should be placed either in the same directory as the main script or in a directory included in the PYTHONPATH environment variable.

Any of these scripts will either have to be executed with the interpreter directly (python or bash), or made executable (chmod +x <script_name>) before you can run it.

1Dbinning.py

This script will take a data set and construct histograms from it, selecting "smart" default bin ranges and numbers of bins. The bin range is determined from the maximum and minimum of the data set, and the number of bins is selected according to Scott's choice. The bin range runs from one integer value to another, so if you need finer graining than integer-to-integer, you will need to supply your own bin range or change

   binrange[0] = math.floor(xmaxminholder[0])
   binrange[1] = math.ceil(xmaxminholder[1])

to

   binrange[0] = xmaxminholder[0]
   binrange[1] = xmaxminholder[1]

in the script. You can also optionally normalize the histograms so that they integrate to 1.
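For reference, Scott's choice sets the bin width from the standard deviation and the number of data points. A minimal sketch of the idea (this is not the script's actual code, and the function name scotts_bins is made up for illustration):

```python
import math
import statistics

def scotts_bins(data):
    """Suggest an integer-to-integer bin range and a bin count via
    Scott's choice: bin width = 3.5 * stddev / n**(1/3)."""
    n = len(data)
    width = 3.5 * statistics.pstdev(data) / n ** (1.0 / 3.0)
    # Integer-to-integer bin range, as the script uses by default
    binrange = (math.floor(min(data)), math.ceil(max(data)))
    nbins = max(1, int(math.ceil((binrange[1] - binrange[0]) / width)))
    return binrange, nbins
```

For example, scotts_bins([1.2, 2.8, 3.1, 4.7, 5.5, 6.9, 7.3, 8.8]) suggests the range (1, 9) split into 2 bins.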

Prerequisites

Prerequisites: None
Required Modules: Download utilities.py
Python Version: 2.6 or 2.7

Download

Download 1Dbinning.py here

Usage

Usage: 1Dbinning.py [options]

Options:
  -h, --help            show this help message and exit
  -f DATA_FILE, --file=DATA_FILE
                        Input data file
  -o OUTPUT_FILE, --output=OUTPUT_FILE
                        Binned output file
  -b BINS, --bins=BINS  Number of bins to use
  -r BINRANGE, --binrange=BINRANGE
                        Range of bins to use
  -n, --normalize       Normalize the histograms
  -g SCRIPT_NAME, --gnuplot=SCRIPT_NAME
                        Gnuplot script name
  -d DELIMITER, --delimiter=DELIMITER
                        The column delimiter (defaults to any kind of
                        whitespace)
  -c COLUMN, --column=COLUMN
                        Which column to pull the data from

2Dbinning.py

This script works just like 1Dbinning.py above, but creates bins in two dimensions. It selects "smart" default bin ranges and numbers of bins: the bin ranges are determined from the maximum and minimum of the data set in both dimensions, and the number of bins is chosen according to Scott's choice. You can also normalize the histograms.
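The two-dimensional counting itself amounts to mapping each (x, y) pair onto a grid cell. A minimal sketch under that assumption (hist2d is an illustrative name, not the script's API):

```python
def hist2d(points, xrange_, yrange_, nx, ny):
    """Count (x, y) pairs into an nx-by-ny grid over the given ranges."""
    dx = (xrange_[1] - xrange_[0]) / float(nx)
    dy = (yrange_[1] - yrange_[0]) / float(ny)
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        # Clamp values on the top edge into the last bin
        i = min(int((x - xrange_[0]) / dx), nx - 1)
        j = min(int((y - yrange_[0]) / dy), ny - 1)
        counts[i][j] += 1
    return counts
```

For example, hist2d([(0.5, 0.5), (1.5, 1.5)], (0, 2), (0, 2), 2, 2) places one point in each diagonal cell of a 2x2 grid.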

Prerequisites

Prerequisites:
Required Modules: Download utilities.py
Python Version: 2.4–2.7

Download

Download 2Dbinning.py here

Usage

Usage: 2Dbinning.py [options]

Options:
  -h, --help            show this help message and exit
  -f DATA_FILE, --file=DATA_FILE
                        Input data file
  -o OUTPUT_FILE, --output=OUTPUT_FILE
                        Binned output file
  -b BINS, --bins=BINS  Number of bins [NxM]
  -x XBINRANGE, --xbinrange=XBINRANGE
                        Range for bins in the x-direction
  -y YBINRANGE, --ybinrange=YBINRANGE
                        Range for bins in the y-direction
  -n, --normalize       Normalize the histograms
  -g GNUPLOT, --gnuplot=GNUPLOT
                        Name of gnuplot script to write.
  -c COLUMNS, --columns=COLUMNS
                        Which two columns to analyze [##,##] starting from 1
  -d DELIMITER, --delimiter=DELIMITER
                        The string/character that delimits fields in the input
                        data file

mdout_analyzer.py

This script will load all of the data from a series of Amber mdout files and let you immediately plot any of it. It allows you to do a number of things with the data:

  1. Plot the raw data
  2. Save the raw data to a file
  3. Calculate the average and standard deviation
  4. Histogram the data
  5. Plot the normalized autocorrelation function

Prerequisites

Prerequisites: numpy, matplotlib, Tkinter
Python Version: 2.4–2.7

Download

mdout_analyzer.py is now a part of AmberTools. (It used to be called MdoutAnalyzer.py in AmberTools 13).

Usage

Launch the script on the command-line either by using:

mdout_analyzer.py [mdout1] [mdout2] … [mdoutN]

or just

mdout_analyzer.py

If you provide no mdout files, you will be prompted for one as soon as you launch the program.

Every energy component printed to the mdout file is assigned to a button. Clicking that button 'activates' that data set for any action that you apply to it. After you have selected the data sets you want to analyze, you can click any of the 'Actions' ('Graph Them!', 'Save to File', 'Show Statistics', 'Histogram Data', 'Autocorrelation'). See the image below:

mdout_analyzer_screen.jpg

After selecting however many data sets you want, you can do one of five things with them; the image below shows the window with EPtot and EELEC chosen.

mdout_analyzer_2picks.jpg

You can modify the graph appearance (e.g., add a title, legend, lines, points, etc.) via the "Graph Options" menu at the top of the GUI.

Note that the X-coordinate for all of the analyses is either the TIME(PS) data, if present, or a simple enumeration of the data points (except for histogramming, of course, where the X-coordinate is the data you are histogramming and the Y-coordinate is either the probability or the frequency).

The five actions are demonstrated in the sections below.

Graph Them!

This graphs all of the selected data sets side-by-side. An example graph is shown below. You can add a title to the graph as well as axis labels via the "Graph Options" menu at the top of the main window. Use the buttons at the bottom of the graph window to save the image to a PNG file or zoom in on a portion of the plot.

mdout_analyzer_graphthem.jpg

Save to File

This brings up a window to allow you to pick the file name to save your data to. The data will be printed with one data set per column, where the first column simply enumerates the data. If the file name ends in .csv, a comma-separated-value file will be printed; Excel (and other spreadsheets) natively read this format. Otherwise, an ASCII file will be printed with 20-character columns per data set.
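The two output formats can be sketched roughly as follows (format_rows is an illustrative helper, not the program's internals):

```python
def format_rows(filename, names, columns):
    """Lay out data sets side by side; the first column enumerates frames.
    CSV if the file name ends in .csv, else fixed 20-character columns."""
    rows = []
    if filename.endswith('.csv'):
        rows.append(','.join(['Frame'] + names))
        for i, values in enumerate(zip(*columns)):
            rows.append(','.join([str(i + 1)] + ['%s' % v for v in values]))
    else:
        rows.append(''.join('%-20s' % h for h in ['Frame'] + names))
        for i, values in enumerate(zip(*columns)):
            rows.append(''.join('%-20s' % v for v in [i + 1] + list(values)))
    return '\n'.join(rows)
```

For example, format_rows('out.csv', ['EPtot'], [[1.0, 2.0]]) produces a two-column CSV with a Frame column, while any other extension gets 20-character fixed-width columns.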

Show Statistics

This brings up a text window in which the average and standard deviation are printed for each selected data set. You can save this to a text file using the "File" menu at the top of this window. See the image below for an example:

mdout_analyzer_stats.jpg

Histogram Data

This option will histogram the data and plot the resulting histograms. The default bin width comes from Scott's choice; otherwise, you can control the bins manually with the "Histogram Bins" option under "Graph Options". A positive number is taken as the bin width; a negative number is taken as the number of bins you want. (It is not currently possible to give different bin settings to different data sets in the same plot.)
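The sign convention can be summarized in a few lines (interpret_bins is illustrative only, not the program's code):

```python
import math

def interpret_bins(setting, data_min, data_max):
    """Positive setting = bin width; negative setting = number of bins."""
    if setting > 0:
        # Width given: derive the bin count from the data range
        return max(1, int(math.ceil((data_max - data_min) / setting)))
    # Negative: the magnitude is the requested number of bins
    return abs(int(setting))
```

So a setting of 0.5 over a 0-to-10 range yields 20 bins, while a setting of -25 asks for 25 bins directly.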

You can choose to normalize the data or not via the option in "Graph Options".

An example figure is shown below:

mdout_analyzer_histogram.jpg

Autocorrelation

This option will plot the normalized autocorrelation function of the given data sets. In this case, the X-coordinate effectively becomes the 'lag' parameter. By default, the autocorrelation is calculated out to the maximum lag (i.e., the total number of frames provided). It is probably best to consider lags only out to half the size of your data set.
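The quantity being plotted is, in essence, the following (a pure-Python sketch of a normalized autocorrelation function, not the script's implementation):

```python
def normalized_acf(data, max_lag=None):
    """Normalized autocorrelation: C(k) = <(x_i - m)(x_{i+k} - m)> / var."""
    n = len(data)
    if max_lag is None:
        max_lag = n // 2   # half the data set, per the advice above
    m = sum(data) / float(n)
    dev = [x - m for x in data]
    var = sum(d * d for d in dev) / float(n)
    acf = []
    for k in range(max_lag + 1):
        c = sum(dev[i] * dev[i + k] for i in range(n - k)) / float(n - k)
        acf.append(c / var)
    return acf
```

By construction C(0) = 1, and an alternating series like [1, -1, 1, -1, ...] gives C(1) = -1.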

An example figure is shown below:

mdout_analyzer_autocorr.jpg

get_restarts.py

This script will take a NetCDF trajectory and extract restart files from it. If velocity information is present in the trajectory (because you set ntwv = -1), it will dump that velocity information into the restart file as well (something I don't believe ptraj or cpptraj does yet).

Prerequisites

Prerequisites: ScientificPython, numpy
Python Version: 2.4–2.7

These prerequisites are typically available from your package manager. On Ubuntu, the packages are python-scientific and python-numpy, although numpy will be pulled in automatically as a dependency of ScientificPython. See the webpage for ScientificPython here

Download

Download get_restarts.py here

Usage

Usage: get_restarts.py [options]

Options:
  -h, --help            show this help message and exit
  -y FILE, --netcdf=FILE
                        Input NetCDF trajectory file to generate restarts from
  -r FILE, --restrt=FILE
                        Name for output restart file. If multiple files are
                        created, a .# suffix will be added to this file name
                        where # is the frame number being dumped.

  Single Restart File:
    This option will print a single restart file from the desired frame of
    the trajectory

    -f INT, --frame=INT
                        Frame number to convert to restart file

  Multiple Restart Files:
    These options will print multiple restart files from the desired
    frames of the trajectory

    -s INT, --start-frame=INT
                        First frame to turn into a restart
    -e INT, --end-frame=INT
                        Last frame to turn into a restart
    -i INT, --interval=INT
                        Number of frames to skip between consecutive restart
                        files created.

This script will extract restart files from NetCDF trajectory files and
preserve any velocity information that may be present.
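In the multiple-restart mode, the frames visited and the .# output names follow directly from -s/-e/-i. A sketch of that bookkeeping (restart_names is a made-up helper, and I am assuming the suffix is dropped when only one file results):

```python
def restart_names(restrt, start, end, interval):
    """Frame numbers and output names for multiple-restart mode:
    a .# suffix is appended, where # is the frame number dumped."""
    frames = list(range(start, end + 1, interval))
    if len(frames) == 1:
        return [(frames[0], restrt)]   # single file: no suffix needed
    return [(f, '%s.%d' % (restrt, f)) for f in frames]
```

For example, -s 10 -e 30 -i 10 with -r min.rst would produce min.rst.10, min.rst.20, and min.rst.30.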

cphstats

This is technically a program, written in C++ (with a namelist parser written in Fortran 90), designed to provide many options for analyzing CpHMD and pH-REMD simulations run with sander and pmemd.

It is roughly 2x faster than the calcpka.F90 currently built with AmberTools, and provides identical results. It also does much more, including conditional probabilities, running averages, cumulative averages, and chunk averages.
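These three extra time-series analyses are plain averages computed over different slices of the data. A minimal sketch of what each one means (not the cphstats source, which is C++):

```python
def running_average(series, window):
    """Moving average over the last `window` points at each step."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / float(i + 1 - lo))
    return out

def cumulative_average(series):
    """Average of all points seen so far at each step."""
    out, total = [], 0.0
    for i, x in enumerate(series):
        total += x
        out.append(total / (i + 1))
    return out

def chunk_averages(series, window):
    """One average per consecutive, non-overlapping chunk."""
    return [sum(series[i:i + window]) / float(len(series[i:i + window]))
            for i in range(0, len(series), window)]
```

Each produces a time series suitable for plotting protonation fractions against simulation time.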

It is available on GitHub, complete with instructions, at https://github.com/swails/cphmd_tools

If you have git on your computer, you can get the source code immediately via:

git clone https://github.com/swails/cphmd_tools.git

which will create a directory cphmd_tools with the source code inside. You can use the command

git pull

periodically to download any updates I may have pushed to GitHub. Build and usage instructions are available on the GitHub website.

Beginning with AmberTools 14, cphstats is distributed alongside Amber and is built during the normal build process of AmberTools.

Prerequisites

Prerequisites: Fortran 90 and C++ compilers capable of linking to each other (e.g., Intel compilers or GNU compilers)

Download

From the command-line with git:

git clone https://github.com/swails/cphmd_tools.git

From the GitHub website here

Usage

Usage: cphstats [-O] [-V] [-h] [-i <cpin>] [-t] [-o FILE] [-R FILE -r INT]
             [--chunk INT --chunk-out FILE] [--cumulative --cumulative-out FILE]
             [-v INT] [-n INT] [-p|-d] [--calcpka|--no-calcpka] [--fix-remd]
             [--population FILE] [-c CONDITION -c CONDITION -c ...]
             [--conditional-output FILE] [--chunk-conditional FILE]
             cpout1 [cpout2 [cpout3 ...] ]

General Options:
    -h, --help     Print this help and exit.
    -V, --version  Print the version number and exit.
    -O, --overwrite
                   Allow existing outputs to be overwritten.
    --debug        Print out information about the files that are
                   being read in and used for the calculations.
    --expert       I will consider you an expert user and NOT warn
                   you if you try to compute statistics from REMD-based
                   files before using --fix-remd [NOT default behavior]
    --novice       I will warn you if you try to use REMD-based files
                   to compute statistics. [Default behavior]

Input Files and Options:
    -i FILE, --cpin FILE
                   Input cpin file (from sander) with titrating residue
                   information.
    -t FLOAT, --time-step FLOAT
                   This is the time step in ps you used in your simulations.
                   It will be used to print data as a function of time.
                   Default is 2 fs (0.002)

Output Files:
    -o FILE, --calcpka-output FILE
                   File to which the standard `calcpka'-type statistics
                   are written. Default is stdout
    -R FILE, --running-avg-out FILE
                   Output file where the running averages of time series
                   data for each residue is printed (see [Output Options]
                   below for details). Default is [running_avgs.dat]
    --chunk-out FILE
                   Output file where the time series data calculated
                   over chunks of the simulation are printed (see 
                   [Output Options] below for details).
                   Default is [chunk.dat]
    --cumulative-out FILE
                   Output file where the cumulative time series data
                   is printed (see [Output Options] below for details).
                   Default is [cumulative.dat]
    --population FILE
                   Output file where protonation state populations are
                   printed for every state of every residue.
    --conditional-output FILE
                   Output file with requested conditional probabilities.
                   Default is [conditional_prob.dat].
    --chunk-conditional FILE
                   Prints a time series of the conditional probabilities over
                   a trajectory split up into chunks.

Output Options:
  These options modify how the output files will appear

    -v INT, --verbose INT
                   Controls how much information is printed to the
                   calcpka-style output file. Options are:
                      (0) Just print fraction protonated. [Default]
                      (1) Print everything calcpka prints.
    -n INT, --interval INT
                   An interval between which to print out time series data
                   like `chunks', `cumulative' data, and running averages.
                   It is also used as the 'window' of the conditional
                   probability time series (--chunk-conditional).
                   Default [1000]
    -p, --protonated
                   Print out protonation fraction instead of deprotonation
                   fraction in time series data (Default behavior).
    -d, --deprotonated
                   Print out deprotonation fraction instead of protonation
                   fraction in time series data.
    -a, --pKa      Print predicted pKas (via Henderson-Hasselbalch) in place
                   of fraction (de)protonated. NOT default behavior.

Analysis Options:
  These options control which analyses are done. By default, only
  the original, calcpka-style analysis is done.

    --calcpka      Triggers the calcpka-style output [On by default]
    --no-calcpka   Turns off the calcpka-style output
    -r WINDOW, --running-avg WINDOW
                   Defines a window size for a moving, running average
                   time series. <WINDOW> is the number of MD steps (NOT
                   the number of MC exchange attempts).
    --chunk WINDOW
                   Computes the time series data over a chunk of the
                   simulation of size <WINDOW> time steps. See above for
                   details.
    --cumulative   Computes the cumulative average time series data (see above
                   for options) over the course of the trajectory.
    --fix-remd PREFIX
                   This option will trigger cphstats to reassemble the 
                   titration data into pH-specific ensembles. This
                   is an exclusive mode of the program---no other
                   analyses will be done.
    -c CONDITIONAL, --conditional CONDITIONAL
                   Evaluates conditional probabilities. CONDITIONAL should be a
                   string of the format:
                         <resid>:<state>,<resid>:<state>,...
                     or
                         <resid>:PROT,<resid>:DEPROT,...
                     or
                         <resid>:<state1>;<state2>,<resid>:PROT,...
                   Where <resid> is the residue number in the prmtop (NOT the
                   cpin) and <state> is either the state number or (p)rotonated
                   or (d)eprotonated, case-insensitive
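The CONDITIONAL string formats above can be parsed along these lines (a sketch only; parse_conditional is not part of cphstats, and the returned data structure is my own choice for illustration):

```python
def parse_conditional(spec):
    """Parse '<resid>:<state>,...' where <state> is a state number,
    a ';'-separated list of state numbers, or (p)rotonated /
    (d)eprotonated, case-insensitive."""
    cond = {}
    for token in spec.split(','):
        resid, state = token.split(':')
        s = state.strip().lower()
        if s.startswith('p'):
            states = 'PROT'
        elif s.startswith('d'):
            states = 'DEPROT'
        else:
            states = [int(x) for x in s.split(';')]
        cond[int(resid)] = states
    return cond
```

For example, '12:PROT,15:0;2,20:d' asks for residue 12 protonated, residue 15 in state 0 or 2, and residue 20 deprotonated.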

This program analyzes constant pH output files (cpout) from Amber.
These output files can be compressed using gzip compression. The
compression will be detected automatically by the file name extension.
You must have the gzip headers for this functionality to work.
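Detecting gzip compression from the file name extension looks roughly like this on the Python side (illustrative only; cphstats itself is C++ and, per the note above about gzip headers, links against the gzip/zlib library):

```python
import gzip

def open_cpout(filename):
    """Open a cpout file for reading; gzip compression is detected
    automatically from a .gz file name extension."""
    if filename.endswith('.gz'):
        return gzip.open(filename, 'rt')   # transparent text-mode decompression
    return open(filename, 'r')
```

The caller then reads the file identically whether or not it was compressed.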
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License