Using git

git is a software version control system. It is used for such esteemed projects as the Linux kernel, Android, and git itself, among many others, but of principal interest here is the fact that it is also used for Amber and AmberTools.

For that reason, I include here a brief tutorial to help you get started using git.

Why git?

Why use git (or related products like mercurial) instead of other products like CVS and svn?

  • It is secure (it uses SHA hashes to make sure that no intentional or accidental data corruption ruins your project)
  • It is distributed, so no "central server crashing" will cripple a project. Every person's copy of the repository is a full-fledged repository with a full history.
    • The only way to lose a repository is to lose every copy that has ever been cloned from the original, which makes it far less susceptible to hardware failures, etc.
    • You can develop off-line. You do not need a direct connection to commit, update, branch, etc. on your local copy.
  • It tracks content, not files. Therefore, git can continue to track the history of a particular block of code even if it moves from file to file.
  • Branches, in my opinion the best feature of git, are incredibly easy to create, manage, and merge. Even conflicts are easy to resolve if two people make changes that overlap with each other.

Using git on the command-line

The git project is actually a collection of several programs that interact with git repositories. The git program itself simply calls other programs. Specifically, git Action calls the program git-Action.

Almost all git programs need to be executed inside a git repository or any of the sub-directories contained within a git repository.

git repositories can technically be nested, with the inner repository keeping its own separate history, but I don't think there's much point in nesting them, anyway.

Getting help

If you use the command-line flag --help on any of the git programs, it launches the Unix man-page for that command. For example,

git init --help

is the same as

man git-init

Repository type: bare vs. working

There are two types of git repositories: a bare repository and a working repository.

Bare repository

A bare repository contains just git's book-keeping. That is, it keeps track of all of the source code in every branch (for every remote it knows about). There is no "working" branch, and as such, you should never do any "work" in a bare git repository.

These are primarily intended to be places where changes can be pushed to and pulled from. If you never plan on working on any files inside this particular git repository, make it a bare repository.

Because bare repositories have no working files, not all git commands will work in a bare git repository, but the size of the repository itself will be much, much smaller.

Working repository

A working repository effectively has all of the information found in a bare git repository in a .git/ folder in the root of the git repository (the top-level directory of the git repository). However, it also has every file contained in the "working branch" of the repository in its proper place in the directory structure. If you plan to make any changes or do any development in a git repository, it will need to be done in a working repository.

Configuring your git repositories

git gives you the ability to configure some of the default behavior via the git config command. This command typically works as follows:

git config [--global] <action>.<property> <default_value>

Here you set the default value for a particular action's property to a new default value. You can do this either on a repository-by-repository basis or for every repository using the --global option. Defaults are resolved according to a ranking from most important to least important. If a default is not specified at a given level, git looks to the next level until it finds one:

  1. Local repository: does the default exist for this repository?
  2. Global definition for this user: did the current user define a default using --global?
  3. The built-in default for git

Configurations you should do

There are a couple of git config commands you should execute before you get started.

  • git config --global user.name "Jason Swails"
  • git config --global user.email "jason@swails.com"
  • git config --global status.showUntrackedFiles no

The first will register every commit you perform on this machine (in any repository) with your name. This is generally an absolute must for collaborative projects. The second will register your email address as well.

The last will prevent the git status command from showing every file that the current git repository doesn't track.

I use this option because I often compile source code in my git repositories, and I don't want git to tell me about every object file I may have lying around from an old compile (or the programs that were built in that directory). For the repositories that I want to see untracked files in, I simply use the command

git config status.showUntrackedFiles all

in that git repository to override my default just for that repository.
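The lookup order described above can be checked in a throwaway sandbox. The sketch below assumes a POSIX shell and points HOME at a temporary directory, purely so that --global cannot touch your real ~/.gitconfig; the directory names are made up for illustration.

```shell
set -e
sandbox=$(mktemp -d)
# Assumption: redirect HOME so "--global" writes inside the sandbox only.
export HOME="$sandbox/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
git init -q "$sandbox/repo"
cd "$sandbox/repo"

# Per-user default: applies to every repository of this user.
git config --global status.showUntrackedFiles no
from_global=$(git config status.showUntrackedFiles)

# Per-repository setting: takes priority over the global one.
git config status.showUntrackedFiles all
from_local=$(git config status.showUntrackedFiles)

echo "with only the global value set: $from_global"
echo "after setting a local value:    $from_local"
```

Reading a key with git config and no value prints whichever setting currently wins, which makes it a quick way to see which level is in effect.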

Creating a new repository: git init

This command is just for creating a brand-new git repository. If you want to make your own working copy of someone else's repository, you want the git clone command, not this one.

To create a new repository, go to the directory that you want to turn into a git repository and type the command

git init [--bare] .

Note that this only initializes the repository (optionally making it a bare repository). It does not add any of the folders or files in this directory to that repository. That has to be done via git add.
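As a rough illustration of the difference between the two kinds of repository, the sketch below creates one of each in a temporary directory. The names work and shared.git are hypothetical, chosen only for this example.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir work && cd work
git init -q .                 # working repository: book-keeping lives in .git/
echo "hello" > README         # exists on disk, but git is not tracking it yet
tracked=$(git ls-files)       # stays empty until "git add" is used
work_is_bare=$(git rev-parse --is-bare-repository)
cd ..

mkdir shared.git && cd shared.git
git init -q --bare .          # bare repository: book-keeping only, no working files
shared_is_bare=$(git rev-parse --is-bare-repository)
cd ..

echo "work bare? $work_is_bare; shared.git bare? $shared_is_bare"
```

In the bare case the files that would normally sit inside .git/ (HEAD, objects/, refs/, and so on) are placed directly in the directory itself.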

Cloning an existing repository: git clone

If you want to make your own copy of somebody else's repository (for instance, Amber's git repository), you need to clone that repository. You can clone any repository — a bare repository or a full working repository.

git clone [--bare] <remote-repository>

The <remote-repository> can be an ssh/rsync-type path, or simply a path to another git repository on the same file system. As an example, I'll use the public git repository for Gromacs:

git clone [--bare] git://git.gromacs.org/gromacs.git [folder_name]

Note: the --bare flag indicates that the repository that you're creating with git clone is a bare repository, not that the repository you're cloning is bare.

If you do not specify a [folder_name], the .git suffix will be stripped, if it exists, and the new folder will be given the same name as the folder on the remote server. In this case, it will create a gromacs/ directory and fill it with its tracked files.

Adding files: git add

In order to tell git that you want to start recording files it is not already recording (new to the git repository) or changes to existing, tracked files, you need to use the git add command. In git language, git add will stage new files and changes to existing files to be committed. Use this command as follows:

git add [-u] <path_1> [<path_2> [<path_3> …] ]

The <path_#> can either be directory names or file names. If it's a directory, every file in that directory, or subdirectories therein, will be added. If it's a file name, just that file will be added.

The optional -u argument will only add updates. That is, if you do not want to add untracked files, you can use the -u option to ignore those. Also, there's no limit to how many git add commands you can use before committing changes.
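To make the effect of -u concrete, here is a small sketch in a scratch repository. The file names and the user name and email are placeholders, not anything git requires.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" > tracked.txt       # change to a file git already tracks
echo "new" > untracked.txt    # brand-new file git has never seen

git add -u .                  # stage updates to tracked files only
staged=$(git diff --cached --name-only)
echo "staged: $staged"
```

Only tracked.txt ends up staged; untracked.txt is left alone until it is named explicitly in a git add command.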

Deleting files: git rm

Simply deleting files from the file system does not prevent git from tracking those files. (This is a very nice feature!) If you really want to delete files from a git repository, you need to use the command git rm to tell git to stop tracking those files. In git language, git rm stages files to be removed from a git repository. Use it as follows:

git rm [-r] <path_1> [<path_2> [<path_3> …] ]

You can remove files one-at-a-time, or you can recursively (with the -r argument) remove an entire folder. Also, there's no limit to how many git rm commands you can use before committing changes.

Tip:

When I delete files, I usually use the command [git status | grep deleted] to extract the files that were deleted, and give those files to git rm.
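One way to script this tip is sketched below. Note that it swaps the grep on git status for git ls-files --deleted, which lists tracked files that are missing from disk; treat it as an alternative route to the same result rather than the exact command described above. File names are made up.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo a > old_a.txt
echo b > old_b.txt
echo c > keep.txt
git add .
git commit -q -m "Add three files"

rm old_a.txt old_b.txt        # plain deletion: git still tracks both files

# List tracked files that are gone from disk and hand them to git rm.
# -z / -0 keep file names with spaces intact.
git ls-files --deleted -z | xargs -0 git rm -q --
remaining=$(git ls-files)
echo "still tracked: $remaining"
```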

Finding out what changes have been made: git status

If you want to see what changes have been made to your current working directory, with respect to the HEAD, or the latest commit in the current branch that git knows about, use the git status command. Use it as follows:

git status [-uall | -uno] [path]

The -uall option will show all untracked files, regardless of the default value of status.showUntrackedFiles. The -uno option will hide all untracked files.

The [path] can be a directory or file name; git status will then show only changes to files within that directory, or to the specified file.

For instance, if you want to see if any new files have been added to a directory by a given operation, you can use the command

git status -uall directory/

Committing staged changes: git commit

git will not commit any changes that are not staged (see git add and git rm for details about staging changes). Once changes have been staged, though, you can update the repository with your recent, great changes. There is an option to stage changes and commit them with a single git commit command; see below. Use this command as follows:

git commit [-m "log message"]

You can optionally include a log message on the command-line. If you don't provide one here, you will be dropped into a text editor (specified by the EDITOR environment variable) to provide one there. git will not allow you to make a commit with no log message. And please, don't be that person that uses "updates." as their commit log. Tell people what you've done in some level of detail.

Committing and Staging Simultaneously: git commit -a

The command git commit -a acts like the following sequence of commands:

git add -u <root_directory>
git rm <deleted files>
git commit

It will add any files that have been changed (but it will not add any untracked files), and it will delete any files that have been removed.

Changing details about your commit: git commit --amend

If you made a mistake in your last commit, or you wish to update your log message to give more details, or you forgot to set your user name and email, you can use the command

git commit --amend

to change your last commit. Note that you should only do this if you haven't disseminated your changes anywhere else. For instance, if you've already pushed to the central repository, or others have already pulled your changes, don't do this.

Undoing Commits

There are two ways to undo a particular commit.

The first is to use git reset, which effectively rewinds your change history. This approach should be used only in the situations in which git commit --amend could be used (that is, if your changes haven't been distributed anywhere else).

The second, and my strongly suggested option, is to simply reverse the changes you made and make a new commit. git diff can help you create a patch file to do this very simply. This is detailed below, and git diff itself is introduced later on.

Suppose your git-log looks like this:

bash$ git log

commit 0e3eac9647f122762561e8291ea34e66ed95349c
Author: Jason Swails <jason.swails@gmail.com>
Date:   Thu Apr 19 17:52:34 2012 -0400

    Update collect_pH_stats.py to collect deprotonated fraction optionally.

    Updated mdcrd-py.log.check to reflect new cpptraj version.

commit 48414994114de591d13e565467b8d7d277e00f58
Author: Jason Swails <jason.swails@gmail.com>
Date:   Thu Apr 19 17:34:06 2012 -0400

    Change permissions.

Now suppose I want to undo my updates to collect_pH_stats.py and mdcrd-py.log.check. What I would do is use git diff to print the diffs to go from commit 0e3eac9647f to commit 4841499411 (since I want to reverse that change), then apply that diff using git apply.

It is generally a good idea to look at the diff before blindly committing it to make sure it's not including too much (because of, e.g., a merge that it was part of). To do this, use the command:

git diff 0e3eac9647f 4841499411

Once you have decided the diff is appropriate, use the command

git diff 0e3eac9647f 4841499411 | git apply

to apply the patch and reverse your commit. Keep in mind you will have to commit your new changes, since it didn't create a commit for you.

However, if the git diff contains more than just the part that you want to reverse, you can dump the diff to a file and cut out the parts you don't want to change, or you can copy-and-paste the diffs from gitk into a file. Then, use git apply on that file, i.e.,

git apply < my_patch_file.txt
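The whole reverse-and-recommit sequence can be tried end to end in a scratch repository. This is only a sketch: notes.txt, the commit messages, and the identity are invented for the example, and the commit IDs are captured into shell variables instead of being typed by hand.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "original" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"
good=$(git rev-parse HEAD)

echo "mistake" >> notes.txt
git commit -q -a -m "Append a line that should not be there"
bad=$(git rev-parse HEAD)

git diff "$bad" "$good"               # inspect the reversing diff first
git diff "$bad" "$good" | git apply   # then apply it to the working files
git commit -q -a -m "Undo the accidental line in notes.txt"
```

The history now holds three commits: the original, the mistake, and the commit that undoes it, so nothing that was already shared gets rewritten.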

Branches: git branch

Branches in git are an integral part of its design. There is actually no avoiding branches with git. The simple act of creating a git repository creates a branch (the default branch in any new repository is called master). Thus, at the simplest level, everybody's repository is a set of at least one branch.

When do I use branches?

Branches can be used for anything — you can have a separate branch for every little feature you wish to experiment with. You can blow branches away whenever you want, giving you the freedom to experiment without ever having to worry about breaking the repository. If one of your branches produces something you like, you can always merge it back into the master branch, or some other widely-used branch.

Listing branches: git branch [-r | -a]

To get a full listing of all branches in your local repository, run the command

git branch

To get a full listing of all branches on all registered remotes, run the command

git branch -r

To get a full listing of all branches on both your local repository and all registered remotes, use the command

git branch -a

Creating a new branch: git branch <branch_name>

To create a new branch, use the command:

git branch [--track] <branch_name> [starting_point]

You can optionally set up your new branch to "track" a specified remote branch (starting_point) using the optional arguments shown above. If you just specify a branch_name, your new branch will be created from your current HEAD (the latest commit on your current branch, ignoring any staged or unstaged changes that have not been committed).

As an example, I will use my Amber git repository, whose main remote is called origin (as is the case with all cloned repositories — see the section on remotes below):

git branch parmed

This will create a new branch called "parmed". If there is also a branch on the Amber git server called "parmed" that I want my new branch to follow, I need to amend the above command to

git branch --track parmed origin/parmed

Changing branches: git checkout <branch_name>

To change branches, you need to use the command

git checkout <branch_name>

One thing that may prevent you from successfully changing branches is having uncommitted changes that conflict with the differences between your current branch and your destination branch. You may have to fix these conflicts (or set the changes aside with git stash) before you change branches.

Merging branches: git merge

Branches are completely useless unless you have some way of combining them at the end. This is done via merging. To merge branch2 into branch1, you need to change to branch1 and merge branch2 into it. This is done as follows:

git checkout branch1
git merge branch2

The first command will change to branch1. The second command will merge branch2 into branch1. This sequence of commands will change branch1, but it will leave branch2 unchanged. It is helpful to understand this to understand how git works with branches. (Note that branch2 can be a remote branch, if you want it to be.)
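The direction of a merge is easy to verify in a scratch repository. In the sketch below, branch1 and branch2 match the names used above; everything else (file names, identity) is a placeholder.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

git branch branch1
git branch branch2

git checkout -q branch2
echo two > two.txt
git add two.txt
git commit -q -m "Work done on branch2"
branch2_before=$(git rev-parse branch2)

git checkout -q branch1
git merge -q branch2          # branch1 gains the work from branch2
branch2_after=$(git rev-parse branch2)

echo "branch2 before: $branch2_before"
echo "branch2 after:  $branch2_after"
```

The two printed IDs are identical: only the branch you are standing on moves during a merge.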

Fixing conflicts

Sometimes, two branches have diverged in a way that conflicts. For instance, if each branch changes the same line, git has no way of knowing which branch has the "correct" version. In this case, git tries its best to merge the two branches, but leaves it up to the user to fix what it couldn't.

A conflicted merge will look like this:

bash$ git merge test_branch 

Auto-merging README
CONFLICT (content): Merge conflict in README
Automatic merge failed; fix conflicts and then commit the result.

The message tells you that the conflict is in the README file. git then modifies the conflicted file so that it contains the portions from both branches, allowing you to pick between them. The conflicted region is split into the code from the HEAD of the current branch (starting at <<<<<<< HEAD) and the code from the branch you just merged in (ending at >>>>>>> branch), separated by a line of equals signs. For instance, the conflicted part of this file is shown below:

<<<<<<< HEAD
   Written in Python by Jason M. Swails
=======
   Written in Python by Jason Swails
>>>>>>> test_branch

In the master branch, I changed my name from Jason to Jason M. Swails, whereas in the test_branch, I changed my name from Jason to Jason Swails. Because these commits modified the same part of the same file, git could not determine what I really wanted. So what I do here is simply modify the conflicted file to keep the parts I want (and make any other changes necessary to make the two changes play nicely with each other), then commit the result.

If I look at the status of my repository:

bash$ git status -uno

# On branch master
# Unmerged paths:
#   (use "git add/rm <file>..." as appropriate to mark resolution)
#
#    both modified:      README
#
no changes added to commit (use "git add" and/or "git commit -a")

Once I have made the changes, I can commit using git commit -a.

The default commit message opens up as:

Merge branch 'test_branch'

Conflicts:
   README

Once we actually commit this fixed conflicted merge, git will identify our two branches as the same at this point. That is, if you were to then change to the branch that you just merged into your current branch and perform the merge in the reverse direction, git would simply fast-forward that branch to this merge point.
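The conflict above can be reproduced from scratch. The following sketch rebuilds the README example in a temporary repository; the symbolic-ref line is there only to make sure the first branch is called master regardless of local configuration, and the identity is a placeholder.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master    # force the initial branch name
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Written in Python by Jason" > README
git add README
git commit -q -m "Add README"

git checkout -q -b test_branch
echo "Written in Python by Jason Swails" > README
git commit -q -a -m "Use full name"

git checkout -q master
echo "Written in Python by Jason M. Swails" > README
git commit -q -a -m "Use full name with middle initial"

# The merge stops with a conflict; the "if" keeps "set -e" from aborting here.
if git merge test_branch > "$sandbox/merge_output.txt" 2>&1; then
    merged_cleanly=yes
else
    merged_cleanly=no
fi

# Resolve by writing the version to keep, then commit the result.
echo "Written in Python by Jason M. Swails" > README
git commit -q -a -m "Merge branch 'test_branch'"
second_parent=$(git rev-parse --verify HEAD^2)
```

The resulting commit has two parents, the old tip of master and the tip of test_branch, which is how git remembers that the branches were merged.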

What does a merge really do?

What a merge really does is flatten out commit-histories between different branches and create a new "merge" commit at the top of the commit log. This merge commit does not contain any changes — its only purpose is to indicate to git that, at this point, these two branches have been merged and all conflicts have been resolved.

Squashing a merge: git merge --squash

Normally a merge will insert every commit in the merging branch (the branch that you're merging from) into the commit history of the branch you're merging into.

Suppose one of your branches has reached the end of its life and you are finished with it once it is merged into another branch. If you don't want all of its intermediate commits dumped into the history of the other branch, you can "squash" the merge so that it lands as a single change on your current branch. Do this using the --squash flag. For instance:

git merge --squash test_branch

The above command will take all of the changes from test_branch and stage them into your current branch. You can then use git commit directly to apply those changes.

Note that this will not create the merge commit described above to tell git that these branches have been merged. I typically use this command when I'm ready to axe the branch I'm merging into the master.
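Here is a small sketch of a squashed merge, assuming a scratch repository with a two-commit test_branch. File names, commit messages, and the identity are invented for the example.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master    # force the initial branch name
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b test_branch
echo one > one.txt
git add one.txt
git commit -q -m "Step one"
echo two > two.txt
git add two.txt
git commit -q -m "Step two"

git checkout -q master
git merge -q --squash test_branch   # stages the combined changes, commits nothing
git commit -q -m "Add feature developed on test_branch"

commits_on_master=$(git rev-list --count master)
parents=$(git log -1 --pretty=%P)   # a single ID here means no merge commit
echo "commits on master: $commits_on_master"
```

master ends up with two commits (the base plus one squashed change) even though test_branch needed two commits to get there.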

Make an Existing Branch Track a Remote One

If you have a branch that you pushed to a remote and you want it to start tracking that remote branch, you can do that via the command:

git branch --set-upstream-to=<remote>/<remote_branch> <local_branch>
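A hedged sketch of setting up tracking after the fact follows. It uses the --set-upstream-to spelling, which is the form current git releases accept, and a local bare repository (server.git, a made-up name) standing in for the remote.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare server.git              # stands in for the remote server
git init -q work
cd work
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$sandbox/server.git"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

git branch feature
git push -q origin feature                 # pushed, but not yet tracking anything

git branch --set-upstream-to=origin/feature feature
upstream=$(git rev-parse --abbrev-ref 'feature@{upstream}')
echo "feature now tracks: $upstream"
```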

Remotes: git remote

Remotes are other repositories (of the same project) that your repository knows about via a nickname. If you clone a repository from an external location, your new repository automatically knows about the repository that it cloned as origin.

There's no limit to how many remote repositories you can have nicknamed in your repository. This lets you communicate with many users on the same project, in addition to maybe communicating with a centralized server containing the "official" version.

Get a List of Remotes: git remote [-v]

To get a full list of remotes, use no arguments to git remote:

git remote [-v]

If you use the -v flag, it will print out the location of each remote along with its nickname. As an example, for my Amber repository, I have a number of remotes I registered with my repository:

bash$ git remote

jmac
origin
ufhpc

bash$ git remote -v

jmac    git@rumford.compbio.ucsf.edu:amber.git (fetch)
jmac    git@rumford.compbio.ucsf.edu:amber.git (push)
origin    gitosis@git.ambermd.org:amber.git (fetch)
origin    gitosis@git.ambermd.org:amber.git (push)
ufhpc    merzberg@128.227.253.87:amber.git (fetch)
ufhpc    merzberg@128.227.253.87:amber.git (push)

Add a New Remote: git remote add

To add a new remote, use the following command:

git remote add <nickname> <URL>

This will register the given URL (local path or online server path to the remote repository) with the given nickname. You can then fetch and pull from that remote (see below) and push to it if you have permission to do so.

Delete an Existing Remote: git remote rm

To get rid of a remote you don't want anymore use the command

git remote rm <nickname>

to remove the remote with the given nickname.

Rename an Existing Remote: git remote rename

To rename a remote, use the command

git remote rename <current_nickname> <new_nickname>

This will update all branches that follow branches on that remote, as well (much better than removing the remote and adding it back again).

Change Branch On Remote: git remote set-head

This is an atypical option, in my experience. It does not change anything on the remote itself; it only updates your local record of which branch is that remote's default (the remote's HEAD, stored locally as refs/remotes/<name>/HEAD). If you want to do it, you would do it via

git remote set-head <name> <branch_name>

Getting Updates From Remote

Now that we know about branches and remotes, we're ready to talk about getting updates from a tracked remote.

Just Get Information: git fetch

If you just want to update your local information about a remote repository, use the command

git fetch [<remote>]

where <remote> is either the URL of a remote or your nickname for that remote. If your current branch is set up to track a remote branch, you do not need to supply remote; git will automatically fetch from the one that it is tracking. Otherwise, it falls back to the remote named origin. This will not incorporate any of the changes from that remote into your current branch.

Get Updates and Incorporate Them: git pull

If you want to update your local information about a remote repository and update your current branch to reflect those changes, use the command

git pull [<remote>] [<branch>]

where <remote> is either the URL of a remote or your nickname for that remote. If your current branch is set up to track a remote branch, you do not need to supply remote or branch; it will automatically pull from the one that it is tracking. Otherwise, you do. This will incorporate the changes from that remote into your current branch. It is logically equivalent to running a fetch followed by a merge. Specifically,

git fetch <remote>
git merge <remote>/<branch>

is the same as

git pull <remote> <branch>
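To check that a fetch followed by a merge lands in the same place as a pull, the sketch below runs each route in its own clone of a local bare repository. All repository names (server.git, author, clone_a, clone_b) are hypothetical.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare server.git              # stands in for the remote server

git init -q author
cd author
git symbolic-ref HEAD refs/heads/master    # force the initial branch name
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$sandbox/server.git"
echo v1 > file.txt
git add file.txt
git commit -q -m "First version"
git push -q origin master
cd ..

git clone -q -b master server.git clone_a
git clone -q -b master server.git clone_b

cd author                                  # publish a new commit after cloning
echo v2 > file.txt
git commit -q -a -m "Second version"
git push -q origin master
cd ..

cd clone_a                                 # route 1: fetch, then merge
git fetch -q origin
git merge -q origin/master
head_a=$(git rev-parse HEAD)

cd ../clone_b                              # route 2: pull
git pull -q origin master
head_b=$(git rev-parse HEAD)

echo "fetch+merge: $head_a"
echo "pull:        $head_b"
```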

Updating a Remote: git push

If you have made your glorious changes and wish to share them with everyone else connected to a given remote, you can push your changes to a remote via the command

git push [<remote>] [<branch>]

where <remote> is either the URL of a remote or your nickname for that remote. If your current branch is set up to track a remote branch, you do not need to supply remote or branch; it will automatically push to the one that it is tracking.

One thing to note: git push is not the logical inverse of a git pull. A git push will not perform a merge — it will only perform a fast-forward (which means that one branch is some number of commits ahead of the other branch after a point where they both meet, such as a merge point). Thus, if you want to push your changes, you typically have to do a git pull first to merge the remote branch with your branch. After this, the two branches have a common merge point, and the remote can be fast-forwarded.
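The pull-before-push pattern can be seen with two made-up users, alice and bob, sharing a local bare repository. This is a sketch under those assumptions; --no-rebase and --no-edit are passed to git pull only so the example merges without prompting, whatever the local configuration is.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare server.git              # stands in for the shared remote

git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master    # force the initial branch name
git config user.name "Alice Example"       # placeholder identity
git config user.email "alice@example.com"
git remote add origin "$sandbox/server.git"
echo base > base.txt
git add base.txt
git commit -q -m "Base commit"
git push -q origin master
cd ..

git clone -q -b master server.git bob
cd bob
git config user.name "Bob Example"         # placeholder identity
git config user.email "bob@example.com"
echo bob > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
git push -q origin master                  # bob gets there first
cd ../alice

echo alice > alice.txt
git add alice.txt
git commit -q -m "Alice's change"

# The remote has moved on, so this push is not a fast-forward and is refused.
if git push -q origin master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

git pull -q --no-rebase --no-edit origin master   # merge bob's work locally
git push -q origin master                          # now a fast-forward
echo "first push was $first_push"
```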

Looking at the History: git log

If you want to look at the history of commits, you can use the command

git log [<path_1> [<path_2> ...]]

If you do not provide any path (file or directory), it will give you a log of every commit for the entire repository. If you do provide a path, it will only show you those logs that affected files in the specified paths.

Note that providing a path will make the log take significantly longer (especially for a large repository with a long history), since git doesn't track files — it tracks content.

Resetting to an earlier state: git reset

If you want to reset your git repository to an earlier state, you can use the command

git reset [--hard] <commit_id>

The commit_id can be found in the git log (see above). If you use the keyword --hard, then it will change all of the working files to the previous version. Otherwise, it just rewinds the history to the given point and unstages those changes (though the working files don't change).
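The difference between the two forms shows up in the working files. A minimal sketch in a scratch repository, with invented file contents and a placeholder identity:

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo v1 > file.txt
git add file.txt
git commit -q -m "Version 1"
first=$(git rev-parse HEAD)

echo v2 > file.txt
git commit -q -a -m "Version 2"
second=$(git rev-parse HEAD)

git reset -q "$first"            # history rewinds, working file is untouched
after_plain=$(cat file.txt)

git reset -q --hard "$second"    # return to the newer commit for the second demo
git reset -q --hard "$first"     # history rewinds and the working file follows
after_hard=$(cat file.txt)

echo "after plain reset: $after_plain"
echo "after hard reset:  $after_hard"
```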

Be very careful with this command in a branch that is linked to remote branches.

Looking at Differences: git diff

Very often, it's desirable to look at differences between different states of a git repository so you can look at changes that have been made between different commits, since the last commit, etc. The command git diff is a versatile program that does this. It is useful for a wide range of uses. One of its nicest functions is the ability to create patch files that can be used directly with the patch program to modify files within another repository or, more often, a released tarball.

Common uses are

git diff <commit_id1> [HEAD]

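The above command will show the differences required to move from commit_id1 to HEAD, the latest commit on the current branch. If you leave out HEAD, git compares commit_id1 with your working files instead, so any uncommitted changes to tracked files show up as well.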

git diff <commit_id1> <commit_id2>

The above command will show the differences required to move the tree from commit_id1 to commit_id2. Thus, the first usage is really a special case of this usage.

git diff HEAD

The above command will show all of the differences between the latest commit and the current state of your working files. This covers changes that have been staged (via git add or git rm) as well as unstaged changes to files that are already being tracked. Thus, the only way to get the above form of git diff to recognize a new file is to stage that new file using git add.

Summary of changes: git diff --stat

If you wish to get a summary of all of the changes, use the --stat command-line option. For instance:

bash$ git diff --stat 37663ae7 f2d346cbb5

 AmberTools/src/parmed/ParmedTools/ParmedActions.py |   35 +++-
 doc/AmberTools.lyx                                 |  171 +++++++++++++++++++-
 2 files changed, 194 insertions(+), 12 deletions(-)

Creating a patch

git can be used to very easily create a patch. To generate a patch file, you need the command-line argument --no-prefix, then redirect the output to the patch file.

Another useful flag to use, in my opinion, is --patch-with-stat to display a summary alongside the patch so people know what the patch will modify.
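Putting the two flags together, a patch file could be produced along these lines. The sketch assumes a scratch repository; readme.patch and the commit contents are made up.

```shell
set -e
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "line 1" > README
git add README
git commit -q -m "Add README"
old=$(git rev-parse HEAD)

echo "line 2" >> README
git commit -q -a -m "Extend README"
new=$(git rev-parse HEAD)

# --no-prefix drops the a/ and b/ path prefixes; --patch-with-stat adds a summary.
git diff --no-prefix --patch-with-stat "$old" "$new" > "$sandbox/readme.patch"
cat "$sandbox/readme.patch"
```

Because the paths carry no prefix, a patch like this would typically be applied from the top of the source tree with the patch program's -p0 option.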

Other useful functions

Transfer Single Commit Between Branches: git cherry-pick

If you have a commit you made in one branch that you want to move to another, the git cherry-pick command was developed specifically for you.

To transfer a single (or multiple, isolated) commit(s) from one branch to another, get the commit ID (SHA hash) from the commit you want (use git log to get this). Then go to the branch you wish to move it to (using git checkout <branch>). Then, use the command

git cherry-pick [-n] <commit_id>

By default (without the -n flag), this command will automatically stage the changes made in the commit and commit them to the current branch, copying the log message from the old commit. If you also pass the -x flag, git appends the line "(cherry picked from commit <commit_id>)" to that message.

If you use -n, then it will just stage the changes to your branch without committing them, giving you the option of changing the commit message or combining multiple cherry-picks into a single commit.

Resetting a Remote Branch

Let's suppose that someone has committed something to a remote branch that they shouldn't have, and you now want to reset it. What you need to do is force-push the desired location or state into your remote repository. Do this with the following command:

git push -f <remote> <commit_id>:<branch>

For instance, to reset the master branch on the remote origin to 2dcd4c0ae68, do this:

git push -f origin 2dcd4c0ae68:master
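A sketch of the same operation against a local bare repository (server.git, a made-up stand-in for the real remote), with the commit ID captured in a variable rather than typed out:

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare server.git              # stands in for the remote server
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master    # force the initial branch name
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$sandbox/server.git"

echo good > file.txt
git add file.txt
git commit -q -m "Good state"
good=$(git rev-parse HEAD)

echo oops > file.txt
git commit -q -a -m "Should not have been pushed"
git push -q origin master                  # the unwanted commit reaches the remote

git push -q -f origin "$good":master       # force the remote branch back
server_master=$(git --git-dir="$sandbox/server.git" rev-parse master)
echo "remote master is now at $server_master"
```

Only the remote branch moves. The local master still points at the unwanted commit, and anyone who already pulled it will have to reset their own copy as well.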

Deleting a Remote Branch

Now let's suppose that you're done with a branch that's on a remote repository and you wish to delete it. You can do this with a similar syntax to "Resetting a Remote Branch". But this time, you want to push no commit ID to that branch (effectively resetting it to nothing). Do this with the following command:

git push <remote> :<branch>

For instance, to delete the mmpbsa branch on the remote origin:

git push origin :mmpbsa
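The deletion can be confirmed with git ls-remote, which asks the remote directly which branches it has. The sketch again assumes a local bare repository standing in for the remote.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare server.git              # stands in for the remote server
git init -q work
cd work
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$sandbox/server.git"

echo base > base.txt
git add base.txt
git commit -q -m "Base commit"

git branch mmpbsa
git push -q origin mmpbsa
before=$(git ls-remote --heads origin mmpbsa)   # one line: the branch exists

git push -q origin :mmpbsa                      # push "nothing" to the branch
after=$(git ls-remote --heads origin mmpbsa)    # empty: the branch is gone

echo "before: ${before:-<none>}"
echo "after:  ${after:-<none>}"
```

The local mmpbsa branch is not touched by this; remove it separately if it is no longer needed.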

gitk

gitk is a GUI that helps you visualize differences in a git repository. Common uses are

gitk <branch>

to narrow in on a specific branch (including remote branches)

gitk <path>

to only pull out commits that pertain to files in the given path (whether it's a directory name or a specific file).

Play around with the GUI a little bit to see all of the information you can get from it. I use gitk all the time.

Setting up a git-ized command prompt

If you frequently work with different branches (which I do, as that exposes the full power of git), then it is helpful to have a visual cue that will instantly tell you what branch you are working on (without having to do git branch). That is where the git-prompt.sh file comes in. This file is included with the standard git install and provides bash functions that you can use in your command prompt (the PS1 environment variable) to always display your branch.

You need to make sure that the git-prompt.sh file is properly sourced in your shell. In particular, you need the function __git_ps1 defined in order to beef up your prompt with the git branch name. You can check this easily via the command:

swails@batman ~ $ type __git_ps1 | head
__git_ps1 is a function
__git_ps1 () 
{ 
    local pcmode=no;
    local detached=no;
    local ps1pc_start='\u@\h:\w ';
    local ps1pc_end='\$ ';
    local printf_format=' (%s)';
    case "$#" in 
        2 | 3)

If that function does not exist, you will see something like:

swails@batman ~ $ type __git_ps1
-bash: type: __git_ps1: not found

If that function is not available, locate where git-prompt.sh is on your computer and source it with

source /path/to/the/file/git-prompt.sh

somewhere in your shell or shell resource file. Then you can use the __git_ps1 function in your prompt. For instance, my prompt is defined in my .bashrc file as:

export PS1='\[\033[01;32m\]\u@\h\[\033[01;34m\] \w$(__git_ps1 " (%s)") \$\[\033[00m\] '

It is the __git_ps1 " (%s)" part that adds the current working branch to my prompt. Now when I am in a git repository, my prompt looks like:

swails@batman ~/amber (master) $ git branch
  amber12-with-patches
  amber13-with-patches
  amber14-with-patches
  amoeba2
  amoeba2_diis
* master
  sander-python
  yorkmaster
swails@batman ~/amber (master) $ git checkout sander-python
Switched to branch 'sander-python'
Your branch is up-to-date with 'origin/sander-python'.
swails@batman ~/amber (sander-python) $

Notice how the current working branch is always in my prompt.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License