Carnegie Mellon
Computer Science Department

15-410 Git Quickstart


This document is a work in progress. It may not be complete. To the best of our knowledge, the information that is here is correct. If you have issues following the instructions in this document, or you have suggestions to make this document clearer, please send e-mail to staff-410 at the CS domain.

At some point over the course of 15-410 (or later!), you may find that PRCS is insufficiently powerful or usable to perform some tasks that you might want a source control system to perform for you. You may find, for instance, that you want to collaborate with more people who might not all have access to the same central repository; that you want light-weight branches to perform experiments in; or that you are unwilling to use Lisp as a data description language.

To the end of more facile development of your projects, we've written this quick-start guide for using an alternative: the Git version control system. This document will serve first as a user's reference, second as an explanation of concepts (although you need not understand all of the concepts to use Git), and third as evangelism for Git and other distributed version control systems (although you need not drink my kool-aid to use Git). In theory, each part should stand alone; you need not know of the concepts to use the reference, and you need not know of the reference to be evangelized to. In practice, you may find it useful to read all three parts to get a deeper understanding of what Git is doing while you aren't looking.

Should you use Git, or something simpler (in the 410 context, maybe PRCS)? On the one hand, other things might be simpler and faster to learn right now. On the other hand, time spent learning Git will pay off if you join a project that already uses Git. Because there are so many revision-control systems currently in use, there is no guarantee you won't have to learn something else, but Git is among the more popular systems, so it's a plausible investment (especially compared to PRCS!).

Quick-start

Obtaining/installing Git

  • On Andrew UNIX, Git is available for you in the 15-410 bin directory. Please refer to the 15-410 course documentation for adding our bin directory to your PATH.

  • On non-Andrew systems, the latest version of Git can be obtained from the official download site. It is buildable with the traditional ./configure && make && sudo make install procedure.

Adding your project to Git

  • To create a Git project, run git init in the top level directory of the project. The project should already exist. Git will create a hidden folder called .git, which will store all of Git's state.

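    You should only do this once per project -- not once per person, not once per directory, not once per day! Running git init again at the top level is harmless (Git will not overwrite what is already there), but running it inside a subdirectory creates a second, nested repository, which is almost certainly not what you want. To make a copy of Git's view of the world, take a look at the step on cloning a copy of the repository under Sharing your project with your partner below.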

  • To add your files to Git's world for the first time, which should be done immediately after creating the Git project, first make sure that you don't have any files in your project directory that you don't want Git to track; in my case, I'm starting from a fresh untar of the project 1 sources. Inside your Git project directory, run git add ., then run git commit -m "Initial import". Git will acknowledge that you have made a commit by giving a message like: Created initial commit 311b98a: Initial import along with a list of files that it added.
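
As a rough sketch, the two steps above might look like the following in a throwaway directory. The file name kernel.c, the identity "Example Student", and the temporary directory are placeholders of ours; substitute your own project directory. The symbolic-ref line is only there because newer Git versions may name the first branch something other than master, which is the name this guide assumes.

```shell
set -eu
# Hypothetical sandbox standing in for a fresh untar of the project sources.
project=$(mktemp -d)
cd "$project"
printf 'int main(void) { return 0; }\n' > kernel.c

git init -q                                 # creates the hidden .git directory
git symbolic-ref HEAD refs/heads/master     # call the first branch master, as this guide assumes
git config user.name "Example Student"      # placeholder identity; Git refuses to commit without one
git config user.email "student@example.com"

git add .                                   # stage every file in the directory
git commit -q -m "Initial import"           # record the first commit
git log --oneline                           # one line: abbreviated SHA1 ID, then "Initial import"
```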

Sharing your project with your partner

In general, these steps should only need to be performed once per project. You may need to perform some of these steps (in particular, to clone a copy of the repository) more than once if you are working on more than one computer.
  • To put a Git repository in your 410 AFS repository directory, first cd into your AFS repository directory. Next, create a "bare" repository (i.e., with only Git's view of the world, not a human-readable view of the world) by running the command git clone --bare path_to_your_working_folder, where path_to_your_working_folder is the path in which you did the git init previously.

    If you are not running on AFS, and you are instead sharing on your own server, you'll need to edit some settings to allow both of you to commit. I'll assume that you understand how to manage your own UNIX system if you're doing this, and you've created a group for you and your OS partner; for the purposes of this tutorial, we'll call that group cs410. Edit the config file created inside the cloned directory, and add the line sharedrepository = 1 to its [core] section (appending it to the end of the file only works if [core] happens to be the last section). Give the repository to the group and set appropriate permissions using the commands chgrp -R cs410 p1 and chmod -R ug+rwX p1, where p1 is the cloned directory. Finally, make sure that new files are group-owned by using the command chmod g+s p1/objects. This is not required -- if this seems complicated, you can just use Git on AFS.

  • To clone a copy of the repository, like your partner will want to do for the first time, run git clone path_to_repository, where path_to_repository is the path to the bare repository inside your AFS repository folder. Git will create a folder to clone into.

    If your repository is not on AFS (or your partner is not on AFS!), the clone command will look something like git clone you@some.other.machine:path_to_repository, where you is your username on the remote system, some.other.machine is the other machine's hostname, and path_to_repository is the path to the bare repository. This is not required if you are only working on AFS.

  • To point your original Git instance at the newly cloned repository, edit the .git/config file in your original Git instance with your favorite text editor, and add the following lines to the bottom (you can cut and paste as needed):

    [remote "origin"]
    url = path_to_repository
    fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "master"]
    remote = origin
    merge = refs/heads/master
    
    where path_to_repository is the path to the bare repository. If you do not have any changes in your original repository, it might just be easier to rm it, and do a fresh clone as in the step above.

    You'll need to do this so that when you do a git push or git pull, Git knows where to push or pull from; this is done automatically for you when you do a git clone. If this wasn't set up correctly, then when you do a push or a pull (see below), you will get an error like fatal: 'origin': unable to chdir or not a git archive.
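
The sharing steps above can be sketched end to end in a scratch directory. Everything named here (the sandbox, work, shared.git, partner, and the identity) is a stand-in of ours for your working folder and your AFS repository directory. Instead of editing .git/config by hand, this sketch uses git config, which writes the same keys shown in the snippet above.

```shell
set -eu
# Placeholder identity for every repository in this sketch.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox=$(mktemp -d)      # hypothetical stand-in for your home and AFS directories
cd "$sandbox"

# An existing working repository, as created in "Adding your project to Git".
mkdir work
cd work
git init -q
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add .
git commit -q -m "Initial import"
cd ..

# 1. Create the bare repository (this would live in your AFS repository directory).
git clone -q --bare work shared.git

# 2. Your partner clones it; the origin settings are written automatically.
git clone -q shared.git partner

# 3. Point the original working repository at the bare one.
cd work
git config remote.origin.url "$sandbox/shared.git"
git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
git pull -q               # works now that origin is known; there is nothing new to merge
```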

Working with Git on a day-to-day basis

These operations will become your new best friends. You will use them many times per day; it will pay off to become familiar with their operation and their quirks.
  • To record changes to every file that Git is tracking, run git commit -a -m "message", where message is a short message describing the changes you've made. To make the best use of some of Git's other features, you should endeavor to make changes in an order that will make sure that your project still compiles and runs when you commit. This will only record the change in your local repository, and not yet make your changes visible to your partner; see the section on pushing and pulling later.

  • To record changes to just a few files that Git is tracking, run git commit -m "message" file1 file2 ..., where file1 is a file that you'd like to record changes to, and file2 and any further names are optional.

  • To add a new file to Git, run git add file1 file2 ..., where file1 is a file that you'd like to add, and file2 and any further names are optional. Then, to record the newly added file, run git commit -m "message". If you run git add on a file that Git is already tracking, Git will record the file's contents as they were at the time you ran the add, not as they are when you later commit.

  • To delete a file from Git, run git rm file1 file2 ..., where file1 is a file that you'd like to delete, and file2 and any further names are optional. Then, to record the newly deleted file, run git commit -m "message". The delete is not permanent; you can still check out older revisions with that file intact.

  • To rename or move a file in Git, run git mv oldname newname, where oldname and newname are the obvious. (git mv also has similar semantics to the UNIX command mv; this is just the most common usage.) To record the moved files, run git commit -m "message".

  • To see what changes you've made to files that Git is tracking, run git diff. Git will produce diff-formatted output about all of the current unrecorded changes in your repository.

  • To make your changes visible to your partner, run git push. This will "push" your changes into the bare repository. If your repository is not already up to date with the bare repository, then git push will fail with a message like remote is not a strict subset of local. See the section on getting your partner's changes.

  • To get your partner's changes, run git pull. This will "pull" changes from the bare repository into your local repository. If you and your partner have changed the same sections of the same file in non-trivial ways that Git could not resolve, then the pull will leave your repository in a "conflicted" state with a message beginning with CONFLICT:. Edit the conflicted files to resolve the conflicts, make sure your project builds, and then run something like git commit -m "Fix merge conflict" to record the fix. Your fix will not be visible to your partner until you push.
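
Here is one possible day in the life, sketched with two clones of a scratch bare repository. The directory names (seed, shared.git, mine, partner), the file names, and the identity are all made up for illustration; the sketch covers only the conflict-free case.

```shell
set -eu
# Placeholder identity for every repository in this sketch.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

sandbox=$(mktemp -d)
cd "$sandbox"

# Setup: a seed project, a bare repository, and one clone each for you and your partner.
mkdir seed
cd seed
git init -q
git symbolic-ref HEAD refs/heads/master
printf 'one\n' > notes.txt
printf 'scratch\n' > old.txt
git add .
git commit -q -m "Initial import"
cd ..
git clone -q --bare seed shared.git
git clone -q shared.git mine
git clone -q shared.git partner

cd mine
printf 'two\n' >> notes.txt
git diff                                  # shows the unrecorded change to notes.txt
git commit -q -a -m "Add a second note"   # record changes to every tracked file

printf 'new\n' > extra.txt
git add extra.txt                         # start tracking a new file
git mv notes.txt NOTES                    # rename a tracked file
git rm -q old.txt                         # delete a tracked file
git commit -q -m "Add extra.txt, rename notes.txt, remove old.txt"

git push -q                               # publish both commits to the bare repository
cd ../partner
git pull -q                               # your partner receives them
ls                                        # lists NOTES and extra.txt
```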

Time travel with Git

In an ideal world, we would make no errors while writing code. Sadly, sometimes we wish to travel back to the past and determine what broke. It is generally considered inadvisable to modify history; if you do, you run the risk of killing one or more of your parents, and being in a paradoxical state of existence. If you wish to modify history, you might wish to create an alternate universe; in Git, we call these alternate universes "branches". Luckily, branches aren't needed to just go back and look. You may use these commands somewhat less frequently, but they are no less important.
  • To get a graphical view of your repository's history, run gitk. Ogle at all the pretty colors. Each time in the past that you or your partner recorded changes using the commit command will be represented by a line in the pane at the top. Select a line, and more details about the change will show up in the bottom panes, including the change's SHA1 ID.

  • To get a non-graphical list of all changes in a file's history, run git log file, where file is the file that you wish to get a change list for. Each change will start with a line starting with commit, and ending in the change's SHA1 ID. Some short information about the change will be given to you, including the message that you specified with the -m option to git commit.

  • To go back and view the repository's state as it was after a change in the past, run git checkout sha1, where sha1 consists of enough characters from the change's SHA1 ID to disambiguate it from all other changes. For instance, if the change you're looking for has ID 311b98a0a1c40ad176103ee8026131fcd0fcc919, then you may only need to run git checkout 311b98 to get the change you're looking for.

    Do not make any changes when you are viewing the past like this. If you wish to make changes from the past, use a branch. If you have outstanding changes that you have not run git commit on when you attempt to switch to viewing an old version of the repository, Git will give you an error message like error: You have local changes..., and will refuse to change what version you are viewing.

  • To view the most recent change in the repository (i.e., recover from viewing a change in the past), run git checkout master. Any changes that you may have committed while viewing the past will be lost into the abyss (they are not irrecoverable, but recovering them is beyond the scope of this document).

  • To revert one or more files to the state in which they were after you last committed changes or ran a checkout, run git checkout file1 file2 ..., where file1 is a file whose changes you'd like to throw away, and file2 and any further names are optional.
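
A small sketch of a round trip to the past and back, using a scratch repository with two commits. The file kernel.c, its contents, and the identity are invented for the example.

```shell
set -eu
repo=$(mktemp -d)                     # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"

printf 'version 1\n' > kernel.c
git add kernel.c
git commit -q -m "First version"
first=$(git rev-parse --short HEAD)   # abbreviated SHA1 ID of the first commit
printf 'version 2\n' > kernel.c
git commit -q -a -m "Second version"

git log --oneline kernel.c            # two commits, newest first
git checkout -q "$first"              # view the tree as of the first commit
cat kernel.c                          # prints "version 1"
git checkout -q master                # return to the present
cat kernel.c                          # prints "version 2"

printf 'oops\n' > kernel.c            # an unrecorded change you regret
git checkout kernel.c                 # revert the file to its last committed state
cat kernel.c                          # prints "version 2"
```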

Splitting reality with Git

At some point, you may wish that you could make a change on a previous version of your tree without affecting the current version (yet); or you may wish to split reality in half, and work on an experimental side-project without disrupting main development of your project. Branches in Git are designed to allow you to do just those things; split away from the main view of reality from some point in time (be that time now or the past).
  • To create a new branch from some point in the past, run git checkout -b branchname sha1, where branchname is what you want your new branch to be called (pick something descriptive and without spaces; Experimental_COW might be a good name if you're experimenting with copy-on-write), and sha1 is the SHA1 ID of the change that you wish to branch from (see the section go back and view above). Git will change you over to that branch, and you can begin recording changes on it immediately.

  • To change to a different branch, run git checkout branchname. The branch name that you started on was master; so to return to the version that's in the bare repository, run git checkout master. (The astute reader will note that this is the same command as to recover from being in the past.)

  • To merge from one branch to another branch, first change to the branch that you want to merge to, then run git pull . branchname, where branchname is the branch that you want to merge from. This can be done as many times as you like; there are no negative consequences from merging repeatedly. (Git considers your other branch as a 'virtual partner' to pull from.) To publish the pulled and merged changes from your branch to your partner, you can just run a git push when you are on the master branch, as normal.

  • To create a branch based on the current state of your repository, run git checkout -b branchname. The semantics are similar to creating a branch from some point in the past.

  • To create a tag, run git tag tagname. By convention, tag names are capitalized, but this is not enforced by Git. A tag name can be used anywhere a SHA1 ID would otherwise be used; to go back to the point at which you first got your shell running in Project 3, then you might run git checkout SHELL_RUNNING. The usual rules apply if you don't create a branch there; namely, recording changes would be a bad idea unless you proceed to create a branch.
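
The branch and tag operations above can be strung together roughly as follows. The tag and branch names come from the text; the files and the identity are placeholders. One deliberate substitution: the merge step uses git merge, which performs the same merge as git pull . branchname, because recent Git versions may insist on being told how git pull should reconcile branches that have diverged.

```shell
set -eu
repo=$(mktemp -d)                          # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"

printf 'shell\n' > shell.c
git add shell.c
git commit -q -m "Shell runs"
git tag SHELL_RUNNING                      # name this point in history

printf 'more work\n' >> shell.c
git commit -q -a -m "Continue on master"

# Branch from the tagged commit and experiment there.
git checkout -q -b Experimental_COW SHELL_RUNNING
printf 'cow\n' > cow.c
git add cow.c
git commit -q -m "Experiment with copy-on-write"

# Switch back to master and merge the experiment in.
git checkout -q master
git merge -q -m "Merge Experimental_COW" Experimental_COW
git log --oneline                          # four commits, including the merge
```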

Explanation of Concepts

The above involved some simplifications of the underlying concepts of Git for the purposes of readability and for the purposes of understandability of an introduction. The simplifications are not disastrous in terms of your comprehension of what Git is doing behind your back, but you may find it helpful to know how Git stores data to better work with Git. Tommi Virtanen's excellent page Git for Computer Scientists may provide some insight as well, for those who like to talk about DAGs and are big fans of arrows pointing every which way.

Commits

The basic unit of a point in time stored in Git is a commit. Each time we spoke of recording changes earlier, it would have been more correct to say "creating a commit"; I used the words "recording changes" to distinguish the operation from pushing and publishing your changes to your partner. A commit, by its nature, consists of a few pieces of information:

  • A reference to a parent commit: Each commit normally has one or more parent commits that refer to where the commit was derived from; you can think of the parent commits as previous steps in time from this commit. The very first commit you make (we called this the initial import earlier) is the exception: it has no parent at all, which is how Git recognizes an initial import. (The ID of all zeroes that shows up in places such as Git's reflog stands for this missing parent.)
  • A description: This is the text that you enter in the -m option to git commit.
  • One or more changed files: When files are changed, Git records either a delta -- a binary patch against a file's version in the parent commit -- or a full version of the file in association with the commit. The file is technically not stored in the commit; instead, it is stored as a blob, and the commit contains a reference to the blob. Each blob can be referenced by many commits, but for most purposes, blobs behave as if they are "owned" by a commit.
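
If you'd like to look at these pieces yourself, git cat-file will print the raw contents of a commit. The sketch below uses a scratch repository; a.txt and the identity are placeholders. In the Git versions we are aware of, the commit points at a "tree" object (a directory listing) which in turn points at the blobs, and the stored object for the first commit simply has no parent line.

```shell
set -eu
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

printf 'a\n' > a.txt
git add a.txt
git commit -q -m "Initial import"
printf 'b\n' >> a.txt
git commit -q -a -m "Second change"

# The newest commit: a tree line, a parent line, author and committer lines,
# a blank line, and then the description given with -m.
git cat-file -p HEAD

# The tree that the commit refers to: one line per file, naming its blob.
git cat-file -p 'HEAD^{tree}'

# The initial import: same layout, but without a parent line.
git cat-file -p 'HEAD^'
```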

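A commit is identified by the SHA1 hash of all of the information that it contains. This hash is one common form of what this document calls a refspec -- that is to say, it is one common way to specify a single commit. (Git's own documentation calls such a name a revision, and reserves the word refspec for the source:destination mappings used by fetch and push, like the fetch line in the config snippet earlier.) Recall that when you did a checkout to go back in time, you specified a SHA1 hash; in that case, you were using the SHA1 hash as a refspec.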
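
To see how an abbreviated ID maps back to the full hash, something like the following should do; the repository, file, and identity are scratch placeholders.

```shell
set -eu
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
printf 'hello\n' > README
git add README
git commit -q -m "Initial import"

full=$(git rev-parse HEAD)             # the complete SHA1 ID of the commit
short=$(git rev-parse --short HEAD)    # a prefix long enough to be unambiguous
echo "$short"
git rev-parse "$short"                 # Git expands the prefix back to the full ID
```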

You may have inferred by now that commits exist in a sort of a tree. Each commit may have one or more parent commits (a commit with more than one parent is called a merge commit), and each commit may have zero or more child commits. You can view the commit tree using gitk, as we saw above; each commit was identified by a dot, and gitk drew lines for us between each commit to explicitly show the branches of the tree.
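
If gitk is not handy, git log can draw a text version of the same picture. This sketch builds a tiny history with one merge in a scratch repository; the branch name side, the files, and the identity are invented for illustration.

```shell
set -eu
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"

printf 'base\n' > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b side
printf 'side\n' > side.txt
git add side.txt
git commit -q -m "Work on side"

git checkout -q master
printf 'main\n' > main.txt
git add main.txt
git commit -q -m "Work on master"
git merge -q -m "Merge side into master" side   # creates a commit with two parents

git log --graph --oneline --all        # text rendering of the dots and lines gitk draws
git rev-list --parents -n 1 HEAD       # the merge commit's ID followed by its two parent IDs
```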

This tree of cryptographic hashes gives Git a few very useful properties. Git can assure you that nobody has changed the tree that you have based your work on, because every element in the tree, down to the blobs, is identified by its cryptographic hash (its SHA1). If a parent object has changed, either by malicious intent or by disk corruption, Git simply will not be able to find the parent object, instead of giving you the incorrect data. This makes Git relatively immune to AFS corrupting its metadata.
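
A quick way to convince yourself that objects are named by their contents is to ask Git for a file's ID before and after changing it. In this sketch, greeting.txt, the scratch repository, and the identity are placeholders; git hash-object only computes the ID and does not store anything.

```shell
set -eu
repo=$(mktemp -d)                              # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

printf 'hello\n' > greeting.txt
git add greeting.txt
git commit -q -m "Initial import"

stored=$(git rev-parse HEAD:greeting.txt)      # ID of the blob the commit refers to
computed=$(git hash-object greeting.txt)       # ID computed from the file on disk
echo "$stored"
echo "$computed"                               # same ID: same contents, same name

printf 'hello!\n' > greeting.txt               # change a single character
changed=$(git hash-object greeting.txt)
echo "$changed"                                # a different ID
```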

Further, it makes it impossible to throw away history. Some version control systems that we discussed in lecture have versions per file; so deleting a file may delete its version history, or otherwise create a discontinuity in how the file is linked in terms of time. Similarly, renaming a file is not disastrous (although somewhat quirky); the only changes happen locally in the commit object. If a delete required a change of history, then the cryptographic hashes would change, and the entire tree's parent hash would have to change. The cryptographic hash system, then, makes Git resistant to inadvertent deletion of history.

Branches, tags, and refspecs -- oh my!

In this section, until now, you've seen only one kind of refspec -- a SHA1 hash of a commit. But in the quick-start above, you've worked with more types of refspecs; when you checked out a branch, you used the refspec that refers to the branch.
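
As a small illustration, a branch name or a tag name can be resolved to the SHA1 hash it currently stands for. The scratch repository, file, and identity below are placeholders; the tag name is borrowed from the quick-start.

```shell
set -eu
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Student"
git config user.email "student@example.com"
printf 'hello\n' > README
git add README
git commit -q -m "Initial import"
git tag SHELL_RUNNING

git rev-parse master                   # the SHA1 ID that the branch name refers to
git rev-parse SHELL_RUNNING            # the tag names the same commit
git rev-parse HEAD                     # and so does HEAD, the commit you are viewing
git show-ref                           # every branch and tag with the ID it points at
```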


[Last modified Friday September 18, 2009]