Carnegie Mellon

Computer Science Department
15-410 Git Quickstart
This document is a work in progress. It may not be complete.
To the best of our knowledge, the information that is here is correct. If
you have issues following the instructions in this document, or you have
suggestions to make this document clearer, please e-mail the course staff.
At some point over the course of 15-410 (or later!), you may find that PRCS
is insufficiently powerful or usable to perform some tasks that you might
want a source control system to perform for you. You may find, for
instance, that you want to collaborate with more people who might not all
have access to the same central repository; that you want light-weight
branches to perform experiments in; or that you are unwilling to use
Lisp as a data description language.
To make developing your projects easier, we've written
this quick-start guide to an alternative: the Git version control system. This document
will serve first as a user's reference, second as an explanation
of concepts (although you need not understand all of the concepts to use
Git), and third as evangelism for Git and other distributed version
control systems (although you need not drink my kool-aid to use Git). In
theory, each part should stand alone; you need not know of the concepts to
use the reference, and you need not know of the reference to be evangelized
to. In practice, you may find it useful to read all three parts to
get a deeper understanding of what Git is doing while you aren't looking.
Should you use Git, or something simpler (in the 410 context, maybe PRCS)?
On the one hand, other tools might be simpler and faster to learn right now.
On the other hand, time spent
learning Git will pay off if you join a project that already uses Git.
Because there are so many revision-control systems currently in use,
there is no guarantee you won't have to learn something else,
but Git is among the more popular systems, so it's a plausible investment
(especially compared to PRCS!).
Quick-start
Obtaining/installing Git
On Andrew UNIX, Git is available for you in the 15-410
bin directory. Please refer to 15-410 course documentation for
adding our bin directory to your PATH.
On non-Andrew systems, the latest version of Git can be obtained
from the official download site.
It is buildable with the traditional ./configure && make
&& sudo make install procedure.
Adding your project to Git
- To create a Git project, run git init in the top level
directory of the project. The project should already exist. Git will
create a hidden folder called .git, which will store all of Git's
state.
You should only do this once per project -- not once per person,
not once per directory, not once per day! Running git init again in
an existing project is harmless in current versions of Git (it will not
overwrite the history that is already there), but it is never needed. To
make a copy of Git's view of the world, take a look at the section Sharing
your project with your partner below.
- To add your files to Git's world for the first time, which should
be done immediately after creating the Git project, first make sure that you
don't have any files in your project directory that you don't want Git to
track; in my case, I'm starting from a fresh untar of the project 1 sources.
Inside your Git project directory, run git add ., then run git
commit -m "Initial import". Git will acknowledge that you
have made a commit by giving a message like: Created initial commit
311b98a: Initial import along with a list of files that it added.
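The two steps above can be tried end to end in a throwaway directory. This is a minimal sketch, not the course's official procedure: the scratch directory, the file main.c, and the name and e-mail address are placeholders invented for the example, and the branch is forced to be called master only because that is the name this guide uses.

```shell
set -e
# Hypothetical scratch project; in practice this is your untarred project directory.
proj=$(mktemp -d)
cd "$proj"
echo 'int main(void) { return 0; }' > main.c

git init -q                                      # creates the hidden .git directory
git symbolic-ref HEAD refs/heads/master          # assumption: call the first branch "master"
git config user.name "Example Student"           # placeholder identity, stored only in this repository
git config user.email "student@example.invalid"

git add .                                        # add every file in the project to Git's world
git commit -q -m "Initial import"                # record the first commit
git log --oneline                                # one line: short SHA1 ID, then "Initial import"
```

The SHA1 ID printed on your machine will differ from the 311b98a shown above, because it depends on the author, the time, and the file contents.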
Sharing your project with your partner
In general, these steps should only need to be performed once per project.
You may need to perform some of these steps (in particular, to clone a
copy of the repository) more than once if you are working on more than
one computer.
To put a Git repository in your 410 AFS repository
directory, first cd into your AFS repository
directory. Next, create a "bare" repository (i.e., with only
Git's view of the world, not a human-readable view of the world) by running
the command git clone --bare path_to_your_working_folder,
where path_to_your_working_folder is the path in which you did the
git init previously.
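As a sketch of the bare-clone step, the commands below build a tiny working repository and then clone it with --bare. The directory names (p1 for the working folder, repo standing in for your AFS repository directory) and the identity are assumptions made up for the example.

```shell
set -e
base=$(mktemp -d)                      # scratch area; "repo" below stands in for the AFS repository directory
mkdir "$base/p1" "$base/repo"

cd "$base/p1"                          # the working folder where git init was run
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo hello > README
git add .
git commit -q -m "Initial import"

cd "$base/repo"                        # "cd into your AFS repository directory"
git clone -q --bare "$base/p1"         # creates p1.git, holding only Git's view of the world
ls p1.git                              # HEAD, config, objects, refs, ... but no README
```

Note that a bare clone made this way lands in a directory named after the source with .git appended (p1.git here).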
If you are not running on AFS, and you are instead sharing on your own
server, you'll need to edit some settings to allow both of you to commit.
I'll assume that you understand how to manage your own UNIX system if you're
doing this, and you've created a group for you and your OS partner; for the
purposes of this tutorial, we'll call that group cs410. Edit the
config file created inside the cloned directory, and add the line
sharedrepository = 1 to the [core] section of the file (placed under
any other section header, the setting has no effect). Give the repository
to the group with chgrp -R cs410 p1, then set appropriate
permissions using the command chmod -R ug+rwX p1, where
p1 is the cloned directory. Finally, make sure that new files are
group-owned by using the command chmod g+s p1/objects. This
is not required -- if this seems complicated, you can just use Git on
AFS.
To clone a copy of the repository, like your partner will want to
do for the first time, run git clone path_to_repository,
where path_to_repository is the path to the bare repository inside
your AFS repository folder. Git will create a folder to clone
into.
If your repository is not on AFS (or your partner is not on AFS!), the
clone command will look something like git clone
you@some.other.machine:path_to_repository, where you is
your username on the remote system, some.other.machine is the other
machine's hostname, and path_to_repository is the path to the bare
repository. This is not required if you are only working on AFS.
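To see what your partner's first clone looks like, here is a local-only sketch. All paths are invented for the example: seed plays the role of your original working folder, repo/p1.git is the bare repository, and partner is wherever your partner happens to be standing.

```shell
set -e
base=$(mktemp -d)
mkdir "$base/seed" "$base/repo" "$base/partner"

cd "$base/seed"                                  # build something to share
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo hello > README
git add .
git commit -q -m "Initial import"
git clone -q --bare "$base/seed" "$base/repo/p1.git"

cd "$base/partner"                               # the partner's side
git clone -q "$base/repo/p1.git"                 # Git creates the folder p1 to clone into
cd p1
ls                                               # README
git remote -v                                    # origin already points at the bare repository
```

Over the network the only difference is the path argument, which takes the you@some.other.machine:path_to_repository form described above.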
To point your original Git instance at the newly cloned
repository, edit the .git/config file in your original Git
instance with your favorite text editor, and add the following lines to the
bottom (you can cut and paste as needed):
    [remote "origin"]
        url = path_to_repository
        fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "master"]
        remote = origin
        merge = refs/heads/master
where path_to_repository is the path to the bare
repository. If you do not have any changes in your original repository, it
might just be easier to rm it, and do a fresh clone as in the step
above.
You'll need to do this so that when you do a git push or git
pull, Git knows where to push or pull from; this is done automatically
for you when you do a git clone. If this wasn't set up correctly,
then when you do a push or a pull (see below), you will get an error like
fatal: 'origin': unable to chdir or not a git archive.
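If you would rather not edit .git/config by hand, the same settings can be written with Git's own commands. The sketch below assumes a branch named master and uses made-up paths (p1 for the original working folder, p1.git for the bare repository); git remote add and git config are standard commands, but treat this as one possible way to do it rather than the required one.

```shell
set -e
base=$(mktemp -d)
mkdir "$base/p1"
cd "$base/p1"                                    # the original Git instance
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo hello > README
git add .
git commit -q -m "Initial import"
git clone -q --bare "$base/p1" "$base/p1.git"    # the shared bare repository

# Same effect as adding the [remote "origin"] and [branch "master"] lines by hand:
git remote add origin "$base/p1.git"             # writes url and the fetch line
git config branch.master.remote origin
git config branch.master.merge refs/heads/master

git pull -q                                      # now Git knows where to pull from
git config --get remote.origin.fetch             # +refs/heads/*:refs/remotes/origin/*
```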
Working with Git on a day-to-day basis
These operations will become your new best friends. You will use them many
times per day; it will pay off to become familiar with their operation and
their quirks.
To record changes to every file that Git is tracking, run
git commit -a -m "message", where message
is a short message describing the changes you've made. To make the best use
of some of Git's other features, you should endeavor to make changes in an
order that will make sure that your project still compiles and runs when you
commit. This will only record the change in your local repository, and not
yet make your changes visible to your partner; see the section on pushing
and pulling later.
To record changes to just a few files that Git is tracking,
run git commit -m "message" file1 file2
..., where file1... are files that you'd like to record
changes to, and file2... are optional.
To add a new file to Git, run git add file1 file2
..., where file1... are files that you'd like to add, and
file2... are optional. Then, to record the newly added file, run
git commit -m "message". If you run git add on a file
that Git already tracks, Git will record the file's contents as they were
when you ran the add; edits made after that need another git add.
To delete a file from Git, run git rm file1 file2
..., where file1... are files that you'd like to delete, and
file2... are optional. Then, to record the newly deleted file, run
git commit -m "message". The delete is not
permanent; you can still check out older revisions with that file
intact.
To rename or move a file in Git, run git mv oldname
newname, where oldname and newname are the
obvious. (git mv also has similar semantics to the UNIX command
mv; this is just the most common usage.) To record the moved files,
run git commit -m "message".
To see what changes you've made to files that Git is tracking,
run git diff. Git will produce diff-formatted output
about all of the current unrecorded changes in your repository, except for
changes you have already staged with git add; git diff --cached shows those.
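The local day-to-day commands described so far (commit, add, rm, mv, and diff) fit together as in the following sketch. The file names and commit messages are invented for illustration, and the repository lives in a scratch directory so you can run it without touching your project.

```shell
set -e
proj=$(mktemp -d)
cd "$proj"
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo one > a.txt
echo two > b.txt
git add .
git commit -q -m "Initial import"

echo "one, edited" > a.txt
git --no-pager diff                              # shows the unrecorded change to a.txt
git commit -q -a -m "Edit a.txt"                 # record changes to every tracked file

echo three > c.txt
git add c.txt                                    # start tracking a new file
git commit -q -m "Add c.txt"

git mv b.txt renamed.txt                         # rename a tracked file
git commit -q -m "Rename b.txt"

git rm -q c.txt                                  # delete a tracked file
git commit -q -m "Delete c.txt"

git log --oneline                                # five commits, newest first
git show HEAD~1:c.txt                            # the deleted file is still in the previous commit
```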
To make your changes visible to your partner, run git
push. This will "push" your changes into the bare
repository. If your repository is not already up to date with the bare
repository, then git push will fail with a message like remote
is not a strict subset of local. See the section on getting your
partner's changes.
To get your partner's changes, run git pull. This
will "pull" changes from the bare repository into your local
repository. If you and your partner have changed the same sections of the
same file in non-trivial ways that Git could not resolve, then the
pull will leave your repository in a "conflicted" state
with a message beginning with CONFLICT:. Edit the conflicted files
to resolve the conflicts, make sure your project builds, run git add
on each file you fixed to mark it as resolved, and then run
something like git commit -m "Fix merge conflict" to
record the fix. Your fix will not be visible to your partner until you
push.
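Here is a sketch of the whole push, pull, conflict, and fix cycle, played out between two clones on one machine. Everything named here is hypothetical: the identify helper, the you and partner directories, and the notes.txt file exist only for the example. The --no-rebase option is passed to git pull because recent versions of Git refuse to guess how to combine diverged histories; it asks for the merge behavior this guide describes.

```shell
set -e
base=$(mktemp -d)
# Hypothetical helper: set a placeholder identity in the current repository.
identify() { git config user.name "$1"; git config user.email "$1@example.invalid"; }

mkdir "$base/seed"
cd "$base/seed"
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
identify seed
echo 'stack size: 4096' > notes.txt
git add .
git commit -q -m "Initial import"
git clone -q --bare "$base/seed" "$base/shared.git"   # the bare repository
git clone -q "$base/shared.git" "$base/you"
git clone -q "$base/shared.git" "$base/partner"

cd "$base/partner"
identify partner
echo 'stack size: 8192' > notes.txt
git commit -q -a -m "Partner: bigger stack"
git push -q                                      # partner publishes first

cd "$base/you"
identify you
echo 'stack size: 2048' > notes.txt
git commit -q -a -m "You: smaller stack"
git push -q 2>/dev/null || echo "push rejected: pull first"
git pull --no-rebase || echo "pull stopped on a conflict"

echo 'stack size: 8192' > notes.txt              # resolve by hand (here: keep the partner's value)
git add notes.txt                                # mark the file as resolved
git commit -q -m "Fix merge conflict"
git push -q                                      # now the fix is visible to your partner
```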
Time travel with Git
In an ideal world, we would make no errors while writing code. Sadly,
sometimes we wish to travel back to the past and determine what broke. It
is generally considered inadvisable to modify history; if you do, you run
the risk of killing one or more of your parents, and being in a paradoxical
state of existence. If you wish to modify history, you might wish to create
an alternate universe; in Git, we call these alternate universes
"branches". Luckily, branches aren't needed to just go back and
look. You may use these commands somewhat less frequently, but they are no
less important.
To get a graphical view of your repository's history, run
gitk. Ogle at all the pretty colors. Each time in the past that
you or your partner recorded changes using the commit command will
be represented by a line in the pane at the top. Select a line, and more
details about the change will show up in the bottom panes, including the
change's SHA1 ID.
To get a non-graphical list of all changes in a file's
history, run git log file, where file is the file
that you wish to get a change list for. Each change will start with a line
starting with commit, and ending in the change's SHA1 ID.
Some short information about the change will be given to you, including the
message that you specified with the -m option to git
commit.
To go back and view the repository's state as it was after a
change in the past, run git checkout sha1, where
sha1 consists of enough characters from the change's SHA1 ID to
disambiguate it from all other changes. For instance, if the change you're
looking for has ID 311b98a0a1c40ad176103ee8026131fcd0fcc919, then
you may only need to run git checkout 311b98 to get the change
you're looking for.
Do not make any changes when you are viewing the past like this. If you
wish to make changes from the past, use a branch. If you have outstanding
changes that you have not run git commit on when you attempt to
switch to viewing an old version of the repository, Git will give you an
error message like error: You have local changes..., and will
refuse to change what version you are viewing.
To view the most recent change in the repository (i.e.,
recover from viewing a change in the past), run git checkout
master. Any changes that you may have committed from viewing the past
will be lost into the abyss (they are not irrecoverable, but doing so is
beyond the scope of this document).
To revert one or more files to the state in which they were
after you last committed changes or ran a checkout, run
git checkout file1 file2 ..., where file1... are
files that you'd like to revert changes to, and file2... are
optional.
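The non-graphical time-travel commands can be rehearsed as follows. This is only a sketch: kernel.c and its contents are placeholders, and the SHA1 ID is captured with git rev-parse so the example does not depend on any particular hash.

```shell
set -e
proj=$(mktemp -d)
cd "$proj"
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo v1 > kernel.c
git add .
git commit -q -m "First version"
first=$(git rev-parse --short HEAD)              # an abbreviated SHA1 ID is enough
echo v2 > kernel.c
git commit -q -a -m "Second version"

git log --oneline -- kernel.c                    # the file's history, newest first
git checkout -q "$first"                         # view the repository as of the first commit
cat kernel.c                                     # v1
git checkout -q master                           # come back to the present
cat kernel.c                                     # v2

echo scratch > kernel.c                          # an edit you regret and have not committed
git checkout -- kernel.c                         # revert the file ("--" marks what follows as file names)
cat kernel.c                                     # v2 again
```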
Splitting reality with Git
At some point, you may wish that you could make a change on a previous
version of your tree without affecting the current version (yet); or you
may wish to split reality in half, and work on an experimental side-project
without disrupting main development of your project. Branches in Git are
designed to allow you to do just those things; split away from the main view
of reality from some point in time (be that time now or the past).
To create a new branch from some point in the past, run
git checkout -b branchname sha1, where
branchname is what you want your new branch to be called (pick
something descriptive and without spaces; Experimental_COW might be
a good name if you're experimenting with copy-on-write), and sha1 is
the SHA1 ID of the change that you wish to branch from (see the section
go back and view above). Git will change you over to that branch,
and you can begin recording changes on it immediately.
To change to a different branch, run git checkout
branchname. The branch name that you started on was
master; so to return to the version that's in the bare repository,
run git checkout master. (The astute reader will note that this is
the same command as to recover from being in the past.)
To merge from one branch to another branch, first change to
the branch that you want to merge to, then run git pull .
branchname (or, more directly, git merge branchname), where
branchname is the branch that you want
to merge from. This can be done as many times as you like; there are
no negative consequences from merging repeatedly. (Git considers your other
branch as a 'virtual partner' to pull from.) To publish the pulled and
merged changes from your branch to your partner, you can just run a git
push when you are on the master branch, as normal.
To create a branch based on the current state of your
repository, run git checkout -b branchname. The
semantics are similar to creating a branch from some point in the
past.
To create a tag, run git tag tagname. By
convention, tag names are capitalized, but this is not enforced by Git. A
tag name can be used anywhere a SHA1 ID would otherwise be used; to go back
to the point at which you first got your shell running in Project 3, then
you might run git checkout SHELL_RUNNING. The usual rules apply if
you don't create a branch there; namely, recording changes would be a bad
idea unless you proceed to create a branch.
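A short sketch of branching, merging, and tagging follows. The branch name Experimental_COW and the tag SHELL_RUNNING come from the text above; the file vm.c and the branch Fix_From_Tag are invented. The merge is done with git merge, which is the operation that git pull . branchname performs on a local branch.

```shell
set -e
proj=$(mktemp -d)
cd "$proj"
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo base > vm.c
git add .
git commit -q -m "Initial import"
git tag SHELL_RUNNING                            # name this point in history

git checkout -q -b Experimental_COW              # branch from the current state
echo cow >> vm.c
git commit -q -a -m "Try copy-on-write"

git checkout -q master                           # change to the branch you want to merge to
git merge -q Experimental_COW                    # bring the experiment into master

git checkout -q -b Fix_From_Tag SHELL_RUNNING    # branch from a point in the past, named by the tag
cat vm.c                                         # base (the experiment is not on this branch)
```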
Explanation of Concepts
The above involved some simplifications of the underlying concepts of Git
for the purposes of readability and for the purposes of understandability of
an introduction. The simplifications are not disastrous in terms of your
comprehension of what Git is doing behind your back, but you may find it
helpful to know how Git stores data to better work with Git. Tommi
Virtanen's excellent page Git for
Computer Scientists may provide some insight as well, for those who like
to talk about DAGs and are big fans of arrows pointing every which way.
Commits
The basic unit of a point in time stored in Git is a commit.
Each time we spoke of recording changes earlier, it would have been more
correct to say "creating a commit"; I used the words
"recording changes" to distinguish the operation from pushing and
publishing your changes to your partner. A commit, by its nature, is
comprised of a few pieces of information:
- A reference to a parent commit: Each commit has zero or
more parent commits that refer to where the commit was derived from; you can
think of the parent commits as previous steps in time from this commit. The
very first commit you make (we called this the initial import earlier) has
no parent at all, which is how Git recognizes that a
given commit is an initial import.
- A description: This is the text that you enter in the -m
option to git commit.
- The files themselves: Each commit refers to a tree, which is a
listing of every file in the project as of that commit, changed or not. The
contents of a file are technically not stored in the commit; instead, they are
stored as a blob, and the tree contains a reference to the blob. A file
that did not change between two commits is simply referenced again, so
each blob can be referenced by many commits. (When Git packs its storage, it
may keep some blobs as deltas -- binary patches against similar blobs -- but
that is a space optimization that you never see.)
A commit is identified by the SHA1 hash of all of the information that it
contains. This hash is one common form of a refspec -- that is to
say, it is one common way to specify a single commit. Recall that when you
did a checkout to go back in time, you specified a SHA1 hash; in
that case, you were using the SHA1 hash as a refspec.
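You can look inside a commit yourself with git cat-file, a standard low-level Git command. The sketch below uses a scratch repository with invented file names, and assumes Git's default SHA1 object format; it prints the fields that a commit object actually stores.

```shell
set -e
proj=$(mktemp -d)
cd "$proj"
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo v1 > kernel.c
git add .
git commit -q -m "Initial import"
echo v2 > kernel.c
git commit -q -a -m "Second version"

git rev-parse HEAD                               # the full SHA1 ID of the newest commit
git cat-file -t HEAD                             # prints: commit
git cat-file -p HEAD                             # tree, parent, author, committer, then the description
git cat-file -p HEAD~1                           # the very first commit lists no parent line
```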
You may have inferred by now that commits exist in a sort of a tree. Each
commit other than the very first has one or more parent commits (a commit with more than one
parent is called a merge commit), and each commit may have zero or
more child commits. You can view the commit tree using gitk, as we
saw above; each commit was identified by a dot, and gitk drew lines
for us between each commit to explicitly show the branches of the tree.
This tree of cryptographic hashes gives Git a few very useful properties.
Git can assure you that nobody has changed the tree that you have based your
work on, because every element in the tree, down to the blobs, is identified
by its cryptographic hash (its SHA1). If a parent object has changed,
either by malicious intent or by disk corruption, Git simply will not be
able to find the parent object, instead of giving you the incorrect data.
This makes Git relatively immune to AFS corrupting its metadata.
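To make "identified by its cryptographic hash" concrete, the sketch below stores two files with identical contents and asks Git for their blob IDs, then runs git fsck, which re-checks the stored objects. The file names are placeholders; git hash-object, git rev-parse, and git fsck are standard commands.

```shell
set -e
proj=$(mktemp -d)
cd "$proj"
git init -q
git symbolic-ref HEAD refs/heads/master          # assumption: first branch is named "master"
git config user.name "Example Student"           # placeholder identity
git config user.email "student@example.invalid"
echo hello > a.txt
cp a.txt b.txt                                   # same contents under a different name
git add .
git commit -q -m "Initial import"

git hash-object a.txt                            # the ID Git computes from the file's contents
git rev-parse HEAD:a.txt HEAD:b.txt              # both names refer to the very same blob ID
git fsck                                         # verify the objects in the repository
```

Because the ID is computed from the contents, two identical files share one blob, and a blob whose bytes changed on disk would no longer match its own name.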
Further, it makes it impossible to throw away history. Some version
control systems that we discussed in lecture have versions per file; so
deleting a file may delete its version history, or otherwise create a
discontinuity in how the file is linked in terms of time. Similarly,
renaming a file is not disastrous (although somewhat quirky); the only
changes happen locally in the commit object. If a delete required a
change of history, then the cryptographic hashes would change, and the
entire tree's parent hash would have to change. The cryptographic
hash system, then, makes Git resistant to inadvertent deletion of
history.
Branches, tags, and refspecs -- oh my!
In this section, until now, you've seen only one kind of refspec
-- a SHA1 hash of a commit. But in the quick-start above, you've worked
with more types of refspecs; when you checked out a branch, you used the
refspec that refers to the branch.