Introduction

During my undergraduate courses there were a number of group projects. The students in a group would typically mail files back and forth, lose them, try to arrange who was going to work on which file, work on an old version and accidentally lose changes, and generally waste a lot of time and effort on managing their collaboration. This is because they didn't know about version control, also known as source code management (SCM). In this tutorial, I'll look at the concepts involved in version control, and illustrate them with two open-source systems: Subversion and git.

There are two primary goals of a version control system:
Version control is vital to any large project, but even small, short-term projects can benefit. During the TopCoder Open, I used version control for marathon match submissions. This meant that every submission I made, plus other ideas that I tried and discarded, was always there if I needed it, and I could see exactly what changes I made between versions. Version control can also be used for more than just source code; it's just as good for documents, Web pages or anything else, although it works best when there are a lot of small, textual files rather than a big binary blob like a video.

Repositories and working copies

Traditionally, the repository is a central store that holds all the files in a project, including every version ever committed and all the history. To avoid confusion, developers don't work directly on the central repository, but on a private copy called the working directory (or working copy). This might just hold the latest copy of all the files, but in some cases it can be a complete copy of the entire repository with all the history. Developers will make changes in the working copy, hopefully test them, and then commit them to the repository. Once this set of changes (called a revision or a changeset) has been committed, other developers can update their working copies to receive those changes. What makes collaborative development possible is that the downloaded changes are merged with any changes that the developer has made (even within the same source file), instead of just overwriting them.

Getting started with Subversion

Now that the introduction is done, let's get our hands dirty with Subversion. We need somewhere to put a repository. For distributed development this will need to be on a web-accessible server (for example, SourceForge provides Subversion hosting for all its projects), but for now let's just put it somewhere local. I have a directory called ~/svn in which I put my repositories.
It's generally a good idea to use separate repositories for unrelated projects, so that they can be moved around, backed up, or even deleted independently. Let's create a new repository for a hello world project:

    hactar:~/svn$ svnadmin create hello
    hactar:~/svn$ svn mkdir file://$HOME/svn/hello/{trunk,branches,tags} -m "Create conventional directories"

    Committed revision 1.

The first line is straightforward. The next line introduces quite a few new things. Firstly, there is the file:// syntax. This specifies a URL to a repository, in this case the one we just created (Subversion also supports HTTP-based and SSH-tunnelled protocols). We have created three subdirectories inside the repository called trunk, branches and tags. This is a convention specific to Subversion, and the reason for it will become clear later. For now, we will just do everything inside the trunk subdirectory. Finally, there is the "-m" switch. This provides an informational message which is attached to the revision, to tell anyone browsing the repository what it was for.

At this stage we don't have a working directory; the mkdir was run directly on the repository, which is why it immediately triggered a commit. Let's check out a working directory from the repository. Since we're going to do everything in trunk, we only need to check out that sub-tree:

    hactar:~$ svn checkout file://$HOME/svn/hello/trunk hello
    Checked out revision 1.

The second parameter is the directory name to use locally. We now have an empty directory, so let's write and compile hello.cpp. An important convention (across all SCM systems) is that source code is placed under control of the SCM, but object files and executables are not. This is because they can be regenerated at any time and would just waste space in the repository, as well as causing extra noise for anyone trying to follow what really changed over time. So we need to tell Subversion that we want it to manage hello.cpp.
It will also keep reminding us about the compiled executable, hello, until we tell it to ignore it:

    hactar:~/hello$ svn add hello.cpp
    A         hello.cpp
    hactar:~/hello$ svn propedit svn:ignore .
    Set new value for property 'svn:ignore' on '.'

Different version control systems have different means of storing metadata: information that should be stored in the repository (in this case, a list of wildcards to ignore), but which isn't a file. In Subversion the metadata takes the form of properties. When I issued the svn propedit command, Subversion opened an editor and I put hello in the temporary file it opened for me. The special svn:ignore property is now set on the directory itself. Let's see where that puts us:

    hactar:~/hello$ svn status
     M     .
    A      hello.cpp

This is a handy way to see what you've changed, relative to what is saved in the repository. Subversion is telling us that we've made a property alteration on . and added a new file hello.cpp. If you had run this before setting the property, you would have seen another line

    ?      hello

to warn you that you hadn't put hello under version control. That all looks good, so let's commit it to the repository:

    hactar:~/hello$ svn commit -m "First version of hello world"
    Sending        .
    Adding         hello.cpp
    Transmitting file data .
    Committed revision 2.

The file is now saved in the repository, and anyone else who checks out the repository will be able to see this version. Even if we overwrite it later, we can always retrieve this version. Now, let's edit the file and see what happens:

    hactar:~/hello$ svn status
    M      hello.cpp
    hactar:~/hello$ svn diff
    Index: hello.cpp
    ===================================================================
    --- hello.cpp (revision 2)
    +++ hello.cpp (working copy)
    @@ -1,5 +1,7 @@
     #include <iostream>
    +using namespace std;
     int main()
     {
    -    std::cout << "Hello world\n";
    +    cout << "Hello world\n";
    +    return 0;
     }

Here we've introduced a new sub-command: svn diff.
By default this shows what you've changed relative to the base version, which is the version that you checked out of the repository (which may not be the same as the latest, or head, version if somebody has since made changes that you haven't downloaded). Many of the Subversion commands, including diff, can also take extra arguments to refer to specific versions in the repository, so that you can compare any two revisions.

Now, what if somebody else was working on hello.cpp at the same time? Before we commit this change, I'm going to commit another change to this file from outside. In order to see this change here, we have to update our working copy from the repository. This is as simple as running

    hactar:~/hello$ svn update
    G    hello.cpp
    Updated to revision 3.

The G here is Subversion's way of telling us that it merged an external change with one of our own. As long as changes do not overlap, version control systems will automatically merge them together, and they will also assist in merging conflicting changes. Note, however, that only changes that overlap are considered to be conflicting; a high-level change like renaming a variable might merge without any conflict and still lead to compilation failures.

Working within files is relatively straightforward, but what about rearranging files, creating directories and so on? Subversion was created as a replacement for an older, messier system called CVS (Concurrent Versions System) which did a very poor job of this; Subversion handles it quite well. You just have to tell Subversion to do things, rather than using the shell:

    hactar:~/hello$ svn mv hello.cpp helloworld.cpp
    A         helloworld.cpp
    D         hello.cpp

Subversion handles a move as a copy and a deletion. It also handles copies in a special way: internally a copy takes almost no space, because it uses a pointer back to the original, and it also remembers that the file was copied and allows the version history to be traced back through the original file.
For example:

    hactar:~/hello$ svn commit -m "Rename hello.cpp to helloworld.cpp"
    Deleting       hello.cpp
    Adding         helloworld.cpp

    Committed revision 5.
    hactar:~/hello$ svn log helloworld.cpp
    ------------------------------------------------------------------------
    r5 | bruce | 2007-06-12 18:36:45 +0200 (Tue, 12 Jun 2007) | 1 line

    Rename hello.cpp to helloworld.cpp
    ------------------------------------------------------------------------
    r4 | bruce | 2007-06-12 18:34:02 +0200 (Tue, 12 Jun 2007) | 1 line

    Use standard namespace
    ------------------------------------------------------------------------
    r3 | bruce | 2007-06-12 18:26:21 +0200 (Tue, 12 Jun 2007) | 1 line

    Add a comment
    ------------------------------------------------------------------------
    r2 | bruce | 2007-06-12 18:17:27 +0200 (Tue, 12 Jun 2007) | 1 line

    First version of hello world
    ------------------------------------------------------------------------

The log sub-command browses the log messages that I've been attaching to each commit. As you can see here, it remembers the comments I made even when the file was called hello.cpp.

Alternative interfaces

I've been giving all the examples using the command line svn tool, because it's easy to copy-and-paste into this tutorial, and because I'm happiest on the command line and so this is how I usually interact with Subversion. However, there are third-party interfaces for those who think the shell belongs in the stone age. There are several standalone clients (for example, RapidSVN), a Windows shell extension (TortoiseSVN), and extensions for some IDEs, such as Subclipse for integration into Eclipse.

Advanced features

Armed with only the information you've seen so far, plus a Subversion manual (there is a very good online book), you should already be able to improve your productivity in group projects, and give yourself peace of mind in personal projects, knowing that you can always go back and recover previous versions of code.
However, for large-scale projects, there are even more things that a good version control system can do for you. For the basics, there isn't a lot of difference between the systems, but these advanced features are handled quite differently, and which of them you need will often determine which system is right for you.

Hooks

Another type of hook is a pre-commit hook. This is run when someone attempts to commit, and is usually some form of verifier. It might provide fine-grained access control, validate checked-in web pages, or ensure that the commit message contains a bug number. Subversion has a number of other types of hooks, and examples for all of them. The hooks live inside the directory structure of the repository. Let's add a simple hook that prevents code from being accidentally committed when it still has FIXME notes in it. Normally we would need some more powerful scripting to find all the source files, but to keep it simple we'll just hardcode it:

    hactar:~/hello$ cat ~/svn/hello/hooks/pre-commit
    #!/bin/sh
    if /usr/bin/svnlook cat -t "$2" "$1" "trunk/helloworld.cpp" | grep -q FIXME; then
        echo "FIXME's found in helloworld.cpp" 1>&2
        exit 1
    fi
    hactar:~/hello$ svn commit -m "Test the pre-commit hook" helloworld.cpp
    Sending        helloworld.cpp
    Transmitting file data .svn: Commit failed (details follow):
    svn: 'pre-commit' hook failed with error output:
    FIXME's found in helloworld.cpp

Tagging, branching and merging

git handles the basics along similar lines to other version control systems, but there is one fundamental difference in philosophy: there is no distinction between a repository and a working directory. When you check out a working directory from a public repository, you are in fact copying the entire repository. As a result, you have all the advantages of version control, even if you are offline, or have only read access to the original repository.
This makes it attractive for highly decentralised projects, where a subgroup can collaborate in a satellite repository and only push changes back to the main repository when they have stabilised. git is best known as the version control system used to develop Linux.

This is all very interesting, but what are tagging, branching, and merging? Tagging is the simplest: it allows one to assign a human-readable name to a specific revision in a repository. You may have noticed that Subversion allocates sequential numbers to revisions, but it's hard to remember that version 0.9.3 of the software corresponds to revision number 1346, and it becomes even worse with git, which identifies revisions by long hex strings. For example, let's suppose that in the past we made a tag v1.0.0 on the initial version. We've now made a lot of changes to the repository, and we're going to release 1.0.1. Sometime in the future, we may wish to see what exactly changed between the releases, for example to isolate a bug that was introduced in v1.0.1:

    hactar:~/hello-git$ git tag -a -m "Tag the v1.0.1 public release" v1.0.1

Now we make lots more changes

    hactar:~/hello-git$ git diff v1.0.0..v1.0.1
    diff --git a/helloworld.cpp b/helloworld.cpp
    index 2b20ba6..3b18807 100644
    --- a/helloworld.cpp
    +++ b/helloworld.cpp
    @@ -1,6 +1,7 @@
     /* This is a hello world program */
    -/* FIXME: Fix the whitespace */
    +
     #include <iostream>
    +
     using namespace std;
     int main()
     {

Tagging is fairly straightforward, although git has some extra features (like signing) that I won't elaborate on.

Branching is more complex. Let's suppose that you've released version 1.0 of your software, and you're busy rewriting large pieces of it to get them ready for version 2.0. However, you still need to fix bugs in 1.0, because 2.0 isn't going to be ready for several years.
At this point, your code development diverges, or branches, in two separate directions: a 2.0 branch which is under heavy development, and a 1.0 maintenance branch which only receives bug-fixes. Or perhaps one module in the 2.0 development is going to take a few weeks to write, during which time it will cause problems for other developers working on other pieces of 2.0. In this case, you might work on that module in a side branch until it is ready for inclusion in the main 2.0 branch. At that point you will need to merge those changes into the main branch. While developing the side branch, you may also need to incorporate changes from the main branch, or even bug-fixes being made in the 1.0 branch.

That's a fairly extreme case. Let's try something simpler with our hello world program. We're going to use a side branch to convert it to use cstdio.

    hactar:~/hello-git$ git status
    # On branch master
    nothing to commit (working directory clean)
    hactar:~/hello-git$ git branch cstdio
    hactar:~/hello-git$ git checkout cstdio
    Switched to branch "cstdio"
    hactar:~/hello-git$ vi helloworld.cpp
    hactar:~/hello-git$ git commit -a -m "Converted to cstdio"
    Created commit 3d6a30bc37c106d598e6e1fd8a049367b1dac231
     1 files changed, 2 insertions(+), 2 deletions(-)

Creating a branch is that simple.
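If you would like to try the same branch-and-commit sequence without touching a real project, the following is a minimal sketch that replays it in a throwaway repository. The temporary directory, the file contents and the example identity are all assumptions made for illustration; only the git sub-commands themselves come from the session above.

```shell
#!/bin/sh
# Sketch: create a side branch in a scratch repository and commit on it.
set -e

repo=$(mktemp -d)                 # throwaway location (assumption)
cd "$repo"
git init -q
# Set an identity locally so commits work on an unconfigured machine.
git config user.name "Example User"
git config user.email "user@example.com"

# A first version on the main branch.
printf '#include <iostream>\nint main() { std::cout << "Hello world\\n"; }\n' > helloworld.cpp
git add helloworld.cpp
git commit -q -m "First version of hello world"

# The main branch may be called "master" or "main", depending on configuration.
main_branch=$(git rev-parse --abbrev-ref HEAD)

git branch cstdio                 # create the side branch...
git checkout -q cstdio            # ...and switch to it
printf '#include <cstdio>\nint main() { std::puts("Hello world"); }\n' > helloworld.cpp
git commit -q -a -m "Converted to cstdio"

current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current_branch (branched from $main_branch)"
```

The main branch still holds the iostream version at this point; switching back to it with git checkout restores that file in the working directory.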
Let's go back to the main branch (which is called master) and make a change there, then merge it over to the side branch:

    hactar:~/hello-git$ git checkout master
    Switched to branch "master"
    hactar:~/hello-git$ vi helloworld.cpp
    hactar:~/hello-git$ git commit -a -m "Update the comment"
    Created commit 2c7afa153383d275f27ce3ac04375cc78789e3b0
     1 files changed, 1 insertions(+), 1 deletions(-)
    hactar:~/hello-git$ git checkout cstdio
    Switched to branch "cstdio"
    hactar:~/hello-git$ git merge --no-commit master
     100% (5/5) done
    Auto-merged helloworld.cpp
    Automatic merge went well; stopped before committing as requested
    hactar:~/hello-git$ git status
    # On branch cstdio
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #       modified:   helloworld.cpp
    #
    hactar:~/hello-git$ git commit -a -m "Merge the comment change from master"
    Created commit 74bd3f2107fec7ab0d12d640e29bc901d75f1888
    hactar:~/hello-git$ git merge --no-commit master
    Already up-to-date.

The last command indicates that git has remembered which changes on master have already been merged, and it doesn't try to merge them again. This is in contrast to Subversion, which requires the user to keep track of which merges have been applied. On the other hand, git appears to have some trouble with renaming: I initially tried to do this example by renaming the file to helloworld.c in the side branch, but then the merge failed because it tried to apply the change to the .cpp file, which had been deleted.

Conclusions

If you've never used a version control system, go out and try one for your next project, or even your next marathon match. Once you're comfortable working with it, it will give you incredible peace of mind to know that you can rip out old code and not worry about losing it forever, and you will have learned a vital skill for the workplace.
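As a starting point for that first experiment, here is a minimal sketch of putting an existing directory under git while keeping the compiled executable out of the repository, following the convention described earlier. The directory, the file names solution.cpp and solution, and the example identity are hypothetical; a .gitignore file plays roughly the role that the svn:ignore property plays in Subversion.

```shell
#!/bin/sh
# Sketch: start tracking an existing project directory with git.
set -e

project=$(mktemp -d)              # stands in for your project directory (assumption)
cd "$project"
printf 'int main() { return 0; }\n' > solution.cpp   # hypothetical source file
: > solution                      # stands in for the compiled executable

git init -q
# Set an identity locally so the commit works on an unconfigured machine.
git config user.name "Example User"
git config user.email "user@example.com"

echo "solution" > .gitignore      # generated files stay out of the repository
git add .gitignore solution.cpp
git commit -q -m "Initial import"

tracked_count=$(git ls-files | wc -l | tr -d ' ')
echo "tracked files: $tracked_count"
git status --short                # prints nothing: the executable is ignored
```

From here, each submission or experiment becomes one more git commit, and git diff or git log shows what changed between them.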