Version Control With A Distributed VCS

I am participating in the study group - Git Crash Course and will be recording my learning and thoughts here.

This study group uses the online book GitRef as the base study material. It is spread across 5 days, with each day having distinct learning goals.

Day 1 - Introduction and creating your first Git repository


Before I begin with the actual work, I would like to note that this course assumes a basic knowledge of what version control is. It would have been nice to offer some links for people who are not familiar with the concept of version control. If anyone is interested, they can read this article on Better Explained. It is a very good introduction to basic version control concepts.

Git is a distributed version control system, which is different from centralized version control systems like SVN and CVS. Here is a nice article which explains distributed version control systems.

I watched this video of Linus Torvalds explaining Git. He explained why we should use a distributed version control system, but I am not sure if it added much value to my understanding of Git. Maybe it would be a better watch after completing the course than before. However, I did have some takeaways:

  1. Centralized systems need a concept of commit access (because we often do not want to give write access to everyone). This means those who do not have write access cannot version their work.
  2. I have worked with CVS and merging was definitely a pain. In fact I recollect that people would be eager to get their changes into the repository first, so they would not have to merge. (In Git you can either merge, or you can simply commit changes to your repository and ask someone else to do the merge.)
  3. SVN made it easier to branch, but did not do anything on the merge front.
  4. CVS has performance issues, because you have to go over the network for every single commit… which does take a good amount of time.
  5. Centralized branching has one more problem. What do you call your branches? If you call it test, you may just have 5000 test branches in your CVS repo. Now everyone needs to use special naming conventions because branch names use just one namespace.
  6. CVS tracks single files… while Git tracks collections of content… not sure I totally understand this, but one implication is that you can have as many files as you want to in a Git repository…
  7. The Linux kernel project averages 4.5 merges a day… not something which would be possible if merging was difficult. In Git it is very easy to apply patches and to merge…

Creating a Git repository

With this introduction, let us start working with Git. I actually have Git installed, but it would have been nice to provide links to Git downloads for those who do not have Git installed on their system. Perhaps a section and an activity on Git installation would also have been useful.

In Git, as in most version control systems, we can create a repository in two ways. If a remote repository for the project already exists, we can simply clone it. If a remote repository does not exist, we can create a repository on our own machine and push it to the remote location.

I created a little screencast which explains both ways of creating a Git repository.
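For readers who prefer text to video, the two ways can be sketched roughly as below. This is only an illustration under my own assumptions: the directory names (remote.git, fresh-project, cloned-project) and the commit contents are made up, and a bare repository on the local disk stands in for a real remote server.

```shell
set -e
work="$(mktemp -d)"        # scratch area; every name below is hypothetical
cd "$work"

# A bare repository on local disk plays the role of the remote location.
git init -q --bare remote.git

# No project repository exists yet: create one locally, commit, and push it.
git init -q fresh-project
cd fresh-project
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git push -q origin HEAD    # push the current branch, whatever its default name is

# The remote repository already exists: clone it into a new directory.
cd "$work"
git clone -q remote.git cloned-project
ls cloned-project          # lists todo.txt
```

With a real server, only the path given to git remote add and git clone would change to the server's URL.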

Day 2 - Basic snapshotting

When we work with a version control system, the most basic thing we want to do is track files, and commit changes to the repository.

Let us consider a very basic scenario where we want to put a todo list under version control. The video below explains how we would make our first commit to Git.

Note: Git records the details of the person who made the commit. This information can be provided at the global level in the .gitconfig file in the user's home directory. It can also be specified on a per-project basis in the project's .git/config file. We can either edit the files by hand, or use the git config command to manipulate them.
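As a small sketch of the git config route, the commands below set a per-project identity and read it back. The name and email are placeholders I invented, and the repository is a throwaway one in a temporary directory.

```shell
set -e
repo="$(mktemp -d)"        # throwaway repository for the demo
cd "$repo"
git init -q

# Per-project identity: written to this repository's .git/config
git config user.name "Demo User"
git config user.email "demo@example.com"

# Read the values back
git config user.name
git config --local --list | grep '^user\.'

# The global equivalent writes to ~/.gitconfig instead. It is left commented
# out here so that the sketch does not touch any real settings:
# git config --global user.name "Demo User"
# git config --global user.email "demo@example.com"
```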

In the above video, we added files to the index using the git add command. However, sometimes we want to add all changed files to the index at once. In such a case we can call the add command with the --all parameter.
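A minimal sketch of staging everything in one go is shown below. The three file names are my own invention for the todo-list scenario.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Three hypothetical todo files
echo "buy milk"  > todo.txt
echo "call bank" > todo-work.txt
echo "fix bike"  > todo-home.txt

git add --all          # stage every new or changed file at once
git status --short     # each file is listed with an "A" (added) marker
```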

Well, we really do not need 3 files for our todo list. This is good because it gives me an opportunity to show how to delete files.
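Roughly, the deletion could look like the sketch below. I am assuming git rm is the command used, and the file names are invented for illustration.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Start with three hypothetical todo files under version control
echo "buy milk"  > todo.txt
echo "call bank" > todo2.txt
echo "fix bike"  > todo3.txt
git add --all
git commit -q -m "Add three todo files"

# Remove two of them from both the index and the file system, then commit
git rm -q todo2.txt todo3.txt
git commit -q -m "Keep a single todo list"

git ls-files           # only todo.txt is still tracked
```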

Ok, so till now we have seen some basics of committing. However, in the real world, when we stage changes (add to the index), we want to see what we are staging. We want to see a diff of our working directory and the current index (we can add to the index as many times as we want, but the moment we commit, everything that was added to the index gets flushed, so to speak).

Let's take a first glimpse at 'git diff'. In the video below, we first make a few changes by adding some items to the todo list. We want to commit the changes, for which we first need to add them to the index. This is a small project, and we know what changes we have made. However, in a large project, we should not add files to the index without first determining what will be added.

We use the git diff command to see the difference between the current working files and the index.

The astute watcher must have noticed that even though we had already added todo.txt to the index earlier, we again had to add it after adding a few lines more. This is because Git works with snapshots. When we added the file earlier we had put a snapshot of the file as it existed then, to the index. After changing the file, we have to put the current snapshot to the index. Hence we need to add it again.
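The snapshot behaviour can be reproduced with a sketch like the one below. The todo items and the variable name between are my own; the point is only that the index keeps the snapshot taken at git add time until we add again.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Start todo list"

echo "call bank" >> todo.txt
git add todo.txt               # the index now holds the two-line snapshot
echo "fix bike" >> todo.txt    # the working copy moves on; the index does not

git status --short             # todo.txt has both staged and unstaged changes
between="$(git diff)"          # working directory vs index
echo "$between"                # only the "fix bike" line shows as added

git add todo.txt               # put the current snapshot into the index
git diff                       # prints nothing: index and working copy match
```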

Very often, we will add stuff to the index multiple times. We will probably do a git diff every time to see what is being added to the index. But remember, adding to the index is not committing. Adding to the index is simply adding your changes to a staging area, so they can be committed as a whole. When we actually do the commit, we may have forgotten everything we have added to the index. It is a good idea to check what will be committed if we run 'git commit'. To see this, we have to use a different flavor of diff. We use:

git diff --cached

Till now, we have used git diff to see the difference between the working directory and index (shows stuff that will be added to the index if we run git add), and between the index and the last commit (shows stuff that will be committed if we run git commit). However, at times we might want to see the difference between the working directory and the last commit. We can see this by running:

git diff HEAD

As you may have already guessed, git diff shows us the difference between 2 things. What those 2 things are can be defined by giving the right arguments to the diff command.
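To make the "difference between 2 things" idea concrete, here is a small sketch with one committed line, one staged line and one unstaged line. The todo items are invented; git diff --cached is the form that compares the index with the last commit.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Start todo list"

echo "call bank" >> todo.txt
git add todo.txt               # staged change
echo "fix bike" >> todo.txt    # unstaged change

git diff             # working directory vs index: adds "fix bike"
git diff --cached    # index vs last commit: adds "call bank"
git diff HEAD        # working directory vs last commit: adds both lines
```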

If you do not want to see the entire diff, but just a summary of changes, run

git diff --stat

All the git diff commands will also work if we give them a filename or a subdirectory name. This filters the files the command works on.
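A short sketch of the summary view and of path filtering follows. The notes directory and the file contents are hypothetical.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir notes
echo "buy milk" > todo.txt
echo "first idea" > notes/ideas.txt
git add --all
git commit -q -m "Initial files"

echo "call bank" >> todo.txt
echo "second idea" >> notes/ideas.txt

git diff --stat              # one summary line per changed file
git diff --stat -- notes     # summary restricted to the notes subdirectory
git diff -- todo.txt         # full diff, restricted to a single file
```

The -- separator is optional here, but it keeps paths from being mistaken for branch names or options.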

Activity Recommendation: There is a command which will do add and commit together. This will only work with files that have already been added to the index once. Find out which command this is, and understand how to use it.

Question for reflection: What will happen if you run git commit when nothing is staged?

Very often we may add all files to the index, and then realize that we do not want certain files to be committed yet. We can remove files from the index with the command: git reset HEAD filename

In the video below, we first make changes to two files. Then we realize that we want to commit changes from one file only. This video explains how to use git reset HEAD to remove the second file from the index.
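In text form, the scenario can be sketched as below. The file names todo.txt and notes.txt are placeholders of mine.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "buy milk" > todo.txt
echo "first note" > notes.txt
git add --all
git commit -q -m "Initial files"

# Change both files and stage everything
echo "call bank" >> todo.txt
echo "second note" >> notes.txt
git add --all

# Take notes.txt back out of the index; the edit stays in the working copy
git reset -q HEAD notes.txt
git status --short             # todo.txt is staged, notes.txt is only modified

git commit -q -m "Update todo list only"
```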

The document which I am reading, GitRef, mentions that the git rm command will remove a file from the index and the file system. If we do not want to remove the file from the file system, we can use git rm --cached.

Question: Is there any difference between 'git rm --cached filename' and 'git reset HEAD filename'?
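One way to explore this question is to try both commands on files that were already committed and compare git status afterwards. The sketch below does that with two invented files, a.txt and b.txt; it is an experiment, not a complete answer.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > a.txt
echo "one" > b.txt
git add --all
git commit -q -m "Initial files"

# Change and stage both files
echo "two" >> a.txt
echo "two" >> b.txt
git add --all

git reset -q HEAD a.txt       # a.txt: index entry goes back to the committed version
git rm -q --cached b.txt      # b.txt: index entry is removed altogether

git status --short
# a.txt shows up as modified but not staged; it is still tracked.
# b.txt shows up twice: as a staged deletion, and as an untracked file on disk.
```

In this experiment, reset HEAD only unstages the change, while rm --cached stages the removal of the file from the repository and leaves the copy on disk untracked.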

Day 3 - Branching and merging

Creating a branch

A branch can be created with the following command:

git branch branch_name

After creating the branch we still need to check it out before starting to work on it

git checkout branch_name

List all branches and verify which branch you are on

git branch

The following command is the shortest way to create and check out a branch in one step

git checkout -b branch_name

After working on the branch, we can move back to master by

git checkout master

We can merge our work from the branch by executing

git merge branch_name
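Put together, the commands above can be sketched end to end as follows. The branch name weekend and the todo items are hypothetical, and the git symbolic-ref line is only there to make sure the first branch is called master, as in the text, whatever the local default is.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Start todo list"

git checkout -q -b weekend     # create the branch and switch to it
echo "fix bike" >> todo.txt
git add todo.txt
git commit -q -m "Add weekend chores"

git checkout -q master         # move back to master
git branch                     # lists both branches; the current one is starred
git merge -q weekend           # bring the branch's work into master
cat todo.txt
```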

Merges are auto-magical in Git. A conflict is generated only if the same line is changed in multiple branches. In such cases, Git will put merge markers in the code, which are then up to us to resolve and remove. For complex merges, we can also use the Git mergetool to visually see what will be merged, and perhaps to do the merge as well.

git mergetool
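To see the merge markers without an interactive tool, a conflict can be provoked on purpose. In the sketch below both branches change the same line of an invented todo.txt, and the conflict is then resolved by hand rather than with git mergetool, which needs a terminal. The branch name weekend and the variable conflicted are my own.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Start todo list"

# Change the same line on a branch...
git checkout -q -b weekend
echo "buy oat milk" > todo.txt
git add todo.txt
git commit -q -m "Prefer oat milk"

# ...and differently on master
git checkout -q master
echo "buy soy milk" > todo.txt
git add todo.txt
git commit -q -m "Prefer soy milk"

# The merge stops with a conflict, so a non-zero exit status is expected here
git merge weekend > /dev/null 2>&1 || echo "merge stopped on a conflict"

conflicted="$(cat todo.txt)"
echo "$conflicted"             # shows the <<<<<<<, ======= and >>>>>>> markers

# Resolve by hand: write the final content, stage it, and commit the merge
echo "buy oat milk and soy milk" > todo.txt
git add todo.txt
git commit -q -m "Merge weekend, resolving todo.txt by hand"
```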

Question: Can we change the policy for what Git might consider a conflict and add surrounding lines also?
Question: How do we see the diff of what will be merged if we run the 'git merge branch_name' command?
Question: How do we see outgoing changes (commits that will be pushed to 'origin' if we do a git push origin master)?
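
For the last two questions, one possible approach (a sketch, not necessarily the only or best answer) is a three-dot diff for previewing a merge and a two-dot log range for outgoing commits. The repository layout, the branch name weekend and the file names are assumptions of mine, and a local bare repository stands in for origin.

```shell
set -e
work="$(mktemp -d)"
cd "$work"
git init -q --bare origin.git             # local stand-in for the remote
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

echo "buy milk" > todo.txt
git add todo.txt
git commit -q -m "Start todo list"
git push -q origin master                 # origin/master now matches master

git checkout -q -b weekend
echo "fix bike" >> todo.txt
git add todo.txt
git commit -q -m "Add weekend chores"

git checkout -q master
echo "first note" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Changes made on weekend since it forked from master,
# i.e. roughly what 'git merge weekend' would bring in
git diff master...weekend

# Commits on master that origin/master does not have yet,
# i.e. what 'git push origin master' would send
git log --oneline origin/master..master
```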

Day 4 - Sharing and updating projects

Day 5 - Inspection and comparison