
Learn How Git Works in 20 Minutes

Git is the most popular version control tool and an essential skill for programmers.

Even if you use it every day, you may not know how it works under the hood. How does Git keep track of versions? What exactly do basic commands such as git add and git commit do?

In this post, I'll walk through an example step by step to show how Git works.

1. Initialization

Let's create a project directory and go into the directory.

$ mkdir git-demo-project
$ cd git-demo-project

If we want to put the project under version control, the first step is to initialize it with the git init command.

$ git init

All the git init command does is create a .git subdirectory in the project's root directory to hold the version information.

$ ls .git


Listing .git shows that it already contains several files and subdirectories.
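To make this concrete, here is a minimal sketch that initializes a scratch repository and looks inside .git. The temporary directory is an arbitrary choice for illustration, and the exact listing varies with the Git version.

```shell
# Create a scratch repository in a temporary directory (the path is arbitrary).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Typical entries include HEAD, config, objects/ and refs/.
ls -F .git

# HEAD is a small text file that names the branch currently checked out.
head_ref=$(cat .git/HEAD)
echo "$head_ref"
```

Nothing has been stored yet at this point: objects/ holds no objects and refs/heads/ holds no branch pointers.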

2. Save Objects

Next, let's create a new empty file, test.txt.

$ touch test.txt

Then add the file to the Git repository, that is, store a copy of the current contents of test.txt.

$ git hash-object -w test.txt

e69de29bb2d1d6434b8b29ae775ad8c2e48c5391


In the above code, the git hash-object command computes a hash for the current contents of test.txt, and the -w option compresses those contents into a binary file and stores it in Git. The compressed binary, which is called a Git object, is stored in the .git/objects directory.

The command prints the SHA-1 hash (a string of 40 hexadecimal characters), which is computed from the contents plus a short header and serves as the name of the object. Let's take a look at the newly generated Git object file.

$ ls -R .git/objects


As you can see from the above output, there is one more subdirectory under .git/objects, and its name is the first 2 characters of the hash value. Inside it is a file whose name is the remaining 38 characters of the hash value.
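The hash and the path can be reproduced by hand. The sketch below assumes the default SHA-1 object format and the sha1sum utility from GNU coreutils (on macOS, shasum -a 1 is the usual substitute). It hashes an empty input, which matches the empty test.txt.

```shell
# Hash of empty content as computed by Git (no repository is needed without -w).
git_hash=$(git hash-object --stdin < /dev/null)

# The same hash by hand: SHA-1 over the header "blob <size>\0" followed by the content.
# The content is empty here, so only the header is hashed.
manual_hash=$(printf 'blob 0\0' | sha1sum | cut -d' ' -f1)

# The first 2 characters name the subdirectory, the remaining 38 name the file.
dir=$(printf '%s' "$git_hash" | cut -c1-2)
file=$(printf '%s' "$git_hash" | cut -c3-)
echo ".git/objects/$dir/$file"
```

Both hashes should be identical, which is why the same contents always map to the same object name.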

Let's look at the contents of the file again.

$ cat .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

The above command prints unreadable binary data, because the object is stored in compressed form. You may ask: since test.txt is an empty file, why does the object have any contents at all? This is because the object also stores some metadata: a header that records the object type and the size of the contents.
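One way to inspect that metadata without decompressing the file yourself is to ask Git for the object's type and size. This is a small sketch in a scratch repository; the temporary directory is only for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store an empty file as an object, as in the walkthrough.
touch test.txt
hash=$(git hash-object -w test.txt)

obj_type=$(git cat-file -t "$hash")   # object type stored in the header
obj_size=$(git cat-file -s "$hash")   # content size in bytes stored in the header
echo "$obj_type $obj_size"
```

For the empty file this should report a blob of size 0.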

If you want to see the original text contents of the file, you should use the git cat-file command.

$ git cat-file -p e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

Because the original file is empty, the above command prints nothing. Now write something into test.txt.

$ echo 'hello world' > test.txt

The file content has been changed, so you need to save it again as a Git object.

$ git hash-object -w test.txt

3b18e512dba79e4c8300dd08aeb37f8e728b8dad


As you can see from the code above, the hash value has changed because the contents of test.txt changed, and a new file .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad has been generated. You can now view the contents of the object.

$ git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

hello world

3. Update The Index

After the file is saved as a binary object, you need to tell Git which files have changed. Git records all the changed files in an area called the "index" (or staging area). Once a set of changes is complete, everything in the index is written into the official version history together.

The git update-index command is used to record a changed file in the index.

$ git update-index --add --cacheinfo 100644 \
3b18e512dba79e4c8300dd08aeb37f8e728b8dad test.txt

The above command writes the file name test.txt, the name of the binary object (the hash value), and the file mode (100644, a regular non-executable file) to the index.

The git ls-files command can display the current contents of the index.

$ git ls-files --stage

100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0   test.txt

The above output shows that the index contains only one file, test.txt, together with the file's mode and the name of its binary object. Once you know the object name, you can read the file's contents from the .git/objects subdirectory.
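As a rough sketch of that lookup, the following reads the object name out of the index entry and then asks Git for the contents of that object. It runs in a scratch repository, and using awk to pick the second field is just one convenient way to parse the line.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Save the object and record it in the index, as in the two steps above.
echo 'hello world' > test.txt
hash=$(git hash-object -w test.txt)
git update-index --add --cacheinfo 100644 "$hash" test.txt

# Each index line has the form: <mode> <object name> <stage><TAB><path>
entry=$(git ls-files --stage)
entry_hash=$(printf '%s\n' "$entry" | awk '{ print $2 }')

# Resolve the object name back to the file contents.
content=$(git cat-file -p "$entry_hash")
echo "$content"
```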

The git status command can be used to produce more readable results.

$ git status

Changes to be committed:
    new file:   test.txt

The above output indicates that the index contains only one new file, test.txt, which is waiting to be written into the history.

4. The git add Command

Performing the above two steps (saving the object and updating the index) for every file would be tedious, so Git provides the git add command to simplify the operation.

$ git add --all

The above command is equivalent to performing the previous two steps on every changed file in the current project.
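The equivalence can be checked in a scratch repository. In this sketch, git hash-object is run without -w, so it only predicts the object name; git add is then expected to store that object and record it in the index.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello world' > test.txt

# Compute the object name without writing anything to .git/objects.
expected=$(git hash-object test.txt)

# One command instead of hash-object -w plus update-index.
git add --all

# The index entry should now reference an object with the expected name.
staged=$(git ls-files --stage test.txt | awk '{ print $2 }')
obj_type=$(git cat-file -t "$staged")
echo "$staged $obj_type"
```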

5. Commit

The index holds the information about the changed files. When the modifications are complete, this information is written into the history, which amounts to generating a snapshot of the current project.

The history of a project consists of snapshots taken at different points in time, and Git can restore the project to any one of them. Git has a special term for a "snapshot": a "commit". So generating a snapshot is also called making a commit.

From here on, every reference to a "snapshot" means a commit.

6. Complete a Commit

First, let's set the username and email address. When you save the snapshot, it will record who committed it.

$ git config user.name "username" 
$ git config user.email "Email address"

Next, save the current directory structure. As we saw earlier, saving an object stores only the contents of a single file; it doesn't record the file name or how the files relate to each other in the directory.

The git write-tree command generates a Git object from the directory structure currently recorded in the index.

$ git write-tree

c3b8bb102afeca86037d5b5dd89ceeb0090eae9d


In the above code, the directory structure is saved as a binary object (a tree object) whose name is its hash value. It's also stored in the .git/objects directory.

Let's take a look at the contents of this object.

$ git cat-file -p c3b8bb102afeca86037d5b5dd89ceeb0090eae9d

100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    test.txt

As you can see, the tree contains only one file, test.txt.
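The sketch below repeats this step in a scratch repository and also illustrates that git write-tree takes its input from the index: a change that exists only in the working directory leaves the tree untouched.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello world' > test.txt
git add test.txt

# Write the staged directory structure as a tree object.
tree=$(git write-tree)
tree_type=$(git cat-file -t "$tree")
listing=$(git cat-file -p "$tree")
echo "$listing"

# Modify the file without staging it: the resulting tree is the same as before.
echo 'not staged' > test.txt
tree_again=$(git write-tree)
```

Each line of the listing gives the mode, object type, object name, and file name of one entry.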

A snapshot saves the current directory structure together with the binary object corresponding to each file. The directory structure was saved in the previous step; now you need to write it, along with some metadata, into the version history.

The git commit-tree command is used to write the directory tree object into the version history.

$ echo "first commit" | git commit-tree c3b8bb102afeca86037d5b5dd89ceeb0090eae9d

c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa


In the above code, a commit needs a description, and echo "first commit" supplies one through standard input. The git commit-tree command generates a Git object from the metadata together with the directory tree. Now, take a look at the contents of the object.

$ git cat-file -p c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa

tree c3b8bb102afeca86037d5b5dd89ceeb0090eae9d
author jam  1538889134 +0800
committer jam  1538889134 +0800

first commit

In the above output, the first line names the directory tree object corresponding to this snapshot, the second and third lines identify the author and the committer along with a timestamp, and the commit description comes at the end.
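Here is a minimal sketch of the same step in a scratch repository. The identity demo / demo@example.com is a placeholder passed with -c so that the example does not depend on any existing Git configuration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello world' > test.txt
git add test.txt
tree=$(git write-tree)

# Create a commit object that wraps the tree; the message is read from standard input.
commit=$(echo "first commit" |
  git -c user.name=demo -c user.email=demo@example.com commit-tree "$tree")

commit_type=$(git cat-file -t "$commit")
first_line=$(git cat-file -p "$commit" | head -n 1)   # the "tree <hash>" line
message=$(git cat-file -p "$commit" | tail -n 1)      # the commit description
git cat-file -p "$commit"
```

The commit hash differs from run to run, because the timestamp and the identity are part of the hashed contents.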

The git log command can also be used to view the information of a certain snapshot.

$ git log --stat c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa

commit c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa
Author: jam 
Date:   Sun Oct 7 13:12:14 2018 +0800

    first commit

 test.txt | 1 +
 1 file changed, 1 insertion(+)

7. The git commit Command

Git provides the git commit command to simplify the commit operation. Once the changes are in the index, a single git commit saves the directory structure and the description together as a snapshot.

$ git commit -m "first commit"
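To relate git commit to the low-level steps, this sketch checks that the snapshot it creates wraps the same tree that git write-tree produces from the index, and that the current branch ends up pointing at the new snapshot. The identity is again a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello world' > test.txt
git add test.txt

# The tree that the low-level command would write from the index.
expected_tree=$(git write-tree)

git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# The tree recorded in the new snapshot, and the snapshot the branch points to.
committed_tree=$(git rev-parse 'HEAD^{tree}')
branch=$(git symbolic-ref --short HEAD)
branch_tip=$(git rev-parse "refs/heads/$branch")
head_commit=$(git rev-parse HEAD)
echo "$branch -> $branch_tip"
```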

In addition, two other commands are also useful.

The git checkout command is used to switch to a certain snapshot.

$ git checkout c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa

The git show command is used to show all the code changes for a certain snapshot.

$ git show c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa

8. Branch

However, if you use the git log command to view the entire version history, you will not see the snapshot generated with git commit-tree.

$ git log

The above command doesn't list the snapshot; Git only reports that the current branch has no commits yet. Why? Hasn't the snapshot been written into the history?

The fact is that the git log command only shows the history of the current branch. Although we have already created the snapshot, nothing yet records which branch it belongs to.

A branch is a pointer to a snapshot, and the name of the branch is the name of the pointer. Hash values are hard to remember, so branches let users give snapshots readable aliases. What's more, a branch is updated automatically: when the current branch gets a new snapshot, the pointer moves to it. For example, the master branch has a pointer called master that points to the latest snapshot of the master branch.

Users can create a new pointer to any snapshot. For example, if you want to create a new fix-typo branch, you just need to create a pointer called fix-typo which points to a snapshot. Therefore, it's particularly easy to create a new branch in Git and the cost is extremely low.

Git has a special pointer, HEAD, which normally refers to the current branch and therefore resolves to its most recent snapshot. Git also provides shorthand notation: HEAD^ refers to the snapshot before HEAD (its parent), and HEAD~6 refers to the sixth snapshot before HEAD.
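The shorthand can be tried out with git rev-parse, which resolves a name to a hash. This sketch builds three snapshots in a scratch repository; the commit messages and the identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Three snapshots, each changing test.txt.
for i in 1 2 3; do
  echo "version $i" > test.txt
  git add test.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

parent=$(git rev-parse 'HEAD^')      # the snapshot before HEAD
one_back=$(git rev-parse 'HEAD~1')   # another spelling of the same snapshot
two_back=$(git rev-parse 'HEAD~2')   # two snapshots before HEAD
subject=$(git log -1 --format=%s "$two_back")
echo "$subject"
```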

Each branch pointer is a text file stored in the .git/refs/heads/ directory. The file contains the object name (hash value) of the snapshot the branch points to.
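A quick way to see this is to read the file directly. The sketch assumes a freshly created repository that uses the default file-based ref storage (refs may live elsewhere once they are packed or when a different ref backend is configured). The branch name fix-typo comes from the example above, and the identity is a placeholder.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello world' > test.txt
git add test.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# The name of the current branch (master or main, depending on configuration).
branch=$(git symbolic-ref --short HEAD)

# The branch file holds the hash of the snapshot it points to.
from_file=$(cat ".git/refs/heads/$branch")
from_git=$(git rev-parse HEAD)

# Creating a branch only writes another small file.
git branch fix-typo
fix_typo=$(cat .git/refs/heads/fix-typo)
echo "$from_file"
```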

9. Update The Branch

The following demonstrates how to update a branch. First, modify test.txt.

$ echo "hello world again" > test.txt

Then save the binary object.

$ git hash-object -w test.txt


Next, write the object into the index and save the directory structure.

$ git update-index test.txt
$ git write-tree

1552fd52bc14497c11313aa91547255c95728f37


Finally, commit the directory structure, and it will generate a snapshot.

$ echo "second commit" | git commit-tree 1552fd52bc14497c11313aa91547255c95728f37 -p c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa

785f188674ef3c6ddc5b516307884e1d551f53ca


In the above code, the -p parameter of the git commit-tree command is used to specify the parent node, which is the snapshot on which this snapshot is based.

Let's write the hash of this snapshot to the .git/refs/heads/master file, which makes the master pointer point to this snapshot.

$ echo 785f188674ef3c6ddc5b516307884e1d551f53ca > .git/refs/heads/master
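Writing to the file by hand works for a demonstration. Git also ships a plumbing command for this purpose, git update-ref, which validates the hash before moving the pointer. The sketch below uses it in a scratch repository, with a placeholder identity, instead of echoing into the file.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'hello world' > test.txt
git add test.txt

# Build a snapshot with the low-level commands; no branch points to it yet.
tree=$(git write-tree)
commit=$(echo "first commit" |
  git -c user.name=demo -c user.email=demo@example.com commit-tree "$tree")

# Point the current branch at the snapshot.
branch=$(git symbolic-ref --short HEAD)
git update-ref "refs/heads/$branch" "$commit"

# The snapshot is now reachable from the branch, so git log can find it.
tip=$(git rev-parse HEAD)
log_subject=$(git log -1 --format=%s)
echo "$log_subject"
```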

Now the git log command shows two snapshots.

$ git log

commit 785f188674ef3c6ddc5b516307884e1d551f53ca (HEAD -> master)
Author: jam 
Date:   Sun Oct 7 13:38:00 2018 +0800

    second commit

commit c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa
Author: jam 
Date:   Sun Oct 7 13:12:14 2018 +0800

    first commit

The git log command works like this:

  1. Find the branch corresponding to the HEAD pointer. It's master in this case.
  2. Find the snapshot pointed to by the master pointer. It's 785f188674ef3c6ddc5b516307884e1d551f53ca in this case.
  3. Find the parent node (the previous snapshot) c9053865e9dff393fd2f7a92a18f9bd7f2caa7fa.
  4. And so on, until it reaches a snapshot that has no parent. The result is every snapshot of the current branch.
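The steps above can be sketched as a small loop that follows the parent line of each commit object. This is a simplification for a linear history (it follows only the first parent and is not how git log is actually implemented), run in a scratch repository with a placeholder identity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3; do
  echo "version $i" > test.txt
  git add test.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# Steps 1 and 2: resolve HEAD to the snapshot the current branch points to.
commit=$(git rev-parse HEAD)
count=0

# Steps 3 and 4: print the snapshot, then move to its parent until there is none.
while [ -n "$commit" ]; do
  git log -1 --format='%h %s' "$commit"
  count=$((count + 1))
  # Read only the header of the commit object (up to the first blank line).
  commit=$(git cat-file -p "$commit" |
    awk '/^$/ { exit } $1 == "parent" { print $2; exit }')
done
echo "$count snapshots"
```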

By the way, as mentioned earlier, the branch pointer is dynamic. The following three commands update it automatically.

  • git commit: The pointer of the current branch moves to the newly created snapshot.
  • git pull: After the remote branch is merged into the current branch, the pointer of the current branch points to the resulting snapshot.
  • git reset [commit_sha]: The pointer of the current branch is reset to the specified snapshot.
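As a small illustration of the last item, this sketch creates two snapshots in a scratch repository and then resets the current branch to the first one. The identity and the file contents are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2; do
  echo "version $i" > test.txt
  git add test.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

first=$(git rev-parse 'HEAD~1')
before=$(git rev-parse HEAD)

# Move the current branch pointer back to the first snapshot.
# Without --hard, the working directory keeps the newer file contents.
git reset -q "$first"

branch=$(git symbolic-ref --short HEAD)
after=$(git rev-parse "refs/heads/$branch")
echo "$before -> $after"
```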
