Git from the Ground Up
If you work as a software engineer in a team, you’re probably familiar with a piece of software called Git. It’s probably a part of your development workflow. You type a few magical incantations and somehow your code is versioned and distributed to your colleagues. Occasionally, something goes wrong and you type a few more commands and if you’re lucky, things sort themselves out.
My goal in this blog post is to help you get Git (try saying that five times fast): to understand what is going on under the hood as you run the common commands in your Git workflow. But before we go under the hood, we have to travel to the past and learn a little bit about the history of version control systems. This history isn't going to cover every tool, just the ones that I think are indicative of the technological changes in version control systems.
We’ll start by traveling all the way to Bell Labs in 1972 and the dawn of one of the first version control systems, Source Code Control System. The architecture of the tool consisted of three different parts: a delta table, control and tracking flags, and a set of control records. The delta table, as you might expect, is a table that stores each of the changes made to a file. Control and tracking flags were used to set permissions and control releases. And control records were used to keep track of when lines of code were deleted or inserted into a file by storing those insertions and deletions into special records. In this way, this version control system pioneered some of the early principles that we'll come to see in later version control systems.
SCCS, as it was known, was popular until 1982, when its successor, the Revision Control System (RCS), came to prominence. RCS was not distributed at all: it had no way to keep copies of the code you were versioning on a central server or another machine. Multiple people couldn't edit the same file at the same time, so merge conflicts weren't really a thing that happened. It had one simple job: store different versions of code.
Next, we'll travel to 1990, and the release of the Concurrent Versions System, commonly known as CVS. Unlike RCS, CVS employed a client-server model. A copy of the repository was stored on a central server and several clients could make copies of it. At this point in time, it was possible for multiple people to be editing the same file, so it was possible for two individuals to make conflicting changes. To work around this issue, CVS required that you fetch and merge the latest changes from the server into your code before making any commits.
Finally, we’ll travel to a little over a decade ago to the year 2005, where a tool that we’re all familiar with came to fruition: Git. Git was different from its predecessors in a lot of ways. For one, it didn't require you to be on the latest version of a file before making changes. You could make a change to a file and then pull in any updates that happened after you made the change. Furthermore, Git was decentralized: full Git repositories could exist as first-class copies on developers' machines, GitHub's servers, your company's CI build, and so on.
Why did Git get so popular?
It was for a couple of reasons, and everyone has a perspective on what those reasons were. Tools like GitHub certainly made Git a little more popular by providing a centralized space for developers to discover and share code. Git also had a merge strategy that was a lot easier to navigate than its predecessors. And finally, as soon as it was developed, Git became the version control system for the Linux kernel codebase, giving it an immediate large adopter.
People's opinions on the reasons for Git's rapid adoption differ but in any case, here we are. Most software teams use Git to version and collaborate on their codebase.
So how do most people use Git? Well, you've probably run a command like this to get the latest copy of the codebase from a remote server onto your machine. What did this command just do?
$ git clone https://github.com/nteract/nteract.git
Cloning into 'nteract'...
remote: Enumerating objects: 310, done.
remote: Counting objects: 100% (310/310), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 49241 (delta 218), reused 255 (delta 208), pack-reused 48931
Receiving objects: 100% (49241/49241), 16.00 MiB | 3.21 MiB/s, done.
Resolving deltas: 100% (34612/34612), done.
There's all this business about objects and deltas and compressing and enumerating and resolving and oh my goodness! There's quite a lot going on in so few lines of standard output.
To dive a little bit more into this, we're going to need to poke into a directory that exists in every Git-versioned project: the .git directory. Here's what its contents look like in our newly cloned directory.
$ ls .git
HEAD  branches  config  description  hooks  index  info  logs  objects  packed-refs  refs
There's an objects directory in there. Let's poke around it and see if we can get a sense of what Git might've been enumerating and counting and compressing when we cloned our directory.
$ ls .git/objects/
info  pack
Hm. There are only two directories in there: info and pack. Let's dive into them and see what we can find out!
$ ls .git/objects/info
$ ls .git/objects/pack
pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx  pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
OK! Now we're getting somewhere a little bit more interesting. The info directory is empty, but the pack directory contains two files: one with a .idx extension and another with a .pack extension. Now, we could try to cat these files to look into their contents, but they're binary files, so that output won't be much help. Instead, let's ask the file command what they are.
$ file .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
.git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx: Git pack index, version 2
$ file .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
.git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack: Git pack, version 2, 49241 objects
You can see from the output above that the .pack file contains about 50,000 “objects.” Thankfully, I spent some time looking into this and will tell you right now what these objects are. Hurrah for sharing!
Objects, in the Git context, consist of a type, a size, and some contents. There are four types of objects.
- Blobs: An object that is used to store file data.
- Trees: An object that is used to reference multiple blobs or other tree objects.
- Commits: An object that contains a reference to a particular tree, the timestamp on which a commit was made, the creator of the commit, and other metadata.
- Tags: Annotated tags are stored as objects in Git. Similar to commits, they contain a timestamp, the person who created the tag, and an associated message.
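To make the "type, size, and contents" description concrete, here is a minimal Python sketch of how an object's hash can be derived from those three pieces. The function name git_object_id is made up for illustration, and the sketch assumes a repository that uses SHA-1 object names (as this one does).

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type and content."""
    # Git hashes a short header, "<type> <size>" followed by a NUL byte,
    # and then the raw content of the object.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes always produce the same ID, which is why identical
    # files are stored only once.
    print(git_object_id("blob", b"hello\n"))
```

For a blob, the result should match what git hash-object prints for a file with the same bytes.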
So what just happened when we cloned? Well, Git pulled all of the objects associated with our project: commits, trees, file contents (blobs), and tags. That's what those 49,241 objects that were pulled in from the server on GitHub were. The server sends them compressed into a single packfile, which Git stores in .git/objects/pack. How big is the compressed file?
$ du -sh .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
17M  .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
It's about 17 megabytes. Those 17 megabytes contain every commit, the files associated with each commit, and every tag on the project. To get a sense of all of the objects that have been compressed into a packfile, we can run git verify-pack -v on the pack's .idx file, which lists each object's hash, type, and size.
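The object count that the file command reported lives in the first few bytes of the packfile. As a rough sketch, assuming the 12-byte header layout described in Git's pack format documentation (the signature PACK, a version number, and an object count, all big-endian), it could be read like this. The function name parse_pack_header is made up for illustration, and the sample bytes are synthetic rather than read from the real pack.

```python
import struct
from typing import Tuple


def parse_pack_header(data: bytes) -> Tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # 4-byte signature, then two unsigned 32-bit integers in network byte order.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a Git packfile")
    return version, count


if __name__ == "__main__":
    # Synthetic header mirroring the `file` output earlier: version 2, 49241 objects.
    sample = b"PACK" + struct.pack(">II", 2, 49241)
    print(parse_pack_header(sample))
```

To try it on a real pack, pass in the first 12 bytes of the .pack file opened in binary mode.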
Now that you've got a copy of the code base on your local machine, you'll likely make a new branch on which you'll start to make changes.
$ git checkout -b safia/my-new-branch
Switched to a new branch 'safia/my-new-branch'
Standard output says that we switched to a new branch, but what actually happened under the hood? To answer this, we will need to look inside the .git directory again.
$ ls .git
HEAD  branches  config  description  hooks  index  info  logs  objects  packed-refs  refs
Let's take a look at the contents of that HEAD file. If you're familiar with Git, you've probably executed a command like git push origin HEAD to push your updates to a centralized server. What are we actually referencing there?
$ cat .git/HEAD
ref: refs/heads/safia/my-new-branch
Let's see what's inside the file that the ref is pointing to here.
$ cat .git/refs/heads/safia/my-new-branch
e35102c15bd63698b6dcb721e161c4d630e2d6cc
Oh! We've got a hash in here. What is this hash referencing? There's a useful command, git cat-file, that allows us to print out details about the object that is referenced by a SHA-1 hash.
$ git cat-file -p e35102c15bd63698b6dcb721e161c4d630e2d6cc
tree 6cf0113d017fc604de9481758fc6a578a4067dd2
parent aad3eac9629ee28c4d6030e1091e8089dee66cd9
parent b7d058a50b14ea9b45c128da2af2e260b0ef1ea1
author Kyle Kelley <email@example.com> 1537647280 -0400
committer GitHub <firstname.lastname@example.org> 1537647280 -0400

Merge pull request #3341 from nteract/renovate/next-7.x

Update dependency next to v7.0.0
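Cool! So it turns out that the hash is a reference to a commit object. The commit, in turn, points to a tree object, which is a snapshot of the files in the repository at that commit, and to its parent commits (two of them here, because this commit is a merge).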
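The chain we just followed by hand (HEAD, then the ref file, then a hash) can be sketched in a few lines of Python. This is a simplified illustration, not how Git itself is implemented: resolve_head and make_demo_git_dir are made-up names, and the sketch only looks at a loose ref file and the packed-refs file, ignoring things like symbolic refs that point at other symbolic refs.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_head(git_dir: Path) -> Optional[str]:
    """Follow HEAD to the hash it ultimately refers to, or None if unresolved."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a hash directly
    ref_name = head[len("ref: "):]
    loose_ref = git_dir / ref_name
    if loose_ref.is_file():
        return loose_ref.read_text().strip()
    # A ref without its own file may be listed in packed-refs,
    # one "<hash> <refname>" pair per line.
    packed_refs = git_dir / "packed-refs"
    if packed_refs.is_file():
        for line in packed_refs.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # header comments and peeled-tag lines
            object_id, _, name = line.partition(" ")
            if name == ref_name:
                return object_id
    return None


def make_demo_git_dir(root: Path) -> Path:
    """Create a throwaway directory laid out like the .git directory above."""
    git_dir = root / ".git"
    ref = git_dir / "refs" / "heads" / "safia" / "my-new-branch"
    ref.parent.mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/safia/my-new-branch\n")
    ref.write_text("e35102c15bd63698b6dcb721e161c4d630e2d6cc\n")
    return git_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_head(make_demo_git_dir(Path(tmp))))
```

Pointing resolve_head at the .git directory of a real clone should give the same hash that git rev-parse HEAD prints, at least for the simple cases covered here.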
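So, we're going to make a change, stage it, and commit it. You might have heard those words used in the context of Git before. What do they mean? To find out, let's edit README.md and, before staging anything, take a look at the current state of the .git directory.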
$ tree .git
.git
├── HEAD
├── branches
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   └── update.sample
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       ├── heads
│       │   ├── master
│       │   └── safia
│       │       └── my-new-branch
│       └── remotes
│           └── origin
│               └── HEAD
├── objects
│   ├── info
│   └── pack
│       ├── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
│       └── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
├── packed-refs
└── refs
    ├── heads
    │   ├── master
    │   └── safia
    │       └── my-new-branch
    ├── remotes
    │   └── origin
    │       └── HEAD
    └── tags

18 directories, 25 files
Now, let's run the git add command and observe what changed in our .git directory.
$ git add README.md
$ tree .git
.git
├── HEAD
├── branches
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   └── update.sample
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       ├── heads
│       │   ├── master
│       │   └── safia
│       │       └── my-new-branch
│       └── remotes
│           └── origin
│               └── HEAD
├── objects
│   ├── dc
│   │   └── ce97ef3d92d70d1385952ba7a9988908f3f23e
│   ├── info
│   └── pack
│       ├── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
│       └── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
├── packed-refs
└── refs
    ├── heads
    │   ├── master
    │   └── safia
    │       └── my-new-branch
    ├── remotes
    │   └── origin
    │       └── HEAD
    └── tags

19 directories, 26 files
Oh! Look at that! There's something new in our .git/objects directory. Let's take a look inside and see if we can find out more.
$ git cat-file -p dcce97ef3d92d70d1385952ba7a9988908f3f23e
# nteract <img src="https://cloud.githubusercontent.com/assets/836375/15271096/98e4c102-19fe-11e6-999a-a74ffe6e2000.gif" alt="nteract animated logo" height="80px" align="right" />
[![](https://img.shields.io/badge/version-latest-blue.svg)](https://github.com/nteract/nteract)
[![](https://img.shields.io/badge/version-stable-blue.svg)](https://github.com/nteract/nteract/releases)
[![codecov.io](https://codecov.io/github/nteract/nteract/coverage.svg?branch=master)](https://codecov.io/github/nteract/nteract?branch=master)[![slack in](https://slack.nteract.io/badge.svg)](https://slack.nteract.io)
[![lerna](https://img.shields.io/badge/maintained%20with-lerna-cc00ff.svg)](https://lernajs.io/)
[![Circle CI Status Shield](https://circleci.com/gh/nteract/nteract/tree/master.svg?style=shield)](https://circleci.com/gh/nteract/nteract/tree/master)
|| [**Basics**](#basics) • [**Users**](#users) || [**Contributors**](#contributors) • [**Development**](#development) • [**Maintainers**](#maintainers) || [**Sponsors**](#sponsors) • [**Made possible by**](#made-possible-by) ||
## Basics
test
**nteract** is first and foremost a dynamic tool to give you flexibility when writing code, [exploring data](https://github.com/nteract/nteract/tree/master/packages/transform-dataresource), and authoring text to share insights about the data.
I've truncated it above, but the new object that has been stored in the objects directory is a blob object that contains the entirety of the contents of our README.md file.
You might've noticed in the directory structure above that the first two characters of the SHA-1 hash are used as the name of the directory that the blob object is stored in, and the remaining 38 characters as the file name. This seems like a strange thing to do, but there are a couple of reasons for it.
- There's a limit, defined by the operating system and its file system, on the number of files that can be stored in a single directory. For example, if you're using a macOS system, you can only have 2.1 billion items within a single folder. This seems like plenty of space, but older systems have much tighter restrictions on the number of files you can store.
- Some file systems perform a linear scan of a directory when looking for a file. Because this search is done on a per-directory basis, breaking up the thousands of objects that exist in an average Git repository into up to 256 folders reduces the bottleneck associated with finding these files and loading them into memory.
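Here is a small Python sketch of the naming scheme described above, together with how a loose object's bytes can be produced and read back. It assumes the loose object format of a "type size" header, a NUL byte, and the content, all zlib-compressed. The function names are made up for illustration.

```python
import hashlib
import zlib
from pathlib import PurePosixPath
from typing import Tuple


def loose_object_path(object_id: str) -> PurePosixPath:
    """Map an object ID to its location under .git/objects."""
    # The first two hex characters name the directory, the rest name the file.
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


def encode_loose_object(obj_type: str, content: bytes) -> Tuple[str, bytes]:
    """Return (object_id, compressed_bytes) for a loose object."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def decode_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    """Return (type, content) from the compressed bytes of a loose object."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    print(loose_object_path("dcce97ef3d92d70d1385952ba7a9988908f3f23e"))
    object_id, data = encode_loose_object("blob", b"hello\n")
    print(object_id, decode_loose_object(data))
```

Reading the bytes of a file under .git/objects/dc/ and passing them to decode_loose_object should give back the type and contents that git cat-file showed.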
OK! Now that we've staged our change, we actually need to commit it. Similar to last time, we'll run tree before and after the operation to figure out what changed in the .git directory. I'll avoid pasting the full output of tree here and just show you the difference.
$ git commit -m "Update README"
$ tree .git
├── objects
│   ├── 26
│   │   └── 64cf0af38ae9428fca337c12868c4f5e41ca01
│   ├── dc
│   │   └── ce97ef3d92d70d1385952ba7a9988908f3f23e
│   ├── ec
│   │   └── 2122429f0a087825c4ae41a2d16777066e3f0a
│   ├── info
│   └── pack
│       ├── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
│       └── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
So it looks like two objects were added to our .git directory. Let's see if we can find out what they were.
$ git cat-file -p 2664cf0af38ae9428fca337c12868c4f5e41ca01
040000 tree 70616f745fe0f17582b0b608c92ae6b16ad86b5a .circleci
100644 blob b3e97e8818845bcdf12b89f00449ef552ac58de6 .eslintignore
100644 blob c730a944e1d1c8079457c60abdce5bf053004809 .eslintrc
100644 blob 29e6b3480db284b7bddb3b0e0eb99929b2f63cdc .flowconfig
100644 blob a502c988b3992e35630bd7bf30ebb5e0d1f12248 .gitattributes
040000 tree ce862566e7c21f3b3c7f8c7ed7a3c3c7013310e2 .github
100644 blob 4e0a33a5adc65416dba8b819da7558ce7ddee936 .gitignore
100644 blob 296e8837f9c29d21807034139c1d2da795d0fdf5 .npmignore
100644 blob 5c8aef89b1892faed5a2318c04fd326b2d4f1ae8 .npmrc
100644 blob ec6d3cdd7f5b083403ae78073054bb0854c0227f .prettierignore
100644 blob 0967ef424bce6791893e9a57bb952f80fd536e93 .prettierrc
100644 blob 3fdd5f091b70b845c61d1b6e301ea39c7fb16135 .travis.yml
100644 blob d3dda9eed93c0ee69cd18ef400da77d8ec64213a CHANGELOG.md
100644 blob 77a476cb5498b9e128ed0d988ba08481de5f640a CODE_OF_CONDUCT.md
100644 blob 8829fc28b6e50207f9cbef985a4084d11dce57db CONTRIBUTING.md
100644 blob 79d2e86e15b4097311ba62f264a5d099b4bc5a21 LICENSE
100644 blob dcce97ef3d92d70d1385952ba7a9988908f3f23e README.md
100644 blob 723f12e5c6df15fbec1efc615835e8b38da74faf RELEASING.md
100644 blob df688362897275b533dd1c22b22b0c8615834017 USER_GUIDE.md
040000 tree f5f81ee9258d45667c0f94438ef5bdfeadc66f52 applications
100644 blob 2df851d31bcc94e9d1668232e35a758707df60d8 appveyor.yml
100644 blob 98c1e1d5032a1a8055a6f7d20168aa33661c1d47 babel.config.js
100644 blob 0e822958a7cc9efc77b53937d4a220ff939f5942 codecov.yml
040000 tree 2d95cfcfd2fb396b8ec44382502701d6a3d4c406 doc
040000 tree e85bea502dbd0e559b2eb3fea8508f5216b58975 flow-typed
040000 tree 21015beb05071ca6967489d75b3a23ff049e6b61 initiatives
100644 blob 89c5be33c4965067f8cbe5c1f29b206c689b94cc lerna.json
100644 blob c168aa74cd657dd3e4daec5494247ad11c299bae nbformat.v4.json
100644 blob 39364fcb1e44e212d3e15ad66914778c5fa5ed96 package.json
040000 tree 48c30c6468f73a808e9b44c69d1d65d42cd49b83 packages
100644 blob 23389f9333a333fac308305c428c004320c5eb1a renovate.json
040000 tree c2c1a5e20498a97af89d0045471183db31855904 scripts
100644 blob 6a1dbab2959bfd0981c3670b1018acde0a989043 styleguide.config.js
040000 tree 10dbc180a024fce8fdefb75e78ede8b833119af0 styleguide
100644 blob f90d9d5981d286dbd58411dc21b822f44451de7d yarn.lock
$ git cat-file -p ec2122429f0a087825c4ae41a2d16777066e3f0a
tree 2664cf0af38ae9428fca337c12868c4f5e41ca01
parent e35102c15bd63698b6dcb721e161c4d630e2d6cc
author Safia Abdalla <email@example.com> 1537835421 -0400
committer Safia Abdalla <firstname.lastname@example.org> 1537835421 -0400

Update README
Interesting! So the second object is a commit object. As mentioned earlier, it contains the commit message, the committer, and the timestamp. It also includes a reference to the tree, the one that we see in the first object. So what just happened here?
- We staged our change to the README file. This created a blob object for that version of the file.
- We created a commit. This created two objects: a tree object that lists our new README blob alongside the existing blob and tree objects for everything else in the repository, and a commit object that points to that tree and to the previous commit as its parent.
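The two steps above can be sketched in Python by building the bodies of a tree and a commit object by hand. This is an illustration under a few assumptions: the tree is flat and contains only files (real trees sort directory names slightly differently), the README contents and the identity string are hypothetical, and the function names are made up.

```python
import hashlib
from typing import List, Tuple


def object_id(obj_type: str, body: bytes) -> str:
    """Hash an object body the way Git names objects in a SHA-1 repository."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def build_tree(entries: List[Tuple[str, str, str]]) -> bytes:
    """Build a tree body from (mode, name, hex object ID) entries."""
    body = b""
    # Each entry is "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


def build_commit(tree_id: str, parent_ids: List[str], identity: str, message: str) -> bytes:
    """Build a commit body that points at a tree and at its parent commits."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines += [f"author {identity}", f"committer {identity}", "", message]
    return ("\n".join(lines) + "\n").encode()


if __name__ == "__main__":
    # Hypothetical file contents and identity, for illustration only.
    readme_id = object_id("blob", b"# nteract\ntest\n")
    tree_id = object_id("tree", build_tree([("100644", "README.md", readme_id)]))
    commit = build_commit(
        tree_id,
        ["e35102c15bd63698b6dcb721e161c4d630e2d6cc"],
        "Example Author <author@example.com> 1537835421 -0400",
        "Update README",
    )
    print(commit.decode())
    print(object_id("commit", commit))
```

Notice that the commit never mentions README.md directly. It names a tree, and the tree names the blob, which is the same chain we walked with git cat-file.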
Finally, we're going to push our change up to our branch.
$ git push origin HEAD
Counting objects: 100, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (69/69), done.
Writing objects: 100% (100/100), 15.36 KiB | 1.02 MiB/s, done.
Total 100 (delta 61), reused 62 (delta 28)
remote: Resolving deltas: 100% (61/61), completed with 20 local objects.
To https://github.com/nteract/nteract.git
 * [new branch]      HEAD -> safia/my-new-branch
What did this just do? Well, a couple of things happened. Our Git client sent the remote every object that it was missing, that is, everything created since our last push. But there's also this business about counting and compressing all over again. What's going on? Well, it turns out that here, Git has compressed some of the objects that were loose in our directory into a packfile, similar to the one that we saw earlier.
How does this compression work? Git sorts the objects by type, then name, then size, and then stores similar neighboring objects as deltas against one another instead of storing each one in full. Sorting by size exploits what Git's own notes on pack heuristics jokingly call Linus's law: files grow over time. In this sense, by sorting the objects by size, you're implicitly sorting them by how recently they were modified.
Once the objects are compressed, the index file that we saw earlier serves as a table of contents, pointing from an object hash to its location within the compressed packfile.
Why does Git generate a packfile when we push? By that time, there are likely several loose objects in your objects directory that are near-duplicates of one another. For example, if you add and commit a file and then add and commit it again, you have two blob objects for that same file with a potentially small difference. Before sending those objects to the server, it helps to compress them into a packfile.
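The sorting step described above can be sketched in Python. This is a deliberately simplified model and not Git's actual implementation: Git compares each object against a sliding window of neighbors rather than only the one next to it, and the Candidate type and both function names are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    obj_type: str
    name: str
    size: int


def pack_order(objects: List[Candidate]) -> List[Candidate]:
    """Order objects by type, then name, then size (largest first)."""
    return sorted(objects, key=lambda o: (o.obj_type, o.name, -o.size))


def neighbor_pairs(ordered: List[Candidate]) -> List[Tuple[Candidate, Candidate]]:
    """Adjacent objects with the same type and name: likely delta candidates."""
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if a.obj_type == b.obj_type and a.name == b.name
    ]


if __name__ == "__main__":
    objects = [
        Candidate("blob", "README.md", 100),  # older, smaller version
        Candidate("tree", "", 40),
        Candidate("blob", "LICENSE", 50),
        Candidate("blob", "README.md", 120),  # newer, larger version
    ]
    for pair in neighbor_pairs(pack_order(objects)):
        print(pair)
```

After sorting, the two versions of README.md end up next to each other with the larger one first, so the smaller version can be stored as a small delta against it.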
Now, you might find yourself in a situation where another person has made changes to a file that you're editing. In this case, you need to bring those changes into your file. There are two ways to do this in Git: through a rebase or a merge.
So what is the difference between a rebase and a merge? Let's start by talking about how a merge works.
Let's try to merge the change that we made in safia/my-new-branch into our master branch.
$ git merge safia/my-new-branch
Updating e35102c1..ec212242
Fast-forward
 README.md | 2 ++
 1 file changed, 2 insertions(+)
Since there were no changes made in master that did not exist in our feature branch, Git performed a fast-forward merge: it simply moved the master branch pointer forward to the commit that our feature branch points to (ec212242), without creating any new commit.
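Whether a fast-forward is possible comes down to one question: is the current tip of master an ancestor of the commit we're merging in? Here is a small Python sketch of that check on a toy commit graph. The graph is a plain dictionary from abbreviated commit hash to parent hashes, and the function names are made up for illustration.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def can_fast_forward(parents: Dict[str, List[str]], current_tip: str, incoming_tip: str) -> bool:
    """A merge can fast-forward when the current tip is behind the incoming tip."""
    return is_ancestor(parents, current_tip, incoming_tip)


if __name__ == "__main__":
    # master is at e35102c1; the feature branch added ec212242 on top of it.
    graph = {"ec212242": ["e35102c1"], "e35102c1": []}
    print(can_fast_forward(graph, "e35102c1", "ec212242"))
```

When both branches have commits the other lacks, neither tip is an ancestor of the other, and the check returns False.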
But what happens if there are changes in our master branch that don't exist in our feature branch? Let's try to simulate this by creating a new branch, staging and committing a change on master, then staging and committing a change to our new branch and attempting to merge.
$ git checkout -b safia/branch-that-will-be-behind
Switched to a new branch 'safia/branch-that-will-be-behind'
$ git checkout master
Switched to branch 'master'
$ echo "Test" > test.txt
$ git add test.txt && git commit -m "Added test file"
[master 00aa5891] Added test file
 1 file changed, 1 insertion(+)
 create mode 100644 test.txt
$ git checkout safia/branch-that-will-be-behind
Switched to branch 'safia/branch-that-will-be-behind'
$ echo "Test 2" > test-2.txt
$ git add test-2.txt && git commit -m "Added other test file"
[safia/branch-that-will-be-behind e1bceaba] Added other test file
 1 file changed, 1 insertion(+)
 create mode 100644 test-2.txt
$ git checkout master
Switched to branch 'master'
$ git merge safia/branch-that-will-be-behind
Merge made by the 'recursive' strategy.
 test-2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 test-2.txt
There's a lot going on here, but the most important thing to pay attention to is the output of the last command. You'll notice that, unlike last time, Git used the recursive strategy to bring our changes from safia/branch-that-will-be-behind into master. What does this mean? Well, for one, it means that a merge commit was created for our change.
$ git log
commit 5b347a036d5b27d5e11c13b0b88a34565db8dcc3 (HEAD -> master)
Merge: 00aa5891 e1bceaba
Author: Safia Abdalla <email@example.com>
Date: Mon Sep 24 21:53:59 2018 -0400

    Merge branch 'safia/branch-that-will-be-behind'
If we print out the contents of the commit object associated with this commit, we will notice something exciting.
$ git cat-file -p 5b347a036d5b27d5e11c13b0b88a34565db8dcc3
tree d94b7f867953d7a3335bcc23732ded883e0a24e7
parent 00aa5891eef1a21d384ca4eabca735f79c2c3bdf
parent e1bceaba273dec2db897053053533f72f7d6447e
author Safia Abdalla <firstname.lastname@example.org> 1537840439 -0400
committer Safia Abdalla <email@example.com> 1537840439 -0400

Merge branch 'safia/branch-that-will-be-behind'
Interesting! This merge commit has two parent commits. What are they?
$ git cat-file -p 00aa5891eef1a21d384ca4eabca735f79c2c3bdf
tree d738ac3037823bf69abba275977592b87e430046
parent ec2122429f0a087825c4ae41a2d16777066e3f0a
author Safia Abdalla <firstname.lastname@example.org> 1537840379 -0400
committer Safia Abdalla <email@example.com> 1537840379 -0400

Added test file
$ git cat-file -p e1bceaba273dec2db897053053533f72f7d6447e
tree 58f0c5c85c6c93e8f8943c7212ec3c6f0ba9fa43
parent ec2122429f0a087825c4ae41a2d16777066e3f0a
author Safia Abdalla <firstname.lastname@example.org> 1537840430 -0400
committer Safia Abdalla <email@example.com> 1537840430 -0400

Added other test file
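Great! The two parents are the latest commit on master (Added test file) and the latest commit on our feature branch (Added other test file), and both of them descend from the same commit, ec212242, which is our README change. Now let's cover how rebasing works. We'll start by setting up the same structure that we set up in our merge example.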
$ git checkout -b safia/branch-to-rebase-with
Switched to a new branch 'safia/branch-to-rebase-with'
$ git checkout master
Switched to branch 'master'
$ echo "Another test" > test-3.txt
$ git add test-3.txt && git commit -m "Added test-3.txt"
[master 12f1569b] Added test-3.txt
 1 file changed, 1 insertion(+)
 create mode 100644 test-3.txt
$ git checkout safia/branch-to-rebase-with
Switched to branch 'safia/branch-to-rebase-with'
$ echo "Another test in another branch" > test-4.txt
$ git add test-4.txt && git commit -m "Added test-4.txt"
[safia/branch-to-rebase-with f864bfbc] Added test-4.txt
 1 file changed, 1 insertion(+)
 create mode 100644 test-4.txt
$ git checkout master
Switched to branch 'master'
$ git rebase safia/branch-to-rebase-with
First, rewinding head to replay your work on top of it...
Applying: Added test-3.txt
If we take a look at the two most recent commits in our master branch, we'll see the following.
$ git log
commit badeaff4c25c768d2b296f65952ff9ae14413577 (HEAD -> master)
Author: Safia Abdalla <firstname.lastname@example.org>
Date: Mon Sep 24 22:08:04 2018 -0400

    Added test-3.txt

commit f864bfbc841f5056fac081da39b97fd9944863cf (safia/branch-to-rebase-with)
Author: Safia Abdalla <email@example.com>
Date: Mon Sep 24 22:08:18 2018 -0400

    Added test-4.txt
Using the handy-dandy cat-file command, we can see that the commits have a parent-child relationship.
$ git cat-file -p f864bfbc841f5056fac081da39b97fd9944863cf
tree a9012c9d01aadc083ac095885c4340954f49b0bf
parent 5b347a036d5b27d5e11c13b0b88a34565db8dcc3
author Safia Abdalla <firstname.lastname@example.org> 1537841298 -0400
committer Safia Abdalla <email@example.com> 1537841298 -0400

Added test-4.txt
$ git cat-file -p badeaff4c25c768d2b296f65952ff9ae14413577
tree a6c43fe433c254b30fdc46315107025d19bd1951
parent f864bfbc841f5056fac081da39b97fd9944863cf
author Safia Abdalla <firstname.lastname@example.org> 1537841284 -0400
committer Safia Abdalla <email@example.com> 1537841304 -0400

Added test-3.txt
So, essentially, when we rebased, Git replayed one branch's commits on top of the other's to produce a single linear chain. Note that the replayed commit got a new hash: "Added test-3.txt" was 12f1569b before the rebase and is badeaff4 after it, because its parent changed. When we merged, Git created a single merge commit that joined two parent commits, so the history forks and rejoins instead of forming a straight line. And that summarizes the difference between a rebase and a merge. When we merge, we keep an explicit reference to the two branches that we merged from, at the expense of losing our linear history. On the other hand, rebasing allows us to maintain a linear history but loses the explicit references to our two branches.
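To see the two shapes side by side, here is a toy Python model of the histories from this section, using abbreviated hashes from the output above. The graphs are plain dictionaries from commit to parents, the rebase graph deliberately leaves out everything older than the commit the branches started from, and is_linear is a made-up helper for illustration.

```python
from typing import Dict, List

# After the merge: the merge commit 5b347a03 has two parents.
AFTER_MERGE: Dict[str, List[str]] = {
    "5b347a03": ["00aa5891", "e1bceaba"],
    "00aa5891": ["ec212242"],
    "e1bceaba": ["ec212242"],
    "ec212242": [],
}

# After the rebase: every new commit has exactly one parent.
# History older than 5b347a03 is intentionally omitted from this model.
AFTER_REBASE: Dict[str, List[str]] = {
    "badeaff4": ["f864bfbc"],
    "f864bfbc": ["5b347a03"],
}


def is_linear(parents: Dict[str, List[str]], tip: str) -> bool:
    """Walk back from `tip`; the history is linear if no commit has two parents."""
    commit = tip
    while True:
        commit_parents = parents.get(commit, [])
        if len(commit_parents) > 1:
            return False
        if not commit_parents:
            return True
        commit = commit_parents[0]


if __name__ == "__main__":
    print("merge history linear: ", is_linear(AFTER_MERGE, "5b347a03"))
    print("rebase history linear:", is_linear(AFTER_REBASE, "badeaff4"))
```

The merge graph still records both branch tips as parents of one commit, while the rebase graph reads as a straight line with no trace of where the second branch came from.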
Neat-o! That's a lot of new information about Git. Let's do a quick recap of what we learned in this blog post.
- Git represents key information as objects stored in the file system
- Git compresses loose objects into packfiles to increase space efficiency
- Rebases and merges differ in whether they give preference to maintaining a linear history or explicit branches