Bug Repellent

Git from the Ground Up

October 5, 2018

If you work as a software engineer in a team, you’re probably familiar with a piece of software called Git. It’s probably a part of your development workflow. You type a few magical incantations and somehow your code is versioned and distributed to your colleagues. Occasionally, something goes wrong and you type a few more commands and if you’re lucky, things sort themselves out.

My goal in this blog post is to help you get Git (try saying that five times fast), to understand what is going on under the hood as you execute a variety of common commands in your Git workflow. But before we go under the hood, we have to travel to the past and learn a little bit about the history of version control systems. This history isn't going to cover every tool, just the ones that I think are indicative of the technological changes in version control systems.

We’ll start by traveling all the way to Bell Labs in 1972 and the dawn of one of the first version control systems, Source Code Control System. The architecture of the tool consisted of three different parts: a delta table, control and tracking flags, and a set of control records. The delta table, as you might expect, is a table that stores each of the changes made to a file. Control and tracking flags were used to set permissions and control releases. And control records were used to keep track of when lines of code were deleted or inserted into a file by storing those insertions and deletions into special records. In this way, this version control system pioneered some of the early principles that we'll come to see in later version control systems.

SCCS, as it was known, was popular until 1982, when its successor, the Revision Control System (RCS), came to prominence. RCS was not distributed at all, so it wasn't possible to store copies of the code that you were versioning on a central server or another machine. Multiple people couldn't edit the same file at the same time, so merge conflicts weren't really a thing that happened. It had one simple job: store different versions of code.

Next, we'll travel to 1990, and the release of the Concurrent Versions System, commonly known as CVS. Unlike RCS, CVS employed a client-server model. A copy of the repository was stored on a central server and several clients could make copies of it. At this point in time, it was possible for multiple people to be editing the same file, so it was possible for two individuals to make conflicting changes. To work around this issue, CVS required that you fetch and merge the latest changes from the server into your code before making any commits.

Finally, we’ll travel to a little over a decade ago to the year 2005, where a tool that we’re all familiar with came to fruition: Git. Git was different from its predecessors in a lot of ways. For one, it didn't require you to be on the latest version of a file before making changes. You could make a change to a file and then pull in any updates that happened after you made the change. Furthermore, Git was distributed and decentralized: Git repositories could exist as first-class copies on developers' machines, GitHub's servers, your company's CI build, and so on.

Why did Git get so popular?

It was for a couple of reasons, and everyone has a perspective on what those reasons were. Tools like GitHub certainly made Git a little more popular by providing a centralized space for developers to discover and share code. Git also had a merge strategy that was a lot easier to navigate than its predecessors. And finally, Git was built for the Linux kernel project and was used to version the kernel codebase almost as soon as it was usable, giving it an immediate large adopter.

People's opinions on the reasons for Git's rapid adoption differ but in any case, here we are. Most software teams use Git to version and collaborate on their codebase.

So how do most people use Git? Well, you've probably run a command like this to get the latest copy of the codebase from a remote server onto your machine. What did this command just do?

$ git clone https://github.com/nteract/nteract.git
Cloning into 'nteract'...
remote: Enumerating objects: 310, done.
remote: Counting objects: 100% (310/310), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 49241 (delta 218), reused 255 (delta 208), pack-reused 48931
Receiving objects: 100% (49241/49241), 16.00 MiB | 3.21 MiB/s, done.
Resolving deltas: 100% (34612/34612), done.

There's all this business about objects and deltas and compressing and enumerating and resolving and oh my goodness! There's quite a lot going on in so few lines of standard output.

To dive a little bit more into this, we're going to need to poke into a directory that exists on every Git-versioned project: the .git directory. Here's what its contents look like in our newly cloned directory.

$ ls .git
HEAD            branches        config          description     hooks           index           info            logs            objects         packed-refs     refs

There's an objects directory in there. Let's poke around it and see if we can get a sense of what Git might've been enumerating and counting and compressing when we cloned our directory.

$ ls .git/objects/
info    pack

Hm. There are only two directories in there: info and pack. Let's dive into them and see what we can find out!

$ ls .git/objects/info
$ ls .git/objects/pack
pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack

OK! Now we're getting somewhere a little bit more interesting. The info directory is empty but the pack directory contains two files: one with a .idx extension and another with a .pack extension. Now, we could try to cat these files to look into their contents, but they're binary files so looking at that output won't be much help.

$ file .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
.git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx: Git pack index, version 2
$ file .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
.git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack: Git pack, version 2, 49241 objects

You can see from the output above that the .pack file contains about 50,000 “objects.” Thankfully, I spent some time looking into this and will tell you right now what these objects are. Hurrah for sharing!

Objects, in the Git context, consist of a type, a size, and some contents. There are four types of objects.

Blobs: An object that is used to store file data.
Trees: An object that is used to reference multiple blobs or other tree objects.
Commits: An object that contains a reference to a particular tree, the timestamp on which a commit was made, the creator of the commit, and other metadata.
Tags: Annotated tags are stored as objects in Git. Similar to commits, they contain a timestamp, the name of the person who created the tag, and an associated message.
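
To make the "type, size, and contents" description a little more concrete, here is a minimal Python sketch of how an object can be framed and named. The function names (encode_object, decode_object, object_id) are my own for illustration and are not part of Git. The framing itself, a "<type> <size>" header, a NUL byte, and then the contents, hashed with SHA-1, is how Git names loose objects.

```python
import hashlib
from typing import Tuple


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Prefix the contents with a '<type> <size>' header and a NUL byte."""
    header = f"{obj_type} {len(content)}".encode("ascii")
    return header + b"\x00" + content


def decode_object(raw: bytes) -> Tuple[str, int, bytes]:
    """Split a framed object back into its type, declared size, and contents."""
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    return obj_type, int(size), content


def object_id(obj_type: str, content: bytes) -> str:
    """An object's name is the SHA-1 of its header plus its contents."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


raw = encode_object("blob", b"hello\n")
print(decode_object(raw))             # ('blob', 6, b'hello\n')
print(object_id("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The last line should match what `echo hello | git hash-object --stdin` prints, which is a handy way to check the sketch against a real Git install.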

So what just happened when we cloned? Well, Git pulled all of the objects associated with our project: commits, trees, blobs (the file contents), and tags. That's what those 49,241 objects that were pulled in from the server on GitHub were. They arrive compressed together in a single packfile, which Git stores in .git/objects/pack. How big is the compressed file?

$ du -sh .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
 17M    .git/objects/pack/pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack

It's about 17 megabytes. Those 17 megabytes contain every commit, the files associated with each commit, and every tag on the project. To get a sense of all of the objects that have been compressed into a packfile, we can run git verify-pack -v on the matching .idx file, which lists each object's hash, type, and size.
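
The object count that the file command reported earlier comes from a small fixed header at the very start of the packfile: the four bytes PACK, a 4-byte version number, and a 4-byte object count, both in network byte order. Here's a rough Python sketch that decodes it. The function name parse_pack_header is my own, and the sample bytes are synthetic rather than read from the real packfile.

```python
import struct
from typing import Tuple


def parse_pack_header(header: bytes) -> Tuple[int, int]:
    """Decode the fixed 12-byte header at the start of a .pack file."""
    signature, version, object_count = struct.unpack(">4sII", header[:12])
    if signature != b"PACK":
        raise ValueError("not a Git packfile")
    return version, object_count


# Synthetic header mirroring the `file` output above: version 2, 49241 objects.
sample = b"PACK" + struct.pack(">II", 2, 49241)
print(parse_pack_header(sample))  # (2, 49241)
```

To try it on a real repository, open the .pack file in binary mode and pass its first 12 bytes to parse_pack_header.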

Now that you've got a copy of the code base on your local machine, you'll likely make a new branch on which you'll start to make changes.

$ git checkout -b safia/my-new-branch
Switched to a new branch 'safia/my-new-branch'

Standard output says that we switched to a new branch, but what actually happened under the hood? To answer this, we will need to look inside the .git directory located inside every Git-versioned repository.

$ ls .git
HEAD            branches        config          description     hooks           index           info            logs            objects         packed-refs     refs

Let's take a look at the contents of that HEAD file. If you're familiar with Git, you've probably executed a command like: git push origin HEAD to push your updates to a centralized server. What are we actually referencing there?

$ cat .git/HEAD
ref: refs/heads/safia/my-new-branch

Let's see what's inside the file that the ref is pointing to here.

$ cat .git/refs/heads/safia/my-new-branch
e35102c15bd63698b6dcb721e161c4d630e2d6cc

Oh! We've got a hash in here. What is this hash referencing? There's a useful command, git cat-file, that allows us to print out details about the object that is referenced by a SHA-1 hash.

$ git cat-file -p e35102c15bd63698b6dcb721e161c4d630e2d6cc
tree 6cf0113d017fc604de9481758fc6a578a4067dd2
parent aad3eac9629ee28c4d6030e1091e8089dee66cd9
parent b7d058a50b14ea9b45c128da2af2e260b0ef1ea1
author Kyle Kelley <rgbkrk@gmail.com> 1537647280 -0400
committer GitHub <noreply@github.com> 1537647280 -0400

Merge pull request #3341 from nteract/renovate/next-7.x

Update dependency next to v7.0.0

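Cool! So it turns out that that hash is a reference to a commit object. The commit, in turn, references a tree object (the tree line in the output above), which captures a snapshot of the project's files at that commit. The two parent lines tell us that this particular commit is a merge commit.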
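
The chain we just followed by hand, from HEAD to a ref file to a hash, can be sketched in a few lines of Python. This is a simplified illustration rather than how Git itself does it: resolve_head is a name I made up, and it only handles a loose ref file or a plain packed-refs entry. The demo builds a throwaway .git directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Follow .git/HEAD to the hash it ultimately names."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a hash directly
    ref_name = head[len("ref: "):]
    ref_file = git_dir / ref_name
    if ref_file.exists():
        return ref_file.read_text().strip()
    # Otherwise look in packed-refs, whose lines look like "<hash> <ref name>".
    packed = git_dir / "packed-refs"
    if packed.exists():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue  # skip the header comment and peeled-tag lines
            value, _, name = line.partition(" ")
            if name == ref_name:
                return value
    raise LookupError(f"cannot resolve {ref_name}")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    branch = git_dir / "refs" / "heads" / "safia" / "my-new-branch"
    branch.parent.mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/safia/my-new-branch\n")
    branch.write_text("e35102c15bd63698b6dcb721e161c4d630e2d6cc\n")
    print(resolve_head(git_dir))  # e35102c15bd63698b6dcb721e161c4d630e2d6cc
```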

So, we're going to make a change, stage it, and commit it. You might have heard those words used in the context of Git before. What do they mean?

$ tree .git
.git
├── HEAD
├── branches
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   └── update.sample
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       ├── heads
│       │   ├── master
│       │   └── safia
│       │       └── my-new-branch
│       └── remotes
│           └── origin
│               └── HEAD
├── objects
│   ├── info
│   └── pack
│       ├── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
│       └── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
├── packed-refs
└── refs
    ├── heads
    │   ├── master
    │   └── safia
    │       └── my-new-branch
    ├── remotes
    │   └── origin
    │       └── HEAD
    └── tags

18 directories, 25 files

Now, let's run the git add command and observe what changed about our .git directory.

$ git add README.md
$ tree .git
.git
├── HEAD
├── branches
├── config
├── description
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   ├── pre-receive.sample
│   ├── prepare-commit-msg.sample
│   └── update.sample
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       ├── heads
│       │   ├── master
│       │   └── safia
│       │       └── my-new-branch
│       └── remotes
│           └── origin
│               └── HEAD
├── objects
│   ├── dc
│   │   └── ce97ef3d92d70d1385952ba7a9988908f3f23e
│   ├── info
│   └── pack
│       ├── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
│       └── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack
├── packed-refs
└── refs
    ├── heads
    │   ├── master
    │   └── safia
    │       └── my-new-branch
    ├── remotes
    │   └── origin
    │       └── HEAD
    └── tags

19 directories, 26 files

Oh! Look at that! There's something new in our .git/objects directory. Let's take a look inside and see if we can find out more.

$ git cat-file -p dcce97ef3d92d70d1385952ba7a9988908f3f23e
# nteract <img src="https://cloud.githubusercontent.com/assets/836375/15271096/98e4c102-19fe-11e6-999a-a74ffe6e2000.gif" alt="nteract animated logo" height="80px" align="right" />

[![](https://img.shields.io/badge/version-latest-blue.svg)](https://github.com/nteract/nteract)
[![](https://img.shields.io/badge/version-stable-blue.svg)](https://github.com/nteract/nteract/releases)
[![codecov.io](https://codecov.io/github/nteract/nteract/coverage.svg?branch=master)](https://codecov.io/github/nteract/nteract?branch=master)[![slack in](https://slack.nteract.io/badge.svg)](https://slack.nteract.io)
[![lerna](https://img.shields.io/badge/maintained%20with-lerna-cc00ff.svg)](https://lernajs.io/) [![Circle CI Status Shield](https://circleci.com/gh/nteract/nteract/tree/master.svg?style=shield)](https://circleci.com/gh/nteract/nteract/tree/master)

|| [**Basics**](#basics) • [**Users**](#users) || [**Contributors**](#contributors) • [**Development**](#development) • [**Maintainers**](#maintainers) || [**Sponsors**](#sponsors) • [**Made possible by**](#made-possible-by) ||

## Basics

test

**nteract** is first and foremost a dynamic tool to give you flexibility when
writing code, [exploring data](https://github.com/nteract/nteract/tree/master/packages/transform-dataresource), and authoring text to share insights about the
data.

I've truncated it above, but the new object that has been stored in the objects directory is a blob object that contains the entirety of the contents of our README.md file.

You might've noticed in the directory structure above that the first two characters in the SHA-1 hash are used as the name for the directory that the blob object is stored in. This seems like a strange thing to do, but there are a couple of reasons that this is done.

There's a file-system-defined limit on the number of files that can be stored in a single directory. For example, if you're using a macOS system, you can only have 2.1 billion items within a single folder. This seems like plenty of space, but older file systems place much tighter restrictions on the number of files you can store.
Looking up a file can involve a linear scan of its directory on some file systems. Because this search is done on a per-directory basis, breaking up the thousands of objects that exist in an average Git repository into several folders reduces the bottleneck associated with searching for these files and loading them into memory.
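
Here's a small Python sketch of the storage scheme described above. It assumes the loose object format that Git uses on disk: the framed object (a header, a NUL byte, then the contents) is compressed with zlib and written to a path built from the first two characters of its hash and the remaining 38. The helper names write_loose_object and read_loose_object are mine, and the demo writes to a temporary directory rather than a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store an object under <first 2 hex chars>/<remaining 38 hex chars>."""
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    object_id = hashlib.sha1(raw).hexdigest()
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> bytes:
    """Decompress a loose object and return its contents without the header."""
    path = objects_dir / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    return raw.partition(b"\x00")[2]


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    object_id = write_loose_object(objects_dir, "blob", b"hello\n")
    stored = objects_dir / object_id[:2] / object_id[2:]
    print(stored.relative_to(objects_dir).as_posix())
    # ce/013625030ba8dba906f756967f9e9ca394464a
    print(read_loose_object(objects_dir, object_id))  # b'hello\n'
```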

OK! Now that we've staged our change, we actually need to commit it. Similar to last time, we'll run tree before and after the operation to figure out what changed in the .git directory that we can track down. I'll avoid pasting the full output of tree here and just show you the difference.

$ git commit -m "Update README"
$ tree .git
├── objects
│   ├── 26
│   │   └── 64cf0af38ae9428fca337c12868c4f5e41ca01
│   ├── dc
│   │   └── ce97ef3d92d70d1385952ba7a9988908f3f23e
│   ├── ec
│   │   └── 2122429f0a087825c4ae41a2d16777066e3f0a
│   ├── info
│   └── pack
│       ├── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.idx
│       └── pack-457ca7bffdeceb504dbf91cd58e3602f47a56ced.pack

So it looks like two objects were added to our .git directory. Let's see if we can find out what they were.

$ git cat-file -p 2664cf0af38ae9428fca337c12868c4f5e41ca01
040000 tree 70616f745fe0f17582b0b608c92ae6b16ad86b5a    .circleci
100644 blob b3e97e8818845bcdf12b89f00449ef552ac58de6    .eslintignore
100644 blob c730a944e1d1c8079457c60abdce5bf053004809    .eslintrc
100644 blob 29e6b3480db284b7bddb3b0e0eb99929b2f63cdc    .flowconfig
100644 blob a502c988b3992e35630bd7bf30ebb5e0d1f12248    .gitattributes
040000 tree ce862566e7c21f3b3c7f8c7ed7a3c3c7013310e2    .github
100644 blob 4e0a33a5adc65416dba8b819da7558ce7ddee936    .gitignore
100644 blob 296e8837f9c29d21807034139c1d2da795d0fdf5    .npmignore
100644 blob 5c8aef89b1892faed5a2318c04fd326b2d4f1ae8    .npmrc
100644 blob ec6d3cdd7f5b083403ae78073054bb0854c0227f    .prettierignore
100644 blob 0967ef424bce6791893e9a57bb952f80fd536e93    .prettierrc
100644 blob 3fdd5f091b70b845c61d1b6e301ea39c7fb16135    .travis.yml
100644 blob d3dda9eed93c0ee69cd18ef400da77d8ec64213a    CHANGELOG.md
100644 blob 77a476cb5498b9e128ed0d988ba08481de5f640a    CODE_OF_CONDUCT.md
100644 blob 8829fc28b6e50207f9cbef985a4084d11dce57db    CONTRIBUTING.md
100644 blob 79d2e86e15b4097311ba62f264a5d099b4bc5a21    LICENSE
100644 blob dcce97ef3d92d70d1385952ba7a9988908f3f23e    README.md
100644 blob 723f12e5c6df15fbec1efc615835e8b38da74faf    RELEASING.md
100644 blob df688362897275b533dd1c22b22b0c8615834017    USER_GUIDE.md
040000 tree f5f81ee9258d45667c0f94438ef5bdfeadc66f52    applications
100644 blob 2df851d31bcc94e9d1668232e35a758707df60d8    appveyor.yml
100644 blob 98c1e1d5032a1a8055a6f7d20168aa33661c1d47    babel.config.js
100644 blob 0e822958a7cc9efc77b53937d4a220ff939f5942    codecov.yml
040000 tree 2d95cfcfd2fb396b8ec44382502701d6a3d4c406    doc
040000 tree e85bea502dbd0e559b2eb3fea8508f5216b58975    flow-typed
040000 tree 21015beb05071ca6967489d75b3a23ff049e6b61    initiatives
100644 blob 89c5be33c4965067f8cbe5c1f29b206c689b94cc    lerna.json
100644 blob c168aa74cd657dd3e4daec5494247ad11c299bae    nbformat.v4.json
100644 blob 39364fcb1e44e212d3e15ad66914778c5fa5ed96    package.json
040000 tree 48c30c6468f73a808e9b44c69d1d65d42cd49b83    packages
100644 blob 23389f9333a333fac308305c428c004320c5eb1a    renovate.json
040000 tree c2c1a5e20498a97af89d0045471183db31855904    scripts
100644 blob 6a1dbab2959bfd0981c3670b1018acde0a989043    styleguide.config.js
040000 tree 10dbc180a024fce8fdefb75e78ede8b833119af0    styleguide
100644 blob f90d9d5981d286dbd58411dc21b822f44451de7d    yarn.lock

$ git cat-file -p ec2122429f0a087825c4ae41a2d16777066e3f0a
tree 2664cf0af38ae9428fca337c12868c4f5e41ca01
parent e35102c15bd63698b6dcb721e161c4d630e2d6cc
author Safia Abdalla <safia@safia.rocks> 1537835421 -0400
committer Safia Abdalla <safia@safia.rocks> 1537835421 -0400

Update README

Interesting! So the first object is a tree object that lists the top-level contents of our repository, and the second object is a commit object. As mentioned earlier, the commit contains the commit message, the committer, and the timestamp. It also includes a reference to the tree, the one that we see in the first object. So what just happened here?

We staged our change to the README file. This created a blob object for that instance of the file.
We created a commit. This created a new tree object that references our latest README blob object alongside the most recent blob and tree objects available for the other items in our repository, plus a commit object that points to that tree.
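
To see how a tree ties names to objects, here is a Python sketch of the tree format as I understand it. Each entry is the mode, a space, the file name, a NUL byte, and then the 20 raw bytes of the referenced object's hash. The function names are my own. Note that cat-file -p pads directory modes to 040000 for display, while the stored form is 40000, and that this sketch assumes directories sort as if their names ended in a slash.

```python
import hashlib


def serialize_tree(entries):
    """entries: a list of (mode, name, hex_object_id) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Assumption: directory names sort as if they ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


def parse_tree(body):
    """Inverse of serialize_tree: recover the (mode, name, hex id) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        pos = nul + 21
    return entries


def tree_id(entries):
    """Hash the serialized tree the same way as any other object."""
    body = serialize_tree(entries)
    return hashlib.sha1(f"tree {len(body)}".encode() + b"\x00" + body).hexdigest()


# Two of the entries from the cat-file output above.
entries = [
    ("100644", "README.md", "dcce97ef3d92d70d1385952ba7a9988908f3f23e"),
    ("40000", ".circleci", "70616f745fe0f17582b0b608c92ae6b16ad86b5a"),
]
for mode, name, hex_id in parse_tree(serialize_tree(entries)):
    print(mode, hex_id, name)
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```

Because the tree's hash depends on the hashes of everything it lists, changing README.md produces a new blob, which in turn produces a new tree, which the new commit then points to.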

Finally, we're going to push our change up to our branch.

$ git push origin HEAD
Counting objects: 100, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (69/69), done.
Writing objects: 100% (100/100), 15.36 KiB | 1.02 MiB/s, done.
Total 100 (delta 61), reused 62 (delta 28)
remote: Resolving deltas: 100% (61/61), completed with 20 local objects.
To https://github.com/nteract/nteract.git
 * [new branch]        HEAD -> safia/my-new-branch

What did this just do? Well, a couple of things happened. Our Git client sent any new objects created since the last push to the remote Git instance. But there's also this business about compressing and counting all over again. What's going on? Well, it turns out that here, Git has compressed some of the objects that were loose in our directory into a packfile, similar to the one that we saw earlier. How does this compression work? Git sorts the objects by type, then name, then size, and then computes just the deltas between objects that end up next to each other in that order. Sorting by size exploits a rule of thumb that Git's pack heuristics documentation refers to as "Linus's Law": files tend to grow over time. In this sense, by sorting the objects by size, you're implicitly sorting them by the order in which they were most recently modified.
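
As a rough illustration of that sort order, here's a toy Python sketch. It is heavily simplified and makes a few assumptions of mine: the names PackCandidate, delta_order, and delta_pairs are invented, the sizes are made up, objects are compared by plain file name rather than whatever key Git really uses, and larger objects are placed first within a name. It doesn't compute any deltas; it only shows which objects would end up side by side as delta candidates.

```python
from collections import namedtuple

# Hypothetical record describing an object that is about to be packed.
PackCandidate = namedtuple("PackCandidate", "obj_type name size")


def delta_order(candidates):
    """Sort by type, then name, then size (assumed largest first)."""
    return sorted(candidates, key=lambda c: (c.obj_type, c.name, -c.size))


def delta_pairs(candidates):
    """Pair each object with its neighbor when both share a type and a name."""
    ordered = delta_order(candidates)
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if (prev.obj_type, prev.name) == (cur.obj_type, cur.name)
    ]


candidates = [
    PackCandidate("blob", "README.md", 4096),   # older, smaller version
    PackCandidate("tree", "", 1300),
    PackCandidate("blob", "LICENSE", 1500),
    PackCandidate("blob", "README.md", 4102),   # newer, slightly larger version
]
for prev, cur in delta_pairs(candidates):
    print(prev.name, prev.size, "->", cur.size)  # README.md 4102 -> 4096
```

The two versions of README.md land next to each other, so only the small difference between them would need to be stored.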

Once the objects are compressed, the index file that we saw earlier serves as a table of contents, pointing from an object hash to its location within the compressed packfile.

Why does Git generate a packfile when we did this push? It's likely that at that time, there are multiple unneeded loose objects in your objects directory. For example, if you add and commit a file then add and commit it again, you have two blob objects of that same file with a potentially small difference. Before pushing those objects to the server, it helps to compress them into a packfile.

Now, you might find yourself in a situation where changes have been made to a file that you're editing by another person. In this case, you need to bring those changes into your file. There are two ways to do this in Git: through a rebase or a merge.

So what is the difference between a rebase and a merge? Let's start by talking about how a merge works.

Let's try to merge the change that we made in safia/my-new-branch onto our master branch.

$ git merge safia/my-new-branch
Updating e35102c1..ec212242
Fast-forward
 README.md | 2 ++
 1 file changed, 2 insertions(+)

Since there were no changes made in master that did not exist in our feature branch, our merge simply moved the master branch pointer forward to the same commit that the tip of our feature branch points to. Git calls this a fast-forward.
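
One way to think about when this shortcut applies: it works when the commit our current branch points to is an ancestor of the commit we're merging in. Here's a small Python sketch of that check over a toy commit graph. The graph is a plain dictionary from commit to parents, is_ancestor and can_fast_forward are names I made up, and the "m1", "f1", and "base" commits are hypothetical. The history before e35102c1 is left out to keep things short.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` can be reached from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


def can_fast_forward(parents, current_tip, other_tip):
    """A merge can fast-forward when our tip is already part of their history."""
    return is_ancestor(parents, current_tip, other_tip)


# master sits at e35102c1 and the feature branch added ec212242 on top of it.
history = {"ec212242": ["e35102c1"], "e35102c1": []}
print(can_fast_forward(history, "e35102c1", "ec212242"))  # True

# Hypothetical diverged history: both branches committed on top of "base".
diverged = {"m1": ["base"], "f1": ["base"], "base": []}
print(can_fast_forward(diverged, "m1", "f1"))  # False
```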

But what happens if there are changes in our master branch that don't exist in our feature branch? Let's try to simulate this by creating a new branch, staging and committing a change on master, then staging and committing a change to our new branch and attempting to merge.

$ git checkout -b safia/branch-that-will-be-behind
Switched to a new branch 'safia/branch-that-will-be-behind'
$ git checkout master
Switched to branch 'master'
$ echo "Test" > test.txt
$ git add test.txt && git commit -m "Added test file"
[master 00aa5891] Added test file
 1 file changed, 1 insertion(+)
 create mode 100644 test.txt
$ git checkout safia/branch-that-will-be-behind
Switched to branch 'safia/branch-that-will-be-behind'
$ echo "Test 2" > test-2.txt
$ git add test-2.txt && git commit -m "Added other test file"
[safia/branch-that-will-be-behind e1bceaba] Added other test file
 1 file changed, 1 insertion(+)
 create mode 100644 test-2.txt
$ git checkout master
Switched to branch 'master'
$ git merge safia/branch-that-will-be-behind
Merge made by the 'recursive' strategy.
 test-2.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 test-2.txt

There's a lot going on here, but the most important thing to pay attention to is the output of the last command. You'll notice that unlike last time, Git used the recursive strategy to bring our changes from safia/branch-that-will-be-behind into master. What does this mean? Well, for one, it means that a merge commit was created for our change.

$ git log
commit 5b347a036d5b27d5e11c13b0b88a34565db8dcc3 (HEAD -> master)
Merge: 00aa5891 e1bceaba
Author: Safia Abdalla <safia@safia.rocks>
Date:   Mon Sep 24 21:53:59 2018 -0400

    Merge branch 'safia/branch-that-will-be-behind'

If we print out the contents of the commit object associated with this commit, we will notice something exciting.

$ git cat-file -p 5b347a036d5b27d5e11c13b0b88a34565db8dcc3
tree d94b7f867953d7a3335bcc23732ded883e0a24e7
parent 00aa5891eef1a21d384ca4eabca735f79c2c3bdf
parent e1bceaba273dec2db897053053533f72f7d6447e
author Safia Abdalla <safia@safia.rocks> 1537840439 -0400
committer Safia Abdalla <safia@safia.rocks> 1537840439 -0400

Merge branch 'safia/branch-that-will-be-behind'

Interesting! This merge commit has two parent commits. What are they?

$ git cat-file -p 00aa5891eef1a21d384ca4eabca735f79c2c3bdf
tree d738ac3037823bf69abba275977592b87e430046
parent ec2122429f0a087825c4ae41a2d16777066e3f0a
author Safia Abdalla <safia@safia.rocks> 1537840379 -0400
committer Safia Abdalla <safia@safia.rocks> 1537840379 -0400

Added test file
$ git cat-file -p e1bceaba273dec2db897053053533f72f7d6447e
tree 58f0c5c85c6c93e8f8943c7212ec3c6f0ba9fa43
parent ec2122429f0a087825c4ae41a2d16777066e3f0a
author Safia Abdalla <safia@safia.rocks> 1537840430 -0400
committer Safia Abdalla <safia@safia.rocks> 1537840430 -0400

Added other test file
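
The text that cat-file -p prints for a commit is simple enough to pick apart by hand. Here's a minimal Python sketch that pulls out the headers and counts the parents, which is one way to tell a merge commit apart from an ordinary one. The parse_commit name is mine, and the sketch assumes plain one-line headers, so it would not handle things like multi-line signature headers.

```python
def parse_commit(text):
    """Split commit text into its header fields, parent list, and message."""
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # a commit may have several parents
        else:
            commit[key] = value
    return commit


merge_commit = """\
tree d94b7f867953d7a3335bcc23732ded883e0a24e7
parent 00aa5891eef1a21d384ca4eabca735f79c2c3bdf
parent e1bceaba273dec2db897053053533f72f7d6447e
author Safia Abdalla <safia@safia.rocks> 1537840439 -0400
committer Safia Abdalla <safia@safia.rocks> 1537840439 -0400

Merge branch 'safia/branch-that-will-be-behind'
"""

commit = parse_commit(merge_commit)
print(commit["tree"])          # d94b7f867953d7a3335bcc23732ded883e0a24e7
print(len(commit["parents"]))  # 2
print(commit["message"])       # Merge branch 'safia/branch-that-will-be-behind'
```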

Great! Now let's cover how rebasing works. We'll start by setting up the same structure that we set up in our merge example.

$ git checkout -b safia/branch-to-rebase-with
Switched to a new branch 'safia/branch-to-rebase-with'
$ git checkout master
Switched to branch 'master'
$ echo "Another test" > test-3.txt
$ git add test-3.txt && git commit -m "Added test-3.txt"
[master 12f1569b] Added test-3.txt
 1 file changed, 1 insertion(+)
 create mode 100644 test-3.txt
$ git checkout safia/branch-to-rebase-with
Switched to branch 'safia/branch-to-rebase-with'
$ echo "Another test in another branch" > test-4.txt
$ git add test-4.txt && git commit -m "Added test-4.txt"
[safia/branch-to-rebase-with f864bfbc] Added test-4.txt
 1 file changed, 1 insertion(+)
 create mode 100644 test-4.txt
$ git checkout master
Switched to branch 'master'
$ git rebase safia/branch-to-rebase-with
First, rewinding head to replay your work on top of it...
Applying: Added test-3.txt

If we take a look at the two most recent commits in our master branch, we'll see the following.

$ git log
commit badeaff4c25c768d2b296f65952ff9ae14413577 (HEAD -> master)
Author: Safia Abdalla <safia@safia.rocks>
Date:   Mon Sep 24 22:08:04 2018 -0400

    Added test-3.txt

commit f864bfbc841f5056fac081da39b97fd9944863cf (safia/branch-to-rebase-with)
Author: Safia Abdalla <safia@safia.rocks>
Date:   Mon Sep 24 22:08:18 2018 -0400

    Added test-4.txt

Using the handy-dandy cat-file command, we can see that the commits have a parent-child relationship.

$ git cat-file -p f864bfbc841f5056fac081da39b97fd9944863cf
tree a9012c9d01aadc083ac095885c4340954f49b0bf
parent 5b347a036d5b27d5e11c13b0b88a34565db8dcc3
author Safia Abdalla <safia@safia.rocks> 1537841298 -0400
committer Safia Abdalla <safia@safia.rocks> 1537841298 -0400

Added test-4.txt
$ git cat-file -p badeaff4c25c768d2b296f65952ff9ae14413577
tree a6c43fe433c254b30fdc46315107025d19bd1951
parent f864bfbc841f5056fac081da39b97fd9944863cf
author Safia Abdalla <safia@safia.rocks> 1537841284 -0400
committer Safia Abdalla <safia@safia.rocks> 1537841304 -0400

Added test-3.txt

So, essentially, when we rebased, Git replayed our commit on top of the other branch's commits so that the result is a single linear chain. When we merged, we formed a single merge commit that joined two parent commits, so the history forks and rejoins. And that summarizes the difference between a rebase and a merge: when we merge, we keep an explicit reference to the two branches that we merged from at the expense of losing our linear history. On the other hand, rebasing allows us to maintain a linear history but lose the explicit references to our two branches.
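
Here's a toy Python model of that difference. It is only a sketch: commits are dictionary entries rather than real Git objects, the IDs are short made-up hashes of the message and parents, and make_commit, merge, and rebase are names I invented. What it does capture is the shape of the history: a merge adds one commit with two parents, while a rebase creates new copies of commits with a different parent.

```python
import hashlib


def make_commit(store, message, parents):
    """Add a toy commit to the store and return its (made-up) short ID."""
    commit_id = hashlib.sha1(repr((message, list(parents))).encode()).hexdigest()[:8]
    store[commit_id] = {"message": message, "parents": list(parents)}
    return commit_id


def merge(store, ours, theirs, message):
    """A merge records both tips as parents of one new commit."""
    return make_commit(store, message, [ours, theirs])


def rebase(store, commits_to_replay, onto):
    """A rebase re-creates each commit, in order, on top of a new parent."""
    tip = onto
    for commit_id in commits_to_replay:
        tip = make_commit(store, store[commit_id]["message"], [tip])
    return tip


store = {}
base = make_commit(store, "Base", [])
on_master = make_commit(store, "Added test-3.txt", [base])
on_branch = make_commit(store, "Added test-4.txt", [base])

merged = merge(store, on_master, on_branch, "Merge branch")
rebased = rebase(store, [on_master], onto=on_branch)

print(len(store[merged]["parents"]))              # 2
print(store[rebased]["parents"] == [on_branch])   # True
print(rebased != on_master)                       # True
```

The last line is the reason the rebased commit earlier in the post shows up under a new hash: the message is the same, but the parent changed, so it is a different commit.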

Neat-o! That's a lot of new information about Git. Let's do a quick recap of what we learned in this blog post.

Git represents key information as objects stored in the file system
Git compresses loose objects into packfiles to increase space efficiency
Rebases and merges differ in whether they give preference to maintaining a linear history or explicit branches