Makram's Thoughts

How to Use Git

#git #sourcecontrol #tutorial

Every once in a while, a post about Git makes it to the top 10 of either Hacker News or /r/programming. It's either about Git itself (perhaps some obscure trick, or some huge flaw in the CLI), or about how much more complicated it is than the good ole days of Subversion and Perforce, and how much simpler Mercurial is.

The comments section almost always devolves into a Git-bashing fest. Not that I'm in love with Git - it's kind of like a carpenter saying he's in love with his hammer. I have been using Git daily since 2012, and around 2015, I felt like I finally reached a level of productivity that I was comfortable with.

Before that point, I was all aboard the Git-bashing train. Its CLI sucks! checkout does so many things! There's nothing as good as TortoiseSVN (great software by the way - highly recommend if you're using Subversion on Windows)! The man pages are like reading rocket science papers!

Before Git, I used Subversion, both in university and in a professional setting. I mention this because people often say that if you understand Git, you must never have really used any other SCM system. This is simply false: the human mind is more than capable of understanding many different ways of doing "the same thing" (in this context, that "thing" is source code control).

In this post, I'd like to share the Git knowledge you need to be quite productive. I'm not going to explain the internals of Git, because I don't know them! I have no idea how Git is implemented. I read an article about it some time ago, but knowledge of Git's internals is completely unnecessary for using Git effectively. Do you think a carpenter knows exactly what alloy their hammer's head is made of? Or where the wood for its handle came from?

Note: this will not be a fully self-contained tutorial. I expect the reader, as someone who uses Git, to be familiar with looking up and reading Git documentation, at least the manpages that are available via the --help flag for each Git command.

Create a New Repository: init

If you want to create your very own Git repository, I have just one word for you: init.

$ mkdir -p myrepo && cd myrepo
$ git init

Voilà! You've got yourself your very own Git repository.

Read-Only Interaction: clone, pull, fetch

These are likely the first commands you will ever use with Git. You will want to pull the contents of a Git repository so that you can interact with it on your machine.

clone is used when you want to, well, clone the repository that exists on the Git server to your local machine:

$ git clone https://github.com/BurntSushi/bstr.git

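pull is used when you want to bring the latest changes from the server into your local copy. Under the hood, it fetches them and then integrates them (by default, with a merge) into your current branch.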

# Inside the bstr repository folder, on your machine.
$ git pull

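fetch is used when you want to download the latest commits and branches from the server without touching your own branches or files. It only updates your remote-tracking branches, such as origin/master.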

# Inside the bstr repository folder, on your machine.
$ git fetch

There are many flags for each of these commands, and I highly recommend you run the following:

$ git clone --help
$ git pull --help
$ git fetch --help

And read through at least the first few paragraphs of the manpage that is presented to you. Most of the time, you will only ever use these without any flags. But just in case, here are some flags that I've used, quite rarely:

For clone:

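  • --depth: make a shallow clone containing only the most recent commits
  • --bare: clone only the repository data, with no working files (handy for hosting a repository)

For pull:

  • --ff-only: only update your branch if it can be fast-forwarded; never create a merge commit

For fetch:

  • --prune: remove remote-tracking branches that have been deleted on the server
  • --all: fetch from every configured remote, not just origin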
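To see clone, fetch, and pull (plus --bare, --depth, --prune, and --ff-only) work together without touching the network, here's a rough sketch. It assumes Git 2.28 or newer (for git init -b) and uses a local bare repository, server.git, as a stand-in for the Git server; every directory name and the commit identity are hypothetical.

```shell
set -eu
# Hypothetical identity, exported for this script only (your git config is untouched).
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"

# Seed a tiny repository, then make a bare copy of it to act as the "server".
git init -q -b master seed
(cd seed && echo "v1" > notes.txt && git add notes.txt && git commit -q -m "first commit")
git clone -q --bare seed server.git

# Our working copy, plus a teammate who pushes a new commit to the server.
git clone -q server.git mine
git clone -q server.git teammate
(cd teammate && echo "v2" >> notes.txt && git commit -q -am "second commit" &&
  git push -q origin master)

cd mine
git fetch -q --prune origin              # download new commits; notes.txt still says only v1
git log --oneline master..origin/master  # commits fetched but not yet merged
git pull -q --ff-only                    # fast-forward master; refuse to create a merge commit

# A shallow clone holding only the latest commit (--depth needs a file:// URL locally).
git clone -q --depth 1 "file://$sandbox/server.git" "$sandbox/shallow"
```

Swap the local paths for a real repository URL and the same commands apply unchanged.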

Interacting With Repository History: log, whatchanged

A Git repository is valuable because it shows you an immutable log of all changes that ever occurred to a codebase (well, not really immutable, but we'll talk about re-writing history later).

Each contribution to a repository is called a commit, and each commit has a unique identifier, which in Git is a hash code that is displayed in hex format.

To get a view of the history of a repository, starting from the latest commit, you can use the log command:

# Inside the bstr repository folder, on your machine.
$ git log

The log command is highly configurable, but in general the default settings are usable. Here are some flags that you might find useful:

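  • --stat: list the files each commit touched, with counts of changed lines
  • --oneline: show each commit on a single line (short hash and subject)
  • --decorate: label commits with branch and tag names
  • --graph: draw the branch and merge structure as ASCII art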

Searching the interwebs for git log aliases will give you a large list to choose from.

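whatchanged is similar to git log, but it also lists the files each commit changed (in Git's "raw" diff format); add -p to see the exact changes. Its own manpage encourages new users to use git log instead, and git log --raw shows much the same information.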

Some useful flags:

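  • -p: show the full patch for each commit
  • --abbrev-commit: show shortened commit hashes
  • --pretty=medium: the standard multi-line format (author, date, message)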
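Here's a small sketch that builds a three-commit repository and runs the log flags above, assuming Git 2.28+ for git init -b. Since whatchanged's own manpage steers people toward git log, the sketch uses git log --raw and git log -p for similar views; the lg alias name and the file are just examples.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master

# Three small commits to browse.
for n in 1 2 3; do
  echo "line $n" >> history.txt
  git add history.txt
  git commit -q -m "commit number $n"
done

git log --oneline                      # short hash + subject, newest first
git log --stat -1                      # latest commit with a per-file summary
git log --oneline --decorate --graph   # branch labels and an ASCII graph
git log --raw                          # files changed per commit, like whatchanged
git log -p -1                          # latest commit with its full patch

# A repo-local alias (add --global to use it everywhere).
git config alias.lg "log --oneline --decorate --graph --all"
git lg
```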

Writing To A Local Repository: checkout, status, diff, add, commit

So now you want to contribute. In order to do so, create a new branch:

# Inside the bstr repository folder, on your machine.
$ git checkout -b my-awesome-change

checkout -b creates a new branch, starting from the commit you're currently on, and switches you to it.

Adding New Content

Now, make your changes. First, let's add a new file that's not present in the repository.

# Inside the bstr repository folder, on your machine.
$ echo "This is my cool doc!" >> mycooldoc.md

Now, before you finalize your contribution, let's check the status of the repository:

# Inside the bstr repository folder, on your machine.
$ git status

Reading the output of the command, you should see that mycooldoc.md is what Git calls an "untracked file", which means exactly what it sounds like. We need to tell Git to track it.

# Inside the bstr repository folder, on your machine.
$ git add mycooldoc.md

The add command stages any changes that you've made, whether it's a new file or changes to an existing file, so that they'll be part of your next commit. But you're not out of the woods yet. In order to finalize your changes and record your new file in the repository and its history, you need to commit:

# Inside the bstr repository folder, on your machine.
$ git commit -m "add my cool doc - only touch it if you're cool"

commit means that you're binding these changes to the repository - the content of these changes, as well as the history of these changes, including the message that you provide. The -m flag stands for "message". There are also good practices regarding commit messages that you should learn to make your commit messages effective and useful for other developers.
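The whole add-a-new-file flow can be replayed in a scratch repository. The sketch below assumes Git 2.28+ (for git init -b) and starts from an empty commit so there's something to branch from; git status --short is used instead of plain git status to keep the output to one line per file.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
git commit -q --allow-empty -m "initial commit"   # something for the branch to start from

git checkout -q -b my-awesome-change
echo "This is my cool doc!" >> mycooldoc.md

git status --short   # ?? mycooldoc.md  (untracked)
git add mycooldoc.md
git status --short   # A  mycooldoc.md  (staged for the next commit)
git commit -q -m "add my cool doc - only touch it if you're cool"
git status --short   # no output: nothing left to commit
```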

Modifying Existing Content

Now that we've added mycooldoc.md, let's edit it and add some more useful information.

$ echo "You can contact me at psych.get.lost@cooldude.io" >> mycooldoc.md

In general, before you add your changes, you'll want to check the diff-erence between the state of things before your change and after it. That's what diff does (it shows the changes you haven't add-ed yet):

$ git diff

Reading a diff quickly takes some practice, but essentially, a line starting with a "+" sign is an addition, and a line starting with a "-" sign is a deletion. For non-contrived contributions (unlike this one), you will likely see a mix of both in your adventures.

Now that we're sure our changes are good, the process for add-ing and commit-ing is the same:

$ git add mycooldoc.md
$ git commit -m "update mycooldoc.md with my e-mail!"

Some flags you might be interested in:

add:

  • -A: stage changes to all files, including new and deleted ones. Use this sparingly and only when you're sure that you're not adding secret files to your repository (big no-no).
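Here's a sketch of the modify, diff, add, commit loop, with git diff --staged added to show where a change goes once it's add-ed, and a quick look at what -A picks up. It assumes Git 2.28+; notes.tmp is a made-up stray file.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
echo "This is my cool doc!" > mycooldoc.md
git add mycooldoc.md
git commit -q -m "add my cool doc"

echo "You can contact me at psych.get.lost@cooldude.io" >> mycooldoc.md
git diff            # the new line appears with a leading "+"
git add mycooldoc.md
git diff            # empty: nothing is left un-staged...
git diff --staged   # ...because the change now sits in the staging area
git commit -q -m "update mycooldoc.md with my e-mail!"

# -A stages everything, new files included, so look before you leap.
echo "scratch" > notes.tmp
git status --short  # ?? notes.tmp
git add -A
git status --short  # A  notes.tmp
```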

Merging Your Changes: merge

This whole time, we've been working on a non-master/non-release branch. This is intentional. In general, it is good practice to add all of your changes to a branch that is not the master branch, unless you're the only contributor.

But at some point, you want to merge your stuff to master.

There are many ways of doing this, what with GitHub's pull requests, web UI, and so on, but here's how to do it in pure Git, irrespective of web platform.

First, go back to the branch that we want to merge our changes into. Let's say that this branch is called master:

$ git checkout master

Now, merge!

$ git merge my-awesome-change

You should see that the merge completed successfully.
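A minimal local sketch of the branch-then-merge flow follows (Git 2.28+ assumed; app.txt is a made-up file). Because master gained no commits of its own while we worked, the merge is a fast-forward: Git simply moves master forward to the branch's latest commit, and no merge commit is created.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
echo "base" > app.txt
git add app.txt
git commit -q -m "base"

git checkout -q -b my-awesome-change
echo "feature" >> app.txt
git commit -q -am "add feature"

git checkout -q master
git merge my-awesome-change   # prints "Fast-forward"
git log --oneline --graph
```

If master had moved on in the meantime, the same command would create a merge commit instead; git merge --no-ff forces one even when a fast-forward is possible.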

Updating Your Branch: rebase, merge

So far, we've been working in isolation, and all of our changes, even the merge, have been local. That's one of the great things about Git - all the stuff you do, it's on your own working copy 99% of the time. We'll talk about publishing your changes to the world (i.e., the Git server) next.

But one important thing to keep in mind is, as you're working on something, so are your fellow devs. That means that your branch is likely to go out of date as time passes, especially if there's a lot of active development in that particular repository.

So how do you stay up to date?

There are two ways. One is the way of the Light Side of the Force, and the other is the way of the Dark Side. I can't tell you which to choose - I can only show you the way.

Light Side: fetch and merge

Say you're working on a branch called supersecure in the Linux kernel. You've been going along for a week, but you're working on code that hundreds of other developers also work on.

In order to make sure your changes are still valid with the current state of the kernel's master, you'd like to pull in their changes.

# Inside the repository, on your supersecure branch
$ git fetch
$ git merge origin/master

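fetch downloads the latest commits from the server and updates your remote-tracking branches, such as origin/master, without touching your own branches. Then, merge brings everything that landed on master since the commit where you branched out into supersecure. (Merging plain master would only bring in your local copy of master, which fetch does not update.)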

Now, most of the time, Git does merges very well. Exceptionally well, in fact.

However, when it doesn't, it's your job to fix it. Git calls merges that it can't figure out merge conflicts.

Most editors have support for recognizing merge conflicts. Since you know your code better than Git, Git trusts you to fix the conflict.

In the event of a conflict, a merge is actually still in progress. If you check git status, Git will tell you exactly what's wrong.

Once you've fixed your conflicts, you add those fixed files to the tree, as usual. Then, you commit, as usual.

$ git status
--- Output Snipped, But We Have a Conflict! ---
$ vim conflicted_file.c
$ git add conflicted_file.c
$ git commit -m "Merge master into supersecure"
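To see a conflict without waiting for hundreds of kernel developers, the sketch below manufactures one: both branches change the same line of a hypothetical config.txt. It assumes Git 2.28+, and a local master branch stands in for origin/master so no server is needed; the "resolution" is scripted where you would normally use your editor.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
printf 'timeout = 10\n' > config.txt
git add config.txt
git commit -q -m "add config"

git checkout -q -b supersecure
printf 'timeout = 30\n' > config.txt
git commit -q -am "raise timeout"

git checkout -q master
printf 'timeout = 5\n' > config.txt
git commit -q -am "lower timeout"

git checkout -q supersecure
# Both branches changed the same line, so the merge stops with a conflict.
if ! git merge master; then
  git status --short              # UU config.txt  (unmerged: both sides modified it)
  grep -c '^<<<<<<<' config.txt   # conflict markers are now inside the file
  # "Fix" the file by writing the version we want, then add and commit as usual.
  printf 'timeout = 30\n' > config.txt
  git add config.txt
  git commit -q -m "Merge master into supersecure"
fi
```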

Dark Side: fetch and rebase

There's no easy way to say it - rebases are a tricky beast.

Most of the time, rebasing is more time-consuming than merging, because you will likely spend more time resolving conflicts.

The way rebasing works, roughly, is as follows.

Say you branched supersecure from the master branch at commit A1.

You committed 10 times to supersecure, from commit S1 to S10.

But now, master is at commit A50 (49 commits later). And you decide you want to rebase, because you want the sweet new changes on there.

rebase does literally exactly what it says:

  • re: to do something again
  • base: the verb base, as in, "base this on X", "Einstein based it on Y" - in the context of Git, the "base" is the commit at which you branched out. In our case A1.

Taken together - rebase means we are basing our branch off of the latest commit - A50.

To do this, Git effectively takes your commits - S1 to S10 - and applies them, one by one, to an imaginary new branch that it creates temporarily, that is branched off of A50.

Imagine a literal branch that you slide up a tree - upwards means forward in time in this case - that's what Git is doing. And since Git applies the commits one by one - in order to not lose any history - at each stage of the rebase, you could run into merge conflicts.

This is where the history re-write comes in: the base of your branch is now different, and S1 to S10 are replaced by new copies of themselves, with new commit hashes.

In the worst case, you could end up resolving conflicts 10 different times - one for each commit of S1 to S10.

In the best case, however, there won't be any conflicts, and you get rid of those Pesky Merge Commits (PMCs).

With the explanation out of the way, here are the basics:

$ git fetch
$ git rebase origin/master

Walk this line with caution. In general, rebasing is riskier, because it usually involves more work. Whether the ROI is worth it, I'll leave up to you.
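Here's a scaled-down version of the A1 / S1 to S10 story (three commits on each side instead of ten and fifty), assuming Git 2.28+. A local master branch stands in for origin/master so no server is needed, and the S commits touch their own files, so this is the best case: no conflicts. git merge-base prints the commit a branch grew out of, which makes the new base visible.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
echo "A1" > base.txt
git add base.txt
git commit -q -m "A1"

# Branch off at A1 and commit S1..S3.
git checkout -q -b supersecure
for n in 1 2 3; do
  echo "S$n" > "s$n.txt" && git add "s$n.txt" && git commit -q -m "S$n"
done

# Meanwhile, master moves on to A3.
git checkout -q master
for n in 2 3; do
  echo "A$n" >> base.txt && git commit -q -am "A$n"
done

git checkout -q supersecure
git merge-base master supersecure   # before: the A1 commit
git rebase -q master                # replay S1..S3 on top of A3
git merge-base master supersecure   # after: the A3 commit
git log --oneline --graph           # one straight line, no merge commit
```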

Why is this the Dark Side? Well, you're essentially re-writing history at this point.

You originally branched off of A1. That's what I saw before you rebased. But now, you're saying you're branched off of A50. What are you, some kind of liar?

Rebasing, along with squashing commits (not going to explain that here - but if you're curious, it's a DuckDuckGo search away) is essentially mutating the history of the Git repository.

As a purist, I think Git is more effective as a tool if it is viewed as a set of transformative operations, with the history of these operations as immutable. However, rebasing exists, and it is generally up to a particular repository's best practices whether you should be using it or not.

Publishing Your Changes: push

Up until now, we haven't affected the outside world. All of our changes are completely local. In order to publish our changes, we need to push them to the Git server.

This is most commonly done on a non-master branch.

$ git checkout -b superfun
$ git push origin superfun

The word origin here refers to what Git calls a remote - it's basically a Git server. The remote can have any name. To check what your remotes are, you can do:

$ git remote

And Git will list all the configured remotes (add -v to see their URLs as well). Note that you don't have to type an explicit URL every time: it's configured once, either automatically when you clone the repository (the remote you cloned from is named origin), or with git remote add origin <url> after you init a new repository.
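A local sketch of publishing a branch: a bare repository plays the Git server, git remote add wires it up the way clone would have, and push -u additionally records origin/superfun as the branch's upstream so later pushes and pulls can omit the names. Git 2.28+ is assumed; all paths are hypothetical.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

sandbox="$(mktemp -d)"
cd "$sandbox"
git init -q --bare server.git        # stand-in for the Git server
git init -q -b master work
cd work
git remote add origin "$sandbox/server.git"   # what clone would have set up for us
git remote -v                                 # remote names and their URLs

echo "fun" > fun.txt
git add fun.txt
git commit -q -m "add fun"
git checkout -q -b superfun
git push -q -u origin superfun   # publish, and remember origin/superfun as upstream
git branch -vv                   # shows [origin/superfun] next to superfun
```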

Reverting Your Changes: revert, checkout, reset

The above operations are probably 90% of what you will be doing in Git. You will be either browsing code in a repository, browsing the repository's history, or making changes to the repository via branching, adding, committing, and merging.

But there are still some common cases, like reverting commits, undoing changes that you haven't yet add-ed, and undoing commits that haven't been pushed yet, that we'll cover.

Reverting A Pushed Commit

At some point you'll make a mistake (unless you're Linus Torvalds) and commit something that breaks tests, or is flat out wrong, and that's fine. Git has an easy way to remedy this.

Say you've narrowed down the faulty commit to hash A0. Copy this commit hash to your clipboard, and do the following:

$ git revert --no-commit A0

(Of course, the full hash will be longer, this is simply for illustration purposes).

The --no-commit flag is optional, but it basically means: undo all the changes made in this commit and stage the result, but don't create a new commit automatically; you commit it yourself when you're ready. The default behavior is for Git to create a brand new commit that reverses the commit you specified, with an automatically generated message along the lines of Revert "<original commit message>", followed by "This reverts commit <commit-hash>." If you want this behavior, simply omit --no-commit.

Sometimes, reverts can fail with conflicts. This can happen if the change you're trying to revert has been changed again by later commits. In such cases, you have two options:

  • Revert all commits, in reverse order, until you reach the commit you want to revert. This can be an extreme approach, especially if there are valid commits after the commit you want to revert.

  • Manually revert the changes yourself in an editor. This is usually the superior method, and you should opt for it if you want to keep your history clean.
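Here's a sketch of both revert styles on a tiny repository, assuming Git 2.28+. calc.txt and its commits are made up; --no-edit is added so the default revert accepts the generated message without opening an editor.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
echo "answer = 42" > calc.txt
git add calc.txt
git commit -q -m "add calc"
echo "answer = 41" > calc.txt
git commit -q -am "break calc"      # the faulty commit
bad="$(git rev-parse HEAD)"         # normally you'd copy this hash from git log

# Default style: a new commit undoes $bad; its subject is: Revert "break calc"
git revert --no-edit "$bad"
cat calc.txt                        # answer = 42 again, and history keeps both commits

# --no-commit style: the undo is only staged, and you commit it yourself.
echo "answer = 40" > calc.txt
git commit -q -am "break calc again"
git revert --no-commit HEAD
git status --short                  # M  calc.txt  (staged, not yet committed)
git commit -q -m "Undo 'break calc again'"
```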

Undoing an Un-pushed Commit

Sometimes, you commit some stuff by mistake, but luckily, you haven't pushed it.

This is easy to remedy.

The reset command will do it for you.

If you want to basically "delete" the last commit that you added, do the following:

$ git reset HEAD~1

HEAD is a special reference to the commit you currently have checked out, which is normally the latest commit on the branch you are working on. The ~1 means "one commit before". In all, you're saying, "reset me to 1 commit before the current HEAD".

If you check git status, you will see your changes un-staged. You can do this with as many commits as you like: HEAD~3 "deletes" the last 3 commits. Note that this doesn't actually delete your changes: the commits are no longer part of your branch, but the changes they made are still in your working copy, just un-staged.
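A quick sketch (Git 2.28+ assumed, hypothetical file.txt) showing that reset HEAD~1 moves the branch back one commit while the committed change stays in your working copy. If you reset too far, git reflog still lists the old commit hashes for a while.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
echo "keep" > file.txt
git add file.txt
git commit -q -m "good commit"
echo "oops" >> file.txt
git commit -q -am "accidental commit"

git rev-parse --short HEAD~1   # the parent commit ("good commit")
git reset -q HEAD~1            # move the branch back to it
git log --oneline              # only "good commit" remains on the branch
git status --short             #  M file.txt  (the change survived, un-staged)
```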

Undoing Un-Staged Changes

Sometimes, you mess around and add some code to a file, but that code isn't really useful, and you want to reset it to what it was before.

Additionally, you didn't add or commit any of these changes.

You can delete these lines of code manually, but why bother, since Git knows exactly what lines were added?

To do this, do the following:

$ git checkout -- thefileyoutouched.c

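Notice the --. That isn't a typo: it's two dashes right after each other, and it tells Git that what follows is a file path, not a branch name. Be careful, though: edits thrown away this way are gone for good, since they were never committed.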
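Here's that in a scratch repository (Git 2.28+ assumed; the file contents are made up). On Git 2.23 and newer, git restore thefileyoutouched.c does the same job with a less overloaded command.

```shell
set -eu
# Hypothetical identity, exported for this script only.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
printf 'int main(void) { return 0; }\n' > thefileyoutouched.c
git add thefileyoutouched.c
git commit -q -m "add main"

echo "// experimental junk" >> thefileyoutouched.c
git diff --stat                       # 1 file changed, 1 insertion
git checkout -- thefileyoutouched.c   # discard the un-staged edits to this file
git status --short                    # no output: back to the committed version
```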

Conclusion

This post is by no means a comprehensive description of the capabilities of Git - in fact, it's meant as the opposite, more of a "Git at a Glance".

You will almost certainly run into some things that I haven't described in this post.

But my hope was to describe to you what 99% of your time using Git will be like.

Bear blog doesn't have comment capabilities just yet, but I'm sure I'll be getting feedback some way or other. Happy coding!
