How to Use Git
#git #sourcecontrol #tutorial
Every once in a while, there is a post that makes it to the top 10 of either HackerNews or /r/programming that has Git as a topic. It's either about Git itself, perhaps some obscure trick, or some huge flaw in the CLI, or about how much more complicated it is than the good-ole-days of Subversion, Perforce, and how Mercurial is so much simpler.
The comments section almost always devolves into a Git bashing fest. Not that I'm in love with Git - it's kind of like a carpenter saying he's in love with his hammer. I have been using Git daily since 2012, and around 2015, I felt like I finally reached a level of productivity that I was comfortable with.
Before that point, I was all aboard the Git-bashing train. Its CLI UI sucks!
checkout does so many things! There's nothing as good as TortoiseSVN (great software by the way - highly recommend if you're using Subversion on Windows)! The man pages are like reading rocket science papers!
Before Git, I used Subversion, both in university and in a professional setting. I mention this because most people say that if you understand Git, you never really used any other SCM system. This is simply false: the human mind is more than capable of understanding many different methods of doing "the same thing" (in this context, this "thing" is source code control).
In this post, I'd like to share the knowledge of Git needed to be quite productive. I'm not going to explain the internals of Git - because I don't know them! I have no idea how Git is implemented. I read an article about it some time ago, but knowledge of Git's internals is completely unnecessary for understanding how to use Git effectively. Do you think a carpenter knows exactly what their hammer's metal alloy is? Or what wood its handle is made from?
Note: this will not be a fully self-contained tutorial. I expect the reader, as someone who uses Git, to be familiar with looking up and reading Git documentation, at least the manpages that are available via the
--help flag for each Git command.
Create a New Repository:
If you want to create your very own Git repository, I have just one word for you:
$ mkdir -p myrepo && cd myrepo
$ git init
Voila! You have yourself your own Git repository.
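If you'd like to convince yourself that nothing magical happened, here's a small sketch you can run. The scratch directory from mktemp is my own addition (an assumption, so the demo doesn't touch your real files); the rest is the same two commands from above.

```shell
set -e
work=$(mktemp -d)            # scratch directory (assumption, not part of the tutorial)
cd "$work"

mkdir -p myrepo && cd myrepo
git init -q

# A fresh repository is just a normal folder plus a hidden .git directory.
ls -a
git rev-parse --is-inside-work-tree   # prints "true"
```

Deleting the .git directory turns the folder back into a plain folder, so this is a cheap thing to experiment with.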
The clone, pull, and fetch commands are likely the first commands you will ever use with Git. You will want to pull the contents of a Git repository so that you can interact with it on your machine.
clone is used when you want to, well, clone the repository that exists on the Git server to your local machine:
$ git clone https://github.com/BurntSushi/bstr.git
pull is used when you want to pull the latest changes made to the repository to your local machine.
# Inside the bstr repository folder, on your machine.
$ git pull
fetch is used when you want to download the latest commits and branch information from the Git server without changing your own branches or files.
# Inside the bstr repository folder, on your machine.
$ git fetch
There are many flags for each of these commands, and I highly recommend you run the following:
$ git clone --help
$ git pull --help
$ git fetch --help
And read through at least the first few paragraphs of the manpage that is presented to you. Most of the time, you will only ever use these without any flags. But just in case, here are some flags that I've used, quite rarely:
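The difference between fetch and pull is easier to see than to describe, so here's a rough sketch. Everything in it is made up for the demo: a local bare repository (server.git) plays the Git server, "alice" and "bob" are two hypothetical clones, and the exported identity is a placeholder so commits work on an unconfigured machine. It also assumes Git 2.28 or newer for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"

# Build a tiny repository and turn it into a stand-in "server".
git init -q -b master seed
cd seed
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first commit"
cd ..
git clone -q --bare seed server.git

# Two developers clone it.
git clone -q server.git alice
git clone -q server.git bob

# Bob publishes a new commit.
cd bob
echo "v2" >> notes.txt
git commit -q -am "second commit"
git push -q origin master
cd ../alice

# fetch downloads Bob's commit but leaves Alice's branch and files alone.
git fetch -q
git log --oneline master          # still 1 commit
git log --oneline origin/master   # 2 commits

# pull is fetch plus a merge: now her branch and her files catch up.
git pull -q
git log --oneline master          # 2 commits
```

In day-to-day use, fetch is the safe "what's new?" command, and pull is the one that actually changes what you're looking at.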
Interacting With Repository History:
A Git repository is valuable because it shows you an immutable log of all changes that ever occurred to a codebase (well, not really immutable, but we'll talk about re-writing history later).
Each contribution to a repository is called a commit, and each commit has a unique identifier, which in Git is a hash code that is displayed in hex format.
To get a view of the history of a repository, starting from the latest commit, you can use the log command:
# Inside the bstr repository folder, on your machine.
$ git log
The log command is highly configurable, but in general the default settings are usable. Here are some flags that you might find useful:
Searching the interwebs for
git log aliases will give you a large list to choose from.
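For a feel of what a few common log flags do, here's a small sketch on a throwaway repository. The repository, the file name, and the commit identity are all invented for the demo, and the flag selection is mine, not a list of what you must know. It assumes Git 2.28 or newer for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo

# Three small commits to have some history to look at.
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "change $n"
done

git log --oneline                 # one line per commit: short hash and subject
git log -n 2                      # only the 2 most recent commits, full format
git log --oneline --graph --all   # ASCII graph of every branch
git log -p -n 1                   # latest commit together with its diff
git log --oneline -- notes.txt    # only commits that touched notes.txt
```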
whatchanged is similar to git log, but it also lists the files that were changed by each commit.
Some useful flags:
Writing To A Local Repository:
So now you want to contribute. In order to do so, create a new branch:
# Inside the bstr repository folder, on your machine.
$ git checkout -b my-awesome-change
checkout -b means "create a new branch and check it out", so the commits you make from here on land on my-awesome-change.
Adding New Content
Now, make your changes. First, let's add a new file that's not present in the repository.
# Inside the bstr repository folder, on your machine.
$ echo "This is my cool doc!" >> mycooldoc.md
Now, before you finalize your contribution, let's check the status of the repository:
# Inside the bstr repository folder, on your machine.
$ git status
Reading the output of the command, you should see that mycooldoc.md is what Git calls an "untracked file" - which means exactly what it sounds like. We need to tell Git to track it.
# Inside the bstr repository folder, on your machine.
$ git add mycooldoc.md
The add command is used to stage any changes that you've made - whether it's a new file or changes to an existing file - for the next commit. But you're not out of the woods yet. In order to finalize your changes and tell Git to add your new file to the repository and its history, you need to commit:
# Inside the bstr repository folder, on your machine.
$ git commit -m "add my cool doc - only touch it if you're cool"
commit means that you're binding these changes to the repository - the content of these changes, as well as the history of these changes, including the message that you provide. The
-m flag stands for "message". There are also good practices regarding commit messages that you should learn to make your commit messages effective and useful for other developers.
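To watch a file move through the untracked, staged, and committed states, here's the same sequence as a sketch on a throwaway repository, using the compact status --short output. The scratch repository and the commit identity are assumptions for the demo; it also assumes Git 2.28 or newer for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo

echo "This is my cool doc!" >> mycooldoc.md
git status --short     # "?? mycooldoc.md": untracked

git add mycooldoc.md
git status --short     # "A  mycooldoc.md": staged for the next commit

git commit -q -m "add my cool doc - only touch it if you're cool"
git status --short     # prints nothing: the working copy is clean
git log --oneline      # the new commit is now part of the history
```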
Modifying Existing Content
Now that we've added mycooldoc.md, let's edit it and add some more useful information.
$ echo "You can contact me at firstname.lastname@example.org" >> mycooldoc.md
In general, before you add your changes, you'll want to check the diff-erence between the state of things before your change and after it. That's what the diff command is for:
$ git diff
Reading a diff quickly takes some practice, but essentially, a line with a "+" sign is an addition, and a line with a "-" sign is a deletion. For non-contrived contributions (unlike the current session) you will likely see a mix of both in your adventures.
Now that we're sure our changes are good, the process for
commit-ing is the same:
$ git add mycooldoc.md
$ git commit -m "update mycooldoc.md with my e-mail!"
Some flags you might be interested in:
-A: add changes made to all files. Use this sparingly and only when you're sure that you're not adding secret files to your repository (big no-no).
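One thing that tends to trip people up: once you add a change, plain git diff stops showing it, and you need git diff --staged to see what is about to be committed. Here's a rough sketch of that on a throwaway repository (the repository, the staged_diff variable, and the commit identity are assumptions for the demo; Git 2.28 or newer is assumed for init -b).

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "This is my cool doc!" >> mycooldoc.md
git add mycooldoc.md
git commit -q -m "add my cool doc"

# Edit the tracked file, then look at the change before staging it.
echo "You can contact me at firstname.lastname@example.org" >> mycooldoc.md
git diff                       # the new line shows up prefixed with "+"

git add mycooldoc.md
git diff --quiet && echo "nothing left unstaged"

# The staged change is still visible, just under a different flag.
staged_diff=$(git diff --staged)
echo "$staged_diff"

git commit -q -m "update mycooldoc.md with my e-mail!"
```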
Merging Your Changes:
This whole time, we've been working on a non-master/non-release branch. This is intentional. In general, it is good practice to add all of your changes to a branch that is not the master branch, unless you're the only contributor.
But at some point, you want to merge your stuff to master.
There are many variations on doing this, what with GitHub having pull requests, a web UI, and so on, but here's how to do it for pure Git, irrespective of web platform.
First, go back to the branch that we want to merge our changes into. Let's say that this branch is called master. Check it out, then merge your branch into it:
$ git checkout master
$ git merge my-awesome-change
You should see that the merge completed successfully.
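Here's the whole branch-and-merge round trip as a sketch on a throwaway repository. The repository, file names, and commit identity are made up for the demo, and Git 2.28 or newer is assumed for init -b. Since nothing new landed on master in the meantime, Git performs what it calls a fast-forward: master simply moves up to the tip of my-awesome-change and no separate merge commit is created.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

# Do the work on a separate branch.
git checkout -q -b my-awesome-change
echo "This is my cool doc!" > mycooldoc.md
git add mycooldoc.md
git commit -q -m "add my cool doc"

# Go back to master and merge the branch in.
git checkout -q master
git merge -q my-awesome-change

git log --oneline      # both commits are now on master
ls                     # mycooldoc.md is present on master
```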
Updating Your Branch:
So far, we've been working in isolation, and all of our changes, even the merge, have been local. That's one of the great things about Git - all the stuff you do, it's on your own working copy 99% of the time. We'll talk about publishing your changes to the world (i.e., the Git server) next.
But one important thing to keep in mind is, as you're working on something, so are your fellow devs. That means that your branch is likely to go out of date as time passes, especially if there's a lot of active development in that particular repository.
So how do you stay up to date?
There are two ways. One is the way of the Light Side of the Force, and the Other is the way of the Dark Side. I can't tell you which to choose - I can only show you the way.
Say we're working on a branch
supersecure on the Linux kernel. You've been going along for a week but you're working on some code that 100s of other developers work on.
In order to make sure your changes are still valid with the current state of the kernel's
master, you'd like to pull in their changes.
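# Inside the repository, on your supersecure branch
$ git fetch
$ git merge origin/master
fetch downloads the latest updates for every branch from the Git server, including master, and stores them in remote-tracking branches such as origin/master; your local master is left untouched, which is why the merge above names origin/master. Then, merge merges into supersecure every change made to master since the commit at which you branched out.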
Now, most of the time, Git does merges very well. Exceptionally well, in fact.
However, when it doesn't, it's your job to fix it. Git calls merges that it can't figure out merge conflicts.
Most editors have support for recognizing merge conflicts. Since you know your code better than Git, Git trusts you to fix the conflict.
In the event of a conflict, a merge is actually still in progress. If you check
git status, Git will tell you exactly what's wrong.
Once you've fixed your conflicts, you
add those fixed files to the tree, as usual. Then, you
commit, as usual.
$ git status
--- Output Snipped, But We Have a Conflict! ---
$ vim conflicted_file.c
$ git add conflicted_file.c
$ git commit -m "Merge master into supersecure"
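If you've never seen a conflict, it's worth causing one on purpose. The sketch below does that on a throwaway repository: both branches change the same line of a hypothetical settings.txt, the merge stops, and the conflict is resolved by writing the wanted line back (standing in for the editor session). The file, its contents, and the commit identity are assumptions for the demo; Git 2.28 or newer is assumed for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "timeout = 10" > settings.txt
git add settings.txt
git commit -q -m "base"

# The same line is changed differently on each branch.
git checkout -q -b supersecure
echo "timeout = 20" > settings.txt
git commit -q -am "supersecure: raise timeout"
git checkout -q master
echo "timeout = 5" > settings.txt
git commit -q -am "master: lower timeout"

# Merging master into supersecure now stops with a conflict.
git checkout -q supersecure
git merge master || echo "merge stopped: conflict"
git status --short     # "UU settings.txt": both sides modified it
cat settings.txt       # contains <<<<<<<, ======= and >>>>>>> markers

# Resolve by deciding what the file should contain, then add and commit.
echo "timeout = 20" > settings.txt
git add settings.txt
git commit -q -m "Merge master into supersecure"
git log --oneline --graph
```

The final commit has two parents, which is exactly what makes it a merge commit.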
There's no easy way to say it - rebases are a tricky beast.
Most of the time, rebasing is more time-consuming than merging. The reason is that you will likely spend more time resolving conflicts.
The way rebasing works, roughly, is as follows.
Say you branched
supersecure from the master branch at commit A1.
You committed 10 times to
supersecure, from commit S1 to S10.
master is at commit A50 (49 commits later). And you decide you want to rebase, because you want the sweet new changes on there.
rebase does literally exactly what it says:
- re: to do something again
- base: the verb base, as in, "base this on X", "Einstein based it on Y" - in the context of Git, the "base" is the commit at which you branched out. In our case A1.
Taken together -
rebase means we are basing our branch off of the latest commit - A50.
To do this, Git effectively takes your commits - S1 to S10 - and applies them, one by one, to an imaginary new branch that it creates temporarily, that is branched off of A50.
Imagine a literal branch that you slide up a tree - upwards means forward in time in this case - that's what Git is doing. And since Git applies the commits one by one - in order to not lose any history - at each stage of the rebase, you could run into merge conflicts.
The history re-write is precisely the fact that the base of your branch is now different.
In the worst case, you could end up resolving conflicts 10 different times - one for each commit of S1 to S10.
In the best case, however, there won't be any conflicts, and you get rid of those Pesky Merge Commits (PMCs).
With the explanation out of the way, here are the basics:
# fetch updates origin/master, not your local master
$ git fetch
$ git rebase origin/master
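To see the "sliding up the tree" in action, here's a scaled-down sketch of the scenario above: two commits on supersecure (S1, S2) and one new commit on master (A2) instead of ten and forty-nine. There's no server in this sketch, so it rebases onto the local master directly. The repository, file names, the old_tip and new_tip variables, and the commit identity are all assumptions for the demo; Git 2.28 or newer is assumed for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "a1" > a.txt && git add a.txt && git commit -q -m "A1"

# Branch off at A1 and commit twice.
git checkout -q -b supersecure
echo "s1" > s.txt && git add s.txt && git commit -q -m "S1"
echo "s2" >> s.txt && git commit -q -am "S2"

# Meanwhile, master moves on.
git checkout -q master
echo "a2" >> a.txt && git commit -q -am "A2"

# Rebase supersecure onto the new tip of master.
git checkout -q supersecure
old_tip=$(git rev-parse HEAD)
git rebase -q master
new_tip=$(git rev-parse HEAD)

git log --oneline      # S2, S1, A2, A1: one straight line, no merge commit
[ "$old_tip" != "$new_tip" ] && echo "S2 got a new hash: history was rewritten"
```

The commit messages and contents are the same as before, but the hashes are not, which is why rebasing a branch that other people already pulled causes trouble.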
Walk this line with caution. In general, rebasing is more dangerous, because it usually involves more work. The ROI on it, however, that I'll leave up to you.
Why is this the Dark Side? Well, you're essentially re-writing history at this point.
You originally branched off of A1. That's what I saw before you rebased. But now, you're saying you're branched off of A50. What are you, some kind of liar?
Rebasing, along with squashing commits (not going to explain that here - but if you're curious, it's a DuckDuckGo search away) is essentially mutating the history of the Git repository.
As a purist, I think Git is more effective as a tool if it is viewed as a set of transformative operations, with the history of these operations as immutable. However, rebasing exists, and it is generally up to a particular repository's best practices whether you should be using it or not.
Publishing Your Changes:
Up until now, we haven't affected the outside world. All of our changes are completely local. In order to publish our changes, we need to
push them to the Git server.
This is most commonly done on a non-master branch.
$ git checkout -b superfun
$ git push origin superfun
origin here refers to what Git calls a remote - it's basically a Git server. The remote can have any name. To check what your remotes are, you can do:
$ git remote
And Git will list all the configured remotes. Note that you don't have to type an explicit URL when you push - the URL is stored at configuration time, either when you add a remote (git remote add) to a repository you created with init, or automatically when you clone the repository for the first time.
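You can try pushing without a real server: a bare repository on your own disk works as a remote. The sketch below sets one up, registers it as origin, and pushes the superfun branch to it. The paths, file names, and commit identity are assumptions for the demo; Git 2.28 or newer is assumed for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"

# A bare repository stands in for the Git server.
git init -q --bare server.git

git init -q -b master repo && cd repo
git remote add origin "$work/server.git"   # this is where the URL gets stored
echo "hi" > readme.txt
git add readme.txt
git commit -q -m "initial commit"

git checkout -q -b superfun
echo "fun" >> readme.txt
git commit -q -am "make it fun"
git push -q origin superfun

git remote -v                   # the name "origin" and the URL behind it
git ls-remote --heads origin    # the server now has refs/heads/superfun
```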
Reverting Your Changes:
The above operations are probably 90% of what you will be doing in Git. You will be either browsing code in a repository, browsing the repository's history, or making changes to the repository via branching, adding, committing, and merging.
But there are still some common cases, like reverting commits, undoing changes that you haven't yet
add-ed, and undoing commits that haven't been pushed yet, that we'll cover.
Reverting A Pushed Commit
At some point you'll make a mistake (unless you're Linus Torvalds) and commit something that breaks tests, or is flat out wrong, and that's fine. Git has an easy way to remedy this.
You narrowed down the faulty commit to hash A0. Copy this commit hash to your clipboard, and do the following:
$ git revert --no-commit A0
(Of course, the full hash will be longer, this is simply for illustration purposes).
The --no-commit flag is optional, but it basically means: revert all the changes I made in this commit, but don't create a new commit automatically for these changes. The default behavior is for Git to create a brand new commit that is basically a reversal of the commit you specified, with an automatically generated message along the lines of "Revert commit <commit-hash>". If you want this behavior, simply omit --no-commit.
Sometimes, reverts can fail. This can happen if the change you're trying to revert has itself been changed in future commits. In such cases, you have two options:
- Revert all commits, in reverse order, until you reach the commit you want to reverse. This can be an extreme approach, especially if there are valid commits past the commit you want to revert.
- Manually revert the changes yourself in an editor. This is usually the superior method, and you should opt for it if you want to keep your history clean.
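Here's the happy path of a revert as a sketch on a throwaway repository, using the --no-commit variant so you can inspect the reversal before committing it. The app.txt file, its contents, the bad variable, and the commit identity are made up for the demo; Git 2.28 or newer is assumed for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "line one" > app.txt
git add app.txt
git commit -q -m "good commit"

echo "broken line" >> app.txt
git commit -q -am "bad commit"
bad=$(git rev-parse HEAD)        # the hash you would normally copy from git log

# Undo the bad commit, but review the result before committing it.
git revert --no-commit "$bad"
git status --short               # the reversal is staged but not yet committed
git commit -q -m "Revert the bad commit"

git log --oneline                # three commits: nothing was erased
cat app.txt                      # only "line one" is left
```

Note that the bad commit stays in the history; the revert is a new commit on top of it.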
Undoing an Un-pushed Commit
Sometimes, you commit some stuff by mistake, but luckily, you haven't pushed it.
This is easy to remedy.
The reset command will do it for you.
If you want to basically "delete" the last commit that you added, do the following:
$ git reset HEAD~1
HEAD is a special variable that basically refers to the latest commit on the branch you are currently working on. The
~1 roughly means "minus 1". In all, you're saying, "reset me 1 commit before current HEAD".
If you check git status, you will see your changes un-staged. You can do this with as many commits as you like - for example, HEAD~3 "deletes" the last 3 commits. Note that this doesn't actually delete your changes - the commits are gone from the branch's history, but your changes are still present, just un-staged.
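To check that reset really keeps your work around, here's a small sketch on a throwaway repository. The file name, its contents, and the commit identity are assumptions for the demo; Git 2.28 or newer is assumed for init -b.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "keep me" > file.txt
git add file.txt
git commit -q -m "first commit"

echo "oops" >> file.txt
git commit -q -am "committed by mistake"

# Drop the last commit from the branch, but keep its changes in the files.
git reset -q HEAD~1

git log --oneline      # only "first commit" is left
git status --short     # " M file.txt": modified, not staged
cat file.txt           # the "oops" line is still in the file
```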
Undoing Un-Staged Changes
Sometimes, you mess around and add some code to a file, but that code isn't really useful, and you want to reset it to what it was before.
Additionally, you didn't
commit any of these changes.
You can delete these lines of code manually, but why bother, since Git knows exactly what lines were added?
To do this, do the following:
$ git checkout -- thefileyoutouched.c
The -- isn't a typo. It's two dashes right after each other, and it tells Git that what follows is a file path rather than a branch name.
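Here's a sketch of that on a throwaway repository, so you can see the edit disappear. The file contents and the commit identity are assumptions for the demo; Git 2.28 or newer is assumed for init -b. Be careful with this one on real work: the discarded lines were never committed, so Git has no copy to bring back.

```shell
set -e
# Placeholder identity (assumption) so commits work without any Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work=$(mktemp -d) && cd "$work"
git init -q -b master repo && cd repo
echo "int main(void) { return 0; }" > thefileyoutouched.c
git add thefileyoutouched.c
git commit -q -m "add a tiny program"

# Mess around with the file without committing.
echo "// not really useful" >> thefileyoutouched.c
git status --short     # " M thefileyoutouched.c"

# Throw the un-staged edit away.
git checkout -- thefileyoutouched.c

git status --short     # prints nothing: back to the committed version
cat thefileyoutouched.c
```

Newer versions of Git (2.23 and later) also offer git restore thefileyoutouched.c for the same job, if you find that name easier to remember.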
This post is by no means a comprehensive description of the capabilities of Git - in fact, it's meant as the opposite, more of a "Git at a Glance".
You'll probably run into some things that I haven't described in this post.
But my hope was to describe to you what 99% of your time using Git will be like.
Bear blog doesn't have comment capabilities just yet, but I'm sure I'll be getting feedback some way or other. Happy coding!