Git: Evolved

13 minute read

Now that we have a basic understanding of how to use git, we will take a deeper look at how git works, and start to see how it really scales. In other words, it’s time to git gud.

Relevant xkcd (1597) for reference.


Collaboration

Version Control Systems really shine when you start to collaborate on projects. Git has some neat features to help keep your code organized.

.gitignore

Some files or directories are temporary or contain secrets, which are not meant for publication on your repository. Thus, we can tell git to automatically ignore these files, using a .gitignore file.

This file contains a list of all the patterns in file or directory names we wish to ignore when staging files.

Let’s look at an example - lines starting with # are comments, and not evaluated.

# ignore any .tmp files
*.tmp

# ignore anything starting with secret
secret*

# ignore any directory or file named hidden
hidden

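# ignore everything inside the docs directory (docs/* rather than docs/,
# as a file cannot be re-included once its parent directory is excluded)
docs/*

# but include docs/important.txt despite the above rule
!docs/important.txt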

# ignore anything named a inside of directory b (works recursively)
b/**/a

Notes

  • If an entry contains a slash (/) at the beginning or in the middle, the pattern is relative to the directory of the .gitignore file itself.
  • A single asterisk * matches anything within a file or directory name, but does not cross a /.
  • To match across multiple directory levels, use double asterisks **.
  • A question mark ? matches a single character (except /).
  • Regex-like ranges can be used, e.g. [1-9] or [a-zA-Z].
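
When a pattern does not behave as expected, git check-ignore can report which rule (if any) matches a path. The following is a minimal sketch in a throwaway repository; the file names (build.tmp, keep.txt and so on) are made up for the illustration.

```shell
set -eu
repo=$(mktemp -d)   # throwaway repository for the experiment
cd "$repo"
git init -q

# A small .gitignore using some of the patterns from the example above
printf '%s\n' '*.tmp' 'secret*' 'b/**/a' > .gitignore

mkdir -p b/c
touch build.tmp secret.txt b/c/a keep.txt

# -v prints the source file, line number and pattern that matched each path
git check-ignore -v build.tmp secret.txt b/c/a

# The exit status is non-zero when a path is not ignored
if git check-ignore -q keep.txt; then
    keep_ignored=yes
else
    keep_ignored=no
fi
echo "keep.txt ignored: $keep_ignored"
```

Because check-ignore only evaluates patterns, the files do not even have to exist, which makes it handy for trying out a rule before relying on it.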

Lastly, if you want to add a file, despite it being ignored, use the -f/--force option when staging files.

git add -f secret.txt

A pre-written .gitignore file for almost any language you could imagine can be found on github/gitignore.

Submodules

Submodules allow git repositories to be included inside of your main repository.

Say I want my sweet Tinggaard/pathfinding repository cloned inside of an existing repository, as I have some code that depends on it.

git submodule add git@github.com:Tinggaard/pathfinding [target-path]

This command creates a new file: .gitmodules. Furthermore, it creates the directory pathfinding/, unless another path has been specified.

To specify a branch, run the command with the -b option, followed by the branch name.

Note: text in brackets ([]) are optional arguments, and text in angle brackets (chevrons) (<>) are placeholders for a value that must be given.


When cloning a repo with submodules, these submodules can be initialized automatically using the --recurse-submodules option.

git clone --recurse-submodules https://example.com/git-repo.git

To update submodules (defaults to all submodules, unless told otherwise), we use:

git submodule update --remote [submodule-path]

Git uses the remote's default branch for submodules, unless told otherwise. Specifying another branch in the .gitmodules file means this branch will also be used by anyone else cloning the main repository.

To make the pathfinding submodule track the assignment branch after having added it, we use the following, and then run git submodule update --remote to actually fetch that branch.

git submodule set-branch -b assignment -- pathfinding

Submodules are not the cleanest way of sourcing other repositories. If possible, prefer pulling in dependencies through some kind of package manager instead of including the entire source. Submodules do not update automatically when switching branches, meaning that if one branch has a submodule and another does not, you will have to remove the directory and re-update it every time you switch branch.
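
To see what git submodule add actually records, here is a small sketch that works entirely on local throwaway repositories. The names lib, main and vendor/lib are hypothetical, and the protocol.file.allow override is only there because recent git versions refuse to clone submodules from local paths by default.

```shell
set -eu
work=$(mktemp -d)

# A tiny repository that will play the role of the dependency
git init -q "$work/lib"
git -C "$work/lib" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init lib"

# The main repository that includes it as a submodule
git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule add "$work/lib" vendor/lib

# .gitmodules maps the submodule path to its URL
cat .gitmodules

# Lists the commit each submodule is currently pinned to
git submodule status
```

The main repository only stores the path, the URL and the pinned commit, which is why collaborators need --recurse-submodules (or git submodule update --init) to get the actual files.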

Forking

Forking is used to create a copy of a repository, with yourself as the owner. This is useful when you have some feature you want implemented on the project.

Note: Forking is not a feature of git itself, but rather of GitHub, allowing you to have your own repository, identical to the one forked.

The main purpose of forking is to implement some feature, and then create a Pull Request against the original repository. It can also be used for creating your own version of the project, like how neovim is a fork of vim.

On GitHub, forking can be done by clicking the “Fork” button in the upper right corner of the repository, or by adding /fork to the end of the base URL of the repo.

Tagging

Tags are pointers to a certain point in the project's history, and can be seen as a way to version your code.

There are two types of tags, lightweight and annotated.

A lightweight tag is a basic pointer to a specific commit, and nothing else, whereas an annotated tag is a full git object containing the tagger's name, email and date, as well as a message. Furthermore, an annotated tag can be signed using GPG (more on that later). It is recommended to create annotated tags, unless it’s a temporary tag.
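
The difference can be made visible with git cat-file, which reports the type of object a name points to. This is a sketch in a throwaway repository; the tag names light and annotated are chosen for the illustration only.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com
git commit -q --allow-empty -m "initial"

git tag light                           # lightweight: just a name for the commit
git tag -a annotated -m "Annotated tag" # annotated: creates a separate tag object

light_type=$(git cat-file -t light)
annotated_type=$(git cat-file -t annotated)
echo "light     -> $light_type"
echo "annotated -> $annotated_type"

# The tag object stores tagger, date and message, and points to the commit
git cat-file -p annotated
```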

Lightweight tagging is done using the git tag command.

git tag v1.0

Here, HEAD is marked with the lightweight tag v1.0.

Creating an annotated tag is done using the -a flag.

git tag -a v1.1 -m "Version 1.1"

As with commits, a message must be included for annotated tags.

To tag a previous commit (not HEAD), simply specify the commit checksum (or at least part of it) after the command.

git tag -a v0.1 -m "Version 0.1" e287fa6

To delete a tag again, we use the -d option.

git tag -d v0.1

A list of all tags can be retrieved, by running the tag command without any arguments.

git tag

To show detailed information about a specific tag, we run

git show v1.0

Git does not automatically push tags. We have to explicitly tell it to push either a specific tag or all tags.

git push origin v1.1
# or push all tags
git push origin --tags

Deleting a tag locally does not delete it on the remote. Again, we need to be explicit:

git push origin --delete v0.1

A specific tag can be checked out just like a commit (as it is simply a pointer to that commit).

git checkout v1.1

Remark: Checking out a tag results in a detached HEAD state, meaning new commits made from here do not belong to any branch. They become hard to find once you switch away, and are eventually garbage collected. To avoid this, we create a branch pointing to the commit/tag.

git switch -c bugfix v0.1

This command creates a new branch based on the commit from v0.1, and switches to it.

Co-Authoring

It’s possible to have multiple authors for a single commit.

This is done by adding two newlines at the end of the commit message, followed by a Co-authored-by line with the other committer's name and email.

# note how the quote is not closed until the last line.
git commit -m "Changed stuff
>
>
>Co-authored-by: name <name@example.com>"

To add another co-author, simply add another entry, before closing the commit message quote.
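
If the multi-line quote feels fragile, passing -m more than once is an alternative, since each -m starts a new paragraph. The sketch below assumes a throwaway repository, and the co-author name and address are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo Author"
git config user.email demo@example.com

# Each -m becomes its own paragraph, so the trailer ends up
# separated from the subject by a blank line
git commit -q --allow-empty \
    -m "Changed stuff" \
    -m "Co-authored-by: Jane Doe <jane@example.com>"

# %B prints the raw commit message (subject and body)
message=$(git log -1 --format=%B)
printf '%s\n' "$message"
```

Further co-authors would go into the same final paragraph, one Co-authored-by line each.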

History

We will now look at some neat commands for viewing the history of our git project.

log

The most basic command is

git log

This shows the commit log, along with some basic information about each commit.

This command has a lot of options available, depending on how you wish to format and filter the output. I will only cover some of the basic options here.

Option             Description
-n <number>        Only show the last <number> commits.
-p                 Show diff for each commit.
--stat             Show stats for each commit.
--oneline          Each entry takes up a single line.
--graph            ASCII graph with branch merges shown.
--relative-date    Show date as relative of now.
--grep=<pattern>   Commit message must match pattern.

These options can of course be combined, like so:

git log --oneline --graph

Git can also show a range of commits, using the dot notation.

git log commit1..commit2
# or
git log branch1..branch2

This will show all the commits that are reachable from 2 but not from 1, i.e. the commits made since 1 up until 2.
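
The direction of the range matters, which a quick experiment makes clear. In this sketch the branch name base is hypothetical and simply marks the first commit of a throwaway repository.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

git commit -q --allow-empty -m "first"
git branch base                      # base stays at the first commit
git commit -q --allow-empty -m "second"
git commit -q --allow-empty -m "third"

# Commits reachable from HEAD but not from base
git --no-pager log --oneline base..HEAD
ahead=$(git rev-list --count base..HEAD)

# Reversed: base has nothing that HEAD lacks
behind=$(git rev-list --count HEAD..base)
echo "ahead=$ahead behind=$behind"
```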

diff

diff is used to tell exactly which changes have been made between two different states of the project.

In its simplest form, it shows the changes in the working tree that have not been staged yet (use git diff HEAD to compare the working tree against HEAD instead).

git diff

But we can also specify a range (commits, branches, tags) to diff.

git diff master..development

Lastly, if we only need to see which files have changed, and not what has changed in them, we pass the --name-only argument. Similarly, --stat shows how many lines have been added and deleted for each file, like in git log.
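
Which two states get compared depends on the arguments: plain git diff compares the working tree with the staging area, --staged compares the staging area with HEAD, and git diff HEAD compares the working tree with HEAD. A small sketch in a throwaway repository (notes.txt is a made-up file name):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo "one" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "two" >> notes.txt
git add notes.txt                    # the change is now staged

unstaged=$(git diff --name-only)            # working tree vs. staging area
staged=$(git diff --staged --name-only)     # staging area vs. HEAD
against_head=$(git diff HEAD --name-only)   # working tree vs. HEAD

echo "unstaged:     '$unstaged'"
echo "staged:       '$staged'"
echo "against HEAD: '$against_head'"
```

Once a change is staged it no longer shows up in plain git diff, which is a common source of confusion.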

Security

We will now look at some features for maintaining the integrity of your git user.

SSH

Instead of storing your GitHub credentials locally, you can create an SSH key, which allows private repositories to be accessed without compromising on security.

Note: If you are using Windows, make sure to run the following commands in Git Bash, that came along with your installation of git.

To generate a new key

ssh-keygen -t ed25519 -C "<email>"

The -C option is simply a comment, for you to distinguish this key (like an email).

The above command generates a key and asks for a place to save it (the default should be fine), and lastly for a passphrase for the key, which is optional.

Now, we have to tell the SSH agent about our key.

eval "$(ssh-agent -s)"

ssh-add ~/.ssh/id_ed25519

We will now add the key to GitHub.

Copy the output of the following command to your clipboard.

cat ~/.ssh/id_ed25519.pub

This is the public key.

Now, open the settings on your GitHub profile related to SSH and GPG keys, and click “New SSH key”.

Or simply follow this link: https://github.com/settings/ssh/new.

Paste the key into the “key” box, give your key a proper name, such as the device you created it on, and click “Add SSH key”.

To test your new key, run the following command.

ssh -T git@github.com

This should tell you that you have authenticated as your user.

GPG

In a previous blogpost I covered how to do some basic tasks using GPG. GPG (GNU Privacy Guard) is an implementation of the OpenPGP standard, used for encrypting, signing and verifying data. If you have not heard about GPG before, I would recommend you go and read that post now.


Adding a GPG key to git (and GitHub) allows us to prove that our commits and tags are actually our work. Since git lets anyone set an arbitrary author on a commit (see jayphelps/git-blame-someone-else), signing our commits gives others a way to verify where they came from.

We will start off by telling git about our GPG key. First, we list our secret keys.

gpg --list-secret-keys --keyid-format=long

Copy the key ID of the key you wish to use, and use it in the following command.

git config --global user.signingkey <keyid>

Now, we only need to tell GitHub that this key belongs to us, by adding the public key in the settings, like we did with the SSH key.

gpg --export --armor <email>

Navigate here: https://github.com/settings/gpg/new and paste the key from the above command.

Now, next time you commit, you can sign the commit, by passing the -S/--gpg-sign option.

git commit -S -m "Yay, this commit is signed"

Signing a tag is almost the same, except the “s” is lowercase.

git tag -s v2.0 -m "Version 2 - signed"

Both types of signed work can also be verified.

# verify a single commit
git verify-commit 32d9e71
# verify all commits shown in git log
git log --show-signature
# verify tag
git tag -v v2.0

Note: In order to verify a signature, you need to have the signer's public key in your keyring.
For any GitHub user, their public keys can be found at https://github.com/<username>.gpg

Git can also be set up to automatically sign your commits

git config --global commit.gpgsign true

Now any commit you create will be signed with the key set previously. To only change these settings for a single repository, omit the --global flag from any of the config commands.
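
As a small illustration of the per-repository variant, the sketch below sets commit.gpgsign without --global in a throwaway repository and shows where the value ends up. Nothing in your global configuration is touched.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global the value is written to this repository's .git/config
git config commit.gpgsign true

local_value=$(git config --local --get commit.gpgsign)
echo "commit.gpgsign (local) = $local_value"

# Show which file each value for this setting comes from
git config --show-origin --get-all commit.gpgsign
```

A repository-level value takes precedence over the global one, so signing can be switched on or off for individual projects.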

Nice to know

Lastly, I’d like to show you some of the “nice to knows” when working with git.

Stashing

Stashing your work is a godsend when you have started working on one branch and find out your code actually belongs on another, or when you simply want to switch branch to change or view some stuff, but your current work is not ready to be committed just yet.

Stashing stores all of your uncommitted work, and leaves your working directory clean (matching HEAD).

git stash

This means you can now safely switch branch, as git does not complain that you have uncommitted work.

Note: By default, stashing only saves tracked files. To also include untracked files, add the -u flag.

List all stashes

git stash list

To apply the most recent stash (add --index to also reapply the index).

git stash apply

To apply a specific stash, pass its index (note how the indexing starts at 0).

git stash apply 2

Delete the stash, now that it’s applied

git stash drop 2

There is a shorthand for applying and dropping a stash in one command (defaults to 0)

git stash pop [stash]

Or if you wish to create a branch from a stash

git stash branch <branch name>

Stashing is really handy for quickly switching branches, and jumping back to continue working.
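
The whole round trip can be tried out safely in a throwaway repository. This is only a sketch; the file names app.txt and notes.txt are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo "v1" > app.txt
git add app.txt
git commit -q -m "add app"

echo "work in progress" >> app.txt   # tracked file, modified
echo "scratch" > notes.txt           # untracked file

git stash -u -q                      # -u also stashes untracked files
clean_status=$(git status --porcelain)
echo "status after stash: '$clean_status'"

git stash pop -q                     # reapply the latest stash and drop it
restored=$(tail -n 1 app.txt)
remaining=$(git stash list | wc -l | tr -d ' ')
echo "restored line: $restored, stashes left: $remaining"
```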

Aliases

Git allows for alias creation, so you can optimize your workflow even more!

Let’s say I want git sw to be a shorthand for git switch, I simply write out.

git config --global alias.sw switch

Some other neat aliases include

# shorthand
git config --global alias.ci commit
# shorthand
git config --global alias.st status
# unstage a file
git config --global alias.unstage 'restore --staged --' 
# last commit
git config --global alias.last 'log -1 HEAD' 
# pretty graph log
git config --global alias.logbr 'log --oneline --graph'
# fancy branch viewer
git config --global alias.br 'branch --format="%(HEAD) %(color:yellow)%(refname:short)%(color:reset) - %(contents:subject) %(color:green)(%(committerdate:relative)) [%(authorname)]" --sort=-committerdate'

Or if you are bad at coming up with good commit messages, for your private projects.

git config --global alias.yolo '!git commit -m "$(curl -s whatthecommit.com/index.txt)"'

- From ngerakines/commitment
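
Aliases are ordinary config entries, so they can be listed and tested like any other setting. The sketch below defines a repository-local variant of the last alias (without --global, and with a --format option added for the demo) so that your own configuration stays untouched.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com
git commit -q --allow-empty -m "initial commit"

# Repository-local alias: only valid inside this repository
git config alias.last 'log -1 --format=%s HEAD'

# List every alias visible from this repository (local and global)
git config --get-regexp '^alias\.'

# The alias is invoked like a regular git subcommand
subject=$(git last)
echo "last commit subject: $subject"
```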

Git backend

Git works by saving everything as an object, compressing it, and storing it under the SHA-1 checksum of its contents. This way, git only needs a pointer to the latest commit to be able to trace back through everything. The three main types of objects (annotated tags are a fourth) are:

  • blob - A file
  • tree - A directory listing of files (blob) and directories (tree), and their associated hash
  • commit - Information about the commit, including root tree, parent commit, author, committer, and commit message

All of this data is stored in the .git directory at the root of the project. Since everything in git is referenced by its checksum, it is also impossible to change any content without git noticing. And since every commit references its parent, integrity is ensured all the way back through the history - it’s part of the building blocks of git.
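
These objects can be inspected directly with git cat-file. The following sketch walks from a commit to its root tree to a file's blob in a throwaway repository (hello.txt is a made-up file), and checks that hashing the file again gives the ID git already stored.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name Demo
git config user.email demo@example.com

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "add hello"

# Walk from the commit to its root tree to the file's blob
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:hello.txt)
echo "$commit_type -> $tree_type -> $blob_type"

# The blob is named after the checksum of its content: hashing the
# file again yields the ID already recorded in the tree
stored_id=$(git rev-parse HEAD:hello.txt)
computed_id=$(git hash-object hello.txt)
echo "stored:   $stored_id"
echo "computed: $computed_id"

git cat-file -p HEAD           # tree, author, committer, message
git cat-file -p 'HEAD^{tree}'  # mode, type, ID and name of each entry
```

Since this is the first commit, its output has no parent line; any later commit would list the ID of its parent as well.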

For a more in-depth look at the internals of git, I highly recommend you read Chapter 10.2 of the git book.