Version Control Systems: in Practice, now, Git

Organisation:Copyright (C) 2021-2024 Olivier Boudeville
Contact:about (dash) howtos (at) esperide (dot) com
Creation date:Saturday, November 20, 2021
Lastly updated:Wednesday, June 19, 2024

Overview

No real software development shall happen without the use of a VCS - standing for Version Control System - of some sorts, notably in order to track the versions of the source files involved and to ease the collaborative work on them.

Many solutions have been defined for this purpose (CVS, Clearcase, SVN, Mercurial, etc.), but now a single tool is the de facto standard: Git, which is a distributed version control system available as free software; refer to its website for more details.

Git Usage

Beyond the documentation relative to its general use, projects have to adopt their own set of conventions - regarding the management of branches, commits, tags, etc - based on their preferences and context.

Base Know-How

  • specifying a target revision:
    • a revision is generally the name of a commit object; it can be its SHA-1, or a symbolic name like HEAD or master@{yesterday}
    • the path that it designates may be either "absolute" (i.e. relative to the root of the clone, like PREFIX:a/b/c/my-file.txt) or relative to the current directory in the clone (provided they start with ./, like PREFIX:./c/my-file.txt if being in the previous b directory)
    • suffixes may be added:
      • ~[n] (e.g. v1.5.1~3) allows to designate the generation ancestor (parent, grand-parent, etc.) of a revision; for example: git show HEAD~:a/b/c/my-file.txt shows the first parent of the current commit of that file
      • ^[n] (e.g. HEAD^ or HEAD^2) allows to designate the n-th parent (at the same level, should a commit have multiple parents, like after a merge) of the current commit of that file
  • managing branches:
    • creating branches is done thanks to the checkout command, often abbreviated as co here
    • to create a branch deriving from the current one (the current HEAD) and switching to it at the same time (performing its co, while inheriting any local changes): git co -b my_new_branch
    • to create a local branch corresponding to a remote one (let's suppose it is named some_branch), assuming that a remote server (e.g. my_remote, possibly origin) has already been declared (e.g. git remote add my_remote URL):
      • first step is to update the remote-tracking branches with git fetch my_remote, then to create the target local branch tracking that remote (upstream) one with: git co -b some_branch my_remote/some_branch (also switching to it here)
      • a shortcut is to use git co --track my_remote/some_branch instead
      • even shorter, if the name of the target local branch name does not exist yet and matches exactly matches a name on only one remote, git co some_branch will suffice
    • to delete a local branch (while another one is checked out): git branch -d my_branch; to force-delete it (typically if not fully merged), use git branch -d my_branch; to delete it from a specified remote as well: git push origin --delete my_branch
  • managing tags (note that they are repository-level, for example not related to any current branch):
    • to list all (local) tags: git tag
    • to list all (annotated) tags, from the oldest one to the latest one: use Ceylan-Hull's list-tags-by-date.sh script
    • to have information about an already-existing tag: git show my_tag
    • to set a new annotated tag: git tag -a foobar-version-2.4.0 -m "Release of the version 2.4.0 of Foobar."; prefer naming tags differently from branches (e.g. foobar-version-2.4.0 rather than foobar-2.4.0) to spare ambiguities to Git
    • a set tag must be specifically pushed on a remote, for example: git push origin my_tag; all tags can be pushed with git push --tags (the remote can be implied)
    • to delete a tag that was not pushed: git tag --delete my_tag
  • determining whether a file is in VCS, knowing that due to .gitignore rules, update-index --skip-worktree, etc. it is not always obvious:
# Target file is tracked iff is listed by:
$ git ls-files | grep my_file

# Or, in order to trigger an error if this target file is not tracked:
git ls-files --error-unmatch my_file
  • getting the version of a file as it was at a given revision:
# Replaces the current version of that file by the designated one:
$ git checkout COMMIT_ID path/to/the/target/file

# So, for example, in order to revert/replace the current version of a file by
# the one at which is was *prior* to a given commit:
#
$ git checkout COMMIT_ID~ path/to/the/target/file

# Outputs on the console the designated version:
$ git show COMMIT_ID:path/to/the/target/file

# Outputs on the console the diff between the designated
# version and the current one:
#
$ git diff COMMIT_ID:path/to/the/target/file
  • listing the files modified by a given commit: git show --name-only MY_COMMIT_ID will print relevant extra information; if wanting only the list of modified files: git diff-tree --no-commit-id --name-only -r --pretty COMMIT_ID

Managing Branches

Creating branches allows to separate threads of work (while preserving their lineage) and progress concurrently. Yet often their content will have to converge ultimately; depending on the intent, two use cases can be considered, resulting in different Git uses.

Merge versus Rebase

Here one may want:

  • either to integrate back a development branch (e.g. my-feature) in a shared, parent one (e.g. master): then one shall prefer using merge, in order to keep separate histories and not affect the past one of the shared branch
  • or to resynchronise a development branch (e.g. my-feature) on the last version of a shared branch and continue these developments: then one shall prefer rebase, so that the history of the development branch contains only its own changes (less noise, linear history)

In practice, in order to transfer the changes of a branch A in a branch B:

$ git co B

# Either first case (integrate development A in master B):
$ git merge A # or: git pull A

# Or second one (resynchronise development B on master A):
$ git rebase A # or: git pull -rebase A

How such a last rebase of branch A in branch B is done? The bifurcation point of B compared to A is moved from its initial position to the current head of A, on which all changes recorded in B are applied; the resulting history of B looks like if these changes had been directly performed from the version of A designated in this rebase, and thus B can be then directly fast-forwarded to its tip, which comprises both the changes synchronised from A and, then, the ones specifically introduced in B.

Then, to update the remote with these post-rebase commits, git push --force-with-lease shall be used [1].

[1]Rather than just performing just a push, having it fail, pulling, and ending up with duplicates of the changes. Should this happen, rewind these changes, for example with: git reset --hard <full_hash_of_commit_to_reset_to>.

More information: [1] or, in French: [2], [3], [4].

Directly Transferring Changes

Sometimes, one may want to directly transfer the changes of a derivate branch B in a parent branch A. When one knows for sure that the versions in B shall be preferred in all cases to their counterparts in A (note that a classical merge is already fully able to manage fast-forwards), one may use:

$ git checkout A
$ git merge -X theirs B

No conflict should arise (source); note that this does not imply that the contents of the two branches match.

The same is possible with rebase; for example: git rebase -X theirs B.

Note that -X a strategy option, whereas -s would be a merge strategy option.

Using here ours rather than theirs :

  • -X ours uses "our" version of a change only when there is a conflict
  • whereas -s ours ignores the content of the other branch entirely (in all cases), and use "our" version instead; -s theirs does not exist

Another (brutal but sure) way of forcing the content of a branch B to be the same as the one of a branch A is, while B is checked-out, to execute: git reset --hard A. As mentioned previously, push shall be done then with git push --force-with-lease.

Sometimes, we know for sure that a given file must be transferred as it is on a given branch (some_branch), as a whole, to the current branch (current_branch). Cherry-picking is not exactly what is needed, as git cherry deals with commits, not actual contents. Here, what we presented with checkout may be more appropriate:

$ git checkout current_branch
# The current location on the branch matters:
$ git checkout some_branch path/to/the/target/file
Updated 1 path from 6f154364

Merging with no Auto-Commit

Often a bit anxious to acknowledge an automatic merge with so little control on the corresponding changes?

One approach is, when being in a branch A, to execute git merge --no-commit --no-ff B: the merge of B in A will be done, but not committed, leaving the possibility to review - and possibly correct - it.

The staged changes can indeed then be inspected with git diff --cached (or our difs script) and, if finding a file whose merge is not satisfactory, just correct it (possibly git restore its version from A), and possibly further fix the merge, before committing it.

If there was at least one conflict, run git merge --abort to get rid of the 'MERGING' state.

Common Procedures

Overcoming auto-signed SSL certificate issues

To avoid, typically in a company internal setting, errors like:

Cloning into 'XXX'...
fatal: unable to access 'https://foo.bar.org/XX/XXX/': SSL certificate problem: self signed certificate in certificate chain

the http.sslVerify=false option may be used, even if it weakens the overall security.

This is typically useful initially:

$ git -c http.sslVerify=false clone https://foo.bar.org/XX/XXX

In order that the next operations (e.g. future pushes) overcome too this problem for the current repository, use from within the current clone:

$ git config http.sslVerify false

Setting the right metadata for the next commits

Doing so prevent from having to amend commits a posteriori.

If these information apply for all projects:

$ git config --global user.name "John Doe"
$ git config --global user.email john.doe@foobar.org

Otherwise shall be done at least on a per-project basis with:

$ git config user.name "John Doe"
$ git config user.email john.doe@foobar.org

Also git config --global --edit may be of use (beware to trigger a vi by accident...).

Performing operation on remotes with no systematic authentication

Using a SSH key pair, hence with its public key declared on said remote, is a relevant approach, safer than from example using a ~/.netrc file.

Updating One's Fork from its Upstream

So you forked a repository (let's say it is in https://github.com/some_project/some_repo.git) and made progress - yet in the meantime the upstream repository may also have been updated, and you want to integrate these changes in yours.

First step is to ensure that this repository (designated here as upstream for convenience) is locally known:

$ git remote add upstream https://github.com/some_project/some_repo.git

Then, from a fully-committed clone of your fork (let's suppose we are using the main branch in all repositories):

$ git fetch upstream

# More appropriate than a merge:
$ git rebase upstream/main

# Repeatedly, as long as conflicts are found:
$ git rebase --continue

# Forced, as otherwise the current branch will deemed to be behind our remote:
# (hopefully your branch at origin is not protected by a hook; otherwise:
# 'git checkout -b some_branch', etc.)

$ git push -f origin main

Creating an empty branch

Rather than creating it from a pre-existing branch and removing all inherited content, prefer:

$ git checkout --orphan my_new_branch

(typically useful for GitHub Pages branches; may then be followed by some adds and git commit --allow-empty -m "Initial website.")

Listing differences with prior versions of a file

In order to list the differences of a given file with the previous commits (precisely: of a set of pathspecs), one may use our dif-prev.sh script, which by default reports the differences with the last committed version. With the --all option, it lists all differences, until the first addition of this file.

Performing a Merge with Emacs

Our procedure is to rely on our configuration of Emacs, which configures the smerge-mode (which is automatically triggered in the case of files containing conflicts; see the SMerge menu) so that it relies on the C-c v [2] smerge command prefix (that we found more convenient).

[2]Hence: press and hold the "Ctrl" key, hit the "c" key, then release all, then press the "v" key, and release. This is obtained thanks to (setq smerge-command-prefix "\C-cv").

Then the following main commands are useful:

  • to move between conflicts:
    • go to next one: smerge-command-prefix n (for smerge-next), which thus corresponds to C-c v n
    • go to previous one: smerge-command-prefix p (for smerge-previous), which thus corresponds to C-c v p
  • to select a version:
    • the one the cursor is currently on: smerge-command-prefix RET (for smerge-keep-current), which thus corresponds to C-c v RET
    • that was in our current merge-target branch: smerge-command-prefix m (for smerge-keep-mine), which thus corresponds to C-c v m
    • that is in the other branch, to merge in our current one: smerge-command-prefix o (for smerge-keep-other), which thus corresponds to C-c v o

See also our more general Using Emacs section.

Using the Stash

The stash allows to record the current state of the working directory and the index while going back to a clean working directory: the command saves the local modifications away, and reverts the working directory to match the HEAD commit.

The stash is convenient in order to switch branches without having to perform an arbitrary (meaningless) commit just for the sake of switching.

A basic use of the stash is the following:

  • to store the currently modified files in the stash: git stash (equivalent of git stash push)
  • to restore the (by default lastly) stored state (hence the reciprocal of the previous command), i.e. to apply it to the current files (with no review and no changes to add afterwards) and then drop that stashed entry: git stash pop [STASH_INDEX]
  • to list the various sets of modifications stashed away: git stash list
  • to show the changes recorded in the stash entry as a diff between the stashed contents and the commit back when the stash entry was first created:
    • with just one synthetic line per file changed: git stash show [STASH_INDEX]
    • with all actual changes: git stash show -p [STASH_INDEX]
  • to drop a stash entry: git stash drop [STASH_INDEX]

Refer to the git stash documentation for more information.

Preventing the commit of a file in VCS that is often locally modified

One should use this method:

$ git update-index --skip-worktree <file-list>

The opposite operation is:

$ git update-index --no-skip-worktree <file-list>

Listing the files managed in VCS from the current directory

Use git ls-files to determine the files that are already managed in VCS, recursively from the current directory.

To list the untracked files (i.e. the files not in VCS), use git ls-files --others.

Reducing the size of a repository

One may use our list-largest-vcs-blobs.sh script to detect any larger files that should not be in VCS (e.g. should a colleague have committed by mistake a third-party archive, or unexpected data such as CSV files).

Then install BFG Repo-Cleaner:

$ mkdir -p ~/Software/bfg-repo-cleaner/
$ cd $_
$ mv ~/bfg-1.14.0.jar .
$ ln -s bfg-1.14.0.jar bfg.jar
# For example in ~/.bashrc:
$ alias bfg="java -jar ~/Software/bfg-repo-cleaner/bfg.jar"

All developers should be asked to commit their sources (git add + push), to archive their clone (e.g. in a timestamped .xz file like 20220412-archive-clone-foobar.tar.xz), and to wait until notified that they can create a new clone.

The repository may be then cleaned up (e.g. from large, unnecessary CSV files) in isolation, with:

$ git clone --mirror XXX/foobar.git
$ bfg --delete-files '*.csv' foobar.git
$ cd foobar
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

Then all developers shall be requested to perform a new clone and to check the fetched content (e.g. with regard to the content of the last branch in which they committed).

Fixing LF vs CRLF End of Line Problems

Use Git Attributes to specify proper files and paths attributes.

One may define a .gitattributes file for example with *.js eol=lf, * text=auto, or:

# No CRLF conversion for DOS/Windows batch files.
# They should be stored with the CRLF line terminators.
#
*.bat -crlf

Fixing a commit message

If no push was done, it is as simple as replacing the former message by a new one, like in:

$ git commit --amend -m "This is a fixed commit message."

Removing a commit (rollback)

The goal here is to withdraw (revert) a (presumably faulty) commit.

If it has not been pushed to remote, use git reset HEAD~1; otherwise (already pushed), use git revert HEAD (then a push can be made).

Restoring a branch to a past state

Sometimes mistakes are made, committed and pushed - typically when messing up some merge.

Various operations can be of use to correct them:

  • to consult a past state (e.g. based on a SHA1) and possibly create a dedicated branch out of it: use checkout
  • to selectively remove the changes introduced by specific commits, by adding reverting commits (hence not losing history); use revert; such an operation must be validated thanks to a well-documented commit
  • to come back to a past overall state (e.g. a SHA1), losing all changes done since then (hard delete, rewriting history): use reset, with the --soft option to keep intermediary changes as non-committed, or with the --hard option to remove all changes; no commit is involved here (newer commits being just forgotten); if satisfied with this new state, it may be validated thanks to git push --force origin HEAD - provided that the current branch is not protected

See also these exchanges.

Listing local or list the remote branches by their last modified date

For local ones:

$ git for-each-ref --sort='-committerdate:iso8601' --format=' %(committerdate:iso8601)%09%(refname)' refs/heads

2024-05-24 18:02:52 +0200  refs/heads/17-xxx
2024-05-24 18:02:52 +0200  refs/heads/main
2024-05-24 14:51:15 +0200  refs/heads/36-yyy
2024-05-24 12:19:51 +0200  refs/heads/zzz
2024-04-19 18:03:37 +0200  refs/heads/aaa

For remote ones:

$ git for-each-ref --sort='-committerdate:iso8601' --format=' %(committerdate:iso8601)%09%(refname)' refs/remotes

See also:

Tools

On Most Platforms

At least on UNIX, the command-line Git client (git) is certainly the best tool. In difficult situations, graphical tools such as gitk may be of help.

Some distributions (e.g. Debian) do not come with a relevant Git autocompletion (for commands, branches, etc.) regarding one's shell of interest (like Bash). A solution is to download git-completion.bash, store it for example in ~/Software/Git/, and source it automatically from one's ~/.bashrc.

See also our Ceylan-Hull section about VCS-related scripts.

On Windows

Tools like TortoiseGit may foster a view on the usage of Git that is a bit particular, conflating concepts or introducing extra ones (e.g. a sync command). Apparently also at least some pulls did not reintroduce files just removed from the working directory.

More generally, cloning on a Windows host an UNIX-originating repository comprising symbolic links may induce oddities (e.g. a symlink named S pointing to Foobar resulting, on a Windows clone, in a file named S whose content is, literally, the text "Foobar", instead of the expected content of the Foobar file).

Another option is to use Visual Studio Code (vscode), which supports natively Git (provided that the command-line version is already installed). One may select View -> SCM (or Ctrl-Shift-G) for that. Clicking on the "VCS" icon (three rings links by two curves; the third from the top) displays a contextual view offering various associated operations (here based on Git).

We finally preferred using MSYS2 + Git rather than Git Bash, named "Git for Windows"; hints to speed up these tools may apply.

Inner Workings

Git stores internally every version of every file separately (not as a diff with a parent version) as a blob (an opaque binary content) identified by its (SHA1) hash.

A commit is the identifier of a tree representing the filesystem of interest at a given moment (snapshot). This tree references the files through their SHA1, similarly to a Merkle tree.

A branch is thus nothing but a pointer on a given commit, and HEAD designates the current branch. Git stores natively only blobs, trees and commits.

The reported differences in the content of a file or a tree are thus only recreated (established dynamically) by Git commands, they are not natively tracked.

Translations

From English to French:

Documentation

Many pointers exist, doing a great job in unveiling how Git is to be used.

In English, Pro GIT is surely a reference.

In French: