Organisation: | Copyright (C) 2021-2024 Olivier Boudeville |
---|---|
Contact: | about (dash) howtos (at) esperide (dot) com |
Creation date: | Saturday, November 20, 2021 |
Last updated: | Wednesday, June 19, 2024 |
No real software development shall happen without the use of a VCS (standing for Version Control System) of some sort, notably in order to track the versions of the source files involved and to ease collaborative work on them.
Many solutions have been defined for this purpose (CVS, ClearCase, SVN, Mercurial, etc.), but now a single tool is the de facto standard: Git, which is a distributed version control system available as free software; refer to its website for more details.
Beyond the documentation about its general use, each project has to adopt its own set of conventions (regarding the management of branches, commits, tags, etc.), based on its preferences and context.
These are certainly very basic and possibly idiosyncratic, yet any VCS content (typically a release branch) should, in our opinion, meet a few criteria in terms of quality.
The ones to which we try to stick are:
    # Target file is tracked iff it is listed by:
    $ git ls-files | grep my_file
    # Or, in order to trigger an error if this target file is not tracked:
    $ git ls-files --error-unmatch my_file
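As a minimal sketch (run in a throw-away repository created with mktemp; the file names are hypothetical), the exit status of git ls-files --error-unmatch can serve as a tracked/untracked test in scripts. Unlike the grep-based approach, it does not also match other paths that merely contain the name (e.g. my_file.bak). Note that a file counts as tracked as soon as it is in the index, i.e. once added, even before being committed.

```shell
set -eu
# Throw-away repository, so that nothing real is affected:
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "John Doe"
git config user.email john.doe@foobar.org

echo "tracked" > my_file
echo "not yet" > other_file
git add my_file
git commit -q -m "Adds my_file."

# Succeeds (exit status 0) iff the path is tracked; the output is discarded:
is_tracked() {
    git ls-files --error-unmatch -- "$1" > /dev/null 2>&1
}

if is_tracked my_file; then MY_FILE_STATUS=tracked; else MY_FILE_STATUS=untracked; fi
if is_tracked other_file; then OTHER_FILE_STATUS=tracked; else OTHER_FILE_STATUS=untracked; fi
echo "my_file is $MY_FILE_STATUS, other_file is $OTHER_FILE_STATUS"
```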
    # Replaces the current version of that file by the designated one:
    $ git checkout COMMIT_ID path/to/the/target/file
    # So, for example, in order to revert/replace the current version of a file
    # by the one it had *prior* to a given commit:
    $ git checkout COMMIT_ID~ path/to/the/target/file
    # Outputs on the console the designated version:
    $ git show COMMIT_ID:path/to/the/target/file
    # Outputs on the console the diff between the designated version and the
    # current one:
    $ git diff COMMIT_ID -- path/to/the/target/file
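These commands can be tried on a throw-away repository, as in the following sketch (the file name and contents are hypothetical). Note that git checkout COMMIT_ID~ -- <path> silently overwrites any uncommitted change of that file.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "John Doe"
git config user.email john.doe@foobar.org

mkdir -p path/to
echo "version 1" > path/to/file.txt
git add path/to/file.txt
git commit -q -m "First version."
echo "version 2" > path/to/file.txt
git commit -q -am "Second version."
# The (hypothetical) commit of interest is here the latest one:
COMMIT_ID=$(git rev-parse HEAD)

# Displays the version in COMMIT_ID (the path is relative to the repository root):
git show "$COMMIT_ID:path/to/file.txt"

# Some local, uncommitted change:
echo "version 3" > path/to/file.txt
# Diff between the version in COMMIT_ID and the one in the working tree:
git diff "$COMMIT_ID" -- path/to/file.txt

# Replaces (in the working tree and in the index) the current version by the
# one prior to COMMIT_ID; the uncommitted change above is lost:
git checkout "$COMMIT_ID~" -- path/to/file.txt
cat path/to/file.txt
```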
Creating branches allows separating threads of work (while preserving their lineage) and making progress on them concurrently. Yet their contents will often have to converge ultimately; depending on the intent, two use cases can be considered, resulting in different Git uses.
Here one may want:
In practice, in order to transfer the changes of a branch A into a branch B:
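    $ git checkout B
    # Either first case (integrate development A into master B):
    $ git merge A
    # (or, to integrate the version of A held by the remote: git pull origin A)
    # Or second one (resynchronise development B on master A):
    $ git rebase A
    # (or, with the version of A held by the remote: git pull --rebase origin A)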
How is this last operation (the rebase of B onto A) performed? The bifurcation point of B compared to A is moved from its initial position to the current head of A, on top of which all the changes recorded in B are then replayed. The resulting history of B looks as if these changes had been made directly from the version of A designated in this rebase; hence A could then simply be fast-forwarded to the tip of B, which comprises both the changes synchronised from A and, after them, the ones specifically introduced in B.
Then, to update the remote with these post-rebase commits, git push --force-with-lease shall be used [1].
[1] | Rather than performing a plain push, having it fail, pulling, and ending up with duplicates of the changes. Should this happen, rewind these changes, for example with: git reset --hard <full_hash_of_commit_to_reset_to>. |
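The following sketch, assuming only a local Git installation (a bare repository created with mktemp stands in for the remote; branch and file names are hypothetical), checks the two points above: after git rebase A, A is an ancestor of B (so merging B into A would be a fast-forward), and the rewritten B is rejected by a plain push but accepted with --force-with-lease.

```shell
set -eu
WORK=$(mktemp -d)
# A bare repository stands in for the remote:
git init -q --bare "$WORK/remote.git"
git clone -q "$WORK/remote.git" "$WORK/clone"
cd "$WORK/clone"
git config user.name "John Doe"
git config user.email john.doe@foobar.org

# Branch A (e.g. the master one), with a first commit:
git symbolic-ref HEAD refs/heads/A
echo "base" > common.txt
git add common.txt
git commit -q -m "Base."
# Branch B forks from A and receives its own change; both are pushed:
git checkout -q -b B
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature in B."
git push -q origin A B
# Meanwhile, A progresses:
git checkout -q A
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix in A."

# Resynchronises B on A: the commits specific to B are replayed on top of A.
git checkout -q B
git rebase -q A
# A is now an ancestor of B (merging B into A would be a mere fast-forward):
git merge-base --is-ancestor A B && echo "A is an ancestor of B"

# The rewritten B no longer descends from origin/B: a plain push is rejected...
if git push -q origin B 2> /dev/null; then PLAIN_PUSH=accepted; else PLAIN_PUSH=rejected; fi
# ...whereas --force-with-lease succeeds, as origin/B did not move meanwhile:
git push -q --force-with-lease origin B
```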
Sometimes, one may want to directly transfer the changes of a derived branch B into a parent branch A. When one knows for sure that the versions in B shall be preferred in all cases to their counterparts in A (note that a classical merge is already fully able to manage fast-forwards), one may use:
    $ git checkout A
    $ git merge -X theirs B
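No conflict should arise (source); note that this does not imply that the contents of the two branches then match, as the changes of A that do not conflict with B are preserved.
The same is possible with rebase, for example: git rebase -X theirs B; beware however that, during a rebase, the ours and theirs sides are swapped: theirs then designates the commits of the current branch being replayed on top of B, so the conflicting hunks are resolved in favour of the current branch rather than of B.
Note that -X passes a strategy option (here, to the default merge strategy), whereas -s selects the merge strategy itself.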
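As a sketch on a throw-away repository (hypothetical file contents), the following shows that git merge -X theirs B resolves in favour of B only the hunks that actually conflict, while the non-conflicting changes made in A are kept:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "John Doe"
git config user.email john.doe@foobar.org

printf 'top\nx\ny\nz\nshared\n' > f.txt
git add f.txt
git commit -q -m "Base."
# B changes only the last line:
git checkout -q -b B
printf 'top\nx\ny\nz\nshared (B)\n' > f.txt
git commit -q -am "Change in B."
# A changes the first line and, conflicting with B, the last one:
git checkout -q A
printf 'top (A)\nx\ny\nz\nshared (A)\n' > f.txt
git commit -q -am "Changes in A."

# The conflicting hunk is resolved in favour of B (theirs), while the
# non-conflicting change of A is kept:
git merge -q --no-edit -X theirs B
cat f.txt
```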
Using ours here rather than theirs:
Another (brutal but sure) way of forcing the content of a branch B to be the same as that of a branch A is, while B is checked out, to execute: git reset --hard A. As mentioned previously, the push shall then be done with git push --force-with-lease.
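A minimal sketch of this reset, on a throw-away repository with hypothetical contents; it also shows that the former tip of B remains reachable for a while through the reflog (as B@{1}), which may help recovering from a mistaken reset:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "John Doe"
git config user.email john.doe@foobar.org

echo "content of A" > f.txt
git add f.txt
git commit -q -m "A."
git checkout -q -b B
echo "content of B" > f.txt
echo "extra" > extra.txt
git add f.txt extra.txt
git commit -q -m "B diverges."

# While B is checked out, makes it point to the very same commit as A
# (its own commits and any uncommitted change are discarded):
git reset -q --hard A
git diff --quiet A B && echo "B now has exactly the content of A"

# The former tip of B is still recorded in its reflog, as B@{1}:
git log -1 --format=%s 'B@{1}'
```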
Sometimes, we know for sure that a given file must be transferred as a whole, exactly as it is on a given branch (some_branch), to the current branch (current_branch). Cherry-picking is not exactly what is needed, as git cherry-pick deals with commits, not with actual contents. Here, the use of checkout presented earlier may be more appropriate:
    $ git checkout current_branch
    # The current directory matters (the path is relative to it):
    $ git checkout some_branch path/to/the/target/file
    Updated 1 path from 6f154364
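The following sketch (throw-away repository; branch and file names reused from above, contents hypothetical) shows that the path is interpreted relatively to the current directory, and that the fetched version is also staged, ready to be committed:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/current_branch
git config user.name "John Doe"
git config user.email john.doe@foobar.org

mkdir -p path/to/the/target
echo "original" > path/to/the/target/file
git add path/to/the/target/file
git commit -q -m "Base."
git checkout -q -b some_branch
echo "as on some_branch" > path/to/the/target/file
git commit -q -am "Change on some_branch."
git checkout -q current_branch

# From a subdirectory, the path is interpreted relatively to it:
cd path/to
git checkout some_branch -- the/target/file
cd "$REPO"

# The file is updated in the working tree and staged, ready to be committed:
git status --short
```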
Often a bit anxious about acknowledging an automatic merge with so little control over the corresponding changes?
One approach is, while on a branch A, to execute git merge --no-commit --no-ff B: the merge of B into A will be performed but not committed, leaving the possibility to review it (and possibly to correct it).
The staged changes can then be inspected with git diff --cached (or with our difs script); if the merge of a file is not satisfactory, just correct that file (possibly restoring its version from A, e.g. with git restore --source=HEAD --staged --worktree <file>), possibly fix the merge further, and then commit it.
If there was at least one conflict and one prefers to give up, run git merge --abort: this cancels the merge and gets rid of the 'MERGING' state.
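This reviewed-merge workflow can be sketched as follows, on a throw-away repository with hypothetical files. Here B is a descendant of A, so --no-ff is what prevents Git from simply fast-forwarding (a fast-forward creates no merge commit, hence could not be stopped by --no-commit). The version of reviewed.txt from A (i.e. HEAD) is restored with git checkout HEAD -- <file>, git restore --source=HEAD --staged --worktree <file> being its equivalent with Git 2.23 or later.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/A
git config user.name "John Doe"
git config user.email john.doe@foobar.org

echo "a" > kept.txt
echo "a" > reviewed.txt
git add kept.txt reviewed.txt
git commit -q -m "Base."
git checkout -q -b B
echo "b" > kept.txt
echo "b" > reviewed.txt
git commit -q -am "Changes in B."
git checkout -q A

# Merges B into A, but stops before committing (and does not fast-forward):
git merge -q --no-commit --no-ff B
# Reviews the staged result of the merge:
git diff --cached --stat
# The merge of reviewed.txt is not deemed satisfactory: restores its version
# from A (HEAD), both in the index and in the working tree:
git checkout HEAD -- reviewed.txt
# Concludes the merge, as a commit with two parents:
git commit -q -m "Merge of B, with reviewed.txt kept as in A."
```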
To avoid, typically in a company-internal setting, errors like:
    Cloning into 'XXX'...
    fatal: unable to access 'https://foo.bar.org/XX/XXX/': SSL certificate problem: self signed certificate in certificate chain
the http.sslVerify=false option may be used, even if it weakens the overall security, as the certificate of the server is then no longer verified.
This is typically useful initially:
$ git -c http.sslVerify=false clone https://foo.bar.org/XX/XXX
So that the next operations (e.g. future pushes) also overcome this problem for the current repository, use, from within the current clone:
$ git config http.sslVerify false
Doing so prevents having to amend commits a posteriori.
If this information applies to all projects:
    $ git config --global user.name "John Doe"
    $ git config --global user.email john.doe@foobar.org
Otherwise, this shall be done at least on a per-project basis, with:
    $ git config user.name "John Doe"
    $ git config user.email john.doe@foobar.org
Also, git config --global --edit may be of use (beware of triggering vi by accident...).
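The following sketch (throw-away repository; the wrong identity is hypothetical) sets a per-project identity, checks where the value in effect comes from (--show-origin, available since Git 2.8), and shows how a commit made beforehand with a wrong identity can be fixed a posteriori with git commit --amend --reset-author:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
# A (hypothetical) wrong identity, used for a first commit:
git config user.name "Wrong Name"
git config user.email wrong@example.org
echo "content" > f.txt
git add f.txt
git commit -q -m "Initial commit."

# Proper per-project identity (stored in .git/config of this clone):
git config user.name "John Doe"
git config user.email john.doe@foobar.org
# Shows the value in effect, and the configuration file it comes from:
git config --show-origin user.email

# Fixes a posteriori the author of the last commit (message unchanged):
git commit -q --amend --no-edit --reset-author
git log -1 --format='%an <%ae>'
```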
Using an SSH key pair, whose public key is declared on said remote, is a relevant approach, safer than, for example, using a ~/.netrc file.
So you forked a repository (let's say it is in https://github.com/some_project/some_repo.git) and made progress; yet in the meantime the upstream repository may also have been updated, and you want to integrate these changes into yours.
The first step is to ensure that this repository (designated here as upstream for convenience) is known locally:
$ git remote add upstream https://github.com/some_project/some_repo.git
Then, from a fully-committed clone of your fork (let's suppose we are using the main branch in all repositories):
    $ git fetch upstream
    # More appropriate than a merge:
    $ git rebase upstream/main
    # Repeatedly, as long as conflicts are found (once they are resolved and the
    # corresponding files are added):
    $ git rebase --continue
    # Forced, as otherwise the current branch would be deemed to be behind our
    # remote (hopefully your branch at origin is not protected by a hook;
    # otherwise: 'git checkout -b some_branch', etc.):
    $ git push -f origin main
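The whole synchronisation can be rehearsed locally, as in the following sketch: bare repositories created with mktemp stand in for the upstream repository and for the fork (no GitHub involved), and all names are hypothetical. It uses git push --force-with-lease rather than -f, in line with the advice given earlier about pushing rebased commits.

```shell
set -eu
WORK=$(mktemp -d)

# Stand-in for the upstream repository (bare), with main as default branch:
git init -q --bare "$WORK/upstream.git"
git -C "$WORK/upstream.git" symbolic-ref HEAD refs/heads/main

# An upstream developer publishes a first commit:
git init -q "$WORK/upstream_dev"
cd "$WORK/upstream_dev"
git symbolic-ref HEAD refs/heads/main
git config user.name "Upstream Dev"
git config user.email dev@upstream.example
echo "v1" > README
git add README
git commit -q -m "Initial upstream commit."
git push -q "$WORK/upstream.git" main

# Stand-in for our fork, and our clone of it, in which we make progress:
git clone -q --bare "$WORK/upstream.git" "$WORK/fork.git"
git clone -q "$WORK/fork.git" "$WORK/clone"
cd "$WORK/clone"
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "My change."
git push -q origin main

# Meanwhile, upstream progresses:
cd "$WORK/upstream_dev"
echo "v2" > README
git commit -q -am "Upstream update."
git push -q "$WORK/upstream.git" main

# Synchronisation of our fork with upstream:
cd "$WORK/clone"
git remote add upstream "$WORK/upstream.git"
git fetch -q upstream
git rebase -q upstream/main
git push -q --force-with-lease origin main
```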
Rather than creating it from a pre-existing branch and removing all inherited content, prefer:
$ git checkout --orphan my_new_branch
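(typically useful for GitHub Pages branches; note that the files of the previous branch remain in the index and in the working tree, hence a git rm -rf . is usually needed first; this may then be followed by some adds and git commit --allow-empty -m "Initial website.")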
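A sketch of the creation of such a branch, in a throw-away repository (gh-pages being here a hypothetical name, and the contents hypothetical as well):

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "some code" > main.c
git add main.c
git commit -q -m "Some code."

# New branch without any parent commit:
git checkout -q --orphan gh-pages
# The files of main are still in the index and working tree: removes them.
git rm -rf -q .
echo "<html><body>Hello</body></html>" > index.html
git add index.html
git commit -q -m "Initial website."
```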
In order to list the differences of a given file with its previous commits (precisely: of a set of pathspecs), one may use our dif-prev.sh script, which by default reports the differences with the last committed version. With the --all option, it lists all differences, back to the first addition of this file.
Our procedure for resolving conflicts is to rely on our Emacs configuration, which sets up smerge-mode (automatically triggered for files containing conflict markers; see the SMerge menu) so that it uses the C-c v [2] command prefix (which we found more convenient).
[2] | Hence: press and hold the "Ctrl" key, hit the "c" key, then release all, then press the "v" key, and release. This is obtained thanks to (setq smerge-command-prefix "\C-cv"). |
Then the following main commands are useful:
See also our more general Using Emacs section.
The stash allows recording the current state of the working directory and of the index while going back to a clean working directory: the command saves the local modifications away, and reverts the working directory to match the HEAD commit.
The stash is convenient in order to switch branches without having to perform an arbitrary (meaningless) commit just for the sake of switching.
A basic use of the stash is the following:
Refer to the git stash documentation for more information.
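A basic stash round trip can be sketched as follows (throw-away repository; branch and file names are hypothetical): the work in progress is stashed, branches are switched, and the stashed changes are then reapplied. git stash push -m requires Git 2.13 or later.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "v1" > f.txt
git add f.txt
git commit -q -m "Base."
git branch other

# Some work in progress, not ready to be committed:
echo "work in progress" >> f.txt
# Records it in the stash, and gets back to a clean working directory:
git stash push -q -m "WIP on f.txt"
STATUS_AFTER_STASH=$(git status --short)
# Switches branches freely, then comes back:
git checkout -q other
git checkout -q main
# Lists the stash entries, then reapplies (and drops) the latest one:
git stash list
git stash pop -q
```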
One should use this method:
$ git update-index --skip-worktree <file-list>
The opposite operation is:
$ git update-index --no-skip-worktree <file-list>
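The effect of these two commands can be checked with the following sketch (throw-away repository; local.conf is a hypothetical, locally-customised tracked file); git ls-files -v tags skip-worktree files with an S:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "setting=default" > local.conf
git add local.conf
git commit -q -m "Default configuration."

# Local modifications of this tracked file are to be ignored from now on:
git update-index --skip-worktree local.conf
echo "setting=mine" > local.conf
STATUS_SKIPPED=$(git status --short)    # empty: not reported as modified
# Files flagged as skip-worktree are tagged with 'S':
git ls-files -v | grep '^S'
# Back to normal tracking:
git update-index --no-skip-worktree local.conf
STATUS_NORMAL=$(git status --short)     # ' M local.conf'
```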
Use git ls-files to determine the files that are already managed in VCS, recursively from the current directory.
To list the untracked files (i.e. the files not in VCS), use git ls-files --others (adding --exclude-standard so that the files ignored through .gitignore are not listed).
One may use our list-largest-vcs-blobs.sh script to detect any large files that should not be in VCS (e.g. a third-party archive, or unexpected data such as CSV files, committed by mistake by a colleague).
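Independently of that script (whose implementation is not shown here), a common plain-Git way of listing the largest blobs reachable from any reference is sketched below, on a throw-away repository in which a hypothetical data.csv file is committed and then removed; being still in the history, it is reported nonetheless:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "small" > small.txt
# A larger (hypothetical) data file, committed by mistake then removed:
head -c 200000 /dev/zero | tr '\0' 'x' > data.csv
git add small.txt data.csv
git commit -q -m "Adds files."
git rm -q data.csv
git commit -q -m "Removes data.csv."

# All objects reachable from any reference, then their type, hash, size in
# bytes and path; only blobs are kept, sorted by decreasing size
# (paths containing spaces would be truncated by awk here):
BLOBS=$(git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $3, $4 }' |
    sort -rn)
printf '%s\n' "$BLOBS" | head -n 5
LARGEST=$(printf '%s\n' "$BLOBS" | head -n 1)
```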
Then install BFG Repo-Cleaner:
    $ mkdir -p ~/Software/bfg-repo-cleaner/
    $ cd $_
    $ mv ~/bfg-1.14.0.jar .
    $ ln -s bfg-1.14.0.jar bfg.jar
    # For example in ~/.bashrc:
    $ alias bfg="java -jar ~/Software/bfg-repo-cleaner/bfg.jar"
All developers should be asked to commit and push their sources (git add, git commit, git push), to archive their clone (e.g. in a timestamped .xz file like 20220412-archive-clone-foobar.tar.xz), and to wait until notified that they can create a new clone.
The repository may be then cleaned up (e.g. from large, unnecessary CSV files) in isolation, with:
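    $ git clone --mirror XXX/foobar.git
    # By default, BFG leaves untouched the files of the latest commit (HEAD);
    # the CSV files shall thus have been removed from it beforehand:
    $ bfg --delete-files '*.csv' foobar.git
    $ cd foobar.git
    $ git reflog expire --expire=now --all && git gc --prune=now --aggressive
    $ git push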
Then all developers shall be requested to perform a new clone and to check the fetched content (e.g. with regard to the content of the last branch in which they committed).
Use Git attributes (.gitattributes files) to specify the proper attributes of files and paths.
One may define a .gitattributes file for example with *.js eol=lf, * text=auto, or:
    # No CRLF conversion for DOS/Windows batch files.
    # They should be stored with the CRLF line terminators.
    *.bat -crlf
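Which attributes apply to a given path can be checked with git check-attr, as in the following sketch (throw-away repository; the paths are hypothetical and need not exist). Note that -crlf is the legacy spelling of -text, i.e. no line-ending conversion at all:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
cat > .gitattributes << 'EOF'
# Line-ending normalisation of all files detected as text:
* text=auto
# JavaScript files always checked out with LF terminators:
*.js eol=lf
# No CRLF conversion for DOS/Windows batch files:
*.bat -crlf
EOF

# Lists the attributes in effect for these (possibly not existing) paths:
git check-attr text eol -- src/app.js
git check-attr crlf -- run.bat
```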
If no push was done yet, it is as simple as replacing the former message with a new one, as in:
$ git commit --amend -m "This is a fixed commit message."
The goal here is to withdraw (revert) a (presumably faulty) commit.
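If it has not been pushed to the remote yet, use git reset HEAD~1 (the changes it introduced are then kept, uncommitted, in the working tree); otherwise (already pushed), use git revert HEAD, which records a new commit undoing it (this commit can then be pushed).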
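Both cases can be sketched on a throw-away repository with hypothetical contents; with git reset HEAD~1 (a mixed reset) the changes of the dropped commit remain in the working tree, whereas git revert adds a new commit undoing the faulty one:

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "good" > f.txt
git add f.txt
git commit -q -m "Good commit."

# Case 1: a faulty commit not pushed yet.
echo "faulty" > f.txt
git commit -q -am "Faulty commit."
# Drops it from the branch; its changes remain, unstaged, in the working tree:
git reset -q HEAD~1
git status --short
# Here these changes are discarded as well:
git checkout -- f.txt

# Case 2: a faulty commit already pushed (no remote here, for brevity).
echo "faulty again" > f.txt
git commit -q -am "Faulty commit, pushed."
# Records a new commit undoing it, without rewriting the history:
git revert --no-edit HEAD
```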
Sometimes mistakes are made, committed and pushed - typically when messing up some merge.
Various operations can be of use to correct them:
See also these exchanges.
For local ones:
    $ git for-each-ref --sort='-committerdate:iso8601' --format=' %(committerdate:iso8601)%09%(refname)' refs/heads
     2024-05-24 18:02:52 +0200 refs/heads/17-xxx
     2024-05-24 18:02:52 +0200 refs/heads/main
     2024-05-24 14:51:15 +0200 refs/heads/36-yyy
     2024-05-24 12:19:51 +0200 refs/heads/zzz
     2024-04-19 18:03:37 +0200 refs/heads/aaa
For remote ones:
$ git for-each-ref --sort='-committerdate:iso8601' --format=' %(committerdate:iso8601)%09%(refname)' refs/remotes
See also:
At least on UNIX, the command-line Git client (git) is certainly the best tool. In difficult situations, graphical tools such as gitk may be of help.
Some distributions (e.g. Debian) do not come with a relevant Git autocompletion (for commands, branches, etc.) regarding one's shell of interest (like Bash). A solution is to download git-completion.bash, store it for example in ~/Software/Git/, and source it automatically from one's ~/.bashrc.
See also our Ceylan-Hull section about VCS-related scripts.
Tools like TortoiseGit may foster a somewhat particular view of the usage of Git, conflating concepts or introducing extra ones (e.g. a sync command). Also, apparently at least some pulls did not reintroduce files that had just been removed from the working directory.
More generally, cloning a UNIX-originating repository comprising symbolic links on a Windows host may induce oddities (e.g. a symlink named S pointing to Foobar resulting, in the Windows clone, in a regular file named S whose content is literally the text "Foobar", instead of the expected content of the Foobar file); this is how Git checks out symbolic links when its core.symlinks setting is false.
Another option is to use Visual Studio Code (vscode), which supports Git natively (provided that the command-line version is already installed). One may select View -> SCM (or Ctrl-Shift-G) for that. Clicking on the "VCS" icon (three rings linked by two curves; the third from the top) displays a contextual view offering various associated operations (here based on Git).
We finally preferred using MSYS2 + Git rather than Git Bash (as provided by "Git for Windows"); hints to speed up these tools may apply.
Git internally stores every version of every file separately (not as a diff with a parent version), as a blob (an opaque binary content) identified by its (SHA1) hash; packfiles may delta-compress these objects for storage, but this does not change this model.
A commit is an object referencing a tree that represents the filesystem of interest at a given moment (a snapshot), together with its parent commit(s) and some metadata (author, committer, message). This tree references the files (blobs) and subdirectories (subtrees) through their SHA1 hashes, in the manner of a Merkle tree.
A branch is thus nothing but a pointer to a given commit, and HEAD designates the current branch. Git natively stores only blobs, trees and commits (plus annotated tag objects).
The reported differences in the content of a file or a tree are thus only recreated (established dynamically) by Git commands; they are not natively tracked.
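These notions can be observed with Git's plumbing commands, as in the following sketch (throw-away repository, hypothetical file): each version of a file is its own blob, whose identifier is the hash of its content, the commit references a tree, and HEAD is a symbolic reference to the current branch.

```shell
set -eu
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "John Doe"
git config user.email john.doe@foobar.org
echo "version 1" > f.txt
git add f.txt
git commit -q -m "First."
echo "version 2" > f.txt
git commit -q -am "Second."

# A commit references a tree (the snapshot), its parent and some metadata:
git cat-file -p HEAD
# The tree lists its entries (here, a single blob) by their hash:
git cat-file -p 'HEAD^{tree}'
# Each version of f.txt is a distinct blob...
BLOB_V1=$(git rev-parse HEAD~1:f.txt)
BLOB_V2=$(git rev-parse HEAD:f.txt)
# ...whose identifier is the hash of its content (nothing written here):
HASHED_V2=$(echo "version 2" | git hash-object --stdin)
# HEAD is a symbolic reference to the current branch, which itself merely
# points to a commit:
CURRENT_BRANCH=$(git symbolic-ref HEAD)
git rev-parse "$CURRENT_BRANCH"
```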
From English to French:
Many pointers exist, doing a great job in unveiling how Git is to be used.
In English, Pro Git is surely a reference.
In French: