Solved Git deep clone vs. shallow clone

I was struggling with that one too. Basically it's about using the --depth <n> option to limit the history to the last n commits, so with --depth 1 you will only get the most recent commit, but no history. Use of the --depth option also implies --single-branch (unless explicitly told otherwise), so git will in addition only fetch the specified branch but no other branches. The advantage is a saving in required disk space and probably bandwidth too. Just compare a deep clone vs. a shallow clone using du(1).

If you are not actively working on the sources and are just interested in, say, keeping your /usr/src or /usr/ports up to date to build your system from those, a shallow clone will probably work just fine. I was however having a hard time switching my shallow clone of releng/12.2 /usr/src to releng/13.0 and ended up just doing a fresh clone at some point, because looking for a solution had already taken more time than doing a fresh clone.
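
To make the deep vs. shallow comparison concrete, here is a minimal sketch that does not touch the real FreeBSD repository. It builds a small throwaway repository (the paths, file name and commit messages are made up for illustration), clones it once in full and once with --depth 1, and then compares the two with du(1). On a repository this tiny the size difference is negligible; the point is only to show the mechanics.

```shell
#!/bin/sh
# Hypothetical demo: build a small throwaway repository, then clone it twice.
set -eu
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" config user.name "demo"
git -C "$work/upstream" config user.email "demo@example.org"
for i in 1 2 3 4 5; do
    echo "change $i" > "$work/upstream/file.txt"
    git -C "$work/upstream" add file.txt
    git -C "$work/upstream" commit -q -m "commit $i"
done

# A file:// URL is used on purpose: for plain local paths git ignores --depth.
git clone -q "file://$work/upstream" "$work/deep"
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

# Ask git whether each clone is shallow.
deep_is_shallow=$(git -C "$work/deep" rev-parse --is-shallow-repository)
shallow_is_shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)
echo "deep clone is shallow:    $deep_is_shallow"
echo "shallow clone is shallow: $shallow_is_shallow"

# Compare the on-disk size (in KiB) of the two repositories.
du -sk "$work/deep/.git" "$work/shallow/.git"
```

For the real thing, you would replace the file:// URL with the repository you actually clone from.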
 
Shallow clones aren't recommended for two reasons:
  • Accessing a different branch will be cumbersome (in a nutshell, you have to update your refs to include the branch you want and issue a fetch)
  • The commit count that's computed for inclusion in uname will not work correctly.
That said, you can use a shallow clone of course to save even more disk space. But a full clone with all branches will not consume more space than a svn working copy of a single branch did.
 
I was however having a hard time switching my shallow clone of releng/12.2 /usr/src to releng/13.0 and ended up just doing a fresh clone at some point,
Could you please post the "du" of your complete/final deep clone?
 
...I was however having a hard time switching my shallow clone of releng/12.2 /usr/src to releng/13.0 and ended up just doing a fresh clone at some point...
This makes sense if you think about it. The commits for releng/13.0 were not fetched by your initial shallow clone because they were not referenced in the branch you shallow-cloned.
 
The commit count that's computed for inclusion in uname will not work correctly.
Where is this commit count in the uname output? Looking at my uname -a, I don't see any. Just some hash that's included and the usual kernel build number that increases if you rebuild a kernel. Also, reproducible builds seem to be off by default now on 13.0, i.e. I get the build host, time and other information.
This makes sense if you think about it. The commits for releng/13.0 were not fetched by your initial shallow clone because they were not referenced in the branch you shallow-cloned.
Yes in a way it makes sense, but there still has to be a git way of switching a shallow clone to another branch? I have tried git checkout as well as git switch and some git fetch command I found mentioned somewhere, but ultimately none of it worked. So I just did a fresh clone, which was done in a matter of seconds. git is fast, I give you that.

As for the disk space requirements... interestingly, a bare mirror of the FreeBSD src repository only takes up about 1.42G of disk space but doesn't really compress well:
Code:
% du -hs /home/git/repos/freebsd/src/
1,4G    /home/git/repos/freebsd/src/

% zfs get used,logicalused,compression,compressratio sys/home/git
NAME          PROPERTY       VALUE           SOURCE
sys/home/git  used           1.40G           -
sys/home/git  logicalused    1.42G           -
sys/home/git  compression    lz4             local
sys/home/git  compressratio  1.02x           -
I believe my previous SVN mirror, updated through svnsync, was way over 6G in size when I switched to git.
 
Where is this commit count in the uname output?
Here:
FreeBSD [...] 13.0-RELEASE #15 config-n244734-3873806c629: Fri Apr 9 [...]
Blue (config): branch it was built from
Red (n244734): commit count
Green (3873806c629): commit hash

Yes in a way it makes sense, but there still has to be a git way of switching a shallow clone to another branch?
It involves adding the missing refs, so you can fetch the other branch you want. See for example this answer on SO: https://stackoverflow.com/a/17937889
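
As a rough sketch of what "adding the missing refs" can look like, the following runs against a throwaway repository rather than the real FreeBSD one. The branch names mirror the ones from this thread, but the repository, paths and file contents are made up, and the remote is assumed to be named freebsd (as with git clone -o freebsd). This is one way to do it, not necessarily exactly what the linked answer does.

```shell
#!/bin/sh
# Hypothetical demo: switch a shallow single-branch clone to another branch.
set -eu
work=$(mktemp -d)
up="$work/upstream"

# Throwaway upstream with two branches, releng/12.2 and releng/13.0.
git init -q "$up"
git -C "$up" config user.name "demo"
git -C "$up" config user.email "demo@example.org"
echo "12.2 base" > "$up/version.txt"
git -C "$up" add version.txt
git -C "$up" commit -q -m "base"
git -C "$up" branch -M releng/12.2
echo "12.2 patch" > "$up/version.txt"
git -C "$up" commit -q -am "patch on 12.2"
git -C "$up" checkout -q -b releng/13.0
echo "13.0" > "$up/version.txt"
git -C "$up" commit -q -am "start of 13.0"

# Shallow clone of releng/12.2 only (--depth implies --single-branch).
git clone -q -o freebsd --depth 1 -b releng/12.2 "file://$up" "$work/src"

# Add the missing branch to the remote's fetch refspecs, fetch it shallowly,
# then create a local branch from the new remote-tracking ref.
git -C "$work/src" remote set-branches --add freebsd releng/13.0
git -C "$work/src" fetch -q --depth 1 freebsd
git -C "$work/src" checkout -q -b releng/13.0 freebsd/releng/13.0

current=$(git -C "$work/src" rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```

The three commands after the clone are the relevant part; everything before them only sets up something to clone from.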

As for the disk space requirements... interestingly, a bare mirror of the FreeBSD src repository only takes up about 1.42G of disk space
See also my answer here:

Resorting to only cloning a single branch, or even doing a "shallow" clone, only makes sense if you really can't afford the disk space. A full git clone (including a working copy) is still smaller than a svn working copy of a single branch.
 
Here:
FreeBSD [...] 13.0-RELEASE #15 config-n244734-3873806c629: Fri Apr 9 [...]
Blue (config): branch it was built from
Red (n244734): commit count
Green (3873806c629): commit hash
Then I don't have that commit count. My output looks like:
Code:
FreeBSD [...]13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-ea31abc26 [...build info...]
It involves adding the missing refs, so you can fetch the other branch you want. See for example this answer on SO: https://stackoverflow.com/a/17937889


See also my answer here:
That looks cumbersome, to put it politely :oops:
Resorting to only cloning a single branch, or even doing a "shallow" clone, only makes sense if you really can't afford the disk space. A full git clone (including a working copy) is still smaller than a svn working copy of a single branch.
The required disk space is not exactly what I am concerned about. Required bandwidth, on the other hand, is. That's why I am keeping an on-premises mirror of the FreeBSD src git repo that other machines use to update their /usr/src from. I am not working on those sources, they are merely there to pull updates and then rebuild the system (kernel/world) from them. There is hardly any point in ever going back, the only direction is forward (for example when 13.0-p1 becomes available). So I just don't see the need to have that kind of history in something that is merely supposed to be a working copy.

Newer versions of git seem to imply --single-branch when using --depth <n> unless you explicitly use --no-single-branch. But would it make any difference with regard to the ability to switch to another branch when using git clone --no-single-branch --depth 1 ...?
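
For what it's worth, git's documentation says --no-single-branch combined with --depth fetches the history near the tips of all branches, so the other branches should be available without editing refs. A minimal sketch against a throwaway repository (paths, file names and contents are made up; only the branch names are borrowed from this thread):

```shell
#!/bin/sh
# Hypothetical demo: shallow clone of ALL branch tips, then switch branches.
set -eu
work=$(mktemp -d)
up="$work/upstream"

# Throwaway upstream with two branches.
git init -q "$up"
git -C "$up" config user.name "demo"
git -C "$up" config user.email "demo@example.org"
echo "12.2" > "$up/version.txt"
git -C "$up" add version.txt
git -C "$up" commit -q -m "12.2"
git -C "$up" branch -M releng/12.2
git -C "$up" checkout -q -b releng/13.0
echo "13.0" > "$up/version.txt"
git -C "$up" commit -q -am "13.0"

# Depth 1, but for every branch instead of only the one given with -b.
git clone -q -o freebsd --no-single-branch --depth 1 -b releng/12.2 \
    "file://$up" "$work/src"

# Both remote-tracking branches exist, so switching needs no extra fetch.
git -C "$work/src" branch -r
git -C "$work/src" checkout -q -b releng/13.0 freebsd/releng/13.0

current=$(git -C "$work/src" rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```

The trade-off is that every branch tip gets downloaded, so on a repository with many branches the clone will be larger than a single-branch shallow clone.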
 
Full clone makes sense for you then. Git is very efficient at sending you only new changes when you do a fetch.
Basically what I have is a full clone minus the checkout of any branches, a bare mirror of the FreeBSD git repo which I update through git remote update --prune that is then made available to my local network via git-daemon(1). All my other machines clone/update their /usr/src from this mirror, so it doesn't consume any external bandwidth at all. I just don't think each /usr/src on every machine really needs to be a full clone, when all I do is build from it.
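
A minimal sketch of that topology, using a throwaway repository in place of the real upstream. All paths and the branch content are made up for illustration, and the leaf clone uses a file:// URL where a real setup would use the git-daemon(1) URL of the mirror.

```shell
#!/bin/sh
# Hypothetical demo: upstream -> bare mirror -> shallow "leaf" clone.
set -eu
work=$(mktemp -d)
up="$work/upstream"

# Stand-in for the real upstream repository.
git init -q "$up"
git -C "$up" config user.name "demo"
git -C "$up" config user.email "demo@example.org"
echo "13.0" > "$up/version.txt"
git -C "$up" add version.txt
git -C "$up" commit -q -m "13.0"
git -C "$up" branch -M releng/13.0

# On the mirror host: a bare mirror with all refs, no working copy.
git clone -q --mirror "file://$up" "$work/mirror.git"

# On a leaf machine: shallow clone from the mirror instead of upstream.
# (In a real setup the mirror would be served over the network, e.g. with
#  git daemon --base-path=<dir> --export-all, and cloned via git://.)
git clone -q -o freebsd --depth 1 -b releng/13.0 \
    "file://$work/mirror.git" "$work/leaf"

# Later: upstream gets a new commit (think 13.0-p1) ...
echo "13.0-p1" > "$up/version.txt"
git -C "$up" commit -q -am "13.0-p1"

# ... the mirror is refreshed, and the leaf pulls from the mirror only.
git -C "$work/mirror.git" remote update --prune
git -C "$work/leaf" pull -q --ff-only

leaf_head=$(git -C "$work/leaf" rev-parse HEAD)
upstream_head=$(git -C "$up" rev-parse HEAD)
echo "leaf: $leaf_head"
echo "upstream: $upstream_head"
```

Only the remote update step talks to the outside world; the leaf machines never do.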
 
Basically what I have is a full clone minus the checkout of any branches, a bare mirror of the FreeBSD git repo which I update through git remote update --prune...
It's a clone done with the --mirror option?
...I just don't think each /usr/src on every machine really needs to be a full clone, when all I do is build from it.
So do a shallow clone on the leaf nodes.
 
It's a clone done with the --mirror option?
Yes. When the switch to git came, I wanted to have an on-premises mirror just like I had with SVN, updated through svnsync before and CVS using cvsup before that.
So do a shallow clone on the leaf nodes.
Can you elaborate on what exactly those leaf nodes are? So far what I've used to clone /usr/src on my machines was git clone -o freebsd --depth 1 -b releng/13.0 <url> <dir>, just that for the <url> part I don't use git.freebsd.org but my internal mirror instead.
 
Can you elaborate on what exactly those leaf nodes are? So far what I've used to clone /usr/src on my machines was git clone -o freebsd --depth 1 -b releng/13.0 <url> <dir>, just that for the <url> part I don't use git.freebsd.org but my internal mirror instead.
You're doing a shallow clone on your leaf nodes. What I'm proposing is do a fresh shallow clone when you want to switch branches.
Code:
rm -Rf <dir>
git clone -o freebsd --depth 1 -b releng/13.1 <url> <dir>

If I understand your topology correctly, you have one machine on which you care about bandwidth, but not disk space. You have a full clone --mirror there. You have many machines on which you care about disk space, but not bandwidth. Do a fresh shallow clone on those when you need to switch branches.
 
You're doing a shallow clone on your leaf nodes. What I'm proposing is do a fresh shallow clone when you want to switch branches.
Code:
rm -Rf <dir>
git clone -o freebsd --depth 1 -b releng/13.1 <url> <dir>
That's what I ended up doing when switching from releng/12.2 to releng/13.0. Seems this is the quick and painless way of doing it then.
If I understand your topology correctly, you have one machine on which you care about bandwidth, but not disk space. You have a full clone --mirror there. You have many machines on which you care about disk space, but not bandwidth. Do a fresh shallow clone on those when you need to switch branches.
As this is my home network I would not exactly call it many machines, but in principle yes. My internet connection with ~7Mbps downstream is rather slow compared to any modern standards whatsoever and I'd hate to see bandwidth wasted, so it seems logical to do it that way.
 
Security

I should not recommend shallow clones.

Git, shallow clone hashes, commit counts and system/security updates (2021-03-02)
  • advice from Kevin Oberman
  • tl;dr "… shallow clone hash will not include the commit count which will be used in future security updates …"
More recently, from the FreeBSD Handbook (24.4.3. The N-number), with added emphasis:

Usually this number is not all that important. However, when bug fixes are committed, this number makes it easy to quickly determine whether the fix is present in the currently running system. Developers will often refer to the hash of the commit (or provide a URL which has that hash), but not the n-number since the hash is the easily visible identifier for a change while the n-number is not. Security advisories and errata notices will also note an n-number, which can be directly compared against your system. When you need to use shallow Git clones, you cannot compare n-numbers reliably as the git rev-list command counts all the revisions in the repository which a shallow clone omits.

Thanks to Warner Losh for his June 2021 update to the Handbook.
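
To see why the n-number cannot be trusted on a shallow clone, here is a small sketch on a throwaway repository. The exact rev-list invocation used by the FreeBSD build is an assumption on my part (I am using --first-parent --count here); the Handbook text above only says that git rev-list counts the revisions. The repository, paths and commit messages are made up.

```shell
#!/bin/sh
# Hypothetical demo: the same commit yields different counts in a full
# clone and in a shallow clone.
set -eu
work=$(mktemp -d)
up="$work/upstream"

git init -q "$up"
git -C "$up" config user.name "demo"
git -C "$up" config user.email "demo@example.org"
for i in 1 2 3 4 5; do
    echo "change $i" > "$up/file.txt"
    git -C "$up" add file.txt
    git -C "$up" commit -q -m "commit $i"
done

git clone -q "file://$up" "$work/deep"
git clone -q --depth 1 "file://$up" "$work/shallow"

# Count the commits reachable from HEAD (assumed form of the n-number).
deep_count=$(git -C "$work/deep" rev-list --first-parent --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --first-parent --count HEAD)

# Both clones are at the same commit, so the hash part still matches.
deep_hash=$(git -C "$work/deep" rev-parse --short HEAD)
shallow_hash=$(git -C "$work/shallow" rev-parse --short HEAD)

echo "deep:    n$deep_count-$deep_hash"
echo "shallow: n$shallow_count-$shallow_hash"
```

The hash is identical in both clones, but the count in the shallow clone only reflects the commits that were actually fetched, so it cannot be compared against the n-number in an advisory.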
 