What is a bare repository and why would I need one?

Is the whole data of a repository always within the .git directory (or in a bare repo), in some kind of format from which all files can be rendered at any time?

Yes. The file contents and their complete history are stored in .git/objects, while .git/refs and .git/packed-refs record which commits the branches and tags point to.

When you clone a repo (bare or not), you always have the .git folder (or a folder with a .git extension for bare repo, by naming convention) with its Git administrative and control files. (see glossary)

Git can unpack at any time what it needs with git unpack-objects.

The trick is:

From a bare repo, you can query the logs (git log in a git bare repo works just fine: no need for a working tree), or list files in a bare repo.
Or show the content of a file from a bare repo.
That is how GitHub can render a page with files without having to checkout the full repo.
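The three queries above can be tried out with a short sketch. The directory names (work, project.git), the file README.txt, and the commit identity are made up for illustration; only the Git commands themselves are standard.

```shell
#!/bin/sh
set -e
# Hypothetical scratch area; every name below is illustrative only.
tmp=$(mktemp -d)
cd "$tmp"

# Build a small regular repository with a single commit.
git init -q work
cd work
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README.txt
git add README.txt
git commit -q -m "first commit"
cd ..

# Clone it as a bare repository: history only, no working tree.
git clone -q --bare work project.git
cd project.git
bare=$(git rev-parse --is-bare-repository)

# Query the log, list the files, and print one file's content,
# all without checking anything out.
log_out=$(git log --oneline)
files=$(git ls-tree -r --name-only HEAD)
content=$(git show HEAD:README.txt)

echo "bare: $bare"
echo "$log_out"
echo "$files"
echo "$content"
```

Nothing is ever checked out in project.git: each command reads the object database directly.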

I don’t know that GitHub does exactly that though, as the sheer number of repos forces the GitHub engineering team to do all kinds of optimizations.
See for instance how they optimized cloning/fetching a repo.
With DGit, those bare repos are actually replicated across multiple servers.

Is this the reason for a bare repository, while a working copy only has the files at a given time?

For GitHub, maintaining a working tree would cost too much in disk space and in updates (each user may request a different branch). It is best to extract from the unique bare repo what you need to render a page.

In general (outside of GitHub's constraints), a bare repo is used for pushing, in order to avoid having a working tree out of sync with what has just been pushed. See “but why do I need a bare repo?” for a concrete example.

That being said:

since Git 2.3, you can push to a non-bare repo (which updates the working tree accordingly)
since Git 2.4, you can “push-to-deploy” (i.e., it works for an unborn branch as well)

But that would not be possible for GitHub, which cannot maintain one (or several) working tree(s) for each repo it has to store.
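For a small setup, pushing to a non-bare repo can be sketched as follows. The sketch assumes the receive.denyCurrentBranch=updateInstead setting (available since Git 2.3) is what enables this behaviour. The repository names (server, client), the file app.txt, and the commit identity are hypothetical.

```shell
#!/bin/sh
set -e
# Hypothetical scratch area; repository and file names are illustrative.
tmp=$(mktemp -d)
cd "$tmp"

# "server" is a NON-bare repository that will receive pushes.
git init -q server
cd server
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v1" > app.txt
git add app.txt
git commit -q -m "v1"
# Accept pushes to the checked-out branch and update the working tree.
git config receive.denyCurrentBranch updateInstead
cd ..

# "client" clones the server, commits a change, and pushes it back.
git clone -q server client
cd client
git config user.email "demo@example.com"
git config user.name "Demo"
echo "v2" > app.txt
git commit -q -am "v2"
git push -q origin HEAD
cd ..

# The server's working tree now reflects the pushed commit.
deployed=$(cat server/app.txt)
echo "server/app.txt contains: $deployed"
```

The push is only accepted while the receiving working tree is clean, which is one reason a bare repo remains the simpler choice for a shared remote.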

The article “Using a bare Git repo to get version control for my dotfiles” from Greg Owen, originally reported by aifusenno1, adds:

A bare repository is a Git repository that does not have a snapshot.
It just stores the history. It also happens to store the history in a slightly different way (directly at the project root), but that’s not nearly as important.

A bare repository will still store your files (remember, the history has enough data to reconstruct the state of your files at any commit).
You can even create a non-bare repository from a bare repository: if you git clone a bare repository, Git will automatically create a snapshot for you in the new repository (if you want a bare repository, use git clone --bare).
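The difference between the two kinds of clone can be checked with a short sketch. The names seed, central.git, checkout and notes.txt are invented for the example, as is the commit identity.

```shell
#!/bin/sh
set -e
# Hypothetical scratch area; all names below are illustrative only.
tmp=$(mktemp -d)
cd "$tmp"

# A seed repository with one commit, so there is something to clone.
git init -q seed
cd seed
git config user.email "demo@example.com"
git config user.name "Demo"
echo "data" > notes.txt
git add notes.txt
git commit -q -m "add notes"
cd ..

# Bare clone: history only, stored directly at the repository root.
git clone -q --bare seed central.git
# Regular clone of the bare repo: Git materializes a snapshot.
git clone -q central.git checkout

bare_flag=$(git -C central.git rev-parse --is-bare-repository)
work_flag=$(git -C checkout rev-parse --is-bare-repository)
echo "central.git is bare: $bare_flag"
echo "checkout is bare:    $work_flag"

ls central.git   # HEAD, config, objects, refs, ... at the top level
ls checkout      # notes.txt (administrative files live in checkout/.git)
```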

And Greg adds:

So why would we use a bare Git repository?

Almost every explanation I found of bare repositories mentioned that they’re used for centralized storage of a repository that you want to share between multiple users.

See Git repository layout:

a <project>.git directory that is a bare repository (i.e. without its own working tree), that is typically used for exchanging histories with others by pushing into it and fetching from it.

Basically, if you wanted to write your own GitHub/GitLab/BitBucket, your centralized service would store each repo as a bare repository.
But why? How does not having a snapshot connect to sharing?

The answer is that there’s no need to have a snapshot if the only service that’s interacting with your repo is Git.
Basically, the snapshot is a convenience for humans and non-Git tools, but Git only interacts with the history. Your centralized Git hosting service will only interact with the repos through Git commands, so why bother materializing snapshots all the time? The snapshots only take up extra space for no gain.

GitHub generates that snapshot on the fly when you access that page, rather than storing it permanently with the repo (this means that GitHub only needs to generate a snapshot when you ask for it, rather than keeping one updated every time anybody pushes any changes).

Git 2.38 (Q3 2022) introduces a safe.bareRepository configuration variable that allows users to forbid discovery of bare repositories.

See commit 8d1a744, commit 6061601, commit 5b3c650, commit 779ea93, commit 5f5af37 (14 Jul 2022) by Glen Choo (chooglen).
(Merged by Junio C Hamano (gitster) in commit 18bbc79, 22 Jul 2022)

setup.c: create safe.bareRepository

Signed-off-by: Glen Choo

There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file.
A user could run a Git command inside the bare repository thinking that the config file of the ‘outer’ repository would be used, but in reality, the bare repository’s config file (which is attacker-controlled) is used, which may result in arbitrary code execution.
See this thread for a fuller description and deeper discussion.

A simple mitigation is to forbid bare repositories unless specified via --git-dir or GIT_DIR.
In environments that don’t use bare repositories, this would be minimally disruptive.

Create a config variable, safe.bareRepository, that tells Git whether or not to die() when working with a bare repository.
This config is an enum of:

“all”: allow all bare repositories (this is the default)

“explicit”: only allow bare repositories specified via --git-dir or GIT_DIR.

If we want to protect users from such attacks by default, neither value will suffice – “all” provides no protection, but “explicit” is impractical for bare repository users.
A more usable default would be to allow only non-embedded bare repositories (this thread contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series.

git config now includes in its man page:

safe.bareRepository

Specifies which bare repositories Git will work with. The currently
supported values are:

all: Git works with all bare repositories. This is the default.

explicit: Git only works with bare repositories specified via
the top-level --git-dir command-line option, or the GIT_DIR
environment variable.

If you do not use bare repositories in your workflow, then it may be
beneficial to set safe.bareRepository to explicit in your global
config. This will protect you from attacks that involve cloning a
repository that contains a bare repository and running a Git command
within that directory.

This config setting is only respected in protected configuration (see definition). This prevents the untrusted repository from tampering with this value.
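The effect of the setting can be observed with a small sketch. It passes the value with -c on the command line, on the assumption that the command scope counts as protected configuration, so the user's real global config is left alone. demo.git is a hypothetical repository name. On Git older than 2.38 the setting is simply ignored, so the first command is expected to succeed there.

```shell
#!/bin/sh
set -e
# Hypothetical scratch area; demo.git is an illustrative name.
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare demo.git

# Implicit discovery: run Git from inside the bare repository.
# With Git 2.38+ and safe.bareRepository=explicit, this should be refused.
if git -c safe.bareRepository=explicit -C demo.git \
       rev-parse --is-bare-repository >/dev/null 2>&1; then
    implicit="allowed"
else
    implicit="refused"
fi

# Explicit selection: name the repository with --git-dir.
explicit=$(git -c safe.bareRepository=explicit --git-dir=demo.git \
               rev-parse --is-bare-repository)

echo "implicit discovery: $implicit"
echo "explicit --git-dir: $explicit"

# To make the setting permanent for your user (not run here):
#   git config --global safe.bareRepository explicit
```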
