What are the file limits in Git (number and size)?

This message from Linus himself can help you with some other limits:

[…] CVS, ie it really ends up being pretty much oriented to a “one file
at a time” model.

Which is nice in that you can have a million files, and then only check
out a few of them – you’ll never even see the impact of the other
999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you
limit things a bit (ie check out just a portion, or have the history go
back just a bit), git ends up still always caring about the whole thing,
and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one
huge repository. I don’t think that part is really fixable, although we
can probably improve on it.

And yes, then there’s the “big file” issues. I really don’t know what to
do about huge files. We suck at them, I know.

See more in my other answer: the limit with Git is that each repository must represent a “coherent set of files”, a whole system in itself (you cannot tag just “part of a repository”).
If your system is made of autonomous (but interdependent) parts, you must use submodules.
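
As a rough sketch of that approach (the repository URLs and paths below are hypothetical), each autonomous part lives in its own repository and is referenced from a parent repository as a submodule:

    # In the parent ("super-project") repository, reference a component repo
    # at a chosen path; URL and path are placeholders for this example.
    git submodule add https://example.com/team/component.git libs/component
    git commit -m "Track component as a submodule"

    # A clone of the parent only carries pointers to the submodules;
    # --recurse-submodules fetches their content as well.
    git clone --recurse-submodules https://example.com/team/parent.git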

As illustrated by Talljoe’s answer, the limit can be a system one (a large number of files), but if you understand the nature of Git (data coherency represented by its SHA-1 keys), you will realize the true “limit” is a usage one: you should not try to store everything in a single Git repository unless you are prepared to always get (or tag) everything back. For some large projects, that would make no sense.
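
To see why you always get (and tag) everything back: a tag points to a single commit, and that commit points to one root tree covering the entire repository. A quick illustration (the tag name is hypothetical, output abbreviated):

    # A tag resolves to exactly one commit...
    git cat-file -p v1.0^{commit}
    #   tree   <sha-1 of the root tree>
    #   parent <sha-1 of the parent commit>
    #   ...

    # ...and that commit references a single root tree spanning every file:
    git cat-file -p v1.0^{tree}
    #   100644 blob <sha-1>    README
    #   040000 tree <sha-1>    src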


For a more in-depth look at git limits, see “git with large files”
(which mentions git-lfs, a solution for storing large files outside the Git repository; GitHub, April 2015).
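
As a minimal sketch of the git-lfs workflow (the *.psd pattern and file name are only examples), large files are replaced in the repository by small pointer files, with the real content stored separately:

    git lfs install              # set up the LFS filters (once per machine)
    git lfs track "*.psd"        # route files matching this pattern through LFS
    git add .gitattributes       # the tracking rule is recorded here
    git add design.psd           # hypothetical large binary
    git commit -m "Add design asset via Git LFS"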

The three issues that limit a git repo (a mitigating configuration sketch follows the list):

  • huge files (the xdelta for packfiles is kept in memory, which does not work well with large files)
  • a huge number of files, which means one blob per file, and a slow git gc that generates packfiles one at a time
  • huge packfiles, with a packfile index that is inefficient at retrieving data from the (huge) packfile
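
Some of these pressure points can be mitigated with pack-related configuration; here is a hedged sketch, where the values are arbitrary examples rather than recommendations:

    # Skip delta compression (the in-memory xdelta) for files above this size
    git config core.bigFileThreshold 100m

    # Cap the memory used by each delta-search window while repacking
    git config pack.windowMemory 256m

    # Split repacked data into packfiles no larger than this
    git config pack.packSizeLimit 2g

    # Repack with the new settings
    git gc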

A more recent thread (Feb. 2015) illustrates the limiting factors for a Git repo:

Will a few simultaneous clones from the central server also slow down other concurrent operations for other users?

There are no locks on the server when cloning, so in theory cloning does not affect other operations. Cloning can use lots of memory though (and a lot of CPU unless you turn on the reachability bitmap feature, which you should).
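
Enabling reachability bitmaps is a one-time, server-side setting; a sketch of what that could look like in the server repository (the repack flags shown are one common combination, an assumption rather than the thread’s exact advice):

    # Write a reachability bitmap index on the next full repack,
    # which speeds up counting objects for clones and fetches.
    git config repack.writeBitmaps true

    # Do a full repack now so the bitmap exists before the next clone.
    git repack -Adb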

Will ‘git pull’ be slow?

If we exclude the server side, the size of your tree is the main factor, but your 25k files should be fine (linux has 48k files).

‘git push’?

This one is not affected by how deep your repo’s history is, or how wide your tree is, so it should be quick.

Ah, the number of refs may affect both git-push and git-pull.
I think Stefan knows better than I do in this area.

‘git commit’? (It is listed as slow in reference 3.)
‘git status’? (Slow again in reference 3, though I don’t see it.)
(also git-add)

Again, the size of your tree. At your repo’s size, I don’t think you need to worry about it.

Some operations might not seem to be day-to-day, but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc., they can become bottlenecks. (e.g. ‘git branch --contains’ seems terribly adversely affected by large numbers of branches.)

git-blame could be slow when a file is modified a lot.
