What are git's thin packs?

For the record, the git-index-pack man page states:

It is possible for git-pack-objects to build “thin” pack, which records objects in deltified form based on objects not included in the pack to reduce network traffic.
Those objects are expected to be present on the receiving end and they must be included in the pack for that pack to be self contained and indexable.
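Here is a toy Python model of that definition. It is not git's real pack format. Each entry is an object ID stored either whole or as a delta against some base ID. The names PackEntry, external_bases and is_thin are hypothetical. Under this model, a pack is thin exactly when a delta base is missing from the pack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PackEntry:
    """One object in a toy pack, stored either whole or as a delta."""
    oid: str
    delta_base: Optional[str] = None  # None: stored whole, not as a delta


def external_bases(pack: list[PackEntry]) -> set[str]:
    """Delta bases that entries refer to but that the pack itself does not contain."""
    present = {entry.oid for entry in pack}
    return {
        entry.delta_base
        for entry in pack
        if entry.delta_base is not None and entry.delta_base not in present
    }


def is_thin(pack: list[PackEntry]) -> bool:
    """A pack is "thin" when at least one delta is based on an object outside it."""
    return bool(external_bases(pack))


# b2 is sent as a delta against b1, which only the receiving end has.
thin_pack = [PackEntry("c2"), PackEntry("b2", delta_base="b1")]
print(external_bases(thin_pack))  # {'b1'}
print(is_thin(thin_pack))         # True
```

Such a pack cannot be indexed on its own. The receiver has to supply b1 before every entry can be resolved.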

This complements the description of the --thin option in the git push man page:

Thin transfer spends extra cycles to minimize the number of objects to be sent and meant to be used on slower connection

So a “slow network” in this case is a connection where you want to send as little data as possible.
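The toy delta below shows why a delta against a base the receiver already has saves so much data. It is only a sketch. It copies a common prefix and suffix from the base and inserts the rest. Git's actual delta encoding uses copy and insert instructions at arbitrary offsets. The function names and the 8-byte overhead in delta_size are assumptions for illustration.

```python
def make_delta(base: bytes, target: bytes) -> tuple[int, bytes, int]:
    """Toy delta: copy a common prefix and suffix from base, insert the rest."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix in either buffer.
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    return prefix, target[prefix:len(target) - suffix], suffix


def apply_delta(base: bytes, delta: tuple[int, bytes, int]) -> bytes:
    """Rebuild the target; this only works if the receiver already has base."""
    prefix, inserted, suffix = delta
    return base[:prefix] + inserted + base[len(base) - suffix:]


def delta_size(delta: tuple[int, bytes, int]) -> int:
    """Assumed wire size: two 4-byte lengths plus the inserted bytes."""
    return 8 + len(delta[1])


base = b"x" * 1000 + b"old"
target = b"x" * 1000 + b"new"
delta = make_delta(base, target)
assert apply_delta(base, delta) == target
print(len(target), delta_size(delta))  # 1003 11
```

Sending 11 bytes instead of 1003 is only possible because the base never crosses the wire.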

See more at “Git fetch for many files is slow against a high-latency disk”.

In this thread, Jakub Narębski explains a bit more (in the context of using git gc on the remote side as well as on the local side):

Git does deltification only in packfiles.
But when you push via SSH, git would generate a pack file with commits the other side doesn’t have, and those packs are thin packs, so they also have deltas…
but the remote side then adds bases to those thin packs making them standalone.
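The "adding bases" step can be sketched with the same kind of toy pack model. In git itself this is done by git index-pack --fix-thin, which is used together with --stdin. The Python below is a hypothetical simplification: it appends each missing base, taken whole from the receiver's own objects, and fails if a base is not available locally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PackEntry:
    """One object in a toy pack, stored either whole or as a delta."""
    oid: str
    delta_base: Optional[str] = None  # None: stored whole


def fix_thin(pack: list[PackEntry], local_objects: set[str]) -> list[PackEntry]:
    """Append the missing delta bases, taken from the receiver's own objects."""
    present = {entry.oid for entry in pack}
    missing = sorted({
        entry.delta_base
        for entry in pack
        if entry.delta_base is not None and entry.delta_base not in present
    })
    for oid in missing:
        if oid not in local_objects:
            raise KeyError(f"cannot complete thin pack: base {oid} is not available locally")
    # The bases are appended whole, so every delta now resolves inside the pack.
    return pack + [PackEntry(oid) for oid in missing]


received = [PackEntry("c2"), PackEntry("b2", delta_base="b1")]
thick = fix_thin(received, local_objects={"b1", "c1"})
print([entry.oid for entry in thick])  # ['c2', 'b2', 'b1']
```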

More precisely:

On the local side:
git-commit creates loose (compressed, but not deltified) objects. git-gc packs and deltifies.
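The sketch below shows what "compressed, but not deltified" means for one loose object. It assumes the default SHA-1 object format; SHA-256 repositories use a different hash. Git hashes a "<type> <size>\0" header followed by the content. It stores the zlib-compressed result under .git/objects/, in a subdirectory named after the first two hex digits. Each object is compressed on its own, with no reference to any other object. The function name loose_object is hypothetical, and git's zlib compression level may differ from Python's default.

```python
import hashlib
import zlib


def loose_object(data: bytes, kind: str = "blob") -> tuple[str, str, bytes]:
    """Return (object id, loose path, zlib-compressed bytes) for a SHA-1 repository."""
    # Git hashes a "<type> <size>\0" header followed by the raw content.
    stored = f"{kind} {len(data)}\0".encode() + data
    oid = hashlib.sha1(stored).hexdigest()
    # Loose objects live under .git/objects/<first 2 hex chars>/<remaining 38>.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    # Each loose object is compressed independently: no delta against other objects.
    return oid, path, zlib.compress(stored)


oid, path, compressed = loose_object(b"")
print(oid)   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(path)  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```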

On the remote side (for smart protocols, i.e. git and ssh):
git creates a thin, deltified pack;
on the remote side git either makes the pack thick/self-contained by adding base objects (objects + deltas), or explodes the pack into loose objects (objects).
You need git-gc on the remote server to fully deltify on the remote side. But the transfer is fully deltified.
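On a push, configuration decides between those two outcomes. According to the git-config documentation, objects are unpacked into loose files when fewer arrive than receive.unpackLimit. That setting falls back to transfer.unpackLimit, which defaults to 100. Otherwise the pack is stored as a pack after any missing delta bases are added. The sketch below models only that decision. The function name is hypothetical, and the real receive-pack logic handles more cases.

```python
from typing import Optional

# Documented default of transfer.unpackLimit.
DEFAULT_TRANSFER_UNPACK_LIMIT = 100


def receive_strategy(
    num_objects: int,
    receive_unpack_limit: Optional[int] = None,
    transfer_unpack_limit: Optional[int] = None,
) -> str:
    """Decide whether a pushed pack is exploded into loose objects or kept."""
    # receive.unpackLimit wins; otherwise fall back to transfer.unpackLimit.
    if receive_unpack_limit is not None:
        limit = receive_unpack_limit
    elif transfer_unpack_limit is not None:
        limit = transfer_unpack_limit
    else:
        limit = DEFAULT_TRANSFER_UNPACK_LIMIT
    if num_objects < limit:
        return "explode"  # roughly: git unpack-objects writes loose objects
    return "keep"  # roughly: git index-pack --fix-thin thickens and keeps the pack


print(receive_strategy(3))    # explode
print(receive_strategy(500))  # keep
```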

On the remote side (for dumb protocols, i.e. rsync and http):
git finds required packs and transfers them whole.
So the situation is like on the local side, but git might transfer more than is really needed because it transfers packs in full.
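This simplified model shows how a dumb transport can over-fetch. Any pack that holds at least one needed object is downloaded whole. Real dumb-protocol clients locate objects by downloading the packs' .idx files, and the greedy choice by pack name here is an assumption. The pack and object names are made up for illustration.

```python
def packs_to_fetch(needed: set[str], packs: dict[str, set[str]]) -> list[str]:
    """Pick whole packs until every needed object is covered (greedy, by pack name)."""
    remaining = set(needed)
    chosen = []
    for name in sorted(packs):
        if packs[name] & remaining:
            chosen.append(name)
            remaining -= packs[name]
    if remaining:
        raise KeyError(f"objects not found in any pack: {sorted(remaining)}")
    return chosen


def objects_transferred(chosen: list[str], packs: dict[str, set[str]]) -> int:
    """A dumb transport sends every object in each chosen pack, needed or not."""
    return sum(len(packs[name]) for name in chosen)


packs = {"pack-a": {"o1", "o2", "o3", "o4"}, "pack-b": {"o5"}}
needed = {"o4", "o5"}
chosen = packs_to_fetch(needed, packs)
print(chosen, objects_transferred(chosen, packs), len(needed))  # ['pack-a', 'pack-b'] 5 2
```

Five objects travel to deliver the two that were needed.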

The problem above was about the use (or non-use) of git push --thin: when should you use it?
It turns out you need to manage your binary objects carefully if you want git to take advantage of thin packs:

1. Create the new filename by just copying the old file (so the old blob is reused).
2. Commit.
3. Push.
4. Copy the real new file over it.
5. Commit.
6. Push.

If you omit the middle push in step 3, neither “git push” nor “git push --thin” can tell that this new file could be “incrementally built” on the remote side (even though git-gc squashes it completely in the pack).
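The effect of the intermediate push can be reproduced with a toy model of the same-path rule described in this answer. It is a simplification. The names plan_push and Tree are hypothetical, content strings stand in for blob IDs, and git's real delta-base selection in pack-objects is more involved.

```python
from typing import Optional

# Toy tree: path -> blob ID; here the content string stands in for the blob ID.
Tree = dict[str, str]


def plan_push(remote_tip: Tree, new_tip: Tree) -> dict[str, Optional[str]]:
    """For each blob the receiver lacks, return the delta base the toy rule allows."""
    remote_blobs = set(remote_tip.values())
    plan: dict[str, Optional[str]] = {}
    for path, blob in sorted(new_tip.items()):
        if blob in remote_blobs:
            continue  # receiver already has this exact blob: nothing to send
        # Toy rule: only the previous version at the same path may serve as base.
        plan[path] = remote_tip.get(path)
    return plan


remote = {"old.bin": "OLD"}

# Single push: commit the real new file directly.
print(plan_push(remote, {"old.bin": "OLD", "new.bin": "NEW"}))  # {'new.bin': None}

# Steps 1-3: commit a copy of the old file under the new name and push.
step3 = {"old.bin": "OLD", "new.bin": "OLD"}
print(plan_push(remote, step3))  # {}
remote = step3  # the receiver now has new.bin with the old content

# Steps 4-6: overwrite with the real new file, commit and push again.
print(plan_push(remote, {"old.bin": "OLD", "new.bin": "NEW"}))  # {'new.bin': 'OLD'}
```

Without the middle push, new.bin has no base and is sent whole (None). With it, the second push can send new.bin as a delta against the old content.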

In fact, thin packs work by storing deltas against base objects that are not included in the pack.
The objects used as delta bases without being included are currently only the previous versions of files that are part of the update being pushed or fetched.
In other words, a previous version must exist under the same name for this to work.
Doing otherwise would not scale if the previous commit had thousands of files to test against.
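The scaling argument can be checked with some rough arithmetic. The sketch counts candidate (changed file, base) pairs only, not the cost of computing each delta. The function name and the file counts are hypothetical.

```python
def delta_candidates(
    changed: list[str], previous_tree: dict[str, str], same_path_only: bool = True
) -> list[tuple[str, str]]:
    """(changed path, candidate base blob) pairs the sender would have to consider."""
    if same_path_only:
        # One lookup per changed file: its previous version, if it existed.
        return [(path, previous_tree[path]) for path in changed if path in previous_tree]
    # Every changed file against every blob of the previous commit.
    return [(path, blob) for path in changed for blob in previous_tree.values()]


previous = {f"file{i}.bin": f"blob{i}" for i in range(5000)}
changed = [f"file{i}.bin" for i in range(10)]
print(len(delta_candidates(changed, previous)))                        # 10
print(len(delta_candidates(changed, previous, same_path_only=False)))  # 50000
```

With 10 changed files in a 5000-file commit, the same-path rule considers 10 candidate pairs instead of 50000.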

Thin packs were designed with different versions of the same file in mind, not different files with almost the same content. The issue is deciding which preferred delta bases to add to the list of objects; currently only objects with the same path as those being modified are considered.
