How does Git’s transfer protocol work?

Note: starting with Git 2.18 (Q2 2018), the Git transfer protocol gains a version 2 (v2), which is implemented.
With Git 2.26 (Q1 2020), v2 is the default. It is no longer the default in 2.27 (Q2 2020, see the end of this answer, and the follow-up answer). It is the default again in 2.28 (Q3 2020).

See commit a4d78ce, commit 0f1dc53, commit 237ffed, commit 884e586, commit 8ff14ed, commit 49e85e9, commit f08a5d4, commit f1f4d8a, commit edc9caf, commit 176e85c, commit b1c2edf, commit 1aa8dde, commit 40fc51e, commit f7e2050, commit 685fbd3, commit 3145ea9, commit 5b872ff, commit 230d7dd, commit b4be741, commit 1af8ae1 (15 Mar 2018) by Brandon Williams (mbrandonw).
(Merged by Junio C Hamano — gitster in commit 9bfa0f9, 08 May 2018)

The full specification is in Documentation/technical/protocol-v2.txt:

Protocol v2 will improve upon v1 in the following ways:

  • Instead of multiple service names, multiple commands will be
    supported by a single service
  • Easily extendable as capabilities are moved into their own section
    of the protocol, no longer being hidden behind a NUL byte and
    limited by the size of a pkt-line
  • Separate out other information hidden behind NUL bytes (e.g. agent
    string as a capability and symrefs can be requested using ‘ls-refs’)
  • Reference advertisement will be omitted unless explicitly requested
  • ls-refs command to explicitly request some refs
  • Designed with http and stateless-rpc in mind. With clear flush
    semantics the http remote helper can simply act as a proxy

In protocol v2 communication is command oriented.
When first contacting a server, a list of capabilities will be advertised. Some of these capabilities will be commands which a client can request be executed. Once a command has completed, a client can reuse the connection and request that other commands be executed.

info/refs remains the server endpoint queried by a client, as explained in the HTTP Transport section:

When using the http:// or https:// transport a client makes a “smart”
info/refs request as described in http-protocol.txt and requests that
v2 be used by supplying “version=2” in the Git-Protocol header.

   C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0
   C: Git-Protocol: version=2

A v2 server would reply:

   S: 200 OK
   S: <Some headers>
   S: ...
   S:
   S: 000eversion 2\n
   S: <capability-advertisement>

Subsequent requests are then made directly to the service
$GIT_URL/git-upload-pack. (This works the same for git-receive-pack).
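
The “000eversion 2\n” line in the server reply is a pkt-line: four hexadecimal digits giving the total length (the four digits included), followed by the payload. Here is a minimal Python sketch of that framing, assuming the pkt-line rules from Git's protocol documentation. The helper name pkt_line is made up for illustration and is not part of Git.

```python
FLUSH_PKT = b"0000"  # flush-pkt: marks the end of a message
DELIM_PKT = b"0001"  # delim-pkt: separates sections in protocol v2


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then the payload."""
    length = len(payload) + 4  # the length prefix counts itself
    if length > 65520:
        raise ValueError("payload too long for a single pkt-line")
    return b"%04x" % length + payload


print(pkt_line(b"version 2\n"))  # b'000eversion 2\n'
```

The flush packet is the bare four bytes “0000”, with no payload at all.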

The goal is to have more capabilities:

There are two different types of capabilities:

  • normal capabilities, which can be used to convey information or alter the behavior of a request, and
  • commands, which are the core actions that a client wants to
    perform (fetch, push, etc).

Protocol version 2 is stateless by default.
This means that all commands must only last a single round and be stateless from the perspective of the server side, unless the client has requested a capability indicating that state should be maintained by the server.

Clients MUST NOT require state management on the server side in order to function correctly.
This permits simple round-robin load-balancing on the server side, without
needing to worry about state management.

Finally:

ls-refs is the command used to request a reference advertisement in v2.
Unlike the current reference advertisement, ls-refs takes in arguments
which can be used to limit the refs sent from the server.
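
As an illustration, an ls-refs request body could be assembled as below. This is a sketch only: it assumes the request layout (command line, delimiter packet, arguments, flush packet) and the argument names symrefs, peel and ref-prefix as described in protocol-v2.txt. The function names are hypothetical.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame a payload as a pkt-line (4 hex digits of total length + payload)."""
    return b"%04x" % (len(payload) + 4) + payload


def ls_refs_request(prefixes, symrefs=True, peel=False) -> bytes:
    """Build the body of a v2 ls-refs request limited to the given ref prefixes."""
    parts = [pkt_line(b"command=ls-refs\n")]
    parts.append(b"0001")  # delim-pkt: capability list ends, arguments begin
    if symrefs:
        parts.append(pkt_line(b"symrefs\n"))  # ask for symref targets (e.g. HEAD)
    if peel:
        parts.append(pkt_line(b"peel\n"))  # ask for peeled tag values
    for prefix in prefixes:
        parts.append(pkt_line(b"ref-prefix " + prefix.encode() + b"\n"))
    parts.append(b"0000")  # flush-pkt: end of request
    return b"".join(parts)


# Only ask the server about branches, not every ref it has.
request = ls_refs_request(["refs/heads/"])
```

Over HTTP, such a body would be sent in a POST to $GIT_URL/git-upload-pack.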

And:

fetch is the command used to fetch a packfile in v2.
It can be looked at as a modified version of the v1 fetch where the ref-advertisement is stripped out (since the ls-refs command fills that role) and the message format is tweaked to eliminate redundancies and permit easy
addition of future extensions.


Since that commit (May 10th), protocol v2 has been officially announced (May 28th) in the Google blog post “Introducing Git protocol version 2” by Brandon Williams.

In both cases (ls-refs and fetch):

Additional features not supported in the base command will be advertised
as the value of the command in the capability advertisement in the form
of a space separated list of features: “<command>=<feature 1> <feature 2>”
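
A client reading the capability advertisement therefore has to split each line into a key and an optional list of features. A small sketch of that parsing follows. The function name is hypothetical, and the sample line “fetch=shallow filter” is only an illustration, not output captured from a real server.

```python
def parse_advertised_line(line: str):
    """Split one advertised line into (key, [features]).

    A line is either a bare key or has the form "<key>=<feature 1> <feature 2>".
    """
    key, sep, value = line.rstrip("\n").partition("=")
    features = value.split(" ") if sep and value else []
    return key, features


print(parse_advertised_line("fetch=shallow filter\n"))  # ('fetch', ['shallow', 'filter'])
print(parse_advertised_line("ls-refs\n"))               # ('ls-refs', [])
```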


See also commit 5e3548e, commit ff47322, commit ecc3e53 (23 Apr 2018) by Brandon Williams (mbrandonw).
(Merged by Junio C Hamano — gitster in commit 41267e9, 23 May 2018)

serve: introduce the server-option capability

Introduce the “server-option” capability to protocol version 2.
This gives future clients the ability to send server-specific options in
command requests when using protocol version 2.

fetch: send server options when using protocol v2

Teach fetch to optionally accept server options by specifying them on
the cmdline via ‘-o’ or ‘--server-option’.
These server options are sent to the remote end when performing a fetch communicating using protocol version 2.

If communicating using a protocol other than v2 the provided options are
ignored and not sent to the remote end.

The same is done for git ls-remote.


And the transfer protocol v2 learned to support the partial clone seen in Dec. 2017 with Git 2.16.

See commit ba95710, commit 5459268 (03 May 2018), and commit 7cc6ed2 (02 May 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano — gitster in commit 54db5c0, 30 May 2018)

{fetch,upload}-pack: support filter in protocol v2

The fetch-pack/upload-pack protocol v2 was developed independently of
the filter parameter (used in partial fetches), thus it did not include
support for it. Add support for the filter parameter.

Like in the legacy protocol, the server advertises and supports “filter”
only if uploadpack.allowfilter is configured.

Like in the legacy protocol, the client continues with a warning if
“--filter” is specified, but the server does not advertise it.


Git 2.19 (Q3 2018) improves the fetch part of the git transfer protocol v2:

See commit ec06283, commit d093bc7, commit d30fe89, commit af1c90d, commit 21bcf6e (14 Jun 2018), and commit af00855, commit 34c2903 (06 Jun 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano — gitster in commit af8ac73, 02 Aug 2018)

fetch-pack: introduce negotiator API

Introduce the new files fetch-negotiator.{h,c}, which contain an API
behind which the details of negotiation are abstracted.

fetch-pack: use ref adv. to prune “have” sent

In negotiation using protocol v2, fetch-pack sometimes does not make
full use of the information obtained in the ref advertisement:
specifically, that if the server advertises a commit that the client
also has, the client never needs to inform the server that it has the
commit’s parents, since it can just tell the server that it has the
advertised commit and it knows that the server can and will infer the
rest.


Git 2.20 (Q4 2018) fixes git ls-remote:

See commit 6a139cd, commit 631f0f8 (31 Oct 2018) by Jeff King (peff).
(Merged by Junio C Hamano — gitster in commit 81c365b, 13 Nov 2018)

git ls-remote $there foo was broken by a recent update for the
protocol v2 and stopped showing refs that match ‘foo’ that are not
refs/{heads,tags}/foo, which has been fixed.


And Git 2.20 fixes git fetch, which was a bit loose in parsing responses from the other side when talking over the protocol v2.

See commit 5400b2a (19 Oct 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano — gitster in commit 67cf2fa, 13 Nov 2018)

fetch-pack: be more precise in parsing v2 response

Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow).

But when parsing the “acknowledgments” section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the “acknowledgments” section, not on whether DELIM or FLUSH was read.

There is no issue with a protocol-compliant server, but this can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends “new-section” after “acknowledgments”:

  • client writes request
  • client reads the “acknowledgments” section which contains no “ready”,
    then DELIM
  • since there was no “ready”, client needs to continue negotiation, and
    writes request
  • client reads “new-section”, and reports to the end user “expected
    ‘acknowledgments’, received ‘new-section’”

For the person debugging the involved Git implementation(s), the error
message is confusing in that “new-section” was not received in response
to the latest request, but to the first one.

One solution is to always continue reading after DELIM, but in this case, we can do better.

We know from the protocol that:

  • “ready” means at least the packfile section is coming (hence, DELIM), and that
  • no “ready” means that no sections are to follow (hence, FLUSH).

So teach process_acks() to enforce this.
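
The rule being enforced can be sketched in a few lines of Python. This is not Git's C implementation of process_acks(), only an illustration of the check described above. The names and the string constants standing in for the DELIM and FLUSH packets are assumptions.

```python
DELIM, FLUSH = "delim", "flush"  # stand-ins for the two packet types


def check_acks_terminator(ack_lines, terminator):
    """Return True if a packfile section follows; raise on a protocol mismatch.

    ack_lines: the lines of the "acknowledgments" section.
    terminator: the packet (DELIM or FLUSH) read right after that section.
    """
    ready = "ready" in ack_lines
    if ready and terminator != DELIM:
        raise ValueError("'ready' was sent, so a DELIM and the packfile section must follow")
    if not ready and terminator != FLUSH:
        raise ValueError("no 'ready' was sent, so nothing may follow except a FLUSH")
    return ready
```

With this check, the unexpected DELIM in the scenario above is reported right after the first response instead of surfacing as a confusing error one request later.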


Git 2.21 brings official support of protocol v2 to fetch-pack:

See commit e20b419 (18 Dec 2018) by Jeff King (peff).
(Merged by Junio C Hamano — gitster in commit d3b0178, 29 Jan 2019)

fetch-pack: support protocol version 2

The scaffolding for protocol version 2 was initially added in
8f6982b (“protocol: introduce enum protocol_version value
protocol_v2”, 2018-03-14, Git v2.18). As seen in:

git log -p -G'support for protocol v2 not implemented yet' --full-diff --reverse v2.17.0..v2.20.0

Many of those scaffolding “die” placeholders were removed, but we
hadn’t gotten around to fetch-pack yet.

The test here for “fetch refs from cmdline” is very minimal. There’s
much better coverage when running the entire test suite under the WIP
GIT_TEST_PROTOCOL_VERSION=2 mode, but we should ideally have better
coverage without needing to invoke a special test mode.


Git 2.22 (Q2 2019) adds: “git clone” learned a new --server-option option when talking over the protocol version 2.

See commit 6e98305, commit 35eb824 (12 Apr 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano — gitster in commit 6d3df8e, 08 May 2019)

clone: send server options when using protocol v2

Commit 5e3548e (“fetch: send server options when using protocol v2”,
2018-04-24, Git v2.18.0-rc0) taught “fetch” the ability to send server options when using protocol v2, but not “clone”.
This ability is triggered by “-o” or “--server-option”.

Teach “clone” the same ability, except that because “clone” already
has “-o” for another parameter, teach “clone” only to receive “--server-option”.

Explain in the documentation, both for clone and for fetch, that server
handling of server options is server-specific.
This is similar to receive-pack’s handling of push options: currently, they are just sent to hooks to interpret as they see fit.


Note: Git 2.18 introduced a git serve command in commit ed10cb9 by Brandon Williams:

serve: introduce git-serve

Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git’s current
wire protocol.
The intention is that it will be a simpler, less wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.
In its place a server will export a list of capabilities and commands which it supports in a capability advertisement.
A client can then request that a particular command be executed by providing a number of capabilities and command specific parameters.
At the completion of a command, a client can request that another command be executed or can terminate the connection by sending a flush packet.

But… Git 2.22 does amend that, with commit b7ce24d by Johannes Schindelin:

Turn git serve into a test helper

The git serve built-in was introduced in ed10cb9 (serve:
introduce git-serve, 2018-03-15, Git v2.18.0-rc0) as a backend to serve Git protocol v2, probably originally intended to be spawned by git upload-pack.

However, in the version of the protocol v2 patches that made it into core
Git, git upload-pack calls the serve() function directly instead of
spawning git serve; the only reason in life for git serve to survive
as a built-in command is to provide a way to test the protocol v2
functionality.

Meaning that it does not even have to be a built-in that is installed
with end-user facing Git installations, but it can be a test helper
instead.

Let’s make it so.


Git 2.23 (Q3 2019) makes update-server-info more efficient: it has learned not to rewrite a file with the same contents.

See commit f4f476b (13 May 2019) by Eric Wong (ele828).
(Merged by Junio C Hamano — gitster in commit 813a3a2, 13 Jun 2019)

update-server-info: avoid needless overwrites

Do not change the existing info/refs and objects/info/packs files if they match the existing content on the filesystem.
This is intended to preserve mtime and make it easier for dumb HTTP pollers to rely on the If-Modified-Since header.

Combined with stdio and kernel buffering, the kernel should be
able to avoid block layer writes and reduce wear for small files.

As a result, the --force option is no longer needed.
So stop documenting it, but let it remain for compatibility (and
debugging, if necessary).
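
The idea of skipping a write when nothing changed can be sketched as follows. This is not how update-server-info is implemented in C. It is a minimal Python illustration, with a hypothetical function name, of leaving a file (and its mtime) alone when its content already matches.

```python
import os
import tempfile


def write_if_changed(path: str, new_content: bytes) -> bool:
    """Write new_content to path unless the file already holds exactly that.

    Returns True when the file was (re)written, False when it was left alone.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == new_content:
                return False  # identical: keep the old file and its mtime
    except FileNotFoundError:
        pass  # no previous file, so it has to be written
    # Write to a temporary file in the same directory, then rename it into place.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(new_content)
    os.replace(tmp, path)
    return True
```

Because the mtime is untouched in the unchanged case, a dumb HTTP client sending If-Modified-Since can keep getting “304 Not Modified” from the web server. Note that mkstemp creates the file with restrictive permissions, which a real server-side tool would need to adjust.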

And Git 2.22.1 also fixes the server-side support for “git fetch”, which used to show an incorrect value for the HEAD symbolic ref when the namespace feature is in use.

See commit 533e088 (23 May 2019) by Jeff King (peff).
(Merged by Junio C Hamano — gitster in commit 5ca0db3, 25 Jul 2019)

upload-pack: strip namespace from symref data

Since 7171d8c (upload-pack: send symbolic ref information as
capability, 2013-09-17, Git v1.8.4.3), we’ve sent cloning and fetching clients special information about which branch HEAD is pointing to, so that they don’t
have to guess based on matching up commit ids.

However, this feature has never worked properly with the GIT_NAMESPACE
feature. Because upload-pack uses head_ref_namespaced(find_symref), we
do find and report on refs/namespaces/foo/HEAD instead of the actual
HEAD of the repo.
This makes sense, since the branch pointed to by the top-level HEAD may not be advertised at all.

But we do two things wrong:

  1. We report the full name refs/namespaces/foo/HEAD, instead of just HEAD.
    Meaning no client is going to bother doing anything with that symref, since we’re not otherwise advertising it.
  2. We report the symref destination using its full name (e.g., refs/namespaces/foo/refs/heads/master). That’s similarly useless to the client, who only saw “refs/heads/master” in the advertisement.

We should be stripping the namespace prefix off of both places (which
this patch fixes).
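
To make the two bugs concrete, here is a small Python sketch of the stripping that both the symref name and its destination need. The function name is hypothetical, and it only handles a single, non-nested namespace, which is a simplifying assumption.

```python
def strip_namespace(refname: str, namespace: str) -> str:
    """Drop the refs/namespaces/<namespace>/ prefix from a ref name, if present."""
    prefix = f"refs/namespaces/{namespace}/"
    if refname.startswith(prefix):
        return refname[len(prefix):]
    return refname


# (1) the symref itself and (2) its destination, as seen inside namespace "foo"
print(strip_namespace("refs/namespaces/foo/HEAD", "foo"))               # HEAD
print(strip_namespace("refs/namespaces/foo/refs/heads/master", "foo"))  # refs/heads/master
```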

Likely nobody noticed because we tend to do the right thing anyway.
Bug (1) means that we said nothing about HEAD (just refs/namespaces/foo/HEAD).
And so the client half of the code, from a45b5f0 (connect: annotate
refs with their symref information in get_remote_head(), 2013-09-17, Git v1.8.4.3), does not annotate HEAD, and we use the fallback in guess_remote_head(), matching refs by object id.
Which is usually right. It only falls down in ambiguous cases, like the one laid out in the included test.

This also means that we don’t have to worry about breaking anybody who
was putting pre-stripped names into their namespace symrefs when we fix
bug (2).
Because of bug (1), nobody would have been using the symref we
advertised in the first place (not to mention that those symrefs would
have appeared broken for any non-namespaced access).

Note that we have separate fixes here for the v0 and v2 protocols.
The symref advertisement moved in v2 to be a part of the ls-refs command.
This actually gets part (1) right, since the symref annotation piggy-backs on the existing ref advertisement, which is properly stripped.
But it still needs a fix for part (2).


With Git 2.25.1 (Feb. 2020), an unnecessary round-trip when running “ls-remote” over the stateless RPC mechanism is eliminated.

See discussion:

A colleague (Jon Simons) today pointed out an interesting behavior of
git ls-remote with protocol v2: it makes a second POST request and sends
only a flush packet.
This can be demonstrated with the following:

GIT_CURL_VERBOSE=1 git -c protocol.version=2 ls-remote origin

The Content-Length header on the second request will be exactly 4 bytes.

See commit 4d8cab9 (08 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano — gitster in commit 45f47ff, 22 Jan 2020)

transport: don’t flush when disconnecting stateless-rpc helper

Signed-off-by: Jeff King

Since ba227857d2 (“Reduce the number of connects when fetching”, 2008-02-04, Git v1.5.5-rc0 — merge), when we disconnect a git transport, we send a final flush packet.
This cleanly tells the other side that we’re done, and avoids the other side complaining “the remote end hung up unexpectedly” (though we’d only see that for transports that pass along the server stderr, like ssh or local-host).

But when we’ve initiated a v2 stateless-connect session over a transport helper, there’s no point in sending this flush packet. Each operation we’ve performed is self-contained, and the other side is fine with us hanging up between operations.

But much worse, by sending the flush packet we may cause the helper to issue an entirely new request _just_ to send the flush packet. So we can incur an extra network request just to say “by the way, we have nothing more to send”.

Let’s drop this extra flush packet. As the test shows, this reduces the number of POSTs required for a v2 ls-remote over http from 2 to 1.


With Git 2.26 (Q1 2020), the test-lint machinery, which knew to check the “VAR=VAL shell_function” construct but did not check “VAR= shell_function”, has been corrected.

See commit d6509da, commit a7fbf12, commit c7973f2 (26 Dec 2019) by Jonathan Nieder (artagnon).
(Merged by Junio C Hamano — gitster in commit c7372c9, 30 Jan 2020)

fetch test: mark test of “skipping” haves as v0-only

Signed-off-by: Jonathan Nieder

Since 633a53179e (fetch test: avoid use of “VAR= cmd” with a shell function, 2019-12-26), t5552.5 (do not send “have” with ancestors of commits that server ACKed) fails when run with GIT_TEST_PROTOCOL_VERSION=2.

The cause:

The progression of “have”s sent in negotiation depends on whether we are using a stateless RPC based transport or a stateful bidirectional one (see for example 44d8dc54e7, “Fix potential local deadlock during fetch-pack”, 2011-03-29, Git v1.7.5-rc0).

In protocol v2, all transports are stateless transports, while in protocol v0, transports such as local access and SSH are stateful.

In stateful transports, the number of “have”s to send multiplies by two each round until we reach PIPESAFE_FLUSH (that is, 32), and then it increases by PIPESAFE_FLUSH each round.

In stateless transport, the count multiplies by two each round until we reach LARGE_FLUSH (which is 16384) and then multiplies by 1.1 each round after that.
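
The two progressions can be compared with a short Python sketch of the rule just described. The starting count of 16 and the use of integer arithmetic for the “1.1” factor are assumptions for the purpose of illustration. The two constants come from the commit message.

```python
PIPESAFE_FLUSH = 32
LARGE_FLUSH = 16384


def next_flush(stateless_rpc: bool, count: int) -> int:
    """Return how many "have"s to send in the next round of negotiation."""
    if stateless_rpc:
        if count < LARGE_FLUSH:
            return count * 2
        return count * 11 // 10  # "multiplies by 1.1", using integers (assumption)
    if count < PIPESAFE_FLUSH:
        return count * 2
    return count + PIPESAFE_FLUSH


def progression(stateless_rpc: bool, start: int = 16, rounds: int = 5):
    """List the count for the first few rounds; start=16 is an assumed initial value."""
    counts = [start]
    for _ in range(rounds - 1):
        counts.append(next_flush(stateless_rpc, counts[-1]))
    return counts


print(progression(False))  # [16, 32, 64, 96, 128]
print(progression(True))   # [16, 32, 64, 128, 256]
```

The sequences diverge from the third round on, which is why a test that inspects exactly which “have”s are sent can pass under one kind of transport and fail under the other.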

Moreover, in stateful transports, as fetch-pack.c explains:

We keep one window “ahead” of the other side, and will wait for an ACK only on the next one.

This affects t5552.5 because it looks for “have”s from the negotiator that appear in that second window.

With protocol version 2, the second window never arrives, and the test fails.

Until 633a53179e (2019-12-26), a previous test in the same file contained

GIT_TEST_PROTOCOL_VERSION= trace_fetch client origin to_fetch

In many common shells (e.g. bash when run as “sh”), the setting of GIT_TEST_PROTOCOL_VERSION to the empty string lasts beyond the intended duration of the trace_fetch invocation.

This causes it to override the GIT_TEST_PROTOCOL_VERSION setting that was passed in to the test during the remainder of the test script, so t5552.5 never got run using protocol v2 on those shells, regardless of the GIT_TEST_PROTOCOL_VERSION setting from the environment.

633a53179e fixed that, revealing the failing test.
