Git Shell in Windows: patch’s default character encoding is UCS-2 Little Endian – how to change this to ANSI or UTF-8 without BOM?

I’m not a Windows user, so take my answer with a grain of salt. According to the Windows PowerShell Cookbook, PowerShell preprocesses the output of git diff, splitting it into lines. The documentation of the Out-File cmdlet suggests that > is the same as | Out-File without parameters. We also find this comment in the PowerShell documentation:

The results of using the Out-File cmdlet may not be what you expect if you are used to traditional output redirection. To understand its behavior, you must be aware of the context in which the Out-File cmdlet operates.

By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:

[…]

Out-file formats file contents to look like console output. This causes the output to be truncated just as it is in a console window in most circumstances. […]

To get output that does not force line wraps to match the screen width, you can use the Width parameter to specify line width.

So, apparently it is not Git which chooses the character encoding, but Out-File. This suggests a) that PowerShell redirection really should only be used for text and b) that

| Out-File -encoding ASCII -Width 2147483647 my.patch

will avoid the encoding problems. However, this still does not solve the problem of Windows vs. Unix line endings. There are cmdlets (see the PowerShell Community Extensions) that convert line endings.
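If a patch has already been written through the default redirection, it can be re-encoded afterwards. Below is a minimal Python sketch of that idea, assuming the file is UTF-16 with a byte order mark (which is what the "Unicode" default produces). The function name recode_patch is made up for illustration. It cannot restore characters that were already mangled before the file was written.

```python
from pathlib import Path


def recode_patch(src, dst):
    """Rewrite a UTF-16 patch as UTF-8 without BOM and with LF line endings."""
    raw = Path(src).read_bytes()
    # The "utf-16" codec reads the byte order mark to pick the byte order
    # and drops it from the decoded text.
    text = raw.decode("utf-16")
    # Convert Windows line endings to Unix line endings.
    text = text.replace("\r\n", "\n")
    # Plain "utf-8" (unlike "utf-8-sig") never writes a BOM; writing bytes
    # avoids any newline translation on Windows.
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, recode_patch("my.patch", "my-fixed.patch") leaves the original file untouched and writes the converted copy next to it.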

However, all this recoding does not increase my confidence in a patch (which has no encoding itself, but is just a string of bytes). The aforementioned Cookbook contains a script, Invoke-BinaryProcess, which can be used to redirect the output of a command unmodified.
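The same "do not touch the bytes" idea can be sketched outside PowerShell. The following is not the Cookbook's Invoke-BinaryProcess, just a small Python illustration of it, assuming Python is available. The helper name run_to_file is hypothetical.

```python
import subprocess


def run_to_file(argv, out_path):
    """Run argv and store its stdout in out_path byte for byte."""
    # Open in binary mode and hand the file to the child process, so the
    # output is never decoded, re-encoded, or split into lines on the way.
    with open(out_path, "wb") as out:
        subprocess.run(argv, stdout=out, check=True)
```

A call such as run_to_file(["git", "diff"], "my.patch") should then produce the patch exactly as Git emitted it, including its original line endings.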

To sidestep this whole issue, an alternative would be to use git format-patch instead of git diff. format-patch writes directly to a file (and not to stdout), so its output is not recoded. However, it can only create patches from commits, not arbitrary diffs.

format-patch takes a commit range (e.g. master~10..master~5) or a single commit (e.g. X, meaning X..HEAD) and creates patch files of the form NNNN-SUBJECT.patch, where NNNN is an increasing 4-digit number and SUBJECT is the (mangled) subject of the commit. An output directory can be specified with -o.
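As a rough sketch, such a call could be scripted like this. The function names are invented for illustration. The only Git features relied on are the -o option and the fact that format-patch prints the name of each file it creates.

```python
import subprocess


def format_patch_command(revision_range, out_dir):
    """Build the argument list for git format-patch with an output directory."""
    return ["git", "format-patch", "-o", out_dir, revision_range]


def write_patches(revision_range, out_dir):
    """Run git format-patch and return the patch file paths it reports."""
    result = subprocess.run(
        format_patch_command(revision_range, out_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    # Only the list of file names is read from stdout here; the patch
    # contents are written to disk by Git itself.
    return result.stdout.splitlines()
```

Inside a repository, write_patches("HEAD~3..HEAD", "patches") would write one patch file per commit in that range into the patches directory.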
