How to split a git repository while preserving subdirectories?

You could indeed use the subdirectory filter followed by an index filter to put the contents back into a subdirectory, but why bother, when you could just use the index filter by itself?

Here’s an example from the man page:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

This removes just one file; what you want is to remove everything except a given subdirectory. If you want to be cautious, you could list each path to remove explicitly, but if you want to go all-in, you can do something like this:

git filter-branch --index-filter 'git ls-tree -z --name-only --full-tree $GIT_COMMIT | grep -zv "^directory-to-keep$" | xargs -0 git rm --cached -r' -- --all

I expect there’s probably a more elegant way; if anyone has one, please suggest it!

A few notes on that command:

  • filter-branch internally sets GIT_COMMIT to the SHA-1 of the commit currently being rewritten.
  • I wouldn’t have expected --full-tree to be necessary, but apparently filter-branch runs the index-filter from the .git-rewrite/t directory instead of the top level of the repo.
  • grep is probably overkill, but I don’t think it’s a speed issue.
  • --all applies this to all refs; I figure you really do want that. (The -- separates it from the filter-branch options.)
  • -z and -0 tell ls-tree, grep, and xargs to use NUL termination, so filenames containing spaces are handled correctly.
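
If you want to see what the NUL-terminated part of the pipeline does without touching a repository, here is a small sketch. The list_entries function and its three names are made up as a stand-in for the ls-tree output, and printf stands in for git rm so that each argument xargs passes along is visible. It assumes a grep that supports -z (GNU grep does).

```shell
# Hypothetical stand-in for `git ls-tree -z --name-only`: three NUL-terminated
# top-level names, one of which contains a space.
list_entries() {
  printf '%s\0' 'directory-to-keep' 'my docs' 'README'
}

# Filter out the one name to keep; whatever is left is what would be removed.
# `printf "<%s>\n"` replaces `git rm --cached -r` here, printing one line per
# argument so it is clear that "my docs" arrives as a single argument.
to_remove=$(list_entries | grep -zv '^directory-to-keep$' | xargs -0 printf '<%s>\n')
echo "$to_remove"
```

Without -z and -0, xargs would split "my docs" on the space and hand git rm two paths that don’t exist.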

Edit, much later: Thomas helpfully suggested a way to remove the now-empty commits, but it’s now out of date. Look at the edit history if you’ve got an old version of git, but with modern git, all you need to do is tack on this option:

--prune-empty

That’ll remove all commits which are empty after the application of the index filter.
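
To try the whole thing end to end, here is a rough sketch that builds a throwaway repository and runs the filter with --prune-empty added. The directory layout, file names, and the demo identity are all invented for illustration; FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer versions of git print before filter-branch starts.

```shell
#!/bin/sh
# Sketch: create a scratch repo, then keep only "directory-to-keep" in its history.
set -e

export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the startup warning in newer git
# Made-up identity so commits work even without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q 2>/dev/null

# First commit touches both directories.
mkdir directory-to-keep other
echo one > directory-to-keep/a.txt
echo two > other/b.txt
git add .
git commit -q -m "add both directories"

# Second commit touches only other/, so it is empty once other/ is filtered out.
echo three > other/c.txt
git add .
git commit -q -m "touch only other/"

# Same index filter as above, plus --prune-empty (and -q to keep git rm quiet).
git filter-branch --prune-empty --index-filter 'git ls-tree -z --name-only --full-tree $GIT_COMMIT | grep -zv "^directory-to-keep$" | xargs -0 git rm --cached -r -q' -- --all >/dev/null 2>&1

commit_count=$(git rev-list --count HEAD)
remaining=$(git ls-tree --name-only HEAD)
echo "commits left: $commit_count"
echo "top-level entries: $remaining"
```

Run it in a scratch shell, not inside a repository you care about; it only ever writes to the temporary directory it creates.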
