Brian reminded me that it is time for the annual update on how to manage GitHub repositories and your open-source connection. It turns out there are only a few paths that don’t lead you to a dead end. After four years of using it, here’s a quick set of tips, tricks, and traps:
Identities
Have a personal identity and a business one. Keep one address that holds everything you take with you, such as your personal settings. You do not want to be in a position where leaving a company means losing all of that. GitHub is fundamentally tied to email addresses and your handle, so find a good one (see https://github.com/richtong).
Repository structuring, or have a ~/ws/src
There are a zillion notes on this, but in the end, as Sam says, the key question to ask is: how do I get to a reproducible build? That is, given any bare-metal machine, encapsulate all the steps you need and make sure everything is checked in. If you use lots of repos, it is very hard to keep them all in sync, so most of the time you want a single repo for your work. That way you can just git clone git@github.com:richtong/src onto any machine and you are off to the races.
What does the directory structure look like on your home machine? It is really useful to have a ~/ws in your home directory that is your current project. If, say, you work for a company, you can symlink this to ~/ws.surround.io or whatever your current company is. That way the structure always looks the same, and if you are, say, a consultant, you can work on different projects in separate contexts.
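Here is a minimal sketch of that layout. It runs in a throwaway directory that stands in for your home, so nothing real is touched. The second workspace name, ws.otherclient, is purely hypothetical and only there to show how you would switch contexts.

```shell
# Throwaway directory standing in for $HOME (assumption: you would use ~ for real).
home=$(mktemp -d)

# One workspace per company or client; ws.otherclient is a made-up name.
mkdir -p "$home/ws.surround.io/src"
mkdir -p "$home/ws.otherclient/src"

# ~/ws is just a symlink to whichever workspace is current.
ln -s "$home/ws.surround.io" "$home/ws"

# Switching context means repointing the link; -n replaces the link itself
# instead of creating a new link inside the directory it points to.
ln -sfn "$home/ws.otherclient" "$home/ws"

# Show where ~/ws points now.
readlink "$home/ws"
```

Scripts and editor settings can then always refer to ~/ws/src, whichever project is current.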
Use Git LFS to manage large files
This is going to come up all the time, but it’s so easy to forget, and suddenly you have a 1GB Zip file in some branch. When that happens, it is carried around everywhere unless you radically rewrite history. So before you start, make sure you have Git Large File Storage turned on. On a Mac this means doing:
brew install git-lfs
git lfs install
cd ~/ws/src
# "*.zip" is only an example pattern; track whatever large file types you use
git lfs track "*.zip"
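Since the trap is forgetting, a quick check before you commit can help. The sketch below is not part of Git LFS: find_large_files is a helper name I made up, and the 1024 KiB threshold and file names are arbitrary values for the demo. It simply lists files above a size limit while skipping Git's own metadata.

```shell
# Hypothetical helper: list files larger than a threshold (in KiB),
# ignoring anything under a .git directory.
find_large_files() {
  dir=$1
  min_kib=$2
  find "$dir" -type f -size +"${min_kib}"k ! -path '*/.git/*'
}

# Demo in a throwaway directory with one small and one 2 MiB file.
repo=$(mktemp -d)
printf 'small\n' > "$repo/notes.txt"
dd if=/dev/zero of="$repo/data.zip" bs=1024 count=2048 2>/dev/null

# Anything printed here is a candidate for git lfs track.
large=$(find_large_files "$repo" 1024)
echo "$large"
```

In a real repo you would pick a larger threshold and run this from ~/ws/src before committing.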
Managing updates to your own code
Never commit to your master directly. You want to run everything on a branch. There is a very stylized way to do this that looks like:
cd ~/ws/src
# prefix your branch names with your own name
git checkout -b rich-script-fixes
# now make a bunch of changes
git commit -a
# the first push of a new branch has to set its upstream
git push -u origin rich-script-fixes
In this way, you make all your changes on your branch and you don’t pollute the master. Now run all your tests, and when you are ready (that is, before you have diverged too far), assume that others may have changed things in the meantime. This sequence protects you from that and from cluttering the world with useless merges:
# make sure all your changes are committed
git commit -a
git push
# now pull down the latest master
git checkout master
# fetch only and prune deleted remote branches; nothing is merged yet
git fetch -p
# it will now tell you if you can just fast-forward
git pull --ff-only
# if that fails, rebase and see what conflicts there are
git pull --rebase
# it will show each file that is in conflict; you need to edit those files
# once done, you can continue, until you are all done
git rebase --continue
# your local master is now up to date, so push the new history
git push
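If you want to know ahead of time whether a fast-forward is possible, one way to check is to ask whether one commit is an ancestor of the other. This is a sketch in a throwaway repo. The helper name can_fast_forward and the branch names stale and diverged are my own, and the demo identity is made up.

```shell
# Demo identity so commits work on a machine with no git config (assumption).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Hypothetical helper: succeeds if branch $1 can be fast-forwarded to $2,
# which is the case exactly when $1 is an ancestor of $2.
can_fast_forward() {
  git merge-base --is-ancestor "$1" "$2"
}

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this post
echo base > file.txt
git add file.txt
git commit -q -m "base"

git branch stale                 # stays behind at the base commit
git checkout -q -b diverged      # gets a commit that master never sees
echo local > local.txt
git add local.txt
git commit -q -m "local work"

git checkout -q master           # master moves on independently
echo newer > newer.txt
git add newer.txt
git commit -q -m "newer master"

if can_fast_forward stale master; then ff_stale=yes; else ff_stale=no; fi
if can_fast_forward diverged master; then ff_diverged=yes; else ff_diverged=no; fi
echo "stale: $ff_stale, diverged: $ff_diverged"
```

The branch that only fell behind can fast-forward. The one with its own commits cannot, and that is the case where git pull --rebase comes in.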
At this point, you finally have an up-to-date master on your local machine and have accounted for everything on top of it, so now turn to your changes, assuming the branch you’ve been working on is called rich-script-fixes:
git checkout rich-script-fixes
# do the same as above to make sure any changes from other machines are incorporated
git pull --ff-only
# if this does not work, then you need
git pull --rebase
# then, as above, edit each file and fix the conflicts
git rebase --continue
git push
Now both the master and the branch are in sync with the GitHub server. We are now going to merge into master so that you get a nice history:
git checkout rich-script-fixes
# this is the magic: replay all the changes in the branch
# on top of the master history
git rebase master
# now rewrite the commit messages and reduce the number of commits
git rebase master -i
# you will drop into an editor where you can select
# which commits you want to squash, that is, turn all the
# micro changes into big changes
# here is the magic: this says take the new history
# and make master the same as it
git push origin rich-script-fixes:master
# if this doesn't work, it means that someone updated master
# and you have to rebase again; go back to the top of the list
Why go through all this brain damage? Well, first, you want to commit early and often when you are debugging so you never lose anything. But in the end, you will find (as I do) that you have, say, 10 commits to a single file that each literally delete a character or change a variable. When you do the git rebase master -i you are given an opportunity to change the history and make it simpler.
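To see what squashing buys you without sitting in an editor, here is a sketch in a throwaway repo. Note that it swaps in a different technique: instead of the interactive rebase, it uses git reset --soft back to the point where the branch left master and then makes one new commit. The end result is the same as squashing everything into one. The file name, commit messages, and demo identity are all made up.

```shell
# Demo identity so commits work on a machine with no git config (assumption).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this post
echo "v0" > script.sh
git add script.sh
git commit -q -m "Initial commit"

# Commit early and often on the branch: three tiny commits.
git checkout -q -b rich-script-fixes
for i in 1 2 3; do
  echo "fix $i" >> script.sh
  git commit -q -a -m "tiny fix $i"
done
before=$(git rev-list --count master..HEAD)

# Non-interactive stand-in for squashing: move the branch pointer back to
# where it forked from master, keep all changes staged, commit them once.
git reset -q --soft "$(git merge-base master HEAD)"
git commit -q -m "Fix script"
after=$(git rev-list --count master..HEAD)

echo "commits before: $before, after: $after"
```

The file contents are identical before and after; only the history gets shorter. The interactive rebase gives you finer control, since you can keep some commits separate and squash others.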
External repos with forks and submodules
GitHub makes it easy just to clone any open-source tool onto your machine. Resist the temptation to do that. If you do, you will never be able to figure out which build of what actually works and which doesn’t. Instead, go to the trouble of learning forks and submodules. If you do the normal thing, which is just a git clone git@github.com:otherauthor/otherrepo you have a long-term problem, because nothing records which branch or version of that repo you were using.
There are many tools to do this, but the simplest is the standard set that git comes with, namely submodules, combined with creating your own fork, so when you are building you know what you are building. There is a magic file called .gitmodules which stores the path and URL of each submodule, while the parent repo itself records the hash of the exact commit in your forked repo, so when you check out, you are also checking everything else out and you can guarantee reproducibility.
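You can see where the pin actually lives with a small experiment. This sketch uses a local throwaway repo as a stand-in for your fork instead of GitHub, so all paths and the demo identity are assumptions. The protocol.file.allow setting is only there because recent git versions refuse to add submodules from local paths by default; you would not need it for a GitHub URL.

```shell
# Demo identity so commits work on a machine with no git config (assumption).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# A local stand-in for your fork of the external project.
git init -q "$work/newproject"
cd "$work/newproject"
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
fork_head=$(git rev-parse HEAD)

# The main repo that pulls the fork in as a submodule.
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master
git -c protocol.file.allow=always submodule --quiet add "$work/newproject" extern/newproject
git commit -q -m "Add newproject as a submodule"

# .gitmodules holds the path and URL of the submodule.
cat .gitmodules

# The parent commit holds a special entry (mode 160000) with the pinned hash.
git ls-tree HEAD extern/newproject
mode=$(git ls-tree HEAD extern/newproject | awk '{print $1}')
pinned=$(git ls-tree HEAD extern/newproject | awk '{print $3}')
```

On a fresh clone of the main repo, git submodule update --init checks out exactly that pinned commit.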
# first in the github.com UI, find the other repo
# then click on the Fork button to make it your own
# put all the external repos in one place
mkdir -p ~/ws/git/extern
cd ~/ws/git/extern
git submodule add git@github.com:richtong/newproject
# add the submodule's exact commit into the src history
git commit -a
# now, inside the submodule, create a connection to the original repo
cd newproject
git remote add upstream git@github.com:otherguy/newproject
# at this point, your src repo is tracking the exact commit
# use the same strategy here if you need to make changes
git checkout -b rich-fixes-to-other-project
# make changes, then commit and push the branch as before
git checkout master
git pull --ff-only
git checkout rich-fixes-to-other-project
git pull --ff-only
git rebase master
git rebase master -i
git push origin rich-fixes-to-other-project:master
# at this point you are hacking away on your fork.
Everything is working hunky-dory, but what if there are changes in the project you forked? Well, it is easy to pull them down and rebase:
# get the changes from upstream
git pull --ff-only upstream master
# if this fails, then some of the changes you have made are conflicting
git rebase upstream/master
# now the local has your changes plus the changes from the original project
# if the rebase rewrote commits already on your fork, this push needs --force-with-lease
git push origin master
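Here is the same upstream dance as a sketch you can run end to end. Both repos are local throwaway directories standing in for the original project and your fork, so the paths, file names, commit messages, and demo identity are all made up. It uses git fetch followed by git rebase rather than git pull, to make the two steps visible.

```shell
# Demo identity so commits work on a machine with no git config (assumption).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the original project.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo "v1" > core.txt
git add core.txt
git commit -q -m "upstream: initial"

# Stand-in for your working copy of the fork, with the original as "upstream".
git clone -q "$work/upstream" "$work/newproject"
cd "$work/newproject"
git remote rename origin upstream
echo "my fix" > fix.txt
git add fix.txt
git commit -q -m "fork: local fix"

# Meanwhile the original project moves on.
cd "$work/upstream"
echo "v2" >> core.txt
git commit -q -a -m "upstream: new work"

# Back in the fork: fetch the new upstream work and replay your fix on top.
cd "$work/newproject"
git fetch -q upstream
git rebase -q upstream/master

ahead=$(git rev-list --count upstream/master..HEAD)
behind=$(git rev-list --count HEAD..upstream/master)
echo "ahead of upstream by $ahead, behind by $behind"
git log --oneline
```

After the rebase your fix sits as a single commit on top of everything upstream has, which is what keeps the history linear.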