Brian reminded me that it is time for the annual update on how to manage GitHub repositories and your open-source connection. It turns out there are only a few paths that don’t lead you to a dead end. After four years of using it, here’s a quick set of tips, tricks, and traps:
Identities
Have a personal identity and a business one. Keep one address that holds everything you take with you, such as your personal settings. You do not want to be in a position where, if you leave a company, you lose all of that. GitHub is fundamentally tied to email addresses and your handle, so find a good one (see https://github.com/richtong).
Repository structuring or have a ~/ws/src
There are a zillion notes on this, but in the end, as Sam says, the key question to ask is: how do I get to a reproducible build? That is, given any bare metal machine, encapsulate all the steps you need and make sure everything is checked in. If you use lots of repos, there is no easy way to keep them all in sync, so most of the time you want a single repo for your work. That way you can just run git clone git@github.com:richtong/src on any machine and you are off to the races.
What does the directory structure look like on your home machine? It is really useful to have a ~/ws in your home directory that is your current project. If, say, you work for a company, you can symlink this to ~/ws.surround.io or whatever your current company is. That way the structure looks the same, and if you are, say, a consultant, you can work on different projects in separate contexts.
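As a rough sketch of the symlink layout described above, the following can be run safely because it uses a temporary directory in place of your real home. The variable WS_HOME and the directory name ws.personal are assumptions made up for illustration; only ~/ws and ~/ws.surround.io come from the text.

```shell
# Hypothetical stand-in for your home directory, so nothing real is touched.
WS_HOME="$(mktemp -d)"

# One real directory per context (ws.personal is an illustrative name).
mkdir -p "$WS_HOME/ws.surround.io/src" "$WS_HOME/ws.personal/src"

# Point ws at the current project. -n treats an existing ws link as a
# plain file instead of following it, and -f overwrites it.
ln -sfn "$WS_HOME/ws.surround.io" "$WS_HOME/ws"
ls -d "$WS_HOME/ws/src"

# Switching context is just re-pointing the symlink.
ln -sfn "$WS_HOME/ws.personal" "$WS_HOME/ws"
current="$(readlink "$WS_HOME/ws")"
echo "ws now points to $current"
```

Scripts and editor settings can then always refer to ws/src, whichever project is active.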
Use Git LFS to manage large files
This is going to come up all the time, but it’s so easy to forget, and suddenly you have a 1GB Zip file in some branch. When that happens, it is carried around everywhere unless you radically rewrite history. So before you start, make sure you have Git Large File Storage turned on. On a Mac this means doing:
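    brew install git-lfs
    git lfs install
    cd ~/ws/src
    # track a pattern, for example Zip files; this writes .gitattributes
    git lfs track "*.zip"
    git add .gitattributes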
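If you suspect a big file has already slipped into some branch, one possible way to check is to list every blob in the history by size. This is a minimal sketch run against a throwaway repo; the file names and the 2 MB stand-in size are assumptions for illustration, and it only finds large files, it does not remove them.

```shell
# Throwaway repo so the sketch is self-contained (all names are illustrative).
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example" && git config user.email "example@example.com"

# One small file and one 2 MB file standing in for that 1GB Zip.
echo "hello" > README
head -c 2097152 /dev/zero > big.zip
git add README big.zip
git commit -q -m "add files"

# List every blob reachable from any branch with its size, largest first.
largest="$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn |
  head -n 1)"
echo "$largest"
```

Running the same pipeline inside your real repo shows which paths deserve a git lfs track pattern.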
Managing updates to your own code
Never commit to your master directly. You want to run everything on a branch. There is a very stylized way to do this that looks like:
    cd ~/ws/src
    # make sure your branch names include your name first
    git checkout -b rich-script-fixes
    # now make a bunch of changes
    git commit -a
    # the first push of a new branch has to set its upstream
    git push -u origin rich-script-fixes
In this way, you are making all your changes on your branch and you don’t pollute the master. Now run all your tests, and when you are ready (that is, before you have diverged too far), assume that others may have changed things. This sequence protects you from that and from cluttering the world with useless merges:
    # make sure all your changes are committed
    git commit -a
    git push
    # now pull down the latest master
    git checkout master
    # fetch only (nothing is merged yet); -p prunes deleted remote branches
    git fetch -p
    # it will now tell you if you can just fast forward
    git pull --ff-only
    # if there are changes, see what conflicts there are
    git pull --rebase
    # it will show each file that is in conflict; edit those files
    # and git add them, then continue until you are all done
    git rebase --continue
    # now your local master is updated, so push the new history
    git push
At this point, you finally have an up-to-date master on your local machine and have accounted for everything on top of it. Now get your changes ready to merge in, assuming the branch you’ve been working on is called rich-script-fixes:
    git checkout rich-script-fixes
    # do the same as above to make sure any changes on other machines are incorporated
    git pull --ff-only
    # if this does not work, then you need
    git pull --rebase
    # then do the same as above: edit each file, fix conflicts, and git add them
    git rebase --continue
    git push
Now both the master and the branch are in sync with the GitHub server. We are now going to merge into master so that you get a nice history
    git checkout rich-script-fixes
    # this is the magic: take all the changes in the branch
    # and replay them on top of the master history
    git rebase master
    # now rewrite the commit comments and reduce their number
    git rebase master -i
    # you will drop into an editor and you can select
    # which commits you want to squash, that is, turn all the
    # micro changes into a few big changes
    # here is the magic: this says take the new history
    # and make master the same as it
    git push origin rich-script-fixes:master
    # if this doesn't work, it means that someone updated master
    # and you have to rebase again, so go back to the top of the list
Why go through all this brain damage? Well, first, you want to commit early and often when you are debugging so you never lose anything. But in the end you will find (as I do) that you have, say, 10 commits to a single file that each literally delete a character or change a variable. When you do the git rebase master -i you are given an opportunity to change the history and make it simpler.
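To see the squash step end to end without touching a real project, here is a sketch that uses a throwaway bare repo as the server. Everything here (paths, file names, commit messages) is made up for illustration, and the editor session is replaced by a sed command supplied through GIT_SEQUENCE_EDITOR, which is an assumption made so the example can run unattended; in practice you would edit the list by hand.

```shell
# Throwaway "server" and working clone (all paths and names are illustrative).
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git remote add origin "$work/origin.git"

echo "v0" > script.sh
git add script.sh
git commit -q -m "initial script"
git push -q -u origin master

# Commit early and often on a branch: three micro-commits.
git checkout -q -b rich-script-fixes
for n in 1 2 3; do
  echo "fix $n" >> script.sh
  git commit -q -a -m "micro fix $n"
done

# Non-interactive stand-in for the editor session: keep the first "pick"
# and turn every later one into "fixup" (squash, discarding its message).
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$s/^pick/fixup/'" git rebase -q -i master

# master must be an ancestor of the branch for the push to fast-forward.
git merge-base --is-ancestor master rich-script-fixes && echo "fast-forward ok"
git push -q origin rich-script-fixes:master
commits_added="$(git rev-list --count master..rich-script-fixes)"
echo "commits added on top of master: $commits_added"
```

The three micro-commits land on the server's master as a single commit, which is the simpler history the rebase is meant to produce.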
External repos with forks and submodules
GitHub makes it easy just to clone any open-source tool onto your machine. Resist the temptation to do that. If you do, you will never be able to figure out which build of what actually works and which doesn’t. Instead, go to the trouble of learning forks and submodules. If you do the normal thing, which is just a git clone git@github.com:otherauthor/otherrepo, you have a long-term problem because nothing records which branch or version of that repo you were using.
There are many tools to do this, but the simplest is the standard set that git comes with, which is submodules, combined with creating your own fork so that when you are building you know what you are building. There is a magic file called .gitmodules which stores the path and URL of each submodule, while each commit in your main repo stores the hash of the exact commit in your forked repo. So when you check out (and run git submodule update --init), you are also checking everything else out and you can guarantee reproducibility:
    # first in the github.com UI, find the other repo
    # then click on the Fork button to make it your own
    # put all the external repos in one place
    mkdir -p ~/ws/git/extern
    cd ~/ws/git/extern
    git submodule add git@github.com:richtong/newproject
    # add the submodule's exact commit into the src history
    git commit -a
    # at this point, your src repo is tracking the exact commit
    cd newproject
    # now create a connection to the original repo (inside the submodule)
    git remote add upstream git@github.com:otherguy/newproject
    # use the same strategy here if you need to make changes
    git checkout -b rich-fixes-to-other-project
    # make changes, commit, and push the new branch
    git commit -a
    git push -u origin rich-fixes-to-other-project
    git checkout master
    git pull --ff-only
    git checkout rich-fixes-to-other-project
    git pull --ff-only
    git rebase master
    git rebase master -i
    git push origin rich-fixes-to-other-project:master
    # at this point you are hacking away on your fork
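To see where the pinned commit actually lives, here is a small sketch using two throwaway local repos in place of GitHub. The names fork, src, and lib.txt are assumptions for illustration. It shows that .gitmodules carries the path and URL, while the exact commit hash is recorded in the parent repo's tree as a special entry with mode 160000.

```shell
# Throwaway stand-ins: "fork" plays the forked repo, "src" plays your main repo.
work="$(mktemp -d)"
git init -q "$work/fork"
git -C "$work/fork" config user.name "Example"
git -C "$work/fork" config user.email "example@example.com"
echo "lib" > "$work/fork/lib.txt"
git -C "$work/fork" add lib.txt
git -C "$work/fork" commit -q -m "fork: initial commit"

git init -q "$work/src"
cd "$work/src"
git config user.name "Example" && git config user.email "example@example.com"

# Recent Git versions block local-path submodules unless file transport is
# explicitly allowed; with a real GitHub URL this -c option is not needed.
git -c protocol.file.allow=always submodule --quiet add "$work/fork" extern/newproject
git commit -q -m "pin extern/newproject"

# .gitmodules holds the path and URL...
cat .gitmodules
# ...while the parent commit's tree holds the pinned hash (mode 160000).
pinned="$(git ls-tree HEAD extern/newproject | awk '{ print $3 }')"
actual="$(git -C extern/newproject rev-parse HEAD)"
echo "pinned=$pinned actual=$actual"
```

Because the hash is part of the parent commit, checking out an old commit of your main repo and updating submodules brings back the matching version of the external code.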
Everything is working hunky-dory, but what if there are changes in the original project that you forked? Well, it is easy to pull them down and rebase:
    # get the changes from upstream
    git pull --ff-only upstream master
    # if this fails, then some of the changes you have made are conflicting
    git rebase upstream/master
    # now the local has your changes plus the changes from the original project
    # (a rebase rewrites history, so this push may need --force-with-lease)
    git push origin master
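Before rebasing it can help to see how far your fork and the original have drifted apart. The sketch below is one way to do that with git rev-list, using throwaway local repos. All names are illustrative assumptions, and for simplicity the clone's origin is the same stand-in repo as upstream, which would not be the case with a real fork.

```shell
work="$(mktemp -d)"
# "upstream" plays the original project (all names are illustrative).
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "upstream: base"

# Your working copy, with upstream added as a second remote.
git clone -q "$work/upstream" "$work/newproject"
cd "$work/newproject"
git config user.name "Example" && git config user.email "example@example.com"
git remote add upstream "$work/upstream"
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "fork: my change"

# Meanwhile the original project moves on.
echo "theirs" > "$work/upstream/theirs.txt"
git -C "$work/upstream" add theirs.txt
git -C "$work/upstream" commit -q -m "upstream: their change"

# Fetch, measure the divergence (ahead, then behind), then replay your commits.
git fetch -q upstream
counts="$(git rev-list --left-right --count master...upstream/master)"
echo "ahead/behind before rebase: $counts"
git rebase -q upstream/master
after="$(git rev-list --left-right --count master...upstream/master)"
echo "ahead/behind after rebase: $after"
```

A behind count of zero after the rebase means your master now contains everything from the original project plus your own commits on top.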