Using GitHub for the first time: the lessons of 2019

Brian reminded me that it is time for the annual update on how to manage GitHub repositories and your open-source connection. It turns out there are only a few paths that don’t lead you to a dead end. After four years of using it, here’s a quick set of tips, tricks, and traps:

Identities

Have a personal identity and a business one. Keep one address that holds everything you take with you, such as your personal settings. You do not want to be in a position where leaving a company means losing all of them. GitHub is fundamentally tied to email addresses and your handle, so find a good one (see https://github.com/richtong).
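
One way to keep the two identities apart on a single machine is to set the commit email per repository rather than globally. This is only a sketch: the name, the address, and the throwaway repo below are made up for illustration, and it runs in a temporary directory so it does not touch your real settings.

```shell
set -eu
# Hypothetical throwaway repo standing in for a work project
demo_dir="$(mktemp -d)"
git init -q "$demo_dir/work-project"
cd "$demo_dir/work-project"
# Assumed identity; substitute your own business name and address.
# Without --global, these settings apply to this repository only.
git config user.name "Rich Example"
git config user.email "rich@company.example"
identity_email="$(git config --get user.email)"
echo "commits in this repo will use: $identity_email"
```

Your personal address can then stay in the global config, and each company repo overrides it locally.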

Repository structure, or have a ~/ws/src

There are a zillion notes on this, but in the end, as Sam says, the key question to ask is: how do I get to a reproducible build? That is, given any bare metal machine, encapsulate all the steps you need and make sure everything is checked in. If you have lots of repos that you use, there is no way to keep them all in sync, so most of the time you want a single repo for your work. That way you can just git clone git@github.com:richtong/src onto any machine and you are off to the races.

What does the directory structure look like on your home machine? It is really useful to have a ~/ws in your home directory that is your current project. If, say, you work for a company, you can make ~/ws a symlink to ~/ws.surround.io or whatever your current company is. That way the structure looks the same, and if you are, say, a consultant, you can work on different projects in separate contexts.
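
The symlink switch can be sketched like this. To keep it safe to run, a scratch directory stands in for your home directory; the fake_home variable is only for illustration, and on a real machine you would use $HOME instead.

```shell
set -eu
# Scratch directory standing in for $HOME (an assumption for this sketch)
fake_home="$(mktemp -d)"
mkdir -p "$fake_home/ws.surround.io/src"
# -s makes a symbolic link, -f replaces an existing link,
# -n stops ln from following an existing ws link into its target
ln -sfn "$fake_home/ws.surround.io" "$fake_home/ws"
current_target="$(readlink "$fake_home/ws")"
echo "ws -> $current_target"
```

Switching to another client is then a single ln -sfn pointing ws at a different directory.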

Use Git LFS to manage large files

This is going to come up all the time, but it’s so easy to forget, and suddenly you have a 1GB Zip file in some branch. When that happens, it is carried around everywhere unless you radically rewrite history. So before you start, make sure you have Git Large File Storage turned on. On a Mac this means doing:

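# the Homebrew package is git-lfs, with a hyphen
brew install git-lfs
# set up the LFS hooks once per machine (older releases called this git lfs init)
git lfs install
cd ~/ws/src
# track a pattern, for example Zip files, then commit the resulting .gitattributes
git lfs track "*.zip"
git add .gitattributes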
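
Tracking a pattern works by adding a line to .gitattributes. The sketch below writes the line that git lfs track "*.zip" produces by hand, so it runs even without Git LFS installed, and then asks git which filter applies to a Zip file. The repo and the data/archive.zip path are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
# The attribute line that `git lfs track "*.zip"` adds to .gitattributes
printf '%s\n' '*.zip filter=lfs diff=lfs merge=lfs -text' > .gitattributes
# Ask git which filter would handle a (hypothetical) Zip file in a subdirectory
zip_filter="$(git check-attr filter -- data/archive.zip)"
echo "$zip_filter"
```

If the filter shows up as lfs, files matching the pattern go to large file storage instead of into normal history, provided .gitattributes itself is committed.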

Managing updates to your own code

Never commit to your master directly. You want to run everything on a branch. There is a very stylized way to do this that looks like:

cd ~/ws/src
# prefix your branch names with your own name
git checkout -b rich-script-fixes
# now make a bunch of changes
git commit -a
# the first push of a new branch needs to set its upstream
git push -u origin rich-script-fixes
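
To try the branch workflow without touching a real project, here is a sketch that runs entirely in a temporary directory. A local bare repository stands in for GitHub, and the user name, email, and script.sh file are placeholders.

```shell
set -eu
sandbox="$(mktemp -d)"
# A local bare repo plays the role of the GitHub server
git init -q --bare "$sandbox/origin.git"
git init -q "$sandbox/src"
cd "$sandbox/src"
# Name the initial branch master regardless of the local git default
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git remote add origin "$sandbox/origin.git"
git commit -q --allow-empty -m "Initial commit"
git push -q -u origin master
# Work happens on a named branch, never on master
git checkout -q -b rich-script-fixes
echo 'echo hello' > script.sh
git add script.sh
git commit -q -m "Add script"
# -u records origin/rich-script-fixes as the upstream for later pulls and pushes
git push -q -u origin rich-script-fixes
branch_upstream="$(git rev-parse --abbrev-ref '@{upstream}')"
echo "tracking: $branch_upstream"
```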

In this way, you are making all your changes on your branch and you don’t pollute the master. Now run all your tests and, when you are ready (that is, before you have diverged too far), bring everything up to date. Assuming that others may have changed things, this sequence protects you from that and from cluttering the world with useless merges:

# make sure all your changes are committed
git commit -a
git push
# now pull down the latest master
git checkout master
# we are only fetching here; -p also prunes remote branches that were deleted
git fetch -p
# it will now tell you if you can just fast forward
git pull --ff-only
# if that fails because your local master has diverged, rebase instead
git pull --rebase
# it will show each file that is in conflict; you need to edit those files,
# git add them, and then continue until you are all done
git rebase --continue
# now your local master is updated, so push the new history
git push
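
The choice between fast forward and rebase can also be checked up front. The sketch below uses git merge-base --is-ancestor, which succeeds when local master has nothing that origin/master lacks. Everything runs in a temporary sandbox, with a local bare repo in place of GitHub and a made-up teammate clone.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/origin.git"
git --git-dir="$sandbox/origin.git" symbolic-ref HEAD refs/heads/master
# Our own clone, seeded with one commit
git init -q "$sandbox/mine"
cd "$sandbox/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git remote add origin "$sandbox/origin.git"
git commit -q --allow-empty -m "Initial commit"
git push -q -u origin master
# A hypothetical teammate pushes a change to master in the meantime
git clone -q "$sandbox/origin.git" "$sandbox/teammate"
git -C "$sandbox/teammate" -c user.name="Teammate" -c user.email="mate@example.invalid" \
  commit -q --allow-empty -m "Teammate change"
git -C "$sandbox/teammate" push -q origin master
# Back in our clone: fetch, then decide how to update
git fetch -q -p
if git merge-base --is-ancestor master origin/master; then
  git pull -q --ff-only
  update_mode="fast-forward"
else
  git pull -q --rebase
  update_mode="rebase"
fi
echo "updated master by: $update_mode"
```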

At this point, you finally have an up-to-date master on your local machine and have accounted for everything others have pushed, so now merge in your changes, assuming the branch you’ve been working on is called rich-script-fixes:

git checkout rich-script-fixes
# do the same as above to make sure any changes from other machines are incorporated
git pull --ff-only
# if this does not work, then you need
git pull --rebase
# then, as above, edit each file and fix conflicts
git rebase --continue
git push

Now both the master and the branch are in sync with the GitHub server. We are now going to merge into master so that you get a nice history:

git checkout rich-script-fixes
# this is the magic: replay all the changes in the branch
# on top of the master history
git rebase master
# now rewrite the commit messages and reduce the number of commits
git rebase -i master
# you will drop into an editor and you can select
# which commits you want to squash, that is, turn all the
# micro changes into big changes
# here is the magic: this says take the new history
# and make it the same as master
git push origin rich-script-fixes:master
# if this doesn't work, it means that someone updated master
# and you have to rebase again, so go back to the top of the list
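
Because git rebase -i needs an editor, it is awkward to demonstrate in a script. The sketch below uses a different, non-interactive technique that has the same effect as squashing everything into one commit: git reset --soft moves the branch back to master while keeping the changes staged. The repo, file name, and commit messages are invented for the example.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b rich-script-fixes
# Three tiny work-in-progress commits, the kind you make while debugging
for n in 1 2 3; do
  echo "tweak $n" >> script.sh
  git add script.sh
  git commit -q -m "wip $n"
done
commits_before="$(git rev-list --count master..HEAD)"
# Point the branch back at master but keep every change staged,
# then record them as a single commit
git reset -q --soft master
git commit -q -m "Fix scripts"
commits_after="$(git rev-list --count master..HEAD)"
echo "commits before: $commits_before, after: $commits_after"
```

The interactive rebase gives finer control, since you can keep some commits separate, but the result for the all-in-one case is the same.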

Why go through all this brain damage? Well, first, you want to commit early and often when you are debugging so you never lose anything. But in the end you will find (as I do) that you have, say, 10 commits to a single file that are all literally deleting a character or changing a variable. When you do the git rebase -i master, you are given an opportunity to change the history and make it simpler.

External repos with forks and submodules

GitHub makes it easy just to clone any open-source tool onto your machine. Resist the temptation to do that. If you do, you will never be able to figure out which build of what actually works and which doesn’t. Instead, go to the trouble of learning forks and submodules. If you do the normal thing, which is just a git clone git@github.com:otherauthor/otherrepo, you have a long-term problem because nothing records which branch or version of that repo you were using.

There are many tools to do this, but the simplest is the standard set that git comes with, which is submodules, combined with creating your own fork so that when you are building you know what you are building. There is a magic file called .gitmodules which records the path and URL of each submodule, and the parent repo itself stores the hash of the exact commit in your forked repo. So when you check out, you are also checking everything else out, and you can guarantee reproducibility:

# first in the github.com UI, find the other repo
# then click on the Fork button to make it your own
# put all the external repos in one place
mkdir -p ~/ws/git/extern
cd ~/ws/git/extern
git submodule add git@github.com:richtong/newproject
# add the submodule's exact commit into the src history
git commit -a
# now, inside the submodule, create a connection to the original repo
cd newproject
git remote add upstream git@github.com:otherguy/newproject
# at this point, your src repo is tracking the exact commit
# use the same strategy here if you need to make changes
git checkout -b rich-fixes-to-other-project
# make changes and commit them
git checkout master
git pull --ff-only
git checkout rich-fixes-to-other-project
git pull --ff-only
git rebase master
git rebase -i master
git push origin rich-fixes-to-other-project:master
# at this point you are hacking away on your fork.
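
To see what the parent repo actually records for a submodule, here is a sketch that runs against local directories instead of GitHub. The fork and src directories are stand-ins, and the protocol.file.allow setting is only needed because recent git versions refuse local-path submodules by default; it is not needed for real GitHub URLs.

```shell
set -eu
sandbox="$(mktemp -d)"
# A local repo standing in for your fork on GitHub
git init -q "$sandbox/fork"
cd "$sandbox/fork"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "Library v1"
fork_head="$(git rev-parse HEAD)"
# The parent repo that pins the fork as a submodule
git init -q "$sandbox/src"
cd "$sandbox/src"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/fork" extern/newproject
git commit -q -m "Pin newproject"
# .gitmodules holds the path and URL of the submodule
recorded_url="$(git config -f .gitmodules --get submodule.extern/newproject.url)"
# The exact commit is stored in the parent repo's tree
pinned_commit="$(git rev-parse HEAD:extern/newproject)"
echo "url: $recorded_url"
echo "pinned: $pinned_commit"
```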

Everything is working hunky-dory, but what if there are changes in the original project? Well, it is easy to pull them down and rebase:

# get the changes from upstream
git pull --ff-only upstream master
# if this fails, your fork has commits of its own, so replay them on top of upstream
git rebase upstream/master
# now the local has your changes plus the changes from the original project
git push origin master
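
The upstream sync can be rehearsed locally as well. In this sketch a plain local repo plays the original project and a clone of it plays your fork, with one commit made on each side so the fast forward fails and the rebase path is taken. All names and files are illustrative, and the explicit git fetch is an extra step added so upstream/master is certain to be current.

```shell
set -eu
sandbox="$(mktemp -d)"
# The original project
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Other Author"
git config user.email "other@example.invalid"
echo "v1" > README
git add README
git commit -q -m "Upstream v1"
# Your working copy of the fork, with the original registered as upstream
git clone -q "$sandbox/upstream" "$sandbox/fork"
cd "$sandbox/fork"
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git remote rename origin upstream
echo "my fix" > fix.txt
git add fix.txt
git commit -q -m "My fix"
# Meanwhile the original project moves on
cd "$sandbox/upstream"
echo "v2" >> README
git commit -q -am "Upstream v2"
# Sync: try the fast forward first, fall back to a rebase
cd "$sandbox/fork"
git fetch -q upstream
if git pull -q --ff-only upstream master 2>/dev/null; then
  sync_mode="fast-forward"
else
  git rebase -q upstream/master
  sync_mode="rebase"
fi
echo "synced by: $sync_mode"
```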
