OK, there is a place that is actually beyond the bleeding edge, and I guess I’m right there. Fortunately there are amazing folks like Luxas, Rhess and the folks at Hypriot who are right there with me. Here is what I’ve learned about docker, its internals, and how it handles clustering.
Right now for production, despite what the documentation says, the best choice is to use docker swarm with their hosted token discovery. This seems pretty stable and safe, as you get a nice long 32-character token as the cluster ID. For development, I’m working on figuring out which of the many Kubernetes installations will actually work.
- Docker by default is not network aware. There is a magic file called /etc/default/docker (at least when you use HypriotOS, which is a customization of Raspbian) that holds the docker daemon settings. Normally it is not set to listen on any external ports. For that, the variable DOCKER_OPTS needs to be set.
- The simplest way to make it network aware is to use docker-machine. This will connect to any docker host using the -d generic driver and automatically set up a secure port. Note that the plain port 2375 is insecure, so docker-machine opens port 2376 and adds the TLS certificates so you can access it safely. If you do not care about security, you can point any docker client at a machine by setting export DOCKER_HOST=tcp://machine.local:2375 (note that you can use DNS names). If you use docker-machine, it manages the certificates for you and uses the secure port 2376.
- If you are using docker swarm, you need to access the cluster on a different secure port, -H tcp://0.0.0.0:3376, while the 2376 ports remain available for the individual machines. This port has to be on the swarm master. It is all set up by the docker-machine flags --swarm for each swarm member and --swarm-master for the main one. The whole thing works because a swarm docker image runs on the nodes and manages everything.
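As a rough sketch of the first two bullets above, these are the two settings involved. The exact DOCKER_OPTS value is an assumption (the stock file may already carry other options you want to keep), and machine.local stands in for your own hostname. Port 2375 is unencrypted, so this only makes sense on a network you trust.

```shell
# On the Pi, in /etc/default/docker (assumed value): keep the local socket
# and also listen on the unencrypted TCP port 2375 on all interfaces.
DOCKER_OPTS="-H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375"

# On the client: point the docker CLI at that daemon. DNS names work here.
export DOCKER_HOST=tcp://machine.local:2375
```

The daemon has to be restarted after the file changes. With docker-machine you would not set either value by hand: eval "$(docker-machine env NAME)", where NAME is your machine's name, exports DOCKER_HOST with port 2376 along with the TLS variables.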
So this will give you basic access to docker over the network. But it doesn’t let you access multiple machines as one. This is what clustering is all about and where all the fun begins.
Here are the current traps with Hypriot
Since Raspbian does not have docker, the good folks at http://hypriot.com were good enough to port it to the Pi. They also have a version of Raspbian called HypriotOS, which basically adds their repo and then installs their packages.
As with all things there are some gotchas, and here are the strange ones:
- If you install cluster-lab, be careful about how you deinstall it. You *must* first run systemctl stop cluster-lab, which puts back the /etc/default/docker file correctly, and only then run systemctl disable cluster-lab. If you do it in the wrong order you will get a bad file there, and you need to fix it by removing the cluster-lab port -H option in that file. The error is pretty obscure: you get a "cannot start docker" message and no docker host.
- As an aside, if the docker daemon doesn’t start, you only get a very short message. You need to run systemctl status --all docker to get the complete list. In this case the offending failure is just beyond the last message, so it is easy to miss.
- You cannot simply apt-get remove hypriot-docker, because it does not correctly put back /etc/systemd/system/docker.service. The default installation uses the aufs driver, so you have to manually change that file to use
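
The cluster-lab removal order from the first gotcha can be sketched as a small script. The run helper and the DRY_RUN variable are hypothetical names added here so the steps can be printed before they are executed; the systemctl commands are the ones described above.

```shell
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop cluster-lab first so it restores /etc/default/docker, then disable it.
remove_cluster_lab() {
  run sudo systemctl stop cluster-lab
  run sudo systemctl disable cluster-lab
  # Show the full status output in case the docker daemon still refuses to start.
  run systemctl status --all docker
}

remove_cluster_lab
```

As written it only prints the three steps. Setting DRY_RUN=0 before the call runs them for real.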
A primer on clustering layers
- To make this work you basically need four services, and there is lots of competition for each of them. It’s nice to stay with docker as much as possible, but even more important to stay on the main line.
- Networking. With a cluster, containers might be on different machines and the IP addresses keep changing. So folks typically put in an overlay network so that the cluster gets its own virtual network to play in. The approaches are to use the Linux vlan support directly, which seems to underlie the other mechanisms, to use docker’s overlay networking, which is now built in, or to use flannel. Docker seems simplest, but flannel, because of its kubernetes support, seems most popular with the big boys. So right now, hat tip to flannel.
- Discovery. If you have a virtual network, you need a way to figure out where the different hosts are. With docker networking you get this for free, but with the others you have to set up your own DNS provider.
- Key/Value Store. This is where you can put parameters so you can figure out what is running. Docker’s hosted service is a test option, but consul and etcd seem like the two main ones, with consul being easier to use but having less support and, at least for me, being a bit buggy. There are a few approaches: running consul (cluster-lab), or running etcd either in a container (kubernetes-on-arm) or natively (ansible-kubernetes-openshift-pi3). Right now, hat tip to etcd.
- Orchestration. This is basically how you start a bunch of machines and connect them together. Docker-compose and ansible seem to be the two choices. Docker-compose is elegant but not super powerful: you can’t specify which node something runs on, for instance. Right now, hat tip to ansible.
Raspberry Pi Clustering
For the Raspberry Pi, there are at least four different methods for getting clustering working. TL;DR: we are using docker swarm right now with their hosted service and hoping either Luxas or Rhess gets it right.
- Docker-machine without swarms (working on rpi1 and rpi3). Currently this works fine with hypriot-docker, both on their 2016 image for rpi1 and on their test 0.4.9 image for rpi3.
- Docker swarms using a hosted service (working on rpi3). This is not supposed to be used in deployment, but it uses docker’s own hosted system as the discovery host for the cluster. You need such a host to store central configuration information about the system. You basically generate a random token and then use it with docker-machine to create a swarm. This is pretty easy to get working. My main problem right now is that docker-machine doesn’t seem to work correctly against the Hypriot image and hangs. I think this might be related to using the pi account instead of root, but I am not sure. The main weakness is that it depends on docker’s hosted service. It also doesn’t handle orchestration at all, so you have to either write bash scripts or use ansible. You can of course run individual jobs on specific nodes by using docker-machine env to connect to specific members. That basically works and is a good workaround.
- Hypriot Cluster-lab (not working reliably, and it corrupts the install for docker swarm and docker-machine). They bundled it all together into something that uses consul for the control. This worked amazingly well for a single cluster on a single network switch, but we had lots of trouble across switches. The vlan support seemed to work, but consul discovery didn’t happen reliably. And I’m not sure investing more time makes sense given that kubernetes seems more popular.
- Kubernetes on arm (does not run if cluster-lab ran before it). There are at least three flavors of kubernetes for rpi. Luxas has a nice project, kubernetes-on-arm, that uses a prebuilt docker image. The main problem is that I can’t get workers to connect with rpi1s: the API server on port 8080 is not coming up. There are also conflicts with the Hypriot prebuilt image for rpi3 that cause hangs.
- Kubernetes/ansible. Rhess has a similar project but he adds ansible for orchestration. Haven’t tried it yet.
- Kubernetes hand rolled. There is a guide that shows you how to do this but I haven’t tried. Will do if the first two don’t work.
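
To make the hosted-token swarm option above more concrete, here is a minimal sketch that prints the docker-machine command for each swarm member rather than running it. The function name swarm_member_cmd, the SSH_USER variable, the node names and the IP addresses are all made up for illustration. The flags (-d generic, --swarm, --swarm-master, --swarm-discovery) are docker-machine's own. On a Pi you would likely also need an ARM build of the swarm image, which is not shown here.

```shell
# Print the docker-machine command that would add one node to the swarm.
# Usage: swarm_member_cmd NAME IP ROLE TOKEN   (ROLE is "master" or "worker")
swarm_member_cmd() {
  name=$1
  ip=$2
  role=$3
  token=$4
  ssh_user=${SSH_USER:-root}   # assumption: log in as root, override with SSH_USER

  # Every member gets --swarm; only the main one also gets --swarm-master.
  flags="--swarm"
  if [ "$role" = "master" ]; then
    flags="--swarm --swarm-master"
  fi

  echo "docker-machine create -d generic" \
    "--generic-ip-address $ip --generic-ssh-user $ssh_user" \
    "$flags --swarm-discovery token://$token $name"
}

# Example with a placeholder token; a real one comes from the hosted discovery service.
swarm_member_cmd pi-master 192.168.1.10 master TOKEN
swarm_member_cmd pi-worker1 192.168.1.11 worker TOKEN
```

A token can be generated with docker run --rm swarm create. Once the machines exist, eval "$(docker-machine env --swarm pi-master)" points the client at the whole cluster on port 3376, while docker-machine env pi-worker1 targets a single node.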