
Dockerfile reproducibility when an image will not build


Docker has become the default for many reproducible build systems, but the sad thing is that although you get a well-defined image, the specification, the Dockerfile itself, is not reproducible. That's because most Dockerfiles do not pin their dependencies:

FROM ubuntu
RUN apt-get update && apt-get install -y vim python-pip python-dev
RUN git clone https://github.com/stevenlovegrove/Pangolin && \
    cd Pangolin && make

COPY . /app
WORKDIR /app
RUN make all

This is a simple example that starts from ubuntu, installs some python junk, builds a library from source, and leaves you with a nice image that you can push.

The problem is that if you come back, say, six months later, the version of python may have changed, the versions of python-pip and python-dev may have changed, and the version of your application may have changed as well. The prevention is to always run dockerfilelint, which you can do at the command line, as a pre-commit hook, or in a GitHub Action.
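To give a feel for the kind of check a linter makes, here is a rough hand-rolled sketch that flags FROM lines with no tag or digest, or with the floating latest tag. It is not dockerfilelint and does not reproduce its rules; the function name unpinned_from is made up for this example, and it will miss cases such as a registry host with a port number.

```shell
#!/bin/sh
# unpinned_from FILE: print "line: text" for every FROM instruction whose
# image has no tag/digest or uses ":latest". Hypothetical helper, a rough
# stand-in for one lint rule, not a replacement for a real linter.
unpinned_from() {
    awk 'toupper($1) == "FROM" {
        i = 2
        # skip flags such as --platform=linux/amd64
        while (i <= NF && $i ~ /^--/) i++
        image = $i
        # "scratch" is the empty base image and never carries a tag
        if (image != "scratch" && (image !~ /[:@]/ || image ~ /:latest$/))
            print FNR ": " $0
    }' "$1"
}

# Usage (prints nothing when every FROM is pinned):
#   unpinned_from Dockerfile
```

A check like this could sit in a pre-commit hook next to the real linter, failing the commit whenever it prints anything.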

The dilemma is that you do not want to pin too hard, because Ubuntu and other Linux repositories do not keep a complete history of all packages. Instead, you may want to pin with a "*" wildcard, as in the Dockerfile further down, so you capture the version numbers and have some hope of building the image from the Dockerfile in six months. Note that I also added the clean and remove steps, which shrink the layer by getting rid of temporary files.

There are a few things here. First, every apt-get package gets pinned to a version. This one is pretty strict, down to the minor version, but you could easily say python-pip="9.*" to get any version 9. Note that this is strictly a prefix match on the version string, so if you really want version 9.0, you need to spell that out.
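To see how loose or strict a given pin is, here is a small sketch that approximates the trailing "*" prefix match with a shell glob. The function name matches_pin and the version strings are made up for illustration; apt does its own matching, so treat this as a way to reason about patterns rather than a copy of apt's behavior.

```shell
#!/bin/sh
# matches_pin VERSION PATTERN: succeed if VERSION matches the pin PATTERN.
# Hypothetical helper; uses shell glob matching as an approximation of
# a trailing-"*" prefix match.
matches_pin() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Try three pins of increasing strictness against illustrative versions.
for pin in '9.*' '9.0*' '9.0.1*'; do
    for ver in '9.0.1-2.3~ubuntu1' '9.1.0-1' '10.0.1-1'; do
        if matches_pin "$ver" "$pin"; then
            echo "$pin matches $ver"
        fi
    done
done
```

The loosest pin accepts both 9.x versions, while the two stricter ones accept only the 9.0.1 build, which is the trade-off between surviving repository churn and getting exactly what you tested.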

The second thing is that when you build something from source inside the image, you want to check out a specific version. Hopefully upstream tags its releases, so you do not have to nail a specific commit and miss later bug fixes, but that depends on the upstream package.

Note that we are using the WORKDIR command, which makes sure the directory exists and drops you into it, which is convenient. Note also that the build step removes all the sources, assuming that the build artifact ends up in a library directory like /usr/lib, so you do not need to keep the source around, which also shrinks the layer:

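# These pins match what Ubuntu 18.04 shipped; adjust them to your base image.
FROM ubuntu:18.04
ARG DEBIAN_FRONTEND=noninteractive
# git and ca-certificates are needed for the clone below. ca-certificates
# uses date-based versions, so it is left unpinned here.
RUN apt-get update && apt-get install -y --no-install-recommends \
    git="1:2.17*" \
    ca-certificates \
    vim="2:8.0*" \
    python-pip="9.0.1*" \
    python-dev="2.7.15*" && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*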

RUN git clone --branch v0.6 --depth 1 https://github.com/stevenlovegrove/Pangolin /Pangolin
WORKDIR /Pangolin
RUN make && \
    rm -rf *

COPY . /app
WORKDIR /app
RUN make

What if you only have the image and the Dockerfile sucks

Well, in that case there is a trick: run the image and then use apt-cache show to see the version numbers:

docker build -t test .
docker run -it --rm test bash
# now you are inside the container; check the Ubuntu version
cat /etc/lsb-release
apt-get update
# now, for each package, list the versions available in the repository
apt-cache show vim | grep Version
# and see which of them is actually installed in this image
apt-cache policy vim

Then you have the actual versions and can update the Dockerfile, giving you a newly pinned Dockerfile that will hopefully be resistant to these problems.
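If there are many packages, copying versions by hand gets tedious. Here is a minimal sketch, assuming you run it inside the container, that turns the default tab-separated output of dpkg-query -W into pin lines you can paste into the Dockerfile. The function name to_pins is made up for this example, and the sample version is only illustrative.

```shell
#!/bin/sh
# to_pins: read "package<TAB>version" lines on stdin (the default output
# format of `dpkg-query -W`) and print one package="version" pin per line.
# Packages with an empty version (not installed) are skipped.
to_pins() {
    awk -F '\t' 'NF == 2 && $2 != "" { printf "%s=\"%s\"\n", $1, $2 }'
}

# Inside the container you would feed it real data, for example:
#   dpkg-query -W vim python-pip python-dev | to_pins
# Here a sample line stands in for that output.
printf 'vim\t2:8.0.1453-1ubuntu1\n' | to_pins
```

The output is an exact pin; loosen it with a trailing "*" wherever you would rather tolerate patch releases than risk the exact build disappearing from the repository.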
