Use Testcontainers to create a Docker Test Image

TL;DR? You can always just check out the example on GitHub.
Prerequisites: You should be familiar with JUnit and Testcontainers, and have some fundamental Docker knowledge.

I’ve seen a lot of projects use a database management tool like Flyway or Liquibase to update their databases. In production, only a small part of the changelog will need to run because most of the changes have already been applied in previous deployments. 🚀

When it comes to testing, we usually start out with an empty database. Either we use an in-memory database like H2, or we start a Docker container for a database like PostgreSQL using Testcontainers. In both cases, a long changelog can severely impact your test suite execution time, as all changes need to be applied at the start of every test run!

I’ve seen teams react in multiple ways to this problem.

Option A: Ignore it and accept the longer test run. “Longer” is relative, so the startup time might not be too horrendous (yet).

Option B: Disable Liquibase/Flyway and let Hibernate create the database schema through its ddl-auto functionality. This allows you to skip the Liquibase/Flyway part completely. The trade-off with this approach is that the database structure created by ddl-auto might not be the same as the one created by your Liquibase/Flyway scripts!

Wouldn’t it be great if we could instead connect our application to a database that is already up to date? That way, we only have to apply the newest changes on our current branch, drastically improving our build time! We could even add some test data while we’re at it!

Let’s have a look at how using Testcontainers can improve this situation.

Creating a preloaded Docker Test Image

To create an up-to-date Docker image, we would need to:

Start the database in a Docker container;
Apply all our changes to it, either with Liquibase/Flyway or through code;
Save the current Docker container state to a new image;
Push the image to a repository.

While you could script this separately, we can achieve the same by writing a test! This allows us to stick with our favorite programming language and leverage Testcontainers, JUnit and Spring to do the heavy lifting for us. 😊
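
For comparison, here is a rough sketch of what the scripted route would boil down to: two Docker CLI calls, docker commit and docker push. The DockerImageCommands class and its method names are hypothetical and only there for illustration; they just build the command lines.

```java
import java.util.List;

// Hypothetical helper: builds the Docker CLI calls for the "commit" and "push" steps.
final class DockerImageCommands {

    private DockerImageCommands() {
    }

    // Equivalent of: docker commit <containerId> <repository>:<tag>
    static List<String> commit(String containerId, String repository, String tag) {
        return List.of("docker", "commit", containerId, repository + ":" + tag);
    }

    // Equivalent of: docker push <repository>:<tag>
    static List<String> push(String repository, String tag) {
        return List.of("docker", "push", repository + ":" + tag);
    }
}
```

Each list could be handed to new ProcessBuilder(command).inheritIO().start() from a build script. You would still have to start the database and run the migrations yourself, which is exactly the part the test-based approach gets for free.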

We’ll create a standard @SpringBootTest. This will start Spring, which in turn will run your Liquibase or Flyway scripts. To make sure this test isn’t run on every branch, we’ll give it a profile. This will allow us to specify when we want this container to be updated.


    @Testcontainers
    @SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.MOCK)
    @Profile("main")
    public class ContainerUpdater {
        //
    }

Then we use Testcontainers to start a container for the database we’ll be using. In this case, that’s a PostgreSQL database.


    @Container
    public static PostgreSQLContainer<?> postgreSQLContainer = new PostgreSQLContainer<>("postgres:14.2")
            .withDatabaseName("testcontainer")
            .withUsername("sa")
            .withPassword("sa");

Next, we write a single test. When it runs, the following will happen:

The @Testcontainers annotation will make sure the PostgreSQL Docker Container is started.
The @SpringBootTest annotation will start the Spring context, which will execute the Liquibase/Flyway scripts.

In the body of the test method, we use the DockerClient exposed by Testcontainers’ PostgreSQLContainer to:

commit the updated database to a new Docker image;
push the newly created Docker image to a Docker repository for further use.

Here is what such a method looks like:


    @Test
    public void pushNewImage() throws InterruptedException {
        // Startup and Liquibase happen first :-)

        // 😇 You can even add some test data, as this is just another Spring test! @Autowired something and add some data!
        // 😇 Or use one of the standard annotations to execute SQL scripts.

        // Get the DockerClient used by the Testcontainers library (you can also use your own if they ever make that private).
        final DockerClient dockerClient = postgreSQLContainer.getDockerClient();

        // Commit docker container changes into new image (equivalent to command line 'docker commit')
        dockerClient.commitCmd(postgreSQLContainer.getContainerId())
                .withRepository("tomcools/postgres")
                .withTag("main")
                .exec();

        // Push new image to your repository. (equivalent to command line 'docker push')
        dockerClient.pushImageCmd("tomcools/postgres:main")
                .exec(new ResultCallback.Adapter<>() {
                    @Override
                    public void onNext(PushResponseItem object) {
                        log.info(object.toString());
                        if(object.isErrorIndicated()) {
                            // This is just to fail the build in case push of new image fails
                            throw new RuntimeException("Failed push: " + object.getErrorDetail());
                        }
                    }
                }).awaitCompletion();
    }

🚀 Once the method has run, you now have a new Docker image with the latest version of your database, including data! Other tests can now use this image for a faster startup or just to have some test data already in the DB. 🚀


    // Testcontainers checks for image compatibility in its classes (like PostgreSQLContainer).
    // Adding "asCompatibleSubstituteFor" explicitly declares this compatibility when using a custom image.
    private static final DockerImageName IMAGE = DockerImageName.parse("tomcools/postgres:main")
            .asCompatibleSubstituteFor("postgres");

    @Container
    public static PostgreSQLContainer<?> postgreSQLContainer = new PostgreSQLContainer<>(IMAGE)
            .withImagePullPolicy(PullPolicy.alwaysPull());

To make sure your actual tests always have the latest version of your custom Docker Image, make sure to use the alwaysPull pull policy. In case there is a newer version (after running your updater), this policy will pull that version. If no newer version is available, this policy only does a short check at the start of your tests, because the hash of the image wouldn’t have changed… thanks Docker! 😊

However, we are not quite there yet.

Docker Commit </3 volumes

I previously mentioned that we use the docker commit command to create a new image out of the updated container. However, docker commit will not save anything that is stored in a volume. This is a problem, as most database images declare a volume for their data 😕. Here is a snippet of the PostgreSQL Dockerfile.


    # A lot of the file is excluded, see source: https://github.com/docker-library/postgres/blob/e8ebf74e50128123a8d0220b85e357ef2d73a7ec/14/bullseye/Dockerfile
    
    RUN mkdir -p /var/run/postgresql && chown -R postgres:postgres /var/run/postgresql && chmod 2777 /var/run/postgresql
    
    ENV PGDATA /var/lib/postgresql/data
    RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA"
    # <- PESKY VOLUME BELOW!
    VOLUME /var/lib/postgresql/data
    
    COPY docker-entrypoint.sh /usr/local/bin/
    ENTRYPOINT ["docker-entrypoint.sh"]

As you can see, the Dockerfile declares a volume (/var/lib/postgresql/data) and also creates an environment variable PGDATA pointing to the same folder. Now, in order for our docker commit solution to work, we need to make sure the data isn’t saved into a volume! For PostgreSQL, this leaves us with two options.

A) Copy the Dockerfile and create our own image without the VOLUME declaration. This option will not make a volume but will instead save the data inside the container. In this way, docker commit will persist the data we have changed into a new image. The downside is that we now have a Dockerfile which we need to maintain.

B) Set the PGDATA environment variable to point to a different location, either when extending the image (with FROM postgres in a custom Dockerfile) or when running the container. This will still create an unneeded volume, but the data will not be saved in that location, so docker commit will persist it in the image.

Please note that the solution will be similar for other databases, but not exactly the same! It usually involves mapping the location of the data to something that isn’t a volume.

I’ve gone for Option B and just mapped the location where Postgres will save data to a location which is not a volume.

    @Container
    public static PostgreSQLContainer<?> postgreSQLContainer = new PostgreSQLContainer<>("postgres:14.2")
            // map container data to "non volume path" as only those are saved on commit.
            // original volume path is "/var/lib/postgresql/data"
            .withEnv("PGDATA", "/var/lib/postgresql/container-data")
            .withDatabaseName("testcontainer")
            .withUsername("sa")
            .withPassword("sa");
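
When you adapt this to another database image, a small sanity check can save you some head-scratching. The sketch below is only an illustration (the VolumeCheck class and its method are hypothetical and not part of Testcontainers): it tells you whether a data directory sits inside a declared volume path, in which case docker commit would not capture it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: detects whether a data directory lives inside a declared volume.
final class VolumeCheck {

    private VolumeCheck() {
    }

    // True when dataDir is the volume path itself or nested below it.
    // Paths are compared element by element, so "container-data" is not treated as being under "data".
    static boolean isInsideVolume(String dataDir, String volumePath) {
        Path data = Paths.get(dataDir).normalize();
        Path volume = Paths.get(volumePath).normalize();
        return data.startsWith(volume);
    }
}
```

With the paths from this post, VolumeCheck.isInsideVolume("/var/lib/postgresql/container-data", "/var/lib/postgresql/data") is false, so the data ends up in the committed image. Pointing PGDATA at a subfolder of /var/lib/postgresql/data would make the check return true, and the data would be lost on commit.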

Testcontainers and PostgreSQL, a waiting game

There is currently an issue with the default wait strategy in the Testcontainers library, as PostgreSQL produces different log output when data is already present. In short: the default strategy waits for the “ready to accept connections” log message to appear twice, but a database that starts from existing data logs it only once. This is described here: https://github.com/testcontainers/testcontainers-java/issues/5359.

The workaround, at least until the issue is resolved (I’m contributing this one! 💪), is to configure a different wait strategy for Testcontainers. The full example can be found below.


    @Container
    public static PostgreSQLContainer<?> postgreSQLContainer = new PostgreSQLContainer<>(
            // Declare compatibility explicitly, as required for a custom image name
            DockerImageName.parse("tomcools/postgres:main").asCompatibleSubstituteFor("postgres"))
            .withImagePullPolicy(PullPolicy.alwaysPull())
            // Custom wait strategy: expect the "ready" log line only once
            .waitingFor(new LogMessageWaitStrategy()
                    .withRegEx(".*database system is ready to accept connections.*\\s")
                    .withTimes(1)
                    .withStartupTimeout(Duration.ofSeconds(60)))
            .withDatabaseName("testcontainer")
            .withUsername("sa")
            .withPassword("sa");
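
To get a feeling for what that regular expression accepts, here is a small standalone sketch using plain java.util.regex. It assumes each log line is matched as a whole string, which simplifies what Testcontainers does internally. The ReadyLogMatcher class is hypothetical, and any log lines you feed it in an experiment are your own; only the message text comes from the wait strategy above.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: counts log lines that match the "ready" pattern of the wait strategy.
final class ReadyLogMatcher {

    // Same regular expression as in the custom wait strategy.
    // The trailing \s means the line has to end in whitespace, such as a newline.
    static final Pattern READY =
            Pattern.compile(".*database system is ready to accept connections.*\\s");

    private ReadyLogMatcher() {
    }

    // Number of lines that match the pattern as a whole.
    static long countReadyLines(List<String> logLines) {
        return logLines.stream()
                .filter(line -> READY.matcher(line).matches())
                .count();
    }
}
```

A wait strategy with withTimes(1) is satisfied as soon as this count reaches one. Note that a line without trailing whitespace does not match, because of the \s at the end of the pattern.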

How to use this setup

How you use this setup depends on your branching strategy. One way is to only run the ContainerUpdater on the main branch. This can be achieved by running the tests on that branch using the main Spring profile (we annotated the class with @Profile("main") for that reason). Every other branch can then start from a Docker image which already contains everything that’s on the main branch.
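
If the profile annotation alone turns out not to keep the updater from running in your build, an explicit check is one possible fallback. The sketch below is an assumption-heavy illustration: the ProfileCheck class is hypothetical, and it only looks at the spring.profiles.active system property (for example set with -Dspring.profiles.active=main), not at other ways of activating a profile.

```java
import java.util.Arrays;

// Hypothetical helper: checks whether a Spring profile is listed in a comma-separated profile string.
final class ProfileCheck {

    private ProfileCheck() {
    }

    // activeProfiles is a comma-separated list such as "dev,main"; null or blank means no profile.
    static boolean isProfileActive(String activeProfiles, String wanted) {
        if (activeProfiles == null || activeProfiles.isBlank()) {
            return false;
        }
        return Arrays.stream(activeProfiles.split(","))
                .map(String::trim)
                .anyMatch(wanted::equals);
    }

    // Assumption: the profile is passed as the JVM system property spring.profiles.active.
    static boolean isMainProfileActive() {
        return isProfileActive(System.getProperty("spring.profiles.active"), "main");
    }
}
```

A static method like isMainProfileActive could, for instance, back a JUnit 5 execution condition, so the updater test is skipped on every branch that doesn’t activate the main profile.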

Alternatively, instead of running it on each merge to the main branch, you could run it nightly, which honestly should be more than frequent enough.

This setup also allows for more complex scenarios. One thing you could do is pass in an environment variable to change the tag of the Docker image, so you have different images ready to use. However, I’ll leave those more specific use cases as an exercise for the reader 😅.
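
To give you a head start on that exercise, here is a minimal sketch of resolving the image name from the environment. The ImageNames class and the DB_IMAGE_TAG variable name are made up for this example; the repository name and the main default are the ones used earlier in this post.

```java
import java.util.Map;

// Hypothetical helper: derives the image name from an environment variable.
final class ImageNames {

    static final String REPOSITORY = "tomcools/postgres";
    // Assumption: DB_IMAGE_TAG is an invented variable name, pick whatever fits your CI setup.
    static final String TAG_VARIABLE = "DB_IMAGE_TAG";
    static final String DEFAULT_TAG = "main";

    private ImageNames() {
    }

    // Falls back to the default tag when the variable is missing or blank.
    static String resolve(Map<String, String> environment) {
        String tag = environment.get(TAG_VARIABLE);
        if (tag == null || tag.isBlank()) {
            tag = DEFAULT_TAG;
        }
        return REPOSITORY + ":" + tag.trim();
    }
}
```

Calling ImageNames.resolve(System.getenv()) in both the updater and the regular tests would keep the two in sync: the updater commits and pushes under that name, and the tests parse the same name for their container.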

That was the idea! I hope this unconventional approach might inspire you. The full example is available on GitHub.

Until the next time...lots of 💖
Tom

Addendum: It would be a shame to improve your test speed by using the ideas in this blog only to slow them down again by excessive usage of the @SpringBootTest annotation. No idea what I’m talking about? Check out this talk from Philip Riecks @ Spring I/O to learn more about it!