Patterns of Narrow Integration Testing

“Integration testing (sometimes called integration and testing, abbreviated I&T) is the phase in software testing in which individual software modules are combined and tested as a group.”

I’m not too fond of this definition, because it implies that tests which mock dependencies to other systems are not integration tests, as that is not combining the systems as a group.

Guru99 has a similar issue with its definition:

“INTEGRATION TESTING is defined as a type of testing where software modules are integrated logically and tested as a group.”

The third result on google is an article by Martin Fowler where he brings the problem of defining integration tests, and he arrived at a conclusion which I agree with, that integration testing is a polysemy.

Narrow vs Broad

I recommend reading Martin Fowler's post as it goes more into detail about the differences between the two types of integration testing than I will here. Summarized we have two levels of integration testing:

Broad: Where you spin up live versions of all services and run tests against them. Broad tests need substantial setup and maintenance to maintain and is often conducted in a staging environment where all systems run.
Narrow: Where you test an individual service but mocks most of its external dependencies.

Commonly, the broad integration tests are on a very high level written as black box styled tests. They trace the effects of a call throughout the overall solution to figure out that everything is talking together correctly. In the broad category, we are running entire databases with real services which transfer data to each other.

Why do we need narrow integration tests?

To determine the value of narrow integration tests, we must see where its broad counterpart falls short.

Broad integration tests are slow: Since broad integration tests have to run live versions of all systems and bounce data between them, the tests are naturally slowed. It is not helped that we need to make sure to run the correct version of various services as well as setting up all of them with different data.
Broad integration tests are complicated: As explained, a lot is going on in these tests so that they can become complicated very quickly.
Board integration tests are hard to maintain: We quickly run into similar issues as we do with E2E tests, that we must limit the number of tests because, if you have too many, it can become a nightmare to maintain.

In comparison, narrow integration tests make up for the shortcomings of broad integration tests:

Narrow integration tests are fast: While not as fast as unit tests they run with such a speed that they can be executed during compile time on the developer computer. That means faster feedback for the developer.
Narrow integration tests are more straightforward than their broad counterpart: They require less setup as they only need to worry about the design of mocks and maybe a database.
Narrow integration tests are easier to maintain: Which means we can have a bunch of them!

Like many things in software development, a practice cannot be determined in a vacuum, as techniques often make up for the downsides of other techniques. We see this happening with companies which only want to adopt parts of agile. They don’t get the full benefit because they implement just the parts they deem necessary without realizing that everything builds on something else. It is the same with narrow and broad integration tests.

To have a healthy test portfolio, we should have both broad and narrow integration tests, but their goals are slightly different:

Broad integration tests try to verify the correctness over how one service interprets another service’s contract.
Narrow integration tests assume that the contract is correct and has more tests for how a single system interacts with that contract.

To distil the differences in the goal of narrow and broad integration tests even further, we can say that:

Broad integration tests verify the contract
Narrow integration tests verify the interactions with the contract

Does broad integration tests verify the contract by having systems interact with it? Definitely, but it is not as detailed as with the narrow integration tests. The question is not necessarily whether or not the systems did the right thing; rather, it is whether or not the information was transferred and interpreted correctly.

Narrow integration tests and you

Narrow integration tests can be viewed as I/O tests for our application, often written in the style of unit tests. So let’s define the most common I/O our systems tend to have:

External web APIs
Database
Files
Messaging systems/queues
Searching systems

NOTE
When we write narrow integration tests, we don’t usually want to test the input of our application. Our system’s input will be covered and defined by automated high-level functional tests and automated acceptance tests. Broad integration tests will also touch it. We are potentially looking at a bunch of duplicated tests, so we usually don’t want to write narrow integration tests for the input of our system and instead focus on data which our system requests.

Run locally on a developer’s machine (no need for an external environment)
Require no external configuration
Run as part of the build per default, but can be manually excluded

What does a narrow integration test actually look like?

As with unit tests we often have one test class per class in our production code with a bunch of tests and the same goes for narrow integration tests, and we have the same for our narrow integration tests. Consider this class from the spring boots guides:

@SpringBootApplication
public class ConsumingRestApplication {

    private static final Logger log = LoggerFactory.getLogger(ConsumingRestApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(ConsumingRestApplication.class, args);
    }

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }

    @Bean
    public CommandLineRunner run(RestTemplate restTemplate) throws Exception {
        return args -> {
            Quote quote = restTemplate.getForObject(
                    "URL_TO_RANDOM_QUOTE_API", Quote.class);
            log.info(quote.toString());
        };
    }
}

This might be a very simplistic example, but a narrow integration test for this code would be to verify the call to a random quote API.

In short, narrow integration tests are unit tests for the I/O, but they are distinguishable from unit tests because they deal with I/O. We should not make the mistake of calling narrow integration tests for unit tests even if they are written as such.

If the definition of a unit test is a test which tests a single aspect of the code, then the definition of a narrow integration test is a test which tests a single aspect of the code that interfaces with I/O in some way.

Database

Mock the outer layer, whatever DB framework that is used. I.e. the repository/query object which your DAO/repository uses to access the data.
Use an actual database and fetch real data.

Mock the repository/query objects

The main benefit of this is that we don’t need a database running and our tests will be blazing fasts.

Running an actual database

my post about database migrations

By running a local database, it might seem that we are breaking our “no external dependencies” rule. I view it as we are bending the rules a little. Some might argue that we are trespassing on the territory of wide integration tests, but I disagree. We are limited to how many wide integration tests we can have without feeling pain, so we should limit them wherever possible, and doing database tests as narrow integration tests makes much more sense. There are two main reasons why I think that:

A service might have a bunch of different tables, all with their DAOs/repositories which must be tested. Having all these tests in the wide integration test might not be feasible, yet we want to test them all.
A database is very much an independent piece. If you need other dependencies to spin up a database instance, then you are probably doing something wrong.

This approach is, to me, last resort, when there’s absolutely no other way of running a database. The reason for this is that this way of testing doesn’t make sure that the database understands our objects/data structure and vice versa. We never get to test that conversion. Even if we found some mocking system to throw around the ORM/Database framework, we still cannot be sure that the actual database will understand.

When we’re running tests towards an actual database, we must do a few things as part of the setup for our test:

Spin up our database in some way
Migrate the database schema
Insert whatever required data one needs for the specific test.

Using an in-memory replacement

lack of in-memory databases

If we use a database framework which allows us to switch out our database provider at will, and our schema is so generic that it can be transferred from one SQL based to another, then this can be a practical way. We might be other types of databases, like NoSQL, distributed and so forth. There might not be possible to switch over to a different kind of database provider that is used in production, or there might not be an easy way to spin up your database locally.

What is so great about these in-memory databases is that we can spin one up without requiring any external dependencies (like docker) to make things work.

Using docker

MySQL

MongoDB

progress

The second step is to include this in your build tool. Whether it is Maven or MSBuild, most have dependencies which can interface with Docker and automatically start various containers. Or you can have a docker-compose file which gets executed before the build taking place. The point is that we’re spoiled for choice and there are multiple ways on how we can achieve spin up a new database instance on the fly.

While docker makes it easy to spin up databases of all kinds, it also comes with some quirks. For example whenever I run up a container with Oracle SQL, then docker returns successfully before the Oracle SQL server is ready. This means that if I continue with the build, then it may fail because the SQL server is still working on setting itself up. The workaround so far has been to add a hardcoded timeout to the setup, but that is not very elegant. If there isn’t much proprietary trickery in the schema, it might be well worth switching to a lighter database system for these tests, like SQLite. It might not be possible, but it is something to consider. We want our integration tests to be as fast as possible, and the more we have to wait for the tests to complete the less value they bring. The actual integration will be indirectly verified by the wide integration tests anyway, so it isn’t vital that we are using the same database for the narrow integration test.

Files

Writing files

Writing these tests is pretty straight forward:

Have a dataset ready
Generate file
Compare the file to another hardcoded file which has already been verified to be correct

When generating a file, I’d like to suggest that we don’t write it to disk. I am much more in favour of wrapping the actual functional call, which writes a file to the disk, in a class and an interface:

In the model above, we see that we have wrapped the actual I/O call in a class. It is the last chain of our call, but we can intercept it as a mocked object. That means we can get direct access to the content of the file and all other parameters without writing anything to disk.

There are multiple reasons why we might avoid writing a file to disk:

Writing a file to disk adds a file to the file system which needs to be cleaned up and dealt with somehow. Suddenly there’s a new file in the git commit, or you have to take other precautions to make sure that such files don’t cause a mess. It is another thing to consider.
In most high-level languages, file reading and writing files has become a non-issue. If we have managed to create the file object, then we are pretty much guaranteed to be able to write the actual file to disk. We are running this test locally on our machine; thus we won’t be able to test whether we have the correct paths anyway, so we might as well not write to disk. What is essential is that our code thinks it has written a file to disk.

No matter whether the file is written to disk or caught by a mock, we the next step in the process remains the same: Read the content of the file. It might be tempting to parse the content into some object structure, but I’d recommend against it—the more logic put into our tests, the more problems we create for ourselves. Instead, we should have a hard-coded file which our generated file should be equal to. We can choose to compare the content or hash the file and compare it that way.

Making that initial hard-coded test file can be a bit of a pain. Some might see me as a heathen for saying it, but I’d recommend generating a file based on your desired test data and use that as a starting point. We cannot get away from having to check the correctness of the hard-coded file manually, but when that is done, we only need to verify the correctness of any changes made to it.

Reading files

Read a hard-coded file
Compare the result of the read (usually an object of sort) with whatever hard-coded result we might have in our test (like a normal unit test)

RabbitMQ/Kafka

NOTE
You might get away with not having narrow integration tests for message brokers. It depends how much logic you might (or might not) have in the layer/class which deals with the message broker logic. If it is separate enough, then wide integration tests might be enough. If there is a bunch of logic which is hard to write regular unit tests for I do recommend writing narrow integration tests, but it might not always be required.

RabbitMQ-mock

mocking built-in

Web APIs

When I say web APIs, I am primarily talking about protocols running on HTTP. We will explore other ways of transferring data later, but for now, let’s keep the conversation to HTTP based protocols only.

The solution to all is mostly the same, however. We know that we cannot run all of these services locally as all might have other dependencies and databases which they rely on. The goal is to run something locally, which our service can connect to and return some predictable value, which we can use in our test. Currently, I’ve seen this be solved in two ways:

Make a test service

If we also package this test service into a docker container, we can simply add it to docker-compose and have it automatically start up during testing so that the developer doesn’t have to worry too much about it while developing.

While this approach works, it is not the one I would default to. Making a test service might be a good idea if you’re working with some proprietary technology, or any technology which doesn’t have a mock standing available, however, one should avoid this if possible as it comes with some downsides:

To introduce variants in the response, we have to include logic in the test service
We have to manually update the test service when we make changes to the contracts for the actual services
By changing the data in the test service for our test, we risk accidentally breaking tests in other systems
Gets more complicated as more endpoints/contracts are added to the test service

Mock the external servicers

When working with SOAP services I have resorted to using the method where I build my master test service, but there are projects like this which seems promising which I should take a closer look at (a post for the future?).

My type of dependency isn’t here, John! What do I do?! (Generic patterns)

The points above don’t capture all protocols, applications or ways of transferring data. So this section will be a general catch-all for whatever I haven’t concretely covered in the rest of this post, but we will see an available pattern on how to approach a given dependency and the solution.

Note that I am not saying that one solution is better than the other. It all depends on the context. For some dependencies, like the database, it is easier to just run the dependency rather than trying to fake it. For other dependencies, like SOAP, it might be easier to do a complete fake service to write tests for it meaningfully.

Mock the dependency

This is usually the most preferred way, as it makes life so much easier when writing the tests. It allows each test to specify the behaviour of the mock, as well as the return data. It also keeps all the test related data within the test itself. This approach tends to make the tests a bit bloated, but that is hard to avoid.

While there might not be a framework or server which is made for whatever programming language that is used, there might be a docker image which can be reused for this, so it is often worth looking around.

This system is described in the “make a test service” section in this post, but it is not limited to webs services only. Whatever that cannot be mocked easily; we can replicate with what I like to call a manual mock.

Sometimes, like in the case of SOAP, there might not be any readily available mocking tools for whatever platform you use. Or the ones that exist are so clunky that you don’t want to use them. In these situations, we might want to consider making a mock system ourselves.

Writing your manual mocks might not be worth it, and should be judged on a case-by-case basis. It is generally worth it if most of the solution relies on the technology which we are trying to mock. If it is just one dependency in the overall solution that has this problem, then it may not be worth it and instead use one of the other approaches might serve you better.

Abstract and mock the outer layer of the application

If there is no easy way to mock the dependency itself, it might be easier to simply mock the part of the code that interfaces with the dependency. In this approach, we simply extract out all the logic and hide the direct calls to the external dependency behind an interface which we can pretty quickly mock with a standard mocking framework.

The downside is that this might require rewriting parts of the service, which we usually want to avoid, but it is guaranteed to work with pretty much any modern programming language and any protocol on the virtue that we are merely ignoring the external dependency. Another downside is that we are not testing whether or not our application can communicate with that technology, but that will be caught by the wide integration tests anyway, so we shouldn’t worry too much about that.

Another thing to consider when going with this approach is whether or not we are writing integration tests at this point, and I would agree. At this point, we are writing unit tests, as we have abstracted away from any I/O dependency. This is also why we should be careful going this route as we are not testing the integration in any way. That said it is all about confidence in our application working, and if we trust our wide integration tests, this might be an appropriate solution.

Run the dependency

If possible, we should require as little setup as possible when writing our narrow integration tests, but sometimes it is easier to run the dependency than it is trying to fake it in some way. Technologies like Docker have made setting up reusable containers much more straightforward than it ever has.

A consideration that should be made when considering this approach is whether or not the desired test belongs as a wide integration test rather than a narrow one. We made an exception with the database previously, but that is because databases tend not to need any other dependencies and our systems tend to have a bunch of different queries which all need to be tested. There might be similar situations for other scenarios where this approach does make sense.

Conclusions

Writing tests is hard. Not only because it is difficult to get right, but it is also something which many don’t prioritize all that much. When we add the additional dimension, which is external dependencies, we make it even more challenging to manage, so we often see bloated test environments which take hours upon hours to execute tests. Or we see such tests disabled in the pipeline entirely because people can’t be bothered with fixing them. I do think this is the wrong approach, and instead, we should work with making our tests more reliable and portable. I hope that my post has given some food for thought in that sense.

Another goal of this post was to serve as a starting point on how we write our narrow integration tests as well as documenting some of the patterns I tend to use. Hopefully, I discover newer and better patterns as I progress as a developer.

John Mikael Lindbakk