Writing Maintainable Acceptance Tests

Over the past six months or so, there has been a fair amount of negative commentary about automated acceptance / integration / system testing. The thrust of this commentary is that tests at this level tend to be brittle and slow, and to carry a high maintenance overhead. None of this needs to be true, but producing a robust suite of tests requires an uncommon adherence to good practice.

To be sure, this kind of automated test will run slower than unit tests, because almost all of the machinery of the system under test is going to be involved. But I have seen many implementations that are much slower than they need to be because of one or two technical choices.

The advice that I offer here does not guarantee fast and robust tests, but if you don’t follow this advice and you don’t somehow deal with the issues raised in this article, I am pretty sure your tests will be slow and time-consuming to maintain as the test suite grows.

By way of example, let us consider an online book seller. The kind of test that can be built with the aid of many libraries might well look like this:

public void testSearchAndOrder() {
    type("username", "JRHartley");
    type("password", "verysecret");
    click("Submit");

    type("searchField", "Fly Fishing Hartley");
    click("Search");

    assertEquals("Fly Fishing", innerHtml("searchresults/tr[1]/td[1]"));
    assertEquals("J.R.Hartley", innerHtml("searchresults/tr[1]/td[2]"));

    click("searchresults/tr[1]/td[1]");
    click("Order");

    assertPageTitle("Your order has been placed");
}

This will certainly test the functionality. However, there are two things wrong with it:

  • Who is ‘JRHartley’? Why is he buying ‘Fly Fishing’? i.e. Is there something significant about that author or that book in the ‘standard test dataset’? No? Then this test is overspecified. Yes? Then the test is dependent on something that is not stated clearly. In either case, it is dependent on a ‘standard dataset’.
  • Is JRHartley really interested in clicking buttons, typing into text fields, etc.? i.e. is the logic we are testing really about button clicks and typing? This test is specifying mechanisms rather than outcome.

These issues contribute to brittleness and difficulties with maintenance. In addition, the dependency on standard data frequently leads to slow tests because there is more data than required.

My general approach to these problems is to focus on abstraction and composability. The solutions shown below achieve these fundamental aspects of software development partially by using a domain specific language in fluent interface style (although the fluency is limited and I usually go considerably further). By good fortune, Debasish Ghosh describes an approach to building DSLs here that is precisely what I am advocating.

However, this choice is largely irrelevant. You could achieve the results with, for example, FIT, FitNesse, Concordion or some other tool instead. I have never used those tools and suspect that I don’t really want to, but if you like them, go for it.

Dependency on a ‘Standard Dataset’

Note: Although this section refers to databases, the same argument applies if, for example, your system accepts feed files from other systems and processes them.

My preference is for each test to start with an empty database schema and to populate it with exactly what is needed. This tends to:

  • prevent one test from destroying the data that another test depends on
  • prevent tests from becoming dependent on an incidental aspect of the data
  • prevent more and more data being added to the database in order to accommodate new tests without breaking previous tests
  • make tests run quickly because there is very little data in the database

However, I do not want each test to have a bunch of database inserts, because doing so will make the test fragile with respect to database changes and in any case, doing this specifies the test at the wrong level. Therefore the database setup must be abstracted.

In order to truly achieve this, the abstraction must be above the level of database tables; for example, if tests need to know that in order to add a book to the database, a publisher must be created first, each test:

  • has structural knowledge of the database leading to duplication of information
  • contains information that is not relevant to the test leading to overspecified tests

In the example above, as far as the test is concerned, the book needs two attributes: a name and an author. Undoubtedly, the book will have many more attributes and relationships in the database, but none of these are relevant to this test and so should not be part of the setup. My DSL will therefore start to look like this:

public void testOrderBook() {
    given()
        .aBook()
            .title("Fly Fishing")
            .author("Hartley");

    ...
}

The given() method is an entry point into the fluent interface and this expression should be read ‘Given a book with title “Fly Fishing” and author “Hartley”’. The important aspect of this is that the aBook() method initialises all of the book’s fields and knows how to build a book with integrity in the database. I usually initialise fields to random values.
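As a rough illustration of how such an entry point might be built, here is a minimal sketch. Everything in it is an assumption made for the example: FakeDatabase is an in-memory stand-in for the real schema, the publisher field stands for ‘structural knowledge the test should not need’, and the explicit create() step is a convenience of the sketch that the DSL in this article does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical in-memory stand-in for the real schema (illustration only).
final class FakeDatabase {
    final List<String> publishers = new ArrayList<>();
    final List<Book> books = new ArrayList<>();
}

final class Book {
    private final String title;
    private final String author;
    private final String publisher;

    Book(String title, String author, String publisher) {
        this.title = title;
        this.author = author;
        this.publisher = publisher;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getPublisher() { return publisher; }
}

// Every field starts out random; a test overrides only what it cares about.
final class BookBuilder {
    private final FakeDatabase db;
    private String title = random("title");
    private String author = random("author");
    private String publisher = random("publisher");

    BookBuilder(FakeDatabase db) { this.db = db; }

    BookBuilder title(String title) { this.title = title; return this; }
    BookBuilder author(String author) { this.author = author; return this; }

    // Assumed terminal step: the builder, not the test, knows that a
    // publisher must be stored before the book that refers to it.
    Book create() {
        db.publishers.add(publisher);
        Book book = new Book(title, author, publisher);
        db.books.add(book);
        return book;
    }

    private static String random(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }
}

// Entry point of the fluent interface.
final class Given {
    private final FakeDatabase db;

    Given(FakeDatabase db) { this.db = db; }

    BookBuilder aBook() { return new BookBuilder(db); }
}
```

With this in place, new Given(db).aBook().title("Fly Fishing").author("Hartley").create() stores a publisher and a book, while the test mentions neither the publisher nor any table.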

This deals with the dependency on the standard data issue. However, it does not eliminate the over-specificity of the test. To put it simply, while the test is interested in the title and author, it should be non-specific about what title and author is actually used.

Overspecificity

In order to solve this problem, we need to allow the title and author to be given to us, rather than prescribed. As said earlier, my convention is to randomise any unspecified fields. So not specifying an author and title will result in what we want.

public void testOrderBook() {
    Book book = given().aBook();

    ...
}

We can now express the rest of the test using the book’s properties. We will see how to do that later.

Some might feel uncomfortable that something important has been lost from this fixture: the fact that the test does depend on an author and title and yet we are not setting up the book with a particular author and title. If that really bothers you, it is easily remedied (I’ll use the more terse form above for the rest of the article though):

public void testOrderBook() {
    String author = given().anAuthor();
    String title = given().aBookTitle();

    Book book = given()
        .aBook()
            .author(author)
            .title(title);

    ...
}

The important point is that the test is now free of pre-existing fixture data and is minimally specific.

Specify Tests In Business Terms

The other concern in the original test was that it was expressed in terms of what to type into form fields and what buttons to click. We should hide this away:

public void testOrderBook() {
    Book book = given().aBook();

    ...

    then()
        .searchingFor(book.getTitle() + " " + book.getAuthor());

    resultsIn()
        .searchResults(book);

    ...
}

The logic of how to do the search is hidden behind the searchingFor(…) method. The only relevant thing passed is the search term. Subsequently, we make sure that the results returned from the search contain the book. We could be more specific and assert that the search results contain only one book. Of course, for that to have any real relevance, more than one book would have to be created in the database.
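To make the idea concrete, here is one way the layer behind searchingFor(…) could look. This is only a sketch under assumptions: the Browser interface and FakeBrowser are hypothetical stand-ins for whatever UI-driving library is in use, SearchSteps is an invented name, and titles are passed as plain strings rather than Book objects to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical low-level driver, in the spirit of the helpers in the first test.
interface Browser {
    void type(String field, String text);
    void click(String button);
    List<String> resultTitles();
}

// Business-level steps: field and button names live here, not in the tests.
final class SearchSteps {
    private final Browser browser;

    SearchSteps(Browser browser) { this.browser = browser; }

    SearchSteps searchingFor(String term) {
        browser.type("searchField", term);
        browser.click("Search");
        return this;
    }

    void searchResults(String expectedTitle) {
        if (!browser.resultTitles().contains(expectedTitle)) {
            throw new AssertionError("Expected search results to contain: " + expectedTitle);
        }
    }
}

// Fake browser so the sketch runs without a web app: a title matches
// when it appears in the search term.
final class FakeBrowser implements Browser {
    private final List<String> catalogue;
    private final List<String> results = new ArrayList<>();
    private String searchField = "";

    FakeBrowser(List<String> catalogue) { this.catalogue = catalogue; }

    @Override public void type(String field, String text) {
        if (field.equals("searchField")) searchField = text;
    }

    @Override public void click(String button) {
        if (!button.equals("Search")) return;
        results.clear();
        for (String title : catalogue) {
            if (searchField.contains(title)) results.add(title);
        }
    }

    @Override public List<String> resultTitles() { return results; }
}
```

If the search field is renamed or the search moves to another page, only SearchSteps changes; the tests that call searchingFor(…) stay as they are.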

This test now does not directly depend on button clicks and, apart from a few artifacts of the Java language, is pretty easy to understand at the level of business interactions.

Finally, we want to place the order:

public void testOrderBook() {
    Book book = given().aBook();
    User user = given().aUser();

    loginAs(user);

    then()
        .searchingFor(book.getTitle() + " " + book.getAuthor());

    resultsIn()
        .searchResults(book);

    then()
        .select(book)
        .order();

    resultsIn()
        .anOrder(book, user);
}

And once again, details about how the order is placed and how we verify that the order has been placed are abstracted away. Of course, if the user was able to check their orders on a web-page, we could use that page to determine that the order has been placed. Otherwise, we might go to the database to make sure that an entry has been added to an appropriate table. We might also expect an email to be sent, so the anOrder(Book, User) method might verify that too. The point is that all of that can be hidden in anOrder(Book, User) and can be changed over time if necessary.
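One possible shape for this, sketched under assumptions: the OrderCheck interface and both implementations are hypothetical, the lists stand in for a real orders table and a real mail trap, and books and users are reduced to strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// One way of confirming that an order exists (hypothetical abstraction).
interface OrderCheck {
    boolean orderExists(String bookTitle, String userName);
}

// Stand-in for a lookup in an orders table.
final class OrdersTableCheck implements OrderCheck {
    final List<String> rows = new ArrayList<>();

    @Override public boolean orderExists(String bookTitle, String userName) {
        return rows.contains(userName + "|" + bookTitle);
    }
}

// Stand-in for a check that a confirmation email was sent to the user.
final class ConfirmationEmailCheck implements OrderCheck {
    final List<String> sentTo = new ArrayList<>();

    @Override public boolean orderExists(String bookTitle, String userName) {
        return sentTo.contains(userName);
    }
}

final class OrderAssertions {
    private final List<OrderCheck> checks;

    OrderAssertions(List<OrderCheck> checks) { this.checks = new ArrayList<>(checks); }

    // Tests call only this; the set of checks can change without touching them.
    void anOrder(String bookTitle, String userName) {
        for (OrderCheck check : checks) {
            if (!check.orderExists(bookTitle, userName)) {
                throw new AssertionError("No order found for " + userName + " / " + bookTitle);
            }
        }
    }
}
```

Adding the email check later, or replacing the table lookup with a look at an orders page, means changing the list handed to OrderAssertions and nothing in the tests.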

Emergent Goodness

The approach described above brings enough benefits as it is: fixture data independence and tests specified at the level of the business process. However, there are now two things that also emerge:

  • Because the DSL is written in fluent interface style in Java, as more tests are written, the fluent interface helps to guide the writing of tests. New tests can be composed very quickly.
  • The test is not tied to an implementation. The fact that this started life as a test for a web-app does not mean that it will always be so. An entire test suite can be reused to test, for example, a REST API, or a B2B messaging system for book supplies. All that is needed is to reimplement the model behind the DSL appropriately. This may not be a small task, but the fact remains that the tests themselves can be used in many ways.

This latter point has been exercised on one of our recent projects in which there were multiple ways of using the system (web-app, REST API and message queues each offering the same functionality to clients).
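A minimal sketch of what that reuse could look like, with loud caveats: this is not the code from that project. BookstoreDriver, both driver classes and InMemoryShop are hypothetical, and the drivers only record what a real implementation would do (drive pages or send requests) before delegating to the in-memory shop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical port that the DSL would talk to.
interface BookstoreDriver {
    List<String> search(String term);
    void order(String title, String user);
    boolean hasOrder(String title, String user);
}

// Minimal in-memory shop standing in for the system under test.
final class InMemoryShop {
    final List<String> titles = new ArrayList<>();
    final List<String> orders = new ArrayList<>();

    List<String> find(String term) {
        List<String> found = new ArrayList<>();
        for (String title : titles) {
            if (term.contains(title)) found.add(title);
        }
        return found;
    }
}

// Stand-in for a driver that would work through the web pages.
final class WebAppDriver implements BookstoreDriver {
    private final InMemoryShop shop;
    final List<String> log = new ArrayList<>();

    WebAppDriver(InMemoryShop shop) { this.shop = shop; }

    @Override public List<String> search(String term) {
        log.add("fill in search form");
        return shop.find(term);
    }
    @Override public void order(String title, String user) {
        log.add("click Order");
        shop.orders.add(user + "|" + title);
    }
    @Override public boolean hasOrder(String title, String user) {
        return shop.orders.contains(user + "|" + title);
    }
}

// Stand-in for a driver that would send requests to a REST API.
final class RestApiDriver implements BookstoreDriver {
    private final InMemoryShop shop;
    final List<String> log = new ArrayList<>();

    RestApiDriver(InMemoryShop shop) { this.shop = shop; }

    @Override public List<String> search(String term) {
        log.add("send search request");
        return shop.find(term);
    }
    @Override public void order(String title, String user) {
        log.add("send order request");
        shop.orders.add(user + "|" + title);
    }
    @Override public boolean hasOrder(String title, String user) {
        return shop.orders.contains(user + "|" + title);
    }
}

// The scenario is written once, against the interface only.
final class OrderBookScenario {
    static void run(BookstoreDriver shop, String title, String user) {
        if (!shop.search(title).contains(title)) throw new AssertionError("book not found: " + title);
        shop.order(title, user);
        if (!shop.hasOrder(title, user)) throw new AssertionError("order not recorded for " + user);
    }
}
```

OrderBookScenario.run(…) accepts either driver unchanged, which is the property being described: the effort goes into the drivers, not into rewriting tests.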

Conclusion

In order to keep automated acceptance tests maintainable, fast and flexible, focus on those staples of solid development: abstraction and composability. Avoid ‘standard data sets’ and express tests in terms of business processes.


3 Responses to “Writing Maintainable Acceptance Tests”

  1. Great post, cool points!

    However, I can’t agree with one thing. You said you fill unspecified fields with random data. Eeeek? So you are making a hole that can lead to a failing test in one run and a successful test in a second run initiated immediately after the first, aren’t you?

  2. lance says:

    Hi Victor.

    I understand that the notion of randomness in tests makes some people uncomfortable.

    However, everything that is essential to the test passing should be explicit in the test. The only randomness left is in those things that should not matter to the test. Therefore, if the test flickers, then something that is necessary has not been specified. The fact that the test flickers under these conditions is a good thing, because it tells you that the test is not specific enough, or conversely, that the particular partition of the ‘state space’ of the system under test has more dimensions than you thought.

    The alternative is that the test always passes, but only because something significant but unspecified just ‘happens’ always to have the right value.

    Regards,

    Lance

  3. David Kemp says:

    Nice blog. I often find that acceptance tests catch serious regressions missed by unit tests. It is too easy to change a class and its tests to satisfy a change of requirements and then find that your change breaks some assumptions made elsewhere. I find unit tests good for driving the code (and design), but acceptance tests good for catching regressions. As for randomness, I used to scoff at code that used the current date for everything until some tests started breaking one leap year! Yes it would be better if we had tests that covered leap years etc, but the guy who wrote the tests didn’t think of doing it at the time. Randomly generated dates may have picked the problem up sooner!
