TUI Acceptance Testing

Posted on January 17, 2020

Update: I had created this blog post as a draft and forgotten about it. Since it went into a talk at LCA2020, I thought I'd publish it for completeness.

Since Purebred is closing in on its second birthday on the 18th of July, I wanted to highlight how useful our suite of acceptance tests has become and what work went into it.

The current state of affairs

We use tmux extensively to simulate user input and essentially black-box test the entire application. Currently all 25 tests run on Travis in 1 minute and 25 seconds. Most of that time comes down to IO performance; on a modern i5 laptop it's down to 30 seconds.

[Screenshot: Travis CI job #1366.1 for purebred-mua/purebred]

Each test performs a setup, starts the application and runs through a series of steps: simulating user input, waiting for the terminal to repaint and asserting that a given text is present. Since everything in a terminal is text - even colours - this makes it really easy to design a test suite if you can bridge the gap using tmux.

I think I should explain more precisely what I mean by "waiting for the terminal to repaint". At this point in time, we poll tmux for our assertion string to be present. That happens in quick succession by checking a hardcopy of the terminal window (basically a text screenshot) with an exponential back-off between attempts.
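Boiled down (this is a simplified sketch rather than our actual test code; the starting delay, give-up point and session handling are illustrative), the wait looks roughly like this:

    -- Simplified sketch of the wait-for-repaint idea; not Purebred's
    -- actual implementation. Starting delay and cut-off are illustrative.
    import Control.Concurrent (threadDelay)
    import Data.List (isInfixOf)
    import System.Process (readProcess)

    -- Capture the current contents of a tmux pane as plain text.
    capturePane :: String -> IO String
    capturePane session =
      readProcess "tmux" ["capture-pane", "-p", "-t", session] ""

    -- Poll until the expected substring appears, doubling the delay
    -- between attempts and giving up after roughly a second of waiting.
    waitForString :: String -> String -> IO Bool
    waitForString session needle = go 8000            -- start at 8 ms
      where
        go delay
          | delay > 1000000 = pure False              -- bail out
          | otherwise = do
              out <- capturePane session
              if needle `isInfixOf` out
                then pure True
                else threadDelay delay >> go (delay * 2)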

Problems we encountered

Different tmux behaviour between releases

The Travis containers run a much older version of GNU/Linux and therefore of tmux. We've run into subcommands accepting different arguments, or changed escape sequences in the terminal screenshot.
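As a rough sketch of how to cope with that (simplified, not our exact code), the test helpers can at least ask tmux which version is running and branch on it:

    -- Sketch: query the tmux version so helpers can adapt their
    -- arguments between releases. Parsing is deliberately naive.
    import System.Process (readProcess)

    tmuxVersion :: IO String
    tmuxVersion = do
      out <- readProcess "tmux" ["-V"] ""   -- prints e.g. "tmux 2.9a"
      pure (drop (length "tmux ") out)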

Tests fail randomly because of too generic assertion strings

That took a bit of figuring out, but in hindsight it is really obvious. Since we determine the screen to be repainted once our assertion string shows up, we sometimes used a string which was not unique to the screen we were waiting for. Subsequent steps were then executed against the wrong screen and the test failed later on. This is confusing, since you wonder how the test has even gotten to the screen shown in the failure.

We solved this not so much technically as by fixing the offending assertion strings and documenting this potential pitfall.

Races between new-session and the tmux server

Initially each test set up a new session and cleaned it up during tear-down. While the intention was good, it led to randomly failing tests: tearing down the only session meant the tmux server was killed as well. If the next session is created immediately after the old one has been removed, we find ourselves in a race between the tmux server shutting down and being started anew.

We solved this problem with a keep-alive session which runs as long as the entire suite runs, and by numbering the test sessions.
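Sketched out (the session names here are illustrative, not the ones the suite really uses), the pattern is one detached keep-alive session that outlives the whole run, plus a numbered session per test:

    -- Sketch of the keep-alive pattern; session names are illustrative.
    import Control.Exception (bracket_)
    import System.Process (callProcess)

    -- One detached session that stays around for the whole test run,
    -- keeping the tmux server alive between individual tests.
    startKeepAlive :: IO ()
    startKeepAlive = callProcess "tmux" ["new-session", "-d", "-s", "keepalive"]

    -- Each test gets its own numbered session, created before and
    -- killed after the test body runs.
    withTestSession :: Int -> IO a -> IO a
    withTestSession n body =
      bracket_
        (callProcess "tmux" ["new-session", "-d", "-s", name])
        (callProcess "tmux" ["kill-session", "-t", name])
        body
      where
        name = "testsession-" ++ show n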

Asserting against terminal colours

Some tests assert specifically how widgets are rendered, including their ANSI colour codes. We write the tests on our own computers, which support more than 16 colours, while the terminal the CI runs is typically less sophisticated. This can lead to randomly failing tests, because the colours in CI differ depending on the type of terminal.

We solved this problem by simply setting a "dumb" terminal supporting only 16 colours.
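One way to pin that down (a sketch only; the exact terminal type and the way we set it in the real suite may differ) is to launch the application under test with a fixed, less capable TERM inside the tmux window:

    -- Sketch: start the application with a fixed, less capable TERM so
    -- colour output is the same locally and in CI. The TERM value and
    -- session name are illustrative.
    import System.Process (callProcess)

    startAppWith16Colours :: String -> IO ()
    startAppWith16Colours session =
      callProcess "tmux"
        [ "send-keys", "-t", session
        , "env TERM=xterm purebred", "Enter" ]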

Line wrapping in a terminal

A terminal comes with a hard character line width. By default it's 80 characters (and 24 lines in height). Part of those 80 characters will be eaten up by your shell's PS1 setting (the command prompt); the rest can be consumed by command input.

We ran into randomly failing tests when lines wrapped at unpredictable points in the input, which introduced newlines in the output.

We solved this by invoking tmux with an additional “-J” parameter to join wrapped lines.
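For illustration (a sketch, not the real helper; "-J" is the flag tmux's capture-pane accepts for joining wrapped lines), capturing the pane then looks like this:

    -- Sketch: capture the pane contents with wrapped lines joined so an
    -- assertion string is never split by an accidental line wrap.
    import System.Process (readProcess)

    capturePaneJoined :: String -> IO String
    capturePaneJoined session =
      readProcess "tmux" ["capture-pane", "-p", "-J", "-t", session] ""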

Optimisations

Initially, each step waited up to 6 seconds for a redraw. With an increasing number of tests, the wait time for a pass or fail increased as well. Since we faced some flaky tests, we felt we should fix those first before making the tests run faster. The downside of optimising first and fixing flaky tests afterwards is that random test failures become more pronounced, eroding the confidence in any automated test suite.

After we were sure we had caught all problems, we introduced an exponential back-off patch which would wait cumulatively up to a second for the UI to be repainted. That's a long time for the UI to change.
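To put a rough number on it (the actual starting delay and number of attempts are illustrative here, not what the patch uses), a schedule that starts at 8 ms and doubles each time adds up to about a second after seven attempts:

    -- Illustrative back-off schedule in milliseconds; the real starting
    -- delay and number of attempts may differ.
    ghci> let delays = take 7 (iterate (*2) 8)
    ghci> delays
    [8,16,32,64,128,256,512]
    ghci> sum delays
    1016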