TUI Acceptance Testing

Update: I had created this blog post as a draft and forgotten about it. Since this went into a talk at LCA2020 I thought I publish it for completeness.

Since Purebred is closing in on it’s seconds birthday on 18th of July, I wanted to highlight how useful our suite of acceptance tests has become and what work went into it.

The current state of affairs

We use tmux extensively to simulate user input and basically black box test the entire application. Currently all 25 tests run on travis in 1 minute and 25 seconds. Most of it comes down to IO performance. On a modern i5 laptop it’s down to 30seconds.

Screenshot_2020-01-17 Job #1366 1 - purebred-mua purebred - Travis CI

Each test runs performs a setup, starts the application and runs through a series of steps performing user input, waiting for the terminal to repaint and assert a given text to be present. Since everything is text in a terminal - even colours - this makes it really is to design a test suite if you can bridge the gap using tmux.

I think I should explain more precisely what I mean by “waiting for the terminal to repaint”. At this point in time, we poll tmux for our assertion string to be present. That happens in quick succession by checking the hardcopy of the terminal window (basically a screen shot) with an exponential back off.

Problems we encountered

tmux different behaviour between releases

The travis containers run a much older version of GNU/Linux and therefore tmux. We’ve subcommands accepting different arguments or changed escape sequences in the terminal screen shot.

Tests fail randomly because of too generic assertion strings

That took a bit of figuring out, but in hind sight it is really obvious. Since we determine the screen to be repainted once our assertion string shows up, we sometimes used a string which showed up in subsequent screens. Next steps were executed and the test failed at subsequent steps. This is confusing, since you wonder how the test steps have even gotten to this screen.

We solved this not really technically, but simply by fixing the failing problems and documenting this potential pitfall.

Races between new-session and tmux server

Initially each test set up a new session and cleaned the session during a tear down. While this intention was good, it lead to randomly failing tests. The single session being torn down meant that also the server was killed. However, if the next session is created immediately after the old session is being removed we find our self in a race between the tmux server being killed and newly started.

We solved this problem with a keep-alive session which runs as long the entire suite runs and numbering the test sessions.

Asserting against terminal colours

Some tests assert specific how widgets are specifically rendered including their ANSI colour codes. Since we do write the tests on our own computers which support more than 16 colours, the terminals the CI runs is typically less sophisticated. This can lead to randomly failing tests because the colours in CI are different depending on the type of the terminal.

We solved this problem by simply setting a “dumb” terminal only supporting 16 colours.

Line wrapping in a terminal

A terminal comes with a hard character line width. By default it’s 80 characters (and 24 in height). Part of those 80 characters will be eaten up by your PS1 (the command prompt) setting of your shell, the rest can be consumed by command input.

We did run into randomly failing tests if lines wrapped at random points in the input and therefore introduced newlines in the output.

We solved this by invoking tmux with an additional “-J” parameter to join wrapped lines.

Optimisations

Initially, each step was waiting up to 6 seconds for a redraw. With an increasing amount of tests, the wait time for a pass or fail increased as well. Since we faced some flaky tests, we felt fixing those first, before making the tests run faster. The downside of optimising first and fixing flaky tests afterwards is that random test failures become more pronounced eroding the confidence in any automated test suite.

After were were sure we had caught all problems, we introduced an exponential back off patch which would wait cumulatively up to a second for the UI to be repainted. That’s a long time for the UI to change.

User permissions checking

Background

I’ve currently been working on restricting user access to executables on a Linux box. I removed all executable rights for others and added them via access control lists for certain groups. So for example for cat it looked like this: # getfacl /usr/bin/cat getfacl: Removing leading ‘/’ from absolute path names # file: usr/bin/cat # owner: root # group: root user::rwx group::r-x other::r–

For an executable I want the user access to the permissions looked like this:

# getfacl /usr/bin/ping getfacl: Removing leading ‘/’ from absolute path names # file: usr/bin/ping # owner: root # group: root user::rwx group::r-x group:staff:–x mask::r-x other::r–

The user is in the staff group and can execute ‘ping’, while everyone else get’s a permission denied.

The Test

As an automated test, I thought I go over all commands and produce a whitelist of executables a given user has access to.

The script looks a bit like this for a single executable:

# cat /tmp/foo.py import os

the users group id

os.setgid(2000) # the users ID os.setuid(2003)

print(os.access(‘/usr/bin/cat’, os.X_OK)) print(os.access(‘/usr/bin/ping’, os.X_OK))

When I run it I expected the test for the first executable to be false and the second to be true:

# python /tmp/foo.py True True

Aeh what? Doesn’t the script run as my user who’s in the staff group?

Turns out there is more to the process than just the group and user id. There are also supplementary groups and capabilities. When changing the script to call print(os.getgroups()) it printed the supplementary groups of the user I was running the script as, which was root in this circumstance. Changing the script to also set the supplementary groups to the one of the user:

import os

the users group id

os.setgid(2000) os.setgroups([2000, 2003]) # the users ID os.setuid(2003)

print(os.access(‘/usr/bin/cat’, os.X_OK)) print(os.access(‘/usr/bin/ping’, os.X_OK))

and running it returns the right results:


# python /tmp/foo.py
False
True

Caveats

Restricting permissions with ACLs and testing the way I demonstrated above can lead to false positives for scripts. You can not remove executable permissions from the script interpreter (e.g. /usr/bin/python) while keeping it with an ACL on the actual script. The test above will tell you it’s all fine and dandy, while in reality the user will run into a permission denied.

Romanofskis Blog

About

Flickr Photos

TUI Acceptance Testing

The current state of affairs

Problems we encountered

tmux different behaviour between releases

Tests fail randomly because of too generic assertion strings

Races between new-session and tmux server

Asserting against terminal colours

Line wrapping in a terminal

Optimisations

User permissions checking

Background

The Test

the users group id

the users group id

Caveats