
Well has the test failed or hasn’t it?

by Neil Hudson


When should you classify a test as Failed? This sounds like a simple question, and you may think the answer is obvious; however, there are factors which mean a well-thought-out approach can bring significant benefits to the test manager.

Introduction

One of the states generally used in test reporting is Failed. A common assumption, and a generally sound one, is that failed tests mean you have problems. A less well-founded extension of this, given typical practice, is that failed tests indicate the system has problems doing the things those tests were testing. Years of trying to understand what is really going on inside projects show that this is the point at which the complexity of the real world overwhelms the abstract model of tests and test failures.

Think about the simple abstract model. Tests have actions and, hopefully, expected outcomes. If the action is performed correctly and the outcome does not match the expectation then the test has Failed. Simple or what? This model is applied on a regular basis all over the world, so what is the issue? Issues come in many forms and will be illustrated here using three examples.

Example One - Environmental Problem

Our system allows external users to submit transactions through a web portal. There is a change to the way these submissions are presented to internal users on the backend system: if a submission has an attachment, this is flagged to the user. One type of transaction has three modes; two tests pass and the third fails. Over a number of days a common understanding builds up across both the test and development teams that the change works for two of the three modes and does not work for the third. Only when we dig into the detail to decide whether or not to release with the issue do we discover that transactions for the third mode fail to submit at the portal. No one had managed to get this transaction in; its handling in the backend had not been tried.

The real problem was a test environment configuration issue that derailed this test. The test was marked as Failed and the story began to develop that the third mode did not work. This test had not Failed; it was Blocked, unable to progress and discharge its purpose.

Example Two - Incorrect Search Results

To test that billing accurately consolidates associated accounts, these associations have to be created and then the accounts billed. To associate accounts, one account is selected as the master and a search facility is used to obtain the list of accounts that can be associated; selections are then made from that list. After this, billing can be tested. When the search is done it returns the wrong accounts and the association attempts fail. Has the test failed?

If the test is classified as Failed, this tends to (well, should) indicate that when you bill associated accounts the bill is wrong. So marking tests like this as Failed sends the wrong message. The test can’t be completed, and a fault has been observed that can’t be ignored, but the fault has nothing to do with the thing being tested.

Example Three - Missing Input Box

A test navigates through a sequence of common HCI areas. On one page it is observed that one of the expected input boxes is missing. This doesn’t bother us as the test doesn’t use it. Everything works well for the test. Has it Passed?

The most meaningful outcome is that the test Passed; but that leaves the defect that was observed floating around, so shouldn’t the test be marked as Failed to ensure the defect is re-tested?

An Alternative Model of Failure

Those were just three examples. There are many similar variations; so what rules should be used to decide whether to claim Failure? Generally a test should have a purpose and should include explicit checks that assess whether the thing tested by that purpose has or has not worked correctly. An expected result after an action may be such a check; alternatively a check may require more complex collection and analysis of data. Checks should relate to the purpose of the test. Only if a check is found to be false should the test be marked as Failed. If all the checks are ok then the test is not Failed even if it reveals a defect.
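To make that rule concrete, here is a minimal sketch assuming three reporting states, Passed, Failed and Blocked; the status names, the check() helper and the classify() runner are illustrative only, not a real test-management API. Only an explicit check tied to the test's purpose is allowed to mark the test Failed; anything else that stops the test leaves it Blocked.

    from enum import Enum

    class Status(Enum):
        PASSED = "Passed"
        FAILED = "Failed"     # an explicit, purpose-related check was false
        BLOCKED = "Blocked"   # the test could not run far enough to discharge its purpose

    class CheckFailed(Exception):
        """Raised only by explicit checks tied to the test's purpose."""

    def check(condition, message):
        """An explicit check: the only thing allowed to mark a test as Failed."""
        if not condition:
            raise CheckFailed(message)

    def classify(test_fn) -> Status:
        """Run a test; anything other than a failed check is Blocked, not Failed."""
        try:
            test_fn()
        except CheckFailed:
            return Status.FAILED
        except Exception:
            # Environment problems, broken utility steps, wrong search results, ...
            return Status.BLOCKED
        return Status.PASSED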

The Role of Expected Results

So are all expected results checks? Often there are expected results at every step, from logging in, through navigation, to finally leaving the system. Given this, the position is a very strong no. Many expected results in tests serve a utility purpose: they verify that some step has been done as required, and they often say little about the thing the test is actually needed to prove. If you don’t get the expected result it means there is a problem somewhere; a problem with the test, with the way it is executed, or with the system. However, it does not necessarily mean there is a problem with the thing being tested. Only when there is a definite problem with that should the test claim to be a Failure.
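To show the distinction in code, here is a self-contained sketch of Example Two, under the assumption that a utility step which cannot be completed raises a Blocked condition while only the purpose check can fail the test. The account numbers, function names and bill total are made up for illustration; this is not a real billing API.

    class Blocked(Exception):
        """A utility step could not be completed; the test cannot discharge its purpose."""

    def search_associable_accounts(master):
        # Stand-in for the real search; it returns the wrong accounts, as in the example.
        return ["ACC-999"]

    def bill_consolidated(master, associates):
        # Stand-in for billing; never reached in this scenario.
        return {"total": 300}

    def test_billing_consolidates_associated_accounts():
        # Utility step with an expected result: the search should return ACC-002 and ACC-003.
        found = search_associable_accounts("ACC-001")
        if set(found) != {"ACC-002", "ACC-003"}:
            # Raise a defect against the search, but do not Fail the billing test.
            raise Blocked("association search returned the wrong accounts")

        bill = bill_consolidated("ACC-001", found)

        # The explicit check tied to the purpose: only this may mark the test Failed.
        assert bill["total"] == 300, "consolidated bill total is wrong"

    if __name__ == "__main__":
        try:
            test_billing_consolidates_associated_accounts()
            print("Passed")
        except AssertionError as error:
            print(f"Failed: {error}")
        except Blocked as error:
            print(f"Blocked: {error}")

Run as written, this reports Blocked, which matches the position above: a defect has been raised against the search, but nothing has been proven about billing consolidation.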

Orphaned Defects

That leaves defects that are triggered when running tests but that don’t mean the test has Failed. We could end up with no tests Failed, perhaps even all Passed, and a stack of defects; this is counter-intuitive, so what is going on? Actually, the discipline of refusing to fail tests unless an explicit check fails provides very useful feedback. The statistical discrepancy can indicate (see the sketch after this list):

(a) That the tests do not have adequate checks; they are revealing errors in the thing being tested that can be seen, but nothing in the test itself says to check for them. Time to improve the test and then mark it as Failed. Improving the test is required to make the defect detection delivered by the tests consistent; we should only depend on explicitly defined error detection.

(b) That we are finding errors in things that are not being tested, since no test is failing as a result of the defect. For control purposes, add tests that do Fail because of these defects. Also, is this indicating a major hole in regression testing or in testing of the changes? If so, is action required?

(c) That there are environmental problems disrupting test activities.
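As a sketch of how that feedback might be surfaced, the fragment below assumes each defect records the test, if any, whose explicit check it caused to fail; the data shapes and names are purely illustrative.

    results = {"T1": "Passed", "T2": "Passed", "T3": "Blocked"}
    defects = [
        {"id": "D1", "failed_test": None},   # observed during testing, but no check caught it
        {"id": "D2", "failed_test": None},
    ]

    failed_tests = {name for name, status in results.items() if status == "Failed"}
    orphaned = [d["id"] for d in defects if d["failed_test"] not in failed_tests]

    if orphaned:
        # Prompts the review above: missing checks (a), untested areas (b),
        # or environmental problems disrupting the run (c).
        print(f"{len(orphaned)} defect(s) not explained by any Failed test: {orphaned}")

    blocked = [name for name, status in results.items() if status == "Blocked"]
    if blocked:
        print(f"Blocked tests to investigate for environment problems: {blocked}")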

Conclusion

Adopting an approach that governs, and in fact restricts, when a test can be marked as Failed to circumstances where an explicit check has shown an issue provides a more precise view of the system’s status and improved feedback on the quality of the testing. It also reduces both the discrepancy between the picture painted by test results and the actual state of the release, and the management time required to resolve that discrepancy.