EVERYTHING WRONG WITH USE CASE BASED SOFTWARE TESTING

Abstract

Most modern commercial software projects implement some sort of quality assurance process. This process often includes software testing. And it is usually use case based.
However, this approach to testing is in fact unable to give any guarantees, i.e. valid predictions about how software will actually work on production.
Hence, as long as it is important to know how software will work on real world production, use case based testing is simply unfit for measuring quality.
This article will go through why QA should focus on how software will work on production, and production only; why the use case based testing approach is unable to give valid predictions about production;
and why the only way to get such a prediction is scientific proof.

Introduction

According to most modern development practices, a software company should have a quality assurance process before delivering a product to a customer.
QA itself is a broader term, but in the context of software development, QA for the most part consists of software testing.
What kind of testing, and how much of it should be done, depends on the specific project. But still, there are a few popular combinations.
For instance, some manual testing, unit tests, integration tests during CI, functional auto tests, some stress tests, etc.
Another popular practice is to have a testing session - running all available tests (or almost all) before releasing a product.
It is assumed that a product is good enough for release if the testing results pass a certain threshold, usually set by the product owner.

QA and how software works on production

Right from the start, let's clear up one thing. QA should focus on how software will work on production, and production only.
Yes, this is quite a bold statement. Especially for a start. But there's a justification for it.

Let's start with one story. If you've worked anywhere in software development, then you probably already know it.
It is already way past some major release. And suddenly there's a new critical bug. Found and reported from production.
Moreover, that bug is often hard to reproduce. Or it can be reproduced only on production.
If the latter is the case, then any attempt to even reproduce this bug is often a long and quite tricky affair.
Let alone investigating it and making the actual fix.

Moreover, even after there is a fix, and a new release/update has been delivered, there's still a pretty high chance of finding another major bug.
Or even the exact same one (when a fix did work on a dev machine, but didn't on production).

What's common here is that all these companies have a QA process. And there certainly was a testing session before releasing the product.
And yet, that major bug wasn't found before release.

One might say that this is just one bug. And that, if there is an investigation into why that bug wasn't found, and actions are taken accordingly, then it won't happen again.
But in practice it Does happen again. Even if there indeed was an investigation, and actions were taken accordingly. It still happens again.
And just for the record, this is not a story about one project, or one company, that just does these things the wrong way.
It's a story that happened multiple times, on very different occasions, to a vast number of companies.
You can just ask around, and you'll probably find at least a couple of stories of this kind right on your first try.

This recurring story actually raises quite a few questions. For example, why is production different from the development/test setup?
It seems obvious by now that it is different. Especially after all that 'reproducible on production only' kind of thing.
Or another question. Why does the testing session keep missing major bugs time and time again?

And finally, why the hell is QA not focusing on how software will end up working on real production, and production only?
You would normally assume that production, really, is the only one that matters. Not dev machines, or test setups - production.
And yet, these stories about major bugs that can be reproduced only on production keep coming up time and time again (and make you scratch your head really hard).

We do know, of course, that there are cases when no one cares about quality on production.
But that is Not what most companies declare. Most often, it is the exact opposite - that user/customer experience is very important.
Many companies even have user feedback systems in place. Some even encourage users to participate in beta testing programs.
All of this is aside from warranties, return policies, etc.

So, clearly, user experience on production is declared as valuable. Therefore, QA should provide information on how software will work on production.
Looking back at that story about a major bug found on production, it seems that the current QA approach is not providing it.
Although it definitely should.

Use Case based testing and predictions about production

Use case based testing, by design, is unable to give valid predictions about how software will work on production.
This is yet another bold statement. And yet again, there's a justification for it.

For starters, use case based testing lacks one fundamental property - predictive power.
In other words, it lacks the one thing that is actually able to give a valid prediction about anything at all.
(Btw, this is actually why we keep finding new bugs on production like all the time, despite all the efforts).
To fully understand why, we'll need to dig into what typical software testing looks like.

According to many popular software development practices, a software project should have some form of software requirements.
These requirements are often based on business requirements. And they are supposed to represent something that the product owner considers 'working', 'fit to use'.
Or what, as the owner thinks, the end user will consider 'working' and 'of good quality'.

Software requirements often become a basis for writing use cases. These use cases subsequently become a basis for both manual and automated tests.
Moreover, it's quite popular to take software requirements as a sort of specification. And then follow that specification as closely as possible.
Especially when writing down validation criteria for a test case.

And here comes the tricky part. There are, actually, a few things that have to be unraveled before transforming requirements into test cases.
- Which requirements, if any, are critical?
- When do we want requirements to be met?
- Where do we want requirements to be met?

This is, actually, quite an essential part. Because it defines the very scope and object of testing.
The scope of testing is defined by the set of critical requirements. And the object is defined by the answers to the 'when?' and 'where?' questions.

So, how many (if any) of all requirements are critical, i.e. mandatory?
It could seem obvious at first that all requirements are critical.
Because if not - why bother making a requirement that isn't mandatory in the first place?

In general, that's true. But in reality, we're dealing with software. And software often has a complex and interconnected structure.
And for that matter, requirements could easily become dependent on one another. Or even mutually exclusive.
To a point where respecting one requirement may come at the cost of not respecting another.
Plus there are always business priorities. For example, robustness could be way more important than performance.
For that reason requirements start getting priorities. And some of them even become obsolete.

There's actually one more thing that makes a requirement critical.
And it is one key assumption - there is at least some part of the software that definitely is required to work as expected.
This is a key assumption, because it changes the perspective from 'it would be nice if it works' to 'it definitely needs to work as expected'.
And with that it changes from 'it'd be nice to know if it'll work' to 'we need a guarantee that software will work as expected'.

And now let's move on to the object of testing. When do we want requirements to be met?
In most cases, it's when the end user/customer is using the software. In other words, it is about the end product, i.e. production.
This is unless, of course, the requirements state something else. For example, that they must be met while the source code is being developed.
Quite a rare thing, but nevertheless possible.

Yet, in most cases, it is still production that we talk about. To be more specific - when the end user will be using the product.
Now, final production is in the future relative to the software developers. And so is the end user.
So it turns out that we actually want software to work as expected in the future!

Next question. Where do we want requirements to be met?
Well, we've already answered that - on production, i.e. whatever machine/device the end user is using.
With exceptions when the requirements explicitly state something other than production.

With all that said, let's now look at a typical software testing session, and what's actually happening there.
A testing session, for the most part, is done on a separate test setup/machine. Sometimes it includes CI/build machines, or some other specific setups.
But for the most part, it is just multiple dedicated setups, in several different configurations.
Both manual and automated tests are usually run on these dedicated test setups.
Then, the test results are put together into a test report. Optionally, a test session could include calculating statistics and/or regression.
Which then also become a part of the test report.

So, what do we have here by now?
There is a 'present', or rather a 'recent past', when the tests were executed.
And then there's the 'future', when we want software to work as expected.

There are the dedicated test setups, where the tests were run.
And then there's production, where we want software to work as expected.

Production is in the future. The end user is in the future.
But the test results are from the 'recent past', and from a test machine.

How come test results from the 'recent past' apply to the 'future'?
And how come results from a test setup apply to actual production?

At the very beginning of this article, it was already mentioned that QA should provide valid information about how software will work on production.
Hence, its goal is to give a valid prediction about production.
By definition, a valid prediction can only be produced by something that has predictive power - a scientific model.
The use case based testing approach neither implies that tests would be run on a model, nor that there would even be one.
In other words, the results of use case based tests cannot be used as a valid prediction, because they don't have predictive power.
To put it simply, there is no proof, or even reason to believe, that test results from the 'recent past' will end up the same way in the future.
And there is no proof that test results from a test machine (even at the exact same time) are exactly the same as they would be from production.

It should be mentioned that there's an option - to prove a special case, i.e. prove that specific use case based tests have predictive power for a specific piece of software.
But, usually, it is not worth the effort. Most notably because proving something is a costly process, and a proof of a special case is not applicable to other cases on its own.
In order for that special case to be applicable to other cases, there has to be a separate proof of that.
But, like we said, proving something is a costly process.

There's, actually, one more argument for why use case based tests lack predictive power.
The thing is, running a test aiming to know if something Will work is actually an attempt to measure the future.
And here's the catch - we cannot measure the future (due to fundamental properties of the universe, as stated by physics).
Strictly speaking, even the task of measuring the 'present' is quite tricky on its own. Due to Heisenberg's uncertainty principle [2], or the observer effect [3].
More often than not, when trying to measure the 'present', the results actually describe the 'recent past'. Including the case of measuring something with use case based tests.
So it seems kind of unrealistic, to say the least, to try to 'measure' future production with use case based tests.

With all that said, if use case based testing lacks predictive power, then what are the alternatives to it?
What are the options to get valid prediction about production?

Possible ways to predict if software will work as expected on production

It's a tricky question. Because it is actually about knowing the future. Like, literally.
No, no fortune telling. We'll go with scientific methods for that.
Largely because scientific methods imply that there is proof of a statement.
Including cases of proving that a given scientific model, for instance, has predictive power.
And we've already stated that we aim for it (for predictive power).

We'll start with determining the future of a computer system. Right from the beginning, there are at least two factors when it comes to determining its future state:
- is it a deterministic system? [1]
- does it have input data from the real world?

Is it a deterministic system?
A deterministic system, by definition, is a system in which no randomness is involved in the development of future states of the system [1].
Therefore there is a way to determine the future state of such a system. For example, by evaluating an analytical model of that system.

If a system is non-deterministic, then its future state involves some randomness.
Therefore it is a matter of probability. And, although it is possible to reason about the future state of such a system, it is out of the scope of this discussion.
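
To make the distinction concrete, here is a minimal sketch (the two functions below are hypothetical, made up purely for illustration):

#include <cstdlib>
#include <iostream>

// Deterministic: the output is fully determined by the input,
// so its future behaviour can be predicted exactly.
unsigned int twice(unsigned int x)
{
    return 2u * x;
}

// Non-deterministic from the caller's point of view: the output also depends
// on hidden random number generator state, so only probabilistic statements apply.
unsigned int twice_or_more(unsigned int x)
{
    return 2u * x + static_cast<unsigned int>(std::rand() % 2);
}

int main()
{
    std::cout << twice(21) << "\n";          // always 42
    std::cout << twice_or_more(21) << "\n";  // 42 or 43, depending on the RNG
    return 0;
}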

Input data from the real world.
'Input data from the real world' just means all analog input. Like input from analog sensors, input from video cameras, etc.
There's one key point about analog input. With quite high confidence, we can say that no 2 samples of analog input are equal. Although they could coincide.
Example - cover up a video camera's lens (so the image goes all black) and take 2 consecutive photos.
No specially designed environment made specifically to produce the exact same input for tests. Nothing like that, just a regular camera.
Now, compare the 2 images byte by byte. Headers and meta info aside - there's gonna be a difference.
Another example. Try to record 2 audio clips of supposed silence. Again, no special setup.
And again, compare the clips byte by byte. There's gonna be a difference.
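
The byte-by-byte comparison itself is trivial, by the way. Here's a minimal sketch (the file names are hypothetical, and it assumes the two captures have already been saved to disk):

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Reads a whole file into memory as raw bytes.
static std::vector<char> read_all(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

int main()
{
    // Hypothetical captures taken one after another with the lens covered.
    std::vector<char> a = read_all("black_frame_1.raw");
    std::vector<char> b = read_all("black_frame_2.raw");

    // Byte-by-byte comparison: with real analog input it will almost
    // certainly report a difference.
    std::cout << (a == b ? "identical" : "different") << "\n";
    return 0;
}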

And what if we have a test case (use case based) that has analog input?
The thing is, unless it is executed on a special setup, the exact analog input for that test will differ each time you run it.
One conclusion from this is yet another argument for why use case based test results are not a valid prediction.
The other one is - if the software implies analog input, that input should be taken into account when determining the future state of the system.

Mathematical proof.
Now let's look at a case when the system is deterministic, but there's no input from the real world.
For example, an arbitrary function that depends only on its input parameters.
No context, no OS, no extra factors, nothing.

How can you predict that this function will work as expected 100% of the time?
Since there's no OS context, and the output depends only on the input - the most obvious answer is a mathematical proof.
By definition, a mathematical proof is an inferential argument that demonstrates that a statement is always true.
Replace a 'statement' with a 'software requirement' - and there you have it.
You've just proved that the requirement is always true.
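
As a toy illustration of what that looks like in practice, here's a hedged sketch (the function and the requirement are hypothetical, chosen only to show the shape of such an argument):

// Requirement (the 'statement' to prove): assuming lo <= hi, the result is always within [lo, hi].
unsigned int clamp(unsigned int x, unsigned int lo, unsigned int hi)
{
    if (x < lo) return lo;   // case 1: result == lo, and lo <= hi by assumption
    if (x > hi) return hi;   // case 2: result == hi, and lo <= hi by assumption
    return x;                // case 3: lo <= x <= hi, so the result is already in range
}
// The three cases are exhaustive, so the requirement holds for every possible input.
// A (very small) proof by case analysis - no test runs needed.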

Scientific models.
Now let's try something a little more sophisticated. Let's say we have a deterministic system that has input from the real world.
And say we want to run a use case based test on this system.
We've already covered that, unless run in a special environment, no 2 samples of analog input will likely be equal (although they could coincide).
Since each and every test run would have different analog input, the test result is not a valid prediction of whether the test will pass or fail in the future.
As a matter of fact, we've already covered that we need something with predictive power for that. For example, a scientific model.

By definition, a scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way [4].
It aims to construct a formal system that will not produce theoretical consequences that are contrary to what is found in reality.

For a software system, one option is to go with mathematical modeling.
Since software is most commonly a deterministic system, with little to no entropy - mathematical modeling seems a good way to go.
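
Just to give a feel for the idea, here's a very rough sketch (the function is_dark, the requirement, and the noise bound are all hypothetical assumptions): instead of feeding the test one concrete captured frame, the analog input is replaced by a model - 'a covered lens never produces a pixel value above NOISE_CEILING' - and the requirement is then checked against every input that the model allows, rather than against a single sample.

#include <cstdint>
#include <iostream>

// Hypothetical piece of software under test: reports whether a pixel reading is 'dark'.
bool is_dark(std::uint8_t pixel_value)
{
    return pixel_value < 16;
}

int main()
{
    // Model of the analog input: a covered lens never produces a value above
    // this ceiling (an assumption of the model, not a measurement).
    const unsigned int NOISE_CEILING = 8;

    // Check the requirement 'is_dark() holds for a covered lens' against every
    // input the model allows, instead of against one captured sample.
    bool requirement_holds = true;
    for (unsigned int v = 0; v <= NOISE_CEILING; ++v)
        if (!is_dark(static_cast<std::uint8_t>(v)))
            requirement_holds = false;

    std::cout << (requirement_holds ? "holds for the whole modeled range" : "violated") << "\n";
    return 0;
}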

Important note. Having a valid math model and running use case based tests on it, in and of itself, does not mean that the software as a whole will work 100% of the time.
It would only mean that this exact test, under these exact modeled conditions, will end up the same way on production (pass or fail).
To get a prediction about a piece of code (function, class, library), or the software as a whole, you'll still need an equivalent of a mathematical proof.

One option to get a mathematical proof is to go with proof by exhaustion. Proof by exhaustion, also known as proof by cases, proof by case analysis, complete induction, or the brute force method,
is a method of mathematical proof in which the statement to be proved is split into a finite number of cases or sets of equivalent cases, and each type of case is checked to see if the proposition in question holds.
In the context of software testing, this means running a test for all possible ways that a piece of code can be executed.
This brings us to code coverage. Because running a test for all possible ways that a piece of code can be executed is closely related to full code coverage.
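
In miniature, for a function whose input domain is small enough, this is actually doable. A hedged sketch (the function and its requirement are hypothetical):

#include <cstdint>
#include <iostream>

// Hypothetical function under test: saturating increment of an 8-bit value.
std::uint8_t inc_saturating(std::uint8_t x)
{
    return (x == 255) ? 255 : static_cast<std::uint8_t>(x + 1);
}

int main()
{
    // Proof by exhaustion: the requirement 'the result is never smaller than
    // the input' is checked for every one of the 256 possible inputs.
    bool holds_for_all = true;
    for (unsigned int x = 0; x <= 255; ++x)
        if (inc_saturating(static_cast<std::uint8_t>(x)) < x)
            holds_for_all = false;

    std::cout << (holds_for_all ? "proved by exhaustion" : "counterexample found") << "\n";
    return 0;
}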

Extra note on Code Coverage

In the context of getting a mathematical proof by exhaustion, full code coverage must measure the degree to which tests cover all possible ways that a piece of code can be executed.
This basically means that the popular definition of code coverage is a little off [5]. Because it's not the degree to which the source code of a program is executed that we're interested in.
It's covering all possible ways that a certain piece of code can ever be executed.
In other words, popular code coverage criteria are pretty much off the table (though not entirely).

Instead, let's look at a simple example:

unsigned int sum(unsigned int a, unsigned int b)
{
    return a + b;
}


What are all possible ways that this function can be executed? It would be the combination of:
- all possible combinations of values of the input parameters
- all possible execution paths through the body of the function
- all possible variants of the context in which the function is called.

Context here means all extra dependencies that a particular function can have, i.e. runtime, global variables, network traffic, implicit references to objects in memory, etc.

Let's do the math for our example, starting with the input parameters.
Let's say that an unsigned int is 4 bytes: 2^32 values per parameter, so 2^32 * 2^32 = 2^64 possible input combinations.
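
To put that number into perspective (a rough back-of-the-envelope estimate): 2^64 is roughly 1.8 * 10^19. Even if a test harness could somehow execute a billion (10^9) cases per second, exhausting the input space alone would take about 1.8 * 10^10 seconds, i.e. on the order of 585 years.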

And we haven't even started on the function body yet. Or the external context.
Speaking of coverage criteria for the function body. If we go with, for instance, the Linear Code Sequence and Jump criteria [6] -
every 'if-else' would double the number of possible execution paths.
And each 'for' loop of N iterations would multiply it by N.
You get the point - the numbers just start to grow exponentially.
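
A tiny illustration of how the branching alone multiplies (a hypothetical function, made up purely for counting paths):

// Two independent if-else branches. The first alone gives 2 possible execution
// paths; the second doubles that to 2 * 2 = 4 paths through the function body.
unsigned int classify(unsigned int x, unsigned int y)
{
    unsigned int code = 0;
    if (x > 10) code += 1; else code += 2;   // 2 paths so far
    if (y > 10) code += 4; else code += 8;   // 2 * 2 = 4 paths total
    return code;
}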

Now. The number of tests needed to achieve full code coverage = 'all possible inputs' * 'all possible execution paths' * 'all possible contexts'.
Even a very simple function, like our example, would require an enormous number of tests.
For our example, the input alone gives 2^64. Which is far more than the number of tests anyone can realistically write for 1 function.
This basically means that proof by exhaustion using use case based tests is not really realistic, although possible in principle.
And by the way, this is yet another reason why use case based testing, in its current form, lacks predictive power.

Proof by exhaustion is not the only option, though. Something like a statistical proof using data, or a probabilistic proof, or basically any other scientific method of getting a proof can do the job.
All the specifics, like which method to choose, or how exactly to get a proof using that method, are out of the scope of this article.
But, nonetheless, getting such a proof is the only way to get a valid prediction about future production.
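
Just to give a feel for what a statistical argument could look like, here's a very rough sketch (the function, the requirement, the input distribution, and the use of the classical 'rule of three' bound are all assumptions made for illustration, not a prescription):

#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical function under test (same shape as the earlier example).
std::uint32_t sum(std::uint32_t a, std::uint32_t b) { return a + b; }

// Hypothetical requirement to check: the sum is commutative.
bool requirement_holds(std::uint32_t a, std::uint32_t b)
{
    return sum(a, b) == sum(b, a);
}

int main()
{
    // Assumption: inputs are drawn from (an approximation of) the production
    // input distribution; a uniform distribution stands in for it here.
    std::mt19937 rng(12345);
    std::uniform_int_distribution<std::uint32_t> dist;

    const unsigned long trials = 3000000;
    unsigned long failures = 0;
    for (unsigned long i = 0; i < trials; ++i)
        if (!requirement_holds(dist(rng), dist(rng)))
            ++failures;

    // With zero failures in n independent trials, the classical 'rule of three'
    // bounds the failure probability by roughly 3/n at ~95% confidence -
    // a statistical statement about sampled inputs, not a guarantee for every input.
    std::cout << failures << " failures in " << trials << " trials\n";
    return 0;
}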

[1] https://en.wikipedia.org/wiki/Deterministic_system
[2] https://en.wikipedia.org/wiki/Uncertainty_principle
[3] https://en.wikipedia.org/wiki/Observer_effect_(physics)
[4] https://en.wikipedia.org/wiki/Scientific_modelling
[5] https://en.wikipedia.org/wiki/Code_coverage
[6] https://en.wikipedia.org/wiki/Linear_code_sequence_and_jump