What is baseline testing in software?
Baseline testing refers to comparing the behavior of a component or system under test to a baseline of previous behavior. The behavior being compared can include API responses, command outputs, file contents, and even performance metrics.
The simplest baseline tests capture the literal output of a deterministic system under test as a "baseline." To test against the baseline, a test harness reruns the test that generated the original baseline, captures the output in the same way, then runs a "diff" between the captured output and the original output. If there is a delta (a "diff"), the test fails. If there is no delta ("no diff"), the test passes.
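The compare step can be sketched in a few lines of Ruby. This is only an illustration under simple assumptions (both outputs fit in memory as strings); the names `first_delta` and `baseline_passes?` are hypothetical and do not come from any library.

```ruby
# Hypothetical helper: return the 1-based number of the first line that
# differs between the baseline and the newly captured output, or nil when
# the two are identical.
def first_delta(baseline, actual)
  expected_lines = baseline.lines
  actual_lines = actual.lines
  count = [expected_lines.length, actual_lines.length].max

  (0...count).each do |i|
    # A missing line on either side (nil) also counts as a difference.
    return i + 1 if expected_lines[i] != actual_lines[i]
  end
  nil
end

# Hypothetical helper: the test passes only when there is no delta.
def baseline_passes?(baseline, actual)
  first_delta(baseline, actual).nil?
end

baseline = "line one\nline two\n"
puts baseline_passes?(baseline, "line one\nline two\n")
puts baseline_passes?(baseline, "line one\nline 2\n")
```

A real harness would usually delegate the comparison to a diff tool so the tester can see exactly what changed; the pass/fail decision is the same either way.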
The concept is simple enough that we can create an end-to-end example in just a few minutes.
Suppose we want to create a baseline test for the application hello_world.rb:
puts 'hello, world!'
Running the program generates:
$ ruby hello_world.rb
hello, world!
We capture the output to save its baseline behavior:
$ ruby hello_world.rb > hello_world.rb.baseline
We now have a baseline in the file hello_world.rb.baseline. All we need to test this program against the baseline is to run it, pipe the output into diff, and observe that there is no diff:
$ ruby hello_world.rb | diff hello_world.rb.baseline -
$ echo $?
0
We now have a baseline test!
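The same capture-and-compare steps could also be driven from Ruby instead of the shell. The sketch below is an assumption-laden illustration, not a prescribed harness: `run_script` and `matches_baseline?` are hypothetical helper names, and the script is written to a temporary directory so the example is self-contained.

```ruby
require 'open3'
require 'rbconfig'
require 'tmpdir'

# Hypothetical helper: run a Ruby script with the current interpreter and
# return its standard output as a string.
def run_script(path)
  output, _status = Open3.capture2(RbConfig.ruby, path)
  output
end

# Hypothetical helper: true when the script's output equals the baseline file.
def matches_baseline?(script, baseline_path)
  run_script(script) == File.read(baseline_path)
end

Dir.mktmpdir do |dir|
  script = File.join(dir, 'hello_world.rb')
  baseline = "#{script}.baseline"
  File.write(script, "puts 'hello, world!'\n")

  # Roughly: ruby hello_world.rb > hello_world.rb.baseline
  File.write(baseline, run_script(script))

  # Roughly: ruby hello_world.rb | diff hello_world.rb.baseline -
  puts matches_baseline?(script, baseline) ? 'no diff' : 'diff'
end
```

Unlike diff, this sketch only reports whether the outputs match; it does not show which lines changed.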
Now, let's use this baseline test to detect a change in the application hello_world.rb. We modify the program so that it prints an additional line:
$ echo 'puts "hello, world!\nUpdated!"' > hello_world.rb
We run the program again and check the output against the known baseline:
$ ruby hello_world.rb | diff hello_world.rb.baseline -
1a2
> Updated!
$ echo $?
1
There is a diff, and now the tester needs to determine one of two things:
- Does the program have an error? Or,
- Does the baseline need to be updated?
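One way a harness might support both outcomes is sketched below. The `check_baseline` helper and its `update:` flag are hypothetical conventions invented for this illustration; the decision about whether a change is a bug or an intended update still belongs to the tester.

```ruby
require 'tmpdir'

# Hypothetical helper: compare `actual` with the baseline stored at `path`.
# With update: true, the tester has judged the new behavior correct, so the
# baseline is overwritten instead of reporting a failure.
def check_baseline(path, actual, update: false)
  return :pass if File.read(path) == actual
  return :fail unless update

  File.write(path, actual)
  :updated
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'hello_world.rb.baseline')
  File.write(path, "hello, world!\n")
  changed = "hello, world!\nUpdated!\n"

  puts check_baseline(path, changed)                # diff: investigate the program
  puts check_baseline(path, changed, update: true)  # change accepted as the new baseline
  puts check_baseline(path, changed)                # later runs compare against it
end
```

Run as written, this prints fail, updated, and pass in that order.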
In its most essential form, that's it!