
Testing is an integral and vital part of creating software. In fact, test code is as important as your production code. When you create test code, you need to create test data for your code to work against.
This post is about the different types of test data that are used in software testing. I’ll elaborate on each type and explain which types are used in which scenarios.
Types of Test Data
Let’s take a look in detail.
1. Valid Test Data
As the name implies, this is the data that your program expects and should operate on. You want to create tests with valid data to make sure that the program functions as expected when using data that meets your integrity and validation criteria.
For instance, if you do integration tests as part of a login use case, you will want to provide a correct username and password (in this scenario, valid data) and check that the user is logged in properly.
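As a minimal sketch of such a test, assume a hypothetical login function backed by an in-memory user store (the names USERS and login are illustrative, not taken from any real framework). The test supplies a known username with its correct password and checks that the login succeeds:

```python
import hashlib

# Hypothetical in-memory user store. A real system would use a database
# and a dedicated password-hashing scheme with per-user salts.
USERS = {"alice": hashlib.sha256(b"correct-horse").hexdigest()}


def login(username: str, password: str) -> bool:
    """Return True when the username exists and the password matches."""
    stored = USERS.get(username)
    if stored is None:
        return False
    return hashlib.sha256(password.encode()).hexdigest() == stored


def test_login_with_valid_credentials():
    # Valid test data: a known username with its correct password.
    assert login("alice", "correct-horse") is True
```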
2. Invalid Test Data
It’s important to make sure that your program knows how to handle data that doesn’t conform to your data integrity and validation requirements.
First things first: your application must not process invalid data as valid data. The code should identify that this data is invalid and handle it accordingly. Usually, invalid input can result in one of the following:
- An error message displayed to the client
- Halting program execution
- Adding an entry in a log file
- Returning a specific HTTP status code
Invalid Data Outcomes
Invalid data usually has three possible outcomes:
- Changing the program control flow and preventing the program from continuing its execution until valid data is entered. For instance, in the example given above of a login page, the user can’t continue without providing valid credentials. Or in the case of trying to add strings in a calculator, an error will be emitted, and no calculation will take place.
- Stopping the execution of the program entirely. For example, if you run a database migration (DB change) and the data is corrupted, the program simply won’t run. It will emit an error message and exit.
- Downgraded performance and functionality. If you have a mobile game that requires credit card data to play the full game and you provide invalid data, you will only be able to play the demo version.
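The calculator case above can be sketched as follows. The add function is a hypothetical example, and raising TypeError is just one reasonable way to signal invalid input. The test feeds it a string and verifies that an error is emitted and no calculation takes place:

```python
def add(a, b):
    """Add two numbers; reject anything that is not an int or float."""
    for value in (a, b):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {type(value).__name__}")
    return a + b


def test_add_rejects_strings():
    # Invalid test data: a string where a number is expected.
    try:
        add("2", 3)
    except TypeError as exc:
        assert "expected a number" in str(exc)
    else:
        raise AssertionError("invalid input was processed as valid data")
```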
3. Boundary Test Data
When we write code, there are certain limitations on the values we can use that stem from the fact that we run on physical hardware. Physical hardware has its objective capacity limitations.
For example, a PC has only so much RAM to use. In addition, the CPU’s assembly language, the language we write code in, and the compiler have their own sets of restrictions.
Types of Restrictions
Thus, in C, the int type is only guaranteed to hold values up to 32,767 (on most modern platforms the limit is 2,147,483,647). Likewise, we can’t store a string in an integer variable in Java, and so forth.
Boundary test data is intended to check how our code handles values that are close to the upper or lower limits or exceed them.
Developers usually write code with values in mind that are far from the boundaries of the machine, language, and compiler. However, in many cases values that are near or equal to the boundary are considered valid input and should be handled as such.

In addition, values that exceed the boundaries should be handled gracefully (i.e., with a dedicated error message) and not make the whole program crash (case in point: Microsoft Windows’s “blue screen of death”). Testing boundaries is especially important in the context of load and stress tests when we want to check how the machine performs under high load.
Likewise, boundary tests are especially important in the context of contract tests. Those are usually API tests that check that the API responds properly to a given input. By checking the boundaries of the input, we cover the cases where defects are most likely to hide, without having to test every possible input to the API.
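To make this concrete, here is a small sketch that assumes a hypothetical store_quantity function restricted to the signed 32-bit integer range. The boundary test data consists of values at, just inside, and just outside each limit:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def store_quantity(value: int) -> int:
    """Accept a value only if it fits in a signed 32-bit integer."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError(f"{value} is outside the 32-bit signed integer range")
    return value


# Boundary test data: values at, just inside, and just outside each limit.
VALID_BOUNDARIES = [INT32_MIN, INT32_MIN + 1, INT32_MAX - 1, INT32_MAX]
INVALID_BOUNDARIES = [INT32_MIN - 1, INT32_MAX + 1]


def test_boundaries():
    for value in VALID_BOUNDARIES:
        assert store_quantity(value) == value
    for value in INVALID_BOUNDARIES:
        try:
            store_quantity(value)
        except ValueError:
            continue  # rejected gracefully, as expected
        raise AssertionError(f"{value} should have been rejected")
```

Values on the boundary are treated as valid input, while values just past it produce a dedicated error instead of a crash.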
4. Absent Test Data
There is another possibility too: when the program gets no data at all rather than valid or invalid data. It just isn’t there. We refer to this as absent data.
Let’s examine a case when a program expects to fetch some user data from the database to validate credentials against (like in the aforementioned example) but the database doesn’t contain any user data and returns an empty result set.
This is a test case we should be aware of and implement.
As I’ve mentioned, sometimes the data required for the proper functioning of the code just isn’t where we expect it to be, whether in a database, an external service, or some other source.
Handling Absent Test Data
As in the case of invalid data, we should make sure that our code can handle such situations gracefully. And no, a message that says “Something is wrong” is not considered proper handling.
Proper handling, in this case, means preparing for such cases with a secondary data source as a backup in case the primary source malfunctions. In cases where this is not an option, you should deploy a rapid self-healing mechanism.
In the meantime, you need to return to the client a relevant message that helps them solve the problem if possible.
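The following sketch illustrates one way to test for absent data, using in-memory SQLite databases as stand-ins for a primary and a secondary source. The table layout, the load_user helper, and the UserDataUnavailable exception are all assumptions made for the example:

```python
import sqlite3


class UserDataUnavailable(Exception):
    """Raised when no data source can supply the requested user."""


def fetch_user(conn, username):
    # fetchone() returns None when the result set is empty.
    return conn.execute(
        "SELECT username, password_hash FROM users WHERE username = ?",
        (username,),
    ).fetchone()


def load_user(primary, secondary, username):
    """Try the primary source first, then fall back to the backup."""
    for conn in (primary, secondary):
        row = fetch_user(conn, username)
        if row is not None:
            return row
    raise UserDataUnavailable(
        f"No account data found for '{username}'. "
        "Check the username or try again in a few minutes."
    )


def make_db(rows=()):
    # Helper that builds an in-memory database, empty by default.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


def test_absent_data():
    empty = make_db()
    backup = make_db([("alice", "hash")])
    # Absent data: the primary source returns an empty result set.
    assert fetch_user(empty, "alice") is None
    assert load_user(empty, backup, "alice") == ("alice", "hash")
    try:
        load_user(empty, make_db(), "alice")
    except UserDataUnavailable as exc:
        assert "alice" in str(exc)
    else:
        raise AssertionError("absent data was not reported")
```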

Ways to Generate Test Data
There are various methods available to generate test data for software testing, each with its own advantages and disadvantages.
One method is manual test data generation, which involves entering data items manually. While this provides maximum control and granularity over the test data, it can be a time-consuming process.
Another method is to copy existing data from production environments. This method can be faster than manual generation, but data security and cleanup may be concerns when importing sensitive data.
Many Test Data Management tools on the market help you create test data for your test environments. For example, Mockaroo and equivalent tools allow you to generate random mock test data in support of software testing.
This can be a huge timesaver and can go hand in hand with manual data creation if necessary. The other benefit is data security: by using fake test data, you avoid reusing sensitive data and reduce the potential for data breaches.
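For small projects, a similar effect can be approximated with a few lines of code. The sketch below does not use any particular tool's API. It relies only on Python's standard random module, and the record fields are made up for illustration:

```python
import random
import string


def make_fake_user(rng: random.Random) -> dict:
    """Build one fake user record; no real personal data is involved."""
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {
        "username": name,
        "email": f"{name}@example.com",
        "age": rng.randint(18, 90),
    }


def make_fake_users(count: int, seed: int = 42) -> list:
    # A fixed seed keeps the generated data reproducible between test runs.
    rng = random.Random(seed)
    return [make_fake_user(rng) for _ in range(count)]


users = make_fake_users(3)
```

Seeding the generator is a deliberate choice here: random data that changes on every run makes failing tests hard to reproduce.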
Data cloning is another way to generate test data, where an existing set of test data is copied and modified to create new test scenarios. This data cloning method can be useful when generating large amounts of similar data quickly, but it may not be suitable for testing unique scenarios.
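A rough sketch of data cloning might look like this, assuming a hypothetical order record as the base. Each clone is a deep copy with a few fields overridden, so changing a clone never alters the original:

```python
import copy

# Hypothetical base record to clone from.
BASE_ORDER = {"id": 1, "customer": "alice", "items": [{"sku": "A-1", "qty": 1}]}


def clone_with(base: dict, **overrides) -> dict:
    """Deep-copy a record and override selected top-level fields."""
    clone = copy.deepcopy(base)
    clone.update(overrides)
    return clone


# Generate several similar records quickly from the same base.
orders = [clone_with(BASE_ORDER, id=i, customer=f"user{i}") for i in range(2, 5)]

# Modifying nested data in a clone leaves BASE_ORDER untouched.
orders[0]["items"][0]["qty"] = 99
```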
Ultimately, the choice of method for generating test data will depend on the specific testing requirements and constraints.
Conclusion
Software testing is essential to creating solid, functioning software. There are different types of test cases, and there are different test data types you need to prepare for each.
Since creating test data is time-consuming, you can use dedicated tools and services to help you with this task in addition to manually creating your test data.
For a broader understanding of Test Data Management, why not check out this TDM article on the Test Data Management lifecycle?
And why not have a look at Enov8’s Test Data Manager? It is a holistic Test Data Management tool covering all of the key Test Data & Data Security aspects, including Test Data Profiling (which finds your Personally Identifiable Information), Test Data Masking, Test Data Validation, Realistic Test Data Creation, Test Data Mining & Test Data Bookings.
Enov8 Test Data Management is an important addition to any organization’s software testing optimization and data security solutions.

Post Author
This post was written by Alexander Fridman. Alexander is a veteran in the software industry with over 11 years of experience. He worked his way up the corporate ladder and has held the positions of Senior Software Developer, Team Leader, Software Architect, and CTO. Alexander is experienced in frontend development and DevOps, but he specializes in backend development.
