Types of Test Data You Should Use for Your Software Tests
by Alexander Fridman
Testing is an integral and vital part of creating software. In fact, test code is as important as your production code. When you create test code, you need to create test data for your code to work against. This post is about the different types of test data that are used in software testing. I’ll elaborate on each type and explain which type of test data suits which scenario.
Types of Test Data
Valid Test Data
As the name implies, this is the data that your program expects and should operate on. You want to create tests with valid data to make sure that the program functions as expected when using data that meets your integrity and validation criteria. For instance, if you do integration tests as part of a login use case, you will want to provide a correct username and password (in this scenario, valid data) and check that the user is logged in properly.
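As a minimal sketch of a test with valid data, the login case might look like the following. The `USERS` store and the `login` function are hypothetical stand-ins for the system under test, and a real system would compare password hashes rather than plain text.

```python
# Hypothetical in-memory user store standing in for the real system.
# A real application would store and compare password hashes.
USERS = {"alice": "correct-horse-battery"}


def login(username: str, password: str) -> bool:
    """Return True only when the credentials match a stored user."""
    return USERS.get(username) == password


def test_login_with_valid_credentials() -> None:
    # Valid data: a username and password that exist in the store.
    assert login("alice", "correct-horse-battery") is True


test_login_with_valid_credentials()
```

The test only covers the happy path on purpose. Wrong or missing credentials belong to the other test data types.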
Invalid Test Data
It’s important to make sure that your program knows how to handle data that doesn’t conform to your data integrity and validation requirements. First things first: your application must not process invalid data as valid data. The code should identify that this data is invalid and handle it accordingly. Usually, invalid input can result in one of the following:
- An error message displayed to the client
- Halting program execution
- Adding an entry in a log file
- Returning a specific HTTP status code
Invalid Data Outcomes
Invalid data usually has three possible outcomes:
- Changing the program control flow and preventing the program from continuing its execution until valid data is entered. For instance, in the example given above of a login page, the user can’t continue without providing valid credentials. Or in the case of trying to add strings in a calculator, an error will be emitted, and no calculation will take place.
- Stopping the execution of the program entirely. For example, if you run a database migration (DB change) and the data is corrupted, the program simply won’t run. It will emit an error message and exit.
- Degraded performance and functionality. If you have a mobile game that requires credit card data to play the full game and you provide invalid data, you will only be able to play the demo version.
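The first outcome, where invalid input is rejected and no calculation takes place, can be sketched with the calculator example. The `add` function and the `InvalidInputError` class are assumptions made for illustration, not part of any particular library.

```python
class InvalidInputError(ValueError):
    """Raised when an operand is not a number (hypothetical error type)."""


def add(a, b):
    """Add two numbers, refusing anything that is not an int or a float."""
    for operand in (a, b):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(operand, bool) or not isinstance(operand, (int, float)):
            raise InvalidInputError(f"expected a number, got {operand!r}")
    return a + b


def test_add_rejects_strings() -> None:
    try:
        add("2", 3)
    except InvalidInputError as exc:
        # The error should tell the client what was wrong with the input.
        assert "expected a number" in str(exc)
    else:
        raise AssertionError("invalid input was processed as valid data")


test_add_rejects_strings()
```

The `else` branch matters: without it, the test would pass silently if the invalid input were accepted.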
Boundary Test Data
When we write code, there are certain limitations on the values we can use that stem from the fact that we run on physical hardware. Physical hardware has its objective capacity limitations. For example, a PC has only so much RAM to use. In addition, the CPU’s assembly language, the language we write code in, and the compiler have their own sets of restrictions.
Types of Restrictions
For example, in C an int is only guaranteed to hold values up to 32,767, and on most modern platforms, where an int is 32 bits wide, the limit is 2,147,483,647. Likewise, we can’t store a string in an integer variable in Java. Boundary test data is intended to check how our code handles values that are close to these limits, equal to them, or beyond them.
Developers usually write code with values in mind that are far from the boundaries of the machine, language, and compiler. However, in many cases values that are near or equal to the boundary are considered valid input and should be handled as such. In addition, values that exceed the boundaries should be handled gracefully (i.e., with a dedicated error message) and not make the whole program crash (case in point: Microsoft Windows’s “blue screen of death”). Testing boundaries is especially important in the context of load and stress tests when we want to check how the machine performs under high load.
Likewise, boundary tests are especially important in the context of contract tests. Those are usually API tests that check that the API responds properly to a given input. By checking the boundaries of the input, we cover the values most likely to expose defects without having to enumerate every possible input to the API.
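A minimal sketch of boundary test data, assuming the value under test must fit in a signed 32-bit integer. The `to_int32` function is a hypothetical stand-in for the code under test, and `boundary_values` simply produces the values just below, at, and just above each limit.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Hypothetical code under test: accept only signed 32-bit values."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a signed 32-bit integer")
    return value


def boundary_values(low: int, high: int) -> list:
    """Return the values just below, at, and just above each limit."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def check_boundaries() -> dict:
    """Record whether each boundary value is accepted or rejected."""
    results = {}
    for value in boundary_values(INT32_MIN, INT32_MAX):
        try:
            to_int32(value)
            results[value] = "accepted"
        except OverflowError:
            # Exceeding the limit is handled gracefully instead of crashing.
            results[value] = "rejected"
    return results


results = check_boundaries()
assert results[INT32_MAX] == "accepted"      # at the limit: valid input
assert results[INT32_MAX + 1] == "rejected"  # beyond the limit: clean error
```

The same six-value pattern applies to any bounded input, such as a maximum string length or a page size in an API request.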
Absent Test Data
There is another possibility too: when the program gets no data at all rather than valid or invalid data. It just isn’t there. We refer to this as absent data. Let’s examine a case when a program expects to fetch some user data from the database to validate credentials against (like in the aforementioned example) but the database doesn’t contain any user data and returns an empty result set. This is a test case we should be aware of and implement. As I’ve mentioned, sometimes the data required for the proper functioning of the code just isn’t where we expect it to be, whether in a database, an external service, or some other source.
Handling Absent Test Data
As in the case of invalid data, we should make sure that our code can handle such situations gracefully. And no, a message that says “Something is wrong” is not considered proper handling. Proper handling, in this case, means preparing for such cases with a secondary data source as a backup in case the primary source malfunctions. In cases where this is not an option, you should deploy a rapid self-healing mechanism. In the meantime, you need to return a relevant message to the client that helps them solve the problem if possible.
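The fallback idea can be sketched as follows. The two sources are modeled as plain callables that return a record or `None`; the names `fetch_user` and `DataUnavailableError` are assumptions made for this example.

```python
from typing import Callable, Optional

# A source takes a username and returns a record, or None when it has no data.
Source = Callable[[str], Optional[dict]]


class DataUnavailableError(Exception):
    """Raised when neither source holds the requested data (hypothetical)."""


def fetch_user(username: str, primary: Source, secondary: Source) -> dict:
    """Try the primary source first, then fall back to the secondary one."""
    for source in (primary, secondary):
        record = source(username)
        if record is not None:
            return record
    # Give the client a message it can act on, not "Something is wrong".
    raise DataUnavailableError(
        f"No account data is available for '{username}'. "
        "Please try again in a few minutes or contact support."
    )


def test_falls_back_when_primary_is_empty() -> None:
    empty_primary = {}.get  # absent data: the primary source returns nothing
    backup = {"alice": {"username": "alice"}}.get
    assert fetch_user("alice", empty_primary, backup) == {"username": "alice"}


test_falls_back_when_primary_is_empty()
```

A second test with both sources empty would check that the error message reaches the client intact.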
Ways to Generate Test Data
Test data preparation can be a time-consuming process, especially if you need large amounts of test data or the test data required is diverse and multifaceted. There are different ways to prepare test data, each with its own pros and cons.
Manual Test Data Generation
This is the most time-consuming method. You have to manually enter each data item. The upside is that this allows you maximum control and granularity. You know what your test data is, and you can tweak and tune it as much as you want.
Copying Existing Data From Existing Environments
If you have data in production, you can sometimes use it for your tests. It’s a great deal faster than creating all types of test data manually from scratch, and it lets you import large volumes of data quickly. On the other hand, for data security reasons, you often don’t want to export production data to a less secure environment, at least not without first applying techniques such as data masking. That’s especially true if the data contains sensitive information such as medical or financial records.
In addition, since you export the test data as a whole, a cleanup of the exported test data might be necessary to make it fit for your tests, which in turn takes an additional toll on the test preparation time.
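As a rough illustration of data masking, the sketch below replaces an email address with a stable pseudonym and hides all but the last four digits of a card number. The field names, the `mask_record` function, and the salt are assumptions made for this example. Hashing with a fixed salt is pseudonymization rather than full anonymization, so treat this as a starting point only.

```python
import hashlib


def mask_record(record: dict, salt: str = "test-env-salt") -> dict:
    """Return a copy of the record with sensitive fields masked."""
    masked = dict(record)  # copy, so the original record is left untouched

    # Replace the email with a stable pseudonym so that relations between
    # rows that share an email survive the masking.
    digest = hashlib.sha256((salt + record["email"]).encode("utf-8")).hexdigest()
    masked["email"] = f"user_{digest[:10]}@example.com"

    # Keep only the last four digits of the card number.
    card = record["card_number"]
    masked["card_number"] = "*" * (len(card) - 4) + card[-4:]
    return masked


production_row = {
    "email": "jane.doe@realmail.test",
    "card_number": "4111111111111111",
    "plan": "premium",
}
test_row = mask_record(production_row)
assert test_row["card_number"] == "************1111"
assert test_row["plan"] == "premium"  # non-sensitive fields pass through
```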
Using Test Automation Tools
Many test data management tools on the market help you create test data for your test environments. For example, Mockaroo and equivalent tools allow you to generate random mock test data in support of software testing. This can be a huge timesaver, and it can go hand in hand with manual data creation if necessary. The other benefit is data security: by using fake test data, you avoid reusing sensitive data and reduce the risk of data breaches.
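If a dedicated tool is not available, the idea of generated mock data can be approximated with the Python standard library. The sketch below is not how Mockaroo or any other tool works internally; the `generate_users` function and its field names are assumptions made for illustration. Seeding the generator keeps the data reproducible between test runs.

```python
import random
import string


def generate_users(count: int, seed: int = 42) -> list:
    """Generate `count` fake user records, reproducible for a given seed."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    users = []
    for i in range(count):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        users.append(
            {
                "id": i + 1,
                "username": name,
                "email": f"{name}@example.com",
                "age": rng.randint(18, 90),  # inclusive on both ends
            }
        )
    return users


users = generate_users(100)
assert len(users) == 100
assert users == generate_users(100)  # same seed, same data
```

Because no record is derived from production data, nothing sensitive can leak into the test environment.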
Software testing is essential to creating solid, functioning software. There are different types of test cases, and there are different test data types you need to prepare for each. Since creating test data is time-consuming, you can use dedicated tools and services to help you with this task in addition to manually creating your test data.
For a broader understanding of Test Data Management, why not check out this TDM article on the Test Data Management lifecycle?
And why not have a look at Enov8’s Test Data Manager? It is a holistic Test Data Management tool covering all of the key Test Data & Data Security aspects, including Test Data Profiling (which finds your Personally Identifiable Information), Test Data Masking, Test Data Validation, Realistic Test Data Creation, Test Data Mining & Test Data Bookings. Enov8 Test Data Management is an important addition to any organization’s software testing optimization and data security solutions.
This post was written by Alexander Fridman. Alexander is a veteran in the software industry with over 11 years of experience. He worked his way up the corporate ladder and has held the positions of Senior Software Developer, Team Leader, Software Architect, and CTO. Alexander is experienced in frontend development and DevOps, but he specializes in backend development.