Data Analysis

Types of Test Data You Should Use for Your Software Tests

APRIL, 2022

by Alexander Fridman.

 

Author Alexander Fridman

This post was written by Alexander Fridman. Alexander is a veteran in the software industry with over 11 years of experience. He worked his way up the corporate ladder and has held the positions of Senior Software Developer, Team Leader, Software Architect, and CTO. Alexander is experienced in frontend development and DevOps, but he specializes in backend development.

Testing is an integral and vital part of creating software. In fact, test code is as important as your production code. When you create test code, you need to create test data for your code to work against. This post is about the different types of test data that are used in software testing. I’ll elaborate on each type and explain what test types are used in which scenarios.

 

Enov8 Test Data Manager

*aka ‘Data Compliance Suite’

The Data Securitization and Test Data Management platform. DevSecOps your Test Data & Privacy Risks.

Types of Test Data

Valid Test Data

As the name implies, this is the data that your program expects and should operate on. You want to create tests with valid data to make sure that the program functions as expected when using data that meets your integrity and validation criteria. For instance, if you do integration tests as part of a login use case, you will want to provide a correct username and password (in this scenario, valid data) and check that the user is logged in properly.

Invalid Test Data

It’s important to make sure that your program knows how to handle data that doesn’t conform to your data integrity and validation requirements. First things first: your application must not process invalid data as valid data. The code should identify that this data is invalid and handle it accordingly. Usually, invalid input can result in one of the following:

  • An error message displayed to the client
  • Halting program execution
  • Adding an entry in a log file
  • Returning a specific HTTP status code

Invalid Data Outcomes

Invalid data usually has three possible outcomes:

  1. Changing the program control flow and preventing the program from continuing its execution until valid data is entered. For instance, in the example given above of a login page, the user can’t continue without providing valid credentials. Or in the case of trying to add strings in a calculator, an error will be emitted, and no calculation will take place.
  2. Stopping the execution of the program entirely. For example, if you run a database migration (DB change) and the data is corrupted, the program simply won’t run. It will emit an error message and exit.
  3. Downgraded performance and functionality. If you have a mobile game that requires credit card data to play the full game and you provide invalid data, you will only be able to play the demo version.

Boundary Test Data

When we write code, there are certain limitations on the values we can use that stem from the fact that we run on physical hardware. Physical hardware has its objective capacity limitations. For example, a PC has only so much RAM to use. In addition, the CPU’s assembly language, the language we write code in, and the compiler have their own sets of restrictions.

Types of Restrictions

Thus, we can’t hold in the C language a number that is higher than 32,000 in an integer type. We can’t store a string in an integer variable in Java and so forth. Boundary test data is intended to check how our code handles values that are close to the maximum upper limits or exceed them.

Developers usually write code with values in mind that are far from the boundaries of the machine, language, and compiler. However, in many cases values that are near or equal to the boundary are considered valid input and should be handled as such. In addition, values that exceed the boundaries should be handled gracefully (i.e., with a dedicated error message) and not make the whole program crash (case in point: Microsoft Windows’s “blue screen of death”). Testing boundaries is especially important in the context of load and stress tests when we want to check how the machine performs under high load.

Likewise, boundary tests are especially important in the context of contract tests. Those are usually API tests that check that the API responds properly to a given input. By checking the boundaries of the input, we cover most (if not all) of the possible inputs to the API.

Absent Test Data

There is another possibility too: when the program gets no data at all rather than valid or invalid data. It just isn’t there. We refer to this as absent data. Let’s examine a case when a program expects to fetch some user data from the database to validate credentials against (like in the aforementioned example) but the database doesn’t contain any user data and returns an empty result set. This is a test case we should be aware of and implement. As I’ve mentioned, sometimes the data required for the proper functioning of the code just isn’t where we expect it to be, whether in a database, an external service, or some other source.

Handling Absent Test Data

As in the case of invalid data, we should make sure that our code can handle such situations gracefully. And no, a message that says “Something is wrong” is not considered proper handling. Proper handling, in this case, means preparing for such cases with a secondary data source as a backup in case of the primary source malfunctions. In cases where this is not an option, you should deploy a rapid self-healing mechanism. In the meantime, you need to return to the client a relevant message that helps them solve the problem if possible.

Ways to Generate Test Data

There are various methods available to generate test data for software testing, each with its own advantages and disadvantages.

One method is manual test data generation, which involves entering data items manually. While this provides maximum control and granularity over the test data, it can be a time-consuming process.

Another method is to copy existing data from production environments. This method can be faster than manual generation, but data security and cleanup may be concerns when importing sensitive data.

Many Test Data Management tools on the market help you create test data for your test environments. For example, Mockaroo and equivalent tools allow you to generate random mock test data in support of software testing. This can be a huge timesaver and go hand in hand with manual data creation if necessary. The other benefit is data security, by using fake test data you avoid reusing sensitive data and the potential of data breaches.

Data cloning is another way to generate test data, where an existing set of test data is copied and modified to create new test scenarios. This data cloning method can be useful when generating large amounts of similar data quickly, but it may not be suitable for testing unique scenarios.

Ultimately, the choice of method for generating test data will depend on the specific testing requirements and constraints.

Conclusion

In Software Testing, it is important to create solid, functioning software. There are different types of test cases, and there are different test data types you need to prepare for each. Since creating test data is time-consuming, you can use dedicated tools and services to help you with this task in addition to manually creating your test data. 

For a broader understanding of Test Data Management why not check out this TDM article on the Test Data Management lifecycle.

And why not have a look at Enov8’s Test Data Manager. A holistic Test Data Management tool covering all of the key Test Data & Data Security aspects. Including Test Data Profiling (which finds your Personally Identifiable Information), Test Data Masking, Test Data Validation, Realistic Test Data Creation, Test Data Mining & Test Data Bookings. Enov8 Test Data Management is an important addition to any organization’s software testing optimization and data security solutions.

Other Reading

Interested in reading more about Test Data Management.
Why not start here:

Enov8 Blog: What makes a good Test Data Manager?

Enov8 Blog: TDM Strategy Design Guide Best Practices

Enov8 Blog: What is Data Masking? And how do we do it?

 

 

Relevant Articles

Advancing AI – Through DB Virtualization and TDM

Advancing AI – Through DB Virtualization and TDM

May,  2024 by Jane Temov. Author Jane Temov.  Jane is a Senior Consultant at Enov8, where she specializes in products related to IT and Test Environment Management, Enterprise Release Management, and Test Data Management. Outside of her professional work, Jane enjoys...

Enterprise Release Management: The Ultimate Guide

Enterprise Release Management: The Ultimate Guide

April,  2024 by Niall Crawford   Author Niall Crawford Niall is the Co-Founder and CIO of Enov8. He has 25 years of experience working across the IT industry from Software Engineering, Architecture, IT & Test Environment Management and Executive Leadership....

Enov8 DCT – The Data Control Tower

Enov8 DCT – The Data Control Tower

April,  2024 by Jane Temov. Author Jane Temov.  Jane is a Senior Consultant at Enov8, where she specializes in products related to IT and Test Environment Management, Enterprise Release Management, and Test Data Management. Outside of her professional work, Jane...

Understanding ERM versus SAFe

Understanding ERM versus SAFe

April,  2024 by Jane Temov. Author Jane Temov.  Jane is a Senior Consultant at Enov8, where she specializes in products related to IT and Test Environment Management, Enterprise Release Management, and Test Data Management. Outside of her professional work, Jane...

Serverless Architectures: Benefits and Challenges

Serverless Architectures: Benefits and Challenges

April,  2024 by Jane Temov. Author Jane Temov. Jane is a Senior Consultant at Enov8, where she specializes in products related to IT and Test Environment Management, Enterprise Release Management, and Test Data Management. Outside of her professional work, Jane enjoys...