Data Fabrication

What Is Data Fabrication in TDM

AUG, 2022

by Carlos Schults.

*Update from 15 Mar 2021


This post was written by Carlos Schults. Carlos is a .NET software developer with experience in both desktop and web development, and he’s now trying his hand at mobile. He has a passion for writing clean and concise code, and he’s interested in practices that help you improve app health, such as code review, automated testing, and continuous build.

In today’s post, we’ll answer what looks like a simple question: what is data fabrication in TDM? That’s such an unimposing question, but it contains a lot for us to unpack.
What is TDM to begin with? Isn’t data fabrication a bad thing? The answer to the last question is no, not in our context. And what about TDM? Well, it stands for Test Data Management, and it’s the process of providing quality data for your software testing process.

The problem is that providing quality data for your software testing process is way trickier than it sounds. And one of the trickiest parts of this already tricky process is obtaining the data itself. This post is all about one of the various techniques you can use to get test data. By the end of the post, you’ll know:

  • what TDM is in detail and why it’s so important
  • the shortcomings of different approaches to obtain test data
  • how test data fabrication can solve them

Sounds good? Then let’s dig in.

 


TDM Fundamentals

Let’s start with some basics on test data management. We’ll define TDM and explain why it’s essential. Feel free to skip this section if you’re already familiar with TDM. Otherwise, keep reading.


What Is Test Data Management in Software Testing?

I’ve already talked about the what, why, and how of TDM at length before. In that post, I offered the following definition:
Test data management (TDM) is the process of obtaining and administrating the data needed for test automation processes, with minimal human intervention. TDM must ensure not only the quality of the data but also its availability.
That is, TDM must ensure that the test data exists and that it’s available to test cases when needed. And all of that, preferably, in an automated way. But let’s now take a step back and examine the need for quality when it comes to testing data.

Garbage In, Garbage Out: The Importance of High-Quality Test Data

Is quality really so important to software testing? After all, this isn’t real data used by real customers. Why care so much about it?
The answer to that can be summarized with the saying “garbage in, garbage out.” No matter what you’re doing, the quality of input matters a lot. If you give awful ingredients to a five-star chef and expect a great meal, you’re in for some disappointment.
The same reasoning applies to test data. Your test processes might be the best in the world, but if you feed bad data into your tests, you won’t get adequate results.

Data Fabrication in TDM: Why Do You Need It?

Before we get to the meat of the post where we’ll explain what data fabrication is and why you need it, we’ll take a step back and analyze the kind of problem that data fabrication solves.

Production Cloning and Why It’s Painful

One of the most common techniques for obtaining realistic test data is copying it from production. It solves some crucial challenges related to the realism and availability of test data. You can’t get more real than the real thing.
However, production cloning comes with some problems of its own. One of them is that you can’t just copy real user data and use it as is. For security and privacy reasons, you have to obfuscate the data using techniques like data masking. Employing data masking adds another layer of complexity to the process.
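To make that extra layer concrete, here is a minimal sketch of what masking a single cloned record might look like. The field names, the `mask_email` and `mask_record` helpers, and the salt value are all hypothetical; real masking tools handle many more data types and also have to keep relationships between tables consistent.

```python
import hashlib


def mask_email(email: str, salt: str = "demo-salt") -> str:
    """Swap a real address for a stable pseudonym (hypothetical helper).

    The same input always maps to the same output, so joins on the
    masked column still line up across tables.
    """
    digest = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()
    return f"user_{digest[:10]}@example.com"


def mask_record(record: dict) -> dict:
    """Return a copy of the record with its personal fields obfuscated."""
    masked = dict(record)
    masked["name"] = "REDACTED"
    masked["email"] = mask_email(record["email"])
    return masked


# An illustrative "production" row; the values are made up.
production_row = {"id": 42, "name": "Jane Doe", "email": "jane.doe@corp.example"}
masked_row = mask_record(production_row)
print(masked_row)
```

Even in this toy version, someone has to decide which fields are sensitive, how each one is transformed, and how the salt is protected. That is the complexity the masking step brings with it.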
Another production cloning–related challenge is the high cost, especially when it comes to infrastructure. If you need to copy 100 percent of your production data into your test environment, you’ll incur high storage and infrastructure costs. And it doesn’t stop there: nowadays it is common for organizations to have multiple test environments. So you’d be paying this enormous cost over and over again.


How Does Data Fabrication Differ From Data Falsification? How Can (Test) Data Fabrication Help You?

We’ll now cover what data fabrication is and how it can help you. First, a disclaimer: data fabrication has another meaning, as in “creating false data to support a predetermined conclusion in scientific experiments.” Rest assured that this is certainly not what we’re talking about here!
Our data fabrication is also called synthetic data generation or simply data generation. And it is precisely what its name suggests: a technique to provide test data for test cases by generating it.
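As a rough illustration of the idea, the sketch below generates customer records using only Python's standard library. The record shape, the name lists, and the `fabricate_customers` function are assumptions made up for this example; dedicated TDM tools and data-generation libraries offer far richer generators.

```python
import random

# Small, made-up pools of values to draw from.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elif"]
LAST_NAMES = ["Ito", "Khan", "Lopez", "Meyer", "Novak"]


def fabricate_customers(count: int, seed: int = 0) -> list[dict]:
    """Generate `count` synthetic customer records (hypothetical schema).

    A seeded generator makes the output reproducible, so a failing
    test can be rerun against exactly the same data.
    """
    rng = random.Random(seed)
    customers = []
    for customer_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append({
            "id": customer_id,
            "name": f"{first} {last}",
            "email": f"{first}.{last}{customer_id}@example.com".lower(),
            "age": rng.randint(18, 90),
        })
    return customers


customers = fabricate_customers(5, seed=123)
for customer in customers:
    print(customer)
```

None of these records corresponds to a real person, and the same seed always yields the same records.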
Data fabrication presents a number of benefits when compared to alternative techniques to obtain test data, particularly production cloning. That’s what we’ll cover next.

Test Data Fabrication Benefits

If you already have high-quality test data at your fingertips in the form of real data, why would you want to use data fabrication? That’s what we’ll see now: the main reasons why data fabrication might be the best solution.

It Doesn’t Require Masking or Other Obfuscation Techniques

The first big pro of data fabrication or data generation is that you don’t run the risk of exposing or leaking real user data. Since the generated data will be 100 percent synthetic, you won’t need to use techniques such as data masking to protect users’ privacy and security.
That’s one step less in the “obtaining data” phase of your TDM process. The whole process becomes easier, faster, and cheaper. As a bonus, you’ll sleep better at night knowing that you don’t run the risk of violating GDPR or other similar regulations.

It Doesn’t Require Data Subsetting

Data subsetting is the process of getting a smaller portion, a subset, of a production database and moving it somewhere else. When doing production cloning, organizations typically perform data subsetting instead of getting all the data. Since you don’t usually need the same amount of data as you have in production, data subsetting helps you keep your test environment costs down.
However, data subsetting is yet another step in your TDM pipeline. Getting rid of it streamlines your process, making it simpler, faster, and easier to manage. Data fabrication lets you do just that: since you’re creating the test data, you can generate exactly the amount you need.
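Here is one way "just the amount you need" could look in practice, assuming a hypothetical `order_stream` generator and made-up row counts per environment. Rows are produced lazily, and each environment takes only as many as it asks for.

```python
import itertools
import random


def order_stream(seed: int = 0):
    """Yield an endless sequence of synthetic orders (hypothetical schema)."""
    rng = random.Random(seed)
    for order_id in itertools.count(1):
        yield {"order_id": order_id, "amount_cents": rng.randint(100, 50_000)}


# Assumed sizes for illustration; pick whatever each environment needs.
ROWS_PER_ENVIRONMENT = {"dev": 10, "qa": 100, "perf": 1000}

datasets = {
    env: list(itertools.islice(order_stream(seed=1), rows))
    for env, rows in ROWS_PER_ENVIRONMENT.items()
}

for env, rows in datasets.items():
    print(env, len(rows))
```

There is no production database to carve up here: changing a dataset's size means changing one number.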

It Allows for Different Kinds of Tests

The fact that production cloning gives you real data might seem like a blessing, but it can often be a curse.
You might find yourself in a scenario where what you need is wrong or invalid data. For instance, let’s say you need to perform negative testing in order to see how the system behaves when fed with bad data. Well, in that situation, you do want bad data to emulate unwanted user input or invalid data coming from a third-party system (like a REST API). While production cloning will only give you the real deal, when it comes to data generation, you can generate anything you want, including faulty data.
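A small sketch of fabricated data for negative testing follows. The `validate_customer` function stands in for the system under test, and its rules (an email containing "@", an integer age between 0 and 130) are assumptions for illustration only.

```python
def validate_customer(customer: dict) -> list[str]:
    """Return the names of the invalid fields (stand-in for real validation)."""
    errors = []
    email = customer.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append("email")
    age = customer.get("age")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 130:
        errors.append("age")
    return errors


VALID = {"email": "ana@example.com", "age": 30}

# Deliberately faulty records, each breaking exactly one rule.
INVALID_CASES = [
    {**VALID, "email": "not-an-email"},  # malformed value
    {**VALID, "email": None},            # null where a string is expected
    {**VALID, "age": -1},                # out of range
    {**VALID, "age": "thirty"},          # wrong type
    {"age": 30},                         # missing field
]

results = [validate_customer(case) for case in INVALID_CASES]
for case, errors in zip(INVALID_CASES, results):
    print(case, "->", errors)
```

Records like these rarely turn up in a production clone, precisely because the production system already rejected them.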

Test Data: Fabricate It Till You Make It!

Obtaining high-quality test data is essential if you want your organization to have a healthy QA strategy. Some things are easier said than done, though, and acquiring great test data is one of those things.
There are several strategies an organization can use to come up with realistic test data. Production cloning is likely the most popular of said strategies. Another well-known approach is data fabrication, aka synthetic data generation, and that was what today’s post was all about. You now understand not only what data fabrication is but also why you need it in the first place. You also know some of the benefits it provides when compared with other techniques such as production cloning.
Keep in mind that while production cloning isn’t a silver bullet, data fabrication isn’t either. Both are tools that are worth keeping in your toolbelt. When the need arises, you can then make an informed decision about the solution that best suits your organization’s needs.


 


 

 
