Top 5 Test Data Management Metrics You Should Be Aware Of

19 February, 2021 by Carlos Schults
“You can’t improve what you don’t measure.” I’m sure you’re familiar with at least some variation of this phrase. The saying, often attributed to Peter Drucker, speaks to the importance of metrics as fundamental tools to enrich and improve business processes of all kinds. Metrics are crucial in many different areas. Software development and software testing are certainly no exception. That’s why today we’re here to talk about TDM metrics.  
What are TDM metrics? Why should you care about them? In short, TDM metrics are indicators you use to gauge the health of your Test Data Management approach. We already know that TDM is crucial for having a solid testing strategy. A great testing strategy is, in turn, essential for developing and delivering high-quality software in a timely manner. So, it follows that TDM metrics play a crucial, yet often overlooked, role in a well-rounded software quality strategy. That’s what this post is all about.

We’ll start with a brief but important introduction to the concept of TDM. After explaining what TDM is all about and why it’s so important, we’ll move to TDM metrics. You already know it’s impossible to improve what you don’t measure, but you might be wondering whether there are more detailed reasons to be so concerned about TDM metrics. There are, and you’ll get to know them in that section. Finally, we’ll walk you through our list of five essential TDM metrics. By the end of the post, you’ll understand not only why TDM is important but also which TDM metrics you should track and improve in order to get the most out of your testing strategy. Let’s get started.

Test Data Management: A Brief Introduction

Before we move on to TDM metrics, we’ll offer a brief “what and why” of the concept, so we’re all on the same page regarding its meaning, importance, and the reasons behind its use.

Defining TDM

TDM, as we’ve already mentioned, stands for Test Data Management. Though we do have a whole other post explaining TDM and its importance in depth, here comes the TL;DR version: Test Data Management is the process of providing high-quality data to your test environments in a mostly or totally automated way. That’s pretty much it, really. However, keep in mind that TDM has to ensure not only the existence of test data but also its quality and its availability. Test data needs to be there when test cases need it, and it must appear in the right amounts and conditions. Additionally, don’t forget we currently live in the post-GDPR era. There are lots of privacy and security concerns when it comes to test data. For instance, one of the most popular techniques for obtaining test data is production cloning (i.e., copying real data from production servers). So, to protect users’ privacy and keep your organization compliant, there are more procedures, like data masking, that you need to perform on the data before it’s ready for use.

Is It Worth Caring So Much About Test Data?

After reading the definition above, you might’ve reached the conclusion that TDM is really difficult to pull off. Maybe you’re even wondering whether it’s really worth it. Well, managing test data is indeed a challenging task. Is it worth it, though? It sure is. The importance of test data can be summarized with the old saying, “garbage in, garbage out.” If you feed poor test data to your test cases, you’ll get poor results. Full stop. It won’t matter a bit that you have a solid testing strategy with talented professionals using all of the bleeding edge tools. Don’t get me wrong; all of those things are good and necessary. But all of that investment will have been for nothing if you don’t care about providing great test data to your tests.

TDM Metrics: 5 You Should Be Aware Of

Without further ado, let’s cover five important traits of a useful test data set. By tracking and improving said characteristics, you’ll be on your way to improving your TDM approach.

Data Literacy

What we mean by data literacy is the capacity to process, understand, and analyze one’s data. Data literacy is a versatile term. It’s simultaneously a skill that professionals dealing with data should have and a desirable trait in said data. Perhaps when it comes to the trait of the data, a better name would be data readability, but the point is the same. Test data should be easy to understand, process, and analyze, especially by computer systems other than the ones used to create it. That’s particularly important for techniques that obtain test data by copying real data from production servers.

Data Security

The second item on our list is a no-brainer. It should surprise zero people that it’s essential for test data, and data in general, to be secure. Though that’s always been the case, nowadays digital security is more important than ever. Organizations must employ all means necessary to ensure that sensitive data, especially users’ personal data, won’t be accessed by unauthorized actors, which includes testing professionals such as QA analysts and testers. Securing data isn’t just the morally correct thing to do. It’s the legally required thing to do due to GDPR and similar legislation across the world. But it’s also the smart thing to do. Regardless of privacy regulations, protecting user data is the right move because failing to do so harms an organization in many ways, especially by damaging its reputation. So, approaches like data masking become vital when obtaining test data from production cloning.
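To make data masking a bit more concrete, here’s a minimal Python sketch. It assumes test records are plain dictionaries. The names `mask_email` and `mask_record`, the salt value, and the field names are all hypothetical and aren’t taken from any particular TDM tool. Keep in mind that salted hashing on its own is pseudonymization rather than full anonymization, so treat this as a starting point, not a compliance solution.

```python
import hashlib


def mask_email(email: str, salt: str = "test-env-salt") -> str:
    """Replace an email with a deterministic pseudonym (hypothetical helper).

    The same input always yields the same output, so joins between
    tables that share the email still line up after masking.
    """
    digest = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()
    return f"user_{digest[:12]}@example.com"


def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Return a masked copy of the record; the original is left untouched."""
    masked = dict(record)
    for field in sensitive_fields:
        if masked.get(field) is None:
            continue  # nothing to mask
        if field == "email":
            masked[field] = mask_email(masked[field])
        else:
            # Assumption: other sensitive fields are not needed by the tests.
            masked[field] = "REDACTED"
    return masked


if __name__ == "__main__":
    production_row = {"id": 7, "email": "Ana@Example.org", "phone": "555-0100", "plan": "pro"}
    print(mask_record(production_row, {"email", "phone"}))
```

Hashing instead of randomizing is a design choice: it keeps relationships between records intact, which matters when the masked data has to satisfy the same joins as production.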

Data Age

Test data gets old. That might sound like a weird statement, but it’s true. What do I mean by it? In your application, there might be certain tasks or procedures that use timestamped pieces of data. The procedures might require that those dates be recent. What would happen if the data the test would use was obtained from a production snapshot from three years ago? That’s right: The tests targeting those procedures wouldn’t work well, or they might produce inconsistent results. That’s why data age might be an important metric to monitor. Depending on how important it is for your testing strategy that your test data is “fresh,” it might be useful to use techniques such as test data aging to artificially tamper with the dates in the test data.
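Here’s one way the ideas of measuring data age and aging test data could look in code. This is a minimal sketch, assuming each record is a dictionary with a `datetime` value in a known field. The function names `data_age_days` and `age_records` and the `created_at` field are hypothetical.

```python
from datetime import datetime


def data_age_days(records: list, field: str, now: datetime) -> int:
    """Days elapsed between the newest timestamp in the data set and `now`."""
    newest = max(r[field] for r in records)
    return (now - newest).days


def age_records(records: list, field: str, now: datetime) -> list:
    """Shift every timestamp by the same offset so the newest one equals `now`.

    Using one shared offset preserves the gaps between records.
    """
    shift = now - max(r[field] for r in records)
    return [{**r, field: r[field] + shift} for r in records]


if __name__ == "__main__":
    snapshot = [
        {"order": 1, "created_at": datetime(2018, 1, 1)},
        {"order": 2, "created_at": datetime(2018, 1, 11)},
    ]
    today = datetime(2021, 2, 19)
    print("age in days:", data_age_days(snapshot, "created_at", today))
    for row in age_records(snapshot, "created_at", today):
        print(row)
```

Shifting all dates by the same amount, rather than overwriting them with the current date, keeps the relative ordering and spacing that time-sensitive procedures may depend on.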

Data Quality

We use “quality” here as an umbrella term to cover a few different traits that useful test data should have. For instance, for test data to be successfully used in test cases, it has to maintain integrity. That basically means it respects the constraints and rules of the database schema and adheres to the domain rules of the application under test. For instance, suppose your application is a program to manage schools. One of your domain rules is that a student must be enrolled in at least one course. A student not enrolled in any course is a violation. What could’ve caused such an invalid state? If you’re dealing with “fake” test data, then the answer is a faulty synthetic data generation process. On the other hand, if you’re working with data cloned from production, something might’ve gone bad when performing a process such as data subsetting. Regardless of the cause, what really matters is having mechanisms in place to enable you to detect and fix such inconsistencies in quality.
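The school example above can be turned into a small detection routine. The following is a minimal sketch, assuming students and enrollments are lists of dictionaries. The function names and the `id` and `student_id` keys are hypothetical, and a real check would more likely run as queries against the test database.

```python
def students_without_courses(students: list, enrollments: list) -> list:
    """IDs of students that violate the 'at least one course' domain rule."""
    enrolled_ids = {e["student_id"] for e in enrollments}
    return [s["id"] for s in students if s["id"] not in enrolled_ids]


def orphan_enrollments(students: list, enrollments: list) -> list:
    """Enrollments pointing at a student that doesn't exist.

    This kind of broken reference can appear when subsetting
    copies one table but drops related rows from another.
    """
    student_ids = {s["id"] for s in students}
    return [e for e in enrollments if e["student_id"] not in student_ids]


if __name__ == "__main__":
    students = [{"id": 1}, {"id": 2}, {"id": 3}]
    enrollments = [
        {"student_id": 1, "course": "Math"},
        {"student_id": 3, "course": "History"},
        {"student_id": 9, "course": "Physics"},
    ]
    print("students with no course:", students_without_courses(students, enrollments))
    print("orphan enrollments:", orphan_enrollments(students, enrollments))
```

The count of violations each function returns can serve as a simple data quality metric: zero is the target, and anything above it points to a problem in generation or subsetting.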

Automation

Automation is a metric less related to your test data itself and more to your whole TDM approach. How widespread is the use of automation throughout your organization? Do you have data compliance processes built into your DevOps strategy? More importantly, how automated is the tracking of your data literacy, data security, data age, and data quality metrics? It’s essential to automate the process of test data profiling and verification. Only with automated alerts and monitoring will you be able to really improve the traits above and take your TDM approach to the next level.
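As a rough illustration of automated alerting, the sketch below compares measured values against limits and reports anything out of bounds. It’s only a sketch under stated assumptions: the function `evaluate_metrics`, the metric names, and the limits are hypothetical, and in practice the result would feed a CI job or a monitoring tool rather than a `print` call.

```python
def evaluate_metrics(metrics: dict, limits: dict) -> list:
    """Return one alert message per metric that is missing or above its limit."""
    alerts = []
    for name, limit in limits.items():
        value = metrics.get(name)
        if value is None:
            # A metric nobody measured is itself a gap in automation.
            alerts.append(f"{name}: no measurement")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds limit {limit}")
    return alerts


if __name__ == "__main__":
    # Hypothetical values, e.g. collected by a nightly profiling job.
    measured = {"data_age_days": 40, "unmasked_fields": 0}
    limits = {"data_age_days": 30, "unmasked_fields": 0, "integrity_violations": 0}
    for alert in evaluate_metrics(measured, limits):
        print("ALERT:", alert)
```

Treating a missing measurement as an alert is deliberate: it surfaces metrics that aren’t being tracked automatically yet.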

Give Your Tests Some Love by Providing Them Great Data

Today we’ve covered yet another TDM-related topic: TDM metrics. We’ve argued that since software testing is crucial for achieving high-quality software and great test data is essential for a healthy testing strategy, anything you do to improve your TDM approach is an important component of your overall software quality strategy. TDM metrics have been a mostly overlooked piece of the software quality puzzle. In this post, we’ve set out to change that scenario by offering a list of five TDM metrics that, if tracked and improved upon, can make your testing strategy more efficient and sound.
This post was written by Carlos Schults. Carlos is a .NET software developer with experience in both desktop and web development, and he’s now trying his hand at mobile. He has a passion for writing clean and concise code, and he’s interested in practices that help you improve app health, such as code review, automated testing, and continuous build.
