What Is Data Masking and How Do We Do It?


AUGUST, 2020

by Michiel Mulders

According to IBM’s 2019 Cost of a Data Breach Report, the average data breach in 2019 cost 3.92 million USD. Businesses in certain industries, such as healthcare, suffer more substantial losses: 6.45 million USD on average. As the amount of confidential data increases, so does the need to protect it. Data masking helps secure private data from malicious third parties wanting to abuse it. In this article, we’ll explain the process of data masking and why it’s crucial for protecting sensitive data.


What Is Data Masking?

Enterprises use data masking, sometimes called data obfuscation, to identify and hide sensitive data. This sensitive data can range from personal data, such as phone numbers, to intellectual property. Data masking takes several different forms, but the general idea is the same: at the end of the process, the real sensitive values are no longer exposed. A concrete example would be a credit card number that has been scrambled or blurred.

Maybe you’ve already come across different types of data masking, such as static or dynamic masking. Static masking means masking data at rest: a stored copy of the data is masked once. Dynamic data masking happens on demand, at the moment the data is requested.

With dynamic masking, data is typically masked whenever someone requests it from a production database. Database users connect to the database through a reverse proxy. This proxy processes each request and decides, based on the user’s permissions, whether to mask the returned data. We’ll cover the different masking techniques in more detail below.
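
The permission check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular proxy works: the `User` class, the `SENSITIVE_COLUMNS` set, and the `fetch_row` function are hypothetical names invented for this example, and a real proxy would sit between the client and the database rather than operate on a dictionary.

```python
from dataclasses import dataclass

# Hypothetical classification of sensitive columns (an assumption for this
# sketch); a real deployment would load this from a masking policy.
SENSITIVE_COLUMNS = {"card_number", "phone"}


@dataclass
class User:
    name: str
    can_view_sensitive: bool


def mask_value(value: str) -> str:
    """Hide all but the last four characters of a value."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def fetch_row(user: User, row: dict) -> dict:
    """Return the row unchanged for privileged users, masked otherwise."""
    if user.can_view_sensitive:
        return dict(row)
    return {
        column: mask_value(str(value)) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }
```

The stored row is never modified here. Only the copy returned to an unprivileged user is masked, which is what makes the masking dynamic.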

Why Is Data Masking Important?

The amount of data that exists keeps growing each year. For every database in production, an organization often has several copies in circulation for various purposes. Though production databases are often well protected, the same is not always true for these copies. They can end up with third parties who use the sensitive data for malicious purposes.

As the total amount of data increases, so does the risk of a data breach. It is nearly impossible to create a completely foolproof system around each copy of your databases. This is especially true when not everybody with access to the data has the technological literacy we’d hope for. However, an enterprise can neutralize several of the factors that make data breaches so expensive. A good rule of thumb is that no active database should contain unmasked data.

The idea is to secure the data from abuse but leave it similar enough for a team of developers to still use it for valid tests. The goal of data masking is to reduce the impact of a possible data breach and improve data security. Data masking achieves this because the masked values no longer link to actual sensitive customer information.

What Are the Benefits of Data Masking?

Many organizations see the reduced cost of a potential data breach as a major benefit of data masking. Two of the factors that make data breaches so expensive are third-party breaches and compliance failures.

Now imagine that these third parties have no access to the actual data, but only to masked instances of the data. They would have no opportunity to abuse it. And even though data breaches can still occur, there would be no compliance failure if you’ve masked the leaked data. In other words, the enterprise that masks its data will not incur the high fines that come with compliance failures.

Another scenario would be a disgruntled employee who wants to access records for malicious purposes, or an employee who leaks data through incompetence or negligence. Masking data makes sure that leaked data is not destructive for the enterprise.

Companies often have a lot of security around the production database. But copies of databases, such as backups or databases used for testing, often end up with third parties who are not held accountable for abusing the sensitive data. Through several techniques, data masking renders personal records and intellectual property less prone to abuse.

Data Masking Techniques for You to Try

There are several different techniques for masking sensitive data.

Random Substitution

First off, you could use random substitution for your sensitive records. As the name suggests, random substitution replaces numbers with randomly selected numbers and letters with randomly selected letters. Random substitution is useful for masking credit card numbers but also for dates, names, addresses, telephone numbers, and other types of personal records.
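
As a rough illustration, random substitution might look like the Python sketch below. The function name `random_substitute` is made up for this example. It uses the standard `random` module so that a seed can make test data reproducible; that module is not suitable for security-sensitive randomness, where something like the standard `secrets` module would be a better fit.

```python
import random
import string


def random_substitute(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random letter."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            # Keep the case of the original letter.
            letters = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(letters))
        else:
            out.append(ch)  # keep separators such as spaces and dashes
    return "".join(out)


masked_card = random_substitute("4111-1111-1111-1111", random.Random())
```

The masked value has the same length and separators as the original, but its digits and letters no longer carry any information about the real record.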

Algorithmic Substitution

Another technique often used for data masking is algorithmic substitution. This technique is quite similar to random substitution. Values are still replaced at random, but the algorithm makes sure the result respects the pattern of the original data type. For example, a masked postal code or telephone number can still be recognized as a postal code or a telephone number. This is useful because data masked with algorithmic substitution can still be used for testing, since it is not completely randomized.
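
One possible reading of this technique, sketched in Python: mask a ten-digit North American phone number so that the result still follows the numbering rule that an area code and an exchange code cannot start with 0 or 1. The function name `mask_us_phone` and the choice of phone numbers as the example are assumptions made for illustration; other data types would need their own rules.

```python
import random


def mask_us_phone(phone: str, rng: random.Random) -> str:
    """Replace the digits of a 10-digit number, keeping layout and numbering rules."""
    digit_positions = [i for i, ch in enumerate(phone) if ch.isdigit()]
    if len(digit_positions) != 10:
        raise ValueError("expected exactly 10 digits")
    chars = list(phone)
    for n, pos in enumerate(digit_positions):
        # The first digit of the area code (digit 0) and of the exchange
        # (digit 3) may not be 0 or 1.
        pool = "23456789" if n in (0, 3) else "0123456789"
        chars[pos] = rng.choice(pool)
    return "".join(chars)


masked_phone = mask_us_phone("(415) 555-0132", random.Random())
```

Because punctuation and digit rules are preserved, the masked number should still pass the same format validation as a real one in a test environment.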

Data Obfuscation or Data Blurring

Organizations commonly use data blurring to hide sensitive data. In the context of masking sensitive data, blurring means that you add a random variance to the existing values. After this masking process, the data is still an approximation of its original value, but it is modified to such an extent that it no longer reveals the real value if it leaks. An example of this would be to apply a variation of up to 20% to salary values, making them a less accurate representation. For example, a salary of $100,000 might be obfuscated to a value in a range between $80,000 and $120,000.
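
The salary example can be written as a small Python sketch. The function name `blur` and the use of a uniform distribution are assumptions made for this illustration; the text above only specifies the size of the variance, not how it is drawn.

```python
import random


def blur(value: float, variance: float, rng: random.Random) -> float:
    """Shift value by a random factor drawn uniformly from [-variance, +variance]."""
    if not 0 <= variance < 1:
        raise ValueError("variance must be in [0, 1)")
    return value * (1 + rng.uniform(-variance, variance))


# A salary of 100,000 blurred by up to 20% lands between 80,000 and 120,000.
blurred_salary = blur(100_000, 0.20, random.Random())
```

Since the noise averages out over many rows, aggregate statistics such as the mean salary stay roughly usable for testing and analysis, while individual values no longer match the real records.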

Selective Masking

Instead of completely scrambling or blurring data values, you could use selective masking. This masks only a section of the sensitive data, thus making it harmless in data breaches. Randomizing a certain part of a telephone number or altering the domain name of an email address are examples of selective masking. Selective masking offers the same advantage as algorithmic substitution. When you use either of these techniques, you can use data safely in testing environments because they don’t reduce data integrity.
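
Both examples mentioned above can be sketched in Python. The function names `mask_phone_tail` and `mask_email_domain` are hypothetical, as is the choice to keep the first six digits of the phone number. The placeholder domain `example.com` is used because it is reserved for documentation and testing.

```python
import random


def mask_phone_tail(phone: str, rng: random.Random, keep: int = 6) -> str:
    """Randomize every digit after the first `keep` digits."""
    out, seen = [], 0
    for ch in phone:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep else rng.choice("0123456789"))
        else:
            out.append(ch)  # punctuation is left untouched
    return "".join(out)


def mask_email_domain(email: str, placeholder: str = "example.com") -> str:
    """Keep the local part of an email address and replace its domain."""
    local, sep, _domain = email.rpartition("@")
    if not sep:
        raise ValueError("not an email address")
    return f"{local}@{placeholder}"


masked_phone = mask_phone_tail("415-555-0132", random.Random())
masked_email = mask_email_domain("jane.doe@corp.test")
```

Which part to keep is a judgment call: the retained portion should be useful for testing, but not enough to identify a person on its own.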

Nulling Technique

A final technique for masking data is nulling. Nulling out or deleting data turns values into a null or empty value in the database. This method may seem quite crude, since it reduces data integrity. Still, nulling is a method where you can be certain that the sensitive data is safe from third parties. The obvious downside of this method is that the data will be irretrievable for everyone, thus making it useless for purposes of testing.
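
A minimal sketch of nulling, using Python’s built-in `sqlite3` module. The `customers` table and its `ssn` column are hypothetical and exist only for this example; the function expects a connection to a database that already contains such a table.

```python
import sqlite3


def null_out_ssn(conn: sqlite3.Connection) -> int:
    """Set the ssn column to NULL for every customer and return the rows changed."""
    cursor = conn.execute(
        "UPDATE customers SET ssn = NULL WHERE ssn IS NOT NULL"
    )
    conn.commit()
    return cursor.rowcount
```

After this runs, the column still exists, so queries against the table keep working, but every value in it is gone for good. That is why nulling is usually applied to a copy of the data rather than to the production database.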

You can apply the techniques we discussed in either a static or a dynamic manner. Dynamic masking occurs in real time: data is obfuscated on the spot, as it is requested. Static masking instead makes a copy of the data and then applies one or more masking techniques to that copy. It’s important to keep in mind that the best data masking strategy depends on how you store and use your data.
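
To make the static variant concrete, here is a small Python sketch that copies a set of records and applies masking rules to the copy. The function name `static_mask` and the idea of passing rules as a dictionary of column names to functions are assumptions made for this illustration. Real static masking usually runs against a copied database rather than in-memory records.

```python
import copy
from typing import Callable


def static_mask(records: list[dict], rules: dict[str, Callable]) -> list[dict]:
    """Return a masked copy of records; the originals are left untouched."""
    masked = copy.deepcopy(records)
    for row in masked:
        for column, rule in rules.items():
            if column in row:
                row[column] = rule(row[column])
    return masked


production = [{"name": "Ada", "ssn": "000-00-0001"}]
# Null out the ssn column in the copy that is handed to a test team.
test_copy = static_mask(production, {"ssn": lambda _value: None})
```

Any of the techniques above can be plugged in as a rule, and the masked copy can then be shared without exposing the original values.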

Data Masking: The Right Thing to Do

Remember that data masking techniques can help you avoid the disastrous consequences of a data breach. Providing that extra layer of security to your database by blurring, nulling, or randomizing sensitive data is an absolute must. Using data masking should be a no-brainer for a company that handles databases, deals with personal records, or manages intellectual property.

These techniques could help you avoid the 3.92 million USD price tag on a data breach. This figure includes both lost revenue and fines for failing to comply with privacy laws. Plus, protecting your customers’ data is simply the right thing to do.


This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!
