Select Page

What Is Data Masking and How Do We Do It?

AUGUST, 2020by Michiel Mulders
According to the 2019 IBM Data Breach report, the average data breach in 2019 cost 3.92 million USD. Businesses in certain industries, such as healthcare, suffer more substantial losses—6.45 million USD on average. As the amount of confidential data increases, so does the need to protect it. Data masking helps secure private data from malicious third parties wanting to abuse it. In this article, we’ll explain the process of data masking and why it’s crucial for protecting sensitive data. 

What Is Data Masking?

Enterprises use data masking or data obfuscation to recognize and hide sensitive data. This sensitive data can vary from personal data such as phone numbers to intellectual property. There are several different ways in which data masking takes form. The general idea is that at the end of the process, the data will be safe. A concrete example would be a credit card number that has been scrambled or blurred.

Maybe you’ve already come across different types of data masking, such as static or dynamic masking. Static masking is what we call masking of data in stasis. Dynamic data masking is when the data masking happens on demand.

Often the database processes data whenever someone requests data from a production database. What happens is that database users connect to the database through a reverse proxy. This reverse proxy processes the requests and decides if it should mask the returned data or not. This depends on the permissions of the user. We’ll cover the different techniques used in this process more in detail.

Why Is Data Masking Important?

The amount of data that exists keeps growing each year. For every database in production, an organization often has several copies in circulation for various purposes. Though production databases are often well protected, the same is not always true for these copies. They often end up with third parties who use sensitive data for malicious purposes.

As the total amount of data increases, so does the risk of a data breach. It is near impossible to create a completely foolproof system around each copy of your databases. This is true especially when not everybody with access to the data has the technological literacy we’d hope. But, an enterprise can neutralize several factors that make data breaches so expensive. A good rule of thumb is that no active database should contain unmasked data.

The idea is to secure the data from abuse but leave it similar enough for a team of developers to still use it for valid tests. The goal of data masking is to reduce the impact of a possible data breach and improve data security. Data masking achieves this because no actual sensitive customer information links to the values.

What Are the Benefits of Data Masking?

Many organizations see the reduced cost of a potential data breach as a major benefit of data masking. The factors that make data breaches so expensive are third-party breaches and compliance failures.

Now imagine that these third parties have no access to the actual data, but only to masked instances of the data. It would take away the possibility for them to abuse it. And even though data breaches can still occur, there would be no compliance failure if you’ve masked the leaked data. In other words, the enterprise that masks its data will not incur the high fees that come with compliance failures.

Another scenario would be a disgruntled employee who wants to access records for malicious purposes. Or an employee who leaks data through incompetence or negligence. Masking data makes sure that leaked data is not destructive for the enterprise.

Often companies have a lot of security around the production database. But copies for databases such as backups or databases used for testing often end up with third parties who are left unaccountable for their abuse of the sensitive data. Through several techniques, data masking renders personal records and intellectual property less prone to abuse.

Data Masking Techniques for You to Try

There are several different techniques for masking sensitive data.

Random Substitution

First off, you could use random substitution for your sensitive records. As the name suggests, random substitution can be achieved by replacing numbers with randomly selected numbers. The same applies to letters. Random substitution is useful for masking credit card numbers but also for dates, names, addresses, telephone numbers, and other types of personal records.

Algorithmic Substitution

Another technique often used for data masking is algorithmic substitution. This technique is quite similar to the first technique we discussed. Random substitution is still used, but algorithmic substitution takes into account that certain patterns are still respected. For example, postal codes or telephone numbers could be masked, but the value can still be recognized as being a postal code or a telephone number. This can be useful because the data masked with algorithmic substitution could still be used for testing, since it is not completely randomized.

Data Obfuscation or Data Blurring

Organizations commonly use data blurring to hide sensitive data. In the context of masking sensitive data, blurring means that you add a random variance to the existing values. After this masking process, the data is still an approximation of its original value, but it is modified to such an extent that it is not considered subject to a data breach. An example of this would be to add a variation of 20% of salary values, making it a less accurate representation. For example, a salary of $100,000 might be obfuscated in a range between $80,000 and $120,000.

Selective Masking

Instead of completely scrambling or blurring data values, you could use selective masking. This masks only a section of the sensitive data, thus making it harmless in data breaches. Randomizing a certain part of a telephone number or altering the domain name of an email address are examples of selective masking. Selective masking offers the same advantage as an algorithmic substitution. When you use either of these techniques, you can use data safely in testing environments because they don’t reduce data integrity.

Nulling Technique

A final technique for masking data is nulling. Nulling out or deleting data turns values into a null or empty value in the database. This method may seem quite crude, since it does reduce the data integrity. Still, nulling is a method where you can be certain that the sensitive data is safe from third parties. The obvious downside of this method is that the data will be irretrievable for everyone, thus making it useless for purposes of testing

You have the option to use the techniques we discussed in either a static or a dynamic manner. The difference between these two types of masking is explained here. Dynamic masking occurs in real-time where data is obfuscated on the spot. Alternatively, static masking makes a copy of the data to further apply one or more masking techniques. It’s important to keep in mind that the best strategy to use data masking differs depending on how you store and use your data.

Data Masking: The Right Thing to Do

Remember that data masking techniques can help you avoid the disastrous consequences of a data breach. Providing that extra layer of security to your database by blurring, nulling, or randomizing sensitive data is an absolute must. Using data masking should be a no-brainer for a company that handles databases, deals with personal records, or manages intellectual property.

These techniques could help you avoid the 3.92 million USD price tag on a data breach. This figure includes both lost revenue and fees for failing to comply with privacy laws. Plus, protecting your customers’ data is simply the right thing to do.


Next Steps

Looking to address the needs of Test Data Management? Including Data Masking?

Why not ask us about DCS, our DataSec & DataOps solution.

A platform that uses automated intelligence to identify where data security exposures reside, rapidly remediate these risks without error (mask or encrypt) and centrally validate your compliance success. Solution also comes with IT delivery accelerators. Including DataView for Data Mining, Data Masking & a DataOps Library for automation.

Learn More or Share Ideas

If you’d like to learn more about Data, Release or Environment Management or perhaps just share your own ideas then feel free to contact the enov8 team. Enov8 provides a complete platform for addressing organisations “DevOps at Scale” requirements. Providing advanced “out of the box” Holistic Test Data Management, IT & Test Environment Management & Release Management capabilities.


Michiel MuldersThis post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!

Relevant Articles

How to Manage Test Data in Software Testing

20DECEMBER, 2021 by Justin Reynolds.How to Manage Test Data in Software Testing. To compete in today’s market, software companies need to create programs that are free of bugs and vulnerabilities.  In order to accomplish this, they first need to create test data...

Test Data Management In Depth: The What and the How

09DECEMBER, 2021 by Justin Reynolds.When it comes down to it, test data is one of the most important components of software development. That’s because test data makes it possible to create applications that align with the exact needs and expectations of today’s...


06DECEMBER, 2021 by Carlos Schults.Today we're here to talk about data regulations and data compliance solutions. Why does all of this matter? HIPAA, GDPR & PCI what is the difference? When it comes to online applications, protecting your users' data is one of...

How to Value Stream DataOps?

24NOVEMBER, 2021 by Daniel PaesEnhancements on data ingestion made evident the amount of data lost when generating insights. However, without guidance from methodologies like The DataOps Manifesto, some companies are still struggling to blend data pipelines from...

HIPAA, GDPR & PCI DSS. Same, Same but Different.

19NOVEMBER, 2021 by Justin ReynoldsOrganizations today are using more data than ever before. Indeed, data is playing a critical role in decision-making for everything from sales and marketing to the production and development of new products and services.  There’s no...