
What Is Data Masking and How Do We Do It?

AUGUST, 2020 by Michiel Mulders
According to IBM’s 2019 Cost of a Data Breach Report, the average data breach in 2019 cost 3.92 million USD. Businesses in certain industries, such as healthcare, suffer more substantial losses: 6.45 million USD on average. As the amount of confidential data increases, so does the need to protect it. Data masking helps secure private data from malicious third parties wanting to abuse it. In this article, we’ll explain the process of data masking and why it’s crucial for protecting sensitive data.

What Is Data Masking?

Enterprises use data masking or data obfuscation to recognize and hide sensitive data. This sensitive data can vary from personal data such as phone numbers to intellectual property. There are several different ways in which data masking takes form. The general idea is that at the end of the process, the data will be safe. A concrete example would be a credit card number that has been scrambled or blurred.

Maybe you’ve already come across different types of data masking, such as static or dynamic masking. Static masking is the masking of data at rest, typically applied once to a stored copy of the data. Dynamic data masking is when the masking happens on demand, at the moment the data is requested.

With dynamic masking, the data is often processed whenever someone requests it from a production database. Database users connect to the database through a reverse proxy. This proxy processes each request and decides whether to mask the returned data, depending on the permissions of the user. We’ll cover the different techniques used in this process in more detail below.
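To make the permission check concrete, here is a minimal sketch of the decision such a proxy has to make. It is not a real proxy: the `User` class, the `SENSITIVE_FIELDS` set, and the `read_record` function are hypothetical names chosen for illustration, and the record is a plain dictionary rather than a database row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    can_view_sensitive: bool  # hypothetical permission flag


# Assumption: the fields that count as sensitive are known up front.
SENSITIVE_FIELDS = {"card_number", "phone"}


def mask_value(value: str) -> str:
    """Replace all but the last four characters with asterisks."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def read_record(user: User, record: dict) -> dict:
    """Return the record as this user may see it. The stored record is not modified."""
    if user.can_view_sensitive:
        return dict(record)
    return {
        key: mask_value(value) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }


record = {"name": "Jane", "card_number": "4111111111111111", "phone": "5558675309"}
print(read_record(User("analyst", False), record))
print(read_record(User("admin", True), record))
```

The point of the sketch is that masking happens at read time and per user, so the production data itself stays untouched.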

Why Is Data Masking Important?

The amount of data that exists keeps growing each year. For every database in production, an organization often has several copies in circulation for various purposes. Though production databases are often well protected, the same is not always true for these copies. They often end up with third parties who use sensitive data for malicious purposes.

As the total amount of data increases, so does the risk of a data breach. It is nearly impossible to create a completely foolproof system around each copy of your databases, especially when not everybody with access to the data has the technological literacy we’d hope for. However, an enterprise can neutralize several of the factors that make data breaches so expensive. A good rule of thumb is that no active database should contain unmasked data.

The idea is to secure the data from abuse but leave it similar enough for a team of developers to still use it for valid tests. The goal of data masking is to reduce the impact of a possible data breach and improve data security. Data masking achieves this because the masked values can no longer be linked to actual sensitive customer information.

What Are the Benefits of Data Masking?

Many organizations see the reduced cost of a potential data breach as a major benefit of data masking. Two of the factors that make data breaches so expensive are third-party breaches and compliance failures.

Now imagine that these third parties have no access to the actual data, but only to masked instances of the data. It would take away the possibility for them to abuse it. And even though data breaches can still occur, there would be no compliance failure if you’ve masked the leaked data. In other words, the enterprise that masks its data will not incur the high fees that come with compliance failures.

Another scenario would be a disgruntled employee who wants to access records for malicious purposes. Or an employee who leaks data through incompetence or negligence. Masking data makes sure that leaked data is not destructive for the enterprise.

Companies often have a lot of security around the production database. But copies of databases, such as backups or databases used for testing, often end up with third parties who are not held accountable for their abuse of the sensitive data. Through several techniques, data masking renders personal records and intellectual property less prone to abuse.

Data Masking Techniques for You to Try

There are several different techniques for masking sensitive data.

Random Substitution

First off, you could use random substitution for your sensitive records. As the name suggests, random substitution can be achieved by replacing numbers with randomly selected numbers. The same applies to letters. Random substitution is useful for masking credit card numbers but also for dates, names, addresses, telephone numbers, and other types of personal records.
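As a rough illustration, random substitution can be sketched in a few lines of Python. The function name `random_substitute` is made up for this example. It swaps every digit for a random digit and every letter for a random letter, and leaves spaces and punctuation alone. Note that Python's `random` module is not cryptographically secure; for real masking jobs, `random.SystemRandom` would be a safer source of randomness.

```python
import random
import string
from typing import Optional


def random_substitute(value: str, rng: Optional[random.Random] = None) -> str:
    """Replace each digit with a random digit and each letter with a random letter.

    Spaces and punctuation are kept, so the masked value has the same layout.
    """
    rng = rng or random.Random()
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            masked.append(ch)
    return "".join(masked)


# Output differs on every run because the replacement is random.
print(random_substitute("4111 1111 1111 1111"))
print(random_substitute("Jane Doe, 12 Main Street"))
```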

Algorithmic Substitution

Another technique often used for data masking is algorithmic substitution. This technique is quite similar to the first one we discussed. Values are still replaced with random ones, but algorithmic substitution makes sure that certain patterns are preserved. For example, postal codes or telephone numbers could be masked, but the masked value can still be recognized as a postal code or a telephone number. This is useful because data masked with algorithmic substitution can still be used for testing, since it is not completely randomized.
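One way to picture a pattern being respected is a masked credit card number that still passes the Luhn checksum used by card numbers, so validation code in a test environment keeps working. The sketch below is only one possible interpretation of algorithmic substitution; the function names are hypothetical, and a real tool would likely preserve more rules, such as issuer prefixes.

```python
import random
from typing import Optional


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of digits (without the check digit)."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if i % 2 == 0:  # every second digit, starting from the rightmost, is doubled
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """Check a digits-only number against its trailing Luhn check digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def mask_card_number(number: str, rng: Optional[random.Random] = None) -> str:
    """Swap the digits for random ones, then fix the last digit so Luhn still passes."""
    rng = rng or random.Random()
    digit_count = sum(c.isdigit() for c in number)
    payload = "".join(str(rng.randrange(10)) for _ in range(max(digit_count - 1, 0)))
    new_digits = iter(payload + str(luhn_check_digit(payload)))
    # Keep separators such as spaces or dashes in their original positions.
    return "".join(next(new_digits) if c.isdigit() else c for c in number)


masked = mask_card_number("4111 1111 1111 1111")
print(masked, luhn_valid(masked.replace(" ", "")))
```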

Data Obfuscation or Data Blurring

Organizations commonly use data blurring to hide sensitive data. In the context of masking sensitive data, blurring means that you add a random variance to the existing values. After this masking process, the data is still an approximation of its original value, but it has been modified enough that the exact original value is no longer exposed. An example of this would be to vary salary values by up to 20%, making them a less accurate representation. For example, a salary of $100,000 might be obfuscated to a value in the range between $80,000 and $120,000.
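The salary example can be sketched as follows. The function name `blur` and its `variance` parameter are assumptions made for this illustration; the 20% default simply mirrors the example above.

```python
import random
from typing import Optional


def blur(value: float, variance: float = 0.20, rng: Optional[random.Random] = None) -> float:
    """Shift a numeric value by a random percentage within +/- variance."""
    rng = rng or random.Random()
    factor = 1 + rng.uniform(-variance, variance)
    return round(value * factor, 2)


# A salary of 100,000 ends up somewhere between 80,000 and 120,000.
print(blur(100_000))
```

Because the result stays close to the original, aggregate statistics over many blurred values remain roughly usable, which is the main reason to pick blurring over full randomization.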

Selective Masking

Instead of completely scrambling or blurring data values, you could use selective masking. This masks only a section of the sensitive data, thus making it harmless in data breaches. Randomizing a certain part of a telephone number or altering the domain name of an email address are examples of selective masking. Selective masking offers the same advantage as algorithmic substitution: because neither technique reduces data integrity, you can use the masked data safely in testing environments.
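Both examples, the telephone number and the email domain, could look roughly like this. The helper names are hypothetical, and for simplicity the phone digits are replaced with a fixed fill character rather than random digits; `example.com` is used because it is a domain reserved for documentation.

```python
def mask_phone(phone: str, visible: int = 4, fill: str = "X") -> str:
    """Hide all but the last `visible` digits, keeping separators in place."""
    to_hide = max(sum(c.isdigit() for c in phone) - visible, 0)
    out = []
    for c in phone:
        if c.isdigit() and to_hide > 0:
            out.append(fill)
            to_hide -= 1
        else:
            out.append(c)
    return "".join(out)


def mask_email_domain(email: str, replacement: str = "example.com") -> str:
    """Keep the local part of the address and swap the domain."""
    local, sep, _domain = email.rpartition("@")
    if not sep:  # not an email address; leave it unchanged
        return email
    return f"{local}@{replacement}"


print(mask_phone("+1 555-867-5309"))                # +X XXX-XXX-5309
print(mask_email_domain("jane.doe@corp.internal"))  # jane.doe@example.com
```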

Nulling Technique

A final technique for masking data is nulling. Nulling out or deleting data turns values into a null or empty value in the database. This method may seem quite crude, since it does reduce the data integrity. Still, with nulling you can be certain that the sensitive data is safe from third parties. The obvious downside of this method is that the data will be irretrievable for everyone, which makes it useless for testing purposes.
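In a relational database, nulling usually comes down to a single `UPDATE` statement. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` table, its columns, and the sample values are invented for the example.

```python
import sqlite3

# An in-memory database standing in for a copy of production data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers (name, ssn) VALUES (?, ?)",
    [("Alice", "123-45-6789"), ("Bob", "987-65-4321")],
)


def null_out(connection: sqlite3.Connection, table: str, column: str) -> None:
    """Set every value in the given column to NULL.

    Table and column names cannot be bound as SQL parameters,
    so only pass trusted, hard-coded identifiers here.
    """
    connection.execute(f'UPDATE "{table}" SET "{column}" = NULL')
    connection.commit()


null_out(conn, "customers", "ssn")
rows = conn.execute("SELECT name, ssn FROM customers ORDER BY id").fetchall()
print(rows)  # [('Alice', None), ('Bob', None)]
```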

You can apply any of the techniques we discussed in either a static or a dynamic manner. Dynamic masking occurs in real time: data is obfuscated on the spot as it is requested. Static masking, by contrast, makes a copy of the data and then applies one or more masking techniques to that copy. Keep in mind that the best data masking strategy depends on how you store and use your data.
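The static variant can be pictured as a one-off job that copies a data set and applies a masking rule per column. This is only a sketch under simple assumptions: the data is a list of dictionaries, and `static_mask` and the rules passed to it are hypothetical names for this example.

```python
import copy
from typing import Any, Callable, Dict, List


def static_mask(
    rows: List[Dict[str, Any]], rules: Dict[str, Callable[[Any], Any]]
) -> List[Dict[str, Any]]:
    """Return a masked copy of the rows; the original rows are left unchanged."""
    masked = copy.deepcopy(rows)
    for row in masked:
        for column, mask in rules.items():
            if column in row:
                row[column] = mask(row[column])
    return masked


production = [{"name": "Alice", "ssn": "123-45-6789", "city": "Ghent"}]

# One masking rule per sensitive column; other columns are copied as they are.
rules = {
    "ssn": lambda _value: None,               # nulling
    "name": lambda value: value[0] + "***",   # selective masking
}

test_copy = static_mask(production, rules)
print(test_copy)   # [{'name': 'A***', 'ssn': None, 'city': 'Ghent'}]
print(production)  # the original data is untouched
```

The masked copy is what would be handed to testers or third parties, while the original stays behind the production safeguards.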

Data Masking: The Right Thing to Do

Remember that data masking techniques can help you avoid the disastrous consequences of a data breach. Providing that extra layer of security to your database by blurring, nulling, or randomizing sensitive data is an absolute must. Using data masking should be a no-brainer for a company that handles databases, deals with personal records, or manages intellectual property.

These techniques could help you avoid the 3.92 million USD price tag on a data breach. This figure includes both lost revenue and fees for failing to comply with privacy laws. Plus, protecting your customers’ data is simply the right thing to do.


Next Steps

Looking to address the needs of Test Data Management, including Data Masking?

Why not ask us about DCS, our DataSec & DataOps solution?

DCS is a platform that uses automated intelligence to identify where data security exposures reside, rapidly remediate these risks without error (mask or encrypt), and centrally validate your compliance success. The solution also comes with IT delivery accelerators, including DataView for Data Mining, Data Masking, and a DataOps Library for automation.

Learn More or Share Ideas

If you’d like to learn more about Data, Release or Environment Management, or perhaps just share your own ideas, then feel free to contact the enov8 team. Enov8 provides a complete platform for addressing organisations’ “DevOps at Scale” requirements, providing advanced “out of the box” Holistic Test Data Management, IT & Test Environment Management, and Release Management capabilities.


This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!
