What Is Data Masking and How Do We Do It?
by Michiel Mulders
According to IBM’s 2019 Cost of a Data Breach Report, the average data breach in 2019 cost 3.92 million USD. Businesses in certain industries, such as healthcare, suffer more substantial losses: 6.45 million USD on average. As the amount of confidential data increases, so does the need to protect it. Data masking helps secure private data from malicious third parties wanting to abuse it. In this article, we’ll explain the process of data masking and why it’s crucial for protecting sensitive data.
What Is Data Masking?
Enterprises use data masking or data obfuscation to recognize and hide sensitive data. This sensitive data can vary from personal data such as phone numbers to intellectual property. There are several different ways in which data masking takes form. The general idea is that at the end of the process, the data will be safe. A concrete example would be a credit card number that has been scrambled or blurred.
Maybe you’ve already come across different types of data masking, such as static or dynamic masking. Static masking is what we call masking of data at rest. Dynamic data masking is when the masking happens on demand.
With dynamic masking, the data is processed whenever someone requests it from a production database. Database users connect to the database through a reverse proxy. This reverse proxy processes the requests and decides whether it should mask the returned data, depending on the permissions of the user. We’ll cover the different techniques used in this process in more detail below.
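The proxy decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular masking product works: the role names, the set of sensitive columns, and the functions mask_value and proxy_response are all hypothetical.

```python
from typing import Dict, List

# Hypothetical configuration (assumptions for illustration only).
SENSITIVE_COLUMNS = {"phone", "credit_card"}
PRIVILEGED_ROLES = {"admin", "auditor"}


def mask_value(value: str) -> str:
    """Hide all but the last four characters of a value."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def proxy_response(rows: List[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Return rows untouched for privileged roles and masked for everyone else."""
    if role in PRIVILEGED_ROLES:
        return rows
    return [
        {col: mask_value(val) if col in SENSITIVE_COLUMNS else val
         for col, val in row.items()}
        for row in rows
    ]


rows = [{"name": "Alice", "phone": "5551234567"}]
print(proxy_response(rows, "admin"))    # original values
print(proxy_response(rows, "analyst"))  # phone becomes ******4567
```

The stored data never changes here. Only the response is altered, and only for users without the required permissions.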
Why Is Data Masking Important?
The amount of data that exists keeps growing each year. For every database in production, an organization often has several copies in circulation for various purposes. Though production databases are often well protected, the same is not always true for these copies. They often end up with third parties who use sensitive data for malicious purposes.
As the total amount of data increases, so does the risk of a data breach. It is nearly impossible to create a completely foolproof system around each copy of your databases. This is especially true when not everybody with access to the data has the technological literacy we’d hope. However, an enterprise can neutralize several factors that make data breaches so expensive. A good rule of thumb is that no active database should contain unmasked data.
The idea is to secure the data from abuse but leave it similar enough for a team of developers to still use it for valid tests. The goal of data masking is to reduce the impact of a possible data breach and improve data security. Data masking achieves this because the masked values no longer link to actual sensitive customer information.
What Are the Benefits of Data Masking?
Many organizations see the reduced cost of a potential data breach as a major benefit of data masking. Two of the factors that make data breaches so expensive are third-party breaches and compliance failures.
Now imagine that these third parties have no access to the actual data, but only to masked instances of it. That would take away the possibility for them to abuse it. And even though data breaches can still occur, a leak of properly masked data is far less likely to count as a compliance failure. In other words, the enterprise that masks its data is much less likely to incur the high fines that come with compliance failures.
Another scenario would be a disgruntled employee who wants to access records for malicious purposes. Or an employee who leaks data through incompetence or negligence. Masking data makes sure that leaked data is not destructive for the enterprise.
Often companies have a lot of security around the production database. But copies of databases, such as backups or databases used for testing, often end up with third parties who are not held accountable for their abuse of the sensitive data. Through several techniques, data masking renders personal records and intellectual property less prone to abuse.
Data Masking Techniques for You to Try
There are several different techniques for masking sensitive data.
First off, you could use random substitution for your sensitive records. As the name suggests, random substitution replaces digits with randomly selected digits. The same applies to letters. Random substitution is useful for masking credit card numbers but also for dates, names, addresses, telephone numbers, and other types of personal records.
Another technique often used for data masking is algorithmic substitution. This technique is quite similar to the first technique we discussed. Random substitution is still used, but algorithmic substitution ensures that certain patterns are still respected. For example, postal codes or telephone numbers could be masked, but the value can still be recognized as being a postal code or a telephone number. This can be useful because the data masked with algorithmic substitution could still be used for testing, since it is not completely randomized.
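As a rough illustration of random substitution, the sketch below replaces every digit with a random digit and every letter with a random letter, leaving separators in place. The function name random_substitute is hypothetical, and the optional rng parameter is an assumption added so that results can be reproduced in tests.

```python
import random
import string
from typing import Optional


def random_substitute(value: str, rng: Optional[random.Random] = None) -> str:
    """Replace each digit with a random digit and each letter with a random letter."""
    # Default to an OS-backed generator; pass a seeded random.Random for tests.
    rng = rng or random.SystemRandom()
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as spaces and dashes
    return "".join(out)


print(random_substitute("4111-1111-1111-1111"))  # same layout, different digits
print(random_substitute("Jane Doe"))             # same layout, different letters
```

Because each character is drawn independently, the output keeps the length and layout of the input but carries no information about the original value.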
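To make the idea of a respected pattern concrete, here is a minimal sketch that randomizes a 10-digit US-style telephone number while keeping it recognizable as one. The rule used (the first digit of the area code and of the exchange must be 2 to 9) is a simplification chosen for illustration, and mask_us_phone and PHONE_PATTERN are hypothetical names.

```python
import random
import re
from typing import Optional

# Simplified pattern (assumption): "(NXX) NXX-XXXX" with N in 2-9 and X in 0-9.
PHONE_PATTERN = re.compile(r"\([2-9]\d{2}\) [2-9]\d{2}-\d{4}")


def mask_us_phone(phone: str, rng: Optional[random.Random] = None) -> str:
    """Randomize the digits of a phone number while keeping a valid-looking pattern."""
    rng = rng or random.SystemRandom()
    digit_index = 0
    out = []
    for ch in phone:
        if ch.isdigit():
            # Positions 0 and 3 start the area code and the exchange.
            low = 2 if digit_index in (0, 3) else 0
            out.append(str(rng.randint(low, 9)))
            digit_index += 1
        else:
            out.append(ch)  # keep parentheses, spaces, and dashes
    if digit_index != 10:
        raise ValueError("expected exactly 10 digits")
    return "".join(out)


masked = mask_us_phone("(212) 555-0187")
print(masked, bool(PHONE_PATTERN.fullmatch(masked)))  # the pattern still matches
```

A test suite that validates phone number formats would still accept the masked value, which is the property that makes this kind of substitution useful in test environments.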
Data Obfuscation or Data Blurring
Organizations commonly use data blurring to hide sensitive data. In the context of masking sensitive data, blurring means that you add a random variance to the existing values. After this masking process, the data is still an approximation of its original value, but it has been modified enough that a leak no longer exposes the real value. An example would be to apply a random variation of up to 20% to salary values, making each one a less accurate representation. For example, a salary of $100,000 might be obfuscated to a value anywhere between $80,000 and $120,000.
Instead of completely scrambling or blurring data values, you could use selective masking. This masks only a section of the sensitive data, thus making it harmless in data breaches. Randomizing a certain part of a telephone number or altering the domain name of an email address are examples of selective masking. Selective masking offers the same advantage as algorithmic substitution. When you use either of these techniques, you can use data safely in testing environments because they don’t reduce data integrity.
A final technique for masking data is nulling. Nulling out or deleting data turns values into a null or empty value in the database. This method may seem quite crude, since it reduces data integrity. Still, nulling is a method where you can be certain that the sensitive data is safe from third parties. The obvious downside of this method is that the data will be irretrievable for everyone, thus making it useless for testing.
You can apply the techniques we discussed in either a static or a dynamic manner. Dynamic masking occurs in real time, with data obfuscated on the spot as it is requested. Static masking, by contrast, makes a copy of the data and applies one or more masking techniques to that copy. It’s important to keep in mind that the best data masking strategy depends on how you store and use your data.
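The salary example can be sketched as follows. The function name blur and its parameters are hypothetical. The sketch assumes the variance is drawn uniformly, which the text above does not specify.

```python
import random
from typing import Optional


def blur(value: float, variance: float, rng: Optional[random.Random] = None) -> float:
    """Shift a value by a random percentage within +/- variance (0.2 means 20%)."""
    rng = rng or random.SystemRandom()
    # Assumption: the offset is uniformly distributed over the allowed range.
    factor = 1 + rng.uniform(-variance, variance)
    return round(value * factor, 2)


# A salary of 100,000 ends up somewhere between 80,000 and 120,000.
print(blur(100_000, 0.2))
```

Aggregate statistics such as the average salary stay roughly intact over many records, while any single blurred value no longer reveals the exact original.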
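Both examples of selective masking mentioned above can be sketched briefly. The function names and the replacement domain example.com are assumptions made for illustration.

```python
import random
import string
from typing import Optional


def mask_email_domain(email: str, replacement: str = "example.com") -> str:
    """Keep the local part of an email address and swap out the domain."""
    local, sep, _domain = email.rpartition("@")
    if not sep:
        raise ValueError("not an email address")
    return f"{local}@{replacement}"


def randomize_last_digits(phone: str, count: int,
                          rng: Optional[random.Random] = None) -> str:
    """Randomize the last `count` digits and leave everything else untouched."""
    rng = rng or random.SystemRandom()
    chars = list(phone)
    remaining = count
    # Walk backwards so that only the trailing digits are replaced.
    for i in range(len(chars) - 1, -1, -1):
        if remaining == 0:
            break
        if chars[i].isdigit():
            chars[i] = rng.choice(string.digits)
            remaining -= 1
    return "".join(chars)


print(mask_email_domain("jane.doe@corp.internal"))  # jane.doe@example.com
print(randomize_last_digits("555-123-4567", 4))     # 555-123- followed by 4 random digits
```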
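Nulling is simple enough to show with Python’s built-in sqlite3 module and an in-memory database. The customers table and its columns are made up for this illustration.

```python
import sqlite3

# Build a throwaway in-memory database with one sensitive column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.execute(
    "INSERT INTO customers (name, ssn) VALUES (?, ?)",
    ("Alice", "123-45-6789"),
)

# Nulling: overwrite the sensitive column with NULL in every row.
conn.execute("UPDATE customers SET ssn = NULL")

rows = conn.execute("SELECT name, ssn FROM customers").fetchall()
print(rows)  # [('Alice', None)]
```

After the update, the original value cannot be recovered from this copy of the data, which is both the strength and the limitation of the technique.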
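The static approach can be sketched as a function that copies a dataset and masks the copy, leaving the source untouched. The names static_mask, production, and test_copy are hypothetical, as is the choice to hide all but the last four digits of a card number.

```python
import copy
from typing import Callable, Dict, List

Record = Dict[str, str]


def static_mask(records: List[Record],
                maskers: Dict[str, Callable[[str], str]]) -> List[Record]:
    """Return a masked copy of the records; the originals are not modified."""
    masked = copy.deepcopy(records)
    for row in masked:
        for field, masker in maskers.items():
            if field in row:
                row[field] = masker(row[field])
    return masked


production = [{"name": "Alice", "card": "4111111111111111"}]
test_copy = static_mask(production, {"card": lambda v: "X" * 12 + v[-4:]})

print(test_copy)   # [{'name': 'Alice', 'card': 'XXXXXXXXXXXX1111'}]
print(production)  # unchanged
```

Only the masked copy would be handed to testers or third parties, while the production data stays where it is already protected.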
Data Masking: The Right Thing to Do
Remember that data masking techniques can help you avoid the disastrous consequences of a data breach. Providing that extra layer of security to your database by blurring, nulling, or randomizing sensitive data is an absolute must. Using data masking should be a no-brainer for a company that handles databases, deals with personal records, or manages intellectual property.
These techniques could help you avoid the 3.92 million USD price tag on a data breach. This figure includes both lost revenue and fines for failing to comply with privacy laws. Plus, protecting your customers’ data is simply the right thing to do.
This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!