Select Page

Securing Data by Masking

JANUARY, 2020by Michiel Mulders
With the cost of data breaches increasing every year, there’s a huge need for higher security standards. According to IBM’s 2019 security report, the average total cost of a data breach has risen to $3.92 million per breach.It’s no wonder why we need more advanced techniques to protect sensitive data. One of those techniques is data masking.In this post, I’ll talk you through the pros and cons of data masking and show you some of the techniques you can use to mask data.Let’s start with a detailed introduction to data masking. 

What Is Data Masking?

Data masking can be applied to any kind of application. However, most often it’s enterprise applications that deal with sensitive data. Data masking, also referred to as data obfuscation, is used to hide data.

Generally, masking data entails converting data into random characters or other formats. This makes leaked data useless to unauthorized users. However, a user with the right permissions or authorization is able to see the unmasked data.

Put simply, data masking is a very simple technique for hiding data from unauthorized users and protecting a company’s sensitive data.

Who Uses Data Masking?

In general, we can say that every enterprise that handles personal or private information should consider implementing data masking. This is especially true for larger organizations, as they often deal with large amounts of data. It’s not always easy for a larger organization to get a full view on their data. Therefore, these companies should handle their data with extreme care.

In addition to this, some companies have to deal with GDPR regulations. Let’s take a closer look at what that entails.

GDPR and Data Masking

As of the 2018 General Data Protection Regulation (GDPR) act, European companies have to comply with new requirements. This means that many businesses are looking for effective ways of protecting their sensitive data.

The most common types of information enterprises want to mask include:

  • Personally identifiable information (PII) like name, address, sex, etc.
  • Protected health records
  • Transaction history and card or payment information
  • Intellectual property

All these types of data need to be handled with care, as they are subject to GDPR. Companies dealing with these types of data should look for techniques to protect their sensitive data.

Next, let’s dive further into the need for data masking.

Why Do We Need Data Masking?

We already learned about the new GDPR requirements that were introduced in 2018. However, there are many more reasons why you want to mask your data.

  1. Protect business data from third-party vendors. Often, businesses have to share data with third-party companies like suppliers, marketing teams, or consultants. When handing over this data, the business basically loses control over data that should remain confidential. Therefore, data masking can be applied when sharing data with third-party vendors to make sure only the vendors can access the data.
  2. Safeguard against human error. Human error often lies at the source of a data leak. For example, an operator might turn off the firewall for a database. A small mistake like this one can cause a data leak. A business can safeguard themselves against such data leaks by masking their data. Then, if an attacker gets unauthorized access to the database, the data is obfuscated and useless to them.
  3. Assess data usage. Not all operations require real data. For example, testing of an application can be easily executed with randomly generated data. I even recommend to use randomly generated data during testing, as an application that’s still in the testing phase might leak sensitive data.

If you want to learn more about data masking, read about Enov8’s data compliance case study.

Data Masking Techniques

There are many data masking techniques that you can use. However, first start by assessing what data you’re using and if you’re returning the minimal required data.

Let’s say you need only the address and age of a certain user in your application. As you’re querying for the whole user object, you might just return the whole data object. But from a security standpoint, we want to return only the necessary information to reduce the risk of leaking too much information. Therefore, we can apply a view that returns only the address and age of this user.

Now that we know to limit the data we return to only the data we need, let’s explore five different data masking techniques.

1. Substitution

The substitution technique refers to substituting data with similar values. Substitution is an effective technique to replace production data with realistic data.

For example, let’s say we’re returning a user object with a name and address. In order to mask the data, we might want to substitute the user’s real name with a fake (but realistic-looking) name. This ensures the combination of name and address cannot identify a person.

2. Shuffling

Next, data shuffling refers to mixing data. However, we want to make sure we retain logical relationships between data columns in the database. Shuffling is a more advanced technique that masks data while making sure we have realistic relationships between the data.

Let’s say we have customer objects in the database that are linked with purchases. We want to retain this link between the tables for customers and their purchases. Therefore, we use shuffling to mix the values for the first and last name of the customer with another customer in the table.

Again, this technique allows you to safely use production data in a test environment.

3. Blurring

Another data masking technique is blurring, which is often used for indirect data identifiers like age.

Thousands of people have the same age. But when enough data points are available, a malicious person might be able to figure out which data points belong to which person. This would mean that the unauthorized person can still identify a user.

Therefore, we can use the blurring technique. Blurring basically anonymizes data points.

For example, let’s say your application uses the age of users. We can apply a numeric blurring function that creates random noise within a specified range of, for example, realistic-looking ages to populate the age field.

4. Credit Card Masking

Credit card masking is more tricky, as valid credit cards can be verified by using a checksum. The final digit of a credit card holds this checksum number. Therefore, we have to pay attention when masking credit cards with random numbers, as we don’t want the credit card validation to fail for our masked data. Many tools exist that can generate new credit card numbers with a valid checksum.

5. Nullification Masking

Finally, nullification masking replaces a column of data with a null value. This technique is only used for hiding highly sensitive data that cannot be mixed or blurred. By applying the nullification technique, you make sure it’s impossible to discover the original value based on the null value.

As a final masking technique, I want to introduce you to dynamic masking.

Dynamic Data Masking vs. Traditional Data Masking

Dynamic data masking is used in real-time environments where data doesn’t leave the production database. This means that we have a higher level of security for our production data.

With dynamic masking, only authorized users can view the authentic data. However, for unauthorized users, the data is scrambled on the spot, returning inauthentic data. It’s a very performant technique for data masking that protects the production database.

In contrast, traditional data masking doesn’t use such a dynamic layer that can mask the data. With a traditional approach, you make a copy of the production database and decide upon a data masking technique to be applied to the production data. When the technique has been applied, we can safely use the data for testing in our testing environment.

Get Started With Data Masking

Before you get started with data masking, assess what data you are returning. Always make sure to return the minimum required data.

Many techniques exist for masking data. If you want to use production data in your test environment, first assess the type of data you are handling. Based on that, you can choose the right data masking technique for your needs.

The easiest way to get started with data masking is the substitution technique. It allows you to simply switch data with other records making it much harder to identify or link with other records to restore the original record. But personally, I like the shuffling technique, as it allows you to retain logical relationships in your database. However, shuffling is a more advanced method for masking data.

I’ll leave you with this: If you’re working with highly sensitive data, I always recommend using the nullification method. The nullification technique ensures that no sensitive data can be exposed.


Test Data Next Steps

Looking to address the needs of Test Data Management?

Why not ask us about DCS, our DataSec & DataOps solution.

A platform that uses automated intelligence to identify where data security exposures reside, rapidly remediate these risks without error (mask or encrypt) and centrally validate your compliance success. Solution also comes with IT delivery accelerators. Including DataView for Data Mining & a DataOps Library for automation.

Innovate with Enov8, the IT Environment & Data Company.

Specializing in the Governance, Operation & Orchestration of your IT systems and data.

Delivering outcomes like:

  • Improved visibility of your IT Fabric,
  • Streamlined Delivery of IT Projects,
  • Operational Standardization,
  • Security & Availability,
  • DevOps / DataOps Automation,
  • Real-Time insights supporting decision making & continuous optimization.

Our Key solutions include:


Michiel MuldersMichiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!

Relevant Articles

How to Manage Test Data in Software Testing

20DECEMBER, 2021 by Justin Reynolds.How to Manage Test Data in Software Testing. To compete in today’s market, software companies need to create programs that are free of bugs and vulnerabilities.  In order to accomplish this, they first need to create test data...

Test Data Management In Depth: The What and the How

09DECEMBER, 2021 by Justin Reynolds.When it comes down to it, test data is one of the most important components of software development. That’s because test data makes it possible to create applications that align with the exact needs and expectations of today’s...


06DECEMBER, 2021 by Carlos Schults.Today we're here to talk about data regulations and data compliance solutions. Why does all of this matter? HIPAA, GDPR & PCI what is the difference? When it comes to online applications, protecting your users' data is one of...

How to Value Stream DataOps?

24NOVEMBER, 2021 by Daniel PaesEnhancements on data ingestion made evident the amount of data lost when generating insights. However, without guidance from methodologies like The DataOps Manifesto, some companies are still struggling to blend data pipelines from...

HIPAA, GDPR & PCI DSS. Same, Same but Different.

19NOVEMBER, 2021 by Justin ReynoldsOrganizations today are using more data than ever before. Indeed, data is playing a critical role in decision-making for everything from sales and marketing to the production and development of new products and services.  There’s no...