What Is Data Masking and How Do We Do It?
What Is Data Masking?
Enterprises use data masking, also called data obfuscation, to identify and hide sensitive data. This sensitive data can range from personal data such as phone numbers to intellectual property. Data masking can take several different forms, but the goal is always the same: at the end of the process, the data no longer exposes the real values. A concrete example would be a credit card number that has been scrambled or blurred.
Maybe you’ve already come across different types of data masking, such as static or dynamic masking. Static masking is the masking of data at rest: a copy of the data is masked once and stored in that form. Dynamic data masking happens on demand, at the moment the data is requested.
With dynamic masking, the data is typically processed whenever someone requests it from a production database. Database users connect to the database through a reverse proxy. This reverse proxy processes each request and decides, based on the user’s permissions, whether or not to mask the returned data. We’ll cover the different techniques used in this process in more detail.
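The proxy decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `User` class, the `can_view_sensitive` flag, the `SENSITIVE_COLUMNS` set, and the "keep the last four characters" rule are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical user model: a name plus a single permission flag.
    name: str
    can_view_sensitive: bool


# Hypothetical list of columns that the proxy treats as sensitive.
SENSITIVE_COLUMNS = {"phone", "credit_card"}


def mask_value(value: str) -> str:
    """Hide everything except the last four characters."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def proxy_response(user: User, row: dict) -> dict:
    """Return the row as-is for privileged users, masked for everyone else."""
    if user.can_view_sensitive:
        return dict(row)
    return {
        col: mask_value(val) if col in SENSITIVE_COLUMNS else val
        for col, val in row.items()
    }
```

The stored row is never modified here. Only the response differs, depending on who is asking.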
Why Is Data Masking Important?
The amount of data that exists keeps growing each year. For every database in production, an organization often has several copies in circulation for various purposes. Though production databases are often well protected, the same is not always true for these copies. They often end up with third parties who use sensitive data for malicious purposes.
As the total amount of data increases, so does the risk of a data breach. It is nearly impossible to create a completely foolproof system around each copy of your databases. This is especially true when not everybody with access to the data has the technological literacy we’d hope for. However, an enterprise can neutralize several of the factors that make data breaches so expensive. A good rule of thumb is that no active database should contain unmasked data.
The idea is to secure the data from abuse but leave it similar enough to the original that a team of developers can still use it for valid tests. The goal of data masking is to reduce the impact of a possible data breach and improve data security. Data masking achieves this because the masked values no longer link to any actual sensitive customer information.
What Are the Benefits of Data Masking?
Many organizations see the reduced cost of a potential data breach as a major benefit of data masking. Two factors that make data breaches especially expensive are third-party breaches and compliance failures.
Now imagine that these third parties have no access to the actual data, but only to masked instances of the data. It would take away the possibility for them to abuse it. And even though data breaches can still occur, there would be no compliance failure if you’ve masked the leaked data. In other words, the enterprise that masks its data will not incur the high fees that come with compliance failures.
Another scenario would be a disgruntled employee who wants to access records for malicious purposes. Or an employee who leaks data through incompetence or negligence. Masking data makes sure that leaked data is not destructive for the enterprise.
Companies often have a lot of security around the production database. But copies of databases, such as backups or databases used for testing, often end up with third parties who are not held accountable for their abuse of the sensitive data. Through several techniques, data masking renders personal records and intellectual property less prone to abuse.
Data Masking Techniques for You to Try
There are several different techniques for masking sensitive data.
First off, you could use random substitution for your sensitive records. As the name suggests, random substitution replaces each digit with a randomly selected digit. The same applies to letters. Random substitution is useful for masking credit card numbers but also for dates, names, addresses, telephone numbers, and other types of personal records.
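As a minimal sketch of random substitution, the hypothetical function below swaps every digit for a random digit and every letter for a random letter of the same case, leaving separators alone. The function name and the choice to keep punctuation are assumptions made for illustration. The random generator is passed in so the example can be reproduced with a seed.

```python
import random
import string


def random_substitute(value: str, rng: random.Random) -> str:
    """Replace digits with random digits and letters with random letters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            # Keep the case so names still look like names.
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            # Separators such as "-" or " " are left untouched.
            out.append(ch)
    return "".join(out)


masked_card = random_substitute("4111-1111-1111-1111", random.Random())
masked_name = random_substitute("Ada Lovelace", random.Random())
```

For real masking jobs, a generator that is not seeded predictably (for example `random.SystemRandom()`) would be the safer choice.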
Another technique often used for data masking is algorithmic substitution. This technique is quite similar to the first one. Random substitution is still used, but algorithmic substitution makes sure that certain patterns are preserved. For example, postal codes or telephone numbers could be masked, but the masked value can still be recognized as a postal code or a telephone number. This is useful because data masked with algorithmic substitution can still be used for testing, since it is not completely randomized.
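One way to picture "pattern-preserving" substitution is a card number whose digits are randomized but whose last digit is recomputed so that the result still passes the Luhn checksum used by card validators. This is just one possible interpretation of algorithmic substitution, chosen for illustration. The function names are hypothetical, and the input is assumed to contain digits only.

```python
import random


def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a digit string without its check digit."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit matches the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(number: str, rng: random.Random) -> str:
    """Randomize the digits, then append a matching check digit."""
    payload = "".join(str(rng.randrange(10)) for _ in range(len(number) - 1))
    return payload + luhn_check_digit(payload)
```

A masked number produced this way has the same length as the original and still satisfies `luhn_valid`, so test code that validates card numbers keeps working.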
Data Obfuscation or Data Blurring
Organizations commonly use data blurring to hide sensitive data. In the context of masking sensitive data, blurring means that you add a random variance to the existing values. After this masking process, the data is still an approximation of its original value, but it is modified enough that a leak no longer reveals the exact figure. An example of this would be to apply a variation of up to 20% to salary values, making them a less accurate representation. For example, a salary of $100,000 might be obfuscated to a value anywhere between $80,000 and $120,000.
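The salary example works out as follows: a variation of up to 20% of $100,000 is $20,000 in either direction, which gives the $80,000 to $120,000 range. A minimal sketch, assuming a uniformly distributed variance (the text does not say which distribution to use) and a hypothetical `blur` function:

```python
import random


def blur(value: float, max_variation: float, rng: random.Random) -> float:
    """Scale value by a random factor within +/- max_variation (0.20 = 20%)."""
    factor = 1 + rng.uniform(-max_variation, max_variation)
    return round(value * factor, 2)


blurred_salary = blur(100_000, 0.20, random.Random())
```

Because every value is shifted independently, aggregates such as the average salary stay roughly realistic while individual figures stop being exact.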
Instead of completely scrambling or blurring data values, you could use selective masking. This masks only a section of the sensitive data, which is enough to make it harmless in a data breach. Randomizing a certain part of a telephone number or altering the domain name of an email address are examples of selective masking. Selective masking offers the same advantage as algorithmic substitution. When you use either of these techniques, you can use data safely in testing environments because they don’t reduce data integrity.
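Both selective masking examples mentioned above can be sketched briefly. The function names, the choice of masking the last four digits, and the `example.com` replacement domain are assumptions for illustration only.

```python
import random


def mask_phone_suffix(phone: str, rng: random.Random, digits_to_mask: int = 4) -> str:
    """Randomize only the last digits_to_mask digits of a phone number."""
    chars = list(phone)
    remaining = digits_to_mask
    # Scan from the end so prefixes such as country and area code survive.
    for i in range(len(chars) - 1, -1, -1):
        if remaining == 0:
            break
        if chars[i].isdigit():
            chars[i] = str(rng.randrange(10))
            remaining -= 1
    return "".join(chars)


def mask_email_domain(email: str, replacement_domain: str = "example.com") -> str:
    """Keep the local part of the address and swap the domain."""
    if "@" not in email:
        return email  # Not an email address; leave it unchanged.
    local, _, _ = email.rpartition("@")
    return f"{local}@{replacement_domain}"
```

The untouched parts (area code, local part of the address) keep the record recognizable and usable in tests, while the altered part prevents it from pointing at a real person.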
A final technique for masking data is nulling. Nulling out or deleting data turns values into a null or empty value in the database. This method may seem quite crude, since it does reduce data integrity. Still, with nulling you can be certain that the sensitive data is safe from third parties. The obvious downside of this method is that the data will be irretrievable for everyone, which makes it useless for testing purposes.
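Nulling is the simplest technique to show in code. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns, and the `null_out` helper are hypothetical.

```python
import sqlite3


def null_out(conn: sqlite3.Connection, table: str, column: str) -> None:
    """Overwrite every value in the given column with NULL."""
    # Identifiers cannot be bound as query parameters, so only pass
    # trusted, hard-coded table and column names to this helper.
    conn.execute(f'UPDATE "{table}" SET "{column}" = NULL')
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada', '123-45-6789')")

null_out(conn, "customers", "ssn")
rows = conn.execute("SELECT name, ssn FROM customers").fetchall()
```

After the update, the `ssn` column holds no usable value for anyone, which is exactly the trade-off described above.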
You have the option to use the techniques we discussed in either a static or a dynamic manner. Dynamic masking occurs in real time: data is obfuscated on the spot, as it is requested. Static masking, by contrast, makes a copy of the data and applies one or more masking techniques to that copy. It’s important to keep in mind that the best data masking strategy depends on how you store and use your data.
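The contrast between the two modes can be shown with the same masking function used in two ways. This is a simplified sketch: the record layout, the `ssn` field, and the function names are hypothetical, and a plain Python list stands in for the database.

```python
def mask_ssn(record: dict) -> dict:
    """Return a copy of the record with all but the last four SSN digits hidden."""
    masked = dict(record)
    masked["ssn"] = "***-**-" + record["ssn"][-4:]
    return masked


def static_mask(records: list) -> list:
    """Static masking: build a separate, permanently masked copy of the data."""
    return [mask_ssn(r) for r in records]


def dynamic_read(records: list, index: int) -> dict:
    """Dynamic masking: mask a single record at read time, leaving storage untouched."""
    return mask_ssn(records[index])


production = [{"name": "Ada", "ssn": "123-45-6789"}]
test_copy = static_mask(production)        # handed to testers once
live_view = dynamic_read(production, 0)    # computed on every request
```

In the static case the masked copy is what gets shared, so the originals never leave production. In the dynamic case there is only one data set, and masking is applied each time it is read.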
Data Masking: The Right Thing to Do
Remember that data masking techniques can help you avoid the disastrous consequences of a data breach. Providing that extra layer of security to your database by blurring, nulling, or randomizing sensitive data is an absolute must. Using data masking should be a no-brainer for a company that handles databases, deals with personal records, or manages intellectual property.
These techniques could help you avoid the 3.92 million USD price tag on a data breach. This figure includes both lost revenue and fees for failing to comply with privacy laws. Plus, protecting your customers’ data is simply the right thing to do.