What is Data Masking? And Best Practice!
by Jane Temov
Most organizations employ strong security measures to keep production data secure while it remains available for day-to-day business activity.
However, data may also be used for less secure activities like testing and training, or by third-party vendors not affiliated with the organization. This use of data for a “secondary purpose” might put the data at risk and result in regulatory breaches.
Masking data is a way of producing a replica of data that appears to be structurally identical to the original while concealing sensitive information like personal data. The version with the obfuscated information, the secure data, may then be utilized for a variety of applications, such as user training and software testing.
The primary goal of masking data is to produce a functional replacement that is usable and production-like while still protecting sensitive data.
The ability to mask data is important in many situations, such as when you want to give people access to information or functionality while keeping sensitive data private. Simply put, data masking techniques keep the same data format as the original data while altering the important values.
In this Enov8 article, we will discuss:
- Data Masking Methods
- What Data Requires Data Masking?
- What Regulations Require Data to be Masked?
- The Types of Data Masking
- The Challenges of Data Masking
- Data Masking Best Practices
Data Masking Methods
There are many methods for manipulating data, including character shuffling, word or character replacement, and encryption. Each technique has its own set of benefits.
However, when it comes to masking sensitive data, the values must always be altered in a way that resists reverse engineering, reducing the opportunity for a data breach.
Here are some examples of supporting data security via data masking:
- Lookup Replacement. Replacing personally identifiable information like names with alternative “lookup” values. For example, replacing “James” with “William”.
- Deleting or “Nulling Out” Data Values. Deleting or nulling sensitive values, for example, “Comments”.
Note: This is a useful method when the data isn’t considered to be of significant value for reuse, or when the classified information is too “random” to be replaced with anything meaningful.
- Data Values Ciphering. Ciphering the sensitive data so that it cannot easily be read without access to the cipher method(s) employed.
Note: Ciphering sensitive data sources has its place, particularly when you want to keep the shape of the data and/or want to ensure uniqueness. However, be warned: it can be reversed, so use it carefully.
- Data Encryption. Encrypting the sensitive data, so that you can not read the data without a decryption key. For example, using symmetric key encryption like AES.
Note: Although “hard to crack”, encrypting sensitive data has its downsides. One is that it changes the shape of the data. For example, an AES-encrypted string does not look like a real name:
Mary | 8cb2237d0679ca88db6464eac60da963
- Data Shuffling. Scrambling or shuffling the sensitive data so that the new data no longer lines up with the original data. For example, shuffling account numbers so that they are “misaligned” with the customer names (and no longer reflect production reality).
- A Combination of Obfuscation Methods. Different data suits different masking methods, depending on its sensitivity and your security protocol, so several methods are often combined.
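To make three of the methods above concrete, here is a minimal Python sketch of lookup replacement, nulling out, and shuffling applied to a small in-memory table. The rows, column names, lookup list, and the `mask_rows` helper are all hypothetical and exist only for illustration; a real masking tool would work against a database and use far larger lookup sets.

```python
import random

# Hypothetical sample rows; the column names are illustrative only.
customers = [
    {"name": "James", "account": "ACC-1001", "comments": "Called about a late fee"},
    {"name": "Mary", "account": "ACC-1002", "comments": "Prefers email contact"},
    {"name": "Robert", "account": "ACC-1003", "comments": "Address changed in May"},
]

# Hypothetical lookup list of replacement first names.
LOOKUP_NAMES = ["William", "Olivia", "Noah", "Emma", "Liam"]


def mask_rows(rows, seed=0):
    """Return a masked copy of rows; the input rows are left unchanged."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    masked = [dict(row) for row in rows]

    for row in masked:
        # Lookup replacement: swap the real name for a value from the lookup list.
        row["name"] = rng.choice(LOOKUP_NAMES)
        # Nulling out: free-text comments are simply removed.
        row["comments"] = None

    # Shuffling: reassign account numbers so that no row keeps its own.
    # The index order is reshuffled until no position maps to itself.
    order = list(range(len(masked)))
    if len(order) > 1:
        while any(i == j for i, j in enumerate(order)):
            rng.shuffle(order)
    accounts = [row["account"] for row in masked]
    for row, source in zip(masked, order):
        row["account"] = accounts[source]

    return masked


for row in mask_rows(customers):
    print(row)
```

Note that the same set of account numbers still exists after shuffling, so this method on its own hides which customer owns which account rather than the account numbers themselves.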
What Data Requires Data Masking?
How sensitive data is, and whether it needs to be masked, may depend on its end use, the type of data, and its vulnerability to a data breach. Here are the most common data types that require data masking:
- Personally identifiable information (PII). The data that can be used to help identify individuals, for example, customers & employees. Good examples of PII include information like the first name, last name, address, passport number, driver’s license number, and social security number.
- Payment card information (PCI). Businesses that handle credit and debit card transactions must protect consumer information securely under the Payment Card Industry Data Security Standard (PCI DSS).
- Protected health information (PHI). The data that is collected by healthcare service providers to identify appropriate care. Examples include medical histories, health conditions, insurance information, demographic information, and test and laboratory results.
- Intellectual property (IP). Intellectual property (IP) is data related to creations of the mind, including inventions, business plans, designs, and specifications. Unsurprisingly, these assets are considered high-value by organizations and must be securely protected from unauthorized access and theft.
What Regulations Require Data to be Masked?
Three notable examples of regulations that have been put in place by governments and by industry to protect personal data are HIPAA, GDPR, and PCI-DSS.
HIPAA. HIPAA is the Health Insurance Portability and Accountability Act, a law in the United States that sets national standards for the privacy and security of health information.
GDPR. The General Data Protection Regulation (GDPR) is a regulation in the European Union in the area of data protection. The GDPR was adopted on April 14, 2016, and became enforceable on May 25, 2018. The GDPR regulates the handling of personal data by controllers and processors within the European Union.
PCI-DSS. The Payment Card Industry Data Security Standard (PCI-DSS) is a set of security standards designed to protect cardholder data. PCI-DSS applies to all organizations that process, store or transmit credit card information. The standard is managed by the Payment Card Industry Security Standards Council (PCI SSC), an organization founded by major credit card companies.
The Types of Data Masking
Fundamentally, there are two data masking processes:
- Static data masking. Static Data Masking involves creating a duplicated database (or data set), with all or part of the data masked. This new “secured data” would be maintained separately from the production data, probably within a non-production environment.
Note: Static data masking is the most common form of data masking and is particularly useful in the software development lifecycle, where developers and testers need production-like fake data but don’t need to see sensitive personal data.
Note: In the modern world, where a lot of IT services are outsourced, it is very important to ensure the data in your test environments is fully desensitized. According to Gartner, 70% of data breaches are committed internally.
- Dynamic data masking. Dynamic data masking changes information on the fly, in real time, as it is accessed by end users. This technique is applied directly to production datasets. It is effective in preventing unauthorized users from viewing sensitive data because the values displayed to them are masked, for example by encrypting or hashing them.
Note: This method is typically more useful in the production world, where different roles and responsibilities have different security ratings and access privileges. For example, a doctor would have more access to a patient’s PII than the medical receptionist.
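The doctor and receptionist example can be sketched in a few lines of Python. This is only an illustration of the idea of masking at read time: the record, the role names, the `VISIBLE_FIELDS` policy, and both helper functions are assumptions made for the sketch, and production systems usually enforce this inside the database or a data access layer rather than in application code like this.

```python
# Hypothetical patient record; the stored data is never modified.
PATIENT = {"name": "Mary Smith", "ssn": "123-45-6789", "diagnosis": "Asthma"}

# Assumed policy: the fields each role may see unmasked.
VISIBLE_FIELDS = {
    "doctor": {"name", "ssn", "diagnosis"},
    "receptionist": {"name"},
}


def mask_value(value):
    """Replace every letter and digit with 'X', keeping length and separators."""
    return "".join("X" if ch.isalnum() else ch for ch in str(value))


def read_record(record, role):
    """Return the view of a record that the given role is allowed to see."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing unmasked
    return {
        field: (value if field in allowed else mask_value(value))
        for field, value in record.items()
    }


print(read_record(PATIENT, "doctor"))
print(read_record(PATIENT, "receptionist"))
```

Because masking happens on each read, the same production record yields a different view per role, and nothing needs to be copied to a separate database.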
What Are the Challenges of Data Masking?
Here are some of the key challenges involved in data masking:
- Data Discovery. Data is inherently complex, with potentially thousands of tables and billions of data points on a single platform alone. The task of data discovery should not be underestimated and usually cannot be done manually.
- Format preservation. When the masking system replaces the original data with fake data, it should preserve the original format and avoid data loss. That means the data masking solution must understand the data it is replacing. If the original data format is not maintained, the business logic could fail and the application could stop working.
- Retain Referential integrity. Data inside an application is connected by relationships and dependencies, for example, primary and foreign keys. When the masking solution changes these values, they must be modified consistently across the dataset*.
Note*: This applies to cross-platform/multiple data sets also. For example, if Application 1 talks to Application 2 & 3, then ensure the masking rules are consistent across each.
- Value Range Integrity. Database constraints are often designed to restrict the range of values that may be entered, for example, the range of salaries. To preserve the data’s semantics, any masked data must fall within the specified range.
- Preserve Uniqueness. The masking system should apply distinct values to every sensitive data element when masking “unique data”. For example, if the table in question stores a Customer Account ID, then each customer should receive a unique Account ID after masking. There should be no duplicates.
Note: By doing this, the shape/distribution of the masked data should be retained and you avoid the potential for data contention & application breakage.
- Gender preservation. When replacing a person’s name in the database, the masking mechanism should be gender-aware and able to distinguish between male and female names.
Note: If the masking system changes names at random, the gender balance in a data set will be disrupted.
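One common way to address format preservation, referential integrity, and uniqueness together is deterministic masking, where the same input always produces the same masked output. The Python sketch below is one possible approach, not a description of any particular product: it uses a keyed hash (HMAC) from the standard library, and the `ACC-` ID format, the key, the tables, and the helper names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; a real key would be stored securely, not in source code.
SECRET_KEY = b"demo-key-change-me"


def mask_account_id(account_id, key=SECRET_KEY):
    """Map 'PREFIX-digits' to a masked ID with the same prefix and digit count."""
    prefix, digits = account_id.split("-", 1)
    digest = hmac.new(key, account_id.encode("utf-8"), hashlib.sha256).hexdigest()
    number = int(digest, 16) % (10 ** len(digits))  # truncate to the original width
    return f"{prefix}-{number:0{len(digits)}d}"


def check_unique(values):
    """Raise if masking produced duplicates (possible with a truncated hash)."""
    values = list(values)
    if len(set(values)) != len(values):
        raise ValueError("masked values are not unique")


# Hypothetical parent and child tables linked by account_id.
customers = [
    {"account_id": "ACC-100001", "name": "James"},
    {"account_id": "ACC-100002", "name": "Mary"},
    {"account_id": "ACC-100003", "name": "Robert"},
]
orders = [
    {"order_id": 1, "account_id": "ACC-100001"},
    {"order_id": 2, "account_id": "ACC-100003"},
    {"order_id": 3, "account_id": "ACC-100001"},
]

# The same function is applied to the key column in both tables,
# so every order still points at an existing masked customer.
masked_customers = [{**c, "account_id": mask_account_id(c["account_id"])} for c in customers]
masked_orders = [{**o, "account_id": mask_account_id(o["account_id"])} for o in orders]

check_unique(c["account_id"] for c in masked_customers)
```

Applying the same function with the same key in other applications would keep masked IDs consistent across platforms as well. The trade-off is that anyone holding the key can recompute the mapping for known IDs, so the key needs the same protection as the data.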
Data Masking Best Practices
Data masking best practices include:
- Data & Risk Discovery. Before you can safeguard your data, you must first understand what data you are dealing with and separate it into categories based on its sensitivity. A group of subject matter experts, such as a Data Protection Officer, a Test Data Manager, and security and platform experts, will typically work together to compile a comprehensive record of all data risks in an organization. However, this can be time-consuming.
Tip: The Enov8 Test Data Management “Profiling Module” uses AI to do this automatically.
- Data Obfuscation. Refer to “Data Masking Methods” above. This is the act of securing data so that sensitive risks, like customer information, are addressed without impacting the usability of the data.
Tip: The Enov8 Test Data Management “Masking Module” allows you to use the Profiling Information to generate “consistent” methods on the fly, reducing months of engineering effort and the potential for cross-platform mistakes.
- Data Compliance Validation. The ability to test and check that your obfuscation methods have been successful. This is an essential post-securitization function that ensures mistakes were not made.
Tip: The Enov8 Test Data Management “Data Validation” module helps you scan areas of concern, looking for production smells, i.e. indications that something has been missed.
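As a rough illustration of what a validation scan might look for, the Python sketch below runs two simple checks: whether any sensitive column still holds its production value, and whether a free-text column still contains something shaped like a sensitive value. It is not how any specific product works. The sample rows, column names, regular expressions, and function names are assumptions, and a real scanner would use many more patterns.

```python
import re

# Assumed patterns for two kinds of sensitive value; real scanners use many more.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_unmasked(original_rows, masked_rows, sensitive_columns):
    """Return (row_index, column) wherever a masked value equals the original."""
    findings = []
    for index, (before, after) in enumerate(zip(original_rows, masked_rows)):
        for column in sensitive_columns:
            value = after.get(column)
            if value is not None and value == before.get(column):
                findings.append((index, column))
    return findings


def find_patterns(rows, columns):
    """Return (row_index, column, pattern_name) for sensitive-looking text."""
    findings = []
    for index, row in enumerate(rows):
        for column in columns:
            text = row.get(column)
            if text is None:
                continue
            for name, pattern in PATTERNS.items():
                if pattern.search(str(text)):
                    findings.append((index, column, name))
    return findings


# Hypothetical production rows and a masked copy with two deliberate mistakes.
original = [
    {"name": "James", "ssn": "123-45-6789", "comments": "SSN 123-45-6789 confirmed by phone"},
    {"name": "Mary", "ssn": "987-65-4321", "comments": "No issues"},
]
masked = [
    {"name": "William", "ssn": "555-01-0001", "comments": "SSN 123-45-6789 confirmed by phone"},
    {"name": "Mary", "ssn": "555-01-0002", "comments": "No issues"},
]

print(find_unmasked(original, masked, ["name", "ssn"]))
print(find_patterns(masked, ["comments"]))
```

Findings from checks like these are hints rather than proof. For instance, a lookup replacement can legitimately map a name to itself, so results usually need review.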
Data Masking Conclusion
Data masking is a technique used to protect sensitive data. Obfuscating or replacing the original data with fake data preserves the format and structure of the original while hiding the sensitive information. There are several challenges involved in data masking, but following best practices can help you overcome them. Data masking is an essential step in protecting your data and ensuring compliance with data privacy regulations.
Data Masking Next Steps
- Interested in identifying sensitive information inside real data?
- Interested in protecting sensitive data, your business information or trade secrets?
- Interested in helping testing teams by rapidly delivering test database(s) to your training or testing environment?
- Interested in implementing an Enterprise Data Masking Capability?
Why not ask for a demo of Enov8 Test Data Manager (aka Data Compliance Suite)?
Enov8 Test Data Manager (aka Data Compliance Suite)
A Data Securitization and Test Data Management platform that helps you DevSecOps your Test Data & Privacy Risks. A platform that helps you identify where data sensitivity resides within a production database, rapidly remediate these risks to remove data sensitivity, avoid unauthorized disclosure and/or data breaches, and centrally validate your compliance success. The solution also comes with IT delivery accelerators to support Data DevOps (DataOps), data analysis, data mining, test data bookings, and ultimately accelerate your software delivery process.