What is Data Masking? And Best Practice!
20 May 2022, by Jane Temov.
Most organizations employ strong security measures to keep production data secure while it is made available for day-to-day business activity.
However, data may also be used for less secure activities such as testing and training, or by third-party vendors not affiliated with the organization. Using data for such “secondary purposes” can put it at risk and result in regulatory breaches.
Masking data is a way of producing a replica of data that appears to be structurally identical to the original while concealing sensitive information like personal data. The version with the obfuscated information, the secure data, may then be utilized for a variety of applications, such as user training and software testing.
The primary goal of masking data is to develop a functional replacement that is usable and production-like while still protecting sensitive data.
The ability to mask data is important in many situations, such as when you want to give access to information or functionality while still keeping sensitive data private. Simply put, data masking techniques keep the same data format as the original data while altering the sensitive values.
In this Enov8 article, we will discuss:
- Data Masking Methods
- What Data Requires Data Masking?
- What Regulations Require Data to be Masked?
- The Types of Data Masking
- The Challenges of Data Masking
- Data Masking Best Practices

Data Masking Methods
There are many methods for manipulating data, including character shuffling, word or character replacement, and encryption. Each technique has its own set of benefits.
However, when masking sensitive data, the values should be altered in a way that makes reverse engineering impractical, reducing the opportunity for a data breach.
Here are some common data masking methods:
- Lookup Replacement. Replacing personally identifiable information like names with alternative “lookup” values. For example, replacing “James” with “William”.
- Nulling Out Data Values. Deleting or nulling sensitive values, for example, a free-text “Comments” field.
Note: This is a useful method when the data isn’t considered to be of significant value for reuse, or when the classified information is too “random” to be replaced with anything meaningful.
- Data Values Ciphering. Ciphering the sensitive data, so that it cannot easily be read without access to the cipher method(s) employed.
Note: Ciphering sensitive data sources has its place, particularly when you want to keep the shape of the data and/or want to ensure uniqueness. However, be warned: it can be reversed, so use it carefully.
- Data Encryption. Encrypting the sensitive data, so that it cannot be read without a decryption key. For example, using symmetric key encryption like AES.
Note: Although “hard to crack”, encrypting sensitive data has its downsides. One is that it changes the shape of the data. For example, an AES-encrypted string does not look like a real name:
Mary | 8cb2237d0679ca88db6464eac60da963
- Data Shuffling. Scrambling or shuffling the sensitive data, so the new data no longer lines up with the original data. For example, shuffling account numbers, so that they are “misaligned” with the customer names (or no longer reflect production reality).
And/Or
- A Combination of Obfuscation Methods. Different data, depending on its sensitivity and your security protocol, suits different masking methods, so several methods are often combined.
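As a rough illustration of three of the methods above (lookup replacement, nulling out, and shuffling), here is a minimal Python sketch. The sample rows, column names, and the `NAME_LOOKUP` table are assumptions made up for this example and do not come from any particular product.

```python
import random

# Hypothetical sample rows; the column names are assumptions for illustration.
customers = [
    {"name": "James", "account": "1001", "comments": "Called about a late fee"},
    {"name": "Mary", "account": "1002", "comments": "Prefers email contact"},
    {"name": "Robert", "account": "1003", "comments": "VIP customer"},
]

# Lookup replacement: each real name is swapped for a value from a lookup table.
NAME_LOOKUP = {"James": "William", "Mary": "Olivia", "Robert": "Henry"}


def mask_rows(rows, seed=42):
    """Return masked copies of the rows; the original rows are not modified."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable

    # Shuffling: reorder the account column so accounts no longer line up
    # with their production owners. Note that a plain shuffle can leave
    # some rows in their original position.
    accounts = [row["account"] for row in rows]
    rng.shuffle(accounts)

    masked = []
    for row, account in zip(rows, accounts):
        masked.append({
            "name": NAME_LOOKUP.get(row["name"], "Unknown"),
            "account": account,
            "comments": None,  # nulling out free text that has no reuse value
        })
    return masked


masked_customers = mask_rows(customers)
```

Because a simple shuffle may leave some values in place, shuffling is usually best combined with another method when every row must change.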
What Data Requires Data Masking?
How sensitive data is, and how much it needs to be masked, may depend on its end use, the type of data, and its vulnerability to a data breach. Here are the most common data types that require data masking:
- Personally identifiable information (PII). The data that can be used to help identify individuals, for example, customers & employees. Good examples of PII include information like the first name, last name, address, passport number, driver’s license number, and social security number.
- Payment card information (PCI). Businesses that handle credit and debit card transactions must protect consumer information securely under the Payment Card Industry Data Security Standard (PCI DSS).
- Protected health information (PHI). The data that is collected by healthcare service providers to identify appropriate care. Examples include medical histories, health conditions, insurance information, demographic information, and test and laboratory results.
- Intellectual property (IP). Intellectual property (IP) is data related to creations of the mind, including inventions, business plans, designs, and specifications. Unsurprisingly, these assets are considered high-value by organizations and must be securely protected from unauthorized access and theft.
What Regulations Require Data to be Masked?
Three notable examples of regulations that have been put in place by governments and by industry to protect personal data are HIPAA, GDPR, and PCI-DSS.
HIPAA. HIPAA is the Health Insurance Portability and Accountability Act, a law in the United States that sets national standards for the privacy and security of health information.
GDPR. The General Data Protection Regulation (GDPR) is a regulation in the European Union in the area of data protection. The GDPR was adopted in April 2016 and has applied since May 25, 2018. The GDPR regulates the handling of personal data by controllers and processors within the European Union.
PCI-DSS. The Payment Card Industry Data Security Standard (PCI-DSS) is a set of security standards designed to protect cardholder data. PCI-DSS applies to all organizations that process, store or transmit credit card information. The standard is managed by the Payment Card Industry Security Standards Council (PCI SSC), an organization founded by major credit card companies.
The Types of Data Masking
Fundamentally, there are two data masking processes:
- Static data masking. Static Data Masking involves creating a duplicated database (or data set), with all or part of the data masked. This new “secured data” would be maintained separately from the production data, probably within a non-production environment.
Note: Static Data masking is the most common form of data masking and is particularly useful in the Software Development Lifecycle, where developers and testers need production-like fake data but don’t need to see sensitive personal data.
Note: In the modern world, where a lot of IT services are outsourced, it is very important to ensure your data in your Test Environments is fully desensitized. According to Gartner, 70% of Data Breaches are committed internally.
- Dynamic data masking. Dynamic data masking changes information on the fly, in real time, as it is accessed by end users. This technique is applied directly to production datasets. It helps prevent unauthorized users from viewing sensitive data by obscuring the values displayed, for example by partially hiding or hashing them.
Note: This method is typically more useful in the production world, where certain roles and responsibilities have different security ratings and access privileges. For example, a Doctor would have more access to the patient’s PII data than the Medical Receptionist.
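The role-based idea behind dynamic data masking can be sketched in a few lines of Python. This is only an illustration: the roles, field names, and masking functions below are hypothetical, and in practice this kind of rule is often enforced by the database or an access layer rather than by application code.

```python
def keep_last_four(value):
    """Hide all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def redact(value):
    """Hide the value completely."""
    return "[REDACTED]"


# Hypothetical policy: which fields are sensitive and how each one is masked.
FIELD_MASKS = {"ssn": keep_last_four, "diagnosis": redact}

# Hypothetical policy: which sensitive fields each role may see unmasked.
ROLE_CLEARANCE = {
    "doctor": {"ssn", "diagnosis"},
    "receptionist": set(),
}


def read_record(record, role):
    """Return a view of the record masked for the given role.

    The stored record is never changed; masking happens at read time.
    Unknown roles get no clearance, so every sensitive field is masked.
    """
    cleared = ROLE_CLEARANCE.get(role, set())
    view = {}
    for field, value in record.items():
        mask = FIELD_MASKS.get(field)
        if mask is not None and field not in cleared:
            view[field] = mask(value)
        else:
            view[field] = value
    return view


patient = {"name": "Mary Jones", "ssn": "123-45-6789", "diagnosis": "Asthma"}
doctor_view = read_record(patient, "doctor")
receptionist_view = read_record(patient, "receptionist")
```

Here the doctor's view equals the stored record, while the receptionist sees only the last four characters of the social security number and no diagnosis.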
What Are the Challenges of Data Masking?
Here are some of the key challenges involved in data masking:
- Data Discovery. Data is inherently complex: a single platform alone may hold thousands of tables and billions of data points. The task of data discovery should not be underestimated and usually cannot be done manually.
- Format Preservation. When the masking system replaces the original data with fake data, it should preserve the original format and avoid data loss. That means the data masking solution must understand the data it is replacing. If the original format is not maintained, business logic could fail and the application could stop working.
- Retain Referential Integrity. Data inside an application is connected by relationships and dependencies, for example, primary and foreign keys. When the masking solution changes the values, these values must be modified consistently across the dataset*.
Note*: This also applies across platforms and multiple data sets. For example, if Application 1 talks to Applications 2 and 3, ensure the masking rules are consistent across all of them.
- Value Range Integrity. Database constraints are often designed to restrict the range of values that may be entered, for example, the range of salaries. To preserve the data’s semantics, any masked data must fall within the specified range.
- Preserve Uniqueness. The masking system should apply distinct values to every sensitive data element when masking “unique data”. For example, if the table in question stores a Customer Account ID, then each customer should receive a unique Account ID after masking. There should be no duplicates.
Note: By doing this, the shape/distribution of the masked data should be retained and you avoid the potential for data contention & application breakage.
- Gender preservation. When replacing a person’s name in the database, the masking mechanism should be gender-aware and able to distinguish between male and female names.
Note: If the masking system changes names at random, the gender balance in a data set will be disrupted.
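Several of these challenges (format preservation, referential integrity, and uniqueness) can be handled together by building one mapping of old to new values and applying it everywhere the value appears. The following Python sketch shows the idea for account IDs; the table layouts and the `build_id_map` helper are hypothetical, and it assumes the IDs are digit strings long enough that spare values are easy to find.

```python
import random


def build_id_map(original_ids, seed=7):
    """Map each distinct ID to a distinct random ID with the same number of digits.

    Assumes digit-only IDs and an ID space much larger than the number of
    IDs; otherwise the search for an unused candidate could loop for long.
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    originals = set(original_ids)
    mapping = {}
    used = set()
    for original in sorted(originals):
        while True:
            candidate = "".join(rng.choice("0123456789") for _ in original)
            # Keep uniqueness, and never hand out a real production ID.
            if candidate not in used and candidate not in originals:
                break
        mapping[original] = candidate
        used.add(candidate)
    return mapping


# Hypothetical parent and child tables linked by account_id.
customers = [
    {"account_id": "100234", "name": "James"},
    {"account_id": "100567", "name": "Mary"},
]
orders = [
    {"order_id": 1, "account_id": "100234"},
    {"order_id": 2, "account_id": "100567"},
    {"order_id": 3, "account_id": "100234"},
]

id_map = build_id_map(c["account_id"] for c in customers)

# Applying the same map to both tables keeps the foreign keys consistent.
masked_customers = [{**c, "account_id": id_map[c["account_id"]]} for c in customers]
masked_orders = [{**o, "account_id": id_map[o["account_id"]]} for o in orders]
```

The same mapping could be reused across applications so that masked keys still match between systems.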
Data Masking Best Practices
Data masking best practices include:
- Data & Risk Discovery. Before you can safeguard your data, you must first understand what data you are dealing with and separate it into categories based on its sensitivity. A group of subject matter experts, such as a Data Protection Officer, a Test Data Manager, and security and platform experts, will typically work together to compile a comprehensive record of all data risks in an organization. However, this can be time-consuming.
Tip: The Enov8 Test Data Management “Profiling Module” uses AI to do this automatically.
- Data Obfuscation. The act of securing data so that sensitive risks, like customer information, are addressed without impacting the usability of the data. Refer to “Data Masking Methods” above.
Tip: The Enov8 Test Data Management “Masking Module” allows you to use the Profiling Information to generate “consistent” methods on the fly, thus reducing months of engineering effort and the potential for cross-platform mistakes.
- Data Compliance Validation. The ability to test and check that your obfuscation methods have been successful. An essential post-securitization function that ensures mistakes were not made.
Tip: The Enov8 Test Data Management “Data Validation” module helps you scan areas of concern, looking for production smells, i.e. indications that something has been missed.
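To make the discovery and validation steps more concrete, here is a small Python sketch that uses regular expressions to flag columns that look sensitive, then checks whether any flagged value survived masking unchanged. It is a simplified stand-in and not how the Enov8 modules work; the patterns, sample rows, and function names are all assumptions for illustration.

```python
import re

# Hypothetical patterns; real profiling would cover far more data types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_rows(rows):
    """Discovery: return sorted (column, pattern_name) pairs found in the rows."""
    findings = set()
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue
            for name, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.add((column, name))
    return sorted(findings)


def find_unmasked(original_rows, masked_rows, sensitive_columns):
    """Validation: return (row_index, column) pairs whose value was left unchanged."""
    leaks = []
    for index, (before, after) in enumerate(zip(original_rows, masked_rows)):
        for column in sensitive_columns:
            value = after.get(column)
            if value is not None and value == before.get(column):
                leaks.append((index, column))
    return leaks


production = [
    {"id": 1, "contact": "mary@example.com", "notes": "SSN 123-45-6789 on file"},
    {"id": 2, "contact": "james@example.com", "notes": "No issues"},
]
# Hypothetical masked copy in which the second contact value was missed.
masked = [
    {"id": 1, "contact": "user1@example.invalid", "notes": None},
    {"id": 2, "contact": "james@example.com", "notes": None},
]

findings = scan_rows(production)
sensitive_columns = sorted({column for column, _ in findings})
leaks = find_unmasked(production, masked, sensitive_columns)
```

In this sample, discovery flags the `contact` and `notes` columns, and validation reports the second row's `contact` value as a “production smell”.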
Data Masking Conclusion
Data masking is a technique used to protect sensitive data. Obfuscating or replacing the original data with fake data preserves the format and structure of the original while hiding the sensitive information. There are several challenges involved in data masking, but following best practices can help you overcome them. Data masking is an essential step in protecting your data and ensuring compliance with data privacy regulations.
Data Masking Next Steps
- Interested in identifying sensitive information inside real data?
- Interested in protecting sensitive data, your business information or trade secrets?
- Interested in helping testing teams by rapidly delivering test database(s) to your training or testing environment?
- Interested in implementing an Enterprise Data Masking Capability?
Why not ask for a demo of Enov8 Test Data Manager (aka Data Compliance Suite)?
Enov8 Test Data Manager (aka Data Compliance Suite)
A Data Securitization and Test Data Management platform that helps you DevSecOps your Test Data & Privacy Risks. A platform that helps you identify where data sensitivity resides within a production database, rapidly remediate these risks to remove data sensitivity, avoid unauthorized disclosure and/or data breaches, and centrally validate your compliance success. The solution also comes with IT delivery accelerators to support Data DevOps (DataOps), data analysis, data mining, test data bookings, and ultimately accelerate your software delivery process.
Post Author
This post was written by Jane Temov. Jane is an Environment, Release & DataOps Evangelist working at Enov8 in Sydney, Australia.








