Learn More About Data Masking Software
What is Data Masking Software?
Data masking is a technique used by organizations to protect sensitive data from unintended exposure. Data masking is also sometimes referred to as data obfuscation. There are a number of masking techniques, including data substitution, data shuffling, translating numbers into number ranges, nulling or deletion, character scrambling, and more. Companies use data masking software to shield sensitive data such as personally identifiable information (PII) or sensitive customer information while maintaining the data’s functional value.
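As a rough illustration of the techniques listed above, the following Python sketch shows one minimal version of each. The function names, the replacement list, and the bucket width are illustrative assumptions, not part of any particular data masking product.

```python
import random

def substitute(value, replacements, rng):
    """Substitution: return a realistic stand-in (the real value is ignored)."""
    return rng.choice(replacements)

def shuffle_column(values, rng):
    """Shuffling: keep the same values but reassign them across records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def to_range(number, width=10):
    """Number ranges: report only the bucket that a whole number falls into."""
    low = (number // width) * width
    return f"{low}-{low + width - 1}"

def null_out(value):
    """Nulling: drop the value entirely."""
    return None

def scramble(text, rng):
    """Character scrambling: reorder the characters of the value."""
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)
```

For instance, `to_range(37)` returns "30-39", so an exact age or score is reduced to a range. Which technique fits depends on how much of the data's functional value the consuming application needs.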
Data masking software ensures that unauthorized people do not have visibility into real, sensitive data records by masking the data. Companies commonly utilize data masking to limit the sensitive data visible to their employees. This protects against both employee mistakes in handling sensitive data and malicious insider threat actors seeking to steal sensitive information.
For example, credit card numbers in a database can be redacted or replaced with false data in a billing software application so that the real numbers are not exposed and visible to frontline employees. A masked credit card number would be structurally similar and maintain the sixteen-digit credit card number format of "xxxx-xxxx-xxxx-xxxx" that the company’s billing software application expects this data to be in, while not providing the actual credit card number.
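A minimal Python sketch of this kind of redaction might look like the following. The function name and the optional `visible` parameter are assumptions made for illustration; by default every digit is replaced so the result matches the "xxxx-xxxx-xxxx-xxxx" format described above.

```python
def redact_card_number(card_number, visible=0):
    """Replace digits with 'x', keeping separators and overall length.

    `visible` (hypothetical option) leaves that many trailing digits readable.
    """
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_hide = max(total_digits - visible, 0)
    out = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            out.append("x")
            to_hide -= 1
        else:
            out.append(ch)  # separators and any visible digits pass through
    return "".join(out)
```

Calling `redact_card_number("4111-1111-1111-1111")` returns "xxxx-xxxx-xxxx-xxxx", and passing `visible=4` returns "xxxx-xxxx-xxxx-1111".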
A common use case for data masking is providing non-production but realistic data for software development and testing. Applications must be developed and tested using realistic data to ensure the software meets the company or customer’s needs, but providing real, sensitive data to a development team exposes it to people who have no need to view it. For example, if an educational software company is developing a solution to manage student testing data, the developers do not need specific individuals’ real names, addresses, test scores, academic grades, and so on to build the tool. Data that is based on real data but scrambled or obfuscated is sufficient to test the software. As long as the data is functionally correct, the software developers don’t need to know the precise, sensitive data to develop and test the software solution.
Data masking is most often used for non-production purposes like the software development and testing described above, but it can also be used in production environments to control which users have access to sensitive information. For example, employees in a call center may need to look up a customer’s account information in CRM software to process a payment but do not need access to the customer’s exact payment details, like bank account and routing numbers, to complete the transaction. The company must retain the actual bank account information to process the transaction, but this sensitive information does not need to be visible to the call center employee, so the company masks that data in the call center software application.
Other use cases for using masked data include:
- Sales demonstrations of software programs
- User training modules
- Sandbox experiments
What Types of Data Masking Software Exist?
Static data masking
Static data masking solutions allow sensitive datasets to be masked while the data is at rest. This usually entails creating a complete masked copy of the dataset. Most commonly, this is used for non-production use cases like providing datasets for software development and testing purposes.
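The idea of a separate masked copy can be sketched in Python as follows. This is a simplified illustration that assumes the dataset is a list of dictionaries; the `mask_dataset` function and the per-field `rules` mapping are hypothetical names, and real products work against databases rather than in-memory lists.

```python
import copy

def mask_dataset(records, rules):
    """Return a masked copy of `records`; the originals are left untouched.

    `rules` maps a field name to a function that masks that field's value.
    """
    masked = []
    for record in records:
        clone = copy.deepcopy(record)
        for field, rule in rules.items():
            if field in clone:
                clone[field] = rule(clone[field])
        masked.append(clone)
    return masked
```

The masked copy can then be handed to a development or testing team while the original dataset stays in its protected location.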
Dynamic data masking
Dynamic data masking solutions allow sensitive data to be masked while the data is in use, and the masking can be based on the attributes of the person viewing it. Most commonly, this is used for in-production use cases. For example, frontline employees or employees in a specific geographic area can be shown a version of the sensitive dataset that is masked in real time based on their role type. This software can be particularly beneficial for customer service use cases.
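A minimal sketch of role-based masking at read time, in Python, could look like this. The role names, field names, and the `view_record` function are assumptions chosen for illustration; the stored record is never changed, only the view returned to the caller.

```python
MASK = "****"

# Hypothetical policy: which fields each role may NOT see.
HIDDEN_FIELDS_BY_ROLE = {
    "call_center_agent": {"bank_account", "routing_number"},
    "billing_manager": set(),
}

def view_record(record, role):
    """Return the record as the given role is allowed to see it."""
    # Roles missing from the policy see every field masked.
    hidden = HIDDEN_FIELDS_BY_ROLE.get(role, set(record))
    return {
        field: (MASK if field in hidden else value)
        for field, value in record.items()
    }
```

Masking everything for unknown roles is a deliberately conservative default in this sketch; an actual deployment would follow whatever access policy the company has defined.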
What are the Common Features of Data Masking Software?
The following are some core features within data masking software that can help users achieve their business goals:
Performance with large datasets: Data masking software must be able to meet the scale and speed of masking large datasets, whether the masking is performed on the database level itself, between application layers, or within the application itself. This is especially important for masking enterprise data and big datasets.
Preserving data characteristics: Some applications expect data to be in a specific format, such as a 16-digit credit card number. For masked data to be usable in the application, it must also conform to these characteristics, such as length.
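One simple way to keep length and layout intact is to replace each digit with another digit and leave every other character alone. The Python sketch below assumes that approach; `mask_digits` is a hypothetical helper, and the random generator is passed in so that results can be reproduced in tests.

```python
import random

def mask_digits(value, rng):
    """Swap every digit for a random digit; keep separators and length."""
    return "".join(
        str(rng.randrange(10)) if ch.isdigit() else ch
        for ch in value
    )

# Usage: same length and same dash positions, different digits.
masked = mask_digits("4111-1111-1111-1111", random.Random(7))
```

Note that this only preserves the visible format. An application that also validates a checksum on the number would need a more careful generator than this sketch provides.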
Deterministic masking: Deterministic masking allows for the masked data to be consistently masked across multiple tables and applications. For example, if a data record has a first name of “Joan” that is masked as “Claire,” then “Claire” will appear consistently and uniformly across the masked dataset and the applications it is used in. This is especially important for in-production customer service use cases where company employees interact with multiple applications like CRMs and billing applications to assist customers. Having consistently matching masked data in those disparate applications can aid in providing the best customer assistance.
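One common way to get this consistency is to derive the masked value from a keyed hash of the real value, so the same input always yields the same output. The Python sketch below assumes that approach; the list of stand-in names, the `mask_name` function, and the secret key are all illustrative.

```python
import hashlib
import hmac

# Hypothetical pool of stand-in names.
FAKE_NAMES = ["Claire", "Dana", "Priya", "Marta", "Elena"]

def mask_name(real_name, secret_key):
    """Map a real name to a stand-in name, the same way every time."""
    digest = hmac.new(secret_key, real_name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]
```

As long as every application uses the same key, a given real name is masked identically in each of them. With a pool this small, different real names will sometimes map to the same stand-in, so a larger pool would be needed wherever masked values must stay distinct.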
Cloud-compatible data masking: Today, many companies are shifting from on-premises data stores to the cloud and are utilizing infrastructure as a service, platform as a service, and software as a service tools. Many data masking tools offer solutions to protect data regardless of where it is used.
What are the Benefits of Data Masking Software?
Reduce unintended data exposure: The main purpose of using data masking software is to protect the data from unintended exposure while maintaining the data’s usability. Data masking software obfuscates the data for audiences that are not authorized to view the data.
Improve access control to data: Data masking software enables companies to only expose data on a need-to-know basis. Using dynamic data masking, in particular, can assist a company with enabling role-based data visibility. So a frontline worker may not be able to see specific customer data like their billing address or phone number within a CRM application, but their manager would have the authorization to do so.
Meet data protection compliance regulations: Data protection regulations and data privacy laws require businesses to safeguard data such as personally identifiable information. Data masking is a technique used to limit unintended data exposure and meet data protection by design and default requirements. Data masking can assist in meeting industry or governmental regulations such as GDPR, PCI DSS, or HIPAA.
Who Uses Data Masking Software?
InfoSec and IT professionals: Information security (InfoSec) and IT professionals implement and manage data masking tools to achieve their company’s data security, data privacy, and data usage goals.
Software developers: Software developers are the end users of data masked using data masking software. Using masked data allows software developers to work with test data that is based on real data without the risk of exposing the real values in plaintext.
Frontline employees: Frontline and other employees use masked data in their day-to-day interactions in the business applications necessary to complete their work. Having masked data in their applications protects them from accidentally viewing, sharing, or using data they are not authorized to use.
What are the Alternatives to Data Masking Software?
Alternatives to data masking software can replace this type of software, either partially or completely:
Data de-identification and pseudonymity software: De-identification and pseudonymity software is similar to data masking software in that it focuses on anonymization by replacing real data with artificial data. However, the end states differ: data masking obfuscates the data while the original data is retained, whereas de-identified data has had its identifiers removed or pseudonymized to prevent re-identification.
Synthetic data software: Synthetic data software helps companies create artificial datasets, including images, text, and other data, from scratch using computer-generated imagery (CGI), generative adversarial networks (GANs), and heuristics. Synthetic data is most commonly used for testing and training machine learning models.
Encryption software: Encryption software protects data by converting plaintext into scrambled text known as ciphertext, which can only be decrypted using the appropriate key. Most commonly, encrypted data must first be decrypted before it can be used within applications (with some exceptions, such as homomorphic encryption techniques).
Software Related to Data Masking Software
Related solutions that can be used together with data masking software include:
Sensitive data discovery software: To determine what data to protect using data masking software, companies must first identify their sensitive data. Companies can use sensitive data discovery software to assist in and automate that process. These solutions search structured, unstructured, and semi-structured data stored in on-premises databases, cloud, email servers, websites, applications, etc.
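As a very small illustration of what automated discovery involves, the Python sketch below scans free text for two kinds of sensitive values using regular expressions. The patterns and the `find_sensitive` function are simplified assumptions; they will miss many real formats and are not a substitute for a dedicated discovery tool.

```python
import re

# Simplified, illustrative patterns only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}

def find_sensitive(text):
    """Return (label, matched_text) pairs for each pattern hit in `text`."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
```

Running this over exported notes or log files gives a first list of fields that may need masking rules.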
Challenges with Data Masking Software
Software solutions can come with their own set of challenges.
Sensitive data discovery: To protect data using data masking techniques, the data a company wants to protect must first be identified. The type of data that companies seek to mask can include personally identifiable information (PII), protected health information (PHI), payment card industry (PCI) data, intellectual property (IP), and other important business data. Often, this data is stored across multiple company systems, including databases, applications, and user endpoints.
Defining role-based access policies: Using dynamic masking that modifies what data is masked or visible based on a viewer’s role type requires those roles to be defined by company policy. Companies must invest in defining those roles for the data masking software to be effective.
Re-identification: A common concern with using masked data is the risk that it is re-identified using other context clues, resulting in a data breach. This could happen by combining the data with other datasets to re-identify it or simply by not masking enough data. For example, in a CRM system, if a customer’s first and last name is redacted, but not their email address, which often contains a person’s first and last name, it can be easy to infer who the customer is.
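A basic check for the email example above can be sketched in Python. The `email_reveals_name` function is a hypothetical helper that only catches the most direct case, where both names appear in the part of the address before the "@".

```python
def email_reveals_name(first_name, last_name, email):
    """Return True if an unmasked email address contains both names."""
    if not first_name or not last_name:
        return False
    local_part = email.split("@", 1)[0].lower()
    return first_name.lower() in local_part and last_name.lower() in local_part
```

For example, `email_reveals_name("Joan", "Rivera", "joan.rivera@example.com")` returns True, which suggests the email field should be masked along with the name fields. Checks like this do not address re-identification through combination with other datasets.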
How to Buy Data Masking Software
Requirements Gathering (RFI/RFP) for Data Masking Software
Users must determine their specific needs to prepare for data masking. They can answer the questions below to get a better understanding:
- What is the business purpose?
- Does the user need static data masking solutions, or do they need dynamic data masking solutions?
- What kind of data is the user trying to mask?
- Is it financial information, classified information, proprietary business information, personally identifiable information, or other sensitive data?
- Have they identified where those sensitive data stores are: on-premises or in the cloud?
- What specific software applications is that data used in?
- What APIs do they need?
- Who within the company should have the authorization to view sensitive data, and who should be served masked data?
Compare Data Masking Software Products
Create a long list
Buyers can visit g2.com’s Data Masking Software category, read reviews about data masking products, and determine which products fit their businesses’ specific needs. They can then create a list of products that match those needs.
Create a short list
After creating a long list, buyers can review their choices and eliminate some products to create a shorter, more precise list.
Conduct demos
Once a user has narrowed down their software search, they can connect with the vendor to view demonstrations of the software product and how it relates to their company’s specific use cases. They can ask about the masking methods offered, from substitution to shuffling and more, and about where the solution sits: at the database level, between the application and the database, or within the application. Buyers can also ask about integrations with their existing tech stack, licensing methods, and pricing, including whether fees are based on the number of projects, databases, executions, etc.
Selection of Data Masking Software
Choose a selection team
Buyers must determine which team is responsible for implementing and managing this software. Often, that is someone from the IT team and someone from the InfoSec team. They should also include end users on their selection team, such as software developers or frontline employees. It is important to have a representative from the finance team on the selection committee to ensure the license is within budget.
Negotiation
Buyers should get specific answers about the cost of the license and how it is priced, including whether pricing is based on database size, features, or executions. They must keep in mind the company’s data masking needs for today and the future.
Final decision
The final decision will come down to whether the software solution meets the technical requirements, as well as its usability, ease of implementation, available support, expected return on investment, and more. Ideally, the final decision will be made by the IT team in conjunction with InfoSec or data privacy teams, alongside input from other stakeholders like software development teams.