10 best practices for securing data in Hadoop

by CXOtoday News Desk    Apr 05, 2013

The explosion in information technology tools and capabilities has enabled advanced analytics using Big Data. However, the benefits of this new technology area are often coupled with data privacy issues. These large information repositories may contain personally identifiable information (PII), such as names, addresses and social security numbers. Financial data such as credit card and account numbers might also be found in large volumes across these environments and pose serious concerns related to access. Through careful planning, testing, pre-production preparation and the appropriate use of technology, many of these concerns can be alleviated.

Dataguise, a provider of data security intelligence and protection solutions, today released ten security best practices for organizations considering or implementing Hadoop. By following these procedures to manage privacy risk, data management and security professionals can prevent costly exposure of sensitive data, reduce their risk profile and better adhere to compliance mandates. With Hadoop security deployments among the Fortune 200, Dataguise has developed these practices and procedures from significant experience in securing these large and diverse environments.

1. Start Early! Determine the data privacy protection strategy during the planning phase of a deployment, preferably before moving any data into Hadoop. This will prevent the possibility of damaging compliance exposure for the company and avoid unpredictability in the roll out schedule.

2. Identify what data elements are defined as sensitive within your organization. Consider company privacy policies, pertinent industry regulations and governmental regulations.

3. Discover whether sensitive data is already embedded in the environment, has been assembled in Hadoop, or will be assembled there.

4. Determine the compliance exposure risk based on the information collected.

5. Determine whether business analytic needs require access to real data or if desensitized data can be used. Then, choose the right remediation technique (masking or encryption). If in doubt, remember that masking provides the most secure remediation while encryption provides the most flexibility, should future needs evolve.

6. Ensure the data protection solutions under consideration support both masking and encryption remediation techniques, especially if the goal is to keep both masked and unmasked versions of sensitive data in separate Hadoop directories.

7. Ensure the data protection technology used implements consistent masking across all data files to preserve the accuracy of data analysis across every data aggregation dimension.

8. Determine whether tailored protection for specific data sets is required and consider dividing Hadoop directories into smaller groups where security can be managed as a unit.

9. Ensure the selected encryption solution interoperates with the company’s access control technology and that both allow users with different credentials to have the appropriate, selective access to data in the Hadoop cluster.

10. Ensure that when encryption is required, the proper technology (Java, Pig, etc.) is deployed to allow for seamless decryption and ensure expedited access to data.
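
To illustrate the consistent masking described in practice 7, the following is a minimal Python sketch, not Dataguise's method. It assumes a keyed hash (HMAC-SHA256) as the masking function; the key, the `mask_value` helper and the sample records are all hypothetical. Because the same input always maps to the same token, two separately masked files can still be joined and aggregated without exposing the original identifiers.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real deployment would
# obtain this from a key management system.
MASKING_KEY = b"example-only-key"

def mask_value(value: str, key: bytes = MASKING_KEY, length: int = 16) -> str:
    """Return a deterministic, keyed token for a sensitive value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

# Two hypothetical data files that share a sensitive identifier.
orders = [("123-45-6789", 120.0), ("987-65-4321", 75.5), ("123-45-6789", 30.0)]
profiles = [("123-45-6789", "CA"), ("987-65-4321", "NY")]

# Mask each file independently with the same function and key.
masked_orders = [(mask_value(ssn), amount) for ssn, amount in orders]
masked_profiles = {mask_value(ssn): state for ssn, state in profiles}

# Aggregate spend by state using only the masked identifiers.
spend_by_state = {}
for token, amount in masked_orders:
    state = masked_profiles[token]
    spend_by_state[state] = spend_by_state.get(state, 0.0) + amount

print(spend_by_state)
```

If each file were masked with a different key or a random substitution, the join on the masked identifier would fail and the aggregate would be wrong, which is the accuracy concern practice 7 raises.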

By starting early and establishing processes that define sensitive data, detect that data in the Hadoop environment, analyse the risk exposure and assign the proper data protection using either masking or encryption, enterprises can remain confident their data is protected from unauthorized access. In following these guidelines, data management, security and compliance officers cognizant of the sensitive information in Hadoop can not only lower exposure risks, but increase performance for a greater return on Big Data initiatives.
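
The detection step in this process can be sketched in Python as a simple pattern scan. This is an illustrative assumption about how discovery might work, not a description of any vendor's product: the patterns, the `scan_lines` helper and the sample records are hypothetical, and a real Hadoop deployment would run comparable logic as a distributed job over the files in the cluster.

```python
import re

# Simplified, hypothetical patterns for two kinds of sensitive data.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_valid(number: str) -> bool:
    """Apply the Luhn checksum to filter out digit runs that are not card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_lines(lines):
    """Return (line_number, kind) pairs for each suspected sensitive value."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for _ in SSN_PATTERN.finditer(line):
            findings.append((line_no, "ssn"))
        for match in CARD_PATTERN.finditer(line):
            if luhn_valid(match.group()):
                findings.append((line_no, "card"))
    return findings

# Hypothetical sample records.
sample = [
    "id=1 name=Alice ssn=123-45-6789",
    "id=2 note=order shipped",
    "id=3 card=4111 1111 1111 1111",
    "id=4 ref=1234 5678 9012 3456",
]

findings = scan_lines(sample)
print(findings)
```

Counting findings per file or directory gives a rough input to the risk analysis step, and the checksum filter shows why pattern matching alone tends to overreport.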

“Enforcing security and compliance in Hadoop is not a simple matter and requires the right combination of people, processes and technology. The best practices presented here illuminate the important procedures required to maintain data privacy of sensitive data stored in Hadoop. As indicated above, it is critical that organizations place priority on protecting the data first to provide a strong line of defense against unlawful exposures before moving forward,” said Manmeet Singh, CEO, Dataguise. “With significant experience in securing Fortune 200 environments, we encourage practitioners to consult with experts when data exposure and non-compliance is not an option. This is the value beyond software provided by Dataguise.”
