Effectively Managing Unstructured Data
By P. Ramsundar
Mumbai, Jan 21, 2008
An overwhelming majority of organizations surveyed in a recent EMC research study admitted they were not classifying their unstructured documents - a critical first step toward effective information management.
A recent IDC report - The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010 - says that information will grow at a CAGR of 57% between 2006 and 2010 to reach 988 exabytes*. Over 95% of this digital information consists of unstructured data, of which 80% is contributed by organizations. The report adds that organizations - including businesses of all sizes, agencies, governments, and associations - will be responsible for the security, privacy, reliability, and compliance of at least 85% of this information.
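The growth figures quoted above can be checked with a short compound-growth calculation. The sketch below uses only the 57% CAGR, the 2006-2010 span, and the 988-exabyte endpoint from the report; the 2006 starting volume is not stated in this article, so the value computed here is derived from those figures rather than quoted.

```python
# Figures quoted from the IDC report cited above.
FINAL_EXABYTES = 988.0   # projected volume in 2010
CAGR = 0.57              # compound annual growth rate
YEARS = 2010 - 2006      # four compounding periods


def growth_multiple(rate: float, years: int) -> float:
    """Total growth factor after compounding `rate` for `years` periods."""
    return (1.0 + rate) ** years


def implied_start(final: float, rate: float, years: int) -> float:
    """Starting value implied by a final value and a CAGR (derived, not quoted)."""
    return final / growth_multiple(rate, years)


multiple = growth_multiple(CAGR, YEARS)
start_2006 = implied_start(FINAL_EXABYTES, CAGR, YEARS)

print(f"Growth multiple over {YEARS} years: {multiple:.2f}x")
print(f"Implied 2006 volume: {start_2006:.0f} exabytes")
```

In other words, a 57% CAGR sustained for four years amounts to roughly a six-fold increase in total volume.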
As unstructured information is increasingly integrated with legacy applications within service-oriented architectures (SOA), it is imperative that companies begin addressing ways to better manage all information assets in an integrated, holistic way. The industry is witnessing the arrival of a wide array of tools for information classification and management that will enable companies to tackle this challenge head-on.
Unstructured file types - defined as anything not stored in a database - such as spreadsheet analyses, Word files, PDFs, presentations, and audio and video files are becoming increasingly important. What makes the situation even more challenging is that such forms of information have become an integral part of an organization's key day-to-day operations.
Technologies such as data movement and automated policy execution software, which have been used successfully on structured data, cannot function on unstructured documents without effective classification of information, and most organizations lack exactly that. Until now, organizations have mostly classified information manually, which produced simplistic, static classifications that quickly became outdated as the documents aged.
The only way to deal with this challenge is to deploy an effective information management system. Information Management (IM) is a holistic approach to managing both structured and unstructured data. It brings together previously independent efforts to manage each, such as RDBMS, ECM, enterprise search, and enterprise portals.
Ironically, because they were implemented separately, these existing technologies have actually raised the barriers between structured and unstructured information and made the integration of all data types within business processes all the more problematic.
What IM Strives to Achieve
IM primarily aims to find the best ways to leverage the value of the organization's combined information assets. Integrating structured and unstructured data requires an up-to-date inventory of all data. Since companies usually have a reasonable inventory of structured data, the first step is to identify the unstructured documents that exist. IT, along with the lines of business, must determine where these documents are stored, who owns them, who uses them, which business processes require them, and the scope of their content. Then the team must assess any related information policies that may already be in place, asking whether these policies support the requirements and which policies need to be created and implemented. By judging the gaps between requirements and the information currently available, the team creates a new set of business-based information policies that are used to classify all existing unstructured information assets.
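The inventory questions above (where a document is stored, who owns it, who uses it, which processes need it, and what it covers) map naturally onto a simple record. The following is a minimal sketch only: the `DocumentRecord` fields, the `policy_gaps` helper, and the sample paths and policy names are all hypothetical illustrations, not part of any product or method named in this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentRecord:
    """One inventory entry for an unstructured document (hypothetical schema)."""
    path: str                                  # where the document is stored
    owner: str                                 # who owns it
    users: List[str] = field(default_factory=list)               # who uses it
    business_processes: List[str] = field(default_factory=list)  # which processes require it
    content_scope: str = ""                    # short description of its content
    policies: List[str] = field(default_factory=list)            # policies already covering it


def policy_gaps(inventory: List[DocumentRecord]) -> List[DocumentRecord]:
    """Return documents that no existing information policy covers."""
    return [doc for doc in inventory if not doc.policies]


# Illustrative data only; the paths and policy names are made up.
inventory = [
    DocumentRecord(path="/shares/finance/q4-forecast.xls", owner="finance",
                   business_processes=["budgeting"], policies=["retain-7-years"]),
    DocumentRecord(path="/shares/marketing/launch-deck.ppt", owner="marketing",
                   business_processes=["product-launch"]),
]

for doc in policy_gaps(inventory):
    print(f"No policy covers: {doc.path} (owner: {doc.owner})")
```

Listing the uncovered documents is one way to make the gap between requirements and existing policies concrete before new policies are written.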
Information Classification and Management Tools
An innovative array of tools, called Information Classification and Management (ICM), has generated significant interest in this area. These tools uniquely facilitate the integration of information that a service-oriented architecture requires. Instead of forcing a move of the documents to a consolidated storage platform, the tools provide all of the information needed to fully identify each document where it currently exists. Applications running within an SOA can use this information to access, process and distribute the documents as needed, and in compliance with corporate policies.
These tools catalog the attributes and actual content of files as well as their service level agreement (SLA) requirements. Thus, companies are able to keep documents classified properly as new data is created and existing documents change and age. Careful application of automation to the discovery and classification process helps ensure that ongoing change - documents created, updated, copied, deleted, and so on - is accounted for and that the relationships between the documents and the infrastructure are kept current.
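To make the idea of cataloging file attributes and content concrete, here is a minimal sketch using only the Python standard library. It is not how any particular ICM product works; the `catalog` function and its record fields are assumptions chosen for illustration, and a content hash stands in for real content analysis.

```python
import hashlib
import os
from datetime import datetime, timezone


def catalog(root: str) -> list:
    """Walk `root` and record basic attributes plus a content fingerprint per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            # Reads the whole file into memory; acceptable for a sketch,
            # but large files would need chunked reads.
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "sha256": digest,
            })
    return records
```

Re-running such a scan on a schedule and comparing the results with the previous run is one simple way to notice documents that were created, changed, or deleted in between.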
Further, as these tools can assess both file attributes and actual content, organizations are able to orchestrate the actions, such as moving documents to a secure storage platform, that keep unstructured documents in compliance with corporate information policies with minimal manual effort and greater accuracy.
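Policy-driven actions of this kind can be pictured as rules evaluated against each document's attributes and content. The sketch below is illustrative only: the `Document` and `Policy` types, the two sample rules, and the action names are hypothetical, and the function merely plans actions rather than moving any files.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    """Attributes and extracted text for one document (hypothetical schema)."""
    path: str
    extension: str
    age_days: int
    text: str


@dataclass
class Policy:
    """A named rule: if `applies(doc)` is true, `action` should be taken."""
    name: str
    applies: Callable[[Document], bool]
    action: str


# Example rules only; real policies would come from the business.
POLICIES = [
    Policy("confidential-content",
           lambda d: "confidential" in d.text.lower(),
           "move_to_secure_storage"),
    Policy("stale-presentations",
           lambda d: d.extension == ".ppt" and d.age_days > 365,
           "move_to_archive_tier"),
]


def plan_actions(doc: Document, policies: List[Policy] = POLICIES) -> List[str]:
    """Return the actions whose policies match this document, in policy order."""
    return [p.action for p in policies if p.applies(doc)]


memo = Document("/shares/hr/memo.doc", ".doc", 10, "Confidential: salary review")
print(plan_actions(memo))
```

Keeping the rules as data, separate from the code that evaluates them, means a policy can be changed without touching the classification logic.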
Making IM Successful
There is also a host of legal, organizational, and political challenges that organizations must consider to make their IM efforts successful. Most importantly, the lines of business should be closely involved in the classification process, since business process requirements must define the relationships between unstructured documents, structured records, and applications. Any information policies that come out of the classification process should be linked directly to the business model and support business process requirements.
Another key factor is having access to outside resources with experience utilizing the methodologies needed to map existing information assets and policies to business-based requirements and identify the gaps. This experience gives the project team a politically neutral perspective that helps companies navigate the IM planning and implementation process effectively.
What IM can accomplish
Implemented efficiently, information management can provide organizations with significant benefits. These include significantly lower costs from better asset utilization; information that serves as a competitive asset to the business; lower risk from better security and availability of critical unstructured documents; and compliance with regulatory requirements such as SOX, HIPAA, and Basel II.
Moreover, companies will use storage resources much more efficiently, because IM makes it easier to classify data, automate policy-based actions, and meet SLAs while balancing infrastructure-related costs and service delivery against the documents' value to the lines of business. IM also greatly reduces the storage volumes required by using techniques such as data de-duplication to remove redundant documents and eliminate outdated versions where appropriate.
Because IM complements applications deployed in an SOA, line-of-business users should see substantial productivity improvements as unstructured documents are delivered to them as needed, in the context of the applications they already use. As a result, users make more timely, accurate, and effective decisions.
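As an illustration of file-level de-duplication, the sketch below groups files by a hash of their contents, so byte-identical copies end up in the same group. This is one simple approach chosen for illustration, not necessarily what commercial de-duplication products do (many work on sub-file blocks), and the `find_duplicates` function name is hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicates(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group files by SHA-256 of their bytes; keep only groups with 2+ files."""
    by_digest = defaultdict(list)
    for path in paths:
        # Whole-file read for brevity; large files would need chunked hashing.
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        by_digest[digest].append(path)
    return {digest: group for digest, group in by_digest.items() if len(group) > 1}
```

Each returned group is a set of identical copies; which one to keep, and whether the rest can safely be removed, remains a policy decision rather than a technical one.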
Implementing IM in phases
A well-crafted IM approach proceeds in phases, starting with the highest-priority information. The project team or sponsoring executive should decide up front whether to focus on cost improvement or risk reduction. This initial phase can also serve as a proof of concept that paves the way for subsequent phases of the IM effort.
A skilled and experienced project team should conceive methodologies that will help optimize an information infrastructure. Project resources should also have practical experience helping other organizations utilize the latest automated software technologies such as discovery and classification tools, policy management software, and virtualization technologies.
Outside experience with IM will help organizations avoid the mistakes and overcome the challenges that held back earlier efforts. Experience with compliance, continuity, and security issues is important as these continue to grow in prominence and significance. Such experience helps ensure that the resulting IM tools adequately meet the information needs of the business.
*An exabyte is 10 to the eighteenth power (10^18) bytes.