Petascale Storage Gets Real: Are You Ready?


With the introduction of new digital acquisition technologies (from video cameras to sensors) and increasingly sophisticated data analysis tools, the way we handle and save our data is changing. The true value of information evolves over time. For example, combining real-time and historical data can reveal unexpected results, and old footage pulled from the archive and digitized may capture a moment that once seemed insignificant. For businesses that rely on data to identify trends or to repurpose content for monetization, the result is a need to keep all of it forever.

The amount of data captured and shared between applications is staggering. In our personal lives it is not inconceivable for an individual to have a hundred thousand or more photos and thousands of home videos (my wife has 151K photos as of this week). While the world has seemingly embraced a streaming model, the desire to accumulate personal data stores has continued to grow as we find ourselves caught between changing technologies and our own habit of hoarding content. We move from one computing platform to the next and now use the “cloud” as a virtual closet to free up room on our desktops.

Within businesses the complexity is compounded, as different departments or organizations demand access to large volumes of data for day-to-day “analysis” and tasks. At many companies, data arrives from many sources and is filed individually, and there isn’t enough time in the day to manage it effectively: the rate of ingest is simply overwhelming. As a result, housekeeping responsibility is pushed back onto users, which makes the data hard to maintain and undermines standard operating policy. The cloud has now become a convenient “warehouse” rather than a closet, with larger gateways and internal private-cloud caches to handle the demand.

But what about the growing number of organizations with petabyte-sized, or so-called “petascale,” archives that are not comfortable storing all that information in the public cloud?

Traditional storage technology is not up to the task of managing data sets at petascale, in either cost or complexity. Relying on traditional RAID technology to maintain long-term archives of active data is not only highly inefficient but also unsustainable from a power and cooling standpoint. The high-speed, CPU-intensive design of tier-one disk simply cannot scale to several petabytes, let alone to the exabyte scale expected within just a few years.

Data usage patterns have also changed. Until recently, most data sat at rest, and users were satisfied storing it on tape. Today, entertainment and research organizations can ingest petabytes of information or content in a day. For example, a reality TV producer could easily ingest and maintain 16 TB of content per month for each show. These organizations want not only to keep all of their assets indefinitely but also to retrieve them quickly for repurposing or analysis. With single-namespace data sets outgrowing tape capacities and users demanding nearly instantaneous access, disk was at times the only viable option.

Next-generation object storage is the technology that powers the cloud for many service providers today. This technology, which has been used to make satellite communication transmissions in space reliable, is now used by commercial service providers for durable, efficient data protection in the cloud. Object storage saves data as objects with IDs rather than in traditional hierarchical file structures. This allows large volumes of data to be spread geographically yet remain accessible to users in any location at nearly tier-one storage performance. Object storage uses cloud-grade drives, which lower power requirements, along with highly efficient CPUs and memory.
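To make “objects with IDs” concrete, here is a minimal in-memory sketch. The class name FlatObjectStore, the use of a SHA-256 content hash as the object ID, and the metadata fields are all illustrative assumptions; real object stores differ in how they assign IDs, and many do not derive them from content.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FlatObjectStore:
    """Hypothetical toy object store: one flat namespace keyed by object ID."""

    # object ID -> (payload bytes, metadata dict); no directories, no hierarchy
    _objects: dict = field(default_factory=dict)

    def put(self, data: bytes, **metadata: str) -> str:
        # Assumption: the ID is derived from the content (SHA-256), so the
        # same bytes always map to the same ID wherever they are stored.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        # Lookup is by ID only; there is no path to walk.
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict:
        # Return a copy so callers cannot mutate stored metadata.
        return dict(self._objects[object_id][1])


if __name__ == "__main__":
    store = FlatObjectStore()
    oid = store.put(b"episode 12, camera B", show="example-show", season="3")
    print(oid[:12], store.metadata(oid))
```

Because clients address data only by ID, the system is free to place the bytes behind that ID on nodes at any site without changing how users find them, which is what lets data be spread geographically yet still be reached from anywhere.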

The new technology also offers more flexible durability settings, meaning that data stored in object storage can actually be more durable than data on a similar-sized RAID 6 array, even when that RAID 6 array is mirrored at another location. In addition, some object storage technologies can virtually eliminate the need for time-consuming and risky migration of data from one platform to another in order to maintain long-term archives.
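The durability claim can be illustrated with a back-of-the-envelope model. The article does not name the mechanism, so this sketch assumes the object store uses k-of-n erasure coding (codes such as Reed-Solomon, a family long used in space communications), in which any k of n fragments rebuild an object. Every parameter here is a hypothetical choice: a 10+2 RAID 6 group, a 10-of-18 erasure code, and a probability p = 0.01 that any single drive or fragment fails before a rebuild completes. It also assumes fully independent failures, which real systems do not guarantee.

```python
from math import comb


def loss_probability(n: int, tolerated: int, p: float) -> float:
    """P(more than `tolerated` of `n` independent pieces fail), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(tolerated + 1, n + 1))


def compare(p: float = 0.01) -> dict:
    # Hypothetical RAID 6 group: 10 data + 2 parity drives, survives any 2 failures.
    raid6 = loss_probability(12, 2, p)
    # Mirrored RAID 6: data is lost only if both sites lose their group
    # (assumes the two sites fail independently of each other).
    mirrored = raid6 ** 2
    # Hypothetical 10-of-18 erasure code: any 10 fragments rebuild the object,
    # so it survives any 8 fragment losses.
    erasure = loss_probability(18, 8, p)
    # Each entry: (loss probability, raw-to-usable capacity overhead)
    return {
        "raid6": (raid6, 12 / 10),
        "mirrored_raid6": (mirrored, 24 / 10),
        "erasure_10_of_18": (erasure, 18 / 10),
    }


if __name__ == "__main__":
    for name, (loss, overhead) in compare().items():
        print(f"{name:18s} loss={loss:.2e} overhead={overhead:.1f}x")
```

Under these assumptions the erasure-coded layout has a lower loss probability than mirrored RAID 6 while using less raw capacity (1.8x versus 2.4x); the “flexible durability” lies in choosing k and n to trade overhead against protection.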

Object storage offers substantial cost benefits for keeping large archives accessible and quick to retrieve. It can also play a role in a tiered archive strategy in which data moves between tiers according to user-defined policies, even at volumes that would otherwise seem unmanageable. With the right access technology serving as the on-ramp to object storage, some companies can take the next step toward a mostly cloud-based storage infrastructure, while others can build a private cloud infrastructure that offers similar benefits with greater control.
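A user-defined tiering policy can be as simple as an ordered list of rules. In this sketch the tier names (“primary,” “object-store,” “deep-archive”), the 30- and 365-day thresholds, and the ArchivedObject fields are hypothetical; the point is only that the policy is data the user controls, and a planner compares each object’s target tier with where it lives now.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class ArchivedObject:
    object_id: str
    size_bytes: int
    days_since_access: int


# A user-defined policy: ordered (predicate, tier) rules; the first match wins.
Rule = Tuple[Callable[[ArchivedObject], bool], str]

# Hypothetical thresholds and tier names, for illustration only.
POLICY: List[Rule] = [
    (lambda o: o.days_since_access <= 30, "primary"),
    (lambda o: o.days_since_access <= 365, "object-store"),
    (lambda o: True, "deep-archive"),  # catch-all rule
]


def assign_tier(obj: ArchivedObject, policy: List[Rule] = POLICY) -> str:
    for predicate, tier in policy:
        if predicate(obj):
            return tier
    raise ValueError("policy has no rule matching this object")


def plan_moves(
    objects: Iterable[ArchivedObject],
    current_tier: Dict[str, str],
    policy: List[Rule] = POLICY,
) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (object_id, from_tier, to_tier) for each object whose tier should change."""
    for obj in objects:
        target = assign_tier(obj, policy)
        source = current_tier.get(obj.object_id)  # None if not yet placed
        if source != target:
            yield obj.object_id, source, target
```

Keeping the rules separate from the planner means an administrator can change thresholds or add rules (for example, by size or project) without touching the code that moves data.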

The question of how cost-effective it is to store everything is a very hard one to answer, but it is one that organizations and consumers are facing. Today’s object storage makes it easy to archive petabytes of information and to store it far less expensively than traditional approaches allow. Like the self-storage facilities that have popped up in every U.S. city, public clouds and object-storage-based private clouds give you an easy way to save the data treasures you may need some day.