Leveraging Cold Cloud Storage for Unstructured Data Archives

By Chris Magyar, CIO, Seven10 Storage Software


Organizations are accumulating large and ever-growing repositories of unstructured data. Business and regulatory requirements mandate that such data remain accessible and protected from damage, loss or change (compliance). Storage replication and archiving strategies have replaced traditional primary copy + backup solutions as the norm. Too often, this data remains at risk from negligence, malicious action or natural disaster. Fortunately, there is a new class of ‘cold cloud storage’ offered by major cloud storage vendors that can be used to maintain an offsite, online and secure copy of such data at an aggressive price point of $0.01/GB/month.

"Storage replication and archiving strategies have replaced traditional primary copy + backup solutions as the norm"

Within healthcare, “Covered Entities” and “Business Associates” are subject to severe non-compliance penalties for failing to conform to the Backup and DR requirements specified in the HIPAA HITECH Security Rule. The author has encountered many entities, within and outside healthcare, whose Backup/DR strategies are sub-optimal at best and non-conformant at worst. With reimbursements falling and budgets fixed, storage infrastructure is kept in service longer and the risk of data loss increases. Such organizations are in dire need of a solution that will leverage their existing storage investments AND provide a graceful mechanism to adopt advances in storage technology.

Other industries have comparable requirements for compliance and data protection. In some environments there is a specific requirement to maintain a copy on a dissimilar storage technology – explicitly ruling out a replication-only approach between similar devices.

Fortunately, a compelling new class of low-cost storage has emerged: ‘cold’ cloud object storage from Google and Amazon. Both offerings are based on the S3 RESTful object interface and are branded as Google Cloud Storage Nearline and Amazon S3 Standard-Infrequent Access. This storage is priced like Amazon’s Glacier at $0.01/GB/month, but is NOT hamstrung by glacial 3-5 hour retrieval times.
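Because these tiers speak the S3 object interface, selecting the cold class is typically a one-parameter change on each write. A minimal sketch in Python (assuming boto3's `put_object`, which accepts a `StorageClass` argument; the bucket and key names here are hypothetical, and Google's S3-compatible endpoint uses its own class names, so check vendor documentation):

```python
def cold_put_params(bucket, key, storage_class="STANDARD_IA"):
    """Build the keyword arguments for an S3 put_object call that lands
    the object in a cold storage class. "STANDARD_IA" is S3
    Standard-Infrequent Access; the allowed set below is a sketch, not
    an exhaustive list of vendor storage classes."""
    allowed = {"STANDARD", "STANDARD_IA", "GLACIER"}
    if storage_class not in allowed:
        raise ValueError("unsupported storage class: %s" % storage_class)
    return {"Bucket": bucket, "Key": key, "StorageClass": storage_class}

# With boto3 (not imported here), the actual write would be roughly:
#   s3 = boto3.client("s3")
#   s3.put_object(Body=data, **cold_put_params("dr-archive", "copy2/file.dcm"))
params = cold_put_params("dr-archive", "copy2/file.dcm")
```

The point is that no application change beyond the storage-class parameter is required on the write path; the hard part, as discussed below, is getting existing data into the tier.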

On quick review (contact vendors for exact pricing and SLAs), cold cloud storage appears ideally suited for Disaster Recovery and perhaps even deep archival storage that is infrequently accessed. For “copy 2” or “copy 3” data that is written once and retrieved rarely (and, hopefully, never), 100 TB could be stored offsite for a mere $12-15K/year. Retrievals are priced at roughly $0.01/GB retrieved. In the event of a disaster, a one-time retrieval of all 100 TB would cost $12,000 or less, a bargain relative to other post-disaster recovery costs.
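The arithmetic above is easy to check. A quick back-of-envelope sketch (rates are the quoted list prices; egress bandwidth charges, per-request fees and vendor SLA differences are not modeled, so confirm exact pricing with the vendor):

```python
STORE_RATE = 0.01     # $/GB/month, quoted cold-tier storage price
RETRIEVE_RATE = 0.01  # $/GB retrieved (network egress billed separately)

def annual_storage_cost(tb):
    """Yearly cost to keep `tb` terabytes parked in the cold tier."""
    return tb * 1024 * STORE_RATE * 12

def one_time_retrieval_cost(tb):
    """Retrieval fee alone for pulling `tb` terabytes back; a full
    post-disaster bill also includes egress bandwidth, not modeled here."""
    return tb * 1024 * RETRIEVE_RATE

print(annual_storage_cost(100))      # 100 TB offsite: 12288.0 ($/year)
print(one_time_retrieval_cost(100))  # retrieval fee component: 1024.0 ($)
```

The storage figure lands squarely in the quoted $12-15K/year range; the retrieval fee itself is small, with bandwidth making up most of the one-time recovery cost.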

For Healthcare, both Google and Amazon are willing to sign the requisite Business Associate Agreements required under HIPAA HITECH Act.

But here is the critical question: how does one transparently add a tier of cold cloud storage to an existing environment that is using a large unstructured data archive to run the business?

The bulk of unstructured data is acquired and stored by business applications such as medical imaging, document management systems, document imaging systems, call recording applications, etc. Most existing applications use a “Least Common Denominator” file level protocol such as SMB or NFS to provide compatibility with the largest number of storage devices. Such applications rarely provide robust storage management features. Instead, they rely on an implementation of one or more technologies to provide storage management services such as replication, backup, DR, multi-tiering, data migration, device failover, encryption, file-to-cloud gateways, etc.

A common approach is to rely solely on proprietary storage replication features for data protection between two similar storage devices. An obvious concern is having both copies in the same data center or geographic disaster zone. Another concern is that accidental or malicious changes to the primary device will be faithfully replicated to the secondary. Snapshots may improve recovery from detected one-time events but are far less useful when undetected damage occurs gradually over time. Fail-over/fail-back between primary and secondary replicas requires manual intervention, as will recovery of lost/damaged data from a surviving replica.

Application vendors and organizations may also rely on storage virtualization gateways, hierarchical storage managers, cloud storage gateways and the like. These products provide storage management capabilities that the application or storage vendors are unable to provide. Such technologies can be used to manage multiple tiers of existing, similar or dissimilar storage and may even be able to migrate data between existing and new tiers of storage (e.g. a new tier of cold cloud storage, replacement storage technologies, etc.).

It may be helpful to examine a diagram depicting various strategies for moving from an application storing to an existing [often replicated] NAS device alone, to storing to a combination of that existing NAS device PLUS a tier of cold cloud storage:

From left to right:

1) Consider a typical archival application that only supports a single tier of NAS storage. Adding a DR tier of cold cloud storage isn’t possible without help.
2) Consider an archival application that supports multiple tiers of NAS and S3 cloud storage and a mechanism for duplicating existing NAS storage to a cloud tier. Such a solution would be simple, but unfortunately, such applications are rare as hen’s teeth, and application vendors are slow to add such features.
3) Consider an archival application that supports multiple tiers of NAS storage. This is somewhat more common, and the app ‘only’ needs help from a minimal single-tier NAS-S3 Cloud Gateway plus a NAS-NAS data migration to onboard existing data.
4) Finally, consider an archival application that supports a single tier of NAS storage, used in conjunction with a multi-tiered storage gateway that can manage the existing NAS storage plus a tier of S3 cloud storage, and that includes the ability to migrate data from tier to tier as organizational requirements evolve.
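Whichever of the four paths is taken, the core onboarding job is the same: enumerate the existing NAS archive and duplicate anything not yet present in the cloud tier. A minimal, hypothetical sketch of that decision logic (the upload step itself, elided here, would be a per-file S3 PUT performed by the gateway or migration tool):

```python
import os

def files_needing_copy(nas_root, cloud_index):
    """Walk the NAS archive and return relative paths not yet present in
    `cloud_index`, a set of object keys already stored in the cold cloud
    tier. Archive data is write-once, so key presence is a reasonable
    proxy for 'already duplicated' in this sketch."""
    todo = []
    for dirpath, _dirs, filenames in os.walk(nas_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), nas_root)
            key = rel.replace(os.sep, "/")  # S3 keys use forward slashes
            if key not in cloud_index:
                todo.append(key)
    return sorted(todo)
```

A gateway product would run this kind of scan continuously; for a one-time NAS-to-cloud onboarding, each returned key would be uploaded and then verified (e.g. by checksum) before being marked complete.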

Regardless of the means used to achieve the end, organizations managing large unstructured data archives no longer have an excuse for sub-optimal or non-compliant DR plans. It is often not a matter of whether, but when the unexpected happens. A failure to prepare can be the same as preparing to fail.

Fortunately, a new generation of offsite, online, available, durable and cost-compelling cold cloud storage has emerged from Google and Amazon to fill the gap. As storage technologies continue to evolve, we can safely assume that additional vendors will come to market with similar or even more compelling offerings.