ClearSky Data

One of the big draws of any cloud-based storage is expandability: you aren’t responsible for maintaining drives, adding shelves, or even for the cooling and caring of the arrays. The biggest downside is typically speed. Moving data to the cloud usually means that what used to take less than a millisecond to access can now take significantly longer. ClearSky Data saw this challenge and wanted to provide a solution. We went for a bit of a deep dive into their storage offering at Tech Field Day 14 in order to get a better understanding of how their platform works.


I don’t want to rehash what I have previously written, but as a quick overview, ClearSky Data is able to provide very low latency access to storage over their network. This is accomplished by using leased lines that connect directly to their Points of Presence (POP). At the edge locations (e.g. your datacenters), Edge appliances are deployed that contain a read-only cache. Because the cache is read-only, in the event of a catastrophic failure, you won’t be down and out. Reads will be slower since they will now be going over the private WAN, but no data will be lost. A new Edge appliance will be installed, the cache will rebuild, and you’ll be back up and running.

The service life of an Edge device is about three years. However, ClearSky Data took the time to ensure that all of their software has some sort of monitoring in it. If your flash drives wear out sooner than foreseen, they’ll receive an alert and will swap out the device proactively. So where did the “3-year” estimate come from? From the leased lines. Because the line speed is fixed, you can do the math to find the theoretical maximum amount of data that you can push per hour, day, or year. Taking that number and applying it against the drives’ endurance yields the expected life.
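To make that arithmetic concrete, here is a minimal Python sketch. Every figure in it is a hypothetical placeholder, not a ClearSky Data specification: the 1 Gbps line speed, the 12,000 TB write endurance, and the function names are all assumptions for illustration. It also assumes that every byte crossing the line is written to flash exactly once, ignoring write amplification and any traffic that never touches the cache.

```python
# All figures below are illustrative assumptions, not ClearSky Data specifications.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def max_bytes_per_year(link_gbps: float) -> float:
    """Upper bound on the bytes a link of the given speed can carry in one year."""
    bytes_per_second = link_gbps * 1e9 / 8  # gigabits per second -> bytes per second
    return bytes_per_second * SECONDS_PER_YEAR


def expected_life_years(link_gbps: float, endurance_tb_written: float) -> float:
    """Years until the rated write endurance is used up at full line rate."""
    endurance_bytes = endurance_tb_written * 1e12  # terabytes -> bytes
    return endurance_bytes / max_bytes_per_year(link_gbps)


if __name__ == "__main__":
    # Hypothetical inputs: a 1 Gbps leased line and 12,000 TB of total
    # rated write endurance across the appliance's flash.
    years = expected_life_years(link_gbps=1.0, endurance_tb_written=12_000)
    print(f"Worst-case flash life: {years:.1f} years")
```

Since a line rarely runs saturated around the clock, a figure computed this way would be a floor rather than a forecast, which fits with the monitoring catching the cases where drives wear out early.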

The Edge devices themselves are highly-redundant (power supplies, backplanes, etc.) and are essentially a cluster-in-a-box solution. What if you hit the point where you need another Edge appliance? You can simply drop one in place and it will normalize with the existing one. Very hands off, which is exactly what you need from a managed solution such as this.


ClearSky Recovery

A glimpse of the vCenter plugin from ClearSky Data

One question that should pop into any IT professional’s mind is “how do I back up that data?” Backing up over the network is an option, but that can saturate the network with extra traffic and hurt performance. ClearSky Data recently announced a new backup offering, and we were fortunate enough to get an overview.

The solution is currently for VMware environments only and it offers integration into vCenter via a plugin. Once running, volumes will be protected by default, but a user can override that if desired. Because this is all happening on the back-end, there are no file-level protection points. In the event that you need to perform a single file recovery, you’ll need to restore a VM (to a new location), fire it up, and copy the file out of the VM. A bit of a burden, but at least the data is still recoverable.

An interesting point that was brought up was the ability to have a DR site use a different POP. Because the local POP flushes its data out to the cloud every 10 minutes, a copy can be pulled down at another POP. It will require some legwork as far as the networking, etc., and it is not a feature that is available yet, but it is possible. Having a low-maintenance storage solution that can give me a 10-minute RPO for a DR site is noteworthy. Sure, this can be done with lots of other solutions, but keep in mind that ClearSky Data’s solution would be very hands off for the customer – it’s all a managed service.


Something that stood out to me after the presentation: we never went into specifics about the storage hardware. Yes, we touched on the Edge appliance, but even then we didn’t really talk about drives, RAID configurations, etc. Consider this: ClearSky Data is a storage as a service company. They have turned storage into a true commodity by making us not care what is under the hood. Yes, we may see similar abstractions from AWS, Azure, or Google Cloud, but we don’t get this visibility. As long as we get the throughput that we need, and the data is safe, does it matter what hardware is running storage?

