On-prem Cloud Storage with Igneous Systems

November 21, 2016 | Matt | Hardware, IT, storage, TFD, vdm30in30 | 5 min read

Igneous Systems recently presented at Tech Field Day 12, and honestly, I was looking forward to learning more about them. All I really knew going into the presentation was that they provided ‘cloud storage, on premises’. Once we started digging into the details, things definitely got interesting, along with some mind-bending moments while trying to digest what was presented.

A Quick Overview

Igneous’ stance is that data is growing at a tremendous rate. Things such as IoT sensors, media (4K video, for example), and raw data (think bio-medical or mineral data) chew up lots of space. In fact, one of the delegates I was talking to mentioned that he goes through petabytes a year! So the problem is very real, even though it may not be common.

What Igneous delivers is fully managed ‘chunks’ of storage that come in increments of 212 TB. They take care of the bulk of the configuration, monitoring, and maintenance. On the customer side, they require some network ports and outbound HTTPS access (for their remote management). The storage is accessible via the S3 protocol, which brings cloud-style storage scalability to your on-premises environment.

Patching and Updates

I already knew most of the above prior to going into Tech Field Day 12; however, there was still a lot to be revealed. Touching on the management piece, Igneous automatically rolls out updates. Their solutions typically consist of 62 components, which get updated a handful at a time. The system is designed to work with components at different patch levels, plus or minus a couple of versions, which is how they are able to roll out updates in stages. The fact that systems are automatically upgraded might sound alarming to some folks. I definitely understand that, but I also understand why Igneous took this approach, particularly from a support standpoint (e.g. minimizing the number of supported versions, and hopefully the number of issues).
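To make the staged rollout idea a bit more concrete, here is a minimal Python sketch. It is not Igneous’ actual update mechanism; the integer version numbers, the batch size of 5 (my stand-in for ‘a handful’), and the maximum skew of 2 (my stand-in for ‘a couple of versions’) are all assumptions for illustration.

```python
def staged_rollout(versions, target, batch_size=5, max_skew=2):
    """Upgrade components one batch at a time.

    Hypothetical model: each component has an integer version, and the
    fleet is only considered healthy while the newest and oldest
    versions differ by at most max_skew.
    Yields a snapshot of all versions after each batch.
    """
    versions = list(versions)
    # Indices of components that still need the update.
    pending = [i for i, v in enumerate(versions) if v < target]
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        for i in batch:
            versions[i] = target
        skew = max(versions) - min(versions)
        if skew > max_skew:
            raise RuntimeError(f"version skew {skew} exceeds allowed {max_skew}")
        yield list(versions)


if __name__ == "__main__":
    # 62 components (the number from the presentation), all on an assumed version 10.
    fleet = [10] * 62
    for stage, snapshot in enumerate(staged_rollout(fleet, target=11), start=1):
        print(f"stage {stage}: {snapshot.count(11)} of {len(snapshot)} components updated")
```

The point of the skew check is that a mixed-version fleet is a normal state during the rollout, as long as the spread stays within what the system tolerates.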

Fault Domains

This is where things got really interesting (in fact, my notes tapered off here as I was trying to digest what we just covered). Igneous Systems has designed a system architecture which they call RatioPerfect. The trouble with having large amounts of storage (and infrastructure) is a) identifying and alleviating bottlenecks and b) limiting fault domains.

For the hardware driving the solution, a Cisco UCS 3260 is used. The traditional method to address all of the storage in a system like this would involve some sort of SATA expander, or possibly a couple of SAS cards. The trouble with these designs is that if a card dies for whatever reason, the traffic going through the remaining card(s) will spike significantly. This can lead to performance issues, and potentially data loss. Even if the cards are working fine, you still have a potential bottleneck with one card managing many drives.

Igneous started looking at solutions. It began with the realization that losing a whole server or disk shelf was not acceptable. Next, they looked at grouping drives in fours; however, with 16 drives in a 1U server, the risk of losing 25% of their storage and performance at once was still too high. They focused on the problem and noticed that they were quickly moving to a 1 CPU : 1 Drive ratio, but doing that with traditional compute just isn’t feasible from a cost perspective.

So, they did the next best thing: throw traditional compute out the window and bring in a custom ARM chip for each drive! The ARM chip sits on a board which plugs into the drive, very similar to something like a SAS-to-SATA interposer. From there, the drives are plugged into an off-the-shelf JBOD array which has dual Ethernet switches with 10 GbE uplinks in each box. But wait, what are the switches for? Each of those ‘interposer’ boards actually creates two network paths out to the switches, meaning each drive has two gigabit connections (1 live, 1 redundant) directly to the network! How does the system do this without a gigabit port? Ethernet signals are sent electrically across the SAS connection. Note that this isn’t encapsulation of any sort; it’s the raw signaling. All of the traffic runs over IPv6, so you don’t need to carve out a slew of your IPv4 address space.

The issue that this design solves is that now, the fault domain is one drive. You don’t need to worry about controllers failing, because they are completely removed from the picture. If a drive is knocked offline, then the only compute that you lose is whatever was assigned to that drive, so there is no real performance drop either.
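The fault domain math is simple enough to sketch in a few lines of Python. This uses the 16-drive 1U example from the presentation; the function name and the three scenarios are my own framing rather than anything Igneous showed.

```python
from fractions import Fraction


def loss_per_failure(total_drives, drives_per_fault_domain):
    """Share of drives (and their compute) lost when one fault domain fails."""
    if not 0 < drives_per_fault_domain <= total_drives:
        raise ValueError("fault domain must cover between 1 and total_drives drives")
    return Fraction(drives_per_fault_domain, total_drives)


if __name__ == "__main__":
    total = 16  # drives in the 1U example
    scenarios = [
        ("whole server", 16),
        ("groups of four", 4),
        ("one CPU per drive", 1),
    ]
    for label, size in scenarios:
        share = loss_per_failure(total, size)
        print(f"{label}: lose {share} of the drives per failure")
```

The smaller the fault domain, the smaller the slice of capacity and performance that disappears with any single failure, which is the whole argument for the 1 CPU : 1 Drive ratio.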

Closing thoughts

Igneous just came out of stealth about a month ago, but they already have some customers (no specific numbers were provided). With that in mind, there were some concerns that were brought up during the session.

First up, sizes. Currently, the solution is available only in 212 TB chunks. Given that it is a subscription service, this might lead to odd situations (e.g. forecasting a need for only 120 TB, but being stuck over-provisioning). This also brings up the question as to whether it is a Capital Expense (it’s once a year, but you don’t really own it) or an Operating Expense (it’s recurring but not monthly).
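To put a number on the over-provisioning concern, here is a quick Python sketch. The 212 TB chunk size is from the presentation; the helper names are made up for this example, and pricing is left out entirely since none was shared.

```python
import math

CHUNK_TB = 212  # increment size quoted by Igneous


def chunks_needed(required_tb, chunk_tb=CHUNK_TB):
    """Smallest number of whole chunks that covers the forecast."""
    if required_tb <= 0:
        return 0
    return math.ceil(required_tb / chunk_tb)


def over_provisioned_tb(required_tb, chunk_tb=CHUNK_TB):
    """Capacity subscribed to beyond what the forecast calls for."""
    return chunks_needed(required_tb, chunk_tb) * chunk_tb - required_tb


if __name__ == "__main__":
    for forecast in (120, 212, 213):
        print(
            f"forecast {forecast} TB: {chunks_needed(forecast)} chunk(s), "
            f"{over_provisioned_tb(forecast)} TB unused"
        )
```

A 120 TB forecast leaves 92 TB of a single chunk unused, and a forecast of just 1 TB over a chunk boundary means paying for almost an entire extra chunk.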

When asked about Service Level Agreements, apparently nothing firm is in place. It appears to be more of a ‘SLU’ (service level understanding). I understand that the company is very young and may not have the support infrastructure needed to provide 4 hour turnaround times, but this might be a big hurdle to landing new customers.

Lastly, there was deployment time. No firm answer was provided, but it was estimated at somewhere around 1 to 2 weeks. Compared to traditional storage, that is reasonably quick; however, we are talking cloud storage here. When you compare it against providers such as AWS or Azure, where provisioning is basically instant, two weeks seems slow.

Overall, I was quite impressed with the solution. A lot of time, effort, and thought was put into the design. Igneous Systems is a young company, which means they can pivot very quickly. I am looking forward to seeing them present at future Tech Field Day events, as I would love to see what they cook up next.

You can find all of Igneous Systems’ Tech Field Day 12 videos at the Tech Field Day website.

Disclaimer: I was invited to participate in Tech Field Day as a delegate. All of my expenses, including food, transportation, and lodging, were covered by Gestalt IT. I did not receive any compensation to write this post, nor was I asked to write it. Anything written above was of my own accord.

Matt That IT Guy