Wednesday, November 9, 2011

Building a SAN for backup and remote replication, part 2

In part 1, I complained about OpenFiler. In this part, I want to talk about hardware.

This is a single SAN as I built it. (Remember that you need 2 identical SANs for replication.)

SuperMicro 825TQ-R700LPB 2U rack-mount case with 700W redundant power supply
• 8 Seagate Constellation ES ST32000644NS 2TB 7200RPM SATA 3.0Gb/s 3.5" “enterprise” hard drives
• 2 Seagate Barracuda ST3160316AS 160GB 7200RPM SATA 6.0Gb/s 3.5" internal hard drives
SuperMicro X9SCM Micro-ATX motherboard
Intel Core i3-2100 dual-core processor
• Crucial 8GB (2 x 4GB) 240-Pin DDR3 SDRAM ECC Unbuffered DDR3 1333 memory (model CT2KIT51272BA1339)
HighPoint RocketRAID 2640X4 PCI-Express x4 SATA / SAS (Serial Attached SCSI) controller card
Startech PEXSATA22I PCI-Express SATA controller card

Each SAN cost $3000, so the pair was $6000 (plus software, discussed in another post).

If you want to use a SuperMicro case, get a SuperMicro motherboard. The cases often have a proprietary front panel connector. The 825TQ comes with 8 hot-swap SATA/SAS bays, two internal bays for a pair of 3.5” drives, and a built-in slim DVD-ROM drive. It’s a good case but I did have a couple of nitpicks. The hot-swap cages are a bit flimsy: every time I pull out a cage I feel like I’m going to break the handle. And – not that I was likely to use them anyhow – the SGPIO cables were incredibly short, failing to reach from the hot-swap backplane to the HighPoint controller card.

The motherboard itself included the most feature-filled BIOS I have ever seen on a Micro-ATX board. The BIOS is UEFI, onboard SATA ports can be configured as hot-swap, the text-mode display can be echoed to a serial port, and each of the onboard network adapters can act as an iSCSI host-bus adapter. Given more time, I would have loved to play with that last feature.

The processor is the cheapest Sandy Bridge available. SANs don’t need a lot of raw processing power.

The motherboard supports ECC memory, so that’s what I used. I get a little uneasy at the idea of tens of billions of extremely transient memory bits with no error correction. If I had my way, every computer with more than 4GB of RAM would include ECC.

The motherboard also has dual Intel gigabit NICs. (Broadcom and Realtek NICs are popular low-cost alternatives – just say no.)

Storage subsystem, or why the simplest thing that could possibly work can get complicated fast

The star of any SAN is the storage subsystem, and this is where I could have done better.

I opted for Seagate Constellation ES drives. While Seagate says that the Constellation series are “enterprise” drives, in reality they are the minimum drives that you should accept in a server room. The SAS version of this series is what is known as “near-line SAS”, which is SATA guts combined with a SAS interface. Real SAS drives like Seagate’s Cheetah series have faster rotational speeds and are (supposedly) built to tighter specifications with greater reliability guarantees.

Since neither high performance nor uptime is a primary concern, the Constellation ES is acceptable. Of the 16 drives, exactly one has dropped out, and it started working again when I pulled and reseated it.

Here is an image of one of the SANs, with 11 SATA cables (8 data drives, 2 boot drives, and a DVD drive):


Messy, isn’t it? The amazing thing is how reliable it’s been.

There are 6 hard drives attached to the motherboard and another 4 to the HighPoint controller. I relegated the DVD drive to its own little controller. If I were to do this again, I would find a case with a backplane that supported SFF-8087 connectors and a compatible SAS controller. At that point, I would also go with near-line SAS drives, since they cost more or less the same as enterprise SATA drives. With one more tweak, I could reduce the number of data cables from 11 to 3.

The final tweak has to do with the thing I am least happy about in the SANs as I built them. I put in a pair of consumer-grade Seagate Barracuda drives. Like all consumer-grade drives, quality is a crap shoot (perhaps Russian roulette would be a better analogy). Two of the four drives went bad in the first couple of months of operation, and since the OS drives were not hot-swap, fixing them required shutting down the SAN to pull the failed drive and shutting it down again when the replacement drive arrived a couple of weeks later.

If I had used a hardware RAID controller instead of depending on software RAID, I could have stored the operating system on a small volume on the data drives. With software RAID, it’s only possible to boot to a mirror.
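The boot mirror itself is easy to set up with Linux’s mdadm. Here is a minimal sketch, assuming the two OS drives show up as /dev/sda and /dev/sdb with one partition each (device names are illustrative, and this must run as root on the actual hardware):

```shell
# Create a RAID 1 mirror for the OS partitions. The 0.90 metadata format
# keeps the superblock at the end of the device, so era-appropriate
# bootloaders (GRUB legacy) can read the filesystem as if it were a
# plain partition.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --metadata=0.90 /dev/sda1 /dev/sdb1

# Watch the initial sync progress.
cat /proc/mdstat

# Record the array so it assembles automatically at boot.
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
```

Both drives still need the bootloader installed individually, so the machine can boot from either half of the mirror after a failure.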

Alternatively, it turns out that the X9SCM motherboard includes an internal USB port. Although I haven’t found it yet, someone has to have posted a guide to minimizing disk writes on a Linux server to maximize flash drive life. With that guide and a cheap thumb drive, I could replace the OS drives. The thumb drive wouldn’t be redundant, but it should be possible to save and restore the OS volume without too much pain.
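In the meantime, the usual tricks are mount options and RAM-backed filesystems. A sketch of the /etc/fstab settings that cut down on writes (the device name is illustrative, and anything in tmpfs is lost on power failure):

```
# Don't update access times on every read; flush less often.
/dev/sdc1  /         ext4   noatime,commit=60   0  1

# Keep churny scratch directories in RAM instead of on flash.
tmpfs      /tmp      tmpfs  defaults,noatime    0  0
tmpfs      /var/tmp  tmpfs  defaults,noatime    0  0
```

Logs are the other big source of writes; they can be pointed at a tmpfs and synced to the flash drive periodically, or simply made less verbose in the syslog configuration.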

Which leads me to the final component of the storage subsystem: the controllers. I mentioned before that I used 4 onboard SATA ports with hot-swap enabled. The other 4 SATA ports came from a HighPoint RocketRAID 2640X4 acting as a non-RAID controller. I would not use this card again in a Linux system. I struggled to find a working driver, even trying to build my own. I finally had success with driver version 1.3 (built Dec 3 2010 09:50:48). The card is perfectly stable, but I spent a lot of time worrying that it would never work.

We must prepare for tomorrow night!

So my next SAN (assuming anyone ever lets me build one again) will use a case with SFF-8087 connectors on the hot-swap backplane, near-line SAS hard drives, and a SAS RAID controller with good Linux support. I’m guessing I would add about $400 to the cost of each SAN, mostly for the RAID controller. There would be some savings from eliminating the OS drives and the additional SATA controller.

I’m tempted by the Areca series of controllers, but I’m put off by the active cooling solution on their cards. Unless the fan uses ball bearings instead of the more common sleeve bearings, the least bit of dust will eventually cause the fan to seize, leaving the RAID chipset hotter than it would be with no fan at all. This thread discusses some options. More likely, I would go with a controller that cools with a simple heat sink and depend on the case’s fans for cooling.

In the next part, I’ll talk about iSCSI Enterprise Target and DRBD.

2 comments:

Unknown said...

Good stuff. I am anxiously awaiting the next steps. I have done this twice before but gone the OpenFiler + hardware raid controller route. I no longer wish to use hardware RAID, so your implementation will be a huge help.

RAID 6 DP?

Craig Putnam said...

@c3, I went with RAID 5 with one hot spare. Linux's software RAID package offers RAID 6, but I wanted to stick with what I understood. In retrospect, RAID 6 might have been a better option.
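For reference, that layout (RAID 5 across the data drives with one hot spare) takes a single mdadm command. A sketch, assuming the eight hot-swap drives appear as /dev/sdb through /dev/sdi (illustrative names; run as root):

```shell
# Seven drives in the RAID 5 set, the eighth held as a hot spare that
# mdadm will automatically rebuild onto if a member fails.
mdadm --create /dev/md1 --level=5 --raid-devices=7 \
      --spare-devices=1 /dev/sd[b-i]

# Confirm the layout: seven active devices plus one marked "spare".
mdadm --detail /dev/md1
```

Switching to RAID 6 would just mean --level=6 (and giving up the spare or a second drive’s worth of capacity for the extra parity).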