Wednesday, November 9, 2011

Building a SAN for backup and remote replication, part 2

In part 1, I complained about OpenFiler. In this part, I want to talk about hardware.

This is the parts list for a single SAN as I built it. (Remember that you need two identical SANs for replication.)

• SuperMicro 825TQ-R700LPB 2U rack-mount case with 700W redundant power supply
• 8 Seagate Constellation ES ST32000644NS 2TB 7200RPM SATA 3.0Gb/s 3.5" “enterprise” hard drives
• 2 Seagate Barracuda ST3160316AS 160GB 7200RPM SATA 6.0Gb/s 3.5" internal hard drives
• SuperMicro X9SCM Micro-ATX motherboard
• Intel Core i3-2100 dual-core processor
• Crucial 8GB (2 x 4GB) 240-Pin DDR3 SDRAM ECC Unbuffered DDR3 1333 memory (model CT2KIT51272BA1339)
• HighPoint RocketRAID 2640X4 PCI-Express x4 SATA/SAS (Serial Attached SCSI) controller card
• Startech PEXSATA22I PCI-Express SATA controller card

Each SAN cost $3000, so the pair was $6000 (plus software, discussed in another post).

If you want to use a SuperMicro case, get a SuperMicro motherboard. The cases often have a proprietary front panel connector. The 825TQ comes with 8 hot-swap SATA/SAS bays, two internal bays for a pair of 3.5” drives, and a built-in slim DVD-ROM drive. It’s a good case but I did have a couple of nitpicks. The hot-swap cages are a bit flimsy: every time I pull out a cage I feel like I’m going to break the handle. And – not that I was likely to use them anyhow – the SGPIO cables were incredibly short, failing to reach from the hot-swap backplane to the HighPoint controller card.

The motherboard itself included the most feature-filled BIOS I have ever seen on a Micro-ATX board. The BIOS is UEFI, onboard SATA ports can be configured as hot-swap, the text-mode display can be echoed to a serial port, and each of the onboard network adapters can act as an iSCSI host-bus adapter. Given more time, I would have loved to play with that last feature.

The processor is the cheapest Sandy Bridge available. SANs don’t need a lot of raw processing power.

The motherboard supports ECC memory, so that’s what I used. I get a little uneasy at the idea of tens of billions of extremely transient memory bits with no error correction. If I had my way, every computer with more than 4GB of RAM would include ECC.
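On CentOS you can at least confirm that ECC is doing its job by watching the kernel’s EDAC error counters. Here is a minimal check, assuming the edac-utils package is installed and the board shows up as a single memory controller (mc0):

    # Report corrected/uncorrected memory errors via EDAC
    yum install edac-utils
    edac-util -v

    # Or read the raw counters for the first memory controller directly
    cat /sys/devices/system/edac/mc/mc0/ce_count    # corrected errors
    cat /sys/devices/system/edac/mc/mc0/ue_count    # uncorrected errors

A slowly climbing corrected-error count is exactly the kind of thing ECC exists to absorb; an uncorrected error is your cue to replace a DIMM.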

The motherboard also has dual Intel gigabit NICs. (Broadcom and Realtek NICs are popular low-cost alternatives – just say no.)

Storage subsystem, or why the simplest thing that could possibly work can get complicated fast

The star of any SAN is the storage subsystem, and this is where I could have done better.

I opted for Seagate Constellation ES drives. While Seagate says that the Constellation series are “enterprise” drives, in reality they are the minimum drives that you should accept in a server room. The SAS version of this series is what is known as “near-line SAS”, which is SATA guts combined with a SAS interface. Real SAS drives like Seagate’s Cheetah series have faster rotational speeds and are (supposedly) built to tighter specifications with greater reliability guarantees.

Since neither high performance nor uptime is a primary concern, the Constellation ES is acceptable. I have had a single drive out of the 16 drop out, and it started working again when I pulled and reseated it.

Here is an image of one of the SANs, with 11 SATA cables (8 data drives, 2 boot drives, and a DVD drive):


Messy, isn’t it? The amazing thing is how reliable it’s been.

There are 6 hard drives attached to the motherboard and another 4 to the HighPoint controller. I relegated the DVD drive to its own little controller. If I were to do this again, I would find a case with a backplane that supported SFF-8087 connectors and a compatible SAS controller. At that point, I would also go with near-line SAS drives, since they cost more or less the same as enterprise SATA drives. With one more tweak, I could reduce the number of data cables from 11 to 3.

The final tweak has to do with the thing I am least happy about in the SANs as I built them. I put in a pair of consumer-grade Seagate Barracuda drives. Like all consumer-grade drives, quality is a crap shoot (perhaps Russian roulette would be a better analogy). Two of the four drives (across both SANs) went bad in the first couple of months of operation, and since the OS drives were not hot-swap, fixing them required shutting down the SAN to pull the failed drive and shutting it down again when the replacement drive arrived a couple of weeks later.

If I had used a hardware RAID controller instead of depending on software RAID, I could have stored the operating system on a small volume carved out of the data drives. With Linux software RAID, the only array type you can boot from is a mirror (RAID 1), so the OS needed its own pair of drives.
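For the record, the OS mirror itself is simple to build. Here is a minimal sketch, assuming the two Barracudas show up as /dev/sda and /dev/sdb, each with a single partition marked as Linux raid autodetect:

    # Create the RAID 1 mirror and put a filesystem on it
    # (device names are assumptions)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.ext3 /dev/md0

    # Watch the initial sync
    cat /proc/mdstat

    # Install GRUB on both drives so either one can boot the box alone
    grub-install /dev/sda
    grub-install /dev/sdb

The CentOS installer will build the mirror for you if you set it up during installation; if I recall correctly it only writes GRUB to the first drive, so the second grub-install is the step that’s easy to forget.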

Alternatively, it turns out that the X9SCM motherboard includes an internal USB port. Although I haven’t found it yet, someone has to have posted a guide to minimizing disk writes on a Linux server to maximize flash drive life. With that guide and a cheap thumb drive, I could replace the OS drives. The thumb drive wouldn’t be redundant, but it should be possible to save and restore the OS volume without too much pain.
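The usual tricks for keeping writes off flash are mount options and tmpfs. Something along these lines would be the starting point; this is an untested sketch, and the device name and sizes are placeholders:

    # /etc/fstab excerpt -- assumes the thumb drive ends up as /dev/sdc1
    /dev/sdc1   /          ext3    noatime,commit=120   1 1
    tmpfs       /tmp       tmpfs   size=256m            0 0
    tmpfs       /var/tmp   tmpfs   size=64m             0 0
    # logs kept in RAM vanish on reboot; point syslog at another
    # machine if you care about keeping them
    tmpfs       /var/log   tmpfs   size=128m            0 0

Between noatime, a longer journal commit interval, and RAM-backed scratch directories, a SAN that mostly pushes blocks over iSCSI would touch the thumb drive very rarely.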

Which leads me to the final component of the storage subsystem: the controllers. I mentioned before that I used 4 onboard SATA ports with hot-swap enabled. The other 4 SATA ports came from a HighPoint RocketRAID 2640x4 acting as a non-RAID controller. I would not use this card again in a Linux system. I struggled to find a working driver, even trying to build my own. I finally had success with driver version 1.3 (built Dec 3 2010 09:50:48). The card is perfectly stable, but I spent a lot of time worrying that it would never work.
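For anyone fighting the same card: HighPoint distributes the driver as source, and it builds like any other out-of-tree kernel module. The sketch below uses the generic kbuild invocation; the tarball name, directory, and module name are placeholders, since HighPoint’s own Makefile layout may differ:

    # Generic out-of-tree module build (names are placeholders)
    tar xzf rr26xx-linux-src-v1.3.tar.gz
    cd rr26xx-linux-src-v1.3
    make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
    insmod ./rr26xx.ko

    # Confirm the controller and its drives were detected
    dmesg | tail
    lsmod | grep rr26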

We must prepare for tomorrow night!

So my next SAN (assuming anyone ever lets me build one again) will use a case with SFF-8087 connectors on the hot-swap backplane, near-line SAS hard drives, and a SAS RAID controller with good Linux support. I’m guessing I would add about $400 to the cost of each SAN, mostly for the RAID controller. There would be some savings from eliminating the OS drives and the additional SATA controller.

I’m tempted by the Areca series of controllers, but I’m put off by the active cooling solution on their cards. Unless the fan uses ball bearings instead of the more common sleeve bearing, the least bit of dust will eventually cause the fan to seize, leaving the RAID chipset hotter than it would be with no fan at all. This thread discusses some options. More likely, I would go with a controller that cools with a simple heat sink and depend on the case’s fans for airflow.

In the next part, I’ll talk about iSCSI Enterprise Target and DRBD.

Tuesday, November 1, 2011

Building a SAN for backup and remote replication, part 1

I've often said that any idiot can build a computer and a lot of idiots do. Likewise, it is remarkably easy to build a SAN from off-the-shelf parts and open-source software, but it’s much harder to build one that works well. This series documents what I learned – and the mistakes I made – while designing and building an inexpensive iSCSI SAN solution for backup and remote replication.

A client wanted to create a disaster recovery backup system and replicate it offsite. The client had too much data for an ad-hoc solution but was too small to afford the often breathtaking prices of replication solutions from vendors like EMC. They were already replicating some data using a pair of StoreVault S500s, but those units were flaky and difficult to manage.

I designed a pair of SANs that met the following requirements:

1) Least cost. I needed the lowest possible cost while meeting the system’s functional requirements.

2) Replication. I needed to replicate data from a local device to a remote device, over a slow and insecure Internet connection.

3) Data integrity. Loss of data should be extremely unlikely.

Explicitly absent from my list of requirements were:

1) High performance. This was a backup target used by a single computer. It did not need to be fast.

2) Maximum uptime. I actually ended up with a system that has good uptime, but it wasn’t something I focused on.

Some terminology

If you’re new to iSCSI, you’ll need to know some terms. An iSCSI target is the computer that holds the actual storage. This is the SAN. An iSCSI initiator is the computer that accesses the storage. The initiator pretends to have a SCSI controller card, the network pretends to be a SCSI cable, the target pretends to be one or more SCSI drives, and everything works great until the network fails.
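If the initiator happens to be a Linux box, the open-iscsi tools handle that side of the conversation. The discovery-then-login dance looks roughly like this; the target address and IQN here are made up:

    # Install the initiator tools and ask the target what it exports
    yum install iscsi-initiator-utils
    iscsiadm -m discovery -t sendtargets -p 192.168.10.50

    # Log in to one of the advertised targets (IQN is a placeholder)
    iscsiadm -m node -T iqn.2011-11.example.local:backup -p 192.168.10.50 --login

    # The exported LUN now appears as an ordinary block device
    fdisk -l

From that point on, the initiator treats the LUN like any locally attached disk: partition it, format it, and point your backup software at it.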

Picking the software, or why OpenFiler sucks

Microsoft offers an iSCSI target, free with the purchase of Windows Server. Windows Server 2008 starts at around $800.

Linux and BSD offer iSCSI targets, free with the download of your favorite distribution. There are even a few distributions that include an iSCSI target built-in and ready to run. One such distribution is OpenFiler.

It has been about a year since I evaluated OpenFiler, so maybe things have changed since then. OpenFiler is a general-purpose Linux-based file server distribution. It has a web-based GUI. I found that it had issues:

  • The port for the GUI is 446, instead of the standard port 443. Why? This is a single-purpose server; I can’t imagine what other website they would expect to serve. Fortunately, a scholar and a gentleman by the name of Justin J. Novak published some simple commands to switch the GUI to port 443.
  • Whoever(s) assembled the GUI focused on functional groupings rather than use-case scenarios. To set up iSCSI I had to construct a RAID array, allocate an LVM volume, create an iSCSI target volume, set up initiator authentication, and set up target volume authentication – all on different tabs, sub-tabs, and sections. Oh, yes, this is how things were arranged in OpenFiler – main tabs along the top of the page, sub-tabs below them, and sections, not below the sub-tabs as sub-sub-tabs, but as menus along the left-hand side of the page. And sometimes little popup windows demonstrating some web developer’s l33t coding skillz. Navigating OpenFiler’s GUI was an exercise in confusion.
  • Fortunately, if you want to do anything the least bit out of the ordinary with OpenFiler, you need to resort to the command line. This includes replication, which was touted as a feature of OpenFiler but was completely unsupported by the GUI. In fact, to get replication to work, you needed to hack the Linux boot script (for Windows users, this is the glorified equivalent of autoexec.bat). I had to go even further and manually alter the order in which daemons loaded, since LVM kept taking control of my replication volume. (A cleaner way to keep LVM off that volume is sketched just after this list.)
  • It was difficult to add packages to OpenFiler. I tried and failed to install various VPN packages, finally concluding that OpenFiler and/or rPath Linux (the base distribution) were overtly hostile to customization.
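On that LVM problem: the approach I would try today, instead of reordering daemons, is the filter setting in /etc/lvm/lvm.conf, which tells LVM to ignore the replication volume’s backing device entirely. A hedged sketch, with the device name as a placeholder:

    # /etc/lvm/lvm.conf -- reject the replication backing device so LVM
    # never activates it; everything else is still scanned
    filter = [ "r|^/dev/sdb$|", "a|.*|" ]

Whether that would have been workable on OpenFiler’s rPath base, I can’t say; on a stock distribution it is the standard way to keep LVM’s hands off a device.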

Eventually I entirely abandoned the GUI, doing everything from the command line. At that point I realized that there was no point in using OpenFiler at all.

Instead, I fell back on my favorite server distribution: CentOS. To be fair, it’s the only server distribution I use, but it works great. On top of CentOS, I installed iSCSI Enterprise Target (iet), Distributed Replicated Block Device (DRBD), and the not-so-free replication helper drbd-proxy. I’ll get to those, but in the next article I want to talk about hardware, which brings us back to the beginning.