Monday, December 12, 2011

Building a SAN for backup and remote replication, part 3

In part 2, I laid out the hardware for my SAN and recommended that no one follow my example. In this part, I want to talk about preparing the operating system. (Yes, I lied at the end of part 2 about covering iSCSI Enterprise Target and DRBD next; those will have to wait.)

Since we need to differentiate between the two SANs, I will use “local device” to refer to the SAN that will go on the local network and “remote device” to refer to the remote replication target.

Installing the base system mostly consists of stripping out things that you don’t need and disabling what you can’t get rid of. I’m a big fan of turning off unneeded services as a way to increase security; I just wish that CentOS felt the same way.

I wrote these instructions as part of formal documentation for the poor guy (or girl) who comes after me who has to maintain this system and fix it when it breaks. Despite my efforts, the documentation was out of date by the time the system went live. I’ve included some notes in italics where I don’t have exact instructions since I changed the configuration on the fly.

Finally, an apology about this post's formatting. I wish the default Blogger.com tools were a bit better; as it is, everything but bold and italic markup disappears when I publish the post.

Collect settings

There are a handful of settings you’ll want to know before you install the software. Here is the list I came up with:

local-fqdn: The fully-qualified domain name of the local device. DRBD uses this to decide which part of its configuration file applies to which computer. It does not necessarily need to match the domain name of the local SAN as resolved by DNS.
local-ip: The IP address of the local SAN within the office network. This is probably a private IP address.
remote-temp-ip: The temporary IP address of the remote SAN while it is being built and tested within the office network.
local-gateway: The gateway address of the office network.
local-subnet: The IP subnet of the office network.
local-netmask: The IP subnet mask of the office network, in CIDR notation.
local-public-ip: The public IP address of the office gateway.
remote-fqdn: The fully-qualified domain name of the remote SAN. DRBD uses this to decide which part of its configuration file applies to which computer. It does not necessarily need to match the domain name of the remote SAN as resolved by DNS.
remote-ip: The IP address of the remote SAN. This is a public IP address.
remote-gateway: The gateway address used by the remote SAN.
remote-subnet: The IP subnet to which the remote device belongs.
remote-netmask: The IP subnet mask of the remote network, in CIDR notation.
third-party-ip: A public IP address of a third party that will manage the remote SAN.
ini-user: The user name presented by the office server that will use the iSCSI volume.
ini-password: The password presented by the office server that will use the iSCSI volume. Due to various operating system restrictions, this should be exactly 12 characters long.
iscsi-qualified-name: The iSCSI qualified name (IQN), shared by both SANs.
admin-email: The email address of the SAN administrator, for event notifications.
host-email: The email address of each SAN, which does not necessarily correspond to a real mailbox.
mail-server: The name of your email server or mail exchanger.
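
It helps to record these values in one plain-text worksheet before starting, so they can be pasted into configuration files later. A sketch with made-up placeholder values (every name and address below is hypothetical):

local-fqdn: san1.example.com
local-ip: 192.168.1.50
remote-temp-ip: 192.168.1.51
local-gateway: 192.168.1.1
local-subnet: 192.168.1.0
local-netmask: 24
local-public-ip: 203.0.113.10
remote-fqdn: san2.example.com
remote-ip: 198.51.100.20
remote-gateway: 198.51.100.1
remote-subnet: 198.51.100.0
remote-netmask: 24
third-party-ip: 192.0.2.30
ini-user: backupserver
ini-password: exactly12chr
iscsi-qualified-name: iqn.2011-11.com.example:backup
admin-email: admin@example.com
host-email: san1@example.com
mail-server: mail.example.com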

Install the operating system

The CentOS installer can use either text or graphics mode. The graphics mode installer may have problems with some Intel onboard video controllers. These instructions follow a text-mode installation, and use an FTP site as the install source. You can use arrow keys to move among the fields and options in the text-mode interface. Press the spacebar to select check boxes and radio buttons.

Boot to the CentOS 5.6 install disc. At the boot prompt, enter

linux text

Choose your language and keyboard layout. In the installation method dialog, choose FTP. Under Enable IPv4 support, select Manual Configuration and deselect Enable IPv6 support. Enter the local IP address if this is the local SAN, or the temporary remote IP address if this is the remote SAN. Enter the local netmask, gateway, and DNS server. Enter the name of a server that mirrors the CentOS distribution and the FTP directory of the installation files, which should end with /os/i386 or /os/x86_64.

I can't think of a good reason to use a 32-bit operating system, especially if you are using new hardware.

Select text mode instead of VNC.

If this is the first time the hard disks have been used, the installer will prompt to initialize each empty disk. Select Yes for each disk on which you will install the operating system. Select No for the data disks.

Choose Create custom layout.

These instructions configure the device to use the swap partition on each OS drive as a separate swap volume. That was not very bright, since it means the OS can crash if one of the OS drives drops out. The right way to do this is to mark the two physical swap partitions as software RAID members, build a mirror on top of them, and format that mirror as a single swap volume.
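
If you set up the separate swap volumes anyway, the mirror can also be built after installation. A rough sketch; the partition and md device names here are assumptions, so check yours with fdisk -l and cat /proc/mdstat first:

swapoff -a
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkswap /dev/md1
swapon /dev/md1
mdadm --detail --scan >> /etc/mdadm.conf

You would also point the swap entry in /etc/fstab at /dev/md1 so the mirrored volume is used at boot. mdadm may ask you to confirm before overwriting the old swap partitions.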

Select New. Change the file system type to swap. For allowable drives, choose just the first operating system drive. Set the size to half the size of physical RAM. Select OK. Select New. Change the file system type to software RAID. For allowable drives, choose just the first operating system drive. Select Fill all available space. Select OK.

Create the same two partitions on the second operating system drive.

Select RAID. Set the mount point to /. Leave the file system type as ext3. Set the RAID level to RAID1. Select OK.

Accept the defaults in each Boot Loader Configuration dialog. When prompted for the location of the boot loader, choose the first operating system hard drive (probably /dev/sda) rather than the RAID volume (/dev/md0).
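
Since the boot loader lives only on the first drive, the machine will not boot if /dev/sda fails, even though / itself is mirrored. If you want either drive to be able to boot on its own, you can add GRUB to the second drive after the installation finishes. A hedged sketch from the legacy GRUB shell (device and partition numbers are assumptions based on the layout above; verify them first):

grub
device (hd0) /dev/sdb
root (hd0,1)
setup (hd0)
quit

The device line maps GRUB's (hd0) to the second physical drive; root and setup then write the boot loader onto it.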

Choose to configure network interface eth0. The default settings should be to activate the interface on boot and enable IPv4. Verify the IP, netmask, gateway and DNS addresses. In the Hostname Configuration dialog, enter the fully qualified domain name of the system.

In Time Zone Selection, select the correct time zone or geographical area. Most likely the system does not use UTC, so deselect that option.

Enter the root password. For ease of maintenance, this should be the same for both devices.

In the Package selection dialog, deselect each package.

Keep choosing Next until the installer starts. The installer should format the root volume and install the operating system. When the installer reports that the installation is complete, click Reboot. Remove the installer CD and wait for the system to reboot.

Disable SELinux

On the first boot, CentOS will display a Setup Agent dialog. Choose Firewall configuration. Under SELinux, choose Disabled. The Setup Agent will disappear if you don't touch any keys for a couple of minutes; you can get it back by typing:

firstboot
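
If the Setup Agent times out before you get to it, the same change can be made directly in SELinux's configuration file and confirmed from the shell. A small sketch of what I'd expect on a stock CentOS 5 install:

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
getenforce
reboot

getenforce reports the running mode; the edit to /etc/selinux/config only takes full effect after a reboot.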

Remove unneeded packages and update the system

At this point, you can remotely connect to the system with SSH. Log in as root. At the shell prompt, enter the following to remove unneeded packages:

yum -y erase fetchmail NetworkManager bluez* ccid desktop-file-utils
yum -y erase dnsmasq ifd-egate irda-utils isdn4k-utils mutt pcmciautils
yum -y erase slrn talk wpa_supplicant yp*

To update the system, enter the following:

yum -y update
reboot

Disable unneeded services

Disable unneeded services, including gpm, netfs, nfslock, pcscd, portmap, rpcgssd, rpcidmapd, and rpcsvcgssd. Use the following one-line command:

for f in gpm netfs nfslock pcscd portmap rpcgssd rpcidmapd rpcsvcgssd; do chkconfig --del $f; service $f stop; done
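
To double-check that nothing unexpected will still start at boot, list what chkconfig has left enabled for the default runlevel:

chkconfig --list | grep '3:on'

Anything in that output you don't recognize deserves a look before the box goes into service.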

Configure the firewall

Replace the firewall configuration in /etc/sysconfig/iptables with the following configuration. Fill in the IP addresses for the local and remote private networks and the local and remote public IP addresses.

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:AllowedServices - [0:0]
:AllowedHosts - [0:0]

# Always allow traffic on loopback, filter all other incoming
-A INPUT -i lo -j ACCEPT
-A INPUT -j AllowedServices
-A FORWARD -j AllowedServices

## List of services to allow
# Existing connections
-A AllowedServices -m state --state ESTABLISHED,RELATED -j ACCEPT
# NTP
-A AllowedServices -p udp -m udp --dport 123 -j AllowedHosts
# SSH, iSCSI, DRBD Proxy, DRBD
-A AllowedServices -m state --state NEW -p tcp -m multiport --dports 22,3260,7788,7789 -j AllowedHosts
# ICMP
-A AllowedServices -p icmp --icmp-type any -j AllowedHosts
# Drop everything else
-A AllowedServices -j DROP

## List of hosts to allow
# The local private network
-A AllowedHosts -s local-subnet/local-netmask -j ACCEPT
# The local device's public IP
-A AllowedHosts -s local-public-ip -j ACCEPT
# The remote device’s IP
-A AllowedHosts -s remote-ip -j ACCEPT
# Optional: allow some third party to administer the devices
-A AllowedHosts -s third-party-ip -j ACCEPT
# Drop everyone else
-A AllowedHosts -j DROP

COMMIT

Note that this firewall works for both the local and remote SANs. This makes recovery easier if you have to bring the remote SAN into the local network.
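
After saving /etc/sysconfig/iptables, reload the firewall and eyeball the resulting chains. On a stock CentOS 5 install, something like this should do it:

service iptables restart
iptables -L -n -v

Before logging out, confirm from another machine on an allowed network that SSH still answers; locking yourself out of a remote SAN is no fun.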

Install NTP

It is not critical to keep the SANs' clocks synchronized with NTP, but it is useful. Run the following commands to install and enable NTP:

yum -y install ntp
chkconfig ntpd on
service ntpd start
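
Give ntpd a few minutes, then confirm it is actually synchronizing:

ntpq -p

The peer prefixed with an asterisk is the one currently selected as the time source.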

Configure Logwatch

Install the sendmail configuration compiler.

yum -y install sendmail-cf

Edit /etc/mail/sendmail.mc. Find the line

dnl define(`SMART_HOST', `smtp.your.provider')dnl

Replace it with:

define(`SMART_HOST', `mail-server')

Note that the dnl phrases are removed from the beginning and ending of the line, effectively uncommenting it. Save the file and run the following commands to compile the configuration and enable the sendmail service:

m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf
chkconfig --level 235 sendmail on
service sendmail start
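
Before trusting it with alerts, it's worth confirming that mail actually relays through the smart host. A quick hedged test (substitute a real mailbox for admin-email):

echo "SAN mail relay test" | /usr/sbin/sendmail -v admin-email

The -v flag prints the SMTP conversation, so you can watch the message being handed off to mail-server.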

Edit /etc/logwatch/conf/logwatch.conf and add the following lines:

MailTo=admin-email
MailFrom=host-email
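
Logwatch normally runs from cron overnight; to make sure a report actually arrives, you can run it once by hand:

logwatch --mailto admin-email --range today --detail high

If nothing shows up in the admin mailbox, /var/log/maillog is the first place to look.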

Limited possibilities

You know how every high school valedictorian speech includes a phrase along the lines of “unlimited possibilities?” I have found that the best way to get things done is to limit the possibilities to exactly what you need. If you are designing a SAN, forget about also making it a NAS, and a web server, and a French fry chopper. And if you’re just going to use it for backup, don’t bother with LVM. Anything beyond what you actually need is a distraction.

In the next part, I promise to get to iSCSI Enterprise Target and DRBD.

Wednesday, November 9, 2011

Building a SAN for backup and remote replication, part 2

In part 1, I complained about OpenFiler. In this part, I want to talk about hardware.

This is a single SAN as I built it. (Remember that you need 2 identical SANs for replication.)

• SuperMicro 825TQ-R700LPB 2U rack-mount case with 700W redundant power supply
• 8 Seagate Constellation ES ST32000644NS 2TB 7200RPM SATA 3.0Gb/s 3.5" “enterprise” hard drives
• 2 Seagate Barracuda ST3160316AS 160GB 7200RPM SATA 6.0Gb/s 3.5" internal hard drives
• SuperMicro X9SCM Micro-ATX motherboard
• Intel Core i3-2100 dual-core processor
• Crucial 8GB (2 x 4GB) 240-Pin DDR3 SDRAM ECC Unbuffered DDR3 1333 memory (model CT2KIT51272BA1339)
• HighPoint RocketRAID 2640X4 PCI-Express x4 SATA / SAS (Serial Attached SCSI) controller card
• Startech PEXSATA22I PCI-Express SATA controller card

Each SAN cost $3000, so the pair was $6000 (plus software, discussed in another post).

If you want to use a SuperMicro case, get a SuperMicro motherboard. The cases often have a proprietary front panel connector. The 825TQ comes with 8 hot-swap SATA/SAS bays, two internal bays for a pair of 3.5” drives, and a built-in slim DVD-ROM drive. It’s a good case but I did have a couple of nitpicks. The hot-swap cages are a bit flimsy: every time I pull out a cage I feel like I’m going to break the handle. And – not that I was likely to use them anyhow – the SGPIO cables were incredibly short, failing to reach from the hot-swap backplane to the HighPoint controller card.

The motherboard itself included the most feature-filled BIOS I have ever seen on a Micro-ATX board. The BIOS is UEFI, onboard SATA ports can be configured as hot-swap, the text-mode display can be echoed to a serial port, and each of the onboard network adapters can act as an iSCSI host-bus adapter. Given more time, I would have loved to play with that last feature.

The processor is the cheapest Sandy Bridge available. SANs don’t need a lot of raw processing power.

The motherboard supports ECC memory, so that’s what I used. I get a little uneasy at the idea of tens of billions of extremely transient memory bits with no error correction. If I had my way, every computer with more than 4GB of RAM would include ECC.

The motherboard also has dual Intel gigabit NICs. (Broadcom and Realtek NICs are popular low-cost alternatives – just say no.)

Storage subsystem, or why the simplest thing that could possibly work can get complicated fast

The star of any SAN is the storage subsystem, and this is where I could have done better.

I opted for Seagate Constellation ES drives. While Seagate says that the Constellation series are “enterprise” drives, in reality they are the minimum drives that you should accept in a server room. The SAS version of this series is what is known as “near-line SAS”, which is SATA guts combined with a SAS interface. Real SAS drives like Seagate’s Cheetah series have faster rotational speeds and are (supposedly) built to tighter specifications with greater reliability guarantees.

Since neither high performance nor uptime is a primary concern, the Constellation ES is acceptable. Of the 16 drives, I have had a single one drop out, and it started working again when I pulled and reseated it.

Here is an image of one of the SANs, with 11 SATA cables (8 data drives, 2 boot drives, and a DVD drive):


Messy, isn’t it? The amazing thing is how reliable it’s been.

There are 6 hard drives attached to the motherboard and another 4 to the HighPoint controller. I relegated the DVD drive to its own little controller. If I were to do this again, I would find a case with a backplane that supported SFF-8087 connectors and a compatible SAS controller. At that point, I would also go with near-line SAS drives, since they cost more or less the same as enterprise SATA drives. With one more tweak, I could reduce the number of data cables from 11 to 3.

The final tweak has to do with the thing I am least happy about in the SANs as I built them. I put in a pair of consumer-grade Seagate Barracuda drives as OS drives. As with all consumer-grade drives, quality is a crap shoot (perhaps Russian roulette would be a better analogy). Two of the four drives (two per SAN) went bad in the first couple of months of operation, and since the OS drives were not hot-swap, fixing them required shutting down the SAN to pull the failed drive and shutting it down again when the replacement drive arrived a couple of weeks later.

If I had used a hardware RAID controller instead of depending on software RAID, I could have stored the operating system on a small volume on the data drives. With software RAID, it’s only possible to boot to a mirror.

Alternatively, it turns out that the X9SCM motherboard includes an internal USB port. Although I haven’t found it yet, someone has to have posted a guide to minimizing disk writes on a Linux server to maximize flash drive life. With that guide and a cheap thumb drive, I could replace the OS drives. The thumb drive wouldn’t be redundant, but it should be possible to save and restore the OS volume without too much pain.
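
I have not tried this, but the advice I've seen for running Linux from a flash drive mostly boils down to cutting unnecessary writes with a few mount options. A hedged sketch of the kind of /etc/fstab entries that get suggested (device names and sizes are placeholders, not a tested configuration):

/dev/sda1   /          ext3    defaults,noatime    1 1
tmpfs       /tmp       tmpfs   defaults,size=256m  0 0
tmpfs       /var/tmp   tmpfs   defaults,size=64m   0 0

The noatime option stops the filesystem from recording an access time on every read, and the tmpfs mounts keep frequently rewritten scratch directories in RAM instead of on the flash device.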

Which leads me to the final component of the storage subsystem: the controllers. I mentioned before that I used 4 onboard SATA ports with hot-swap enabled. The other 4 SATA ports came from a HighPoint RocketRAID 2640x4 acting as a non-RAID controller. I would not use this card again in a Linux system. I struggled to find a working driver, even trying to build my own. I finally had success with driver version 1.3 (built Dec 3 2010 09:50:48). The card is perfectly stable, but I spent a lot of time worrying that it would never work.

We must prepare for tomorrow night!

So my next SAN (assuming anyone ever lets me build one again) will use a case with SFF-8087 connectors on the hot-swap backplane, near-line SAS hard drives, and a SAS RAID controller with good Linux support. I’m guessing I would add about $400 to the cost of each SAN, mostly for the RAID controller. There would be some savings from eliminating the OS drives and the additional SATA controller.

I’m tempted by the Areca series of controllers, but I’m put off by the active cooling solution on their cards. Unless the fan uses ball bearings instead of the more common sleeve bearing, the least bit of dust will eventually cause the fan to seize and make the RAID chipset hotter than having no fan at all. This thread discusses some options. More likely, I would go with a controller that cools with a simple heat sink and depend on the case’s fans for cooling.

In the next part, I’ll talk about iSCSI Enterprise Target and DRBD.

Tuesday, November 1, 2011

Building a SAN for backup and remote replication, part 1

I've often said that any idiot can build a computer and a lot of idiots do. Likewise, it is remarkably easy to build a SAN from off-the-shelf parts and open-source software, but it’s much harder to build one that works well. This series documents what I learned – and the mistakes I made – while designing and building an inexpensive iSCSI SAN solution for backup and remote replication.

A client wanted to create a disaster recovery backup system and replicate it offsite. The client had too much data for an ad-hoc solution but was too small to afford the often breathtaking prices of replication solutions from vendors like EMC. They were already replicating some data using a pair of StoreVault S500’s, but they were flakey and difficult to manage.

I designed a pair of SANs that met the following requirements:

1) Least cost. I needed the lowest possible cost while meeting the system’s functional requirements.

2) Replication. I needed to replicate data from a local device to a remote device, over a slow and insecure Internet connection.

3) Data integrity. Loss of data should be extremely unlikely.

Explicitly absent from my list of requirements were:

1) High performance. This was a backup target used by a single computer. It did not need to be fast.

2) Maximum uptime. I actually ended up with a system that has good uptime, but it wasn’t something I focused on.

Some terminology

If you’re new to iSCSI, you’ll need to know some terms. An iSCSI target is the computer that holds the actual storage. This is the SAN. An iSCSI initiator is the computer that accesses the storage. The initiator pretends to have a SCSI controller card, the network pretends to be a SCSI cable, the target pretends to be one or more SCSI drives, and everything works great until the network fails.

Picking the software, or why OpenFiler sucks

Microsoft offers an iSCSI target, free with the purchase of Windows Server. Windows Server 2008 starts at around $800.

Linux and BSD offer iSCSI targets, free with the download of your favorite distribution. There are even a few distributions that include an iSCSI target built-in and ready to run. One such distribution is OpenFiler.

It has been about a year since I evaluated OpenFiler, so maybe things have changed since then. OpenFiler is a general-purpose Linux-based file server distribution. It has a web-based GUI. I found that it had issues:

  • The port for the GUI is 446, instead of the standard port 443. Why? This is a single-purpose server; I can’t imagine what other website they would expect to serve. Fortunately, a scholar and a gentleman by the name of Justin J. Novak published some simple commands to switch the GUI to port 443.
  • Whoever(s) assembled the GUI focused on functional groupings rather than use-case scenarios. To set up iSCSI I had to construct a RAID array, allocate an LVM volume, create an iSCSI target volume, set up initiator authentication, and set up target volume authentication – all on different tabs, sub-tabs, and sections. Oh, yes, this is how things were arranged in OpenFiler – main tabs along the top of the page, sub-tabs below them, and sections, not below the sub-tabs as sub-sub-tabs, but as menus along the left-hand side of the page. And sometimes little popup windows demonstrating some web developer’s l33t coding skillz. Navigating OpenFiler’s GUI was an exercise in confusion.
  • Fortunately, if you want to do anything the least bit out of the ordinary with OpenFiler, you need to resort to the command line. This includes replication, which was touted as a feature of OpenFiler but was completely unsupported by the GUI. In fact, to get replication to work, you needed to hack the Linux boot script (for Windows users, this is the glorified equivalent of autoexec.bat). I had to go even further and manually alter the order in which daemons loaded, since LVM kept taking control of my replication volume.
  • It was difficult to add packages to OpenFiler. I tried and failed to install various VPN packages, finally concluding that OpenFiler and/or rPath Linux (the base distribution) were overtly hostile to customization.

Eventually I entirely abandoned the GUI, doing everything from the command line. At that point I realized that there was no point in using OpenFiler at all.

Instead, I fell back on my favorite server distribution: CentOS. To be fair, it’s the only server distribution I use, but it works great. On top of CentOS, I installed iSCSI Enterprise Target (iet), Distributed Replicated Block Device (DRBD), and the not-so-free replication helper drbd-proxy. I’ll get to those, but in the next article I want to talk about hardware, which brings us back to the beginning.