Friday, February 8, 2008

Recovering from a corrupt Exchange data store (the easy way)

Say you have an Exchange server go down and, for some reason, you don't have backups of the mail store. Say further that you can't remount the store. There are many articles from Microsoft and others on magic tricks you can do to fix a glocked store, but it turns out that there's a very easy method that should work nicely with smaller organizations.

In this scenario, Exchange has been completely destroyed and must be rebuilt from scratch.

By default, Outlook (2003 and later) uses cached mode for Exchange accounts, which keeps a copy of each user's mailbox in an OST file on their local computer, in (hopefully) perfect condition. With Exchange offline, go to each user's workstation, fire up Outlook, and export the user's mailbox to a PST. Then bring up Exchange and mount an empty private data store. For each user, delete and recreate their email profile, connect to their Exchange account, and use Outlook to re-import their data from the aforementioned PST. Email has now been recovered.

Obviously this doesn't work for public data stores. But most people are just interested in their emails, contacts, and calendar, so taking care of those will give you breathing room to work on fixing the public store.

As to how I came to offer this tip, I don't want to talk about it. :)

Friday, February 1, 2008

Replication on a StoreVault S500

I purchased a pair of NetApp StoreVault S500 SAN units for a client recently. The client has a fairly large volume of data for a small business - about a terabyte's worth of images, which they expect to double in a year. One of the things they wanted to do is make an offsite backup of the data. I tried for about two years to do ad-hoc backup over the Internet, but it never worked well. NetApp offers closely integrated replication as part of the StoreVault package, and the price is about half what you'd pay for EMC/Dell, so we went for it.

Detail the first: The S500 comes with a dozen 500GB hard drives, giving you 6 trillion bytes of raw storage. That's 5.457 terabytes once you count in binary units. (If you're replicating the entire unit, you need a second S500, giving a total of 12,000,000,000,000 raw bytes, but moving on...) It uses a dual-parity variant of RAID 4, which eats a trillion bytes. That's now 4.547 TB. (I chose not to reserve a hot spare given the fact that the entire system is going to be replicated.) It uses a checksum scheme that stores a checksum in every ninth sector, which uses up, um, 1/9th of the storage. That's now 4.04TB. NetApp packs breakfast food into their file system, called WAFL, which apparently reduces fragmentation and has 10% overhead. That's now 3.64TB. One of the big features of StoreVaults is file system snapshots, which are a prerequisite for replication. Snapshot overhead is supposedly variable, but I haven't worked out how to use anything other than the default 20%. That's now 2.91TB. Oh, there's about 5% overhead for managing RAID. That's now 2.76TB.

All of this is documented in a hard-to-find whitepaper (I had to log into the reseller site to get it). What isn't documented is the massive 700GB or so you have to set aside in order to successfully set up replication. Why? Just in case you need the room, as far as I can tell. Actual space that you can see from an initiator is about 2TB. That's two extremely well-protected, massively redundant terabytes.
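The arithmetic above can be replayed in a few lines of Python. This is only a sketch of the numbers quoted in this post, not NetApp's sizing formula. The function name and step labels are made up for illustration, and it assumes the "700GB or so" replication set-aside is 700 decimal gigabytes.

```python
# Replays the capacity arithmetic from the post. All percentages are the
# figures quoted above; nothing here comes from NetApp documentation.
TIB = 2 ** 40  # the post's "TB" figures are binary terabytes


def capacity_steps(drives=12, drive_bytes=500 * 10**9):
    """Return a list of (label, remaining_bytes) after each deduction."""
    remaining = drives * drive_bytes
    steps = [("raw", remaining)]

    # Dual parity costs two drives' worth of space.
    remaining -= 2 * drive_bytes
    steps.append(("dual parity", remaining))

    # Fractional overheads, applied one after another.
    for label, fraction in [
        ("checksums", 1 / 9),         # every ninth sector
        ("WAFL", 0.10),               # file system overhead
        ("snapshot reserve", 0.20),   # default reserve
        ("RAID management", 0.05),
    ]:
        remaining *= 1 - fraction
        steps.append((label, remaining))

    # ASSUMPTION: "700GB or so" taken as 700 decimal gigabytes.
    remaining -= 700 * 10**9
    steps.append(("replication set-aside", remaining))
    return steps


if __name__ == "__main__":
    for label, size in capacity_steps():
        print(f"{label:<24}{size / TIB:6.2f} TB")
```

Run as a script, it prints one line per deduction, which makes it easy to see how much each layer costs or to plug in a different snapshot reserve.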

By way of disclaimer, I'm bad at math, but not as bad as NetApp's marketers.

Detail the second: The StoreVault Manager software is a "simplified" user interface for the StoreVault. By simplified, NetApp means buggy and unresponsive. The UI appears to be based on some kind of Web front-end pretending to be a Windows program, so there are little oddities such as not being able to click on things that should be clickable, and response times measured in seconds for almost every action. I don't know why anyone thinks that HTML / JavaScript / Ajax / whatever make good general-purpose UIs. As far as the simplification goes, I found myself repeatedly telnetting into the unit to fix problems that I couldn't resolve with the UI.

Detail the third: Initial configuration is easy, as is setting up LUNs. It takes a little fiddling to associate specific LUNs with specific initiators, but it's fairly well documented. Here is what isn't documented: in order to successfully set up LUN replication, your DNS configuration needs to be perfect. If there is a configuration problem, StoreVault Manager will offer highly detailed error messages like, "Permission denied."

When you set up a replica (and presumably whenever a replication occurs) the target unit verifies that the source unit is who it says it is by performing a reverse DNS lookup. If the source unit's PTR record doesn't match, the replication fails. Fair enough. Weirdly, the PC on which StoreVault Manager is running must be able to resolve the target unit's unqualified host name in order to perform any operations on a replica once it has been established. That could mean editing the PC's hosts file.
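Before blaming the StoreVault, it may help to check both lookups from the management PC. The following is a rough sketch using Python's standard socket module. The function name, parameters, and the case-insensitive comparison are my own assumptions, since exactly how the S500 compares names is not documented. The resolver functions are passed in as parameters so the check can be exercised without a live DNS server.

```python
import socket


def check_replication_dns(source_ip, expected_source_name, target_short_name,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname):
    """Return a list of problems found; an empty list means both lookups passed.

    Hypothetical helper: it mirrors the two checks described in the post,
    not the StoreVault's actual verification logic.
    """
    problems = []

    # 1. Reverse (PTR) lookup of the source unit's address.
    try:
        ptr_name = reverse(source_ip)[0]
    except OSError as exc:
        problems.append(f"no PTR record for {source_ip}: {exc}")
    else:
        # ASSUMPTION: names are compared ignoring case and a trailing dot.
        found = ptr_name.rstrip(".").lower()
        wanted = expected_source_name.rstrip(".").lower()
        if found != wanted:
            problems.append(
                f"PTR for {source_ip} is {ptr_name}, expected {expected_source_name}"
            )

    # 2. The management PC must resolve the target's unqualified name.
    try:
        forward(target_short_name)
    except OSError as exc:
        problems.append(f"cannot resolve {target_short_name}: {exc}")

    return problems
```

Calling it with the source unit's IP address, the name its PTR record should return, and the target's short name gives a list of human-readable complaints, which is already more detail than "Permission denied."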

Oh, yes, StoreVault tech support loves to edit the /etc/hosts files on the S500s.

Detail the fourth: If you ever have to reset the StoreVault to factory defaults, be aware that you may need to manually delete the LUNs. Oh, and have a null modem cable handy, since in my case the StoreVault wasn't accessible over the network until I applied some initial settings through the serial console. Port settings are 9600 baud, 8N1, no flow control, since that isn't documented either.

Despite the above, the StoreVault S500 seems to work.