What is a SAN and why do I care?

OR  Wine, Dogs and Winter in Minnesota

I had intended my first blog post to be a description of how I worked with a great storage team to pinpoint an i/o bottleneck on our SAN,using
Glenn Berry’s queries.

However, I realized that until a few years ago, I had no idea what a SAN is, so I figure I’ll start with an explanation.  (Caveat:  I am by no means a storage expert, I’ve just been privileged enough to work with some patient storage teams who tried as hard as they could to impart information to the confused DBA)

Many DBAs still think of their databases as sitting on something similar to a desktop computer (hence the still oft used reference to a “box”).

Once upon a time, SQL databases really did sit on “boxes.”  Inside, there were multiple drives, and each drive had a letter.  These days, though, there is frequently a much more complex hardware structure.

At its very basic level, a server has memory, CPU, storage (represented by drive letters) and something to connect all those.  Requests are issued to all of these things when a user is interacting with the system.

That concept is still there, we’re just dealing with much more abstraction.

Since I live in a place with very snowy and cold winters, work with dogs and love wine, I came up with the following analogy:

  • Me = CPU and Memory (server)
  • Wine Craving = Request (or i/o)
  • Sled = Drive Letter (space)
  • Dog = Hard Drive
  • Hitch from the sled to the dog = stuff linking it all (this depends on your SAN vendor – I’ll talk about this in a MUCH later post)

*Note:  the disk letter and the actual hard drive are separate entities

Now, suppose it is really cold outside, and there are many feet of snow.  If I’m out of wine, but I really want a bottle, I might risk life and limb and drive to my favorite wine store, Cellars. However, I decide to hitch up my dog to a sled and mush on over.

Simple, right?  If I need to get more than one bottle of wine, 15 years ago I would just get a bigger dog.

However, in recent years, there is a lot more data and a lot more requests.  One way to accommodate this is to have a SAN (“Storage Area Network”), where space can be more easily added to servers.

For instance, if I want an entire CASE of wine, my sled may not be big enough. I can now ask the SAN team to make my sled bigger (as soon as I justify my growing desire for booze).

Because the sled (drive letter) is bigger, my one little dog (actual drive) will not be able to haul it.  Instead of ordering a new dog, waiting for him to be shipped and then hitching him up, an existing sled is expanded and more dogs are added. Now we can get all of that wine home with no trouble at all, and/or the developers can store millions of xml files in the database.

Here is where understanding the difference between the drive letter (sled) and the hard drive (dog) becomes important.

The reason the storage team can accommodate me so quickly is they have many, many dogs.  These are called “spindles”, and correspond in no way to any particular server or drive letter at all until they are assigned.

In order to manage things effectively, the SAN admin will divide these into kennels (disk groups) when the SAN is being built*.  The more spindles(dogs) in a disk group, the faster the group and the more space it has to allocate to drives (or sleds).

The drive being expanded is just an abstraction called a “LUN.”  It is used only because we need to differentiate between sleds (i.e., drive E:\ and drive F:\) The actual work is being done by the kennel (and the dogs within it).

As far as i/o (cravings) go, dogs in each kennel all work together.

So, say we have two kennels.  If we put 5 dogs in 1 kennel, and 10 dogs in another, the actual capacity each kennel can handle is slightly different.  The larger kennel can handle more bottles of wine and cravings than the smaller kennel.

Suppose I need another sled, and this sled needs to be faster than the other – for instance, tempdb.  I would ask the SAN admins to give me another drive letter (LUN), and use dogs from the larger kennel so they can spread things around and get that case of wine home faster than the other.

If I also want some cheese, I might want another musher (server).  Perhaps cheese requires more memory, or faster reactions (CPU), than wine.  After all, the cravings (or requests) may be entirely different.

There are some basic rules:

  1. Once you allocate a dog to a sled (LUN) you can’t take him back.  He’s permanently attached to that sled.  You can expand a disk but you can’t shrink it.
  2. If you ask for a new sled, you CAN return it, along with all the dogs attached, to the kennel.
  3. SAN admins can move sleds from one server to another, without ever spilling any of the wine.  This is pretty nifty, because if I need to restore a backup of a database and it is so big robocopy would take forever, the admin can just detach that LUN from my server, and reattach it to another.
  4. Sleds can’t span kennels (in other words, a specific LUN comes from only one disk group).
  5.  Dogs can’t move between kennels.  The disk groups are defined when the SAN is built.
  6. Once there are no more dogs, it takes a REALLY, REALLY long time to add more.  In fact, prohibitively long (imagine all the fighting a group of dogs with new members would have…things would have to “level” before you could use the new dogs).

For now, I am leaving the hitch alone.  It would most likely require an entirely different analogy.  (although I would try really hard to involve dogs and wine in some way).  I will, however, note that there are many redundancies built into it to ensure dogs, sleds and mushers don’t incur accidents, lose wine or get injured. The hitch is frequently referred to as the “fabric”.  Different SAN vendors use different technologies to link all these things together.  It requires a LOT of specialized training and very smart people to be able to configure the fabric.

In my next post, I’ll address some common points of friction between the storage team and the DBA, as well as look into a few standard practices that are not always applicable when your database is on a SAN.

*key information on why this matters will be in next post