A few weeks ago, I was talking to a meetup of SQL Server folks here in the Hartford area about storage and SQL Server. We were going through the ideal storage layout of SQL server, and someone in the group summarized it as "so I need a minimum of five disks (LUNs) for a SQL Server?" I'd never really thought of it in those terms, so let's get down to the thinking:
First, this has mostly to do with logical layout of data. Aside from separating logs and databases on different physical media, it has nothing to do with physical layout of data. Even if you have only a couple of disks to allocate to the server, you can still logically partition it so that it's easy to evaluate performance later on.
As with almost everything else related to SQL Server, it's going to depend on the workload. The key aspects are:
- Performance sensitivity - This is not necessarily a performance intensive workload – it could be nearly idle. The key question is "if performance of this application suffers, which business processes will suffer, and how many users will suffer?" If there's a chance that someone important is going to call you up and ask you to fix a performance problem with this SQL server, it's performance sensitive. Even if you're resource constrained, it would help to have things laid out so you can evaluate performance without reconfiguring the database (which in reality includes nearly as much effort as migrating to entirely new storage).
- Recoverability – This would be dictated by whether you'll need to be able to perform up to the minute recovery of the database. If you do, you'll need to consider the physical layout of the data so that the failure of a physical disk or RAID group doesn't take out the database and its transaction logs at the same time.
Here's a quick version of the rules:
-
If your database is neither performance sensitive and you do not need the ability to recover data up to the last transaction, then you only need one or two:
- OS/Apps
- Databases and transaction logs
-
If the data in your database is critical enough that you'll want to be able to recover up to the latest transaction, you'll need three.
- OS
- Databases
- Transaction logs (you also need to make sure that these are physically separated from the databases, so that you don't lose data from both of them at the same time).
-
If your data is performance sensitive (regardless of whether the data is sensitive), you need a minimum of five, and possibly more:
- OS
- System databases (other than tempdb)
- User databases
- User database transaction logs
- One for tempdb and its logs
The principle behind this separation is the performance evaluation of these components independently of each other. If I have my user databases mixed with tempdb, and I'm having performance issues, I have no real way of telling which database is presenting the IO, and which database is starving. All I know is that performance stinks equally on both databases. More importantly for transaction logs, you want to both avoid contention between the IO that log flushes create and normal user IO, and you also want to make sure that transaction log disks get a write response time of well less than 10 milliseconds.
These five LUNs are a starting point – I often see systems with a dozen or even a couple dozen LUNs. What's the reason for adding LUNs above this five?
- Additional segregation of the unrelated workloads. I can put two different databases on two different LUNs.
- Segregation of related, but different workloads. For example, I could put my non-clustered indexes on different disk than my data files.
- Simply adding queues – each disk gets an additional queue
- Increase the granularity of restore options if you're using hardware-based snapshots, clones, or CDP. In this case, the management boundary is the LUN itself, so if you want to do rapid restore of a failed or corrupt data file using this method, then you would restore all the databases on that LUN together (whether they are corrupt or not). There are ways to do selective restore in this case, but it can take longer to perform the restore.
Remember that multiple LUNs can actually share the same physical disks, so your workloads can still step on each other if that's the case. However, it makes the troubleshooting process much easier. And if you're using the right technology, fixing the problems can be completely seamless.