Bet you though this blog was all about messaging and Exchange. Well, it’s not really. Seriously. I’m not kidding. I do think about other stuff.
Ever since I’ve started working with Enterprise Flash Drives (EFD), people have asked me how they can identify which databases are the best candidates for EFD. The thinking is clear: EFD’s are great, but they are more expensive than other more conventional Fibre Channel/SAS drives. So they understandably want to make sure they’re getting the best bang for the buck.
Now, there are some very cool advanced technologies around extending array cache with EFD (like EMC’s FASTCache and NetApp’s FlashCache – then there’s the old tried and true “Use gobs of traditional cache” approach). There are also some technologies that can automatically tier data for you, (such as EMC’s FAST and Dell’s Data Progression). Both of these techniques rely on the frequency of use, which makes sense. Put the data that the server is requesting most frequently on the fastest storage.
This is great Ron Popeil-style “Set it and forget it” storage tiering. But it’s still prudent to do some analysis to make sure that you’ve got enough of the EFD, and on the flip side you want to make sure that you don’t buy too much. Now if you’re already an EMC customer with our high end or mid range storage products, we can do some analysis of the data to determine the optimal amount of EFD for your workloads. But even that won’t take into account specific workloads that you want to accelerate. And of course if you’re using something that can’t tier for you automatically, then you’re stuck with the manual method anyway.
This post is about looking at the problem from a SQL server perspective. There are three techniques I use, and they complement each other. Sometimes I use all three, sometimes only one or two. All of them use a measure of activity divided by the size of a dataset. To coin a term, I'll call this "I/O density"). The relative ratio is what you’re looking for to identify the EFD candidates.
-
Perfmon data (LogicalDisk)
With this type of analysis, you take the GB used on each logical drive and divide that by the Reads/Sec and Writes/Sec from LogicalDisk. You might get something like this:
What’s good about this approach is that you get to see data over time, so you can see how spiky the workload is. You can also evaluate read/write mix and IO size, both of which are factors in the decision. The drawback is that it doesn’t have the resolution that the other techniques use. This is something even a rudimentary automated tiering technology could handle for you, namely “This LUN is busy. Let’s move it up a tier.” If you have multiple databases per LUN, or multiple files per LUN, you can’t tell which is driving the IO, so you could be moving more data than you need. Also, you need more than just perfmon data – you need the logical disk sizes. Perfcollect will capture all the necessary metrics in a nice little package for you. It’s up to you to extract the data to CSV and analyze it (I used Excel here).
-
Perfmon data (SQL Counters)
This involves taking a key counter for your database (<instancename>:Databases\Data File(s) Size (KB)) and dividing it by another counter for the database (<instancename>:Databases\Transactions/sec). (Perfcollect will grab this data for you as well). You might get something like this:
Again with this, you get timeseries data. This case you see a more or less steady green line plugging along all through the sample. In this case, it’s not a critical data set and I don’t need snappy response times for it, so I’ve already done something automated tiering cannot. The time series aspect of this is cool, because I can see when people start logging in about 60% of the way through the sample, and I clearly identify the database I’d want to move. The drawbacks are that (1) it identifies only transactions – not transactions that result in storage IO. If you’ve got enough server-side cache, many transactions won’t ever hit disk, (2) it doesn’t distinguish between read and write transactions or the size of the resulting IO, both of which are key metrics for evaluating the applicability of EFD, and most importantly (3) it operates at the database level. If you have multiple files in a user database, in many cases you’ll have one file busier than the others (check out the next method for an example). Of course you may know which files are busiest (for example, if you’ve partitioned your database with tiering in mind), but this doesn’t tell you if you’re in for a surprise.
-
fn_VirtualFileStats
This involves querying fn_virtualfilestats, so you need rights to log into the database.
Here’s the getSQLDBstats script. Just log into a SQL server with management tools and run it like this:
For a default instance, run:
sqlcmd –S [hostname] -i getDBstats.sql –o [output file.csv] -s”,”
For a named instance, run
sqlcmd –S [hostname]\[instancename] -i getDBstats.sql –o [outputfile] -s","
You’ll get a file that looks a little like this after you’ve formatted it in Excel.
Now that you’ve got the data, you can do some interesting things with it. Primarily, you’ll be taking FileSize and dividing it by NumberReads and NumberWrites. Eliminate the log files and you end up with something like this:
The advantages here are numerous. First, you have file-level resolution, so if there are busier files in a given database, you can avoid moving the “slacker” files. In this case, the files to the left are tempdb, which is very evenly distributed (as you'd expect). The spiky files are user databases, so you can imagine that there's some sort of partitioning in use here, or that they've added files over time. Second, you can evaluate read/write ratio and if you do some more math with BytesRead and BytesWritten, you can get get an idea of average IO size, both of which are useful data points for evaluating the bang for the buck you’ll get out of EFD. Third, it’s quick. Instead of grabbing data over a day, week or month, you can get this data in a matter of seconds, and with an Excel macro, you can have the analysis done just as quickly. The drawback is that it’s not timeseries data – this is aggregated over time since the moment the database was attached. So if you have event-based activities like end of day/month batch processing, that could significantly skew the analysis. So I always like to look at it in conjunction with the perfmon time series data to ensure the analysis is correct.
Perhaps more important, I believe that the methods used in this analysis of the Voynich mystery can be applied to difficult questions in other areas.
Posted by: sewing machine reviews | December 12, 2011 at 02:24 AM