OK, so maybe Perfmon Counter of the Week was a little optimistic. Let’s say month, OK?
This one is another disk-related counter, but I like it because there’s so much myth around it. It seems that anytime I see a document about disk performance as it relates to an application, I see references to “disk queue length”. And the advice is either really vague, like “monitor the disk queue length; if it’s high, you may have problems with the disk subsystem,” or really specific, like “if your disk queue length is consistently higher than 2, you may have a problem with your disk”.
And neither piece of advice is quite wrong, nor is it quite right.
Here’s the thing with disk queue length, and any other queue-oriented counter: Whether a queue length is “bad” or “good” depends on the number of resources servicing the queue.
Imagine you’re in line at the grocery store. There’s a single cashier, and there are nine people in front of you. That’s a queue length of ten, and you’ll start wondering whether you really need the quinoa your wife told you to get. Now imagine that there’s a single line with nine people in front of you, but there are five cashiers. You still have a queue length of ten, but you’re going to be through it in a lot less time. In fact, it can be better than having one person in front of you with a single cashier, because that dude in front of you might have 63 coupons and a checkbook. If there are other cashiers, he’s only tying up one of multiple resources.
Unfortunately, that doesn’t make the quinoa look any more appetizing.
So does that mean that disk queue length is useless? Not at all. If you know how many resources are servicing the LUN, then you can see whether it’s really a problem or not. The typical rule of thumb is a queue length of 2 per disk. So a queue length of 6 may indicate a bottleneck if you only have 2 disks in the RAID group servicing the LUN (that’s 3 per disk). But it’s really not a problem if you have a 10-disk RAID 1/0 group (that’s 0.6 per disk). The same goes for processor queue length: judge it against the number of processors servicing the queue.
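To make that arithmetic concrete, here is a minimal Python sketch. The function names and the idea of passing the disk count in by hand are my own assumptions for illustration; Perfmon won’t tell you how many disks sit behind a LUN, so you have to get that from your storage configuration. The default threshold of 2 is just the rule of thumb above, not a hard limit.

```python
def queue_per_disk(avg_queue_length: float, disk_count: int) -> float:
    """Spread the observed queue length across the disks servicing the LUN."""
    if disk_count <= 0:
        raise ValueError("disk_count must be a positive integer")
    return avg_queue_length / disk_count


def looks_bottlenecked(avg_queue_length: float, disk_count: int,
                       per_disk_threshold: float = 2.0) -> bool:
    """Apply the rule of thumb: more than ~2 queued requests per disk is suspect.

    per_disk_threshold is an assumption you should tune for your own storage.
    """
    return queue_per_disk(avg_queue_length, disk_count) > per_disk_threshold


if __name__ == "__main__":
    # Same queue length of 6, very different stories.
    print(looks_bottlenecked(6, 2))   # 2 disks: 3.0 per disk
    print(looks_bottlenecked(6, 10))  # 10 disks: 0.6 per disk
```

The point is that the same raw counter value gives opposite answers once you divide by the number of resources behind the queue.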
The other thing that makes this counter interesting is trending. I’m a huge fan of measuring performance stats when application performance is good so that you have something to compare against when things go downhill. So it’s useful as a comparative value as well. Even if you don’t have performance data from the “good times”, you can still plot it on a time-series chart and see how it changes over time.
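If you do have samples from the “good times”, the comparison can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up, the samples are assumed to be queue-length readings you have already exported from Perfmon, and “mean plus three standard deviations” is one simple way to define “unusually high”, not an official threshold.

```python
from statistics import mean, pstdev


def baseline_threshold(baseline_samples: list[float], k: float = 3.0) -> float:
    """Return mean + k standard deviations of the known-good samples.

    k is an assumed sensitivity knob; a larger k flags fewer readings.
    """
    if not baseline_samples:
        raise ValueError("need at least one baseline sample")
    return mean(baseline_samples) + k * pstdev(baseline_samples)


def unusual_readings(current_samples: list[float],
                     baseline_samples: list[float],
                     k: float = 3.0) -> list[float]:
    """Return the current readings that exceed the baseline threshold."""
    limit = baseline_threshold(baseline_samples, k)
    return [s for s in current_samples if s > limit]


if __name__ == "__main__":
    good_times = [1.0, 2.0, 1.0, 2.0]   # hypothetical readings when things were fine
    today = [2.0, 3.0, 6.0, 9.0]        # hypothetical readings during the slowdown
    print(unusual_readings(today, good_times))
```

With a baseline like this, you are no longer asking whether a queue length is “high” in the abstract, only whether it is high for this system.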