So far, all my blog posts have been sort of event-driven: something I read or hear about, or perhaps something I’m doing for a customer, tickles my fancy and I start to think about a blog post. That’s probably why my blog went dark for a year and a half. A few months ago my colleague Chuck Chiambalero suggested a “Perfmon Counter of the Week”. I think really slowly, so I’m only getting cracking on it now.
Each week, I’ll go over a perfmon counter or two that I regularly use when I’m sizing out a customer’s environment, or evaluating performance bottlenecks they may be having. All of the counters I’ll be covering are collected in perfcollect (of course only if they’re available on the host – a SQL Server counter isn’t going to be collected if you run perfcollect on an Exchange server).
Anyway, Perfcollect looks for about 600 different counters for various applications, so that’ll keep me busy until 2023, at which point I’ll be busy selling my organs to raise college money for my kids.
We’ll start off with a couple of counters that are near and dear to my heart: Avg. Disk sec/Read and Avg. Disk sec/Write. These counters indicate the response time of the disk: how long it takes for a read or write request to be acknowledged by the disk. You can grab them from both the PhysicalDisk and LogicalDisk objects. I prefer to look at LogicalDisk unless there’s a specific reason to look at PhysicalDisk. As a general rule for databases, the average read latency should be less than 20ms for database files, and the average write latency should be less than 10ms for log files. And of course, since perfmon reports these values in seconds, you’ll have to multiply by a thousand to get milliseconds.
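To make the unit conversion concrete, here is a minimal Python sketch. The function names and sample values are hypothetical; the thresholds are just the rules of thumb above (under 20ms for database-file reads, under 10ms for log-file writes), and the inputs are assumed to be raw perfmon values in seconds.

```python
# Rule-of-thumb thresholds from the text, in milliseconds.
READ_THRESHOLD_MS = 20.0   # average read latency on database-file volumes
WRITE_THRESHOLD_MS = 10.0  # average write latency on log-file volumes


def seconds_to_ms(value_sec: float) -> float:
    """Perfmon reports disk latency in seconds; convert to milliseconds."""
    return value_sec * 1000.0


def data_file_read_ok(avg_disk_sec_read: float) -> bool:
    """Hypothetical check for a volume holding database files."""
    return seconds_to_ms(avg_disk_sec_read) < READ_THRESHOLD_MS


def log_file_write_ok(avg_disk_sec_write: float) -> bool:
    """Hypothetical check for a volume holding log files."""
    return seconds_to_ms(avg_disk_sec_write) < WRITE_THRESHOLD_MS


# Hypothetical raw counter values, as perfmon would report them (seconds).
data_read = 0.012   # about 12 ms
log_write = 0.015   # about 15 ms

print("data read ok:", data_file_read_ok(data_read))
print("log write ok:", log_file_write_ok(log_write))
```

Here the 12ms data-file read passes and the 15ms log write does not; the raw values alone (0.012 and 0.015) make that easy to miss if you forget the factor of a thousand.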
General guidelines say that spikes shouldn’t exceed 50ms, but that’s pretty meaningless unless you’ve defined a sample interval (a 15-second sample interval is usually going to be a hell of a lot spikier than a 2-minute or 10-minute one).
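To show why the sample interval matters, this sketch averages the same synthetic latency trace over 15-second and 120-second windows. The data is made up (600 one-second samples at 5ms with a 10-second burst at 120ms), and it assumes a steady IO rate, so that averaging per-second values is a fair stand-in for how perfmon averages over each interval.

```python
def window_averages(samples_ms, window):
    """Average consecutive, non-overlapping windows of per-second samples."""
    return [
        sum(samples_ms[i:i + window]) / window
        for i in range(0, len(samples_ms) - window + 1, window)
    ]


# Synthetic trace (assumption): 10 minutes at 5 ms, with a 10-second
# burst at 120 ms starting at second 300.
samples = [5.0] * 600
for second in range(300, 310):
    samples[second] = 120.0

peak_15s = max(window_averages(samples, 15))
peak_120s = max(window_averages(samples, 120))

print(f"worst 15 s average:  {peak_15s:.1f} ms")
print(f"worst 120 s average: {peak_120s:.1f} ms")
```

With this trace the worst 15-second average is about 81.7ms, well past the 50ms guideline, while the worst 120-second average is about 14.6ms. Same disk, same burst; only the interval changed.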
So is high disk latency ALWAYS a bad thing? Well, no. The numbers are there as a rule of thumb. If you’ve got performance problems and there are high or relatively high latencies on the disks, you’ll be advised by support to investigate and fix it. But if the latencies have been steady over time and the performance problems are new, clearly fixing the disk isn’t going to yield much for you. Which is why it’s important not to wait until you have problems to collect perfmon data.
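A minimal sketch of that baseline comparison, in Python. The 1.5x cutoff and both sample lists are hypothetical; the point is that latency which was already “high” but unchanged before the slowdown doesn’t explain a new problem.

```python
from statistics import mean


def latency_changed(baseline_ms, current_ms, ratio=1.5):
    """True if current mean latency exceeds the baseline mean by `ratio`.

    The 1.5x default is an assumed cutoff, not an official guideline.
    """
    return mean(current_ms) > ratio * mean(baseline_ms)


# Hypothetical read latencies (ms): above the 20 ms rule of thumb,
# but steady for months before the performance problem showed up.
baseline = [24.0, 26.0, 25.0, 23.0]
current = [25.0, 24.0, 26.0, 25.0]

print(latency_changed(baseline, current))  # False: the disk probably isn't the new problem
```

Without the baseline, you would only see 25ms reads and likely chase the disk first.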
Another factor to consider is the IO size. Logic and physics dictate that if you have two requests, one with a large amount of data and one with a small amount, the smaller one is going to complete more quickly on average. With spinning disks and random workloads, the vast majority of the time is spent seeking the data rather than moving it, but the transfer time is still noticeable in a number of workloads. So the larger the IO, the more latency you should expect. Larger IO usually implies sequential workloads, which are measured in megabytes per second rather than IOs per second anyway.
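Here is a back-of-the-envelope model of that trade-off in Python. It assumes random IO on a spinning disk where every request pays a fixed positioning cost plus a transfer cost proportional to its size; the 8ms positioning time and 150 MB/s transfer rate are hypothetical round numbers, not a measured drive profile.

```python
# Simplified model (assumption): service time = positioning + transfer.

def service_time_ms(io_size_kb: float, positioning_ms: float,
                    transfer_mb_per_s: float) -> float:
    """Estimated time to complete one random IO, in milliseconds."""
    transfer_ms = (io_size_kb / 1024) / transfer_mb_per_s * 1000
    return positioning_ms + transfer_ms


def throughput_mb_per_s(io_size_kb: float, service_ms: float) -> float:
    """MB/s with one outstanding IO: IOPS multiplied by IO size."""
    iops = 1000 / service_ms
    return iops * io_size_kb / 1024


# Hypothetical drive: 8 ms positioning, 150 MB/s media transfer rate.
for size_kb in (8, 64, 1024):
    latency = service_time_ms(size_kb, positioning_ms=8.0, transfer_mb_per_s=150.0)
    mbps = throughput_mb_per_s(size_kb, latency)
    print(f"{size_kb:>5} KB IO: {latency:6.2f} ms, {mbps:6.1f} MB/s")
```

Under these assumptions an 8 KB IO spends almost all of its roughly 8ms on positioning, while a 1 MB IO takes about 14.7ms but moves far more data per second, which is why large IO is better judged by MB/s than by latency alone.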
Can everything (including latency) look healthy and performance still stink? Yes. Imagine a transaction system where the organization gets paid by the number of transactions it can complete. If storage is the bottleneck, it’s still the bottleneck even when latencies sit at a respectable 10ms, and investigating advanced drive technologies is well advised.
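One way to see why a “healthy” 10ms can still be the limit is Little’s Law (my addition, not part of the original argument): with a fixed number of IOs in flight, the throughput ceiling is that number divided by the response time. This Python sketch assumes a latency-bound workload, and the concurrency of 4 outstanding IOs is hypothetical.

```python
# Little's Law applied to storage (assumption: latency-bound workload with a
# fixed number of IOs in flight): throughput = outstanding IOs / response time.

def max_iops(outstanding_ios: int, latency_ms: float) -> float:
    """Upper bound on IOPS for a given concurrency and average latency."""
    return outstanding_ios / (latency_ms / 1000)


# Hypothetical: the application keeps 4 IOs in flight.
for latency_ms in (10.0, 5.0, 1.0):
    print(f"{latency_ms:>4} ms -> {max_iops(4, latency_ms):,.0f} IOPS")
```

At the same concurrency, cutting latency from 10ms to 1ms raises the ceiling tenfold, from about 400 to about 4,000 IOPS. If each transaction waits on storage, that headroom translates directly into transactions.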