
Profilers and The Perils of Micro-Optimization

Wednesday 9 November, 2005, 07:35 AM

Someone recently asked:

"If you wanted to judge the best way to do something, for example code inside of a class, in otherwords benchmarking it, how could you do that?"

Earlier today I wrote a reply to this question on the Dotnet-CX mailing list. A couple of people have suggested I put this up on my blog to get it out to a wider audience. So here's a slightly edited version of what I wrote.

What's Best?

The single most important thing to bear in mind for this kind of exercise is:

It all depends on the context.

The "best" way to do something is whatever best meets your requirements in your particular scenario. This may look completely different from the best way of doing exactly the same thing in some other scenario, where your priorities and constraints may be different.

The author of the original post went on to ask:

"I'd like to take speed into account.. but would it also be possible to measure such things as how much memory footprint is taken by all methods & members of a specific Class Call?"

Nine times out of ten, if you use these criteria for direction, you will not end up with "the best way to do something". You'll end up with the way of doing something that strikes a good balance between speed and memory footprint. But it might be a buggy, unmaintainable mess.

If the code in question wasn't performance critical, but was production code serving some useful purpose in an application with a medium-to-long anticipated lifetime, then congratulations! You just wrote the worst possible code - the very opposite of the stated requirement.

What's worse, people often take the lessons they think they've learned from such an exercise, compile them into a list of small-scale "best practices", and proceed to apply these everywhere. This is a Bad Thing because it ignores the all-important context. Just because a practice appeared to work well (for some specific definition of "well") in some specific scenario doesn't mean you should blindly apply that practice everywhere.

Best != Fastest

In an awful lot of applications, the majority of the code is not performance critical. Speed is therefore often the last thing you should be optimizing for.

You will almost certainly want to put "Works correctly" higher up your list, and I'd also go for "Easily understood by others", "Well-chosen symbol names", "Minimal duplication", "Minimal dependencies on and from other code", "Intelligently chosen and well-formatted comments", as a few examples likely to be more important than any of the things in the original list above when aiming for the best possible code. (That's not an exhaustive list by the way - just a few random examples of non-performance-related goodness.)

Sometimes Speed Matters

Of course there will be the one in ten cases where the code is performance critical... (Well, probably not exactly one in ten. Pick your favourite moderately small fraction.)

But in nine out of ten of these performance-critical cases, the performance will probably turn out to have nothing to do with the specific details of how you wrote the code. Architecture often has more influence than anything else on performance. And within architectural constraints, the software design is often significant. And at lower levels, the algorithms you choose will matter far more than the hotness of your raw coding skillz.

Sometimes Speed Matters 2 - This Time It's Code

But that does still leave the 1% (or whatever; a pretty small fraction) of cases where the performance of your particular piece of code is relevant, and cannot be improved by architecture, design, or algorithm changes.

How do you know when you're writing one of those bits of code? First, you need to define performance requirements for your system so you know whether it's fast enough. When it's not meeting your performance requirements, that is when you reach for your profiler, not before.

(By the way, I'm not advocating leaving performance testing until last. On the contrary, I think it's important to test early and often. From as early as possible in the project lifecycle, you should verify that you're meeting both functional and performance requirements. Fixing performance problems late in the day is pretty tricky and often expensive, precisely because most performance issues are caused by either architecture or design problems. I'm just saying you shouldn't optimize performance until you've established that you have a performance problem.)

Run the scenario that's too slow in the profiler, and it will most likely give you a pretty good idea of where the time is being spent. Although not always - there are certain kinds of sluggishness that are all-pervading. The CLR profiler, which focuses on memory usage rather than how long specific bits of code are taking, might help you with some of these - poor memory usage is a good way of slowing everything down uniformly... And in some cases, profilers will tell you nothing useful, although fortunately this is less common. (At least to start with... You tend to run into this once you've picked all the low-hanging fruit and it's still not fast enough. The usual way of solving this is to think long and hard about what you're doing. It's not going to be easy.)

In a few cases, the profiler will actively mislead you - this happens because profilers apply a non-uniform distortion to your code's performance characteristics. Sometimes fixing the thing the profiler highlighted won't make any noticeable difference to the real-world performance.

Unfortunately, lots of developers just love to go off on micro-benchmarking exercises, where they write about 10 lines of code in a variety of different ways, and discover that one runs faster than the rest. (Or worse, one looks like it runs faster than the rest when running in the profiler.)

This is rarely useful information, because all it really tells you is that that version runs fastest in the context of your test harness. And since the test harness is just the 10 lines of code in isolation, that's not terribly helpful. There is no guarantee that the code that performed best in this micro-benchmark will run faster in all scenarios. Nor will this kind of test tell you whether the cost of making this code run faster is to make everything else in your system slower. (It might yield interesting insights into how the system works under the covers, but you should be very wary of extrapolating such insights into general rules for 'the best way' of doing something.)
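To make this concrete, here's a sketch of the sort of micro-benchmark I mean - an illustrative example only, not code from any real project. It times two ways of building a string in isolation and declares a "winner", but whatever the numbers say, they only describe this tiny harness:

    using System;
    using System.Diagnostics;
    using System.Text;

    class MicroBenchmark
    {
        static void Main()
        {
            const int iterations = 50000;

            Stopwatch sw = Stopwatch.StartNew();
            string s = "";
            for (int i = 0; i < iterations; i++)
            {
                s += "x";                 // naive concatenation
            }
            sw.Stop();
            Console.WriteLine("Concatenation: {0} ms", sw.ElapsedMilliseconds);

            sw = Stopwatch.StartNew();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < iterations; i++)
            {
                sb.Append("x");           // StringBuilder
            }
            sw.Stop();
            Console.WriteLine("StringBuilder: {0} ms", sw.ElapsedMilliseconds);

            // Whichever approach "wins" here, the result tells you nothing about a
            // real application, where allocation patterns, GC pressure, and what the
            // surrounding code is doing will change the picture completely.
        }
    }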

Also, don't rely too heavily on the profiler to work out how much better (or worse) your changes have made things. Here's a much better approach:

  1. Get an automated test rig that can reproduce the too-slow scenario time and time again. This must be something that exercises your real program, and not some isolated piece of code from your program. If you don't have such a thing, stop what you're doing and create one. You need repeatability, which means you need automation. Otherwise you're going to end up relying on guesswork, which is a notoriously poor way of doing performance tuning. (Of course this means you need to think hard about whether this test scenario is likely to be a good representation of a real scenario. There's not much use in a carefully constructed fully automated but utterly unrealistic test rig.)
  2. Use this test rig to measure throughput/response time/frame rate/whatever performance metric you care about. Make sure it's a release build that you're measuring. Keep a record of your measurements - write them down somewhere or save a spreadsheet of measurements for each test run.
  3. Run the test rig with your profiler attached. Marvel at the vast difference in performance figures between this and the figures you measured in 2. (Indeed, this difference is the motivating factor behind this procedure.) Performance figures obtained while the profiler is attached aren't terribly meaningful, so there's no need to keep them, although it is useful to have a clear idea of how much the profiler is distorting your system's performance. It'll help you know how close to the truth the stories told by the profiler are.
  4. Examine the areas of code the profiler highlighted as being likely problem areas. Modify the worst of these in a way you hope will improve matters.
  5. Measure throughput/response time/whatever in the test rig on a release build including these modifications, without the profiler attached. Compare with results from (2). If things got better, great. If not, back out the changes.
  6. If it's not fast enough yet, repeat from (3) as necessary.

The crucial part of this is that you are measuring the before-and-after figures outside of the profiler.
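As a rough illustration of steps 1 and 2, here's a minimal sketch of such a rig. The RunScenario method is a hypothetical placeholder for whatever automates your real application's too-slow scenario - the details will be entirely specific to your system:

    using System;
    using System.Diagnostics;
    using System.IO;

    class ScenarioTimingRig
    {
        static void Main()
        {
            const int runs = 10;

            // Append to a simple CSV so each run can be compared with earlier ones.
            using (StreamWriter log = File.AppendText("scenario-timings.csv"))
            {
                for (int i = 0; i < runs; i++)
                {
                    Stopwatch sw = Stopwatch.StartNew();
                    RunScenario();   // drives the real program, not an isolated snippet
                    sw.Stop();

                    log.WriteLine("{0:s},{1},{2}", DateTime.Now, i, sw.ElapsedMilliseconds);
                    Console.WriteLine("Run {0}: {1} ms", i, sw.ElapsedMilliseconds);
                }
            }
        }

        static void RunScenario()
        {
            // Hypothetical placeholder: automate the scenario that's too slow here
            // (open the window, replay the requests, process the file, etc.).
        }
    }

Run this against a release build, with no profiler attached, and keep the CSV - those are the numbers that tell you whether a change actually helped.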

For contrast, here's the thing you absolutely don't want to do:

  1. Run a tiny little fake program showing some small code technique in the debugger.
  2. Look at the lines that light up red (or whatever) in your profiler.
  3. Tweak those lines.
  4. Goto 1.

This is WRONG WRONG WRONG.

But it's what a lot of people do because it's way easier than the previous set of instructions.

Sadly, it's often pointless. In particular, you never have any way of knowing whether you've achieved anything useful, because you're never taking any realistic measurements. You're just playing an abstract kind of computer game: move the red lines around in the profiler. And while this can be fun, it's not terribly useful - you may as well be playing Tetris.

The single most important point to understand about profilers is THEY CHANGE THE PERFORMANCE OF YOUR CODE. This is a form of the age-old observer's paradox - by measuring something we change it. That's a fundamental problem of course - the first set of instructions I set out are also subject to the same problem... But the thing about profilers is they often change things quite radically, sometimes to the extent that there's no recognizable relationship between the performance characteristics when running in the profiler, and the performance characteristics when running outside the profiler.

(This problem is particularly bad with intrusive profilers. It's not so bad with sampling profilers, although those have their own set of problems. Unfortunately most of the popular .NET profilers seem to be intrusive ones. This is probably because .NET makes those really easy to write.)

So you want your final definitive measurements - the measurements you use to answer the question "Did I just make things better or worse?" - to make as little difference to how the code runs as possible. This means not using the profiler to do that measurement.

Ideally you will instrument your code so this definitive measurement instrumentation is in place all the time. (This is a good idea in any server-based app. It has pros and cons in a client-side app.) This more or less solves the observer problem: if you're always observing your system's performance, even in production, you're not changing anything by observing. Observation is normal, and always on.

This means you want these definitive measurements to be as non-intrusive as possible, so they'll be pretty high-level stuff - e.g. "How long did it take that window to open? How many requests am I handling a second? What's the average time between receiving a request and sending a response?" Profilers work at a much lower level of detail - how long did that method take? How many times have we run this line? It's that level of detail that is both their strength and their weakness - it helps you find the slowest code, but it's also the reason they're so intrusive and distorting.
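In a server-style application, this sort of always-on measurement can be as simple as the following sketch. The class and method names are illustrative rather than from any particular framework, and a real version would need to be thread-safe and would probably report to a log or performance counter rather than the console:

    using System;
    using System.Diagnostics;

    class RequestMetrics
    {
        private long totalMilliseconds;
        private long requestCount;

        public void Record(long elapsedMilliseconds)
        {
            totalMilliseconds += elapsedMilliseconds;
            requestCount++;

            // Report every 1000 requests - cheap enough to leave switched on in production.
            if (requestCount % 1000 == 0)
            {
                Console.WriteLine("Requests: {0}, average response time: {1} ms",
                    requestCount, totalMilliseconds / requestCount);
            }
        }
    }

    class RequestHandler
    {
        private static readonly RequestMetrics metrics = new RequestMetrics();

        public static void HandleRequest()
        {
            Stopwatch sw = Stopwatch.StartNew();

            // ... the real request processing goes here ...

            sw.Stop();
            metrics.Record(sw.ElapsedMilliseconds);
        }
    }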

The profiler is there to give you a good idea of where to start looking. It's not a useful tool for providing good measurements of what effect your changes are having.

What matters is whether you improved the performance metrics that are important to you, as measured on the release build.

So you need to start by working out which performance metrics matter to you for the application you're working on.

Related Resources

Craig Andera has written a couple of excellent essays that are somewhat related to all of this, and I'd strongly recommend reading them. Craig has a wealth of practical experience in making systems go faster. You can find them here:

http://www.pluralsight.com/wiki/default.aspx/Craig/ScalabilityConsiderationsIndex.html

And of course there's Rico Mariani's blog - an oasis of detailed .NET performance tidbits.

Copyright © 2002-2024, Interact Software Ltd. Content by Ian Griffiths.