IanG on Tap

Stack Traces and the TimedLock

Wednesday 12 May, 2004, 10:42 AM

I told you you hadn't heard the last of this one...

Philip Haack has posted an update to his cross-thread-stack-trace-enabled version of the TimedLock. His update incorporates my most recent set of changes, but retains his additional code for keeping track of the stack traces for all the threads holding a lock, rather than just the one that aborted acquisition due to suspected deadlock.

I also received an email from Marek Malowidzki (who I don't think has a home page, so I apologize for the absence of a link there). He came up with an ingenious solution to the problem I pointed out with Philip's implementation - the fact that it generates a stack trace every time you acquire a lock. Since we hope that deadlocks are the exception rather than the rule, and we only require the stack traces on an error condition, the effort of generating a stack trace will usually be wasted.

Performance Digression

And before anyone starts crying premature optimization, the cost of building a stack trace is non-trivial; adding this into a lock operation makes it orders of magnitude more expensive in the non-contention case. Testing on my machine here, I find that Monitor.Enter/Monitor.Exit pairs take only 30ns (about 50 clock cycles) in the non-contention case. (The more expensive ReaderWriterLock takes about 250ns, incidentally.) But when we add in the cost of generating a stack trace it goes up to 10.7us. That's microseconds, or if you prefer, 10700ns. So it's over 300 times slower!

Of course that might not matter... It all depends on how coarse your locking is. If the work you'll be doing while holding the lock is going to take milliseconds to complete, then 10us to acquire a lock isn't such a big deal. But if your locks are fine-grained, so that you are doing executing a few lines of code while holding the lock, the cost of lock acquisition is more likely to dominate.

(In practice of course, you would need to test the effective costs of these things in context. I just timed these things in a micro benchmark kind of a test. The joy of performance measurement is that the cost of a given operation can vary wildly depending on the context in which you perform that operation. You might find that in your system, building a stack trace every time you acquire a lock makes no measurable difference to your performance.)

Marek Malowidzki's Solution

Marek's solution manages to avoid the stack trace generation cost in cases where the stack trace is not required, but he is still able to retrieve stack traces for both the thread that holds the lock and the thread that failed to acquire the lock in the failure case.

(I've not shown Marek's code by the way, since it was a proof of concept rather than finished code, and people have a habit of copying sample code on the internet into production systems...)

The trick he uses to achieve this is that the thread that owns the lock performs a check when releasing it - it looks to see if any other threads tried and failed to acquire lock while the owning thread was in possession. This is done using a shared Hashtable. When a thread fails to acquire a lock, it adds an entry to this Hashtable using the object it was trying to obtain the monitor for as the key. The value associated with the object is a data structure that contains a place for the owning thread to write a stack trace once it eventually releases the lock.

You might think that this won't work - surely if the thread that owns the lock fills in the stack trace once it releases the lock, the thread that failed to acquire the lock will never see this stack trace. After all, the error condition we're dealing with here is that we deadlocked while waiting for the lock to be released... This suggests that waiting for it to be released in order to get the stack trace isn't going to be successful - the lock isn't any more likely to become available just because we happen to have a different reason for waiting for it!

But Marek has thought of that. He doesn't wait for the owning thread's stack trace at the point at which the exception is thrown. Instead, the exception has a reference back to the data structure which will eventually be populated with the stack trace - we just throw the exception without waiting. We don't wait for the owning thread to write a stack trace until we actually come to handle the exception. The exception will of course propagate up the stack in the usual fashion, and there is a good chance that by the time it reaches an exception handler, the act of propagating the exception will have caused whatever other locks the failing thread held that were bringing about the deadlock to be released. This example illustrates the typical usage model:

private static void WillDeadlock(object o1, object o2)
{
    try
    {
        using (TimedLock.Lock(o1))
        {
            Thread.Sleep(200);
            using (TimedLock.Lock(o2))
            {
                Thread.Sleep(200);
            }
        }
    }
    catch (LockTimeoutException x)
    {
        // Wait for up to 5000ms for the owning thread
        // to give us its stack trace.
        StackTrace otherStack = x.GetBlockingThreadStackTrace(5000);
        Console.WriteLine(x);
        if (otherStack == null)
        {
            Console.WriteLine("Couldn't get other stack trace");
        }
        else
        {
            Console.WriteLine("Stack trace of thread that owns lock: {0}", 
                otherStack);
        }
    }
}

If this function is called on two different threads, each passing in the same pair of objects, but in a different order, this will deadlock at the second call to TimedLock.Lock on both threads. However, when the deadlock is detected, one of the threads will throw an exception. When that exception propagates back out to the catch block, it will leave the outer using block, releasing the lock that it already holds. This clears the deadlock, allowing the other thread to proceed, so when we call the GetBlockingThreadStackTrace method (this is Marek's extension), the deadlock is already cleared. This means that the owning thread will proceed and then release the lock once it is done, at which point the stack trace becomes available. The thread on which the exception was thrown will have to wait for the owning thread to release the lock before it gets the stack trace, but because the exception had the effect of clearing the deadlock, it's now OK to do this.

Of course there are more complex scenarios in which the throwing of an exception on the thread that detects the error will not have the effect of clearing the deadlock - it might be that the exception handler which catches the exception is too deeply nested, and that the problematic locks are still held by this time. This is why Marek's GetBlockingThreadStackTrace method requires a timeout to be passed in - it cannot guarantee that it will be able to retrieve the stack trace.

Variations

There are a couple of variations one could play on this theme, both of which avoid the problem where the thread which catches the exception might not be able to retrieve the stack trace. In both cases, the solution is to avoid even attempting to retrieve the stack trace on that thread.

The simplest variation is for the owning thread to throw an exception on releasing the lock if it detects that another thread detected deadlock. This way, both threads throw an exception - both the one that failed to acquire the lock and the one that succeeded. You would want to add some kind of information to the exceptions to allow them to be correlated in your logging of course, since they would be thrown at different times on different threads. But it does mean that all the information is available without making threads wait for one another in exception handlers.

But that's not a great solution for at least two reasons. First, it means both threads encounter failure, when we only really needed to fail one of the threads to clear the deadlock. Second, it means that exceptions can now emerge from a rather surprising place. Throwing exceptions when the lock is released is a surprising thing to do - while you would reasonably expect to fail to acquire a lock, failing to release one that you acquired successfully is not something you normally have to deal with. Worse, because of the using idiom the TimedLock supports, the lock is released in a compiler-generated finally block. Throwing exceptions from finally makes me a bit queasy, but even if you don't mind that, it's just nasty that an exception can emerge from the following line of code:

Yes, a single closing brace would be all the source code you see for a potential exception source. Barf.

A better solution would be for the owning thread to simply dump a stack trace into a diagnostic log, rather than throwing an exception. This way, you get all the information you require in your logs. (This assumes you put some information into the LockTimeoutException allowing the stack traces from the owning thread and the deadlock-detecting thread to be correlated. Allocating a unique number to each detected deadlock would do the trick.) But now, we don't end up with threads waiting for each other to print the information out, and we also avoid throwing exceptions on both threads. Of course the downside is that there are a variety of different diagnostic logging strategies that people use, so I can't produce a single example that everyone can use - the logging code would be different for everyone...

How to Play Nicely with IO

Wednesday 12 May, 2004, 11:44 AM

I recently lamented the apparent lack of support for prioritizing IO in Windows.

Sean McLeod pointed out to me a paper published by Microsoft Research on this topic. The paper's title is 'Progress-based regulation of low-importance processes.' It describes a mechanism they call 'MS Manners,' which allows a low-importance process to use resources in such a way as to avoid degrading the performance of more important processes using the same resources.

I wish my virus scanner used this.

Actually, scratch that. I wish something like this was built into Windows... The OS can already prioritize use of the CPU, but with a modern PC, the CPU is often the one of the least important factors in how long things take - most of what I do seems to be network-bound or disk-bound. I think the OS should be prioritizing access to any resources whose use has a noticeable affect on the user experience.

(And I also want things like virus scanning not to churn the contents of my disk cache, or to cause Windows to decide to trim the working set of every process as it tries in vain to cache my whole hard disk... The virus scanner is reading everyone on my hard disk exactly once, so there's absolutely no need for the OS to start reallocating memory to the disk cache on its behalf. But that's a different issue.)

When to use the ReaderWriterLock

Wednesday 12 May, 2004, 04:18 PM

It's clearly a threading/scheduling kind of a day...

Matthew Adams has justly taken me to task by email over a lack of balance in my blog on the relative merits of the ReaderWriterLock and Monitor.

When I wrote that, it was mainly as a counterpoint to Sebastien Lambla's article advocating the ReaderWriterLock. But taken in isolation, my article is a little bit too down on the ReaderWriterLock. While I believe everything I wrote is technically correct, I neglected to mention that there are cases where ReaderWriterLock is A Good Thing. This may have left readers with the impression that I thought that Monitor is always a better choice; that's not what I was intending to say - I merely think that ReaderWriterLock doesn't always offer the improvements you might be expecting.

So what would a scenario where the ReaderWriterLock is a good solution look like?

As I said in the previous article, the right choice of locking primitive is determined both by the pattern of access and the level of contention. The example I showed is a bad candidate for the ReaderWriterLock because the lock protects a very small amount of data, and the code holds the lock for a very short period. In other words, the example is using a fine-grained locking strategy.

ReaderWriterLock can be a whole lot more useful is if you are locking at a much more coarse-grained level. (I.e., using locks to protect large bodies of data, rather than tiny pieces.) Matthew gave the example of an MVC environment, with several subscribers to a model. Most of the time, when these subscribers access the model, they are reading from it. In his system, these subscribers often do non-trivial amounts of work while holding a read lock on the model, with some of the processing involving network IO. Because of this potential for blocking, the observers do this work on worker threads. There is a tendency for all of the observers to want to read simultaneously because they are triggered by events raised by the model they all observe. The speed with which updates are processed is important, because many of the observers are responsible for keeping the screen up to date. (Obviously using Control.BeginInvoke where necessary when multiple threads have been used.) You only have a limited budget of time in which to process things before the UI will start to feel sluggish.

All of this makes the system likely to benefit from the ReaderWriterLock. The lock is going to be held for a relatively long time, and there will be a correspondingly high probability that multiple threads will be looking to acquire the lock concurrently. Whereas in my non-ReaderWriterLock-friendly example, the likelihood of a context switch occurring while the lock was held was very small, in Matthew's scenario, is it almost certain to happen. This is where the ReaderWriterLock shines - if most of the time you get read-only access with the odd update, it can allow the reads to progress concurrently.

This is a real system by the way, rather than a hypothetical scenario, just in case you weren't sure. It originally used Monitors, and typically took several seconds to open up the UI on a given bit of the model, as all the various views and other observers took it in turns to read and process the data they required. By changing it to use a ReaderWriterLock, this time was reduced to a fraction of a second.

ReaderWriterLock is arguably at its best when used like this - enabling a coarse grained locking strategy rather than a complex, fiddly, fine-grained strategy with many locks, without compromising performance.

April (2018)	(1 item)
August (2014)	(1 item)
July (2014)	(5 items)
April (2014)	(1 item)
March (2014)	(1 item)
January (2014)	(2 items)
November (2013)	(2 items)
July (2013)	(4 items)
April (2013)	(1 item)
February (2013)	(6 items)
September (2011)	(2 items)
November (2010)	(4 items)
September (2010)	(1 item)
August (2010)	(4 items)
July (2010)	(2 items)
September (2009)	(1 item)
June (2009)	(1 item)
April (2009)	(1 item)
November (2008)	(1 item)
October (2008)	(1 item)
September (2008)	(1 item)
July (2008)	(1 item)
June (2008)	(1 item)
May (2008)	(2 items)
April (2008)	(2 items)
March (2008)	(5 items)
January (2008)	(3 items)
December (2007)	(1 item)
November (2007)	(1 item)
October (2007)	(1 item)
September (2007)	(3 items)
August (2007)	(1 item)
July (2007)	(1 item)
June (2007)	(2 items)
May (2007)	(8 items)
April (2007)	(2 items)
March (2007)	(7 items)
February (2007)	(2 items)
January (2007)	(2 items)
November (2006)	(1 item)
October (2006)	(2 items)
September (2006)	(1 item)
June (2006)	(2 items)
May (2006)	(4 items)
April (2006)	(1 item)
March (2006)	(5 items)
January (2006)	(1 item)
December (2005)	(3 items)
November (2005)	(2 items)
October (2005)	(2 items)
September (2005)	(8 items)
August (2005)	(7 items)
June (2005)	(3 items)
May (2005)	(7 items)
April (2005)	(6 items)
March (2005)	(1 item)
February (2005)	(2 items)
January (2005)	(5 items)
December (2004)	(5 items)
November (2004)	(7 items)
October (2004)	(3 items)
September (2004)	(7 items)
August (2004)	(16 items)
July (2004)	(10 items)
June (2004)	(27 items)
May (2004)	(15 items)
April (2004)	(15 items)
March (2004)	(13 items)
February (2004)	(16 items)
January (2004)	(15 items)