Doing Work Without Threads

Thursday 23 September, 2004, 10:30 AM

There seems to be a popular notion that in order for a program to perform an operation, it must have a thread with which to do it. This is not always the case. Often, the only points at which you need a thread are at the start and end of the operation. But there's an unfortunate tendency to hog the thread for the whole duration.

(This approach of making a thread wait for an operation to complete is sometimes described as 'synchronous' - the thread's progress is synchronized with the progress of the work being done. It's also sometimes described as 'blocking' because the thread's progress is blocked until the operation completes.)

To be fair, there are good reasons for the popularity of the synchronous style. APIs that don't return until they've done what you asked them to do are usually simpler than the asynchronous alternatives. They require less code to use. The code is typically also a lot easier to read, because the flow of the code directly reflects the sequencing of operations; asynchronous code tends to have a somewhat fragmented structure. And of course sequential code is a lot easier to get right - doing everything on one thread in the order you want is much easier to do than trying to write code that handles events in whatever order they happen to occur, with whatever thread they choose to arrive on.

Nonetheless, I think the overwhelming prevalence of the synchronous approach clouds people's understanding of how things work. This seems to be particularly harmful when it comes to the way a lot of people approach the design of distributed systems - I think it may be partly responsible for the fact that RPC is so overused.

It's All Async Under the Covers

One of the main areas where I see synchronous thinking leading people astray is in networking code. It amazes me how many people think that sending or receiving a message over the network using, say, a socket is necessarily a blocking operation.

It's not - down at the level of the network device driver, it's all totally asynchronous. If a network device driver is asked to send a packet, it arranges for the memory holding that packet to be accessible to the network card (or it may elect to copy the packet into a buffer on the network card), and asks the network card to start sending it as soon as it can. The device driver then returns, usually before the packet has even started to go out over the network.

Once the hardware has been given its instructions, the CPU has nothing more to do. So there's no point in the device driver just twiddling its thumbs while it waits for the hardware to get on with it. Instead it returns, so that the CPU can go off and do something else. Once the network card has finished sending the data, it will typically raise an interrupt so that the driver can notify the sending program and release the buffers.

As for receiving data, that is initiated by the network card - when it receives a packet, it raises an interrupt. This causes the CPU to stop what it's doing and handle the interrupt. Exactly what happens at this point will vary depending on the exact context, but the important point is that the handling of incoming data is triggered by the arrival of incoming data. There doesn't need to be a thread waiting to receive the data. The data just arrives, and by the magic of interrupts, suitable processing ensues.

Meanwhile, In User Mode

Of course for most developers, the way things work in kernel mode is less important than the way networking services are presented in user mode. And since lots of the APIs are synchronous, it's hardly surprising that most people think primarily in terms of synchronous operations. For example, in .NET, you can use sockets entirely synchronously. Indeed, this seems to be encouraged. But in many cases, an asynchronous API is available.

For example, the Socket class provides Begin/End versions of all of the potentially long-running operations. (Those of you used to a more Unixy style of socket programming will of course be aware of the idea of putting a socket into non-blocking mode, which also supports an async approach. But I don't like that so much - it's further removed from how things really work, because it doesn't offer a callback-driven approach. .NET's async model feels to me like a better match for the way things work under the covers. And since I used to write kernel mode network device drivers for a living many years ago, this appeals to me.)

The documentation in MSDN is sadly rather misleading for these methods. Here's what it says for Socket.BeginReceive:

Your callback method should implement the EndReceive method. When your application calls BeginReceive, the system will use a separate thread to execute the specified callback method, and will block on EndReceive until the Socket reads data or throws an exception

First of all your callback method doesn't implement the EndReceive method, it merely calls it. Secondly, the wording could be read as suggesting that perhaps a separate thread will be spun up as soon as you call BeginReceive. That's not what actually happens - no separate thread is used at the point "When your application calls BeginReceive..." At this stage, the main thing that happens is a call into kernel mode, in which the device drivers may take the necessary steps to make your receive buffer ready to accept the data when it arrives. (The only situation in which anything more happens at this moment is if the data had already arrived and was simply sat in a buffer somewhere waiting for you to ask for it.)

The documentation for this method is, on the whole, a bit confusing. I don't think it's all that obvious that all BeginReceive really means is 'call me back when some data arrives'.
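
To make the 'call me back when some data arrives' reading concrete, here is a minimal sketch over a loopback connection. The class name CallMeBackDemo and its Run method are made up for illustration and are not part of the test program described later; the lambda syntax is also newer than the rest of the code in this article. It shows BeginReceive returning while nothing has been sent yet, and the callback running only once a byte turns up.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical demo class, not part of the article's test program.
public static class CallMeBackDemo
{
    // Returns the byte that was delivered to the callback.
    // 'returnedBeforeData' reports whether BeginReceive came back
    // before the callback had run (no data had been sent at that point).
    public static byte Run(out bool returnedBeforeData)
    {
        // Loopback listener on a port chosen by the OS.
        Socket listener = new Socket(AddressFamily.InterNetwork,
            SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen(1);

        Socket client = new Socket(AddressFamily.InterNetwork,
            SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndPoint);
        Socket server = listener.Accept();

        byte[] buff = new byte[1];
        ManualResetEvent done = new ManualResetEvent(false);

        // Ask to be called back when data arrives. This returns at once.
        client.BeginReceive(buff, 0, 1, SocketFlags.None, iar =>
        {
            client.EndReceive(iar);
            done.Set();
        }, null);

        // Nothing has been sent yet, so the callback cannot have run.
        returnedBeforeData = !done.WaitOne(0);

        // Now send a byte; only this triggers the callback.
        server.Send(new byte[] { 42 });
        done.WaitOne();

        client.Close();
        server.Close();
        listener.Close();
        return buff[0];
    }
}
```

The event is only there so the sketch can report what happened; a real client would carry on from inside the callback instead of waiting for it.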

Many Receives, Few Threads

The great thing about getting the system to call you back only when there's some data available is that you can have an awful lot going on with very few threads. It's a very scalable approach. To illustrate this I've written a little example.

The example is a client of a test service. The test service itself is a very simple TCP/IP server written in C#. It isn't actually fully asynchronous. It accepts incoming connections asynchronously, but when it comes to sending the data, it sends it synchronously. I tried it with and without, and on the server side, there was no benefit in using async sends in this particular example, so I chose to keep it simple; only the client will be fully async. Here's the server:

using System;
using System.Collections;
using System.Net;
using System.Net.Sockets;
using System.Threading;

namespace AsyncReceive
{
    // Accepts incoming connections, and sends out a single
    // byte at regular intervals on each connection.
    public class SimpleServer
    {
        private Socket listener;

        public SimpleServer(TimeSpan interval, int backlog) :
            this(interval, backlog, 0) { }

        public SimpleServer(TimeSpan interval, int backlog, int port)
        {
            sendInterval = interval;
            listener = new Socket(AddressFamily.InterNetwork,
                SocketType.Stream, ProtocolType.Tcp);
            IPEndPoint ep = new IPEndPoint(IPAddress.Loopback, port);
            listener.Bind(ep);
            listener.Listen(backlog);

            listener.BeginAccept(new AsyncCallback(OnAccept), null);
        }

        readonly TimeSpan sendInterval;

        public EndPoint EndPoint
        {
            get { return listener.LocalEndPoint; }
        }

        // Called each time a new connection attempt comes in.
        private void OnAccept(IAsyncResult iar)
        {
            // No need to worry about concurrent access to the socket -
            // we only queue up one BeginAccept at a time, and we don't
            // ever do anything else with this socket.
            try
            {
                Socket sock = listener.EndAccept(iar);
                Sender s = new Sender(sock, sendInterval);
            }
            catch (Exception x)
            {
                Console.WriteLine("Error from EndAccept: " + x);
            }

            listener.BeginAccept(new AsyncCallback(OnAccept), null);
        }

        // We create one of these for each client.
        private class Sender
        {
            private Timer t;
            private Socket sock;

            // This ArrayList keeps all of the active senders reachable.
            //
            // It's not clear that this is necessary. By inspection, it
            // seems to work without this - having a pending Timer
            // waiting to call a method on a Sender seems to be
            // sufficient to prevent either the Timer or the Sender
            // from being collected. But I don't see this documented
            // anywhere, so I'd prefer not to rely on it.
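            // It is touched from several thread pool threads (accept
            // and timer callbacks), so use a synchronized wrapper.
            private static ArrayList senders = ArrayList.Synchronized(new ArrayList());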

            public Sender(Socket socket, TimeSpan interval)
            {
                senders.Add(this);
                sock = socket;
                t = new Timer(new TimerCallback(TimerFired), null, interval, interval);
            }

            private void TimerFired(object state)
            {
                try
                {
                    byte[] buff = { 42 };
                    // Using an async Send here appears to offer no
                    // benefit, so we use the simpler sync version.
                    sock.Send(buff);
                }
                catch (Exception x)
                {
                    Console.WriteLine("Error when sending: " + x);
                    Close();
                }
            }

            private void Close()
            {
                sock.Close();
                t.Dispose();
                senders.Remove(this);
            }
        }
    }
}

This accepts any number of incoming connections, and for each connection, it'll send out a byte every so often. (You specify the interval with a constructor parameter.) A nice feature of this server is that despite the fact that it's not using async sends, it's still very light on the threads. Most of the time it doesn't use any threads at all. It uses the System.Threading.Timer class to call it back when it's time to send some data. Each connection uses its own timer.

I've written a test program that hosts this server and then fires up 100 connections to it. (So for this test, I've chosen to host both the client and the server in the same process.) So we've got one process which is sending messages out to 100 clients periodically, and which also has to handle those incoming messages. And since the client and server are both so simple, the program spends the vast majority of the time idle. It is using less than 1% of my CPU time generating and handling about 20 messages a second in total. So the usual state for the program is that it's completely idle, and is waiting for either one of 100 timers to fire, or one of 100 sockets to receive some data.

And it's managing to do that with just 10 threads. (Actually the thread count seems to wander up and down a bit occasionally, but that's the thread pool for you. It's mostly on 10 though.) One of those will be the finalizer thread. Another will be the main application thread, which is just sat inside Console.ReadLine. According to the debugger, three of the 10 threads are not .NET threads - there are only 7 .NET threads. So after the finalizer and main threads that suggests it's using just 5 threads to service all 100 pending read operations and all 100 timers.

Anyone who thinks synchronously will presumably be surprised at how few threads are required to manage 100 concurrent read operations. I'm actually wondering why it needs as many as 5 threads... I'm guessing it's partly because timer events come in on normal thread pool threads while socket events come in on IO threads. But even so, given the incredibly small amount of work the CPU has to do to keep the client and server halves of these 100 connections serviced, it doesn't really need any more than 1 thread... However, I suspect that 10 threads is just the number of threads that you end up with as a minimum when doing sockety stuff - if I run the test with either 10, 100, or 1000 connections, the number of threads sits at around 10 in each case. (And I did check with netstat that all the connections really are there, even in the 1000 connections case.)
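
The figures above came from the debugger. A rough way to watch similar numbers from inside the program is sketched below; ThreadCensus is a hypothetical helper, not something the test program contains. Note that subtracting the available count from the maximum gives the pool threads that are busy at that instant, which is not the same thing as the number of pool threads that exist.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for eyeballing thread usage.
public static class ThreadCensus
{
    public static string Describe()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);

        // All OS threads in the process, managed or not.
        int osThreads = Process.GetCurrentProcess().Threads.Count;

        // max - available = pool threads currently doing work.
        return string.Format(
            "OS threads: {0}, busy workers: {1}, busy IO threads: {2}",
            osThreads, maxWorkers - freeWorkers, maxIo - freeIo);
    }
}
```

Calling Console.WriteLine(ThreadCensus.Describe()) every few seconds while the connections are idle should show the busy counts sitting at or near zero, though the exact numbers will depend on the runtime version.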

Show Me the Asynchrony

I've gone on for ages without actually showing you the full async code yet. So now would be a good time to take a look at the client. I've got a class called SocketClient, and I create one of these for each connection I want to make to the server. All it does is connect and then wait for data to arrive. I'll show it in pieces here so I can discuss the salient points:

using System;
using System.Threading;
using System.Net;
using System.Net.Sockets;

namespace AsyncReceive
{
    public class SocketClient
    {
        private Socket sock;
        private readonly int ID;
        private static int nextID = 1;
        private static int totalBytes;

        public SocketClient(IPEndPoint ep)
        {
            ID = nextID++;
            sock = new Socket(AddressFamily.InterNetwork,
                SocketType.Stream, ProtocolType.Tcp);
            sock.BeginConnect(ep, new AsyncCallback(OnConnectComplete), null);
        }

That constructor contains the first bit of asynchrony. Rather than calling Connect on the socket, I use BeginConnect. This kicks off the process of connecting, but rather than waiting for it to finish, it just returns straight away. So the constructor returns nice and quickly, and the connection will complete in its own sweet time. When the connection handshake is complete (or the connection has failed), the completion callback function we specified gets called:

        private void OnConnectComplete(IAsyncResult iar)
        {
            try
            {
                sock.EndConnect(iar);
                Console.WriteLine("{0} Connected", ID);
            }
            catch (Exception x)
            {
                Console.WriteLine("{0} Error from EndConnect: {1}", ID, x);
                return;
            }
            StartReceive();
        }

The error handling is pretty basic here, because my test harness didn't really need to do anything more than print the error. (In a real system this would need to notify the higher level controlling logic of the problem.) If there were no errors, it then calls this function:

        private byte[] buff = new byte[1];
        private void StartReceive()
        {
            sock.BeginReceive(buff, 0, 1, SocketFlags.None,
                new AsyncCallback(OnReceiveComplete), null);
        }

This kicks off the receive operation. The client spends almost all of its time in a state where it is waiting for this receive operation to complete, because the server only sends out data every few seconds. But the crucial point here is that although the client is waiting for data, it's not using a thread to do it. This call to BeginReceive returns immediately. No threads are consumed while we wait for the data to arrive. When the data does eventually arrive, then and only then is a thread requisitioned from the thread pool, and it will be used to call this completion function:

        void OnReceiveComplete(IAsyncResult iar)
        {
            try
            {
                int count = sock.EndReceive(iar);
                if (count == 0)
                {
                    Console.WriteLine("{0} closed by remote host", ID);
                    sock.Close();
                }
                else
                {
                    int total = Interlocked.Increment(ref totalBytes);
                    Console.WriteLine("{0} received {1} (total: {2})",
                        ID, buff[0], total);
                    StartReceive();
                }
            }
            catch (Exception x)
            {
                Console.WriteLine("{0} error from EndReceive: {1}", ID, x);
            }
        }
    }
}

This completes the receive operation. (It could examine the contents of the buff array, as they will now be populated, but since it will always just contain one byte with the value 42, there's no real point...) If the receive completed without error, it calls StartReceive again to kick off the next operation. So logically speaking this code is a loop construct - StartReceive and OnReceiveComplete run repeatedly one after the other until there is an error or the connection is closed.

(Aside: this example illustrates another misleading aspect of the documentation. The documentation for BeginReceive tells you that you will definitely have to put something in the final 'state' parameter of the method, and that at a very minimum it will need to be the socket. Notice that I'm passing null here. That's because I'm piggy-backing all the context I need into the target object of the callback delegate, so I don't have any use for the additional state parameter here.)
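
For comparison, here is a sketch of the alternative, where the callback is a static method and everything it needs travels in the state parameter. StateParameterDemo and ReceiveState are names invented for this illustration, and the loopback plumbing is only there to make the sketch self-contained.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical demo of passing context through the 'state' argument.
public static class StateParameterDemo
{
    // Everything the static callback needs, bundled into one object.
    private class ReceiveState
    {
        public Socket Sock;
        public byte[] Buff = new byte[1];
        public ManualResetEvent Done = new ManualResetEvent(false);
    }

    // Static, so there is no target object to carry context.
    private static void OnReceive(IAsyncResult iar)
    {
        ReceiveState st = (ReceiveState) iar.AsyncState;
        try
        {
            st.Sock.EndReceive(iar);
        }
        finally
        {
            st.Done.Set();
        }
    }

    public static byte ReceiveOneByte()
    {
        Socket listener = new Socket(AddressFamily.InterNetwork,
            SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen(1);

        Socket client = new Socket(AddressFamily.InterNetwork,
            SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndPoint);
        Socket server = listener.Accept();

        ReceiveState st = new ReceiveState();
        st.Sock = client;

        // The last argument comes back as iar.AsyncState in the callback.
        client.BeginReceive(st.Buff, 0, 1, SocketFlags.None,
            new AsyncCallback(OnReceive), st);

        server.Send(new byte[] { 42 });
        st.Done.WaitOne();

        client.Close();
        server.Close();
        listener.Close();
        return st.Buff[0];
    }
}
```

Both styles work. With an instance method as the callback, the object itself plays the role that ReceiveState plays here, which is why SocketClient can pass null.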

Implementation Quality

The code above illustrates the basic techniques for asynchronous socket programming, and demonstrates that it's possible for a small number of threads to service a large number of simultaneously outstanding requests. Even with 1000 connections, only 10 threads were required to service both ends of the connections.

But while this is useful for proving my point - network operations in progress don't need to consume threads - it's not a perfect demonstration of how to do these things in reality. There are a couple of issues with this code that would make it inappropriate for production purposes. First, the error handling is all localised. In practice, this is not going to be appropriate. Connection problems will most likely need some kind of higher-level handling. Second, the buffer handling approach could cause GC performance problems. Whenever you do an asynchronous socket send or receive, the buffer is pinned until the operation completes. Since this code spends most of its time with most of the sockets having outstanding read operations in progress, this means all the buffers are pinned almost all of the time. Pinning of buffers hampers the GC's operation. The best way to solve this is to allocate all your buffers up front and in one go so that they are all adjacent in the heap. For this test application, performance wasn't an issue (even 1000 connections still use less than 1% of the CPU time) so I didn't bother, but for a real server this would most likely be important.
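
One way to allocate the buffers up front is a single large array carved into fixed-size segments, one per connection. The sketch below assumes that shape; BufferPool and its members are hypothetical names, and a real server would also need a way to hand segments back when connections close.

```csharp
using System;

// Hypothetical pool: one contiguous array, sliced per connection.
public class BufferPool
{
    private readonly byte[] block;
    private readonly int segmentSize;
    private readonly int segments;

    public BufferPool(int segments, int segmentSize)
    {
        if (segments <= 0 || segmentSize <= 0)
            throw new ArgumentOutOfRangeException();
        this.segments = segments;
        this.segmentSize = segmentSize;
        // A single allocation, so all segments are adjacent in the heap.
        block = new byte[segments * segmentSize];
    }

    // The shared array to pass as the 'buffer' argument.
    public byte[] Block { get { return block; } }

    public int SegmentSize { get { return segmentSize; } }

    // Start offset of the segment belonging to connection 'index'.
    public int OffsetOf(int index)
    {
        if (index < 0 || index >= segments)
            throw new ArgumentOutOfRangeException("index");
        return index * segmentSize;
    }
}
```

Because BeginReceive takes a buffer, an offset and a size, each connection could then call something like sock.BeginReceive(pool.Block, pool.OffsetOf(i), pool.SegmentSize, SocketFlags.None, callback, null), where i is that connection's index, so only one array is ever pinned.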

The Thread Is Not the Operation

I hope this has made it clear that some long-running operations are quite capable of proceeding without tying up a thread. Sockets are not the only example of this. File IO can also be done asynchronously. The web service proxy classes that VS.NET generates also support asynchronous operation. Threads are merely an abstraction that enables us to perform fundamentally asynchronous operations and yet write our code in a synchronous style. This makes threads very useful, because that style is typically much simpler than an overtly asynchronous approach. But don't be misled into thinking that this is how things really work.
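
As a small illustration of the file IO case, here is a sketch using FileStream's Begin/End pair. AsyncFileRead is a hypothetical name. The final constructor argument asks for asynchronous IO; without it, BeginRead may simply run a blocking read on a pool thread. The sketch waits on an event purely so that it can return a result; code that really wanted to free up its thread would continue from the callback instead.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical demo of callback-driven file reading.
public static class AsyncFileRead
{
    // Reads up to buff.Length bytes from the start of the file and
    // returns the number of bytes actually read.
    public static int ReadFirstBytes(string path, byte[] buff)
    {
        // Last argument (useAsync: true) requests asynchronous IO.
        using (FileStream fs = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.Read, 4096, true))
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            int count = 0;
            fs.BeginRead(buff, 0, buff.Length, iar =>
            {
                // Runs when the read has completed.
                count = fs.EndRead(iar);
                done.Set();
            }, null);

            // Only here so the sketch can hand back a value.
            done.WaitOne();
            return count;
        }
    }
}
```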

(UPDATE: Mike Woodring pointed out to me that he's had an example showing exactly this technique here for ages. I had forgotten all about that. He wrote it to test the limits of how many connections a machine can have open, but it does illustrate this threadless approach perfectly.)
