IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)

Async, await, and yield return

Friday 29 November, 2013, 06:06 PM

Sooner or later, it seems to occur to a lot of C# developers to try something like this:

// NOTE: doesn't work!
public async Task<IEnumerable<string>> GetItemsAsync()
{
    string item1 = await GetSomethingAsync();
    yield return item1;

    string item2 = await GetSomethingElseAsync();
    yield return item2;
}

This basic idea showed up recently in a question on Stack Overflow. I answered it, but since this is a recurring theme, I thought I’d write a blog post.

That code attempts to combine two C# features: iterators and asynchronous methods. The two are similar. Both let you write straightforward-looking code, which the compiler then tears apart and rewrites into something that would have been difficult at best, and often horribly contorted, to write by hand. Both also let you write methods that return part way through execution but can resume later on. However, you can’t use both features in the same method.

Superficially, this seems like a reasonable thing to want to do—an asynchronous method can perform any number of asynchronous operations before eventually producing a value, so surely it’s just a small leap from there to an asynchronous method that can produce several values, exposed as a sequence?

There are two problems here, though. The first is that the attempt shown above is slightly wrong-headed: it misunderstands the nature of the types involved, and the method signature is not the right way to express the intended semantics. The second is that even if you fix the first problem, the compiler doesn’t actually know how to do what you want. Fortunately, you don’t really need special compiler support for asynchronous lists. You can get the desired effect by combining the existing non-list-oriented asynchronous support with a library.

If you’ve read much of my blog over the past year, or if you’ve read the latest edition of my book, Programming C# (which was a complete rewrite, by the way), it probably won’t surprise you that I’m going to suggest solving this problem with Rx (the Reactive Extensions for .NET).

But first, why is that example wrong-headed?

Representing Asynchronous Lists

Let’s be clear about the intent: the code above wants to provide the caller with a sequence of items, but it can’t necessarily produce all the items immediately. It needs to do some work to determine what values to return. As you can tell from the fact that it calls methods that end in Async, and that we’re having to await their results, it might take some time for the information to become available. So this code wants to be able to produce each item when it’s good and ready.

But that’s not what the method’s return type promises:

Task<IEnumerable<string>>

This says something subtly different: the method will produce its result—a sequence of items—asynchronously. It sounds like a rather picky distinction when you describe it: returning an asynchronous sequence vs. asynchronously returning a sequence. But it’s actually a rather big difference in practice.

The Task<T> type (regardless of what T may be) represents an operation that produces a single result. When you launch a task, its result is not available until the task completes. And once the task has produced its result (i.e., completed) that means it has finished—it is no longer executing. That’s actually very different from what we’re trying to do here: we want to be able to produce a value, and then maybe another value some time later, and perhaps another value later on, and so on.

So the problem with Task<IEnumerable<T>> is that the work has to be complete before it can give you anything. This does not represent what we need—we want to be able to supply multiple values, on whatever schedule we like.
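
To make that concrete, here is a minimal sketch of a method that really does return Task<IEnumerable<string>>. It compiles because it uses no yield return, but it has to buffer every item and hand the whole list over at the end. The class name, the GetItemAsync helper, and the delays are hypothetical stand-ins for real asynchronous work, not anything from the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TaskOfSequenceDemo
{
    // Hypothetical stand-in for an asynchronous operation that yields one item.
    private static async Task<string> GetItemAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs);
        return name;
    }

    // Asynchronously returns a sequence: the caller sees nothing until
    // every item has been collected.
    public static async Task<IEnumerable<string>> GetItemsAsync()
    {
        var items = new List<string>();
        items.Add(await GetItemAsync("first", 200));
        items.Add(await GetItemAsync("second", 200));
        return items;
    }

    public static void Main()
    {
        var timer = Stopwatch.StartNew();

        // Blocking here only to keep the console demo simple.
        IEnumerable<string> items = GetItemsAsync().GetAwaiter().GetResult();

        // Both delays have already elapsed by the time the sequence exists.
        Console.WriteLine("Sequence available after {0} ms", timer.ElapsedMilliseconds);
        foreach (string item in items)
        {
            Console.WriteLine(item);
        }
    }
}
```

The first item is ready well before the second, but the caller has no way to get at it until the task as a whole completes.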

Of course nothing technically requires an enumeration to be able to return items immediately. When consuming code starts retrieving items (e.g., with a foreach loop) the IEnumerable<T> implementation can just block until it has something to return. So it is possible to write an IEnumerable<T> implementation that produces results on its own schedule. And if we’re going to do that, there’s not really much benefit in returning a Task<IEnumerable<T>>—we may as well do this:

// NOTE: also doesn't work!
public async IEnumerable<string> GetItemsAsync()
... same body as before ...

That’s a slightly more appropriate signature (although as we’ll see shortly, it’s not the best approach). But the compiler doesn’t like this either—it won’t let you use async on a method with an IEnumerable<T> return type. And we wouldn’t really want it to—IEnumerable<T> can only be consumed synchronously, so support for an asynchronous implementation would offer limited benefits.

In any case, we don’t need special compiler support to implement an iterator that blocks until it’s ready to produce something—we can just write a synchronous version:

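// Compiles, but blocks the calling thread while each item is produced.
// (No Async suffix: this method is synchronous and does not return a Task.)
public IEnumerable<string> GetItems()
{
    string item1 = GetSomethingAsync().Result;
    yield return item1;

    string item2 = GetSomethingElseAsync().Result;
    yield return item2;
}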

However, this is unsatisfactory. Presumably the reason for attempting to use async and await in the first place was to take advantage of their potential efficiency improvements: they enable us not to have to block. Here, we’ve just given up and reverted to synchronous code. (A task’s Result property blocks until the task completes.) And even if the compiler were prepared to help us, we’d just run straight into the problem that IEnumerable<T> doesn’t provide a way to consume items asynchronously. Fortunately, there’s a better way.

Instead of using IEnumerable<T>, we can use its push-oriented dual, IObservable<T>. It represents exactly the same underlying abstraction—a sequence of items—but it enables the source to decide when to produce items, rather than letting the consumer be in charge. This makes it the more appropriate of the two representations to use if you want to be able to support asynchronous item production.

I think it can be helpful to think of the relationship between these two interfaces by way of an analogy:

public string Foo()

is to

public Task<string> FooAsync()

as

public IEnumerable<string> Foos()

is to

public IObservable<string> FooSource()

The first method returns a string when you ask for it. Its asynchronous equivalent (the second method) returns a Task<string> instead, enabling it to provide a string when it’s ready. And then we have the sequence-based versions. The third method returns IEnumerable<string>, which provides strings when the caller asks. Finally, we have its asynchronous equivalent, returning IObservable<string>, meaning that it can provide each string when it’s ready.

(Unlike with Task<T>, there’s no standard idiom for naming methods that return IObservable<T>. I’ve arbitrarily appended Source to give it a different name from the one that returns an IEnumerable<T>. Don’t be tempted to put Async on the end by the way—that’s the naming convention for the Task-based Async Pattern, or TAP. It’s likely to confuse people if you return IObservable<T> when they’re expecting a Task.)

This relationship between synchronous and asynchronous method return types tells us that when someone writes Task<IEnumerable<T>>, chances are that the concept they’re trying to express is really IObservable<T>.

So, you might think we could do this:

// NOTE: doesn't work either!
public async IObservable<string> GetItemsSource()
... same body as before ...

This is a reasonable thing to want to be able to do. As it happens, C# 5.0 doesn’t support it: the only return types it allows for async methods are void, Task, and Task<T>. However, it doesn’t really matter, because Rx makes it pretty easy to do what we want.

Implementing with Rx

Here’s how to do it:

public IObservable<string> GetItemsSource()
{
    return Observable.Create<string>(
        async obs =>
        {
            string item1 = await GetSomethingAsync();
            obs.OnNext(item1);

            string item2 = await GetSomethingElseAsync();
            obs.OnNext(item2);
        });
}

Rather than relying on compiler support, I’ve simply had to use a library function supplied by Rx, Observable.Create. This takes a delegate as its argument, and you can write that as an asynchronous method if you want. That’s what I’ve done here—it has enabled me to use almost exactly the same code as I wrote in my original (non-compiling) example.

In particular, I am free to use await expressions in the code that produces the sequence’s items. And remember, that was the original goal.

Instead of using yield return, I just call OnNext on the object Rx supplies as an argument. (It provides an IObserver<T>, by the way, which is the counterpart of the synchronous IEnumerator<T>.) Strictly speaking, I should then tell it I’ve reached the end of the sequence by calling OnCompleted before finishing, but Rx detects such sloppiness, and completes the sequence for you.
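
For completeness, here is a sketch of the less sloppy version, with an explicit OnCompleted call and a subscriber that consumes the items. It assumes a reference to the Rx library (the System.Reactive package). The class name and the two helper methods are hypothetical stand-ins that simulate asynchronous work with Task.Delay.

```csharp
using System;
using System.Reactive.Linq;   // Rx; assumes the System.Reactive package is referenced
using System.Threading;
using System.Threading.Tasks;

public static class ExplicitCompletionDemo
{
    // Hypothetical stand-ins for the asynchronous operations in the post.
    private static async Task<string> GetSomethingAsync()
    {
        await Task.Delay(100);
        return "first";
    }

    private static async Task<string> GetSomethingElseAsync()
    {
        await Task.Delay(100);
        return "second";
    }

    public static IObservable<string> GetItemsSource()
    {
        return Observable.Create<string>(
            async obs =>
            {
                obs.OnNext(await GetSomethingAsync());
                obs.OnNext(await GetSomethingElseAsync());

                // Explicitly signal the end of the sequence rather than
                // relying on Rx to do it when the delegate's task finishes.
                obs.OnCompleted();
            });
    }

    public static void Main()
    {
        var done = new ManualResetEventSlim();

        // Nothing runs until something subscribes.
        IDisposable subscription = GetItemsSource().Subscribe(
            item => Console.WriteLine("Received: " + item),
            error => { Console.WriteLine("Failed: " + error.Message); done.Set(); },
            () => { Console.WriteLine("Completed"); done.Set(); });

        // Block only so that the console process stays alive until the end.
        done.Wait();
        subscription.Dispose();
    }
}
```

The three callbacks passed to Subscribe correspond to the three members of IObserver<T>: OnNext, OnError, and OnCompleted.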

Would it be slightly cleaner if the compiler generated the extra code here for me? Yes—I’d be able to write this:

// NOTE: doesn't work!
public async IObservable<string> GetItemsSource()
{
    string item1 = await GetSomethingAsync();
    yield return item1;

    string item2 = await GetSomethingElseAsync();
    yield return item2;
}

Would that be so much more convenient that it’s worth adding a new language feature? Probably not. The extra code that Observable.Create requires is trivial compared to the amount of heavy lifting the compiler does to enable async and await.

So the reason we don’t have specialised compiler support for asynchronous methods that produce sequences of items is that we don’t really need it. The single-result asynchronous method support combines well with library support (from Rx) to enable a pretty good solution.

I Want my IEnumerable<T>

But what if you really wanted an IEnumerable<T>? Perhaps you want to use async and await, and you want to enable clients that can consume an IObservable<T> to work efficiently with your code, but you also have some existing code that requires an IEnumerable<T>?

Well it turns out you can provide both. Rx provides a very straightforward way to transform an IObservable<T> into an IEnumerable<T>:

public IEnumerable<string> GetItemsAsEnumerable()
{
    return GetItemsSource().ToEnumerable();
}

This may work slightly better than the simple synchronous approach (in which I just used the Result property of the tasks returned by the asynchronous methods), because Rx enables the source to run ahead of the consumer. As soon as code starts to enumerate the contents of the IEnumerable<string> returned by this method (e.g., by starting a foreach loop), the code that generates the items (the asynchronous anonymous method passed to Observable.Create in the preceding section) will start to run. It is then free to generate items as fast as it likes, regardless of how quickly the calling code retrieves them. (The IEnumerable<T> implementation Rx supplies here has an internal queue to support this.) So you can potentially get a higher degree of concurrency with this approach than with the straightforward synchronous technique. If you’re really lucky, by the time the consuming code finishes processing an item, the next one will already be available.

That said, you’ll get better results if your consuming code understands IObservable<T>. If the consumer gets ahead of the producer, it will end up blocking a thread if it’s using IEnumerable<T>, whereas with IObservable<T> you only hang onto a thread for as long as you have productive work to do.
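
The two styles of consumer might look something like this. It is only a sketch: it assumes the System.Reactive package, the class name is made up, and the source is a hypothetical two-item observable built the same way as the one in the previous section. The observable-based consumer uses Rx’s ForEachAsync operator, which returns a Task that completes when the sequence ends, so it can be awaited.

```csharp
using System;
using System.Reactive.Linq;   // Rx; assumes the System.Reactive package is referenced
using System.Threading.Tasks;

public static class ConsumerDemo
{
    // Hypothetical source: two items, each preceded by a simulated delay.
    public static IObservable<string> GetItemsSource()
    {
        return Observable.Create<string>(
            async obs =>
            {
                await Task.Delay(100);
                obs.OnNext("first");

                await Task.Delay(100);
                obs.OnNext("second");
            });
    }

    // Pull-based consumer: whenever no item is ready yet, the foreach loop
    // blocks the calling thread inside the enumerator.
    public static void ConsumeAsEnumerable()
    {
        foreach (string item in GetItemsSource().ToEnumerable())
        {
            Console.WriteLine("Pulled: " + item);
        }
    }

    // Push-based consumer: the callback runs as each item arrives, and the
    // await lets the calling thread go between items.
    public static async Task ConsumeAsObservableAsync()
    {
        await GetItemsSource().ForEachAsync(
            item => Console.WriteLine("Pushed: " + item));
    }

    public static void Main()
    {
        ConsumeAsEnumerable();

        // Blocking here only because this is a console entry point.
        ConsumeAsObservableAsync().GetAwaiter().GetResult();
    }
}
```

Each consumer subscribes separately, so the item-producing code runs once per consumer.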

Copyright © 2002-2013, Interact Software Ltd. Content by Ian Griffiths. Please direct all Web site inquiries to webmaster@interact-sw.co.uk