IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)



FORTRAN-Compatible Dynamic Objects in C# - Monday 1 April, 2013, 12:46 PM

The 4th version of C# (which shipped with Visual Studio 2010) added the dynamic keyword. This is most useful for dealing with COM scripting APIs such as Microsoft Office’s Automation features, which were a nightmare to work with in older versions of C#. But this language feature also makes it possible to use certain idioms that used to be the preserve of dynamic languages. For example, it lets you write this sort of questionable code:

dynamic o = new ExpandoObject();

o.Count = "1";
o.Count += 4;

Console.WriteLine(o.Count);

As you may or may not be expecting, this prints out 14.

C# lets the target object determine the semantics, i.e., not everything does the same thing when used through dynamic. In the example above, I’ve used ExpandoObject, which behaves in a way that vaguely resembles how objects work in JavaScript. In particular, if you set a property that didn’t previously exist, the object will automatically ‘expand’ by growing a new property of the specified name. (It’s not quite the same as JavaScript, which will even let you read a property that has never been set; you’ll get an exception if you try that with ExpandoObject in C#.)
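To make that difference concrete, here's a minimal sketch, assuming a plain .NET console project. The ExpandoDemo class is illustrative and not part of any library. It relies on two documented behaviours: ExpandoObject implements IDictionary<string, object>, so you can see which members have been created so far, and reading a member that was never set throws a RuntimeBinderException.

```csharp
using System.Collections.Generic;
using System.Dynamic;
using Microsoft.CSharp.RuntimeBinder;

// Illustrative helper (hypothetical name), showing how ExpandoObject grows
// members on assignment but refuses reads of members that were never set.
public static class ExpandoDemo
{
    public static (bool threwOnUnsetRead, bool hasCount, bool hasTotal) Run()
    {
        dynamic o = new ExpandoObject();

        // Assigning to a member that doesn't exist yet creates it.
        o.Count = 1;

        // ExpandoObject implements IDictionary<string, object>, so we can ask
        // which members exist without risking a binder exception.
        var members = (IDictionary<string, object>) o;

        bool threw = false;
        try
        {
            // Unlike JavaScript, reading a never-set member is an error.
            object total = o.Total;
        }
        catch (RuntimeBinderException)
        {
            threw = true;
        }

        return (threw, members.ContainsKey("Count"), members.ContainsKey("Total"));
    }
}
```

Run returns (true, true, false): the assignment grew a Count member, while the attempt to read Total threw and created nothing.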

This enables you to make mistakes like this, without any tiresome interference from the compiler:

dynamic o = new ExpandoObject();

o.Count = 1;
o.Cuont = o.Count + 1;

Console.WriteLine(o.Count);

Despite the fun I’ve been having with Clojure lately, I remain more of a static typing guy at heart, so I admit to being a little perplexed about why dynamic language fans prefer things their way. But presumably this example is precisely the sort of thing they’re banging on about when they boast about how much more productive their preferred languages are—it lets you get bugs like this straight into production, unencumbered by a compiler that might point out the mistake.

(If, despite choosing a dynamic language, this is the sort of thing you want to avoid, I gather the usual solution is to try to catch this sort of problem with unit tests instead. It’s easier to write a test to find this sort of bug than to test anything deeper about what your code is supposed to do; indeed, it’s so easy that outside the dynamic world, the compiler can do it for you. I presume the increased volume of shallow tests adds to the feeling of high productivity that dynamic language advocates report, at least insofar as producing high volumes of code constitutes productivity. And yet they’re so rude about ‘ceremony’ in languages they don’t like!)

When reviewing the chapter on C#’s dynamic keyword in my book, Programming C# 5.0, my father remarked “Shades of FORTRAN IV!!” He’s been working with computers since he left university in the early 1960s, so he can bring some useful context to supposedly new developments in technology. As he explains:

“In FORTRAN you didn’t have to declare your variables before using them. If you used a particular name, it automatically created the variable for you. (Typing was implicit too—if the first letter of the variable’s name was in the range I to N the variable was an integer, anything else was a float. Early FORTRAN was very numeric-oriented—things like characters were really just numbers being treated in a particular way.)”
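Stated as code, the rule he describes is tiny. This is a sketch rather than anything taken from FORTRAN itself: FortranImplicitTyping is an invented name, and mapping FORTRAN's floating-point type to C#'s double is an assumption, chosen to match the 0.0 default used by the dynamic object later in this post.

```csharp
using System;

// Hypothetical helper expressing FORTRAN IV's implicit typing rule in C#.
public static class FortranImplicitTyping
{
    // Names starting with I, J, K, L, M or N are integers; everything else
    // is floating point (mapped to double here, which is an assumption).
    public static Type ImplicitType(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            throw new ArgumentException("Name must not be empty.", nameof(name));
        }

        char first = char.ToUpperInvariant(name[0]);
        return first >= 'I' && first <= 'N' ? typeof(int) : typeof(double);
    }
}
```

Note that the range runs from I to N inclusive, so a name such as NSTEPS is an integer too.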

The parenthetical comment interested me, because it would enable us to avoid what might be a bug in the first example above: if the author of the code had intended the count to be an integer, then the result, 14, is not what they wanted. If only we had FORTRAN IV’s ability to state what we mean by simply adding an ugly prefix to a variable name, we could get the right result while still using dynamically created properties. So I wrote a simple class to implement this:

using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;

class FortranIVObject : DynamicObject
{
    // Holds the dynamically created "variables", keyed by name.
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    public override bool TrySetMember(
        SetMemberBinder binder, object value)
    {
        EnsureShouting(binder.Name);

        // FORTRAN IV implicit typing: names starting with I to N are integers.
        char first = binder.Name[0];
        if (first >= 'I' && first <= 'N')
        {
            value = Convert.ToInt32(value);
        }

        _values[binder.Name] = value;

        return true;
    }

    public override bool TryGetMember(
        GetMemberBinder binder, out object result)
    {
        EnsureShouting(binder.Name);

        if (!_values.TryGetValue(binder.Name, out result))
        {
            // Never set? Create it with a zero of the implied type. The
            // (object) cast stops the conditional from turning 0 into 0.0.
            char first = binder.Name[0];
            result = first >= 'I' && first <= 'N' ? (object) 0 : 0.0;
            _values[binder.Name] = result;
        }

        return true;
    }

    private static void EnsureShouting(string name)
    {
        if (name.Any(char.IsLower))
        {
            throw new ArgumentException(
                "FORTRAN identifiers must be UPPERCASE.");
        }
    }
}

I’m deriving from DynamicObject, because that does most of the work required to implement custom dynamic behaviour. We just have to write an override for each feature we’d like to control. In this case, when a member is set, we look at the first letter of its name, and if that’s one of the letters indicating that the variable should be an integer, we coerce the incoming value. Notice that the code for getting a value is happy to let you read a variable without ever having initialized it: it defaults to a value of zero. This gets even more into the spirit of dynamic languages than JavaScript. Although JavaScript doesn’t report an error when you read a property that has never been set, it does return a special undefined value, increasing the risk that defects in your code will be detected before you ship. Conversely, my implementation supplies a value that is more likely to allow bugs to go into production unnoticed. But that’s just icing on the cake. The main point here was to get our first example to work as intended. It looks a little different:

dynamic o = new FortranIVObject();

o.ICOUNT = "1";
o.ICOUNT += 4;

Console.WriteLine(o.ICOUNT);

Obviously, I’ve had to replace ExpandoObject with my new type. But you’ll also notice a certain amount of SHOUTING—that’s because my custom dynamic type requires identifiers to be uppercase, to make it feel more like FORTRAN. But with that and the “I” prefix in place, it works: it prints out 5, instead of 14. Rest assured that this only fixes the particular kind of bug encountered by the original snippet—we still get most of the usual danger of dynamic typing, but combined with all the convenience and aesthetic merits of Hungarian notation.

Hanging Chad Emulation

My father went on to tell me of a problem he once encountered, relating to FORTRAN IV’s support for using variables without having to declare them:

“The automatic variable creation was an interesting source of errors—a colleague and I once spent a week tracking down a bug due to this. In those days all our source files were on punched cards, and in one place an incompletely punched-out chad had got folded back into place so that what had been a letter “U” became a digit “4” (thanks to how EBCDIC encoded characters on punched cards). So in one place, a variable whose name should have been “TU” became referred to as “T4”. The compiler happily created another variable called “T4” and used it in that statement. So when other parts of the program altered the value of “TU”, this particular line of code was oblivious. Unfortunately, the effect on the behaviour of the program was not obvious (it was doing speed/time calculations to simulate the dynamic performance of a numerically controlled machine tool—System 24), so the program appeared to run OK but the machined parts didn’t come out like they should! So I’m not a big fan of automatic variable creation. I thought that one of the reasons that later languages like PL/1 insisted on you declaring variables explicitly was to avoid just this sort of risk (although more likely caused by typing errors than hanging chads—reminiscent of the election recounts in Florida by which George Bush scraped into the presidency).”

Incidentally, the test/debug round trip time was pretty high, because running updated code involved putting a box of punched cards in a car and driving across London to the nearest IBM data centre. In those days, ‘cloud computing’ referred to braving London’s infamous smog to get to one of the very few places that had a computer.

Some of my readers are doubtless too young to remember that Bush election, and thus way too young to remember punched cards. They’re how we used to store data back before magnetic media became reliable and affordable. A ‘chad’ is a bit of card punched out when ‘writing’ a value by making a hole in the card. Occasionally, the machine punching out the hole didn’t quite cut through the card cleanly, leaving the chad partially attached (or ‘hanging’). Sometimes, a hanging chad would fold back and block up the hole, changing the value that the punched card reader would report next time it saw the card. This is a kind of bit rot, although it’s typically reversible: if you inspect the cards, you can see when a hanging chad has closed up. It’s harder to perform similar media integrity checks through visual inspection with today’s storage devices, sadly.
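The U-to-4 mix-up in my father's story follows from how the IBM punched-card code represents characters, and a small sketch makes it visible. PunchedCardSketch is an invented name, and it decodes only a subset of the code: a digit is a single punch in rows 0 to 9, while the letters S to Z are a punch in the 0 row plus one in rows 2 to 9, so T is rows 0 and 3 and U is rows 0 and 4. If a chad folds back over the 0-row hole of a U, only the 4 punch remains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a small subset of the IBM punched-card code.
public static class PunchedCardSketch
{
    // Decodes one card column, given the set of rows that have holes in it.
    // Handles only digits (a single punch in rows 0-9) and the letters
    // S to Z (a 0-row punch plus a punch in rows 2-9).
    public static char DecodeColumn(ISet<int> rows)
    {
        if (rows.Count == 1)
        {
            int digit = rows.First();
            if (digit >= 0 && digit <= 9) return (char) ('0' + digit);
        }
        else if (rows.Count == 2 && rows.Contains(0))
        {
            int other = rows.First(r => r != 0);
            if (other >= 2 && other <= 9) return (char) ('S' + other - 2);
        }

        throw new NotSupportedException("Punch combination not handled in this sketch.");
    }

    // Models a hanging chad folding back: that row no longer reads as punched.
    public static ISet<int> WithChadClosed(ISet<int> rows, int blockedRow)
    {
        var result = new HashSet<int>(rows);
        result.Remove(blockedRow);
        return result;
    }
}
```

Decoding the U column with its 0 row blocked yields 4, which is how TU quietly became T4.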

Anyway, this sort of random renaming of variables struck me as exactly the sort of thing that a dynamic language fan would probably enjoy, so I decided to add some occasional random behaviour to my dynamic object. From time to time, it will act as though the variable name supplied was different from the one the developer intended. Here’s the modified version:

using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Text;

class FortranIVObject : DynamicObject
{
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    // Source of randomness for the simulated hanging chads.
    private readonly Random _chadRand = new Random();

    public override bool TrySetMember(
        SetMemberBinder binder, object value)
    {
        string name = EnsureShoutingAndChadify(binder.Name);

        char first = name[0];
        if (first >= 'I' && first <= 'N')
        {
            value = Convert.ToInt32(value);
        }

        _values[name] = value;

        return true;
    }

    public override bool TryGetMember(
        GetMemberBinder binder, out object result)
    {
        string name = EnsureShoutingAndChadify(binder.Name);

        if (!_values.TryGetValue(name, out result))
        {
            char first = name[0];
            result = first >= 'I' && first <= 'N' ? (object) 0 : 0.0;
            _values[name] = result;
        }

        return true;
    }

    private string EnsureShoutingAndChadify(string name)
    {
        if (name.Any(char.IsLower))
        {
            throw new ArgumentException(
                "FORTRAN identifiers must be UPPERCASE.");
        }

        // About one access in 50 suffers a "hanging chad". The first letter
        // is never touched, so the implied type survives. Single-letter names
        // are left alone: there is nothing else to corrupt, and
        // Next(0) + 1 would index past the end of the name.
        if (name.Length > 1 && _chadRand.Next(50) == 1)
        {
            var sb = new StringBuilder(name);
            // Flip the low bit of one later character, e.g. 'U' becomes 'T'.
            sb[_chadRand.Next(name.Length - 1) + 1] ^= (char) 1;

            return sb.ToString();
        }

        return name;
    }
}

With this modification, my earlier code continues to print 5 as expected most of the time, but once in a while it will print some other value, such as 4 or 0. You don’t get more dynamic than that.

I hope this code is exactly as useful to the community as it deserves to be, and I hope that you enjoy this first day of the month!

Update 2013-04-01 13:31 (GMT+1): fixed bug in the code that handles the integer-typed initial letters for get operations. (I was originally only going to handle that for set.) I put that in at the last minute, with inevitably buggy consequences: the original first code example wouldn't actually compile (because it was a mixture of the original code and an update to the second snippet), and in both cases, it ended up with a value of 0.0 (a double) for both float and integer variables, due to a missing cast.


WPF Threads and Async: Conclusions - Monday 25 February, 2013, 5:41 PM

This is the sixth blog in a series exploring how asynchronous and multithreaded techniques can affect the performance of loading and displaying moderately large volumes of data in WPF applications. In this final entry, I’ll describe what I think are the most important lessons to learn from all of this.

Don’t Put 180,000 Items in a ListBox

The elephant in the room throughout this discussion has been the fact that I’ve been attempting to show 180,000 items in a ListBox. This is terrible user interface design. (Anything more than a few hundred items causes usability problems.) The scrollbar thumb becomes a very blunt instrument at this scale, and finding particular items would be a real pain. ListBox is the wrong idiom here, and it’s amazing that WPF coped as well as it did with this frankly ridiculous scenario.

The original example that motivated this series didn’t do this. The processing was more complex, reducing a large volume of data down to a much smaller set of summarized results. I didn’t want the details of the processing work to become a distraction, so I simplified that aspect of the work in the example I showed. This had the side effect of multiplying some of the numbers involved, making certain effects easier to see. (The performance impact of a change becomes more obvious when all your timings are in seconds rather than tenths of a second.) It has also had a distorting effect, but the principles are valid—the conclusions that can be drawn from this artificial example were useful in the real scenario.

So what were those conclusions?

No Silver Bullet

The first and perhaps most obvious conclusion of this series is that the asynchronous language features added in C# 5.0 (which shipped with .NET 4.5) are not a silver bullet. Although I was able to improve responsiveness with a very small change in part 1, making simple use of the async and await keywords, this also incurred a considerable cost.

Of course, not everything will slow down by a factor of three: this particular example performs a large number of fairly small operations, which exacerbates the problem. That’s not a result of trying to put 180,000 items in a list, by the way. The original problem didn’t generate that much output, but it still performed a large number of small asynchronous operations.

The asynchronous language features are brilliant, but as with all tools, you need to understand how they work. They can have a profound impact on the way your code executes.

Timing Matters with Data Binding

Some of the largest performance improvements came from being careful about when exactly to make information visible to WPF data binding. This is not strictly anything to do with either multithreading or asynchrony: even the original synchronous, single-threaded version of the code saw roughly a 2.5x speedup simply by presenting WPF with all of the data in one lump instead of one item at a time.
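One way to picture the difference is to count the change notifications a data-bound view would receive. The LogViewModel and PresentationTiming types below are hypothetical, not code from this series, and they only count events rather than measuring WPF's actual work. Adding lines one at a time raises a CollectionChanged event per line, whereas building the complete collection first (the ObservableCollection<T> constructor copies items without raising events) and then assigning it raises a single PropertyChanged event.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;

// Hypothetical view model, for illustration only.
public class LogViewModel : INotifyPropertyChanged
{
    private ObservableCollection<string> _items = new ObservableCollection<string>();

    public event PropertyChangedEventHandler PropertyChanged;

    public ObservableCollection<string> Items
    {
        get { return _items; }
        set
        {
            _items = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Items)));
        }
    }
}

// Hypothetical helper that counts the notifications a bound view would see.
public static class PresentationTiming
{
    public static int OneAtATime(IEnumerable<string> lines)
    {
        var vm = new LogViewModel();
        int notifications = 0;
        vm.PropertyChanged += (s, e) => notifications++;
        vm.Items.CollectionChanged += (s, e) => notifications++;
        foreach (string line in lines)
        {
            vm.Items.Add(line);   // one CollectionChanged event per line
        }
        return notifications;
    }

    public static int InOneLump(IEnumerable<string> lines)
    {
        var vm = new LogViewModel();
        int notifications = 0;
        vm.PropertyChanged += (s, e) => notifications++;
        // Build the complete collection first; the constructor raises no events.
        var complete = new ObservableCollection<string>(lines);
        vm.Items = complete;      // a single PropertyChanged event
        return notifications;
    }
}
```

For 180,000 lines, the first approach produces 180,000 notifications and the second produces one.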

This issue isn’t unique to data binding. In all the various XAML frameworks, you can benefit from a ‘bottom-up’ approach to constructing your UI content. If you want to add a complicated bit of content to your UI, it’s best to build it completely before you attach it to the containing UI. With the alternative, top-down approach, you add each new element as the child of an element that’s already on screen, and the problem with that is that WPF (or Silverlight or WinRT, or whatever) ends up doing certain work across the whole UI to accommodate that new element. It’s smart enough to defer some work, but there are things it can’t avoid.

(If you’re wondering what top-down vs. bottom-up looks like in practice, suppose you’re adding 10 new elements, e.g., a StackPanel containing 9 children. Top-down, you’d start by adding the StackPanel to the main UI, and then add each of the children to that panel. That makes the framework do certain whole-UI work 10 times. But if you add the 9 child elements to the StackPanel before adding that panel to the live UI—a bottom-up approach—the whole-UI work only needs to be done once.)
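The arithmetic in that example can be modelled with a toy tree that simply counts how often an element joins a tree that is already live. ToyElement, ToyUi and ToyUiDemo are invented for this sketch and bear no relation to WPF's real layout machinery. Each attach to the live tree stands in for one round of whole-UI work.

```csharp
using System.Collections.Generic;

// Hypothetical toy model: counts attaches to an already-live tree.
public class ToyElement
{
    private readonly List<ToyElement> _children = new List<ToyElement>();
    private ToyUi _ui;   // non-null once this element is part of the live UI

    public void AddChild(ToyElement child)
    {
        _children.Add(child);
        if (_ui != null)
        {
            child.AttachTo(_ui);
            _ui.WholeUiPasses++;   // one whole-UI pass per attach to the live tree
        }
    }

    internal void AttachTo(ToyUi ui)
    {
        _ui = ui;
        foreach (var c in _children) c.AttachTo(ui);
    }
}

public class ToyUi
{
    public int WholeUiPasses { get; internal set; }
    public ToyElement Root { get; }

    public ToyUi()
    {
        Root = new ToyElement();
        Root.AttachTo(this);
    }
}

public static class ToyUiDemo
{
    public static int TopDown()
    {
        var ui = new ToyUi();
        var panel = new ToyElement();
        ui.Root.AddChild(panel);                 // panel goes live first...
        for (int i = 0; i < 9; i++)
        {
            panel.AddChild(new ToyElement());    // ...so each child costs a pass
        }
        return ui.WholeUiPasses;
    }

    public static int BottomUp()
    {
        var ui = new ToyUi();
        var panel = new ToyElement();
        for (int i = 0; i < 9; i++)
        {
            panel.AddChild(new ToyElement());    // panel not live yet: no passes
        }
        ui.Root.AddChild(panel);                 // one pass for the whole subtree
        return ui.WholeUiPasses;
    }
}
```

TopDown returns 10 and BottomUp returns 1, matching the counts described above.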

The broad principle here is to give the UI framework updates in relatively large chunks, because this enables it to perform certain jobs once per chunk, instead of once per element.

Beware Context Switches

The single most important performance factor in these examples was the cost of context switches. Since this was a WPF application, that meant regaining control of the dispatcher thread when an asynchronous operation completes, so that the method executing an await can continue. (Attempting to finesse this with ConfigureAwait made things worse. The context switch had to happen somewhere, and taking it out of the await expression just pushed it into the data binding system where we were unable to control it.) The nature of the context switch will depend on the environment—if you use async and await inside an ASP.NET application, for example, the way it recovers the context is quite different (and can be cheaper than a WPF-dispatcher-based callback). But the basic idea remains: any time an await does work that cannot complete immediately, you’re going to have to pay a cost when the operation finally completes. ConfigureAwait may reduce that cost, but it won’t eliminate it, and if you truly need to do certain work back on the original context, you’re going to pay the price somewhere.

The flip side of this is that when await does not in fact need to wait, it’s very efficient. (Pre-C# 5, the usual explicit continuation style for using the TPL doesn’t offer this benefit. If you want code that handles the no-wait case as efficiently as await without actually using await, you have a lot of work to do.) So there are often significant performance benefits to be had by increasing the likelihood of await expressions being able to complete immediately. In this series, simply increasing the StreamReader buffer size had that effect, producing dramatic performance enhancements (much better than those ConfigureAwait could have offered even if it had been viable).
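A minimal sketch of that fast path, using only the standard task APIs (AwaitFastPath is an invented name). Awaiting tasks that have already completed never gives up the thread, so the async method runs to the end synchronously and returns a finished task. Task.Yield is used as a stand-in for an operation that genuinely isn't ready, forcing a continuation to be scheduled each time round the loop.

```csharp
using System.Threading.Tasks;

// Hypothetical helper contrasting awaits that complete immediately with
// awaits that really have to wait.
public static class AwaitFastPath
{
    // Every awaited task is already complete, so the method never yields:
    // no continuation is scheduled and no context switch is needed.
    public static async Task<int> SumAlreadyAvailableAsync(int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            total += await Task.FromResult(i);
        }
        return total;
    }

    // Task.Yield forces the method to give up its thread and resume later,
    // standing in for an await whose operation is not ready yet.
    public static async Task<int> SumWithRealWaitsAsync(int count)
    {
        int total = 0;
        for (int i = 0; i < count; i++)
        {
            await Task.Yield();
            total += i;
        }
        return total;
    }
}
```

SumAlreadyAvailableAsync hands back a task that has already completed by the time the call returns. A bigger StreamReader buffer pushes real code towards that case, because more of its ReadLineAsync calls can be satisfied from data already in memory.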

Multithreading can sometimes produce particularly egregious amounts of context switching when applied in a user interface. (This is not a new problem. With Win32, the difference between sending a message to an HWND from its owning thread and sending the same message from a worker thread could easily wipe out any potential performance gains that multithreading might have offered. Once you’re on a different thread from your UI elements, you need to start being careful about the granularity of your updates. That’s as true with modern UI frameworks as it has been for years with Win32.)

That’s not to say multithreading is bad. In the end, the very best performance came from a multithreaded solution. The key is to minimize your context switches, chunking reasonably large amounts of work into each one to amortize the costs. Sometimes it pays to be explicit about the chunking, which brings me to my final conclusion.

Rx Rocks

The Reactive Extensions for .NET are great. Since Rx became available, I think I’ve used it in every project I’ve worked on. This is why I dedicated a whole chapter to it in my recent(ish) total rewrite of O’Reilly’s Programming C#.

In this particular case, we only used a couple of its features. First, Rx provided a convenient way to chunk the data. Its Buffer operator provides a declarative way to say: I’d like items batched into groups, but I never want a delay longer than 100ms before they get displayed, and if items are being produced really quickly, split them into batches of no more than 5,000. That just takes a single method call. Second, Rx also provided a way to control exactly where the context switch happened in the processing pipeline.
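To show what that single call is doing for you, here is a simplified, hand-rolled approximation of "batch by time or by count". This is not Rx code: TimeOrCountBatcher is a hypothetical class, and it only checks the clock when an item arrives, whereas Rx's Buffer runs on a timer and so emits a buffer when the time span elapses even if no new items have arrived.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical, simplified stand-in for Rx's Buffer(timeSpan, count).
// It only looks at the clock when an item arrives.
public class TimeOrCountBatcher<T>
{
    private readonly TimeSpan _maxDelay;
    private readonly int _maxCount;
    private readonly Action<IList<T>> _onBatch;
    private readonly Stopwatch _sinceBatchStarted = new Stopwatch();
    private List<T> _current = new List<T>();

    public TimeOrCountBatcher(TimeSpan maxDelay, int maxCount, Action<IList<T>> onBatch)
    {
        _maxDelay = maxDelay;
        _maxCount = maxCount;
        _onBatch = onBatch;
    }

    public void Add(T item)
    {
        if (_current.Count == 0) _sinceBatchStarted.Restart();
        _current.Add(item);

        // Hand over a batch when it is full or has been waiting too long.
        if (_current.Count >= _maxCount || _sinceBatchStarted.Elapsed >= _maxDelay)
        {
            Flush();
        }
    }

    // Emits whatever is pending; call this once the source has finished.
    public void Flush()
    {
        if (_current.Count == 0) return;
        var batch = _current;
        _current = new List<T>();
        _onBatch(batch);
    }
}
```

With Rx itself, all of this collapses into an operator call along the lines of source.Buffer(TimeSpan.FromMilliseconds(100), 5000), which also handles the timer-driven flushing properly.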

But this is just a tiny fragment of what you can do. If you are not familiar with Rx, learn it. You owe it to yourself.


Batch Updates with INotifyCollectionChanged - Friday 22 February, 2013, 6:00 PM

Last time, I promised the conclusion to my series exploring the performance of asynchronous and multithreaded techniques for loading and displaying moderately large volumes of data in WPF applications. However, you’ll have to wait a little longer, because this is a last minute bonus entry to the series.

I received email from a couple of people making the same suggestion. (One came from Samuel Jack; the other sender chose to remain anonymous, and I haven’t heard back from him yet.) The suggestion was that I might be able to improve performance further with a custom implementation of INotifyCollectionChanged, the interface that ObservableCollection<T> implements to notify WPF data binding when items are added, removed, or otherwise changed.

Recall that in the previous blog entry, I pondered how fast we could possibly go, with a view to working out whether it was worth expending any more effort. I attempted to measure the minimum time required to load 180,000 items into a list box under the following constraints: items must appear as soon as some are available, and the UI must remain responsive throughout. I concluded that on my system the best possible time was 2.2 seconds. Since my implementation with real data was taking 2.3 seconds, it didn’t seem worth trying to go any faster.

However, my conclusion relied on a couple of implicit assumptions. First, I was assuming that we’re using a databound ItemsControl to display the data. Obviously, it would be possible to make things faster by building something completely bespoke, but at that point you’re throwing out a large part of the benefit WPF has to offer. Second, and more subtly, I was assuming that we would be using ObservableCollection<T> as our data source. To be honest, I didn’t even give this much thought, because .NET’s collection classes are pretty good, and there’s usually not much to be gained from trying to roll your own.

As it turns out, I was right—ultimately we can’t gain much with a custom list. But I have to admit that I was right by accident, because I hadn’t originally explored this option. And it’s interesting to try this approach, because at first glance it presents a tantalising opportunity for improvement. Unfortunately, there’s a problem: although a custom collection can deliver much faster performance, there are issues with the resulting user experience.

Batching Collection Change Notifications

What might we gain with a custom collection implementation? The most obvious missing feature in ObservableCollection<T> is support for any kind of batch update. It does not offer an equivalent to the List<T> class’s AddRange method, nor does it offer any sort of BeginUpdate/EndUpdate idiom of the kind available with the classic Win32 list controls. If a custom list could provide this while still being able to deliver change notifications, perhaps we could reduce the overhead.

Here’s my attempt at an observable collection that supports batch updates:

public class BatchingObservableCollection<T> : ObservableCollection<T>
{
    private bool _inBatchUpdate;

    public IDisposable BatchUpdate()
    {
        if (_inBatchUpdate)
        {
            throw new InvalidOperationException("Batch update already in progress");
        }
        _inBatchUpdate = true;
        return new UpdateDisposable(this);
    }

    protected override void OnCollectionChanged(NotifyCollectionChangedEventArgs e)
    {
        if (!_inBatchUpdate)
        {
            base.OnCollectionChanged(e);
        }
    }

    protected override void OnPropertyChanged(PropertyChangedEventArgs e)
    {
        if (!_inBatchUpdate)
        {
            base.OnPropertyChanged(e);
        }
    }

    private void EndBatch()
    {
        if (_inBatchUpdate)
        {
            _inBatchUpdate = false;
            OnPropertyChanged(new PropertyChangedEventArgs("Count"));
            OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
            OnCollectionChanged(new NotifyCollectionChangedEventArgs(
              NotifyCollectionChangedAction.Reset));
        }
    }

    private class UpdateDisposable : IDisposable
    {
        private readonly BatchingObservableCollection<T> _parent;

        public UpdateDisposable(BatchingObservableCollection<T> parent)
        {
            _parent = parent;
        }

        public void Dispose()
        {
            _parent.EndBatch();
        }
    }
}

This provides a BatchUpdate method. It returns an IDisposable, so the idea is that you’d write a using block. (That’s why EndBatch is private: it gets called when you Dispose the object returned by BatchUpdate.) If code inside the using block modifies the collection, it won’t raise any change notifications. At the end of the block, Dispose will be called, at which point the collection will re-enable the ability to raise events for future changes. It also has to raise some events immediately so that data binding will know that the list has changed. My code performs some property change notifications for correctness, but those aren’t the main point of interest here; the key is the collection change notification of type Reset.

(Samuel Jack’s implementation was slightly different. He implemented an AddRange method which bypassed the public Add method, adding the items directly to the underlying collection. But it ended up raising exactly the same events at the end to notify data binding of the changes.)

This use of the Reset action is at the heart of why this technique is problematic. The INotifyCollectionChanged interface does have some support for reporting bulk changes. For example, you can provide an Add notification for a range of items. Unfortunately, if you try this from a collection bound to a WPF items control, WPF throws an exception back at you, complaining that it doesn’t support range actions. The only bulk operation it understands is a Reset, which is supposed to signify that the entire collection just changed. That’s definitely overkill here, but WPF seems to force us to use either that or per-item Add events. (And per-item events are precisely what we’re trying to avoid, making Reset our only option.)
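For comparison, here is how the two kinds of bulk notification are built with the standard NotifyCollectionChangedEventArgs constructors. RangeNotifications is just an illustrative wrapper. The range form is perfectly legal as far as the interface is concerned; it's the WPF items control that refuses to accept it.

```csharp
using System.Collections;
using System.Collections.Specialized;

// Illustrative wrapper (hypothetical name) around the standard constructors.
public static class RangeNotifications
{
    // The interface can describe a multi-item Add in a single event...
    public static NotifyCollectionChangedEventArgs RangeAdd(IList newItems, int startingIndex)
    {
        return new NotifyCollectionChangedEventArgs(
            NotifyCollectionChangedAction.Add, newItems, startingIndex);
    }

    // ...but Reset ("everything changed") is the only bulk notification a
    // WPF items control accepts, and it carries no item information at all.
    public static NotifyCollectionChangedEventArgs Reset()
    {
        return new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset);
    }
}
```

The lack of item information is why a Reset leaves WPF with little choice but to rebuild the visible items.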

Update Performance

With this custom collection in place, I first tried just adding faked items to get an idea of the best possible raw performance. Remember, when I performed a similar test last time with an ObservableCollection<T>, adding 180,000 items took 1.5 seconds on my system, and if we yielded the dispatcher thread just often enough to remain responsive, that went up to 2.2 seconds. With this new collection it took just 0.4 seconds, a considerable improvement. And at first glance, it appears to work: it displays the items correctly, starts showing them immediately, and remains responsive.

So then I tried loading real data. I used the Rx-based chunking approach shown last time with my custom collection class. This took 1 second. Remember that the best case for simply loading the data off disk and processing it (without displaying anything) was 0.8 seconds. And now we’re able to display some data immediately, remaining responsive while the rest of the data loads, and the entire process is only 0.2 seconds slower than the raw work of reading and parsing the data. And this is well over twice as fast as the previous solution.

This would be excellent were it not for the problems it causes.

UI Problems

The downside with this technique is that it makes certain aspects of the UI flaky. It’s not as bad as I had initially thought it would be—I thought the Reset change notification might cause WPF to forget which item was selected, and to lose track of the list’s scroll position. (Repopulating a list from scratch will normally do that.) In fact, it was smart enough to keep track of both things. But there were more subtle problems.

I noticed that if I clicked on elements in the list while it was being updated, sometimes the click would have no effect. My hypothesis was that WPF was destroying and recreating the list elements each time the data source reported a Reset. To test this, I wrote a user control which picks a random background colour when constructed, and used this in the data template for my list box. And sure enough, all the list box items continuously changed colour for as long as my program updated the list, confirming my suspicion that WPF was rebuilding all the visible elements each time the list raised a Reset notification.

Without the background colour hack, you wouldn’t normally see that anything was changing. Each time the list raised a Reset change notification, WPF would tear down all the list box items and load new ones, but because they were all representing the same underlying data items as before, and because it carefully preserved the selection index and scroll offset, the new items looked identical to the ones they replaced. And because WPF is careful about how and when it updates the screen, you don’t see any flickering: the new items seamlessly replace the old ones, and since they will normally look identical, you don’t see it happening. Normally, the only evidence is that the items start behaving oddly when you interact with them.

Once we know this is happening, it’s not surprising that clicks are going missing. If you happen to click on an item just as it’s about to be replaced with a new, identical-looking one, that click never gets handled. It was destined for an item that was removed before it had a chance to handle the input, and there’s no mechanism for reassigning that mouse input to the replacement item.

In my tests, the list box items have been just plain text, but in real applications, I often put more complex content in items controls. Clearly, if items had any children that could independently receive the focus, this continuous re-building of the items would mess that up, repeatedly resetting the focus. (And in case you’re wondering, enabling container recycling on the list control does not appear to help. With my background colour hack, I still see the colours changing with recycling enabled. And that’s to be expected—reusing the containing ListBoxItem seems unlikely to fix anything when the actual content hosted by that container is repeatedly being destroyed and recreated.)

So although a custom list class with batch change support looks good at first glance, it disappoints in the long run. It offers great performance, but it causes problems when the user tries to interact with the list items, which rather reduces the value of remaining responsive to user input while the list updates. So in practice, the technique shown last time continues to be our best bet.


WPF Threads and Chunking with Rx - Wednesday 20 February, 2013, 10:44 AM

This is the fourth blog in a series exploring how asynchronous and multithreaded techniques can affect the performance of loading and displaying moderately large volumes of data in WPF applications. The first part showed that the async and await keywords alone are not a panacea, and that with WPF data binding, timing can have a large impact on performance. The second part showed how minor modifications to IO settings can significantly improve performance by reducing the number of times an asynchronous method has to relinquish its thread and later regain control. The third part demonstrated some slightly unsatisfactory multithreaded approaches to the same problem. In this fourth part, I'll show a better multithreaded solution.

Recall that last time, I showed code that produced data items on a thread pool thread. This did not perform as well as some of the asynchronous solutions, and this seems to be down to the way WPF arranges to process the items back on the dispatcher thread—it seems to end up handling them one at a time. The biggest performance improvements for the asynchronous versions came from enabling WPF to work with larger chunks of items. We need to do the same thing for our multithreaded version if it’s going to have a chance of performing well.

In short, we want our worker thread to group the items it produces into chunks, and for the dispatcher thread to receive those chunks instead of individual items. This sounds like a job for the Reactive Extensions for .NET (or Rx for short). Here’s an Rx version of our log reader:

public static IObservable<string> CreateLogReader(string folder)
{
    return Observable.Create<string>(obs =>
    {
        try
        {
            foreach (string logfile in Directory.GetFiles(folder, "*.log"))
            {
                using (var reader = new StreamReader(logfile, Encoding.UTF8,
                                                     true, 65536))
                {
                    int column = -1;
                    while (!reader.EndOfStream)
                    {
                        string line = reader.ReadLine();

                        if (line == null)
                        {
                            break;
                        }

                        string[] fields = line.Split(' ');
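                        if (line.StartsWith("#Fields"))
                        {
                            // "#Fields:" is itself the first token, hence the -1.
                            column = Array.IndexOf(fields, "cs-uri-stem") - 1;
                            continue; // the header line is not a log entry
                        }
                        if (line.Length == 0 || line[0] == '#' || fields.Length < (column + 1))
                        {
                            continue;
                        }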
                        if (column >= 0)
                        {
                            string uri = fields[column];
                            obs.OnNext(uri);
                        }
                    }
                }
            }
            obs.OnCompleted();
        }
        catch (Exception x)
        {
            obs.OnError(x);
        }

        return Disposable.Empty;
    });
}

This is the same logic as before, but it now delivers each value it reads to anyone who subscribes to it; it no longer decides what to do with the values itself. We can now use this with some logic that makes our requirement for chunking explicit:

private async void OnLoadClick(object sender, RoutedEventArgs e)
{
    var logObv = LogReader.CreateLogReader(FolderPath).Publish();
    var chunked = logObv.Buffer(TimeSpan.FromMilliseconds(100), 5000);
    var dispObs = chunked.ObserveOnDispatcher(DispatcherPriority.Background);
    dispObs.Subscribe(logUris =>
    {
        Trace.WriteLine(logUris.Count);
        foreach (string uri in logUris)
        {
            _reader.LogUris.Add(uri);
        }
    });
    await Task.Run(() => logObv.Connect());
    await dispObs.ToTask();
}

This probably requires a little explanation. The call to Publish returns a wrapper around our observable that lets us chain together a bunch of subscribers, while providing a Connect method that we can call when we’re ready to kick it all off. The Buffer method is the key here: this tells Rx that we want items from the source grouped into bunches (or ‘buffers’) that are at most 100ms apart, but which contain no more than 5,000 items. (This means that if the underlying source is producing any items at all, we won’t have to wait more than 100ms, but if it produces more than 5,000 items within 100ms, it’ll break it into multiple batches for us, with each chunk containing no more than 5,000 items. I added the 5,000 item limit because I found that giving data binding much more than this in any single chunk caused a visible degradation in responsiveness.)

Next, calling ObserveOnDispatcher on this chunked version of the data arranges for each chunk to be delivered on the WPF dispatcher thread, and the call to Subscribe attaches the handler that adds each chunk's items to the collection. I've indicated that I want to process data at the Background priority level, which ensures that my handler will only run after WPF has done more important things like handling user input and updating data bindings. (If this handler ran at a higher priority than data binding, we might not see any updates until the data has finished loading.)

Finally, I call Connect, but I do it via Task.Run, to ensure that the code that does the actual work of reading data out of the log files runs on a thread pool thread.
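The Buffer call described above is what turns a stream of individual items into chunks. To make its count-or-time rule concrete without depending on Rx, here is a minimal pull-based sketch over an ordinary IEnumerable<T>. The Chunking.Chunk helper is a hypothetical name introduced for illustration, not part of Rx, and it only approximates Buffer: Rx drives the time limit with a timer, so it can emit a chunk while the source is idle, whereas this version only checks the clock when an item arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Chunking
{
    // Hypothetical helper (not part of Rx): yields the current chunk as soon as it
    // holds maxCount items, or once maxAge has passed since the previous chunk was
    // emitted (or since enumeration began). The age is only checked when an item
    // arrives, so a stalled source will not cause a partial chunk to be emitted.
    public static IEnumerable<List<T>> Chunk<T>(
        IEnumerable<T> source, TimeSpan maxAge, int maxCount)
    {
        var current = new List<T>();
        Stopwatch age = Stopwatch.StartNew();

        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count >= maxCount || age.Elapsed >= maxAge)
            {
                yield return current;
                current = new List<T>();
                age.Restart();
            }
        }

        // Flush whatever is left when the source completes.
        if (current.Count > 0)
        {
            yield return current;
        }
    }
}
```

For example, enumerating Enumerable.Range(0, 12000) with a time limit much longer than the enumeration takes, and a limit of 5,000 items, produces chunks of 5,000, 5,000 and 2,000 items.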

So I’ve got the potential speed benefits of a separate thread. (As we saw in part 3, in this particular application synchronous execution on a separate thread turned out to be the fastest way to read and process the data. It only fell down when we wanted to start showing results immediately.) I’ve also explicitly arranged to batch the retrieved items into reasonably large chunks if they start coming in really fast, but if the items are not coming in their thousands, we won’t sit on data for longer than 100ms, which ensures that items will become visible reasonably promptly.

The upshot is that this starts showing me data almost immediately, and finishes loading and displaying all the data in 2.3 seconds. That's faster than my original synchronous implementation! It's not quite as good as the later, improved synchronous version, which was able to load all the data in 0.8 seconds, but that one caused the UI to freeze while it loaded the data. The multithreaded version of that also took 0.8 seconds and avoided freezing the UI, but it waited until it had finished loading all the data before showing anything, so in practice it made the user wait longer.

How Fast Can We Go?

All of the approaches that went any faster than 2.3 seconds involved preventing data binding from seeing our data until we had fully populated the collection, and you just can't do that if you want to show some data immediately. If you want to show something as fast as possible, and the rest later, you're committed to adding most of the rows after you've bound the collection to the list. So I started to wonder how much scope for improvement remained: given that we need to bind the collection before loading most of its contents, what's the very fastest we can go?

Out of interest, I decided to try generating 180,000 lines of fake data on the UI thread after setting the DataContext. The idea here was to find out how quickly you can load that much data into a data source that is already connected to data binding. It took 1.5 seconds (and the UI was unresponsive for that time). Since the code that loads real data took 2.3 seconds, you might think that there is still room for improvement. However, that test never let go of the UI thread, so I then wondered what minimum overhead is unavoidable once the code has to remain responsive.

So I tried another experiment: generating 180,000 lines of fake data on the UI thread, but returning control back out to the dispatcher’s message loop every so often. (That’s a prerequisite for remaining responsive.) This turns out to ramp up the cost surprisingly quickly. If I yield the thread just once every 50,000 items (which ends up yielding only 3 times here) it takes 1.8 seconds. Yielding every 10,000 items, which is not quite enough to keep the application feeling smoothly responsive, brings us up to 2.1 seconds. So yielding just 18 times adds some 0.6 seconds to the execution time! Yielding once every 5,000 items—comparable to my real code (although that also adds time-based chunking, so it’s not identical)—it takes 2.2 seconds.
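The experiment code itself isn't shown here, so the following is only a hypothetical reconstruction of the idea: add fake items to a collection, handing control back every N items. The FakeDataLoader class and its parameters are invented for this sketch, and the yield operation is passed in as a delegate so that the loop can run outside WPF. In a WPF code-behind you might pass async () => await Dispatcher.Yield(DispatcherPriority.Background), which lets input and data binding work (both higher priority than Background) run before the loop resumes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FakeDataLoader
{
    // Hypothetical reconstruction of the experiment: adds `total` fake URIs to
    // `target`, awaiting `yieldAsync` after every `yieldEvery` items so that the
    // thread can return to its message loop. Returns how many times it yielded.
    public static async Task<int> AddWithPeriodicYieldAsync(
        ICollection<string> target, int total, int yieldEvery, Func<Task> yieldAsync)
    {
        int yields = 0;
        for (int i = 1; i <= total; i++)
        {
            target.Add("/fake/page" + i + ".html");
            if (i % yieldEvery == 0)
            {
                await yieldAsync();
                yields++;
            }
        }
        return yields;
    }
}
```

With 180,000 items, yieldEvery values of 50,000 and 10,000 give 3 and 18 yields respectively, matching the counts quoted above.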

I’m not quite sure why yielding costs so much, but it presumably has something to do with exactly how WPF defers list binding work. But the bottom line is that simply generating 180,000 lines of fake data and yielding the UI thread often enough to remain responsive takes 2.2 seconds; our Rx-based multithreaded solution takes just 2.3 seconds. (And remember, this is doing real work which takes at least 0.8 seconds to complete.) So it seems that we’re within a whisker of the best possible performance, and definitely at the point of diminishing returns.

In the final part of this series, I’ll offer some conclusions.


Too Much Too Fast with WPF and Threading - Tuesday 19 February, 2013, 10:04 AM

This is the third blog in a series exploring how asynchronous and multithreaded techniques can affect performance when loading and displaying moderately large volumes of data in WPF applications. The first part showed how simply adding the async and await keywords does not necessarily deliver the performance you might hope for. It also showed that the exact moment at which you choose to make information visible to WPF data binding can have a large impact on performance. The second part showed how minor modifications to IO settings can speed things up significantly by reducing the number of times an asynchronous method has to relinquish its thread and later regain control. In this third part, I’ll show some multithreaded approaches to the same problem.

This uses the same examples as before—I’m using the LogReader class introduced in part 1, with the IO tweaks applied in part 2. By the way, all the timings I’m reporting here are for the first run after starting the program, because that’s more relevant to perceived UI performance than the more usual benchmarking approach of averaging hundreds of runs and discarding the outliers. The outliers are the ones your end users will notice, so for UI work, the first and worst cases are typically much more important than the average (and the first case is often also the worst case). In fact, I think that the common approaches to measuring performance can do great harm to responsiveness in practice, but that’s a topic for another post.

This next bit of code uses the TPL to run the synchronous log reading code on a thread pool thread. It uses the C# 5 asynchronous language support to wait for that work to complete, and then puts the reader into the data context.

private async void OnLoadClick(object sender, RoutedEventArgs e)
{
    await Task.Run(() => _reader.ReadLogs(FolderPath));
    DataContext = _reader;
}

From one point of view, this gets the best performance yet: the UI remains responsive, and it only takes 0.8 seconds to read the data.

On the other hand, we have to wait until all the processing work is done before we see any results in the UI. Instead of asking “How long did that take?” I could argue that the more important question is “How long are we making the user wait?” By that reckoning, the original naive asynchronous implementation comes out on top. It may take 3.2 seconds (once we’ve tweaked the buffer size) but it produces usable results virtually immediately. (It seems to take about 0.01 seconds to show something, at which point the limiting factor is how long it takes your video hardware to get around to refreshing the screen.)

But perhaps we can get the best of both worlds? Could we show some results immediately, and still be finished in around a second? It might occur to you to modify our last attempt by binding before running, i.e., going back to putting the reader in the data context before setting the work running. But that turns out not to be a great plan.

Naive Updating Multithreaded Version

If you try swapping the two lines in the last method, so that the Task.Run executes after we put the reader in the data context, you'll get the classic WPF cross-thread collection binding error: a NotSupportedException with an error message of “This type of CollectionView does not support changes to its SourceCollection from a thread different from the Dispatcher thread.” Although WPF is perfectly happy to handle individual property change notifications from any thread, by default it doesn't accept collection change notifications from threads other than the dispatcher thread.

However, with .NET 4.5, Microsoft made it possible to avoid this problem. You must protect all access to the collection with some sort of locking mechanism, which you must make available to WPF. There are various ways to do that. I’ll add a property to hold an object used for locking, and I’ll make my code acquire that when it updates the collection:

public class LogReader
{
    public LogReader()
    {
        LogUris = new ObservableCollection<string>();
        CollectionLock = new object();
    }

    public object CollectionLock { get; private set; }
        
    public ObservableCollection<string> LogUris { get; set; }

    public void ReadLogs(string folder)
    {

...

        lock (CollectionLock)
        {
            LogUris.Add(uri);
        }
...

I also have to tell WPF that this is how the collection is protected. I’ve done that with the following in my code-behind’s constructor:

BindingOperations.EnableCollectionSynchronization(
    _reader.LogUris, _reader.CollectionLock);

With this in place, we can safely update the log data collection after having put the reader into the data context. The reading occurs on a worker thread, enabling the UI to remain responsive, and thanks to enabling the cross-thread collection change handling, we get our immediate UI updates.

However, this turns out to be slower than even the naive asynchronous version. Measuring the time here has turned out to be difficult, because WPF seems to queue up all of the work required for collection updates, and I've not found a reliable way to discover the exact moment at which it finishes draining that queue, but using a manual stopwatch I've found that it takes over 5 seconds to finish populating the list. (It seems to be about the same with or without the 64K buffer IO tweak, by the way. The overheads of cross-thread collection change handling are so dominant that the previously effective modification is now lost in the noise.) That's a lot slower than our naive asynchronous implementation was after the buffer size tweak, which took 3.2 seconds to load all the data (and showed something immediately too).

The problem with this multithreaded approach is that we’ve lost any chance for batching. The asynchronous approaches shown in earlier parts of this series were all single-threaded. Because everything happened on the UI thread, we were forcing data binding to process our updates in chunks because we were only relinquishing the thread after having produced a chunk of items. But by moving the log reading to a separate thread, we’ve lost that influence over the UI thread.

Mind you, getting data binding to process things in chunks purely by monopolizing the dispatcher thread is a somewhat questionable technique. Perhaps if we were a bit more deliberate about things we could do better. So I’ll show a more explicit approach to chunking in the next part of this series.

I was going to end this post here, but then I received some feedback.

ConfigureAwait

Petr Onderka sent me an email in response to the second part of this series, in which he suggested an alternative way to avoid the overhead of funneling too much work through the WPF dispatcher. He pointed out that the Task-based Asynchronous Pattern provides a way to disable that, with the following simple modification to the code that reads a line from a log file:

string line = await reader.ReadLineAsync().ConfigureAwait(false);

That call to ConfigureAwait declares that we don’t care about which context the method continues on. The upshot is that when a read that cannot complete immediately does eventually finish, the deferred execution of the rest of the method will happen on a thread pool thread. This means our await no longer incurs any WPF dispatcher overhead. But of course, it also means that all our list updates will happen on a worker thread, so we’ll need to use the same tricks as before to avoid problems: either we’ll need to wait until we’re done before making the list visible to data binding, or we’ll have to enable cross-thread change notification handling.

If we use the first approach (setting the DataContext after the work is complete), the ConfigureAwait solution in conjunction with the 64K StreamReader buffer takes 1.1 seconds. This is not a massive improvement: the comparable asynchronous version without ConfigureAwait took just 1.2 seconds. But remember, we had already neutralized most of the dispatcher costs: using a 64K buffer reduced the number of thread switches from about 40,000 to about 600. If I choose not to use the 64K buffer, then without ConfigureAwait we saw last time that it took 2.8 seconds (as long as we set the DataContext after loading the data). In this case, adding ConfigureAwait makes a bigger difference, bringing it down to 1.8 seconds.

So to summarize, ConfigureAwait on its own makes a useful difference: it gets us from 2.8 seconds down to 1.8. But using 64K buffers alone was significantly more effective, getting us down to 1.2 seconds. Combining the techniques can improve matters slightly further, but it's marginal, bringing us down to 1.1 seconds. This is not as good as synchronous code running on a worker thread, which took only 0.8 seconds. The asynchronous version still has to pay a price for continuing after an await expression. Taking the WPF dispatcher out of the picture helps, but it doesn't bring the cost down to zero, so in any case, the most important thing is to reduce the number of times that potentially asynchronous operations are unable to complete immediately.

What about the alternative scenario, in which we set the DataContext first? (Remember, we need to do that if we want to start seeing initial results immediately, instead of having to wait until the work is complete.) If we’re using ConfigureAwait to avoid continuing via the dispatcher thread, our collection updates will happen on a worker thread, so we’ll need to enable cross-thread change notification, just like in the task-based multi-threaded solution. Unsurprisingly, we hit exactly the same problem: we’ve lost the ability to force WPF to handle the changes in reasonably large chunks, and it seems to process each change individually. So the irony is that by adding a ConfigureAwait call to prevent switching to the dispatcher thread in our own code, we actually cause a lot more work to be delivered through the dispatcher. It just happens in a different place (inside data binding’s cross-thread collection change handling).

As with the multithreaded version shown earlier, I can't time this scenario accurately, because I've not found a way to discover in code precisely when WPF finishes handling the updates. But using a manual stopwatch, this seems to take about 6 seconds. So that's slightly slower than using synchronous code on a separate thread (over 5 seconds), and much slower than our simple asynchronous implementation with buffer size tweaks applied (3.2 seconds).

The only comparison that makes ConfigureAwait look good is if we go back to the original naive asynchronous implementation without 64K buffers in which we set the data context before starting work. That took 8.5 seconds, and 6 seconds is obviously an improvement on that. But tweaking buffer sizes was a more fruitful approach, because that got the asynchronous data-context-first approach down to 3.2 seconds. (As with the task-based multithreaded version, the ConfigureAwait option here seems not to get a measurable benefit from buffer size changes, taking 6 seconds in either case.) So if we want immediate results, then once we've applied our most effective modification (changing the buffer size), we are better off keeping things on the UI thread. ConfigureAwait is not a good choice in this scenario.

Next time, I’ll show how we can use Rx to take explicit control of how we group our changes into chunks, and manage the number of context switches. This will enable us to get the benefit of immediate results, but with overall performance significantly closer to that of synchronous code.


Tweaking Async IO in WPF - Friday 15 February, 2013, 5:19 PM

In my previous blog, I compared synchronous and asynchronous methods for reading data out of some web server log files and showing them in a WPF user interface. The first synchronous version took 2.6 seconds to process about 180,000 lines, and the naive asynchronous equivalent took 8.5 seconds. With a subtle change of order, I was able to get these figures down to 1 second for the synchronous code, and 2.8 seconds for the asynchronous code. The trick was to prevent WPF’s data binding system from seeing the data until we had finished loading all of it.

I said last time that this is not a fair comparison. This test flatters the synchronous code—it takes 1 second to load the data, but at that point, no data binding work has occurred at all, so there will be more processing required before anything appears on screen. (With the asynchronous version, some of that work is already done by the time the load completes.) Unfortunately, it’s not easy to measure how long that deferred work takes. Even so, measuring manually with a stopwatch, the synchronous version doesn’t seem to be taking significantly longer than a second. (In any case, trying to show 180,000 lines in a ListBox is pretty bad UI design. In practice you’d want to do something to reduce that, so in a better-designed front end, the binding overhead would likely be lower anyway.) So the numbers here are in the right ballpark—it’s a little unfair, but only a little. The asynchronous version is almost three times slower in this example.

Fortunately, we can improve this. We just need to understand why asynchrony is costing us so much here.

Any time you use await with an asynchronous operation that cannot complete immediately, the containing async method will relinquish the thread. (That’s the whole point of using async and await.) When the operation completes, the method needs to regain control of the thread so that it can continue, and this is where we’re paying a heavy price. In my example, the asynchronous method runs on a WPF dispatcher thread, so regaining control will involve posting a message that the WPF message pump will handle. (In other words, every time the await keyword is made to wait, we incur the same overhead as the SynchronizationContext class’s Post method would.)
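To see this cost in terms of posted messages, the sketch below installs a hypothetical CountingContext, a minimal single-threaded SynchronizationContext standing in for WPF's dispatcher, and counts how often an await has to Post its continuation. It is an illustration under those assumptions, not how WPF's dispatcher is implemented. Awaits that complete immediately cost no Post, each await that genuinely waits costs exactly one, and ConfigureAwait(false) avoids posting altogether.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the WPF dispatcher: a single-threaded
// SynchronizationContext that queues posted callbacks and counts them.
public sealed class CountingContext : SynchronizationContext
{
    private readonly BlockingCollection<KeyValuePair<SendOrPostCallback, object>> _queue =
        new BlockingCollection<KeyValuePair<SendOrPostCallback, object>>();
    private int _postCount;

    public override void Post(SendOrPostCallback d, object state)
    {
        Interlocked.Increment(ref _postCount);
        _queue.Add(new KeyValuePair<SendOrPostCallback, object>(d, state));
    }

    // Runs an async method on the calling thread with this context installed,
    // pumping posted callbacks until the method's task completes. Returns the
    // number of continuations that had to be posted back to this thread.
    public static int Run(Func<Task> asyncMethod)
    {
        SynchronizationContext previous = Current;
        var context = new CountingContext();
        SetSynchronizationContext(context);
        try
        {
            Task task = asyncMethod();
            task.ContinueWith(_ => context._queue.CompleteAdding(), TaskScheduler.Default);
            foreach (var callback in context._queue.GetConsumingEnumerable())
            {
                callback.Key(callback.Value);
            }
            task.GetAwaiter().GetResult(); // rethrow any exception from the method
            return Volatile.Read(ref context._postCount);
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}
```

For example, a method that awaits Task.CompletedTask once and Task.Delay twice reports two posts, while one that starts with await Task.Delay(50).ConfigureAwait(false) reports none, because every subsequent await runs on a thread pool thread with no context to return to.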

Remember, I’m processing about 180,000 lines of text here, and the code executes an await expression for each of those. It turns out that roughly 40,000 of those do not complete immediately, so the main reason our fastest asynchronous code takes about 1.8 seconds longer than the fastest synchronous code is that it’s posting 40,000 messages through WPF’s dispatcher loop.

(As I mentioned last time, I’m running these tests on a pretty old system. I built my current desktop in 2008. You would probably see faster results on something newer.)

The proportion of calls to ReadLineAsync that cannot complete immediately is a function of the average line length and the size of the internal buffers used by StreamReader. I can’t change what’s in the files, so the way to reduce this overhead is to use a larger buffer. Here’s a change to the line that creates the StreamReader:

using (var reader = new StreamReader(logfile, Encoding.UTF8, true, 65536))

That gets the number of non-immediate completions from about 40,000 down to roughly 600. This reduces the execution time from 2.8 to 1.2 seconds. And if we apply this tweak to the naive asynchronous version, it comes down from 8.5 to 3.2 seconds. (This is because we're now only asking WPF to process 600 batches of about 300 lines each instead of 40,000 batches of four or five lines each. So the data binding overhead, while still high, is now a lot lower.) This means we can get the benefits offered by the original naive version (seeing useful data immediately and remaining responsive) while only having to wait 3.2 seconds, just a little longer than our original synchronous attempt.
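The arithmetic behind these figures is roughly as follows. It assumes that the reader created by File.OpenText refills a 1,024-byte buffer (which I believe is the framework default, and which is consistent with the roughly 40,000 waits measured above), and that each refill is one read that fails to complete immediately:

$$
\frac{40\ \text{MB}}{1\ \text{KB}} \approx 40{,}000\ \text{reads}
\qquad
\frac{40\ \text{MB}}{64\ \text{KB}} \approx 640\ \text{reads}
$$

$$
\frac{180{,}000\ \text{lines}}{40{,}000\ \text{batches}} \approx 4.5\ \text{lines per batch}
\qquad
\frac{180{,}000\ \text{lines}}{600\ \text{batches}} = 300\ \text{lines per batch}
$$

So a 64-fold larger buffer cuts the number of dispatcher round trips, and grows the batches that data binding gets to process, by roughly the same factor.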

We could speed things up further still by reading the entire file in one go. This would minimize the IO overheads, but it would introduce two problems. First, it would increase memory usage, and if you’re working with very large log files, that might be a problem. Second, it would prevent us from showing any data until we have all the data, so we wouldn’t be able to provide immediate results, either with synchronous or asynchronous code.

In case you’re wondering, the synchronous version also benefits from this larger buffer—it comes down to 0.8 seconds. But the gap is narrowing—at 1.2 seconds, the asynchronous version is only giving away 0.4 seconds, which is not a massive price to pay for remaining responsive. Nonetheless, it’s still taking 1.5 times as long, which might offend you. (And if you were processing larger volumes of data—400MB of log files rather than 40MB, for example—the extra delay might be more significant.) So you might wonder whether it would be better to attempt to remove the WPF dispatcher from the picture entirely. One way to do this would be to use multithreading instead of an asynchronous method. I’ll show how that works out next time.


Too Much, Too Fast with WPF and Async - Thursday 14 February, 2013, 11:18 AM

.NET 4.5 added asynchronous language features for C# and VB. For the most part, this has made it much easier to improve a user interface’s responsiveness—you can use asynchronous APIs to perform potentially slow work in a way that will not cause your user interface to freeze, and yet you can use simple programming techniques that look very similar to those used in single-threaded code. However, this is not a panacea. There are some kinds of slow, IO-oriented work (i.e., the kind of work that often benefits most from asynchrony) where a simple application of these techniques won’t help as much as you might hope.

For example, you can run into trouble if you're doing something that's slow, but not quite slow enough. If you want to display a large amount of data from a file, a naive asynchronous (or, for that matter, multithreaded) approach can run into problems. The async and await keywords deal easily with long waits, but if you're doing something busier you may need to apply these techniques with a bit more subtlety. This is the first in a series of blog posts exploring these issues.


The following code reads web server .log files. It reads all the files in a folder, and picks out the cs-uri-stem column, putting the result in an ObservableCollection<string>. We can bind that to a ListBox to show all of the URLs that have been fetched. (It’s a little crude—it ignores query strings for example, but for the log data I’m looking at, that happens not to matter, and the processing details aren’t the main point here.)

using System;
using System.IO;
using System.Collections.ObjectModel;

public class LogReader
{
    public LogReader()
    {
        LogUris = new ObservableCollection<string>();
    }

    public ObservableCollection<string> LogUris { get; set; }

    public void ReadLogs(string folder)
    {
        foreach (string logfile in Directory.GetFiles(folder, "*.log"))
        {
            using (StreamReader reader = File.OpenText(logfile))
            {
                int column = -1;
                while (!reader.EndOfStream)
                {
                    string line = reader.ReadLine();
                    if (line == null) { break; }

                    string[] fields = line.Split(' ');
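                    if (line.StartsWith("#Fields"))
                    {
                        // "#Fields:" is itself the first token, hence the -1.
                        column = Array.IndexOf(fields, "cs-uri-stem") - 1;
                        continue; // the header line is not a log entry
                    }
                    if (line.Length == 0 || line[0] == '#' || fields.Length < (column + 1))
                    {
                        continue;
                    }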
                    if (column >= 0)
                    {
                        string uri = fields[column];
                        LogUris.Add(uri);
                    }
                }
            }
        }
    }
}

I have a folder with about 40MB of log files, containing about 180,000 log entries. If I process this with my LogReader, and use it as the data source for a WPF ListBox, my aging desktop system takes about 2.6 seconds to load the information in that folder.

That’s clearly slow enough to be annoying, so you might think this would be an obvious candidate for going asynchronous. We’re loading a moderately large quantity of data from disk, so at least some of the work will be IO bound, and that’s where async usually shines.

Naive Asynchronous Implementation

At this point, an overenthusiastic developer might think “Ooh, async and await!” and make the obvious modifications. With .NET 4.5 and C# 5 this is pretty simple. First, we need to change the signature of the method, adding the async keyword to enable the language features, and returning a Task to enable asynchronous completion, error reporting, and composition with other tasks:

public async Task ReadLogsAsync(string folder)

We also need to change the line of code that reads from the file in the middle of the loop to use the asynchronous alternative to ReadLine:

string line = await reader.ReadLineAsync();
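Putting those two changes together, the asynchronous reader might look something like the following sketch. Apart from the async signature and the awaited ReadLineAsync call, it is the synchronous code shown earlier, with two small robustness changes of my own marked in comments: a continue after the #Fields header, so that the header line's own token is not added to the list, and a line.Length == 0 check, so that a blank line cannot throw an IndexOutOfRangeException.

```csharp
using System;
using System.Collections.ObjectModel;
using System.IO;
using System.Threading.Tasks;

public class LogReader
{
    public LogReader()
    {
        LogUris = new ObservableCollection<string>();
    }

    public ObservableCollection<string> LogUris { get; set; }

    // Naive asynchronous version: the same logic as ReadLogs, but with an async
    // signature and an awaited ReadLineAsync call.
    public async Task ReadLogsAsync(string folder)
    {
        foreach (string logfile in Directory.GetFiles(folder, "*.log"))
        {
            using (StreamReader reader = File.OpenText(logfile))
            {
                int column = -1;
                while (!reader.EndOfStream)
                {
                    // Relinquishes the thread whenever the read can't complete immediately.
                    string line = await reader.ReadLineAsync();
                    if (line == null) { break; }

                    string[] fields = line.Split(' ');
                    if (line.StartsWith("#Fields"))
                    {
                        // "#Fields:" is itself the first token, hence the -1.
                        column = Array.IndexOf(fields, "cs-uri-stem") - 1;
                        continue; // my addition: the header is not a log entry
                    }
                    // My addition: line.Length == 0 guards against blank lines.
                    if (line.Length == 0 || line[0] == '#' || fields.Length < (column + 1))
                    {
                        continue;
                    }
                    if (column >= 0)
                    {
                        LogUris.Add(fields[column]);
                    }
                }
            }
        }
    }
}
```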

With these changes in place, the UI remains responsive while the data loads, and we start seeing the log entries in the UI immediately. Those are two useful improvements on the synchronous version, but we’ve paid a significant price: the code now takes about 8.5 seconds to run.

Comparing this with the synchronous code’s 2.6 seconds is a little misleading. For one thing, the synchronous version takes more like 4 seconds the very first time it runs. (My system drive is a pair of SSDs, but these logs live on my larger data drive, which is of the spinning rust variety.) It only gets up to full speed once the files are in the OS’s cache, and under those circumstances, asynchronous code has much less to offer. The whole point is to free up threads while slow IO is in progress, but if everything you need is already in memory, the IO won’t be so slow. So we’re being slightly unfair on the asynchronous version by removing its main opportunity to show an advantage. However, since the IO only slows things down by about 1.4 seconds when things aren’t in the cache, it’s hard to see how the asynchronous code might make up the deficit even in more favourable conditions.

Now this might scare you off the C# asynchronous language features, because it makes them seem slow, but this is not a fair test. The reason this takes so much longer is that we’ve given the program much more work to do. When the simple, synchronous version runs on the UI thread, WPF does very little immediate work for each item we add to the LogUris collection. Data binding will detect the change—we’ve bound a ListBox to that collection, so it’ll be looking for change notification events—but WPF won’t fully process those changes until our code has finished with the UI thread. Data binding defers its work until the dispatcher thread has no higher priority work to do.

However, when we run the asynchronous version, our ReadLogsAsync method will keep relinquishing the thread: it'll do that every time the StreamReader class's ReadLineAsync method does not complete immediately. (Even though the files are in the cache, the OS still has to copy the data into our process, so although file reads are very quick, they will not all complete synchronously.) And each time our method relinquishes the thread, WPF's data binding takes that opportunity to process all the work that it had deferred.

In the non-async case, WPF processes all of the additions to LogUris as a single bulk operation, but by going async we’ve made it process lots of smaller batches. This does have two benefits: log entries start appearing in the list almost immediately, and the application remains responsive to user input while the data loads. But we’ve paid a heavy price for it: it now takes over three times as long for the operation to finish.
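A toy model in C++ shows why the same data costs more when it arrives in many small batches. It is not WPF. `BatchStats`, `Simulate` and the fixed `yieldEvery` interval are all hypothetical. In the real program, the batch boundaries fall wherever ReadLineAsync happens not to complete synchronously. The only point here is that the per-item work stays the same while the number of batches grows, and each batch carries its own fixed overhead.

```cpp
#include <cstddef>

// Toy model, not WPF: each list addition queues deferred "binding" work, and that
// work is only processed when the loader gives up the (simulated) UI thread.
struct BatchStats {
    std::size_t batches = 0;         // how many times deferred work was processed
    std::size_t itemsProcessed = 0;  // total notifications handled across all batches
};

// yieldEvery == 0 models the synchronous loader, which holds the thread until done.
// yieldEvery == k models an async loader that relinquishes the thread every k items.
BatchStats Simulate(std::size_t items, std::size_t yieldEvery) {
    BatchStats stats;
    std::size_t pending = 0;
    auto flush = [&] {
        if (pending == 0) return;
        ++stats.batches;             // any fixed per-batch cost would be paid here
        stats.itemsProcessed += pending;
        pending = 0;
    };
    for (std::size_t i = 1; i <= items; ++i) {
        ++pending;                                   // Add() queues a notification
        if (yieldEvery != 0 && i % yieldEvery == 0) {
            flush();                                 // thread relinquished: binding catches up
        }
    }
    flush();                                         // loader has finished with the thread
    return stats;
}
```

With 180,000 items, never yielding produces one batch. Yielding every 100 items (an arbitrary choice) produces 1,800 batches, yet every item is still processed exactly once.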

We can verify that this is behind the slowdown by modifying the UI code that does the data binding. Instead of putting the LogReader in the DataContext when the app starts up, we can wait until after the log data has been read:

private async void OnLoadClick(object sender, RoutedEventArgs e)
{
    await _reader.ReadLogsAsync(FolderPath);
    DataContext = _reader;
}

This means that data binding doesn’t get to see the data source until we’ve finished loading the data. With that modification, it only takes about 2.8 seconds to run. This implies that there’s a small overhead for using asynchronous code—it takes 0.2 seconds longer than the synchronous version, a barely perceptible increase when you’ve already been waiting 2.6 seconds. And in exchange for that, the UI remains responsive while we’re waiting for the work to complete. (However, as we’ll see shortly, this result is slightly misleading.)

Although this is much faster than the 8.5 second case, we’ve lost something: that slower example produced useful results in the UI much faster. In fact, a user might prefer the slower version in practice, because if useful data appears immediately, you might not even notice that it takes three times longer to finish populating the list—it was probably going to take a lot longer than 8.5 seconds to scroll down through the whole list anyway. So by one important measure, the naive asynchronous method is better: it provides useful information to the user sooner.

I said earlier that comparing this 2.8 second asynchronous code with our original 2.6 second synchronous version wasn’t entirely fair. We can try the same trick of holding back the data source until we’re ready with the synchronous case:

private void OnLoadClick(object sender, RoutedEventArgs e)
{
    _reader.ReadLogs(FolderPath);
    DataContext = _reader;
}

This drops the time to 1 second! So it turns out that although the original synchronous code was forcing WPF to update the ListBox in a single batch operation, there were still significant data binding overheads. Binding to the LogUris collection caused WPF to attach a change notification handler. And because we did this before populating the list, this handler ran for each of the 180,000 entries added to that list. Although that change handler ultimately deferred the work until our code relinquished the UI thread, it turns out that arranging to defer work is not completely free. Of the 2.6 seconds the original version took, only 1 second of that actually went into reading and processing the data—the rest was all spent in data binding’s event handlers!
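The cost of attaching a change handler before or after populating a list can be sketched the same way. This is again a hypothetical toy: `ObservableList` and `CountNotifications` are invented for illustration and are not WPF types. It simply counts how often a subscriber is called. A handler that only arranges to do its real work later still runs once per addition.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy observable list: every Add raises a change event to all current subscribers.
class ObservableList {
public:
    void Subscribe(std::function<void()> handler) { handlers_.push_back(std::move(handler)); }
    void Add(int value) {
        items_.push_back(value);
        for (auto& h : handlers_) h();   // per-item notification cost
    }
    std::size_t Size() const { return items_.size(); }
private:
    std::vector<int> items_;
    std::vector<std::function<void()>> handlers_;
};

// Counts handler invocations when the "binding" subscribes before or after loading.
std::size_t CountNotifications(std::size_t items, bool bindFirst) {
    ObservableList list;
    std::size_t calls = 0;
    auto bind = [&] { list.Subscribe([&calls] { ++calls; }); };
    if (bindFirst) bind();                          // like setting DataContext at startup
    for (std::size_t i = 0; i < items; ++i) list.Add(static_cast<int>(i));
    if (!bindFirst) bind();                         // like setting DataContext after loading
    return calls;
}
```

In this model, binding first costs 180,000 handler calls for 180,000 items, while binding afterwards costs none.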

So the asynchronous version isn’t looking so good now. It’s a lot faster than our first 8.5 second effort, but the 2.8 second version now looks pretty slow in comparison with the equivalent synchronous code’s 1 second.

In fact, this is still not a fair comparison. For now, I’ll leave you to ponder why. I shall explain next time.


Native WinRT Inheritance - Sunday 25 September, 2011, 2:57 PM

In my previous blog post on Real Native WinRT development, I wrote some native C++ code that created an instance of the Windows.UI.Xaml.Application class, and called its Application.Run method. Because I was using pure native C++ instead of the compiler’s WinRT language projection that sane developers will use, this very simple task took a short helper function and 15 lines of code. (If you read that article, you may recall that there was an unresolved bug—the Run method was failing. I’ve since updated the post to fix that. It turned out that I was using the wrong threading model. Surprisingly, you have to initialize the UI from a multithreaded context.)

While the code was verbose compared to the equivalent when using the WinRT language projections—a mere two lines—it’s about to get even more complicated. Although my previous example did technically run, it turns out that to do anything useful in a WinRT Xaml application, we need to derive a type from the built-in Application class. (That’s because you have to override at least one method to be able to show a non-empty UI.) With the WinRT language projections, inheritance is straightforward:

ref class App : public Windows::UI::Xaml::Application
{
    ...
};

But I’m choosing to do this natively because I like to understand what’s going on under the covers. In the raw native world, inheritance looks rather more complex.

Just to be clear, I’m doing this to learn how WinRT really works. I would NOT normally write a C++ WinRT application this way.

WinRT uses COM for its underpinnings, and classic COM never supported implementation inheritance. So the first challenge was to work out how WinRT maps the concept of inheritance into a COM-like world. I’ve not yet found any clear documentation for this, but there’s a clue in the Application class documentation: the Attributes section shows that the class has a ComposableAttribute with a first argument of Windows.UI.Xaml.IApplicationFactory. I’ve not found any documentation for that interface, but its definition lives in Windows.UI.Xaml-coretypes.h, and from that we can see that this interface defines a single member:

HRESULT CreateInstance(
  /* [in] */ IInspectable *outer,
  /* [out] */ IInspectable **inner,
  /* [out][retval] */ Windows::UI::Xaml::IApplication **instance)

But where do I get an object that implements this IApplicationFactory interface? Last time, I called RoActivateInstance to create an instance of the Application class. But if I want to get hold of the factory for that class I need a different API: RoGetActivationFactory. All WinRT classes provide an activation factory, which is what RoActivateInstance uses to create new objects. (It’s very similar in concept to a classic COM class factory.) But when doing something fancier, such as inheritance, we need to talk directly to that factory, rather than letting RoActivateInstance do that for us.

Inheritance and COM

Here’s how WinRT makes inheritance work in a COM-like world. It appears to involve three objects:

We provide the ‘outer’ object, while the ‘inner’ and the wrapper objects are provided by the WinRT type from which we are deriving. These three objects correspond to the three arguments to IApplicationFactory.CreateInstance. (I’ve modified the order, because to me, this order makes more sense: we start with the base type, we override virtual methods, and the result is a new type that is the combination of the base object and the overrides.)

Although this system provides a way to model inheritance, it’s actually quite different from how inheritance looks in ordinary C++, C#, or even in the C++ language projection for WinRT. First of all, it’s based on object instances, rather than types. But also, unlike with C++ (or C#) inheritance, we have to define overrides for all virtual methods. That’s because our ‘outer’ object defines overrides by implementing a special ‘overrides’ interface specific to the base class. If you look at a non-sealed WinRT class, you’ll usually see that it implements two interfaces. For example, the Application class implements IApplication and IApplicationOverrides. That second interface defines the methods which inheriting objects can override. (Remember, in COM everything looks like a method, including properties, so virtual properties use this same mechanism.)

Since COM interface implementation is an all-or-nothing prospect, we have to implement every overridable method. If we don’t really want to override a method, we just call back to the base implementation, using the ‘inner’ object returned by the CreateInstance method.
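The shape of this arrangement may be easier to see without COM in the way. The following sketch models it with ordinary C++ virtual functions. Every name in it is hypothetical, and it leaves out reference counting, QueryInterface and HRESULTs entirely. The roles are:

- `BaseImpl` plays the ‘inner’ object.
- `Outer` is the object we write. It implements every overridable method and forwards the ones it doesn’t care about.
- `Wrapper` is the combined object that the base type hands back.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an "overrides" interface such as IApplicationOverrides.
struct IOverrides {
    virtual ~IOverrides() = default;
    virtual std::string OnInitialize() = 0;
    virtual std::string OnLaunched() = 0;
};

// 'Inner' object: the base type's own implementation of the overridable methods.
struct BaseImpl : IOverrides {
    std::string OnInitialize() override { return "base OnInitialize"; }
    std::string OnLaunched() override { return "base OnLaunched"; }
};

// 'Outer' object: written by the deriving code. Implementing an interface is
// all-or-nothing, so every method exists; non-overrides forward to 'inner'.
class Outer : public IOverrides {
public:
    void SetBase(IOverrides* inner) { inner_ = inner; }
    std::string OnInitialize() override { return inner_->OnInitialize(); } // not overridden
    std::string OnLaunched() override { return "derived OnLaunched"; }      // real override
private:
    IOverrides* inner_ = nullptr;   // non-owning: the wrapper owns the inner object
};

// Wrapper: produced by the base type; virtual calls are dispatched via 'outer'.
class Wrapper {
public:
    Wrapper(IOverrides* outer, std::unique_ptr<IOverrides> inner)
        : outer_(outer), inner_(std::move(inner)) {}
    IOverrides* Inner() const { return inner_.get(); }
    std::string Initialize() { return outer_->OnInitialize(); }
    std::string Launch() { return outer_->OnLaunched(); }
private:
    IOverrides* outer_;
    std::unique_ptr<IOverrides> inner_;
};

// Mirrors the shape of a factory's CreateInstance(outer, &inner, &instance).
std::unique_ptr<Wrapper> CreateInstance(IOverrides* outer, IOverrides** inner) {
    auto wrapper = std::make_unique<Wrapper>(outer, std::make_unique<BaseImpl>());
    *inner = wrapper->Inner();
    return wrapper;
}
```

After `CreateInstance(&outer, &inner)` and `outer.SetBase(inner)`, only OnLaunched is genuinely overridden. OnInitialize still reaches the base implementation through the ‘inner’ pointer.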

If you use the C++ WinRT language projection, you can see the compiler generating exactly this sort of code. If you pass the /d1ZWtokens switch, the compiler will display the full token stream of the code it’s compiling, including any generated code. (Beware: this will slow down compilation considerably, as it dumps several megabytes of code into Visual Studio’s mysteriously slow Output panel.) Actually it generates a bit more than a plain old call to the base class. Here’s the code it produces in an ordinary WinRT project (with the WinRT projection enabled) for a method that I’m not overriding:

inline long __stdcall ::HighLevelWinRTClient::App::
  __cli_Windows_UI_Xaml_IApplicationOverrides____cli_OnInitialize()
{
    struct Windows::UI::Xaml::IApplicationOverrides^ __tempValue;
    long __hr = __cli_baseclass->__cli_QueryInterface(
        const_cast<class Platform::Guid%>(
            reinterpret_cast<const class Platform::Guid%>(
                __uuidof(struct Windows::UI::Xaml::IApplicationOverrides^))),
        reinterpret_cast<void**>(&__tempValue));

    if (!__hr) 
    {
        __hr = __tempValue->__cli_OnInitialize();
    }

    return __hr ; 
}

This uses a class member called __cli_baseclass, which is where the compiler stores that ‘inner’ IInspectable that comes back from CreateInstance. It calls QueryInterface to ask for the IApplicationOverrides interface, and uses that to call the base class’s original implementation of the method. This seems pretty inefficient—it’s clearly going to be quicker to ask for that interface just once and store that, rather than doing a QueryInterface every time the method runs. I’ll take that more efficient approach when I write my derived class by hand—the joy of real native code is that you can control this sort of thing.
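A toy comparison makes the cost difference concrete. The names are hypothetical, and a string stands in for an IID. One forwarder looks up the base interface on every call, as the generated code above appears to. The other looks it up once and reuses the pointer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical interface standing in for IApplicationOverrides.
struct OverridesIface {
    virtual ~OverridesIface() = default;
    virtual int OnInitialize() = 0;
};

// Toy 'inner' object whose QueryInterface-style lookup counts how often it is asked.
class InnerObject : public OverridesIface {
public:
    int OnInitialize() override { return 42; }
    // Returns the requested interface, or nullptr (a stand-in for E_NOINTERFACE).
    OverridesIface* Query(const std::string& iid) {
        ++queries;
        return iid == "IOverrides" ? this : nullptr;
    }
    std::size_t queries = 0;
};

// Per-call lookup, like the projection's generated forwarder.
int ForwardWithQueryEachTime(InnerObject& inner) {
    OverridesIface* p = inner.Query("IOverrides");
    return p ? p->OnInitialize() : -1;
}

// Look the interface up once at construction, then reuse the pointer.
class CachedForwarder {
public:
    explicit CachedForwarder(InnerObject& inner) : base_(inner.Query("IOverrides")) {}
    int OnInitialize() { return base_ ? base_->OnInitialize() : -1; }
private:
    OverridesIface* base_;   // non-owning in this sketch
};
```

A real COM version of the cached approach would hold a counted reference, which is what the ComPtr field in the DerivedApp class is for. This sketch uses a raw, non-owning pointer to keep things short.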

And the horror of real native code is that you have to write this sort of thing, whether you want control over it or not. We now have some work to do: we must implement this IApplicationOverrides interface on an object which we pass into the activation factory’s CreateInstance method as the ‘outer’ object. Since this is a WinRT interface it derives from IInspectable. (You can see that the first argument of IApplicationFactory.CreateInstance takes an IInspectable* argument.) Remember that IInspectable is the new interface from which all WinRT interfaces derive, and it derives from the classic COM IUnknown interface. So we’ll need to supply a working COM object with a viable implementation of IInspectable. While I could write that by hand, it’s not much fun. Fortunately, it turns out that I don’t have to write it completely from scratch, even in this fully-native world. I can turn to a new library, called WRL.

WRL (Windows Runtime Library?)

I’m not entirely sure what WRL stands for because I’ve not found the documentation for it yet, but I’m guessing it might be the Windows Runtime Library. Whatever it stands for, it appears to be the successor to ATL, which was a widely used library for writing COM components in C++. If you’re familiar with ATL, you’ll feel at home with WRL, but of course WRL knows about WinRT things like IInspectable, which is why we’re using it. Here’s the class definition for my Application-derived class:

namespace LowLevelWinRTClient
{
  using namespace Windows::ApplicationModel::Activation;

  class DerivedApp :
      public Microsoft::WRL::RuntimeClass<Windows::UI::Xaml::IApplicationOverrides>
  {
    InspectableClass(L"LowLevelWinRTClient.DerivedApp", TrustLevel::BaseTrust)

    Microsoft::WRL::ComPtr<Windows::UI::Xaml::IApplicationOverrides> _spBaseImplementation;

  public:
    DerivedApp(void);
    ~DerivedApp(void);

    void SetBase(Windows::UI::Xaml::IApplicationOverrides* pBaseImplementation)
    {
      _spBaseImplementation = pBaseImplementation;
    }

    HRESULT STDMETHODCALLTYPE OnInitialize();
    HRESULT STDMETHODCALLTYPE OnActivated(IActivatedEventArgs *args);
    HRESULT STDMETHODCALLTYPE OnLaunched(ILaunchActivatedEventArgs *args);
    HRESULT STDMETHODCALLTYPE OnFileActivated(IFileActivatedEventArgs *args);
    HRESULT STDMETHODCALLTYPE OnSearchActivated(ISearchActivatedEventArgs *args);
    HRESULT STDMETHODCALLTYPE OnSharingTargetActivated(IShareTargetActivatedEventArgs *args);
    HRESULT STDMETHODCALLTYPE OnFilePickerActivated(IFilePickerActivatedEventArgs *args);
  };
}

There are a few things to notice here. The first is that this derives from WRL’s RuntimeClass base type. This implements the methods of the IUnknown and IInspectable base interfaces that all WinRT interfaces derive from. So all we have to do is implement the methods specific to whichever interfaces we want to offer.

I’ve passed the IApplicationOverrides interface as a template argument to RuntimeClass, because WRL needs to know which interface(s) we want to implement in order to implement the IUnknown::QueryInterface and IInspectable::GetIids methods correctly. WRL defines several RuntimeClass types, each taking a different number of interfaces as template arguments, but since I only need to implement one interface on this particular object, I’m using the one-argument version.

The next point of interest is the line beginning InspectableClass. That’s a macro which provides information necessary to implement IInspectable. As you may recall, IInspectable offers methods to discover an object’s type name and its trust level, so as you’d expect, we need to provide these to the macro, so that RuntimeClass can implement IInspectable correctly.
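A template like RuntimeClass can derive GetIids and QueryInterface from its template arguments alone. The small variadic-template sketch below gives a rough idea of how (it needs C++17 fold expressions). It is not WRL’s implementation:

- Strings stand in for GUIDs.
- There is no reference counting.
- `RuntimeClassSketch`, `IFoo`, `IBar` and `Widget` are hypothetical.

The derived class supplies its runtime class name itself, which is roughly the job the InspectableClass macro does.

```cpp
#include <string>
#include <vector>

// Hypothetical interfaces; a static string stands in for each interface's GUID.
struct IFoo {
    static constexpr const char* Iid = "IFoo";
    virtual ~IFoo() = default;
    virtual int Foo() = 0;
};
struct IBar {
    static constexpr const char* Iid = "IBar";
    virtual ~IBar() = default;
    virtual int Bar() = 0;
};

template <typename... Interfaces>
class RuntimeClassSketch : public Interfaces... {
public:
    // GetIids-style answer: every interface named in the template argument list.
    std::vector<std::string> GetIids() const { return {Interfaces::Iid...}; }

    // QueryInterface-style lookup; nullptr stands in for E_NOINTERFACE.
    void* QueryInterface(const std::string& iid) {
        void* result = nullptr;
        auto tryOne = [&](const char* candidate, void* ptr) {
            if (result == nullptr && iid == candidate) result = ptr;
        };
        // static_cast adjusts 'this' to the correct base sub-object for each interface.
        (tryOne(Interfaces::Iid, static_cast<Interfaces*>(this)), ...);
        return result;
    }
};

class Widget : public RuntimeClassSketch<IFoo, IBar> {
public:
    // Roughly the information an InspectableClass-style macro would supply.
    std::string GetRuntimeClassName() const { return "Sample.Widget"; }
    int Foo() override { return 1; }
    int Bar() override { return 2; }
};
```

Because the interface list is a template argument, adding an interface to the list updates both answers automatically. That is the convenience a RuntimeClass-style base offers over hand-written IUnknown code.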

I’ve also declared a field to hold the ‘inner’ object’s IApplicationOverrides interface: the base implementation that we use for methods that we don’t really want to override. (We can also use it when our overrides need to call the base class as part of their implementation.) I’ve written a public helper method to set that base reference because I’ve chosen to make this IApplicationOverrides object standalone: it won’t actually create its own base instance. You probably wouldn’t do it this way for real, but for this example, I wanted to keep things that are separate COM objects as separate classes, just to make it easier to see all the moving parts required for WinRT inheritance.

Finally, we implement all the methods defined by IApplicationOverrides. (Again, I couldn’t find documentation for this interface, so I got this method list by looking at the relevant header file.)

Derived Implementation

Most of my method implementations look something like this:

HRESULT DerivedApp::OnInitialize()
{
    return _spBaseImplementation->OnInitialize();
}

I’m just calling the corresponding method on the base class—this is how we choose not to override a method. If there are arguments, we just pass them straight through. Six of my seven methods defer to the base class like this.

The only method I’m truly overriding is OnLaunched. In the world of the C++ WinRT language projection, the C++ Metro template contains code which provides the application’s window with its initial content, and then activates it, like this:

void App::OnLaunched(LaunchActivatedEventArgs^ pArgs)
{
  Window::Current->Content = ref new MainPage();
  Window::Current->Activate();
}

Here’s raw native code that does roughly the same thing:

HRESULT DerivedApp::OnLaunched(ILaunchActivatedEventArgs *args)
{
  using namespace Windows::UI::Xaml;

  ComPtr<IWindowStatics> spWindowStatics;
  HRESULT hr = Windows::Foundation::GetActivationFactory(
    ComHstring::Make(RuntimeClass_Windows_UI_Xaml_Window),
    &spWindowStatics);
  ComPtr<IWindow> spCurrentWindow;
  if (SUCCEEDED(hr))
  {
    hr = spWindowStatics->get_Current(&spCurrentWindow);
  }

  ComPtr<Windows::UI::Xaml::Markup::IXamlReaderStatics> spXamlReaderStatics;
  if (SUCCEEDED(hr))
  {
    hr = Windows::Foundation::GetActivationFactory(
      ComHstring::Make(RuntimeClass_Windows_UI_Xaml_Markup_XamlReader),
      &spXamlReaderStatics);
  }
  ComPtr<IUIElement> spContent;
  if (SUCCEEDED(hr))
  {
    wchar_t const* xamlContent =
      L"<Grid xmlns=\"http://schemas.microsoft.com/winfx/2006/xaml/presentation\">"
      L"  <TextBlock Text=\"Hello, world\" />"
      L"</Grid>";
    ComPtr<IInspectable> spContentAsInspectable;
    hr = spXamlReaderStatics->Load(
      ComHstring::Make(xamlContent),
      &spContentAsInspectable);
    if (SUCCEEDED(hr))
    {
      hr = spContentAsInspectable.As(&spContent);
    }
  }
  if (SUCCEEDED(hr))
  {
    hr = spCurrentWindow->put_Content(spContent.Get());
  }
  if (SUCCEEDED(hr))
  {
      hr = spCurrentWindow->Activate();
  }

  return hr;
}

Again, roughly 10 times as much code!

OK, that’s not quite true. The original code relies on the project defining a UserControl-derived class which it just instantiates and uses as content. I didn’t want to tackle writing a UserControl in raw native code just yet, so instead, I used the WinRT XamlReader class to load some Xaml from a string constant. About half the code you see here is related to that.

On the other hand, it’s actually slightly more complex than it looks. I got bored of writing code to create and destroy HSTRINGs, so I wrote a little helper, ComHstring, which I’m using in a few places in that code. It provides implicit conversions to HSTRING, along with a destructor that automatically deletes the HSTRING once the temporary returned by its Make method is done with. (As far as I can tell, WRL doesn’t have an HSTRING wrapper, which is slightly surprising.)
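ComHstring’s source isn’t shown here, so the following is only a guess at the idea, written so that it runs anywhere. `CreateStringHandle` and `DeleteStringHandle` are hypothetical stand-ins for WindowsCreateString and WindowsDeleteString, and `ScopedHstring` is an invented name. The key point is that a temporary created by `Make` lives until the end of the full expression. Passing it straight into a call therefore frees the handle as soon as that statement completes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the HSTRING API so the sketch runs anywhere;
// the counter lets a test check that every handle is eventually freed.
struct FakeStringBuffer { std::wstring text; };
using StringHandle = FakeStringBuffer*;
inline std::size_t g_liveHandles = 0;

inline StringHandle CreateStringHandle(const wchar_t* s) {
    ++g_liveHandles;
    return new FakeStringBuffer{s};
}
inline void DeleteStringHandle(StringHandle h) {
    if (h) { --g_liveHandles; delete h; }
}

// RAII wrapper in the spirit of the ComHstring helper (a guess, not its source).
class ScopedHstring {
public:
    static ScopedHstring Make(const wchar_t* s) { return ScopedHstring(CreateStringHandle(s)); }
    ScopedHstring(ScopedHstring&& other) noexcept : h_(other.h_) { other.h_ = nullptr; }
    ScopedHstring(const ScopedHstring&) = delete;
    ScopedHstring& operator=(const ScopedHstring&) = delete;
    ScopedHstring& operator=(ScopedHstring&&) = delete;
    ~ScopedHstring() { DeleteStringHandle(h_); }     // frees the handle automatically
    operator StringHandle() const { return h_; }     // implicit conversion to the raw handle
private:
    explicit ScopedHstring(StringHandle h) : h_(h) {}
    StringHandle h_;
};

// A function taking the raw handle, the way WinRT APIs take an HSTRING.
inline std::size_t Length(StringHandle h) { return h ? h->text.size() : 0; }
```

For example, `Length(ScopedHstring::Make(L"Windows.UI.Xaml.Application"))` yields 27. By the next statement, no handles remain alive.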

Static Methods

In that last example, you’ll notice a couple of calls to GetActivationFactory, which is a WRL wrapper around RoGetActivationFactory. Earlier I mentioned that as a way of getting access to the mechanism for inheritance, but it’s also the way to get access to the static methods defined by a class. Classic COM doesn’t have a concept of static methods—methods are always presented through an interface pointer, so there’s always an instance involved. When WinRT classes define static methods, they appear on the activation factory.
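A toy model may help show why static members end up on the factory. In this sketch, all names are hypothetical, and `GetFactory` is only loosely analogous to RoGetActivationFactory. Each class name maps to a single factory object. Creating instances is one of that object’s jobs, and the class’s ‘static’ members are simply further methods on the same object. `std::dynamic_pointer_cast` stands in for the QueryInterface you would use to get from the factory to an interface such as IWindowStatics.

```cpp
#include <map>
#include <memory>
#include <string>

struct Object { virtual ~Object() = default; };

// Every factory can create instances; that is the "activation" part.
struct ActivationFactory {
    virtual ~ActivationFactory() = default;
    virtual std::unique_ptr<Object> ActivateInstance() = 0;
};

struct Window : Object { std::string content; };

// Factory for a hypothetical Window class; Current() plays the part of a static property.
struct WindowFactory : ActivationFactory {
    std::unique_ptr<Object> ActivateInstance() override { return std::make_unique<Window>(); }
    Window& Current() { return current_; }
private:
    Window current_;
};

// Class name -> factory, looked up by string rather than by GUID.
std::map<std::string, std::shared_ptr<ActivationFactory>>& Registry() {
    static std::map<std::string, std::shared_ptr<ActivationFactory>> registry;
    return registry;
}

std::shared_ptr<ActivationFactory> GetFactory(const std::string& className) {
    auto it = Registry().find(className);
    if (it == Registry().end()) return nullptr;
    return it->second;
}
```

Since every lookup of a given name returns the same factory object, state reached through a ‘static’ member such as `Current()` is shared by everyone who asks for that class.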

Instantiating the Derived Type

Now that we have an implementation of IApplicationOverrides, we’re ready to create an instance of our Application-derived type. Here’s the code:

ComPtr<Windows::UI::Xaml::IApplicationFactory> spApplicationFactory;
hr = Windows::Foundation::GetActivationFactory(
  ComHstring::Make(L"Windows.UI.Xaml.Application"),
  &spApplicationFactory);
CheckHresult(hr, L"GetActivationFactory");
    
ComPtr<DerivedApp> spDerivedApp = Make<DerivedApp>();
ComPtr<Windows::UI::Xaml::IApplicationOverrides> spDerivedAppOverrides = spDerivedApp;

ComPtr<Windows::UI::Xaml::IApplication> spApplication;
ComPtr<IInspectable> spInner;
hr = spApplicationFactory->CreateInstance(
  spDerivedApp.Get(),
  &spInner,
  &spApplication);
CheckHresult(hr, L"Application CreateInstance");

ComPtr<Windows::UI::Xaml::IApplicationOverrides> spBaseImplementation;
hr = spInner.As(&spBaseImplementation);
CheckHresult(hr, L"QI for base IApplicationOverrides");
spDerivedApp->SetBase(spBaseImplementation.Get());

(ComPtr is a WRL smart pointer. It automates some aspects of COM that otherwise get tedious fast. It’s similar to ATL’s CComPtr, although some operations that CComPtr performed implicitly now require explicit code, largely because those implicit operations were responsible for a lot of bugs in code written by people who didn’t really understand how CComPtr works. With the new ComPtr, a failure to understand how it works is more likely to lead to a compiler error than a runtime bug.)

We get the Application class’s activation factory (using GetActivationFactory, which is a wrapper for RoGetActivationFactory). We create an instance of our IApplicationOverrides class; the Make<T> method I’m using here is a WRL helper that creates and initializes a RuntimeClass-based object. We then call the factory’s CreateInstance method, passing in our overrides implementation. CreateInstance passes back two interface pointers. One is the ‘inner’ object, which we then pass into our overrides object. (Remember, it needs that to be able to call the base class for methods that it doesn’t wish to override.) Notice that I’m doing the QueryInterface for IApplicationOverrides during this construction phase, instead of doing it in every single method invocation as the C++ compiler seems to in its projection.

At the end of all this, the spApplication variable points to our finished object—it is an interface pointer to the wrapper generated by the Application class which combines our overrides with the base implementation. So we can now run that to start the application:

hr = spApplication->Run();

When I run my application, I see the “Hello, world” TextBlock that I created in my OnLaunched method appearing, verifying that I have successfully overridden that method in my derived class.

What could be simpler?


Real Native WinRT Development - Friday 16 September, 2011, 4:05 PM

I’ve been at Microsoft’s //build/ conference this week, where they announced WinRT. WinRT is a new API for building Windows applications that use the new ‘Metro’ style. Under the covers this uses a lot of COM-based technology. However, we’ve seen very little of that COM layer.

The various languages that support WinRT (C#, VB, JavaScript, and C++) all define “projections” to make the new WinRT-based objects easier to work with. After all, who really wants to use COM if they can help it? Even the vast majority of the C++ examples we've been shown have used the C++ projection. Slightly confusingly, this hasn’t always been made very clear—some presenters have shown C++ examples and talked about native code and COM while showing us things that are actually the C++ projection.

If you want to crow about how C++ lets you do native development (and apparently some people seem very excited by this) you don’t get full bragging rights if you’re using the projection. The C++ Metro project templates in the Visual Studio 11 preview are all working at a level removed from the true native experience.

That’s a good thing, by the way (for much the same reasons that managed code is a good thing). Without the various language projections, WinRT would be a clear reminder of why so many of us walked away from COM a decade ago. There will rarely be any good reason to work directly with WinRT at the native level. (Adopting a boastful “more native than thou” attitude may be a reason of sorts, but it’s certainly not a good one.) But I like to know how things work, so I thought it’d be fun to try out some real native WinRT development. This means eschewing the projections, and getting down to the level where you will see an actual vtbl call. (We didn’t see any such thing in the ‘big picture’ talks on the first day of the conference, despite what some presenters claimed.)

Raw Native WinRT Client Projects

The first challenge is to create a C++ project capable of working in the WinRT environment, but without opting into the projected world. None of the templates supplied with the Visual Studio preview seem to do this—the Metro projects all turn on the projection features, but the classic C++ templates produce programs which won’t run in the new WinRT world. (If you saw any of the //build/ presentations on WinRT, you may remember the green and blue boxes—the green boxes are the new Metroid world, while the blue boxes represent the classic pre-Win8 world.)

You need your project file to contain a combination of settings which, as far as I can tell, can’t actually be configured in Visual Studio’s UI, so some manual project file editing is required. In particular, you need an <Immersive>true</Immersive> element in a property group to enable the deploy step that builds a new appx package for your app. If I’ve understood correctly, you won’t get into the green box without this. (This property is present in all the new Metro templates, but as far as I can see, there’s no UI for turning this on in an existing C++ project.) However, that setting also turns on all the projection stuff, so you need to turn that back off again. You need a <CompileAsWinRT>false</CompileAsWinRT> inside the <ClCompile> section for each configuration, and also <GenerateWindowsMetadata>false</GenerateWindowsMetadata> and <WindowsMetadataFile /> inside each configuration’s <Link> section.

If you don’t turn those off, you’ll get a load of errors when you try to include the header files that define the native versions of all the WinRT types and interfaces. The C++ compiler will complain that it has already seen all these types (because it imports them automatically when you’re using the C++ WinRT projection), and that they all look different (because the C++ projection represents many of these types differently from the native versions).

So, we now have a project that builds genuine native C++ code with none of the projection features switched on, but which puts the result into the new appx packaging format. This will also use the appx-aware debugger. (You can turn that on with <DebuggerFlavor>ImmersiveLocalDebugger</DebuggerFlavor>, but that setting will already be in effect as a result of turning on the immersive setting for the build system.)

Next, we need some code.

Ruh Roh!

First, let’s initialize the runtime:

HRESULT hr = ::RoInitialize(RO_INIT_SINGLETHREADED);

[Update (2011/09/20): it turns out that this should be RO_INIT_MULTITHREADED if you want it to work.]

That should look hauntingly familiar to anyone who’s done any COM development. It’s not quite the same—the classic COM versions of this API start with ‘C’ rather than ‘R’, so that’d be CoInitialize, and if you called the version that accepted a threading mode (CoInitializeEx) that’d be COINIT_APARTMENTTHREADED.

But at this stage, WinRT definitely sounds like native COM, albeit spoken in a silly voice.

Old-School Error Handling

You’ll have noticed the HRESULT in that first line of code. If you’ve done any COM, you’ll be all too familiar with this. It’s just a 32-bit integer, and almost all COM operations (APIs and also method calls on objects) return these. If the top bit is set, that indicates an error, and the number may or may not tell you something about what failed. You have to check these every time you do anything. So I wrote a little helper:

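#include <windows.h>   // HRESULT, FAILED, LPCWSTR
#include <cstdlib>     // std::exit
#include <iostream>    // std::wcout, std::hex

// Report a failed HRESULT along with what we were doing, then terminate the process.
inline void CheckHresult(HRESULT hr, LPCWSTR message)
{
  if (FAILED(hr))
  {
    std::wcout << L"Error (0x" << std::hex << hr << L") during: " << message << std::endl;
    std::exit(1);
  }
}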

That’s a bit brutal, but it’ll do for now. We should call it after RoInitialize:

CheckHresult(hr, L"RoInitialize");

And we’ll be seeing more of that sort of thing.
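The ‘top bit’ rule is easy to model portably. The sketch below mirrors what the Windows SDK’s FAILED, SUCCEEDED, HRESULT_FACILITY and HRESULT_CODE macros compute. It uses a plain `std::int32_t` in place of HRESULT so it runs anywhere, and the function names are invented for the example. S_FALSE (1) counts as success even though it isn’t S_OK, because only the sign bit matters.

```cpp
#include <cstdint>

// Plain 32-bit signed integer standing in for HRESULT.
using Hresult = std::int32_t;

// Severity is the top bit, so a failure code is simply a negative number.
constexpr bool Failed(Hresult hr) { return hr < 0; }
constexpr bool Succeeded(Hresult hr) { return hr >= 0; }

// HRESULT_FACILITY masks 13 bits starting at bit 16; HRESULT_CODE keeps the low 16 bits.
constexpr std::uint32_t Facility(Hresult hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x1FFF;
}
constexpr std::uint32_t Code(Hresult hr) {
    return static_cast<std::uint32_t>(hr) & 0xFFFF;
}
```

For example, E_ACCESSDENIED (0x80070005) is a failure with facility 7 (FACILITY_WIN32) and code 5, the Win32 ERROR_ACCESS_DENIED value.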

Creating an Object

Now that we’ve initialized WinRT, we need an object. The first thing most immersive applications do is create an Application object. For example, in the high-level world of the C++ projection for WinRT, here’s the generated program entry point in App.cs.g from an ordinary Metro app:

int main(lang::array<Platform::String^>^ args)
{
  auto app = ref new App();

  app->Application::Run();
}

That first line constructs the App object, which derives from the Application class. For now, I’m not going to get into derivation; for this blog post, I’m just going to construct the base Application class directly. As you can see above, in the world of the C++ projection for WinRT, creating an object is one line of code. Here’s the native version of that one line of code:

const wchar_t *appClassName = L"Windows.UI.Xaml.Application";
HSTRING hstring;
hr = ::WindowsCreateString(appClassName,
       static_cast<UINT32>(::wcslen(appClassName)), &hstring);
CheckHresult(hr, L"WindowsCreateString");
IInspectable* pInspApp;
hr = ::RoActivateInstance(hstring, &pInspApp);
CheckHresult(hr, L"RoActivateInstance");
hr = ::WindowsDeleteString(hstring);
CheckHresult(hr, L"WindowsDeleteString");

What a lot of code. Actually the interesting part is just two lines in the middle:

IInspectable* pInspApp;
hr = ::RoActivateInstance(hstring, &pInspApp);

The rest is all string handling or error handling, which is exactly the sort of low-level cruft that the language projections save you from. All of this still goes on; if you choose to use the high-level projections, they just hide it from you.

HSTRING is WinRT’s native representation of strings. COM veterans will know that this is new—in old-school COM, we represented strings in numerous ways, but this wasn’t one of them. (In the VB/scripting/IDispatch/dual side of the COM house, strings were typically BSTRs, while in the non-C++-languages-need-not-apply side of COM, they were often just plain old LPWSTRs, although there were a bunch of other more unusual options.) I’ll leave the what and why of HSTRING for another time, so for now, just know that this is how WinRT expects its strings to look, so we had to create one from our C-style string constant to be able to call RoActivateInstance.

So that’s two departures from classic COM: there’s a new string type, but there’s also the fact that we’re using a string at all. I’m passing a string containing the name of the class I want to instantiate (“Windows.UI.Xaml.Application”) to RoActivateInstance. The nearest equivalent classic COM API was CoCreateInstance, and that used GUIDs to identify types. But in WinRT, you’ll see strings in a lot of places you used to see GUIDs.

In fact that call to RoActivateInstance introduces a third new thing: IInspectable.

IInspectable

In classic COM, all interfaces derived from a base interface called IUnknown, which offered two services: reference counting-based lifetime management (through the AddRef and Release methods), and the ability to request other interfaces that the object might offer (through the QueryInterface method). IUnknown is still there, but there’s a new interface which everything in WinRT seems to derive from: IInspectable.

IInspectable derives from IUnknown, as we can see from the native C++ definition (in the SDK’s inspectable.h file, which looks like it was generated from IDL):

MIDL_INTERFACE("AF86E2E0-B12D-4c6a-9C5A-D7AA65101E90")
IInspectable : public IUnknown
{
public:
  virtual HRESULT STDMETHODCALLTYPE GetIids( 
    __RPC__out ULONG *iidCount,
    __RPC__deref_out_ecount_full_opt(*iidCount) IID **iids) = 0;
        
  virtual HRESULT STDMETHODCALLTYPE GetRuntimeClassName( 
    __RPC__deref_out_opt HSTRING *className) = 0;
        
  virtual HRESULT STDMETHODCALLTYPE GetTrustLevel( 
    __RPC__out TrustLevel *trustLevel) = 0;
        
};

So IUnknown is still there—this still looks and feels like COM. But every WinRT interface gets three new features on top of IUnknown’s services. GetIids lets us ask an object for a list of the interfaces it offers, and whatever it returns in this list, it is committed to making available through QueryInterface. (The old-school COM model is that you had to know what to ask the object for. Now you can ask it what it offers.)

You can also ask an object its type—with classic COM there wasn’t a standard ubiquitous way that you could ask any old interface pointer “what kind of object do you point to?” The assumption was that since you wrote the program, you should know what the object is. (And if you don’t know what the object is, what makes you think you can do anything useful with it?) But some of the languages WinRT projects into just assume it’s possible to ask an object its type, so this method makes that possible.

Finally, there’s GetTrustLevel. As the documentation (http://msdn.microsoft.com/library/br205824(v=VS.85)) helpfully says, this “Gets the trust level of the current Windows Runtime object.” Well, that clears things up. I’m assuming this has something to do with the security sandboxing model, but I haven’t yet had time to look at that in any detail.

As it happens, we’re not going to use any of the IInspectable features here. I want to call a method on the WinRT Application class, so I already know precisely which interface I want. I can therefore use the classic COM mechanism for getting hold of it:

Windows::UI::Xaml::IApplication* pApplication;
hr = pInspApp->QueryInterface(__uuidof(pApplication), (void**) &pApplication);
CheckHresult(hr, L"QI for IApplication");

That’s pretty ordinary COM. (Beautiful, isn’t it? *shudder*)

I’ve now got two references to the object: one typed as IInspectable, and one as IApplication. IApplication is a WinRT interface, so it derives from IInspectable, making that first reference superfluous. My pApplication pointer gives me all I could need, so I’ll let the other one go:

pInspApp->Release();

Remember, Release (along with AddRef) is one of the very few COM interface methods that doesn’t return an HRESULT: both return the updated reference count as a ULONG instead. So we don’t need to check that this worked. It’s not allowed to fail.
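A minimal sketch of the reference-counting contract, in portable C++. RefCounted is a hypothetical class, not a real COM type, and a real implementation would use interlocked (atomic) operations for thread safety. It shows why Release needs no error checking: it just returns the new count (which COM’s documentation describes as intended only for test purposes), and the object deletes itself when the count reaches zero.

```cpp
#include <cstdint>

// Hypothetical model of IUnknown-style reference counting. Real COM objects
// use InterlockedIncrement/InterlockedDecrement (or std::atomic) so that
// AddRef and Release are safe across threads; this sketch keeps only the
// contract.
class RefCounted {
public:
    // AddRef and Release return the new count (a ULONG in COM), not an
    // HRESULT: there is no failure case to report.
    std::uint32_t AddRef() { return ++count_; }

    std::uint32_t Release() {
        std::uint32_t remaining = --count_;
        if (remaining == 0) {
            delete this;  // last reference gone: the object frees itself
        }
        return remaining;
    }

private:
    ~RefCounted() = default;   // only Release may destroy the object
    std::uint32_t count_ = 1;  // the creator holds the first reference
};
```

This mirrors the sequence in the post: a successful QueryInterface AddRefs the object (so there are now two references), releasing the IInspectable reference brings the count back to one, and the final Release drops it to zero and destroys the object.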

So, we have finally written the native equivalent of that first line of code in the high-level projected C++ program. It was a while ago, so in case you’ve forgotten it, here’s the high-level entry point code again:

int main(lang::array<Platform::String^>^ args)
{
    auto app = ref new App();

    app->Application::Run();
}

C++ may produce fewer machine code instructions per line of source than the other WinRT languages, but sometimes it achieves this by making you do all the work yourself… (I can hear all the people who used VB back in the 1990s when I was a C++ COM developer saying “I told you so!”)

We’re now ready to move on to the second line: the method call.

Calling Methods

Now that we’re up and running with our object, you may be relieved to hear that invoking methods is pretty straightforward:

hr = pApplication->Run();
CheckHresult(hr, L"Application.Run");

OK, it’s still twice as much code as you’d expect to write in VB or C#, thanks to the return-code-based error handling, but it’s much less effort than it took to get to this point. COM makes fairly light work of invoking methods. Which reminds me…
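Most of that doubling comes from testing every HRESULT by hand. One common way to fold the check into the call is a helper that throws on failure. ThrowIfFailed and HResult are hypothetical names used for illustration; this is not the post’s CheckHresult (defined earlier, and possibly behaving differently), and it is portable rather than built on the Windows headers. It relies on failure HRESULTs having the top bit set, so they are negative when read as a signed 32-bit value, which is what the FAILED macro tests.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// HRESULT is a signed 32-bit value; failure codes have the severity (top)
// bit set and are therefore negative. S_OK (0) and S_FALSE (1) both succeed.
using HResult = std::int32_t;

// Hypothetical helper: turn a failing return code into an exception so
// callers don't need a separate check after every call.
void ThrowIfFailed(HResult hr, const std::string& what) {
    if (hr < 0) {  // same test as the FAILED macro
        throw std::runtime_error(what + " failed");
    }
}
```

With a helper like this, the two-line call collapses to something like `ThrowIfFailed(pApplication->Run(), "Application.Run");`, at the cost of turning every failure into a C++ exception.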

That’s Not a Vtbl—THIS is a Vtbl

In the ‘big picture’ talks on the first afternoon of //build/ we were shown a C++ Metro app, and a big deal was made of showing the disassembly for a WinRT method call. It looked something like this:

	myTextBlock->SelectAll();
00F55F4F  mov         ecx,dword ptr [ebx+0C4h]  
00F55F55  call        Windows::UI::Xaml::Controls::TextBlock::Windows::UI::Xaml::Controls[::ITextBlock]::SelectAll (0F56160h)  

That’s a call to a TextBlock’s SelectAll method. They chose it because it’s a simple method with no parameters, which makes the resulting compiled code really simple. There are two instructions. The first instruction (mov) loads the implicit ‘this’ argument, passing it via the ECX register. (That is the thiscall convention Visual C++ uses by default for member functions in 32-bit code. It is one of the numerous ways of passing arguments in assembly language, although most conventions put arguments on the stack.) The second instruction (call) is what a simple, direct method call looks like in assembly language: the target address is fixed in the instruction itself.

This was proudly announced as a vtbl call, which was offered as evidence of the innate efficiency of C++ compared to some other languages. But as anyone who’s done much C++ COM development will have noticed, that wasn’t true—that’s not what vtbl calls look like. And if you step into that call in the debugger, the claims for efficiency look a whole lot more doubtful, because you end up in a compiler-generated thunk that’s 55 instructions long! (There is a real vtbl call, but it’s buried somewhere in the middle of those 55 instructions.)

The C++ projection for WinRT looks pretty expensive compared to classic COM. Going back to my truly native example, here’s what our native call to the Application object’s Run method looks like in the disassembler:

	hr = pApplication->Run();
00F211E9  mov         eax,dword ptr [esp+14h]  
00F211ED  push        eax  
00F211EE  mov         ecx,dword ptr [eax]  
00F211F0  call        dword ptr [ecx+40h]  

That’s compiled in Release mode, by the way; you’ll get more verbose code in Debug. This is about as efficient as a vtbl call gets. The first instruction (mov) loads the pApplication variable from memory into a CPU register. The second instruction (push) passes that as the implicit ‘this’ argument. Here we’re using the stack, which is how it’s done in COM: on x86, COM methods are declared with STDMETHODCALLTYPE, which means __stdcall, so every argument, ‘this’ included, goes on the stack. (COM doesn’t ever put ‘this’ in ECX, which was one giveaway that the earlier code wasn’t native COM.) The final two instructions are what a vtbl call looks like. COM interface pointers point into a part of the object that contains another pointer, which points to the vtbl for whichever interface you’re working with. The vtbl is an array of function pointers, one for each method in the interface. So the second mov reads the vtbl pointer out of the object (ECX is just scratch space here, not ‘this’), and the call jumps to whatever address is stored 0x40 bytes into that vtbl.

The vtbl here is for IApplication. In the vtbl, the methods defined by base interfaces come first: the first three slots are for IUnknown’s QueryInterface, AddRef, and Release, and the next three are for IInspectable’s methods. So the IApplication methods actually start at the 7th slot (index 6 in 0-based counting, which, given 32-bit function pointers, is byte offset 0x18, i.e. 24). The call instruction is looking up the slot at byte offset 0x40, and since this is a 32-bit process, dividing by 4 to turn that byte offset back into a slot index gives 0x10, i.e. 16, which is the interface’s 17th method. Taking out the 6 methods of IUnknown and IInspectable, that means this code is invoking the 11th method in IApplication. And if you go and look at the interface definition for IApplication (which appears to be in Windows.UI.Xaml-coretypes.h), you’ll see that this is indeed the Run method, as you’d expect.
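The slot arithmetic can be checked mechanically. This sketch assumes a 32-bit process (4-byte function pointers) and hard-codes the three IUnknown and three IInspectable methods; the constant and function names are hypothetical. The static_asserts stop compilation if any step of the arithmetic is wrong.

```cpp
#include <cstddef>

// Assumption: a 32-bit process, so each vtbl slot (a function pointer) is 4 bytes.
constexpr std::size_t kSlotSize = 4;
constexpr std::size_t kIUnknownMethods = 3;      // QueryInterface, AddRef, Release
constexpr std::size_t kIInspectableMethods = 3;  // GetIids, GetRuntimeClassName, GetTrustLevel
constexpr std::size_t kBaseMethods = kIUnknownMethods + kIInspectableMethods;

// 0-based slot index for the byte offset in "call dword ptr [ecx+40h]".
constexpr std::size_t SlotFromOffset(std::size_t byteOffset) {
    return byteOffset / kSlotSize;
}

// 1-based method number within the derived interface (IApplication here).
constexpr std::size_t DerivedMethodNumber(std::size_t slot) {
    return slot - kBaseMethods + 1;
}

// IApplication's first method lives at index 6 (the 7th slot), 24 bytes in.
constexpr std::size_t kFirstDerivedOffset = kBaseMethods * kSlotSize;

static_assert(kFirstDerivedOffset == 0x18, "first IApplication slot is at byte 24");
static_assert(SlotFromOffset(0x40) == 16, "0x40 / 4 = 16, i.e. the 17th slot");
static_assert(DerivedMethodNumber(SlotFromOffset(0x40)) == 11, "the 11th IApplication method");
```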

COM vtbl calls always involve this multiple indirection. We always use COM objects via interface pointers. To do anything useful with an interface pointer, we dereference it to get the vtbl. Then we retrieve the nth entry in the vtbl. That entry points to a method, so we perform an indirect call through it.
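Those three steps can be modelled in portable C++ with a hand-rolled vtbl, much as COM’s C-language bindings expose one. Everything here (Widget, WidgetVtbl and its two methods) is a hypothetical illustration rather than a real COM interface, and it leaves out IUnknown entirely. What it keeps is the layout: the object starts with a pointer to a shared table of function pointers, and ‘this’ is passed explicitly. (A typical C++ COM object implementing several interfaces holds one such vtbl pointer per interface.)

```cpp
struct Widget;  // the object whose methods we call

// The vtbl: one function pointer per method, 'this' passed explicitly.
struct WidgetVtbl {
    int (*GetValue)(Widget* self);
    int (*AddTo)(Widget* self, int amount);
};

struct Widget {
    const WidgetVtbl* vtbl;  // first field: what an interface pointer leads to
    int value;
};

int WidgetGetValue(Widget* self) { return self->value; }
int WidgetAddTo(Widget* self, int amount) { return self->value += amount; }

// One shared, constant vtbl for every Widget instance.
const WidgetVtbl kWidgetVtbl = {&WidgetGetValue, &WidgetAddTo};

// The steps of a vtbl call, spelled out one at a time.
int CallAddTo(Widget* p, int amount) {
    const WidgetVtbl* table = p->vtbl;         // 1. dereference to reach the vtbl
    int (*fn)(Widget*, int) = table->AddTo;    // 2. fetch the entry for this method
    return fn(p, amount);                      // 3. indirect call, passing 'this'
}
```

For a Widget initialised as `{&kWidgetVtbl, 40}`, `CallAddTo(&w, 2)` returns 42. In general the target of `fn` is only known at run time, which is the kind of indirection that gets in the way of a CPU looking ahead.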

Modern CPUs are not very good at dealing with calls through pointers, by the way. They like to look ahead, but this sort of indirection prevents them from knowing where they’re going until they get there. So vtbl calls are relatively expensive: much more expensive than, say, an inlined method call such as the CLR might perform. Native C++ can offer some performance benefits over managed code, but it’s not as clear-cut as a lot of people seem to think, particularly with COM in the picture. (Anything crossing a COM boundary defeats a lot of optimizations, something that’s not true when using a .NET library.)

Reimagining Success

Incidentally, this call to Run fails. I know I’m right back in the world of COM because I get a return HRESULT of E_FAIL (0x80004005), which means roughly “Something went wrong, and the developer who wrote this method couldn’t [be bothered to] find a more informative error code.” If you pass that to FormatMessage, the Windows API for formatting error messages, you just get “Unspecified error”.
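E_FAIL’s lack of information shows up in its bit fields. An HRESULT packs a severity bit, a facility and a 16-bit code. This portable sketch re-implements the field extraction that winerror.h’s HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros perform, under hypothetical function names. For 0x80004005 the facility is 0 (FACILITY_NULL), so the error isn’t attributed to any particular subsystem.

```cpp
#include <cstdint>

// Field extraction matching the layout used by winerror.h's macros,
// re-implemented under hypothetical names so the sketch is portable.
constexpr std::uint32_t Severity(std::uint32_t hr) { return (hr >> 31) & 0x1; }     // 1 = failure
constexpr std::uint32_t Facility(std::uint32_t hr) { return (hr >> 16) & 0x1FFF; }  // originating area
constexpr std::uint32_t Code(std::uint32_t hr) { return hr & 0xFFFF; }             // specific error

constexpr std::uint32_t kEFail = 0x80004005;  // E_FAIL

static_assert(Severity(kEFail) == 1, "severity bit set: a failure");
static_assert(Facility(kEFail) == 0, "FACILITY_NULL: no particular subsystem");
static_assert(Code(kEFail) == 0x4005, "the generic code behind 'Unspecified error'");
```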

At the time of writing this, I have some theories as to why this might have failed, but I’ve not tested them yet. But since the //build/ conference is apparently about reimagining everything, I’m going to reimagine what success means. My goal here was to show some of the very basic operations (creating new objects and invoking methods) done in real native code, and I’ve done that, even though my application does nothing useful as yet. To build something meaningful, I need to get into more complex mechanisms like inheritance and event handling. But it’s nearly time for breakfast, and I want to get this posted before this morning’s talks, so I’ll leave things broken for now and will follow up with an application that actually does something in a future post.

[Update (2011/09/20): as mentioned in the update above, this turned out to be a threading model issue. WinRT wants this part of the application initialization to happen from a multithreaded context rather than a single-threaded one. Also, I put back a missing p tag that had caused a paragraph to vanish in the original post.]

Obviously, nobody in their right mind is going to write applications this way. I just like to know what’s really going on under the covers, so I find it interesting to explore the details at this level. If you want to stand a chance of realizing any of the benefits that native code hypothetically offers, I think it’s important to understand this level of detail even if you ultimately choose to work at a higher level. If you care about performance, you need to know what the language projections are really doing for you.


Podcast with Jesse Liberty - Monday 22 November, 2010, 7:04 PM

I recently recorded a podcast with Jesse Liberty, in which we talked about various C# things, and in particular, about the new async features Microsoft recently previewed for C# 5.


Copyright © 2002-2010, Interact Software Ltd. Content by Ian Griffiths. Please direct all Web site inquiries to webmaster@interact-sw.co.uk