IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)

Blog Navigation

March (2014)

(1 item)

January (2014)

(2 items)

November (2013)

(2 items)

July (2013)

(4 items)

April (2013)

(1 item)

February (2013)

(6 items)

September (2011)

(2 items)

November (2010)

(4 items)

September (2010)

(1 item)

August (2010)

(4 items)

July (2010)

(2 items)

September (2009)

(1 item)

June (2009)

(1 item)

April (2009)

(1 item)

November (2008)

(1 item)

October (2008)

(1 item)

September (2008)

(1 item)

July (2008)

(1 item)

June (2008)

(1 item)

May (2008)

(2 items)

April (2008)

(2 items)

March (2008)

(5 items)

January (2008)

(3 items)

December (2007)

(1 item)

November (2007)

(1 item)

October (2007)

(1 item)

September (2007)

(3 items)

August (2007)

(1 item)

July (2007)

(1 item)

June (2007)

(2 items)

May (2007)

(8 items)

April (2007)

(2 items)

March (2007)

(7 items)

February (2007)

(2 items)

January (2007)

(2 items)

November (2006)

(1 item)

October (2006)

(2 items)

September (2006)

(1 item)

June (2006)

(2 items)

May (2006)

(4 items)

April (2006)

(1 item)

March (2006)

(5 items)

January (2006)

(1 item)

December (2005)

(3 items)

November (2005)

(2 items)

October (2005)

(2 items)

September (2005)

(8 items)

August (2005)

(7 items)

June (2005)

(3 items)

May (2005)

(7 items)

April (2005)

(6 items)

March (2005)

(1 item)

February (2005)

(2 items)

January (2005)

(5 items)

December (2004)

(5 items)

November (2004)

(7 items)

October (2004)

(3 items)

September (2004)

(7 items)

August (2004)

(16 items)

July (2004)

(10 items)

June (2004)

(27 items)

May (2004)

(15 items)

April (2004)

(15 items)

March (2004)

(13 items)

February (2004)

(16 items)

January (2004)

(15 items)

Blog Home

RSS 2.0

Writing

Programming C# 5.0

Programming WPF

Other Sites

Interact Software

Natural Sorting in C#

Thursday 13 December, 2007, 01:43 PM

Jeff Atwood recently posted about natural sorting. This is all about making sure that strings that contain numbers sort numerically. I’m slightly surprised to see that he wants to call it alphabetical sorting. Surely by definition, alphabetical sorting is defined by, well, the alphabet. This is an issue about numbers, not letters.

Anyway, he says he tried and gave up on a succinct C# version. He suggests that it will take 40+ lines of code. I believe that’s misleading, because as far as I can tell, the Python versions are only able to be so succinct because Python already appears to know how to sort an array. Both examples he shows rely on this. In .NET, collections aren’t intrinsically sortable. Let’s sort that:

/// <summary>
/// Compares two sequences.
/// </summary>
/// <typeparam name="T">Type of item in the sequences.</typeparam>
/// <remarks>
/// Compares elements from the two input sequences in turn. If we
/// run out of list before finding unequal elements, then the shorter
/// list is deemed to be the lesser list.
/// </remarks>
public class EnumerableComparer<T> : IComparer<IEnumerable<T>>
{
    /// <summary>
    /// Create a sequence comparer using the default comparer for T.
    /// </summary>
    public EnumerableComparer()
    {
        comp = Comparer<T>.Default;
    }

    /// <summary>
    /// Create a sequence comparer, using the specified item comparer
    /// for T.
    /// </summary>
    /// <param name="comparer">Comparer for comparing each pair of
    /// items from the sequences.</param>
    public EnumerableComparer(IComparer<T> comparer)
    {
        comp = comparer;
    }

    /// <summary>
    /// Object used for comparing each element.
    /// </summary>
    private IComparer<T> comp;


    /// <summary>
    /// Compare two sequences of T.
    /// </summary>
    /// <param name="x">First sequence.</param>
    /// <param name="y">Second sequence.</param>
    public int Compare(IEnumerable<T> x, IEnumerable<T> y)
    {
        using (IEnumerator<T> leftIt = x.GetEnumerator())
        using (IEnumerator<T> rightIt = y.GetEnumerator())
        {
            while (true)
            {
                bool left = leftIt.MoveNext();
                bool right = rightIt.MoveNext();

                if (!(left || right)) return 0;

                if (!left) return -1;
                if (!right) return 1;

                int itemResult = comp.Compare(leftIt.Current, rightIt.Current);
                if (itemResult != 0) return itemResult;
            }
        }
    }
}

(Note: I offer the code samples on this page under the MIT license.)

So yes, I need a lot of code. However, that’s a utility class that is applicable to a wide range of scenarios, not just this one. It’s slightly irritating that it’s not already built into the .NET framework. Heck, maybe it is, and I’ve just been looking in the wrong place.

Given easy way to compare two sequences, a C# 3.0 natural sort becomes roughly as trivial as the Python examples in Jeff’s blog:

string[] testItems = { "z24", "z2", "z15", "z1",
                       "z3", "z20", "z5", "z11",
                       "z 21", "z22" };

Func<string, object> convert = str =>
{   try { return int.Parse(str); }
    catch { return str; } };
var sorted = testItems.OrderBy(
    str => Regex.Split(str.Replace(" ", ""), "([0-9]+)").Select(convert),
    new EnumerableComparer<object>());

It’s probably not meaningful to count lines of code. This being C#, I could have put it all on one line. As it is, I split it across more lines than I normally would, to avoid an annoying HTML layout issue. (I put my code samples in PRE blocks to get the formatting right, PRE blocks and long lines are a bad combination.) But I think it’s fair to say that any differences in size are due merely to syntactic differences between Python and C#. Structurally, there’s no substantial difference – I’ve been able to apply exactly the same techniques the Python examples used in C#.

If I print out the results using this code:

foreach (string s in sorted)
{
    Console.WriteLine(s);
}

It prints out the test items in this order:

z1
z2
z3
z5
z11
z15
z20
z 21
z22
z24

I.e., ascending numeric order, rather than what you’d get with most string ordering.

[Updated 21st December 2007: Charles Petzold didn’t like the original version, which treated spaces as significant for sorting. So I’ve updated the example to ignore spaces, as the position of “z 21” in the output above shows. I simply added a call to Replace(" ", "") on the string before passing it into Regex.Split.]

Copyright © 2002-2013, Interact Software Ltd. Content by Ian Griffiths. Please direct all Web site inquiries to webmaster@interact-sw.co.uk