IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)

Blog Navigation

July (2014)

(5 items)

April (2014)

(1 item)

March (2014)

(1 item)

January (2014)

(2 items)

November (2013)

(2 items)

July (2013)

(4 items)

April (2013)

(1 item)

February (2013)

(6 items)

September (2011)

(2 items)

November (2010)

(4 items)

September (2010)

(1 item)

August (2010)

(4 items)

July (2010)

(2 items)

September (2009)

(1 item)

June (2009)

(1 item)

April (2009)

(1 item)

November (2008)

(1 item)

October (2008)

(1 item)

September (2008)

(1 item)

July (2008)

(1 item)

June (2008)

(1 item)

May (2008)

(2 items)

April (2008)

(2 items)

March (2008)

(5 items)

January (2008)

(3 items)

December (2007)

(1 item)

November (2007)

(1 item)

October (2007)

(1 item)

September (2007)

(3 items)

August (2007)

(1 item)

July (2007)

(1 item)

June (2007)

(2 items)

May (2007)

(8 items)

April (2007)

(2 items)

March (2007)

(7 items)

February (2007)

(2 items)

January (2007)

(2 items)

November (2006)

(1 item)

October (2006)

(2 items)

September (2006)

(1 item)

June (2006)

(2 items)

May (2006)

(4 items)

April (2006)

(1 item)

March (2006)

(5 items)

January (2006)

(1 item)

December (2005)

(3 items)

November (2005)

(2 items)

October (2005)

(2 items)

September (2005)

(8 items)

August (2005)

(7 items)

June (2005)

(3 items)

May (2005)

(7 items)

April (2005)

(6 items)

March (2005)

(1 item)

February (2005)

(2 items)

January (2005)

(5 items)

December (2004)

(5 items)

November (2004)

(7 items)

October (2004)

(3 items)

September (2004)

(7 items)

August (2004)

(16 items)

July (2004)

(10 items)

June (2004)

(27 items)

May (2004)

(15 items)

April (2004)

(15 items)

March (2004)

(13 items)

February (2004)

(16 items)

January (2004)

(15 items)

Blog Home

RSS 2.0

Writing

Programming C# 5.0

Programming WPF

Other Sites

Interact Software

LINQ and N-ary Cartesian Products

Wednesday 28 July, 2010, 07:21 PM

I recently found myself needing to solve a problem that involved producing an n-ary Cartesian product. I ended up with a fun solution in which I used LINQ to generate a LINQ query. I thought I’d share it. Being me, I spent quite a while writing a blog about this, tinkering with the details, and leaving it for a few days so I could re-read the text with a fresher point of view. And then I got distracted. And then it got so large that I split it up.

The problem with sitting on a draft of a blog post for that long is that there’s a good chance that Eric Lippert will write a post on exactly the same topic in between me starting to write the post and actually publishing it.

Ah well. I ended up with a slightly different solution from Eric. Also, the main reason I was delaying this was that I wanted to get a couple of examples from other languages together. So I hope this will be interesting even to people who have already read Eric’s post.

This ended up being rather longer than I had anticipated, so I’m breaking it up into a series.

Part 1 – Simple but Too Specialized

Before getting to the general n-ary case, let’s start with a simple case: the Cartesian product of two sets, or a cross join, as it’s known in databases. If you’re not familiar with this operation, it just produces every possible pairing between the items from two sets. This is easy in C# thanks to LINQ:

char[] letters = { 'A', 'B', 'C' };
int[] numbers = { 1, 2, 3, 4 };

var cartesianProduct = from letter in letters
                       from number in numbers
                       select new { letter, number };

foreach (var item in cartesianProduct)
{
    Console.WriteLine(item);
}

I start with two lists—the letters A, B, and C, and the numbers 1-4—and the output shows all the possible pairs:

{ letter = A, number = 1 }
{ letter = A, number = 2 }
{ letter = A, number = 3 }
{ letter = A, number = 4 }
{ letter = B, number = 1 }
{ letter = B, number = 2 }
{ letter = B, number = 3 }
{ letter = B, number = 4 }
{ letter = C, number = 1 }
{ letter = C, number = 2 }
{ letter = C, number = 3 }
{ letter = C, number = 4 }

That’s fine as far as it goes, but I needed to deal with more than two lists. LINQ can handle this too—you can have as many from clauses as you like:

char[] letters = { 'A', 'B', 'C' };
int[] numbers = { 1, 2, 3, 4 };
string[] colours = { "Red", "Blue" };

var cartesianProduct = from letter in letters
                       from number in numbers
                       from colour in colours
                       select new { letter, number, colour };

This produces the following output:

{ letter = A, number = 1, colour = Red }
{ letter = A, number = 1, colour = Blue }
{ letter = A, number = 2, colour = Red }
{ letter = A, number = 2, colour = Blue }
{ letter = A, number = 3, colour = Red }
{ letter = A, number = 3, colour = Blue }
{ letter = A, number = 4, colour = Red }
{ letter = A, number = 4, colour = Blue }
{ letter = B, number = 1, colour = Red }
{ letter = B, number = 1, colour = Blue }
{ letter = B, number = 2, colour = Red }
{ letter = B, number = 2, colour = Blue }
{ letter = B, number = 3, colour = Red }
{ letter = B, number = 3, colour = Blue }
{ letter = B, number = 4, colour = Red }
{ letter = B, number = 4, colour = Blue }
{ letter = C, number = 1, colour = Red }
{ letter = C, number = 1, colour = Blue }
{ letter = C, number = 2, colour = Red }
{ letter = C, number = 2, colour = Blue }
{ letter = C, number = 3, colour = Red }
{ letter = C, number = 3, colour = Blue }
{ letter = C, number = 4, colour = Red }
{ letter = C, number = 4, colour = Blue }

That’s a 3-way Cartesian product. You can keep adding as many from clauses as you like, because LINQ supports n-ary Cartesian products. Problem solved? Not quite. This approach is perfect if you know how many sets you’re combining at compile time—in these examples, I knew what the ‘n’ in n-ary was when writing the code. (2, and 3, respectively.)

But in the problem I’m trying to solve, I didn’t have that information at compile time. I need to produce the n-ary Cartesian product across a bunch of sets where I don’t know how many sets there would be until runtime.

But this doesn’t stop me from using LINQ. A LINQ query is just an object built up by a series of method calls—the query expression syntax I used in the last couple of examples disguises this, but explicit method calls offer a bit more power. LINQ queries with multiple from clauses use the SelectMany operator beneath the covers, so that last query is equivalent to this:

var lettersAndNumbersCartesianProduct = letters.SelectMany(
    letter => numbers.Select(number => new { letter, number }));

var cartesianProduct = lettersAndNumbersCartesianProduct.SelectMany(
    (letterAndNumber) => colours.Select(colour =>
        new { letterAndNumber.letter, letterAndNumber.number, colour }));

More generally, if you have N from clauses you will have N-1 calls to SelectMany. (You’ll notice that I’ve now got two anonymous types—the one used to contain the pairs produced by the first SelectMany is fed into the second SelectMany that produces the (letter, number, colour) ‘triples’ that make up my final output. One of the advantages of using the C# query expression syntax is that the C# compiler hides all of this sort of stuff for you. If you inspect the code C# produces for the query expression version, using ILDASM or Reflector, you’ll see it’s doing something similar under the covers.)

I split the last example into two statements to highlight a useful fact: if I need a 3-way Cartesian product, I can start with a 2-way Cartesian product, and then just bolt an extra SelectMany on the end. This technique of taking an existing LINQ query and invoking an extra operator on it to form a more complex query is sometimes called chaining.

I can take this a step further and build a general purpose method that takes any number ‘n’ of sets and produces the n-ary Cartesian product of those sets by repeatedly chaining SelectMany calls. However, I’d need to come up with a slightly more adaptable representation. As it stands, with each set I add to the product, the resulting set’s members grow more complex. The Cartesian product of numbers and letters produced a set where each member had both a number and a letter, e.g. { letter = 'A', number = 2 } or { letter = 'C', number = 1 }. The Cartesian product of numbers letters, and colours produced a set where each member had three items, looking like { letter = A, number = 3, colour = "Red" } or { letter = B, number = 2, colour = "Blue" }.

I’ll have to abandon anonymous types to generalize this—anonymous types are great when you know at compile time exactly what you’ll be dealing with. But they cannot adapt at runtime. So in the next blog in this series, I’ll use a different representation, so that I can write a generalized method for producing an n-ary Cartesian product.

Copyright © 2002-2013, Interact Software Ltd. Content by Ian Griffiths. Please direct all Web site inquiries to webmaster@interact-sw.co.uk