Friday, 05 September 2008
When writing our new messaging framework (GMX) for Genome v4, I ran into an interesting problem with LINQ. My colleague Sztupi ran into the same problem at almost the same time, so I thought it would make sense to write about it.

Before describing the problem, let me summarize some not-so-well-known facts about LINQ. If you are experienced with LINQ and the expression trees it uses, you can skip this part and continue from the sentence “So much for the LINQ overview” to read about the problem I ran into.

When you write a query, such as

from c in customers
where c.City == "London"
select c
the C# compiler compiles it into a method call like this:

customers.Where(c => c.City == "London")

You can also write the Where() call directly; you don’t have to use the "from…" syntax. The parameter of the Where() call is a special construct called a lambda expression, which is very similar to an anonymous method. In fact, sometimes it is compiled to an anonymous method.

Now the question is what you want to do with this lambda expression. If you want to filter customers that are already loaded into memory, you want an anonymous method compiled from the lambda. However, if the customers reside in a database or in an XML file, you never actually want to evaluate the lambda as a .NET method call; you want to transform it to SQL or XPath and let the underlying engine execute it. In this case, the anonymous method is not a good option, as it would be very hard to find out from the compiled CLR code that the method wanted to compare the City field to "London".

And here comes the big trick of LINQ. The C# compiler decides at compile time whether to compile the lambda expression to an anonymous method or to expression tree initialization code. If it compiles it to expression tree initialization code, then at runtime a new expression tree is created whenever this Where() method is called, and this expression tree represents the lambda expression you wrote. O/RM engines like Genome can take this expression tree and transform it to SQL.

The only remaining question is how the C# compiler decides whether to compile the lambda to an anonymous method or to expression tree initialization. This decision is made by analyzing the parameter types of the Where() method you are about to call. If the Where() method takes a delegate as a parameter, the lambda is compiled to an anonymous method; if it takes an Expression<T> parameter, the lambda is compiled to expression tree initialization.
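The difference is easiest to see when the same lambda is assigned to both kinds of target type. The following is only a minimal sketch: the Customer class, the LambdaTargets class and the Describe() method are made up for this illustration, and Func<Customer, bool> is used because that is the delegate type the standard System.Linq Where() methods expect.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical minimal entity, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public static class LambdaTargets
{
    // Target type is a delegate: the lambda becomes an ordinary compiled method.
    public static readonly Func<Customer, bool> AsDelegate =
        c => c.City == "London";

    // Target type is Expression<TDelegate>: the compiler emits code that
    // builds a tree describing the lambda, which can be inspected at runtime.
    public static readonly Expression<Func<Customer, bool>> AsTree =
        c => c.City == "London";

    // Reads the comparison back out of the tree, roughly as a query
    // translator would have to do before it can produce SQL.
    public static string Describe()
    {
        var comparison = (BinaryExpression)AsTree.Body;
        var member = (MemberExpression)comparison.Left;
        var constant = (ConstantExpression)comparison.Right;
        return member.Member.Name + " " + comparison.NodeType + " " + constant.Value;
    }
}
```

The delegate can only be invoked, while the tree exposes the member name, the operator and the constant as data.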

It is good to know that the LambdaExpression class has a Compile() method that can be used to compile the expression tree to a delegate. There is no transformation in the other direction, however, so you cannot get an expression tree from a delegate.
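A short sketch of this one-way street, again with a hypothetical Customer class and a made-up CompileDemo helper:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical minimal entity, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public static class CompileDemo
{
    public static bool IsLondoner(Customer customer)
    {
        Expression<Func<Customer, bool>> tree = c => c.City == "London";

        // Compile() turns the tree into an executable delegate.
        Func<Customer, bool> compiled = tree.Compile();

        // The opposite assignment is rejected by the C# compiler:
        // Expression<Func<Customer, bool>> back = compiled;

        return compiled(customer);
    }
}
```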

So much for the LINQ overview. Now comes my problem.

Let's suppose that I have a filter call, such as:

customers.Where(c => c.City == "London")
which is transformed to SQL by an O/RM:
… WHERE City = 'London'
Let’s say that, for some reason, I realize that my program needs to filter for the customers in London so often that I want to extract this filtering logic (maybe I want to change it to filter for the country as well, when the product starts ruining the international market). No problem. We all know how to extract logic, don’t we?

class Helper
{
  public static bool IsLocalCustomer(Customer c)
  {
    return c.City == "London";
  }
}
Done. You go on holiday, leaving behind a yellow sticky note for your colleague (it is already too late to write an e-mail, and your wife has already called twice): "Please update the c.City == "London" calls in the queries to Helper.IsLocalCustomer(c)."

What will happen? You can guess. You switch on your phone after landing, and before you drink your first cocktail on the beach, you get an SMS: the application does not work at all, and they got an error saying that Helper.IsLocalCustomer cannot be transformed to SQL.

What is the problem? The c.City == "London" expression has been moved into the IsLocalCustomer method. Since this expression is no longer a lambda, and the return type of the method is bool (not Expression<T>) anyway, it is compiled to a standard CLR method. Even though the Where() call:

customers.Where(c => Helper.IsLocalCustomer(c))
is still compiled to an expression tree, it does not help, as this expression tree will contain only a method call, and nothing about cities, equality or London. And unless you tell the O/RM how to interpret the helper method call, it will not be able to transform it to SQL. In most O/RMs, you cannot even specify how to transform a custom method call. In Genome, you can. So if you use Genome, I think you should stop here and map the helper function with a <Linq> mapping. But I wanted to find out what the intended way to solve this problem with LINQ is, so I dived into the world of LINQ expression trees.
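To see what the O/RM actually receives, one can inspect the tree built for the Where() argument. This is a minimal sketch; the Customer class and the TreeInspection class are hypothetical, and Func<Customer, bool> is assumed as the delegate type.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical minimal entity, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public static class Helper
{
    public static bool IsLocalCustomer(Customer c)
    {
        return c.City == "London";
    }
}

public static class TreeInspection
{
    // Returns the name of the method that the lambda body calls.
    public static string CalledMethodName()
    {
        Expression<Func<Customer, bool>> filter = c => Helper.IsLocalCustomer(c);

        // The whole body is a single method call node; the comparison
        // inside IsLocalCustomer is compiled code and is not part of the tree.
        var call = (MethodCallExpression)filter.Body;
        return call.Method.Name;
    }
}
```

All a translator can learn from this tree is that a method named IsLocalCustomer is called with the parameter c.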

OK, so let's come back to the original problem of extracting this logic. As method extraction does not generally work, we come up with another idea:

class Helper
{
  public static Set<Customer> FilterForLocalCustomer(Set<Customer> s)
  {
    return s.Where(c => c.City == "London");
  }
}
This works of course, and in most cases, this solution is good enough, but it does a little bit too much. Actually we wanted to extract only the comparison of the customer, and we did not want to bind it to the filtering directly. Let's say that we also wanted to express the following; the previously defined helper method does not work anymore:

customers.Where(c => c.IsVIP ? c.Country == "UK" : c.City == "London")

OK, we get to yet another idea:

class Helper
{
  public static Expression<Predicate<Customer>> GetIsLocalCustomerExpression()
  {
    return c => c.City == "London";
  }
}
Since the return type of the method is an expression, and not only Predicate<Customer> (which is equivalent to delegate bool (Customer)), the C# compiler compiles an expression tree initialization from the lambda, so you can use it for the Where():
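A minimal sketch of such a call follows. It rests on assumptions: the standard System.Linq IQueryable<T> stands in for Genome's Set<T>, whose API is not shown here, and the helper returns Expression<Func<Customer, bool>> instead of Expression<Predicate<Customer>>, because that is the type Queryable.Where() expects. The Customer and QueryDemo classes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical minimal entity, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public static class Helper
{
    // Assumption: Func<Customer, bool> replaces Predicate<Customer> so that
    // the result fits the standard Queryable.Where() overload.
    public static Expression<Func<Customer, bool>> GetIsLocalCustomerExpression()
    {
        return c => c.City == "London";
    }
}

public static class QueryDemo
{
    public static List<Customer> LocalCustomers(IQueryable<Customer> customers)
    {
        // The expression is passed as-is; no lambda is written at the call site,
        // so the query provider receives the full comparison as a tree.
        return customers.Where(Helper.GetIsLocalCustomerExpression()).ToList();
    }
}
```

For a quick try without a database, an in-memory list wrapped with AsQueryable() can be passed in.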

Unfortunately, the helper method is very ugly this way (the method signature is obscured by technical implementation details of LINQ), so you would never put it in the Customer class directly. Also, it is bound to the usage scenario: it can only be used directly for queries that are executed as SQL. Well, actually you can also apply it to an in-memory customers collection, thanks to the Compile() method mentioned above, but this is far from nice:
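A sketch of what this could look like with the helper signature from the post, assuming a plain IEnumerable<Customer> as the in-memory collection. The Customer and InMemoryDemo classes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical minimal entity, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public static class Helper
{
    public static Expression<Predicate<Customer>> GetIsLocalCustomerExpression()
    {
        return c => c.City == "London";
    }
}

public static class InMemoryDemo
{
    public static List<Customer> LocalCustomers(IEnumerable<Customer> customers)
    {
        // The tree has to be compiled to a delegate first...
        Predicate<Customer> isLocal = Helper.GetIsLocalCustomerExpression().Compile();

        // ...and wrapped in a lambda, because Enumerable.Where() expects a
        // Func<Customer, bool>, not a Predicate<Customer>.
        return customers.Where(c => isLocal(c)).ToList();
    }
}
```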

Brrr… Calling it for a concrete customer instance, it is even worse:
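Presumably the call looks something like the following sketch, where the Customer and SingleCustomerDemo classes are hypothetical:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical minimal entity, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public static class Helper
{
    public static Expression<Predicate<Customer>> GetIsLocalCustomerExpression()
    {
        return c => c.City == "London";
    }
}

public static class SingleCustomerDemo
{
    public static bool IsLocal(Customer customer)
    {
        // Get the tree, compile it to a delegate, then invoke the delegate.
        return Helper.GetIsLocalCustomerExpression().Compile()(customer);
    }
}
```

Note that this compiles the tree on every call; in real code one would probably cache the compiled delegate.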

Although this is ugly and very far from the natural decomposition that we did first, there is even an additional problem here. At least one… (I can come up with more, but I won’t mention them in this article).

Let's suppose now that I don’t have a customer set anymore, but an Order set. The Order class, as usual, has a property that returns the customer it was created for. This is a kind of FK in the database. If your boss gives you the task of filtering the orders for local customers, you will quickly face the next problem. Without decomposition, this would look like:

orders.Where(o => o.Customer.City == "London")
with the nice decomposition (which is only possible in Genome):

orders.Where(o => Helper.IsLocalCustomer(o.Customer))
and with our helper method:

orders.Where(o => ???

Hm… that won't work. Since the helper method returns an Expression<Predicate<Customer>>, it cannot be used for the Orders set, as it would need an Expression<Predicate<Order>>. Well of course we could create an alternative helper method that returns the lambda o.Customer.City == "London", called GetIsLocalCustomerOfAnOrderExpression(), but this would lead to an endless creation of London filtering methods, which you don’t want to do…

I think something like this should work:

orders.Where(o => Helper.GetIsLocalCustomerExpression()<<o.Customer>>)
As you see, I had to use a non-existent notation <<>> to express that here we would like to compile the lambda to an expression tree in which the expression tree returned by the helper method is also embedded, with the c -> o.Customer parameter substitution.

If you don’t use a lambda for this Where() call, but create the expression tree manually, you can achieve this expression embedding. It is not easy, but you can do it. Of course, half of the benefits of using LINQ, such as compile-time checking and IntelliSense, are then gone.
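One possible way to do the manual embedding is sketched below. It relies on the ExpressionVisitor base class, which is public only from .NET 4 onwards, so it is not necessarily how it would have been done at the time of writing. The ParameterReplacer and OrderFilters classes are hypothetical names, the entity classes are minimal stand-ins, and Func<Customer, bool> is assumed instead of Predicate<Customer>.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical minimal entities, only for this illustration.
public class Customer
{
    public string City { get; set; }
}

public class Order
{
    public Customer Customer { get; set; }
}

public static class Helper
{
    public static Expression<Func<Customer, bool>> GetIsLocalCustomerExpression()
    {
        return c => c.City == "London";
    }
}

// Replaces every occurrence of one parameter with another expression.
public sealed class ParameterReplacer : ExpressionVisitor
{
    private readonly ParameterExpression _from;
    private readonly Expression _to;

    public ParameterReplacer(ParameterExpression from, Expression to)
    {
        _from = from;
        _to = to;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return node == _from ? _to : base.VisitParameter(node);
    }
}

public static class OrderFilters
{
    // Builds the equivalent of o => o.Customer.City == "London" by embedding
    // the customer predicate with the c -> o.Customer substitution.
    public static Expression<Func<Order, bool>> ForLocalCustomer()
    {
        Expression<Func<Customer, bool>> inner = Helper.GetIsLocalCustomerExpression();
        ParameterExpression o = Expression.Parameter(typeof(Order), "o");
        Expression customerOfOrder = Expression.Property(o, "Customer");

        Expression body = new ParameterReplacer(inner.Parameters[0], customerOfOrder)
            .Visit(inner.Body);

        return Expression.Lambda<Func<Order, bool>>(body, o);
    }
}
```

The result can be passed to a Where() call on an orders query, but the property name "Customer" is now a string, which illustrates the lost compile-time checking.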

BTW, the FilterForLocalCustomer() solution also suffers from the same issue, and it also cannot be used for an orders collection.

I wonder if there will be such a construct in LINQ, or anything else that generally solves the query encapsulation problem. But I don’t know how it should work yet. One option would be to write a decompiler that can convert the CLR compiled methods (at least the simple ones) to an expression tree. I don’t think it would be a very complex task, as for example Reflector can decompile thousands of lines within seconds. Maybe the JIT compiler should do that, as it has to go through the method anyway. Or maybe you could mark the methods to be decompiled with an attribute, and use PostSharp, or something similar, to produce the expression-tree version of the methods as a post-build step. But these are just ideas…

Do you have a better one?

Posted by Gaspar
