Friday, 05 September 2008
While writing our new messaging framework (GMX) for Genome v4, I ran into an interesting problem with LINQ. My colleague Sztupi ran into the same problem at almost the same time, so I thought it would make sense to write about it.

Before describing the problem, let me summarize some not-so-well-known facts about LINQ. If you are experienced with LINQ and the expression trees it uses, you can skip this part and continue from the sentence “So much for the LINQ overview” to read about the problem I ran into.

When you write a query, such as

from c in customers
where c.City == "London"
select c

the C# compiler compiles it into a method call like this:

customers.Where(c => c.City == "London")

You can also write the Where() call directly; you don’t have to use the "from…" syntax. The parameter of the Where() call is a special construct called a lambda expression, which is very similar to an anonymous method. In fact, sometimes it is an anonymous method.
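To make the equivalence concrete, here is a minimal sketch that runs both forms against an in-memory list. The Customer class and the sample data are assumptions made up for this illustration; they are not part of Genome.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxDemo
{
    // Hypothetical entity, used only for this illustration.
    public class Customer
    {
        public string Name { get; set; }
        public string City { get; set; }
    }

    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Name = "Alice", City = "London" },
            new Customer { Name = "Bob", City = "Vienna" },
            new Customer { Name = "Carol", City = "London" }
        };
    }

    // Query-expression form ("from ... where ... select").
    public static List<string> WithQuerySyntax(IEnumerable<Customer> customers)
    {
        var query = from c in customers
                    where c.City == "London"
                    select c;
        return query.Select(c => c.Name).ToList();
    }

    // Direct method-call form; the compiler translates the query above into this Where() call.
    public static List<string> WithMethodSyntax(IEnumerable<Customer> customers)
    {
        return customers.Where(c => c.City == "London")
                        .Select(c => c.Name)
                        .ToList();
    }
}
```

Both methods return the same names in the same order, because the first is only a different spelling of the second.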

Now the question is what you want to do with this lambda expression. If you want to filter customers that are already loaded into memory, you want an anonymous method compiled from the lambda. However, if the customers reside in a database or in an XML file, you never want to evaluate the lambda as a .NET method call; you want to transform it to SQL or XPath and let the underlying engine execute it. In this case, an anonymous method is not a good option, as it would be very hard to find out from the compiled CLR code that the method wanted to compare the City field to "London".

And here comes the big trick of LINQ. The C# compiler decides at compile time whether to compile the lambda expression to an anonymous method or to expression tree initialization code. If it compiles it to expression tree initialization code, then at runtime a new expression tree is created whenever this Where() method is called, and this expression tree represents the lambda expression you wrote. O/RM engines like Genome can take this expression tree and transform it to SQL.
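As a rough illustration of what "expression tree initialization code" means, the sketch below builds the tree for c => c.City == "London" by hand with the System.Linq.Expressions factory methods and, for comparison, lets the compiler build it. The Customer class is a stand-in assumed for this example, and the hand-written version only approximates what the compiler actually emits.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Hypothetical entity, used only for this illustration.
    public class Customer
    {
        public string City { get; set; }
    }

    // Builds the tree for: c => c.City == "London" node by node.
    public static Expression<Func<Customer, bool>> BuildByHand()
    {
        ParameterExpression c = Expression.Parameter(typeof(Customer), "c");
        MemberExpression city = Expression.Property(c, "City");
        ConstantExpression london = Expression.Constant("London");
        BinaryExpression comparison = Expression.Equal(city, london);
        return Expression.Lambda<Func<Customer, bool>>(comparison, c);
    }

    // The same lambda, but the compiler generates the construction code
    // because the return type is Expression<T>.
    public static Expression<Func<Customer, bool>> BuildByCompiler()
    {
        return c => c.City == "London";
    }
}
```

A query provider can walk such a tree (an Equal node with a member access on the left and a constant on the right) and emit the matching SQL condition, which is not feasible with compiled IL.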

The only remaining question is how the C# compiler decides whether to compile the lambda to an anonymous method or to expression tree initialization. The decision is made by analyzing the parameter types of the Where() method you are about to call. If the Where() method takes a delegate as a parameter, the lambda compiles to an anonymous method, and if it takes an Expression<T> parameter, it compiles to expression tree initialization.
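The following sketch shows this rule with the standard library only: the same lambda text becomes a delegate or a tree depending on the target type, and a Where() call on an IQueryable<T> source binds to the overload that takes Expression<T>. The Customer class and the method names are assumptions for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class LambdaTargetDemo
{
    // Hypothetical entity, used only for this illustration.
    public class Customer
    {
        public string City { get; set; }
    }

    // Target type is a delegate: the lambda is compiled to an anonymous method.
    public static Func<Customer, bool> AsDelegate()
    {
        return c => c.City == "London";
    }

    // Target type is Expression<T>: the lambda is compiled to tree-building code.
    public static Expression<Func<Customer, bool>> AsTree()
    {
        return c => c.City == "London";
    }

    // On IQueryable<T>, Where() resolves to Queryable.Where, which takes an
    // Expression<Func<T, bool>> and records the call in the query's own tree.
    public static Expression QueryableWhereExpression()
    {
        IQueryable<Customer> source = new List<Customer>().AsQueryable();
        return source.Where(c => c.City == "London").Expression;
    }
}
```

The delegate returned by AsDelegate() can only be invoked, while the tree returned by AsTree() can be inspected, for example through its Parameters and Body properties.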

It is good to know that the LambdaExpression class has a Compile() method that can be used to compile the expression tree to a delegate. There is no transformation in the other direction, however, so you cannot get an expression tree from a delegate.
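A minimal sketch of the Compile() direction, using a string predicate so that no extra types are needed (the class and method names are made up for this example):

```csharp
using System;
using System.Linq.Expressions;

public static class CompileDemo
{
    public static bool IsLondon(string city)
    {
        // The compiler builds an expression tree because of the target type.
        Expression<Func<string, bool>> tree = s => s == "London";

        // Compile() turns the tree into an executable delegate at runtime.
        Func<string, bool> compiled = tree.Compile();

        // The reverse assignment is rejected by the compiler:
        // Expression<Func<string, bool>> back = compiled;

        return compiled(city);
    }
}
```

Compiling a tree has a noticeable runtime cost, so code that needs the delegate repeatedly would usually keep the compiled delegate around instead of calling Compile() each time.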

Genome | Linq
Friday, 05 September 2008 16:16:03 (W. Europe Daylight Time, UTC+02:00)

Tuesday, 05 August 2008

We are frequently asked about Genome’s future in the light of Microsoft’s upcoming .NET 3.5 SP1 release, which includes the Entity Framework and related technologies such as LINQ and ADO.NET Data Services (see also the beta release announcement on Scott Guthrie's blog, which gives a broad overview of the new features).

LINQ

LINQ (already released with .NET 3.5) provides query language capabilities for C# and VB.NET. Many new Microsoft technologies and products by other vendors rely on LINQ. It is crucial to integrate with it in order to stay connected with other technology trends. The distinction between LINQ and LINQ2SQL needs to be emphasised, as many users confuse the two.
Genome has been fully integrated with LINQ since November 2007 (although we released several preview integration versions from 2006 on). In fact, Genome was the first third-party O/RM to provide LINQ integration. Developers who use Genome are thus not locked out of technology trends related to LINQ.

Astoria

Astoria is the code name for Microsoft ADO.NET Data Services. It provides a REST interface for any data source that supports the interfaces IQueryable (introduced with LINQ) and IUpdatable (introduced with Astoria). It is not an O/RM, but rather a messaging layer over O/RMs or other data sources.
Astoria’s current release focuses on integrating with Entity Framework, but it appears that its extensibility is still unstable when it comes to other frameworks. Astoria is a great concept, but we doubt anyone is currently using it in production.
We are confident that Genome will support Astoria in the near future (before the end of this year), when integration possibilities have matured and the integration issues on Astoria’s side have been resolved. As with LINQ, developers who use Genome are not hindered from using this technology.

Entity Framework (EF)

Entity Framework actually consists of three major modules:

  • Entity Data Model (EDM): this is an abstraction of a relational model that introduces higher level concepts such as inheritance, composition and associations. Any database ER model can be mapped to an EDM. It also provides a vendor-neutral dialect of SQL (Entity SQL, or eSQL). Developers can map their databases to EDM and formulate queries against them in eSQL. EDM exposes “entities”, which are not CLR classes but rather structured data rows with metadata attached.
  • Provider Model: this is an extensibility point of Entity Framework for database vendors, to allow them to adapt eSQL and the EDM (data types, etc.) to vendor-specific database models (vendor SQL and database type systems).
  • LINQ To Entities: this is an object-relational mapping tool that allows CLR class models to be mapped to an EDM. In other words, it maps CLR classes to EF entities.

Genome overlaps with LINQ To Entities to a certain degree. Entity Framework itself is much more than an O/RM, as it represents the next level of abstraction for data access on the .NET platform (hence its original name, ADO.vNext). If Entity Framework proves to be useful and is widely adopted by our target customers, we can imagine integrating Genome with Entity Framework by replacing LINQ to Entities and allowing CLR business models to be mapped to EDMs with Genome. This would help our customers benefit from the Genome O/RM API and utilise EDM for other applications such as reporting.

Our main concerns about Entity Framework and Genome’s value proposition:

Technical Overkill

There is the potential that the proposed development model and abstraction required by Entity Framework is overkill for certain applications (e.g. there are three models and all mappings between them need to be managed).

The tools provided by Entity Framework depend heavily on visual designers integrated in Visual Studio to manage the various mapping models and generate code from the models. This is especially the case with large and complex projects that involve large and complex models, which is what Entity Framework seems to target. We strongly doubt that relying on visual designers to that extent is a good approach. For example, resolving a merge conflict in the model (as can easily occur in projects with large teams) is not possible with a graphical designer, thus forcing developers to edit models manually.

Version 1 issues

Of course any first version of a product will have some immaturity issues which people usually have to more or less work around. However, since Entity Framework provides a radical and very complex new concept for abstracting data access, the functional completeness of Version 1 is very low compared to what the concept itself covers. The danger of encountering issues that are difficult or impossible to resolve is quite high in Version 1. This can be a particular problem in large enterprise projects, which is of course what Entity Framework appears to target.

The bottom line

The funny thing is that while LINQ2SQL is too simple for many applications, Entity Framework seems to be far too complex for many of our cases.

We are going to continue polishing Genome into an O/RM that we think is sophisticated enough to serve complex enterprise projects while also remaining simple enough to not force over-engineering. We are just about to release Genome V4. Working on O/RM for .NET since 2002 has given us quite a lot of confidence in our approach: we balance flexibility and simplicity. We ensure that our customers are not locked out of technology trends on the .NET platform, so we will continue to integrate Genome with new technology concepts introduced by Microsoft in this field. We hope that our position as the first third-party O/RM to integrate with LINQ has already proven our commitment to this strategy.

Tuesday, 05 August 2008 17:18:05 (W. Europe Daylight Time, UTC+02:00)

Tuesday, 05 February 2008

No, this article does not nag about some code I've seen that misuses new features. This is how I did it - on purpose.

I've always disliked the way I usually set up data in the database for testing: recreate the database, create the domain objects, set all the necessary properties, commit the context. Take this code for example:

DataDomainSchema schema = DataDomainSchema.LoadFrom("SomeMappingFile");
schema.CreateDbSchema(connStr);

DataDomain dd = new DataDomain(schema, connStr);

using (Context.Push(ShortRunningTransactionContext.Create()))
{
  Customer tt = dd.New<Customer>();
  tt.Name = "TechTalk";

  RootProject tt_hk = dd.New<RootProject>();
  tt_hk.Name = "Housekeeping";

  ChildProject tt_hk_hol = dd.New<ChildProject>();
  tt_hk_hol.Name = "Holiday";
  tt_hk.ChildProjects.Add(tt_hk_hol);

  ChildProject tt_hk_ill = dd.New<ChildProject>();
  tt_hk_ill.Name = "Illness";

  tt_hk.ChildProjects.Add(tt_hk_ill);

  tt.RootProjects.Add(tt_hk);

  RootProject tt_g = dd.New<RootProject>();
  tt_g.Name = "Genome";

  ChildProject tt_g_dev = dd.New<ChildProject>();
  tt_g_dev.Name = "Development";
  tt_g.ChildProjects.Add(tt_g_dev);

  ChildProject tt_g_mnt = dd.New<ChildProject>();
  tt_g_mnt.Name = "Maintenance";
  tt_g.ChildProjects.Add(tt_g_mnt);
  tt.RootProjects.Add(tt_g);

  Context.CommitCurrent();
}

What I dislike about this is the 'setting all the necessary properties' part. Part of it is that it's hard to follow the hierarchy of the objects.

The other is that I'm lazy.

Even if I'm typing with considerable speed - and keep pressing ctrl+(alt)+space and let ReSharper do the rest - I still hate it for its repetitiveness. I always wanted to have something like ActiveRecord's Fixtures in Rails - but I never had the time to implement it. Yeah, typical excuse, and that's how we usually lose development time even in the short run, so I know I'll have to do it the next time I need to create test data.

Sure, I could always create builder methods for every type I need to handle, passing in the property values, collections etc., but even creating those is yet another repetitious task. I always longed for some more 'elegant' write-once-use-everywhere kind of framework. So when I read this post, I thought, maybe I can get away with writing a simple, but usable enough, initializer helper extension. Here's the resulting initializing code:

...

using (Context.Push(ShortRunningTransactionContext.Create()))
{
  dd.Init<Customer>().As(
     Name => "TechTalk",
     RootProjects => new Project[] {
       dd.Init<RootProject>().As(
         Name => "Housekeeping", 
         ChildProjects => new Project[] {
           dd.Init<ChildProject>().As(Name => "Holiday"),
           dd.Init<ChildProject>().As(Name => "Illness")
         }),
       dd.Init<RootProject>().As(
         Name => "Genome", 
         ChildProjects => new Project[] {
           dd.Init<ChildProject>().As(Name => "Development"),
           dd.Init<ChildProject>().As(Name => "Maintenance")
         })
       });

  Context.CommitCurrent();
}

Prettier to the eye - but unfortunately, it's still not practical enough. For one thing, it’s easy to represent a tree this way, but it still doesn't offer a solution for many-to-many relations. That's a lesser concern though, and I have ideas for overcoming this (but haven’t done it so far due to lack of time, again). A greater problem is that it's not type safe: the parameter names of the lambdas (Name, RootProjects, ChildProjects) are just that - names, aliases; they are not checked at compile time. Even as a dynamically typed language advocate, I don't like too much dynamic behavior in statically typed languages - that usually results in little gain, if any, while losing their advantages, even 'developer-side' ones like refactoring or IntelliSense support.
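The type-safety gap can be shown in isolation. The sketch below looks up a property by the lambda's parameter name, the same way the As() method does, and demonstrates that a misspelled name compiles without complaint and only shows up at runtime as a failed lookup. The Customer class and the Resolve helper are assumptions made up for this illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class NameLambdaDemo
{
    // Hypothetical entity, used only for this illustration.
    public class Customer
    {
        public string Name { get; set; }
    }

    // Finds the property whose name equals the lambda's parameter name,
    // or returns null if there is no such property.
    public static PropertyInfo Resolve<T>(Expression<Func<string, object>> expression)
    {
        string key = expression.Parameters[0].Name;
        return typeof(T).GetProperty(key,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
    }

    // "Name" matches Customer.Name, so the lookup succeeds.
    public static bool CorrectNameResolves()
    {
        return Resolve<Customer>(Name => "TechTalk") != null;
    }

    // "Nmae" is a typo, yet the compiler accepts it: a parameter name is just a name.
    public static bool TypoStillCompiles()
    {
        return Resolve<Customer>(Nmae => "TechTalk") == null;
    }
}
```

A rename refactoring of Customer.Name would not touch these lambdas either, which is exactly the loss of tool support described above.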

So, no conclusions there - I don't know which way I prefer yet. It seems that I really will have to go on and write some XML-file-based initialization library (which will share some of the above-mentioned problems of the non-static languages, of course, but renaming by hand in the config the properties you just modified in the code at least feels a bit more normal).

Still, if you're interested, here's the extension for doing the job:

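using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
// (plus the Genome namespace that defines DataDomain)

// Adds Init<T>() to DataDomain: creates a new T and wraps it for initialization.
public static class DataDomainInitializerExtension
{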
  public static DataDomainInitializer<T> Init<T>(
      this DataDomain dd, params object[] parameters)
  {
    return new DataDomainInitializer<T>(dd.New<T>(parameters));
  }
}

public class DataDomainInitializer<T>
{
  private readonly T target;
  public DataDomainInitializer(T obj)
  {
    this.target = obj;
  }

  public T As(params Expression<Func<string, object>>[] expressions)
  {
    foreach (Expression<Func<string, object>> expression in expressions)
    {
      object value = GetValue(expression.Body);
      string key = expression.Parameters[0].Name;

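      PropertyInfo property = typeof(T).GetProperty(key, 
        BindingFlags.Instance
        |BindingFlags.Public
        |BindingFlags.NonPublic);
      // The lambda parameter name is only a string, so fail clearly on a typo.
      if (property == null)
        throw new ArgumentException(
          "No property '" + key + "' on " + typeof(T).Name);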

      Type collectionType = GetCollectionType(property.PropertyType);
      if (collectionType != null)
      {
        CopyCollection(property, collectionType, value);
      }
      else
      {
        property.SetValue(target, value, null);
      }
    }
    return target;
  }

  private void CopyCollection(
      PropertyInfo property, Type collectionType, object collection)
  {
    object targetProperty = property.GetValue(target, null);

    MethodInfo addMethod = collectionType.GetMethod("Add");
    foreach (object enumValue in (IEnumerable)collection)
    {
      addMethod.Invoke(targetProperty, 
                       new object[] { enumValue });
    }
  }

  private static Type GetCollectionType(Type type)
  {
    foreach (Type @interface in type.GetInterfaces())
      if (@interface.IsGenericType && 
          @interface.GetGenericTypeDefinition() 
            == typeof(ICollection<>))
          return @interface;

     return null;
  }

  private static object GetValue(Expression expression)
  {
     ConstantExpression constExpr = expression as ConstantExpression;
     if (constExpr != null)
       return constExpr.Value;
     return (Expression.Lambda<Func<object>>(expression).Compile())();
  }

}

Posted by Attila.

Genome | Linq
Tuesday, 05 February 2008 13:31:19 (W. Europe Standard Time, UTC+01:00)

Tuesday, 18 September 2007

While documenting/testing Genome 3.3 I stumbled upon this strange behaviour, which seems to be a bug in the C# 3.0 beta 2 compiler.

I was trying to compile the following GROUP BY example with Genome:

var ordersPerCountryPerYear1 = from o in Helper.DB.Extent()
                               group o by new { o.Customer.Address.Country, o.OrderDate.Value.Year } into g
                               select new
                               {
                                 Country = g.Key.Country,
                                 Year = g.Key.Year,
                                 OrderCount = g.Count()
                               };

And received the following error from the compiler:

error CS1061: 'System.Linq.IGrouping' does not contain a definition for 'Count' and no extension method 'Count' accepting a first argument of type 'System.Linq.IGrouping' could be found (are you missing a using directive or an assembly reference?)

However, my team insisted that the extension method Count() is provided by Genome. To find out why the compiler did not find it, they asked me to call it directly in Main():

TechTalk.Genome.Extensions.Linq.InternalIGroupingExtensions.Count(null);

After inserting this call in my code, the program suddenly compiled (including the statement the C# compiler had complained about previously).

We think this is a bug in the compiler. As a workaround I now have the following method in one class in my project to satisfy the compiler :-) :

static void ThisIsNeverCalled()
{
  TechTalk.Genome.Extensions.Linq.InternalIGroupingExtensions.Count(null);
}

Posted by Chris

Tuesday, 18 September 2007 17:24:01 (W. Europe Daylight Time, UTC+02:00)

Friday, 05 January 2007

In recent months there has been a lot of confusion in the community about what LINQ is and what it is not. If you discuss this topic with others and read through the blogs, you will find a lot of different perspectives and opinions on LINQ.

Most of the questions and misconceptions about LINQ I have encountered come from mixing up LINQ with an O/RM system and not understanding the impact of LINQ on .NET-based O/RMs.

This is a brief summary of LINQ and how it relates to O/RMs, using Genome as a concrete example.

Genome | Linq
Friday, 05 January 2007 18:55:23 (W. Europe Standard Time, UTC+01:00)