Thursday, 15 March 2007

George Lawton contacted me at the end of February with some questions about O/RM for a story he was writing for theserverside.net. The story has now been published, and I think George Lawton and Jack Vaughn did a good job of providing an accurate analysis of the current state of the O/RM market for .NET.

When I received George’s email, I was quite surprised that he was asking about the alleged complaint that O/RMs deliver quick wins at the beginning of a project that you pay dearly for at a later stage. As you can read in his article, I strongly disagree with this myth, and I was pleased to see that other people quoted in the article feel the same.

George asked us the following three questions, which I found very interesting to discuss:

  1. What specific features of Genome make it simpler to use, both initially and over time, than other O/RM tools?
  2. What have been some of the major challenges in the use of O/RM tools, and what are the ways you have gone about addressing these?
  3. What specific tips do you have to offer developers in getting the most out of using O/RM tools as part of the software development process?

Intrigued by his questions, I put together quite extensive replies – replies that may be of interest to others, too. Based on my answers to George, I have put together this article to outline our thoughts on the issues above and give some advice to developers who are evaluating O/RMs.

Features that make Genome simpler to use

Database reverse engineering

Genome allows you to initiate projects by reverse engineering legacy database structures into mapping and domain models that you then continue to work with. This can save a lot of time and effort in the early stages of a project that starts out with a legacy database.

It is important to understand, however, that Genome is not intended to work up from the database to the object model in the regular course of a project. Reverse engineering is just a kick-start for projects that begin with a legacy database. Once you work with an O/RM, the object model becomes much richer than the database model. Databases simply do not allow the richness of expression that object models do, so it does not make sense to keep treating the database as the leading schema in ongoing development.
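
As a rough illustration of what such a kick-start involves, the sketch below reads a legacy schema and emits class skeletons. It is not Genome's reverse engineering tool: it assumes a SQLite database, uses Python's standard sqlite3 module, and the type mapping and naming rules (`python_type`, `class_name`) are simplified, hypothetical choices. The generated classes are only a starting point for the richer object model described above.

```python
import sqlite3


def python_type(declared: str) -> str:
    """Map a declared SQLite column type to a Python type name (simplified, hypothetical rules)."""
    t = declared.upper()
    if "INT" in t:
        return "int"
    if any(k in t for k in ("CHAR", "TEXT", "CLOB")):
        return "str"
    if any(k in t for k in ("REAL", "FLOA", "DOUB")):
        return "float"
    return "object"


def class_name(table: str) -> str:
    """Turn a table name such as 'customer_account' into 'CustomerAccount'."""
    return "".join(part.capitalize() for part in table.split("_"))


def reverse_engineer(conn: sqlite3.Connection) -> str:
    """Generate class skeletons (as source text) for every user table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    lines = []
    for table in tables:
        lines.append(f"class {class_name(table)}:")
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, declared, _notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'):
            marker = "  # primary key" if pk else ""
            lines.append(f"    {column}: {python_type(declared)}{marker}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer_account "
               "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, credit REAL)")
    print(reverse_engineer(db))
```

The output is deliberately flat: relationships, inheritance and behaviour are exactly what the developer adds afterwards, which is why the database should not remain the leading schema.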

XML-based mapping

O/RM tools use either XML-based or attribute-based mapping, and each approach has its advantages and disadvantages. Genome, like the popular .NET ports of Java O/RM tools, uses XML-based mapping.

The advantage of XML mapping over attribute-based mapping is the ease with which you can express and maintain complex mappings. Simple mappings may be easier to express with attributes, but the more advanced O/RM tools use XML because it offers a far richer set of mapping possibilities. Of course, there is no ideal solution, which is why there are two approaches to begin with.
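
To make the two approaches concrete, here is a minimal sketch that expresses the same class-to-table mapping both ways and checks that they agree. Neither the `mapped` decorator nor the XML layout is Genome's format; both are hypothetical, and Python decorators stand in for .NET attributes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


def mapped(table, **columns):
    """Hypothetical attribute-style mapping: the metadata lives on the class itself."""
    def decorate(cls):
        cls._orm_mapping = {"table": table, "columns": dict(columns)}
        return cls
    return decorate


@mapped("tbl_customer", id="cust_id", name="cust_name")
@dataclass
class Customer:
    id: int
    name: str


# The same mapping kept outside the class in an XML document (hypothetical format).
CUSTOMER_XML = """
<mapping>
  <class name="Customer" table="tbl_customer">
    <property name="id" column="cust_id"/>
    <property name="name" column="cust_name"/>
  </class>
</mapping>
"""


def load_xml_mapping(xml_text):
    """Parse an XML mapping into {class name: {"table": ..., "columns": {property: column}}}."""
    result = {}
    for cls in ET.fromstring(xml_text.strip()).findall("class"):
        result[cls.get("name")] = {
            "table": cls.get("table"),
            "columns": {p.get("name"): p.get("column") for p in cls.findall("property")},
        }
    return result


if __name__ == "__main__":
    # Both approaches yield identical metadata for this simple case.
    print(load_xml_mapping(CUSTOMER_XML)["Customer"] == Customer._orm_mapping)  # True
```

For a mapping this simple the attribute version is arguably more convenient; the XML version keeps persistence details out of the domain class and leaves room for more complex constructs, at the price of keeping two places in sync.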

Handling of mapping files

Genome is a feature-rich O/RM designed to handle complex scenarios, so XML-based mapping was the obvious choice. Maintaining mappings in XML files involves a small additional overhead, because you have to navigate between source code and mapping and some information is held redundantly in both places. To counterbalance this minor overhead, Genome integrates with Visual Studio, allowing you to forward engineer from source code to mapping and to navigate conveniently between source and mapping using shortcut keys and context menus.

Many other O/RMs that use XML-based mapping do not offer this kind of convenience, making it hard and costly to maintain mapping information with them.

Processing of mapping files

Many O/RMs process mapping information at runtime, so mapping errors only surface at runtime and the processing incurs a performance penalty. Genome has a compiler that processes the mapping files during the build and therefore does not suffer from these problems.
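
Python has no build-time compiler, but the principle of catching mapping errors before the application runs can be approximated by a validation step executed as part of the build, for instance from a test. The sketch below is such an analogue, not Genome's mapping compiler; the mapping structure, class and table names are hypothetical, and the schema is read from SQLite.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Customer:
    id: int
    name: str


# Hypothetical mapping: class -> (table, {property: column}).
MAPPING = {Customer: ("tbl_customer", {"id": "cust_id", "name": "cust_name"})}


def validate_mapping(mapping, conn):
    """Check every mapped property and column; return a list of errors (empty if consistent)."""
    errors = []
    for cls, (table, columns) in mapping.items():
        properties = {f.name for f in fields(cls)}
        # PRAGMA table_info returns no rows for a table that does not exist.
        db_columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if not db_columns:
            errors.append(f"{cls.__name__}: table {table!r} does not exist")
            continue
        for prop, column in columns.items():
            if prop not in properties:
                errors.append(f"{cls.__name__}.{prop}: no such property")
            if column not in db_columns:
                errors.append(f"{cls.__name__}.{prop}: column {table}.{column} does not exist")
    return errors


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tbl_customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT)")
    problems = validate_mapping(MAPPING, db)
    print("\n".join(problems) or "mapping OK")
```

A build that fails on a non-empty error list gives the same guarantee in spirit: a typo in the mapping is reported once, at build time, instead of on the first request that touches the affected class.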

Genome OQL

Like almost all O/RMs, Genome introduces its own Object Query Language, or OQL. OQL differs from other proprietary query languages in that it

  • Checks queries at compile time, eliminating hard-to-detect runtime errors
  • Allows query decomposition and reuse (in both OQL and LINQ), which pays off when refactoring later.

These advantages also hold when comparing OQL to SQL. A slight disadvantage shared by all O/RMs is that developers must learn the proprietary query language of their O/RM instead of SQL, which is commonly known.

Genome’s OQL, however, is very similar to C# and hence easy for developers to learn. In any case, this problem will be solved in due course with the advent of LINQ, which will set a common standard for expressing queries across all O/RMs.

That leads us to another advantage of Genome:

  • Genome’s proprietary query language and LINQ are treated as equal alternatives, bringing the added value of enabling you to migrate projects to LINQ at any time without scrapping any existing code. Projects started today will not become obsolete with the arrival of LINQ.
  • For the same reason, you are not shackled to Genome’s proprietary language, because LINQ can be used to the same effect.
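
The query decomposition and reuse listed among OQL's advantages can be sketched independently of any particular O/RM. The hypothetical `Predicate` type below (not Genome's OQL and not LINQ) packages a condition together with its parameters, so that fragments defined once can be combined into larger queries; the table and column names are assumptions. Compile-time checking is not shown, since Python checks nothing of this kind before runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A reusable query fragment: SQL condition text plus its bound parameters."""
    sql: str
    params: tuple = ()

    def __and__(self, other):
        return Predicate(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other):
        return Predicate(f"({self.sql} OR {other.sql})", self.params + other.params)


# Hypothetical building blocks, defined once and reused across queries.
def is_active():
    return Predicate("active = ?", (1,))


def in_city(city):
    return Predicate("city = ?", (city,))


def min_orders(count):
    return Predicate("order_count >= ?", (count,))


def select_customers(where):
    """Compose a complete query from any predicate; returns (sql, params)."""
    return f"SELECT id, name FROM customer WHERE {where.sql}", where.params


if __name__ == "__main__":
    # A new query assembled from existing fragments instead of written from scratch.
    sql, params = select_customers(is_active() & (in_city("Vienna") | min_orders(10)))
    print(sql)
    print(params)
```

Because each fragment is an ordinary value, changing a business rule such as "active customer" touches one function rather than every query that uses it, which is where the refactoring benefit comes from.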

Why not use SQL?

SQL offers none of the features mentioned above. It is perfectly adequate for working with tables and fields, but it is not suited to expressing queries against object models.

Genome does, however, let you seamlessly integrate custom SQL code into your domain model where necessary.

Challenges of using O/RM

Expression power of query languages

If a tool provides only a limited query language (e.g. only query by example), you cannot leverage the full power of the database platform. A fully featured O/RM does not limit the kind of query logic you can run against the database: you can do everything in OQL that you can do in SQL.
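
To see why query by example is limiting, consider a minimal, hypothetical QBE builder: every filled-in field of the example becomes an equality test, and all tests are joined with AND. Ranges and OR conditions cannot be expressed that way, whereas a full query language (or SQL) states them directly. The table, columns and data are assumptions; the sketch runs both queries against an in-memory SQLite database.

```python
import sqlite3


def query_by_example(table, example):
    """Build a query from a sample record: each non-None field becomes 'column = ?'."""
    conditions = [(column, value) for column, value in example.items() if value is not None]
    where = " AND ".join(f"{column} = ?" for column, _ in conditions) or "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", tuple(value for _, value in conditions)


# Expressible by example: open orders from Vienna (only equality, only AND).
QBE_SQL, QBE_PARAMS = query_by_example("orders", {"city": "Vienna", "status": "open", "total": None})

# Not expressible by example: a range test combined with an OR.
FULL_SQL = "SELECT * FROM orders WHERE total > ? AND (city = ? OR city = ?)"
FULL_PARAMS = (100, "Vienna", "Graz")

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, city TEXT, status TEXT, total REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                   [(1, "Vienna", "open", 50), (2, "Graz", "open", 250), (3, "Linz", "open", 300)])
    print(db.execute(QBE_SQL, QBE_PARAMS).fetchall())
    print(db.execute(FULL_SQL, FULL_PARAMS).fetchall())
```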

Performance

That old chestnut: time and again, O/RM is accused of degrading performance significantly. Conclusive performance comparisons are very difficult to provide. What we have done, though, is re-implement MS PetShop with Genome to demonstrate that performance need not suffer when O/RM is used (we will publish more about this soon):

  • Our implementation issues the same kinds of queries as the original
  • Genome infrastructure adds less than 10% overhead to the manually tuned layer of the MS implementation
  • Genome’s approach requires *less than half* the lines of code
  • Both solutions scale equally, which is much more important than the 10% performance difference.

Our everyday experience here at TechTalk also delivers evidence that using Genome does not adversely affect performance: we have numerous large projects (large public websites and projects involving development efforts of up to 75,000 person-days) where we haven’t run into any performance problems.

Integrating custom SQL code

Sometimes you have to perform tasks on a database for which an O/RM is simply no good, such as deleting or updating large sets of data in a batch where no middle-tier processing is needed in between. In these cases, it is important that you are not trapped in the O/RM but can reach out to the database directly in a way that is still integrated with the O/RM.
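
The difference can be sketched with Python's sqlite3 module. The first function mimics the object-by-object path an O/RM would take; the second bypasses it with a single set-based statement. Running the bulk statement in a transaction on the same connection is one way of keeping it integrated with the rest of the persistence layer. The table, columns and function names are hypothetical, and this is not Genome's API for custom SQL.

```python
import sqlite3


def archive_row_by_row(conn, cutoff):
    """O/RM-style: load each matching row, change it in the middle tier, write it back."""
    with conn:  # one transaction: commit on success, roll back on error
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM orders WHERE placed < ? AND archived = 0", (cutoff,))]
        for order_id in ids:
            conn.execute("UPDATE orders SET archived = 1 WHERE id = ?", (order_id,))
    return len(ids)


def archive_in_bulk(conn, cutoff):
    """Direct SQL: one set-based UPDATE; no rows travel to the middle tier."""
    with conn:
        cursor = conn.execute(
            "UPDATE orders SET archived = 1 WHERE placed < ? AND archived = 0", (cutoff,))
    return cursor.rowcount


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders "
               "(id INTEGER PRIMARY KEY, placed TEXT, archived INTEGER DEFAULT 0)")
    db.executemany("INSERT INTO orders (placed) VALUES (?)",
                   [("2006-01-10",), ("2006-11-30",), ("2007-02-01",)])
    print(archive_in_bulk(db, "2007-01-01"))  # 2
```

Both functions produce the same end state; the bulk version simply issues one statement instead of one per row, which is what matters once the affected set grows large.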

Adapting to legacy database schemas

Another important feature of an O/RM is its ability to adapt to legacy database models, which have often existed in the organisation for a long time and cannot be changed (a typical old architectural anti-pattern in enterprise-wide development is to use a database as the interface between independent systems). To support such dinosaurs, an O/RM has to be flexible with regard to primary key structures (field types and composite keys) and primary key value generation (by the database or by business logic), and it must take foreign key constraints into account when writing changes to the database.
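
Two of these requirements can be sketched in a few lines. Writing changes in an order that respects foreign keys amounts to a topological sort of the affected tables (Python's standard `graphlib` module, available since Python 3.9, provides one), and a composite primary key simply means that identifying a row takes several columns. The schema is a hypothetical legacy example, the helper names are illustrative, and the sketch ignores self-referencing tables, which need row-level ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical legacy schema: table -> tables it references through foreign keys.
FOREIGN_KEYS = {
    "customer": set(),
    "product": set(),
    "orders": {"customer"},
    "order_line": {"orders", "product"},
}

# Hypothetical primary keys; order_line uses a composite key.
PRIMARY_KEYS = {
    "customer": ("customer_no",),
    "product": ("sku",),
    "orders": ("order_no",),
    "order_line": ("order_no", "line_no"),
}


def insert_order(pending, foreign_keys):
    """Order tables so referenced rows are inserted before the rows that reference them.

    A table that references itself raises graphlib.CycleError; that case needs
    row-level ordering, which this sketch does not attempt.
    """
    graph = {table: foreign_keys.get(table, set()) & pending for table in pending}
    return list(TopologicalSorter(graph).static_order())


def delete_order(pending, foreign_keys):
    """Deletes run in reverse: referencing rows must go before the rows they reference."""
    return list(reversed(insert_order(pending, foreign_keys)))


def key_predicate(table, key_values):
    """Build a WHERE fragment for a single-column or composite primary key."""
    columns = PRIMARY_KEYS[table]
    if len(columns) != len(key_values):
        raise ValueError(f"{table} expects {len(columns)} key value(s)")
    return " AND ".join(f"{c} = ?" for c in columns), tuple(key_values)


if __name__ == "__main__":
    print(insert_order({"order_line", "orders", "customer", "product"}, FOREIGN_KEYS))
    print(key_predicate("order_line", (42, 3)))
```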

Tips and advice

Think OO

It is important to understand that O/RM requires strong knowledge of object-oriented programming. If you are not thinking (heck, breathing) OO, then O/RM is no good for you.

Be aware of the architectural implications

An O/RM framework is indeed a central, determining factor in any architecture, regardless of how “transparently” it offers persistence services. Be aware of all the implicit architectural decisions you are making when using an O/RM. Checking whether an O/RM supports all the architectural scenarios you are going to need is crucial.

Don’t try to abstract the O/RM

This is related to the previous tip. The more you try to abstract an O/RM away from your architecture, the more plumbing work you have to do and the more O/RM features you lose. The idea of abstracting an O/RM so thoroughly that you can *easily* exchange it for another is also a myth: by the time you have abstracted enough, you will have reduced the whole thing to the common set of features of all possible O/RMs, which is zero – they do not have anything fully in common. The challenge is therefore not to abstract the O/RM to the point of non-existence, but to make sure that your scenarios are supported.

Check and double-check the support and documentation available

There’s no doubt that O/RMs are quite complex frameworks that take effort to learn. This is where the quality of documentation and samples comes in, and that’s why users are well advised to check both before settling on a tool. It also means that active, responsive support is vital for lowering the entry barrier. A related source of occasional difficulty is the current lack of common standards for O/RM in .NET. As mentioned earlier, LINQ may resolve at least this issue in the near future.

Go future-proof

LINQ is definitely coming; the only question is when, whether it will be officially launched at the end of 2007 or whether we will have to wait until 2008. O/RMs should provide a good migration path to LINQ (I think we do pretty well in this regard) and protect your project investments once LINQ has been released.

Posted by Chris