Monday, 10 April 2006

We have just released a new version (2.5) of our O/R mapping tool, Genome. You may already have noticed that TechTalk frequently releases new versions of Genome. When we created the first automated data access layer implementation for one of our .NET projects four years ago (which later became Genome), we tried to build a core framework that could be easily extended in many ways without breaking the existing public API. This allows us to provide frequent updates with improved functionality that our customers can adopt easily, without spending valuable effort on integration.

Genome 2.5 is a bit different. You might not notice any difference because this release is as backwards compatible as previous ones: when you upgrade, you most probably don’t have to do anything else but modify your references to Genome to use the new version. However, Genome 2.5 introduces a big change, or at least a bigger one than usual: we have integrated support for Oracle and Microsoft database server platforms into one build. This will also allow us to extend Genome’s support for further database platforms in the future more easily and it has made Genome, our lovely child (as we’re used to calling it), a teenager who has successfully graduated from elementary school.

You’ll probably say: “What’s the big deal? There are several O/RM products on the market that support multiple database platforms.” Well, that’s true, but there can be significant differences in how they support a specific database platform. The challenge is not to provide an abstraction for the various SQL syntax flavours used by different database platforms but to deal with the semantic differences found in those platforms, such as identity generation (automatically incremented field value versus sequence), limiting the result set to a maximum row count (TOP n vs. ROWNUM <=n), handling LOB parameters in grouped database commands or different database type systems.

In Genome 2.5 we have tried to provide transparent integration with the different database providers. And in most cases, it just works! You simply change the database provider setting in the properties of your DataDomain schema project, modify the connection string, and your application is ready to run on a different database platform. We have found, however, that providing a safe, straightforward and transparent solution is not always possible. Because of the semantic differences mentioned before, there are cases when you cannot achieve exactly the same functionality on another database platform. One problem is that of the empty string, to finally arrive at the topic I wanted to talk about.

When we released the first Oracle Technology Preview edition of Genome, a customer who tried it out asked me about an error he observed in his application: one of the Genome calls threw an ADO.NET exception saying that the length of a string parameter was not specified correctly. After some investigation it turned out that he tried to update a persistent property with an empty string. He was very surprised when I told him that the problem arose because Oracle cannot store empty strings in tables and that ADO.NET uses this weird error message to notify you about it. He showed me SQL commands from his last project where he updated fields with empty strings and it worked. Yes, of course it worked, because he built up SQL commands in a string using double apostrophes ('') to represent empty strings. I showed him that these empty strings are automatically replaced by NULL values when the Oracle SQL parser processes the query.

While such implicit conversions can happen in loosely typed SQL, the strongly typed nature of OQL does not allow them, especially if they depend on the selected target database platform. Anyway, even when not trying to abstract differences of database platforms (and sticking to SQL), I'm quite sure that in the case mentioned before the application still contains undiscovered errors, because the customer was not aware of this implicit transformation performed by the Oracle SQL parser. To give you an example, let's say that you want to list items with empty text fields. Your record won’t be found if you express your query with "MyField = ''", because it will be interpreted as "MyField = NULL", a comparison that never evaluates to true; you should have written "MyField IS NULL" instead. I therefore agree with the ADO.NET approach: if you cannot use empty strings in the database, then your application should be made aware of that by triggering an exception (why you get such a useless error message from ADO.NET is a different story…).

After some heated discussions in the Genome team, we decided that since we cannot provide a bullet-proof seamless solution to this problem, we will delegate the responsibility of resolving it to the developer of the application code.

There are several strategies available to the application code for dealing with this situation. One solution is to avoid using empty strings in the business layer and use null values instead. Unfortunately, this is not too convenient an option for simple applications based mostly on (web) forms, for example because of how data binding works for ASP.NET controls. The TextBox control in ASP.NET, for instance, handles a Text property set to NULL by displaying an empty text box on the page, allowing the user to enter the desired value. If the text box is left empty, however, the Text property will return an empty string, although it was initialised with NULL. This means you cannot simply bind the text box result to the property of your business object automatically, since it won't be able to handle the returned empty strings.

As you can imagine, there are many possible solutions to this problem. Depending on your scenario, you might want to introduce a specialised TextBox or TextBox adapter to carry out the transformation properly. Or you can go for a completely different strategy to handle your empty strings. In any case, you can see that Genome should leave these decisions up to you and not try to provide a silver bullet which you might end up shooting yourself in the foot with.

In the scenario I want to discuss, I have decided to work with NULL values instead of empty strings. Because of the problems described before, I still need to be prepared to receive empty strings on the property setters of my business logic. Because of that, I imagine the following generic pattern for a persistent property:

public string MyProperty
{
  get { return _myProperty; }
  set { _myProperty = value == "" ? null : value; }
}
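To make the pattern concrete, here is a minimal, self-contained sketch. The Customer class and its Nickname property are hypothetical names chosen only for illustration; they are not part of Genome or of the attached solution.

```csharp
// Hypothetical business class, used only to illustrate the setter pattern.
public class Customer
{
    private string _nickname;

    public string Nickname
    {
        get { return _nickname; }
        // Normalise empty strings to null, so the stored value never
        // depends on whether the database can persist an empty string.
        set { _nickname = value == "" ? null : value; }
    }
}
```

Assigning "" to Nickname (as a data-bound TextBox would do) leaves the property at null, while null and non-empty values pass through unchanged.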

In the following, I'll show you a solution that uses Genome extensibility features to help you implement this pattern for Genome persistent objects very easily, without having to code this pattern manually over and over again. What I will do is provide a transparent aspect for Genome proxies generated for my business class which will set a NULL value to a persistent property whenever an empty string is used.

I will go through implementing such an aspect as a Genome extension step by step.

You can find the final solution attached to this article. If you are not interested in the implementation details, you can skip to the "Using the new aspect" section of this article.

Implementing a new Genome mapping aspect

When using Genome to map a business layer, persistent properties of the business class are left abstract, since Genome will provide the data access logic for those properties. During the DataDomain schema compilation, Genome derives from the abstract class and implements all persistent properties (as well as a lot of other things) with the appropriate data access logic.

What I want to do now is define an additional aspect for the property setter, which wraps the original property setter generated by Genome with logic to replace empty strings with NULLs. The final property implementation should then look like this:

public override string MyProperty
{
  get { return genome-data-access-logic; }
  set { genome-data-update-logic(value == "" ? null : value); }
}

This additional aspect is configurable through a new mapping element that can be applied to a <Member/> element. Since a class mapping in Genome can be composed of several mapping files (similar to partial classes in C#), this aspect can be encapsulated in a separate mapping file that extends the core mapping of a business layer just for the Oracle release of an application.

As the first step, I’ll introduce the new mapping element describing my aspect to Genome. In Genome, all XML elements and attributes that can be used in the mapping file are bound to .NET classes that will be populated during schema compilation with values specified in the mapping file. Similarly to .NET attributes, where new attributes can be introduced by using the "Attribute" postfix for the class name describing the new attribute, Genome recognises the "XmlData" postfix for introducing new mapping elements. For example, to parse the <PersistentField> XML element, the PersistentFieldXmlData class is instantiated (this class is defined in TechTalk.Dal.dll).

When using XML elements in your mapping file, you can refer to them either by their fully qualified names, including their namespaces, or just by their simple names. In the latter case, you have to add a <Using> element for each namespace that Genome should resolve the XmlData classes from.

In my case, I want to use a new mapping element, let’s call it <ReplaceEmptyString/>, to augment <Member/> mappings of a business class with my newly introduced aspect:

<Member name="MyProperty">
  <PersistentField />
  <ReplaceEmptyString />
</Member>

As I mentioned, I have to implement a .NET class, ReplaceEmptyStringXmlData, to allow this. This class is instantiated when the mapping file is parsed and the <ReplaceEmptyString/> element is found. To extend the functionality of a member mapping in Genome (<Member/>), this class has to implement the interface "IXmlMemberMapper" which specifies a "Map" method.

Before I go further, I want to explain the Genome DataDomain schema compilation process, which is split into three phases:

  1. Parsing and schema definition building
  2. Schema compilation
  3. Assembly generation

[Figure: Schema compilation phases]

In the first phase, all features that want to participate in the schema compilation have to register themselves. The order of registration does not matter here, because the compilation order of features is based on their dependencies. As a result of the first phase, an interim structure called schema definition is created in memory.

In the second phase, the compiler investigates the features registered during phase 1 for their dependencies and sets up a compilation order. The compilation is then performed by calling a specific interface method on the features in the calculated order. As a result, the DataDomain schema is set up.

The third phase generates code for Genome proxies (the classes derived from the abstract persistent classes) and compiles the proxy classes together with the serialised DataDomain schema into the mapping assembly.
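The dependency-based ordering of the second phase can be pictured as a topological sort. The following is only an illustrative sketch under that assumption: DependencyOrder and its string-based step names are hypothetical, and this is not how Genome's compiler is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: orders named steps so that every step comes after
// all the steps it depends on. Not Genome's actual scheduler.
public static class DependencyOrder
{
    public static List<string> Sort(IDictionary<string, string[]> dependencies)
    {
        List<string> ordered = new List<string>();
        // 1 = currently being visited, 2 = already emitted
        Dictionary<string, int> state = new Dictionary<string, int>();
        foreach (string step in dependencies.Keys)
            Visit(step, dependencies, state, ordered);
        return ordered;
    }

    private static void Visit(string step, IDictionary<string, string[]> dependencies,
        Dictionary<string, int> state, List<string> ordered)
    {
        int current;
        if (state.TryGetValue(step, out current))
        {
            if (current == 1)
                throw new InvalidOperationException("Circular dependency at " + step);
            return; // already emitted
        }

        state[step] = 1;
        string[] required;
        if (dependencies.TryGetValue(step, out required))
        {
            // Emit everything this step depends on first.
            foreach (string dependency in required)
                Visit(dependency, dependencies, state, ordered);
        }
        state[step] = 2;
        ordered.Add(step);
    }
}
```

With a step "ReplaceEmptyString" that depends on "PersistentField", Sort places "PersistentField" first regardless of the order in which the two were registered, which mirrors why the registration order in phase 1 does not matter.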

Phase 1

Following this pattern, the Map method of the ReplaceEmptyStringXmlData class is called in the first phase of the schema compilation process to register my aspect in the DataDomain schema:

public class ReplaceEmptyStringXmlData : IXmlMemberMapper
{
  public void Map(MemberDefinition memberDef)
  {
    // register our compiler into the schema definition for the member
    memberDef.Compilers.Add(new ReplaceEmptyStringCompiler());
  }
}

Phase 2

The ReplaceEmptyStringCompiler class instance I have created is investigated by the Genome compiler during the second phase to identify dependencies and also to configure aspects in the resulting schema. This is all done through the IMemberFeatureCompiler interface implemented by ReplaceEmptyStringCompiler.

In this case, my ReplaceEmptyStringCompiler feature configures a code generation aspect for the member, using the original implementation configured by the PersistentField feature. My compiler must therefore only be called after PersistentField has already done its job, so it depends on the configuration created by the PersistentField feature:

public class ReplaceEmptyStringCompiler : IMemberFeatureCompiler
{
  public CompilerStep[] GetDependencies(SchemaCompilerContext context, Member member)
  {
    // This dependency will ensure that my implementation is called
    // only _after_ the so-called main compiler of the member, which is
    // PersistentField in this case.
    return new CompilerStep[] { MemberCompiler.CreateMemberCompiler(context, member) };
  }

  public void CompileDds(SchemaCompilerContext context, Member member)
  {
    ICodeMemberCompiler baseCompiler = member.Schema.AssemblyProvider[member];
    // I register my code generator, using the one
    // provided by PersistentField (baseCompiler)
    ReplaceEmptyStringCodeGenerator.SetMemberGenerator(member, baseCompiler);
  }

  // I do not provide context-sensitive error messages now
  public XmlParserContext Context { get { return null; } }
}

SetMemberGenerator registers the new code generator for the member and writes an information message to the compiler console output for debugging purposes.

static public void SetMemberGenerator(Member member, ICodeMemberCompiler baseCompiler)
{
  if (baseCompiler == null)
    throw new NotSupportedException(
      "This member does not support code generation.");
  member.Schema.AssemblyProvider[member] =
    new ReplaceEmptyStringCodeGenerator(baseCompiler);
  Console.WriteLine("Empty String Replacer added: {0}", member);
}

Phase 3

To implement a member code generator in Genome, the ICodeMemberCompiler interface has to be implemented. My implementation simply delegates all calls to the original generator except the GetMemberSetterCode method, where I want to add the empty string check of my aspect. The implementation generates a conditional expression (?:) using CodeDom and calls the original generator using this generated expression instead of the expression representing the .NET built-in "value" variable.

Please note that .NET CodeDom does not support generating conditional expressions. As a workaround, you can generate code that uses a normal condition (if) and sets a temporary variable to the proper value in each branch of the condition. The ConditionalCodeExpression class, defined in the TechTalk.Services.dll shipped with Genome, is a transparent implementation of this workaround.

public CodeStatementCollection GetMemberSetterCode(
  Member member,
  TypeImplementation typeImpl,
  CodeExpression[] parameters,
  CodeExpression value,
  CodeCompileContext context)
{
  // modify .set(value) to
  // .set(value == "" ? null : value)
  // (condition first, then the result for the true case and for the false case)
  ConditionalCodeExpression conditionalExpr = new ConditionalCodeExpression(
    new CodeBinaryOperatorExpression(
      value,
      CodeBinaryOperatorType.ValueEquality,
      new CodePrimitiveExpression("")),
    new CodePrimitiveExpression(null),
    value);
  return baseCompiler.GetMemberSetterCode(member, typeImpl, parameters, conditionalExpr, context);
}
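For readers who want to see the if-based workaround spelled out, here is a rough sketch using only standard CodeDom types. It is not the source of ConditionalCodeExpression; the ConditionalWorkaround class and the temporary variable name are hypothetical, and on newer .NET versions the System.CodeDom package is assumed to be referenced.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hypothetical helper that illustrates the workaround with plain CodeDom.
public static class ConditionalWorkaround
{
    // Builds statements equivalent to:
    //   string <tempName>;
    //   if (value == "") <tempName> = null; else <tempName> = value;
    public static CodeStatementCollection Build(CodeExpression value, string tempName)
    {
        CodeStatementCollection statements = new CodeStatementCollection();
        statements.Add(new CodeVariableDeclarationStatement(typeof(string), tempName));

        CodeVariableReferenceExpression temp = new CodeVariableReferenceExpression(tempName);
        CodeBinaryOperatorExpression isEmpty = new CodeBinaryOperatorExpression(
            value, CodeBinaryOperatorType.ValueEquality, new CodePrimitiveExpression(""));

        statements.Add(new CodeConditionStatement(
            isEmpty,
            new CodeStatement[] { new CodeAssignStatement(temp, new CodePrimitiveExpression(null)) },
            new CodeStatement[] { new CodeAssignStatement(temp, value) }));
        return statements;
    }

    // Renders the statements as C# source text so the result can be inspected.
    public static string ToCSharp(CodeStatementCollection statements)
    {
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            foreach (CodeStatement statement in statements)
                provider.GenerateCodeFromStatement(statement, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Calling Build with a CodePropertySetValueReferenceExpression (the CodeDom representation of "value" in a setter) yields a declaration plus an if/else, after which the temporary variable can be passed on wherever the conditional expression would have been used.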

Having implemented the code-generator for my aspect, I am all set to use my newly defined aspect.

There is one more trick. As you have probably noticed, some of the features in the mapping file can also be expressed as XML attributes of the parent feature element. To use our new ReplaceEmptyString feature as an attribute, I only have to extend it with a default property recognised by Genome's parsing. The easiest way to do this is to create a public field called "value", in this case of type bool, in the ReplaceEmptyStringXmlData class. After that, I can declare the feature as an attribute as well:

<Member name="Name" PersistentField="Name" ReplaceEmptyString="true" />

Nice, isn't it?

The attached file also contains another reincarnation of my empty string replacement aspect. The <EmptyStringReplacer/> element can be placed under the root element of a mapping file and applies the empty string replacement aspect for all string persistent fields without needing to individually modify their mappings. Since this aspect applies to the whole mapping schema, you can create it in a separate mapping file of your DataDomain schema project which contains just this single mapping. This is probably the ideal solution to the Oracle problem (see EmptyStringReplacerXmlData for details).

Debugging the new aspect

If you’re working on code generation extensions for Genome you’ll probably find it useful that the C# file generated by CodeDom is saved to your local temp directory. You can open that file (usually the latest C# file in your temp folder) and check whether the expected code was generated (and you can also see the code Genome generates for the proxy implementation of your business classes).

To debug your code generator, you have to attach to the ddsc.exe process during compilation. You have to be very fast or use the "Pause" button to suspend execution while you attach. A more convenient solution is to include a Debugger.Break() call in one of your compiler methods.
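A Debugger.Break() call that is always active gets in the way of normal builds, so one option is to guard it with a switch. The sketch below assumes an environment variable named GENOME_EXT_DEBUG, which is a made-up name and not something Genome knows about; CompilerDebugHook is likewise a hypothetical helper.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: breaks into the debugger only when explicitly requested.
public static class CompilerDebugHook
{
    // Made-up switch name; set it to "1" before running the schema compiler.
    public const string SwitchName = "GENOME_EXT_DEBUG";

    public static bool BreakIfRequested()
    {
        if (Environment.GetEnvironmentVariable(SwitchName) != "1")
            return false; // normal builds are not interrupted

        if (Debugger.IsAttached)
            Debugger.Break();   // stop in the debugger that is already attached
        else
            Debugger.Launch();  // offer to attach a debugger to this process
        return true;
    }
}
```

A call to CompilerDebugHook.BreakIfRequested() at the top of a compiler method such as CompileDds then stops execution only on the builds where you have set the switch.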

Using the new aspect

Using the new aspect is straightforward:

  1. Include the C# file attached to this article in your BL assembly (or any other assembly referenced by the DataDomain schema project). You have to add references to the TechTalk.Services.dll and TechTalk.Xml.Serializer.dll assemblies to the project in order to compile this file. These assemblies can be found in the Genome installation's bin folder.
  2. To use the aspect with its short name, you need to add a <Using> element for its namespace to the affected mapping files:
    <Using namespace="TechTalk.Genome.Extensions" />
  3. To add the aspect to individual persistent properties (only for properties of type string), just specify the ReplaceEmptyString="true" attribute for the <Member> element, like this:
    <Member name="MyProperty" PersistentField="Name" ReplaceEmptyString="true" />
    Or, if you prefer the element notation, you can specify the aspect like this:
    <Member name="MyProperty">
      <PersistentField />
      <ReplaceEmptyString />
    </Member>
  4. To apply the aspect on all persistent string properties of the whole mapping schema, add a new mapping file with just one element to your DataDomain schema project (as well as the usual usings):
    <TechTalk.Genome.Extensions.EmptyStringReplacer />


As you can see, Genome can be extended with new code generation aspects quite easily. You can implement similar extensions on your own.


The sample code provided was tested with Genome v2.5.2. TechTalk does not assume liability for the code provided in this sample.



Posted by Gáspár