Architectural Overview: Using LINQ in WCF

Today I would like to give an architectural overview of my usage of LINQ. This may actually become the first in a series of architectural discussions on various .NET and AJAX technologies. In this discussion, I'm going to be talking about the architecture of the next revision of my training blog engine, Minima. Since the core point of any system is what goes into it and what comes out of it, the goal of this commentary is to get to the point where LINQ projects data into WCF DTOs. Let me start by explaining how I organize my types. Some of you will find this boring, but it's amazing how many times I get questions on this topic. For good reason, too! These questions show that a person's priorities are in the right place, since type, namespace, and file organization are critical to the manageability and architectural clarity of a system.

However, before we get started, let me briefly restate a point from my post entitled SQL Server Database Model Optimization for Developers: when you design your database structure, you should design it with your O/R mapper in mind. If you don't, you will probably run into all kinds of problems, as that post describes. This is incredibly important; however, if you keep to normal, everyday normalization practices, you are probably fine for the most part anyway. Since I've written about that before, there's no reason for me to go into detail here. Just know that if your database design sucks, your application will probably suck too. Don't build your house on the sand.

In terms of LINQ, I actually use the VS2008 "LINQ to SQL Classes" template to create the LINQ information. In almost every other area of technology, it's good practice to avoid wizards and templates like the plague, but when it comes to O/R mapping, you need to be using an automated tool. If your O/R mapper requires you to do any work (...NHibernate...*cough*cough*), then you can't afford to work with it. You need to be focusing on the business logic of your system, not playing around with mechanical nonsense. As I've said in other contexts, stored procedures and ad hoc SQL are forms of unmanaged code: when you are managing the mechanics of a system yourself, it is, by definition, unmanaged. Stored procedures and ad hoc SQL are to LLBLGen/LINQ as ASP/PHP is to ASP.NET, and as C++ is to the .NET languages. In the context of database access, providing managed code is the whole point of an O/R mapper. Furthermore, if the O/R mapping software you are using requires you to write templates or do manual mapping, that's obviously not completely managed code.

Now, when I create LINQ classes, I create one for each "architectural domain" of the system that I deem necessary. For example, in a future release of Minima, there will be one LINQ class to handle my HttpHandler and UrlRewriting subsystem and another to handle blog interaction. This separation is necessary: without it, my WCF services would know too much about my web environment, and my web site (a WCF client) would have direct access to the data that the WCF service is intended to abstract. Therefore, there will be a LINQ class for web-site-specific mechanics and another LINQ class for service-specific mechanics. When I create the class for a particular domain, I give it a simple name with the suffix LINQ. So, my Minima core LINQ class is CoreLINQ.cs and my Minima service LINQ class is ServiceLINQ.cs. Simple.

Once the LINQ designer is open, either before or after I drop in the tables required for that particular architectural domain, I set the context namespace to <SimpleName>.Data.Context and the entity namespace to <SimpleName>.Data.Entity. In the Minima example, that gives me Core.Data.Context and Core.Data.Entity. One may argue that there's nothing really going on in Core.Data.Context, to which I must respond: yeah, well, there's already a lot going on in Core.Data (other data-related, non-LINQ logic I would create) and in Core.Data.Entity. The reason I say "before or after I drop in the tables" is to emphasize that you can change these settings at a later point. It's important to keep in mind that the LINQ to SQL designer doesn't automatically update its schema from your database. LLBLGen Pro has this feature built in and does the refreshing in a masterful way, but LINQ to SQL currently lacks it. Therefore, to do a refresh, you need to do a "CTRL-A, Delete" to delete all the tables, refresh the connection in Server Explorer, and then re-add them. It's not much work.

Now, moving on to using LINQ. When I'm working with both LINQ entities (or LLBLGen entities or whatever) and WCF DTOs in my WCF service, I do not import the LINQ entity namespace. The ability to import types from another namespace is one of the most powerful yet underappreciated features in all of .NET (um... JavaScript needs them!); however, when you have a Person entity in LINQ and a Person DTO, things can get confusing fast. Therefore, to avoid all potential conflicts, I leave that import out and instead keep a series of type aliases at the top of my service classes, just under the namespace imports. Notice also the visual signal in the BlogEntryXAuthorLINQ alias: the X tells the developer that this is a many-to-many linking table. Even where the generated entity name doesn't carry that signal (here the entity is BlogEntryAuthor), the alias adds it without affecting anyone else.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
//+ LINQ type aliases (the entity namespace itself is not imported)
using DataContext = Minima.Service.Data.Context.MinimaServiceLINQDataContext;
using AuthorLINQ = Minima.Service.Data.Entity.Author;
using CommentLINQ = Minima.Service.Data.Entity.Comment;
using BlogLINQ = Minima.Service.Data.Entity.Blog;
using BlogEntryLINQ = Minima.Service.Data.Entity.BlogEntry;
using BlogEntryUrlMappingLINQ = Minima.Service.Data.Entity.BlogEntryUrlMapping;
using BlogEntryXAuthorLINQ = Minima.Service.Data.Entity.BlogEntryAuthor;
using LabelLINQ = Minima.Service.Data.Entity.Label;
using LabelXBlogEntryLINQ = Minima.Service.Data.Entity.LabelBlogEntry;
using UserRightLINQ = Minima.Service.Data.Entity.UserRight;
//+
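To see the conflict these aliases sidestep, here is a minimal, self-contained sketch. The AliasDemo namespaces and both Person classes are hypothetical stand-ins for a designer-generated entity and a WCF DTO, not Minima types. The DTO namespace is imported normally while the entity is reached only through an alias, so the bare name Person always means the DTO.

```csharp
using System;
using AliasDemo.Service.Dto;                      // DTO namespace: imported normally
using PersonLINQ = AliasDemo.Data.Entity.Person;  // entity: reached only through an alias
// Also writing "using AliasDemo.Data.Entity;" would make "Person" ambiguous (CS0104).

namespace AliasDemo.Data.Entity
{
    // Hypothetical stand-in for a designer-generated LINQ entity.
    public class Person
    {
        public Int32 PersonId { get; set; }
        public String FirstName { get; set; } = "";
        public String LastName { get; set; } = "";
    }
}

namespace AliasDemo.Service.Dto
{
    // Hypothetical WCF DTO with the same simple name as the entity.
    public class Person
    {
        public String FullName { get; set; } = "";
    }
}

namespace AliasDemo.Service
{
    public static class PersonProjector
    {
        // "Person" resolves to the DTO; the entity is always written PersonLINQ.
        public static Person ToDto(PersonLINQ personLinq)
        {
            return new Person { FullName = personLinq.FirstName + " " + personLinq.LastName };
        }
    }
}
```

Passing an entity with FirstName "Ada" and LastName "Lovelace" to PersonProjector.ToDto yields a DTO whose FullName is "Ada Lovelace".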

Next, since we are in the context of WCF, we need to discuss validation of incoming information. The following method is an implementation of a WCF service operation. As you can see, when a caller sends in a blog GUID, the operation immediately validates it, and that same validation retrieves the blog's LINQ entity for the operation to use. This is why the validation isn't done in a WCF behavior (even though there are tricks to get data from a behavior too!). You may also note my camelCasing of instances of LINQ entities (blogLinq). The purpose of this is to provide an incredibly obvious signal to the brain that this is an object, not simply a type (...as is the point of almost all of the Framework Design Guidelines. Buy the book! The 2nd edition is due Sept 29 '08.)

//- @GetBlogMetaData -//
[MinimaBlogSecurityBehavior(PermissionRequired = BlogPermission.Retrieve)]
public BlogMetaData GetBlogMetaData(String blogGuid)
{
    using (DataContext db = new DataContext(ServiceConfiguration.ConnectionString))
    {
        //+ ensure blog exists
        BlogLINQ blogLinq;
        Validator.EnsureBlogExists(blogGuid, out blogLinq, db);
        //+
        return new BlogMetaData
        {
            Description = blogLinq.BlogDescription,
            FeedTitle = blogLinq.BlogFeedTitle,
            FeedUri = new Uri(blogLinq.BlogFeedUrl),
            Guid = blogLinq.BlogGuid,
            Title = blogLinq.BlogTitle,
            Uri = new Uri(blogLinq.BlogPrimaryUrl),
            CreateDateTime = blogLinq.BlogCreateDate,
            LabelList = new List<Label>(
                blogLinq.Labels.Select(p => new Label
                {
                    Guid = p.LabelGuid,
                    FriendlyTitle = p.LabelFriendlyTitle,
                    Title = p.LabelTitle
                })
            )
        };
    }
}

It would probably be a good idea at this point to step into the Validator class to see what's really going on. In the Validator class below, I show two of its methods (in reality there are dozens!): the pair that validates an author, which has the same (key, out entity, DataContext) shape as the EnsureBlogExists call above. Most of it should be obvious. The validation itself happens in the second method, but it's the first one that callers invoke directly. Notice two things about this. First, I'm passing in my DataContext. This completely obliterates any possibility of overlapping DataContexts and, therefore, any strange locking issues. Second, I'm pre-registering my messages in a strongly typed Message class (notice also that the members of Message aren't marked static; that's the magic of const, since a const member is implicitly static). This last piece could easily be reworked to support localization.
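The const point is easy to confirm in isolation. This standalone sketch copies the shape of the Message class from the Validator code into a hypothetical ConstDemo namespace (made public here so it can be exercised from outside) and uses reflection to show that the compiler emits a const as a static literal field, which is why neither an instance nor the static keyword is needed.

```csharp
using System;
using System.Reflection;

namespace ConstDemo
{
    public static class Validator
    {
        // Same shape as Validator.Message: no "static" keyword on the member.
        public class Message
        {
            public const String InvalidEmail = "Invalid author Email";
        }

        // Accessed through the type name; no Message instance is ever created.
        public static String DefaultEmailMessage()
        {
            return Message.InvalidEmail;
        }

        // True when the named public field is emitted as a static literal (a const).
        public static Boolean IsStaticLiteral(String fieldName)
        {
            FieldInfo field = typeof(Message).GetField(fieldName, BindingFlags.Public | BindingFlags.Static);
            return field != null && field.IsStatic && field.IsLiteral;
        }
    }
}
```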

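Now, moving on to the actual validation. Unless I'm desperately trying to inline some code, I normally declare the LINQ criteria before the actual LINQ statement. One caution: declare the criteria as an Expression<Func<T, TResult>>, not a bare Func<T, TResult>. Against a LINQ to SQL table, an expression tree binds to Queryable.SingleOrDefault and is translated into a SQL WHERE clause, while a plain delegate binds to Enumerable.SingleOrDefault, which pulls every row of the table into memory and filters it there. Notice also that I try to bring the semantics of the criteria into the name of the object. This really helps in making many of your LINQ statements read more naturally: "db.Person.Where(hasEmployees)".

using System;
using System.Linq;
using System.Linq.Expressions;
// (plus the DataContext and AuthorLINQ aliases shown at the top of the service class)

namespace Minima.Service.Validation
{
    internal static class Validator
    {
        //- ~Message -//
        internal class Message
        {
            public const String InvalidEmail = "Invalid author Email";
        }

        //- ~EnsureAuthorExists -//
        // Convenience overload: validates using the default message.
        internal static void EnsureAuthorExists(String authorEmail, out AuthorLINQ authorLinq, DataContext db)
        {
            EnsureAuthorExists(authorEmail, out authorLinq, Message.InvalidEmail, db);
        }

        //- ~EnsureAuthorExists -//
        // Looks the author up by e-mail and throws a fault if no such author exists.
        internal static void EnsureAuthorExists(String authorEmail, out AuthorLINQ authorLinq, String message, DataContext db)
        {
            // Expression tree (not a bare Func) so LINQ to SQL runs the filter in SQL.
            Expression<Func<AuthorLINQ, Boolean>> authorExists = x => x.AuthorEmail == authorEmail;
            authorLinq = db.Authors.SingleOrDefault(authorExists);
            if (authorLinq == null)
            {
                FaultThrower.Throw<ArgumentException>(new ArgumentException(message));
            }
        }
    }
}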
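The difference between the two delegate forms can be observed without a database. In this sketch, the ExpressionDemo types are hypothetical, and an in-memory list exposed through AsQueryable stands in for a LINQ to SQL Table<T>. The expression-tree version keeps the filter inside the query the provider sees, while the Func version falls back to LINQ to Objects and hands back a plain IEnumerable<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace ExpressionDemo
{
    // Hypothetical stand-in for the Author entity.
    public class Author
    {
        public Int32 AuthorId { get; set; }
        public String AuthorEmail { get; set; } = "";
    }

    public static class CriteriaBinding
    {
        // Expression tree: binds to Queryable.Where, so the IQueryable provider
        // receives the predicate (LINQ to SQL would translate it into SQL).
        public static IQueryable<Author> FilterWithExpression(IQueryable<Author> authors, String email)
        {
            Expression<Func<Author, Boolean>> authorExists = x => x.AuthorEmail == email;
            return authors.Where(authorExists);
        }

        // Plain delegate: binds to Enumerable.Where, so the provider only sees the
        // unfiltered source and every row is filtered in memory.
        public static IEnumerable<Author> FilterWithFunc(IQueryable<Author> authors, String email)
        {
            Func<Author, Boolean> authorExists = x => x.AuthorEmail == email;
            return authors.Where(authorExists);
        }

        // Names the operator at the root of the query's expression tree.
        public static String RootOperator(IQueryable<Author> query)
        {
            MethodCallExpression call = (MethodCallExpression)query.Expression;
            return call.Method.DeclaringType?.Name + "." + call.Method.Name;
        }
    }
}
```

For a source built with a list's AsQueryable(), RootOperator on the expression-based result reports Queryable.Where. The Func-based result isn't an IQueryable<Author> at all, which is the tell that the provider never saw the filter.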

In the actual query itself, you can see that the semantics of the method is that at most one author should be returned. Therefore, I'm able to use the Single or SingleOrDefault methods. Note that if you use these and more than one entity matches, an exception will be thrown, as Single and SingleOrDefault allow only what their names imply. In this case, AuthorEmail is unique in the database and, by definition, there can be only one (at this point I'm sure about 30% of you are doing Sean Connery impressions). The difference between Single and SingleOrDefault is simple: when no element matches the criteria, Single throws an exception and SingleOrDefault returns the type's default value. The default of a type is what the C# "default" keyword returns: for a reference type it is null, and for a value type it is the zero value (e.g. 0 for Int32). In this case, I'm dealing with my AuthorLINQ class, which is obviously a reference type, so I need to check it for null. If it's null, then that author doesn't exist and I need to throw a fault (which is what my custom FaultThrower class does). What's a fault? That's a topic for a different post.
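These rules are easy to check with the standard query operators in System.Linq. The sketch below uses a hypothetical in-memory author list, with a deliberate duplicate, rather than Minima's database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace SingleDemo
{
    public class Author
    {
        public String AuthorEmail { get; set; } = "";
    }

    public static class SingleSemantics
    {
        // Hypothetical data; "bob" appears twice on purpose.
        private static readonly List<Author> Authors = new List<Author>
        {
            new Author { AuthorEmail = "ann@example.com" },
            new Author { AuthorEmail = "bob@example.com" },
            new Author { AuthorEmail = "bob@example.com" }
        };

        // No match: SingleOrDefault returns default(Author), which is null.
        public static Author FindMissing()
        {
            return Authors.SingleOrDefault(a => a.AuthorEmail == "nobody@example.com");
        }

        // No match: Single throws instead of returning a default.
        public static Boolean SingleThrowsOnNoMatch()
        {
            return Throws(() => Authors.Single(a => a.AuthorEmail == "nobody@example.com"));
        }

        // More than one match: SingleOrDefault throws too.
        public static Boolean SingleOrDefaultThrowsOnDuplicates()
        {
            return Throws(() => Authors.SingleOrDefault(a => a.AuthorEmail == "bob@example.com"));
        }

        // For a value type the default is the zero value, not null.
        public static Int32 EmptyInt32Default()
        {
            return new List<Int32>().SingleOrDefault();
        }

        // Both methods report these violations with InvalidOperationException.
        private static Boolean Throws(Action action)
        {
            try { action(); return false; }
            catch (InvalidOperationException) { return true; }
        }
    }
}
```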

As you can see from the method signatures, not only is the author e-mail address validated, but the LINQ entity is also returned to the caller via an out parameter. Once I have this authorLinq entity, I can use its primary key (AuthorId) in various other LINQ queries. It's critical to remember that you always want to use only validated information. If you don't, you have no idea what will happen to your system. Therefore, you should ignore all IDs that are sent into a WCF service operation and use only the validated ones. A thorough discussion of this topic is left for a future post.
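Here is a minimal sketch of that rule. It uses hypothetical in-memory lists in place of the Authors and BlogEntries tables, and a plain ArgumentException in place of the author's FaultThrower. The operation accepts a client-supplied ID only to show that it is ignored; the query uses the AuthorId of the entity that validation returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ValidatedIdDemo
{
    public class Author
    {
        public Int32 AuthorId { get; set; }
        public String AuthorEmail { get; set; } = "";
    }

    public class BlogEntry
    {
        public Int32 AuthorId { get; set; }
        public String Title { get; set; } = "";
    }

    public static class EntryService
    {
        // Hypothetical stand-ins for the Authors and BlogEntries tables.
        private static readonly List<Author> Authors = new List<Author>
        {
            new Author { AuthorId = 1, AuthorEmail = "ann@example.com" },
            new Author { AuthorId = 2, AuthorEmail = "bob@example.com" }
        };

        private static readonly List<BlogEntry> Entries = new List<BlogEntry>
        {
            new BlogEntry { AuthorId = 1, Title = "Ann's post" },
            new BlogEntry { AuthorId = 2, Title = "Bob's post" }
        };

        // Same shape as Validator.EnsureAuthorExists: validate, then hand back the entity.
        private static void EnsureAuthorExists(String authorEmail, out Author author)
        {
            author = Authors.SingleOrDefault(a => a.AuthorEmail == authorEmail);
            if (author == null)
            {
                throw new ArgumentException("Invalid author Email");
            }
        }

        // claimedAuthorId comes from the caller and is deliberately ignored;
        // only the AuthorId of the validated entity reaches the query.
        public static List<String> GetEntryTitles(String authorEmail, Int32 claimedAuthorId)
        {
            Author author;
            EnsureAuthorExists(authorEmail, out author);
            return Entries.Where(e => e.AuthorId == author.AuthorId)
                          .Select(e => e.Title)
                          .ToList();
        }
    }
}
```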

Now we are finally at the place where LINQ to WCF projection happens.  For clarity, here it is again (no one likes to scroll back and forth):

return new BlogMetaData
{
    Description = blogLinq.BlogDescription,
    FeedTitle = blogLinq.BlogFeedTitle,
    FeedUri = new Uri(blogLinq.BlogFeedUrl),
    Guid = blogLinq.BlogGuid,
    Title = blogLinq.BlogTitle,
    Uri = new Uri(blogLinq.BlogPrimaryUrl),
    CreateDateTime = blogLinq.BlogCreateDate,
    LabelList = new List<Label>(
        blogLinq.Labels.Select(p => new Label
        {
            Guid = p.LabelGuid,
            FriendlyTitle = p.LabelFriendlyTitle,
            Title = p.LabelTitle
        })
    )
};

The basic flow is as follows. First, validate the blog GUID against the Blogs table in DataContext db, which hands back blogLinq. Then build a new BlogMetaData DTO, copying the scalar properties across and transforming the blog's Labels into Label DTOs. The DTO projection of the labels happens in the Select method. Select is akin to SELECT in SQL; my point in saying that is to make sure you are aware that Select is not a filter (that's what Where does). Select also doesn't run anything by itself: LINQ defers execution until the results are actually enumerated. Here that happens when the List<Label> constructor consumes the Select, which is still inside the using block. That matters because, with LINQ to SQL's deferred loading, the Labels association is fetched on first use, and fetching it after the DataContext has been disposed would fail. What's really nice about all this is that WCF loves List<T>. It's not a big fan of Collection<T>, but List<T> is its friend. Over the wire it's an array, and a WCF client that shares the DTO assembly (or sets its proxy's collection type to List) gets a List<T> object back.
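The deferred-execution point can be observed with plain objects. In this sketch, hypothetical BlogEntity and LabelEntity classes stand in for the LINQ entities (using a List<T> rather than a LINQ to SQL EntitySet), and a counter records how many times the Select lambda runs before and after the List<Label> constructor enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ProjectionDemo
{
    // Hypothetical stand-ins for the LINQ entities.
    public class LabelEntity
    {
        public Guid LabelGuid { get; set; }
        public String LabelTitle { get; set; } = "";
        public String LabelFriendlyTitle { get; set; } = "";
    }

    public class BlogEntity
    {
        public String BlogTitle { get; set; } = "";
        public List<LabelEntity> Labels { get; set; } = new List<LabelEntity>();
    }

    // Hypothetical stand-ins for the WCF DTOs.
    public class Label
    {
        public Guid Guid { get; set; }
        public String Title { get; set; } = "";
        public String FriendlyTitle { get; set; } = "";
    }

    public class BlogMetaData
    {
        public String Title { get; set; } = "";
        public List<Label> LabelList { get; set; } = new List<Label>();
    }

    public static class Projector
    {
        public static BlogMetaData ToMetaData(BlogEntity blogLinq, out Int32 callsBefore, out Int32 callsAfter)
        {
            Int32 calls = 0;

            // Select only describes the transformation; the lambda has not run yet.
            IEnumerable<Label> labels = blogLinq.Labels.Select(p =>
            {
                calls++;
                return new Label { Guid = p.LabelGuid, Title = p.LabelTitle, FriendlyTitle = p.LabelFriendlyTitle };
            });
            callsBefore = calls;

            // The List<Label> constructor enumerates the query, running the lambda once per label.
            BlogMetaData dto = new BlogMetaData { Title = blogLinq.BlogTitle, LabelList = new List<Label>(labels) };
            callsAfter = calls;
            return dto;
        }
    }
}
```

For a blog with two labels, ToMetaData reports 0 calls before materializing and 2 after.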

In closing, I should mention something I know people are going to ask me about: to project from a WCF DTO to LINQ, you do the exact same thing. LINQ isn't a database-specific technology; you can use it across all kinds of data sources. Though I use LINQ for data access in many projects, most of my LINQ usage is actually for searching lists, combining two lists, or reshaping the data that gets bound to the interface. It's incredibly powerful.

Moving to a non-Minima example: suppose you need a person's full name in a WPF ListBox, but the only name properties on your LINQ entity are FirstName and LastName. Instead of doing tricks in your ItemTemplate, you can have your ItemsSource use LINQ to sew FirstName and LastName together.

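// Project each person into an anonymous type for binding: FullName is computed,
// PostalCode keeps its name, and a null Country becomes an empty string.
lstPerson.ItemsSource = personList.Select(p => new
{
    FullName = p.FirstName + " " + p.LastName,
    p.PostalCode,
    Country = p.Country ?? String.Empty
});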

The really sweet part is that LINQ to SQL entities implement the INotifyPropertyChanged interface, so when you bind WPF directly to them, WPF automatically updates the ListBox when a property changes! Note that this doesn't carry through an anonymous projection like the one above, since an anonymous type's properties are read-only snapshots. Of course, entity change notification doesn't help you if you are building a seriously SOA system, since the client sees DTOs rather than entities. Therefore, my DTOs normally implement INotifyPropertyChanged as well. This is not a WPF-specific interface (it lives in System.ComponentModel), so implementing it does not tie the business object to any presentation technology.
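A DTO along those lines might look like this minimal sketch. PersonDto is a hypothetical name, and the [DataContract]/[DataMember] attributes a real WCF contract would carry are omitted to keep it short. The setter raises PropertyChanged only when the value actually changes.

```csharp
using System;
using System.ComponentModel;

namespace NotifyDemo
{
    // Hypothetical DTO; WCF serialization attributes omitted for brevity.
    public class PersonDto : INotifyPropertyChanged
    {
        private String firstName = "";

        public event PropertyChangedEventHandler PropertyChanged;

        public String FirstName
        {
            get { return firstName; }
            set
            {
                if (firstName == value) return;   // no notification when nothing changed
                firstName = value;
                OnPropertyChanged(nameof(FirstName));
            }
        }

        // Raises PropertyChanged for bound listeners such as WPF.
        protected virtual void OnPropertyChanged(String propertyName)
        {
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```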

That should show you a bit more of how LINQ can work with all kinds of stuff.  Therefore, it shouldn't be hard to figure out how to project from a WCF DTO to LINQ. You could literally copy/paste the LINQ -> DTO code and just switch around a few names.
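For instance, here's a sketch of the reverse direction with hypothetical Label (DTO) and LabelEntity classes: the same Select-based projection with source and target swapped. The BlogId parameter is an assumption for illustration, and in keeping with the validated-ID rule it should come from a validated entity rather than from the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ReverseDemo
{
    // Hypothetical WCF DTO arriving from a client.
    public class Label
    {
        public Guid Guid { get; set; }
        public String Title { get; set; } = "";
        public String FriendlyTitle { get; set; } = "";
    }

    // Hypothetical stand-in for the LINQ to SQL Label entity.
    public class LabelEntity
    {
        public Int32 BlogId { get; set; }
        public Guid LabelGuid { get; set; }
        public String LabelTitle { get; set; } = "";
        public String LabelFriendlyTitle { get; set; } = "";
    }

    public static class DtoToEntity
    {
        // Same projection as LINQ -> DTO, with source and target swapped.
        // blogId should come from a validated entity, never from the client.
        public static List<LabelEntity> ToEntities(IEnumerable<Label> labels, Int32 blogId)
        {
            return labels.Select(p => new LabelEntity
            {
                BlogId = blogId,
                LabelGuid = p.Guid,
                LabelTitle = p.Title,
                LabelFriendlyTitle = p.FriendlyTitle
            }).ToList();
        }
    }
}
```

With LINQ to SQL, the resulting entities could then be handed to the table's InsertAllOnSubmit method and persisted with DataContext.SubmitChanges.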

If you are new to LINQ, then I recommend the book Pro LINQ by Joseph C. Rattz Jr. However, if you are already using LINQ or want a view into its internal mechanics, then I must recommend LINQ in Action by Fabrice Marguerie, Steve Eichert, and Jim Wooley.