
10 Things Most Developers Didn't Know in 2007

To end 2007, I thought I would make a list of things which I found that most developers didn't know.  To make things more interesting, this list is actually a series of 10 mini-articles that I wrote today.  Since this article has several sub-articles, here's a table of contents to help you out (these aren't really in any order of importance):

#1  SQL Server supports powerful subqueries as anonymous sets.
#2  Firefox has an operating-system style console for web application debugging.
#3  JavaScript has natively handled loosely-coupled multi-cast events for years.
#4  Not all image formats are created equal.
#5  Custom server controls are not esoteric, complicated, or time-consuming to create.
#6  Most developers I worked with in 2007 had never heard of an O/R mapper.
#7  You don't need to use SOAP for everything.
#8  A poor implementation of even the most beautiful database model can lead to a disaster.
#9  Most web developers have no idea how to build a proper XHTML structure.
#10  CSS is not simply a technology to allow simple font size, color, and style changes.

#1  SQL Server supports powerful subqueries as anonymous sets.

Many developers don't take the time to seriously look at T-SQL or SQL Server internals.  As such, they miss many of SQL Server's more powerful features.  In January 2007, when a co-worker saw me write the following query, he about fell out of his seat:

select MemberName, m.MemberId, count(*) from (select 
    distinct MemberId, 
    VisitUserAgent 
    from VisitSession 
    where MemberId is not null) a 
inner join Member m on a.MemberId = m.MemberId 
group by m.MemberId, MemberName, VisitUserAgent 
having count(*) > 1 
order by count(*) desc 

For starters, the guy didn't know you could do a filter after a group by, but that's not my point.  He had no idea that SQL Server (2000) allows you to use subqueries and to treat those subqueries as anonymous sets (derived tables).  As you can see, you can select from the subquery as well as use it in a join.  This tidbit alone should toss many painfully slow cursor-based stored procedures into the trash.  It's a simple SQL feature, but it's a powerful one.

#2  Firefox has an operating-system style console for web application debugging.

It's incredibly hard to find an ASP.NET web developer who knows this one.  It's a feature that knocks people right off their seats.  Instead of throwing alerts all over your AJAX applications, you can use the Firefox console and the dump( ) function.  Did I mention this has been a native feature since Firefox 1.0?

Step 1: Start Firefox with the -console switch.

Step 2: Add the boolean key 'browser.dom.window.dump' to the Firefox configuration (about:config) and set it to true.

Then simply call dump( ) instead of alert( ) and you're done.  Your output will go to the Firefox console window (which looks almost exactly like a cmd window).

With this technique you can entirely avoid any possibility of an infinite loop of alerts.  Personally, I like to track all the output of my web applications.  This comes in very handy when I'm using event capturing or need to watch the progressive state of my application.  When I do this, I also like to write an output identifier to each data dump.  Here's a sample of what I usually use for debugging:

var Configuration = { 
    Debug: false
}; 

var Debug = { 
    counter: 0, 
    write: function(text) { 
        if(Configuration && Configuration.Debug) { 
            dump(text); 
        } 
    }, 
    writeLine: function(text) { 
        if(Configuration && Configuration.Debug) { 
            Debug.counter++;        
            dump(Debug.counter + ':'+ text + '\n'); 
        } 
    } 
};

Here's some sample output using the Debug.writeLine( ) abstraction:
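To give you an idea of the format, the console output looks something like the following (these particular messages are just illustrative); the counter prefix comes from the writeLine( ) function above:

1:window load event fired
2:navigation component initialized
3:XMLHttpRequest readyState changed to 4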

Leaves alert( ) in the dust, doesn't it? You can actually learn more about this technique and others from my Firefox for ASP.NET Web Developer video series found on my blog.  These topics are crucial to your understanding of modern web development.

#3  JavaScript has natively handled loosely-coupled multi-cast events for years.

This isn't something just for the Firefox, Opera, and Safari world.  Even IE6 has native support for this feature.  I'm not sure why this is, but in September 2007 when I was designing the AJAX exam for Brainbench, not a single one of the reviewers knew that JavaScript natively supported loosely-coupled multi-cast events.  I actually received comments from almost all of the reviewers telling me that I should "leave server-side questions out of the exam".

JavaScript loosely-coupled multi-cast events are one of the most important core features of AJAX applications.  They allow you to quickly and efficiently attach multiple event handlers to the same XHTML element.  This becomes critically important when you are working with multiple AJAX components, each of which wants to have an event handler attached to the load event of the window object.

I wrote an article about this in September 2007, so I'm not going to go into any kind of detail here.  You may also opt to view this file from my SolutionTemplate, which supplements that blog entry.

#4  Not all image formats are created equal.

A few months ago, I came in as lead architect about half way through a project.  After having a few people fired for absolute incompetence, I did find a few people (PHP guys) who were ready, willing, and actually able to learn ASP.NET.  Everything was going well until the designer came back with his new theme and my associate whom I was training implemented it.  Everyone thought the project was going fine until I stepped in the room.  It didn't take but 10 seconds for a red flag to go up.  Just looking at the web site I could tell that this theme implementation was a disaster.  I noticed that there were signs of JPEG compression all over every single one of the images.  However, being a scientist and part-engineer I knew that measurement was a major key to success.  So, I whipped out Firebug, hit refresh and felt my jaw drop.  The landing page was 1.5MB.  Ouch.

You absolutely can not use one single image format for every image on your web site, especially not the deadly JPEG format, which does little more than destroy your images.  There are rules which web developers must follow or else a project is doomed to failure.  First off, you need to be using PNG24s for all important images, while comparing their file sizes and quality with PNG8 compression.  Using Adobe Photoshop's Save For Web feature is very helpful for this.  If the image is a photo or something with many "real life" colors and shades, perhaps you want to do a size and quality comparison against a JPEG version as well.  If you absolutely need to have transparent images for IE6, then you need to take extreme care and either make special PNG versions for each background or, if you don't care too much about quality and the image is small with very few colors, use a GIF with transparencies.  The same goes for Firefox and printing.  Firefox (as of 2.0) does not print transparent PNG images.  So, if you want to support printing in Firefox, then you need to either make special PNG images for each background or make low-quality GIF images.

Needless to say, the designer's theme had to go under severe reconstruction.  Not just because of the image sizes, but because he felt the need to design special input box, textarea, and button controls.  His design would have worked well for a WPF application, but this is the web (...and don't even get me started on the fact that his design was for a wide screen monitor at over 1300x800.  The design was useless anyhow!)  The next project I ran as lead architect went much smoother.  Because it was extremely AJAX intensive, everything was minimized to the absolute core.  Each page had the minimal default.css plus its own CSS sheet and only included the JavaScript it needed.  The web site landing page included barely anything and even had its own extremely stripped down version of the JavaScript files.  For this project, I went from 350K in development to 80K in production.

#5  Custom server controls are not esoteric, complicated, or time-consuming to create.

This seems to be a very common misconception amongst ASP.NET developers.  The reality, however, is that creating server controls is often a very trivial task.  Yet, many developers will use a GridView or other canned control for everything.  The GridView is awesome for basic tabular data in very simple, data-driven applications, but I can rarely use it.  On the other hand, I love the Repeater and rely on it for almost everything.  Actually, it and the Literal are my two favorite controls.  I have to rely on these two controls to ensure that my AJAX applications are extremely optimized.  One of the beautiful things about .NET is that every ASP.NET control is simply a .NET class, which means that you can programmatically reuse them, inherit from them, and override their internals.  This allows us to create some powerful and elegant custom server controls.

On the same project with the overly sized image files, we had an interesting meeting about how to show a media play list on a web page.  There was all kinds of talk about using Flash to create a media play list.  The conversation was quickly giving me an allergic reaction.  So, after hearing all kinds of absolutely insane quotes of time for creating a Flash play list, I decided to take matters into my own hands.  Two hours later I handed the client a complete play list from A to Z.  To be clear, I built this one on top of something I already had, but the grand total of time was still about 3 hours.  It's amazing what you can do when you understand the .NET framework design guidelines and aren't afraid to follow best practices.

Here is how you would use a similar control:

<%@ Register Assembly="Jampad.Web" Namespace="Jampad.Web.Controls" TagPrefix="j" %>

<j:Media id="media01" runat="server" />

In your code behind, you would have something that looked like this:

media01.DataSource = MediaAdapter.GetContent(this.MemberGuid);

Upon loading the page, the data was bound and the output was a perfect XHTML structure that could then be customized in any number of ways using the power of CSS.  How do you make something like this happen?  It's simple; here is a similar control (Media.cs) placed in a class library (WebControls.csproj):

using System;
using System.Web;
using System.Web.UI;
using System.Web.UI.HtmlControls;
using System.Web.UI.WebControls;

namespace Jampad.Web.Controls
{
    [ToolboxData("<:Media runat=\"server\"></:Media>")]
    public class Media : CompositeControl
    {
        private Repeater repeater;

        public Media( ) {
        }

        private Object dataSource;

        public Object DataSource {
            get { return dataSource; }
            set { dataSource = value; }
        }

        protected override void CreateChildControls( ) {
            HtmlGenericControl div = new HtmlGenericControl("div");
            div.Attributes.Add("class", "media-list");

            try {
                repeater = new Repeater( );
                repeater.DataSource = this.DataSource;
                repeater.ItemTemplate = new MediaTemplate(ListItemType.Item);
                repeater.HeaderTemplate = new MediaTemplate(ListItemType.Header);
                repeater.FooterTemplate = new MediaTemplate(ListItemType.Footer);
                div.Controls.Add(repeater);
                repeater.DataBind( );
            }
            catch (Exception ex) {
                Literal error = new Literal( );
                error.Text = "<span class=\"error-message\">" + ex.Message + "</a>";
                div.Controls.Add(error);
            }

            this.Controls.Add(div);
            base.CreateChildControls( );
        }
    }
}

Notice the use of the repeater control.  This is the same control we use in ASP.NET as <asp:Repeater />.  Since this is .NET, we can use it programmatically to create our own powerful controls.  Also notice the various templates that are being set on the Repeater.  These are the same templates you would set declaratively in an ASPX page.  In this case, I'm programmatically assigning to these templates an instance of MediaTemplate (in MediaTemplate.cs).  This MediaTemplate.cs is just another file thrown in a class library, in our case the same WebControls.csproj, though since it's just a class, it could be in a different assembly and namespace altogether. Here's what the MediaTemplate.cs looks like:

using System;
using System.Collections.Generic;
using System.Text;
using System.Web.UI.WebControls;
using System.Web.UI;

namespace Jampad.Web.Controls
{
    internal class MediaTemplate : ITemplate
    {
        private ListItemType type;

        public MediaTemplate(ListItemType type) {
            this.type = type;
        }

        public void InstantiateIn(Control container) {
            Literal lit = new Literal( );
            switch(type) {
                case ListItemType.Header:
                    break;

                case ListItemType.Item:
                    lit.DataBinding += new EventHandler(delegate(Object sender, System.EventArgs ea) {
                        Literal literal = (Literal)sender;
                        RepeaterItem item = (RepeaterItem)literal.NamingContainer;
                        literal.Text += String.Format("<div class=\"media-item\">\n");
                        literal.Text += String.Format("  <div class=\"media-item-inner\">\n");
                        literal.Text += String.Format("    <a href=\"\"><img src=\"\" alt=\"Media\" class=\"media-thumb\" /></a>\n", (String)DataBinder.Eval(item.DataItem, "mediaPath"), (String)DataBinder.Eval(item.DataItem, "thumbPath"));
                        literal.Text += String.Format("  </div>\n");
                        literal.Text += String.Format("  <div class=\"media-item-bottom\"></div>\n");
                        literal.Text += String.Format("</div>\n\n");
                    });
                    break;

                case ListItemType.AlternatingItem:
                    break;

                case ListItemType.Footer:
                    break;
            }
            container.Controls.Add(lit);
        }
    }
}


Simply compile those two together and you're set.  You can even embed (hopefully tiny) images in your project to make things even more seamless.  Using this simple pattern, I've created all kinds of things.  You can see a real example of this, including image embedding, in my SQL Feed Framework (formerly known as Data Feed Framework).  Its InfoBlock controls follow this same pattern.  For much better examples, whip out Reflector and start digging around the System.Web namespaces.

It's actually rather astonishing to learn of some of the attitudes some developers have about custom controls.  When I was one of the editors for an ASP.NET 2.0 exam last year, I noticed one of the questions asked which type of control was "harder" to create.  The answers were something like "User Control", "Custom Control", and a few others.  They were looking for the answer "Custom Control".  Since "harder" is not only a relative term, but also a subjective and abstract one, the question had no actual meaning.  Custom controls aren't "harder" than user controls.

#6  Most developers I worked with in 2007 had never heard of an O/R mapper.

Why do most developers still absolutely insist on wasting their time writing a chain of SqlConnection, SqlCommand, and SqlDataAdapter?  Perhaps it's just an addiction to being busy instead of actually being productive that causes this.  I don't know.  I would, however, expect these developers to have some curiosity about whether there may be an easier way.  ADO.NET is awesome stuff and it is the foundation for all .NET O/R mappers, but if I'm not throwing around 1,000,000 records at a time with SqlBulkCopy, I'm not interested in working with ADO.NET directly.  We need to have a system that allows us to get what we want instead of forcing us to screw about with low-level mechanics.  It's no secret that I'm a huge supporter of Frans Bouma's work with LLBLGen Pro and I also use LINQ in most of my .NET 3.5 applications.  For a corporate .NET 2.0 project, there's absolutely no excuse not to pay the $300 for LLBLGen Pro.  Managers!  Open the wallets!  It will save you money.
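To make the contrast concrete, here's a minimal LINQ to SQL sketch; the MemberDataContext and Member entity names are just illustrative stand-ins for whatever your designer generates:

// Hypothetical LINQ to SQL data context and entity; the names are illustrative only.
using (MemberDataContext db = new MemberDataContext( )) {
    // Strongly-typed query; no SqlConnection, SqlCommand, or SqlDataAdapter plumbing.
    var frequentVisitors = from m in db.Members
                           where m.VisitCount > 1
                           orderby m.MemberName
                           select m;

    foreach (Member m in frequentVisitors) {
        Console.WriteLine(m.MemberName);
    }
}

One line of intent and zero lines of connection management.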

However, it's not always about the money.  Even if the developers knew about O/R mapping, and the company isn't in a poverty-stricken third-world country, sometimes extreme pride, lack of personal integrity, and political alignment can destroy any chance of being productive.  A long time ago I worked at a company where I thought I would be productive.  Five or so weeks into the design phase of the project, we received a politically-focused project manager as big brother.  He was absolutely against the use of any modern technology and despised the idea of an O/R mapper.  He instead told us that we were to write a stored procedure for every possible piece of interaction that would happen.  He also wanted us to use Microsoft's data application block to access the stored procedures.  At one point he said that this was their O/R mapper, showing that he had no idea what an O/R mapper was.

A few days after his reign had started, I took an hour or so to write up a 12 page review document covering various aspects of LLBLGen Pro and how they would work on the project.  I thought it was a very convincing document.  In fact, one guy looked at it and was convinced that I took it from the LLBLGen web site.  The project manager, however, was beginning to be annoyed (this is not uncommon with me and old-school project managers!)  The project manager decided to call together a panel of his "best" offshore developers and put me in what basically amounted to a doctoral defense.  Prior to the meeting I sent out my "dissertation" and asked everyone to read it before they arrived at the meeting so that they would be prepared for the discussion.  When it was time for the meeting, I was told to sit at one side of a large meeting table and the project manager and his team sat at the other.  Then the disaster began.  First off, not one single person on that team had read my document.  Secondly, for the next 45 minutes they asked me basic questions that the document would have answered.  Even after they admitted that I had answered all of their concerns to their satisfaction, and after being told by their own team that LLBLGen Pro was obviously a very productive tool, they reached the conclusion that they still weren't going to use it.  It was a waste of my time and I still want those 45 minutes of my life back.

What was really interesting about my defense was the developers' code.  In the meeting, the developers showed me their [virtually unreadable, anti-.NET framework design guidelines, inefficient, insecure] .NET project code and I was shocked to see how much time they wasted on writing the same stuff over and over and over again.  When they showed me their stored procedures, I about passed out.  It's a wonder how any of their systems run.  They were riddled with crazy dynamic SQL and cursors.  They even had most of the business logic in the data access tier.  The concept of N-tier architecture was not something that they understood at all.  I think that's the point where I gave up on my defense.  If a developer doesn't even understand the critical need for N-layer and N-tier architecture, there's just no way they will be able to understand the need for an O/R mapper.  It's actually one of the fastest ways to find a coder hiding amongst professionals.  Their SQL/ADO.NET code was also obviously not strongly-typed.  This was one of the core points of an O/R mapper and these developers could not understand that.  They could not see the benefit of having an entity called Person in place of the string "Persno" (deliberate misspelling).

This project didn't really take off at all, but for the parts I was involved in, I used the next best thing to an O/R mapper: a strongly-typed DataSet.  Read this carefully: there is no shame in using a strongly-typed DataSet if you don't have an O/R mapper.  They are nowhere near as powerful, but they are often good enough to efficiently build your prototypes so that the presentation layer can be built.  You can replace the data access components later.
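As a rough sketch (the PersonDataSet and PersonTableAdapter names are just illustrative; they're whatever the dataset designer generates for you), working with a strongly-typed DataSet looks something like this:

// Hypothetical designer-generated typed DataSet and TableAdapter; names are illustrative.
PersonTableAdapter adapter = new PersonTableAdapter( );
PersonDataSet.PersonDataTable people = adapter.GetData( );

foreach (PersonDataSet.PersonRow person in people) {
    // Strongly-typed column access; no chance of a magic string like "Persno" sneaking in.
    Console.WriteLine(person.PersonName);
}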

The training of developers in the use of LLBLGen Pro and LINQ O/R mapping was one of the main reasons I released both my Minima Blog Engine and my Minima 3.5 Blog Engine source code to the public in 2007.  You are free to use these examples in your own training as you see fit.

For more information and for some examples of using an O/R mapper, please see some of my resources below:

#7  You don't need to use SOAP for everything.

This is one of the reasons I wrote my XmlHttp Service Interop series in March and May 2007.  Sometimes straight up HTTP calls are good enough.  They are quick, simple, and light-weight.  If you want more structure, you can simply use XML serialization to customize the smallest possible data format you can think of.  No SOAP envelope required.
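As a minimal sketch of the idea (the URL, the MemberInfo type, and its element names are just illustrative; HttpWebRequest and XmlSerializer are the standard .NET classes), a lightweight call looks something like this:

using System;
using System.IO;
using System.Net;
using System.Xml.Serialization;

// Hypothetical payload type; the XML element names are illustrative only.
[XmlRoot("member")]
public class MemberInfo
{
    [XmlElement("name")]
    public String Name;

    [XmlElement("email")]
    public String Email;
}

public static class LightweightServiceClient
{
    public static MemberInfo GetMember(String url)
    {
        // Straight HTTP GET; the response body is a tiny XML document with no SOAP envelope.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (WebResponse response = request.GetResponse( ))
        using (Stream stream = response.GetResponseStream( )) {
            XmlSerializer serializer = new XmlSerializer(typeof(MemberInfo));
            return (MemberInfo)serializer.Deserialize(stream);
        }
    }
}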

Here are the parts to my series:

Also keep in mind that you don't need to limit JSON to JavaScript.  It's a beautiful format that could easily be an amazing structured replacement for flat CSV files.  RESTful interfaces using GET or POST with HTTP headers are also a great way to communicate using very little bandwidth.  My AJAX applications rely heavily on these techniques, but I've also used them for some behind-the-scenes work as well.

One great example of how you can use RESTful services is by looking at the interface of the ESV Bible Web Service V2. In November 2007, I wrote a .NET 3.5-based framework to abstract the REST calls from the developer. By looking at my freely available source code, you can see how I'm interacting with the very light-weight REST service.

#8  A poor implementation of even the most beautiful database model can lead to a disaster.

For more information on this topic, see my October 2007 post entitled "SQL Server Database Model Optimization for Developers". Here is an abstract:

It's my assessment that most developers have no idea how much a poor database model implementation, or an implementation by a DBA unfamiliar with the data semantics, can affect a system.  Furthermore, most developers with whom I have worked don't really understand the internals of SQL Server enough to be able to make informed decisions for their project.  Suggestions concerning the internals of SQL Server are often met with extreme reluctance from developers.

#9  Most web developers have no idea how to build a proper XHTML structure.

XHTML is not HTML and shouldn't be treated like it is.  While HTML is a presentation format, XHTML is a structural format.  You can use HTML for visual formatting, but XHTML simply defines a structure.  In August 2007, I wrote an article entitled "Coders and Professional Programmers" in which I discussed some of the differences between coders, who have no clue what's going on but who overwhelm the technology world, and rare programming professionals.  I didn't go into too many specifics in that article, but one of the things I had in mind was the severe lack of XHTML knowledge that coders have.  What's profound is that XHTML is probably the single most basic web development topic in existence, yet people just have no idea how to use it properly.

When you come to a web project and you have defined your user experience, you need to materialize that definition.  You do not go about this by dragging and dropping a bunch of visual elements on a screen and nesting 4 tables.  Well, if you do, then you're probably a coder, not a professional.  Building your interface structure is actually rather similar to building a database model in that you need to define your entities and semantic meaning.  So, when you look at the top of your landing page, you need to avoid thinking "this is a 4em piece of italic black text" and simply say that this is a heading.  What type of heading?  If it's the most important heading, then it would probably internally translate to an h1 element.  In the same way, if you have text on your screen, you should avoid doing this:

Lorem ipsum dolor sit amet.<br/>

Mauris nonummy, risus in fermentum.<br/>

By doing this, you are completely destroying any possibility of text formatting.  You have also fallen into the world of telling a system how to do its job.  Just think about how we work with XML.  Do you go in and tell the system how to parse the information or how to scan for a certain element?  No, the entire point of abstraction is so that we can get closer and closer to telling the system what we want instead of telling it how to do its job.  In XML, we simply state an XPath and we're done.  With XHTML, we don't want to say "put text here, break, put more text here, and break again".  You could think of the above HTML code as a "procedural structure".  What if we used a more object-oriented model?  In an object-oriented model, we focus on the semantics of the entities, and this is exactly how we are to design in XHTML.  A proper way to declare our text would be like this:

<p>Lorem ipsum dolor sit amet.</p>

<p>Mauris nonummy, risus in fermentum.</p>

Now, instead of telling the system how to do its job, we state that we want two paragraphs.  Done.  By focusing on the semantic representation we are closer to focusing on the purpose of the application and letting the system do whatever it does best.  Furthermore, since XHTML is a structural format and not a presentation format, we have another technology for presentation, namely, CSS.  With our new XHTML structure we can simply attach a default.css document to every page on the web site and, in a centralized manner, state the following to format every single document on our web site in an instant.  You couldn't beat the power of that with quantum parallelism (well... maybe).

p {
font-family: Georgia, times new roman, serif;
font-size: 90%;
line-height: 1.1em;
}

Every single item in XHTML has some purpose and a set of guidelines attached to it to allow you to choose the correct element to match your semantic meaning.  This is a long and fancy way of saying: don't use divs for everything!  A div is a containment unit, not something to hold every single thing in all the world just so that you can use more CSS in a misguided attempt to look modern.  Use divs for containment, not for giving IDs to text.  So, what should you use?  Whatever closely matches your needs.  For example, when you are adding a single horizontal menu or tab list to your user experience, you should avoid using a bloated HTML table, which will force longer load times and basically kill your chances for mobile support.  You should also avoid throwing together a list of divs in a parent div, which provides no semantic meaning at all.  This would be like declaring all your .NET objects as Object and using reflection every time you wanted to access anything.  Rather, you want to ask yourself "to what data structure does this object most closely map?"  In this case, it's basically a simple list or a collection.  What XHTML element most closely brings out this item's meaning?  Depending on whether the list is an ordered list or an unordered list, your XHTML element will be either a <ul/> or an <ol/>.
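For a simple horizontal menu, for example, the structure is nothing more than a list (the class name and links here are just illustrative):

<ul class="menu">
  <li><a href="/">Home</a></li>
  <li><a href="/articles/">Articles</a></li>
  <li><a href="/about/">About</a></li>
</ul>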

What if you wanted to show a column of images where the image metadata (i.e. title, date, description) was to the right of each image?  Do we use a bloated table?  No, your load time will go through the roof, your DOM interaction will become prohibitively complex, and your mobile support will be shot.  Do we use a series of divs with CSS floating?  No, again, this in no way reflects any semantic relation of the entity.  To what data structure does this most closely map?  Think about it.  It's a series of "things" where each thing has two sub-components (image and data).  This is a dictionary.  In XHTML, the closest element you have to a dictionary is a <dl/>.  The dl contains an alternating series of data terms (<dt/>) and data definitions (<dd/>) to allow you to present your data in a way that makes sense.  You don't have a full semantic representation as you would with an "imagelist" element, but you are accurately representing the fact that this is a dictionary.
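In markup, that dictionary of images and metadata might look something like this (the class name and content are just illustrative):

<dl class="image-list">
  <dt><img src="/images/thumb01.png" alt="First image" /></dt>
  <dd>
    <h3>First image</h3>
    <p>Posted December 2007.  A short description of the image.</p>
  </dd>
  <dt><img src="/images/thumb02.png" alt="Second image" /></dt>
  <dd>
    <h3>Second image</h3>
    <p>Posted December 2007.  Another short description.</p>
  </dd>
</dl>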

After you have defined your complete structure, you may then start to modify the elements using CSS.  Your headings and titles (mapped to h1 through h6) will be formatted according to their requirements, as will all your paragraphs (mapped to p).  You will also modify your ol, ul, and dl lists to match your visual requirements.  Your <ul/> or <ol/> lists will probably have something like the following:

ul {
list-style-type: none;
padding: 0;
}

ul li {
display: inline;
/* or float: left; depending on how you want to format them. */
/* Floating would keep the list items as block elements thereby */
/* allowing more padding and margin tweaking. */
}

Your dl may have something similar to this:

dl {
width: 300px;
}

dl dt,
dl dd {
float: left;
}

dl dt {
clear: both;
}

This technique of semantically mapping entities to visual elements is neither new nor isolated to web development.  The Windows Presentation Foundation (WPF) allows a similar technique.  You can define a simple ListBox which contains your raw data for your series of elements (e.g. icon list, list of names, menu).  Then you apply a Style to enhance the user experience.  Elements in XHTML and elements in XAML don't really have their own look and feel, but are, rather, structural entities which allow you to define the element's semantic representation of reality, to which the look and feel can later be applied.

#10  CSS is not simply a technology to allow simple font size, color, and style changes.

It's a very powerful technology which allows us to efficiently create powerful web solutions.  It can help us preserve ASP.NET caching and help us to avoid page recompilation.  Furthermore, a proper CSS architecture can bring us media-specific CSS to enable us to efficiently customize our web pages for print and for mobile devices.  As if that weren't enough, CSS themes can allow us to quickly deploy branded web sites.
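As a quick illustration of media-specific CSS (the file names are just illustrative), a page can reference one stylesheet for the screen and another for print, and the print stylesheet can simply hide anything that makes no sense on paper:

<link rel="stylesheet" type="text/css" media="screen" href="default.css" />
<link rel="stylesheet" type="text/css" media="print" href="print.css" />

/* print.css */
#navigation,
#sidebar {
display: none;
}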

Unfortunately, however, CSS architecture is not something known by too many web developers, especially ASP.NET developers.  Back in May 2007, I wrote an article on my blog entitled "CSS Architecture Overview", so I won't go into any more details here.

Those were the top 10 things in 2007 of which I found developers to be the most ignorant.  It's not really an exhaustive list and doesn't cover things like the lack of understanding of how MSIL maps to reality, how JavaScript deals with variable scope, or how you may not need to waste company resources on SQL Server 2005 Standard when Express may work just fine for your requirements.  Some of these topics are closer to my core speciality than others, but each of them represents an incredibly important segment of technology which web solution architects must take into account.  Looking back over the list of articles I wrote, the open source projects I released, and the various clients and developers I worked with in 2007, this has easily been my busiest year ever.  This won't be stopping in 2008.  Hopefully an increased knowledge base and a stronger adherence to best practices will turn more and more coders into professionals in 2008 and beyond.

To learn more about some of these topics or even related ones, be sure to walk around my blog a bit and to subscribe to my RSS feed.

SQL Server Database Model Optimization for Developers

It's my assessment that most developers have no idea how much a poor database model implementation, or an implementation by a DBA unfamiliar with the data semantics, can affect a system.  Furthermore, most developers with whom I have worked don't really understand the internals of SQL Server enough to be able to make informed decisions for their project.  Suggestions concerning the internals of SQL Server are often met with extreme reluctance from developers.  This is unfortunate, because it is only when we understand a system's mechanics that we can fully optimize our usage of it.  Some familiar with the history of physics will recall the story of when Einstein "broke" space with his special theory of relativity.  Before Einstein was able to "fix" space, he had to spend nearly a decade trying to decipher how space worked.  Thus was born the general theory of relativity.

It's not a universal rule, but I would have to say that the database model is the heart of any technical solution.  Yet, in reality, the database implementation often seems to be one of the biggest bottlenecks of a solution.  Sometimes it's a matter of poorly maintained databases, but from my experience it seems to mostly be a matter of a poorly designed implementation.  More times than not, the SQL Server database model implementation has been designed by someone with either only a cursory knowledge of database modeling or by someone who is an expert in MySQL or Oracle, not SQL Server.

Database modeling does not end with defining entities and their respective relations.  Rather, it extends completely into the implementation.  What good is an amazing plan if it is going to be implemented poorly?  The implementation phase to which I am referring comes before the actual implementation, yet after what most people refer to as "modeling".  It's actually not even a phase with a temporal separation, but is rather a phase that requires continual thought and input from the start about the semantic understanding of the real world solution.  This phase includes things like data-type management, index design, and security.  This phase is the job of the resident architect or senior-level developer, not the job of the DBA.  It needs to be overseen by someone who deeply understands both SQL Server and the semantics of the solution.  Most of the time the DBA needs to completely stay away from the original data model and focus more on server-specific tasks like monitoring backups and tweaking existing data models based on the specifications that an architect has previously documented.  Having said this, I often find that it's not only not the architect or senior developer optimizing a project; often nobody even cares!

Developers need to start understanding that designing a proper data model based on the real world representation includes minimizing data usage, optimizing performance, and increasing usability (for the solution’s O/R mapping). These are not jobs for a DBA. Someone with close knowledge to the project needs to make these decisions. More times than not, a DBA simply does not have the understanding of the project required to make these important decisions. They should stay away from the requirements of the system, leaving this to the architect and senior-level developers. Despite what many well intentioned DBAs think, they do not own the data. They are merely assistants to the development team leaders.

Let's start off by looking at storage optimization.  Developers should be able to look at their data model and notice certain somewhat obvious flaws.  For example, suppose you have a table with a few million rows, with each row containing multiple char(36) columns (a guid), two datetime columns (8 bytes each), six int columns (4 bytes each), two of which are foreign keys to reference/look-up/enumeration tables, and an int (4 bytes) column which is also the table's primary key and identity.  To optimize this table, you absolutely must know the semantics of the solution.  For example, if we don't care about recording the seconds of a time, then the two datetime columns should be set to be smalldatetime columns (4 bytes each).  Also, how many possible values could there be in the non-foreign key int columns?  Under 32,767?  If so, then these could easily be smallint columns (2 bytes each).

What about the primary key?  The architect or senior-level developer should have a fairly good estimate on how large a table will ever become.  If this table is simply a list of categories, then what should we do?  Often the common response is to convert it to a tinyint (1 byte).  In reality, however, we shouldn't even care about the size of the primary key.  It's completely negligible; even if there were only 100 rows, switching it to a tinyint could cause all kinds of problems.  The table would only be marginally smaller and all your O/R mappers are now using an Int16 instead of an Int32, which could potentially cause casting problems in your solution.  However, if this table tracks transactions, then perhaps you need to make it a bigint (8 bytes).  In this case, you need to put forth a strong effort to make sure that you have optimized this table down to its absolute raw core, as those bigint values can add up.

Now, what about the foreign keys?  If they are simply for reference, then the range of values probably isn't really that wide.  Perhaps there are only 5 different values against which the data is to be constrained.  In this case, the column could probably be a tinyint (1 byte).  Since a primary key and foreign key must be the same data type, the primary key must also become a tinyint (1 byte).  This small change alone could cut your database size by a few hundred MB.  It wasn't just the foreign key table that dropped in size, but the references between the two tables are now smaller as well (I hope everyone now understands why you need to have a very good reason before you even think about using a bigint foreign key!)  There's something else to notice here as well.  Reference tables are very helpful for the developer to look at the raw data, but does there really need to be a constraint in the database?  If the table simply contains an Id and Text column with only 8 possible values, then, while the table may be tremendously helpful for documentation purposes, you could potentially drop the foreign key constraint and put the constraint logic in your application.  However, keep in mind that this is for millions or possibly billions of rows.  If the referencing table contains only a few thousand rows, or if space doesn't have a high opportunity cost, which may be the case if the solution is important enough to actually have that many rows in the first place, then this optimization could cause more problems than it solves.  First off, your O/R mapper wouldn't be able to detect the relation.  Secondly, obviously, you wouldn't have the database-level constraint for applications not using the solution's logic.

Another optimization that's important is performance optimization.  Sometimes a table will be used in many joins and will be used heavily by each of the CRUD (Create, Retrieve, Update, Delete) operations.  Depending on how important the column is, you may be able to switch a varchar(10) to a char(10).  The column will allocate more space, but your operations may be more efficient.  Also, try to avoid using variable-length columns (varchar) as foreign keys.  In fact, try to keep your keys as the smallest integer type you possibly can.  This is both a space and a performance optimization.  It's also important to think very carefully about how the database will be accessed.  Perhaps certain columns need extra indexes and others need less.  Yes, less.  Indexes are great for speeding up read access, but they slow down insert operations.  If you add too many indexes, your database inserts could bring your system to a crawl and any index defragmentation could leave you with a painfully enormous transaction log or a non-functioning SQL Server.

This is exactly what happened to a company I worked with in 2005.  Every Saturday night for several weeks in a row, the IT team would get an automated page from their monitoring service telling them that all their e-commerce web sites were down.  Having received the phone call about 2AM, I looked into a few things and noticed that the transaction log had reached over 80GB for the 60GB database.  Being a scientist who refuses to fall into the post hoc ergo propter hoc fallacy, I needed measurements and evidence.  The first thing I did was write a stored procedure that would do some basic analysis on the transaction log by pulling data from the fn_dblog( ) function, doing a simple cube, and saving the results into a table for later review.  Then I told them that the next time the problem occurred they were to run the stored procedure and call me the next Monday (a polite way of telling them that I'm sleeping at 2AM on Saturdays).  Exactly one week later the same thing happened and the IT department ran the stored procedure as instructed (and, yes, waited until Monday to call me, for which I am grateful).  Looking over the stored analysis data, I noticed that there were a tremendous number of operations on various table indexes.  That gave me the evidence I needed to look more closely at the indexes of each of the 5,000+ tables (yes, that's four digits—now you know why I needed more information).  After looking at the indexes, I realized that the database was implemented by someone who didn't really understand the purpose of indexing and who probably had an itchy trigger finger on the Index Tuning Wizard.  There were anywhere from 6 to 24 indexes on each table.  This explained everything.  When the weekly (Saturday at 2AM) SQL Server maintenance plan would run, each of the indexes was defragmented to clean up the work done by the high volume of sales that occurred during the week.  This, therefore, caused a very large number of index optimizations to occur.  Each index defragmentation operation would be documented in the transaction log, filling the transaction log's 80GB hard drive, thereby functionally disabling SQL Server.

In your index design, be sure to also optimize your index fill factors.  Too full and you will cause page splits and bring your system to a crawl.  Too empty and you're wasting space.  Do not let a DBA do this.  Every bit of optimization requires that a person deeply knowledgeable about the system implement the complete database design.  After the specifications have been written, then the DBA can get involved so that he or she can run routine maintenance.  It is for this reason that DBAs exist.  For more information on the internals of SQL Server, see the book Inside SQL Server 2005: The Storage Engine by Kalen Delaney (see also Inside SQL Server 2000).  This is a book which should be close to everyone who works with SQL Server at all times.  Buy it.  Read it.  Internalize it.

There's still more to database modeling.  You also want to be sure to optimize for usability.  Different O/R mapping solutions will have different specific guidelines, but some of the guidelines are rather general.  One such guideline is fairly well known: use singular table names.  It's incredibly annoying to see code like "Entries entry = new Entries( );"  The grammar just doesn't agree.  Furthermore, LINQ automatically inflects certain tables.  For example, a table called "BlogEntry" will be related to the LINQ entity "BlogEntry" as well as "BlogEntries" in the LINQ data context.  Also, be sure to keep in mind that your O/R mapper may have special properties that you'll want to work around.  For example, if your O/R mapper creates an entity for every table and in each created entity there is a special "Code" property for internal O/R mapper use, then you want to make sure to avoid having any columns named "Code".  O/R mappers will often work around this for you, but "p.Code_" can get really confusing.  You should also consider using Latin-style database naming (where you prefix each column with its table name—so-named because Latin words are typically inflected with their sentence meaning, thereby allowing isolated subject/object identification).  This is not only a world of help in straight SQL joins, but just think about how Intellisense works: alphabetically.  If you prefix all your columns with the table name, then when you hit Ctrl-J you'll have all your mapped properties grouped together.  Otherwise, you'll see the "Id" property and it could be 10-15 internal O/R mapper properties before you find the next actual entity column.  Doing this prefixing also usually alleviates conflicts with existing O/R mapper internal properties.  This isn't quite as important for LINQ, but not having table-name prefixed columns in LLBLGen Pro can lead to some headaches.

Clearly, there's more to think about in database design than the entity relation diagramming we learn in Databases 101.  It should also be clear that your design will become more optimized the more specific it becomes.  A data model designed for any database server probably won't be as efficient as a well thought out design for your specific database server and for your specific solution requirements.  Just keep in mind that your database model is your application's view of reality and is often the heart of the system, and it therefore should be optimized for space, speed, and usability.  If you don't do this, you could be wasting many gigabytes of space (and therefore also hundreds of dollars of needless backups), have a painfully inefficient system, and have a hard time figuring out how to access the data in your application.

Real World HttpModule Examples

Back when I first discovered HttpHandlers I remember being ecstatic that the control that I thought I lost when moving from PHP to ASP.NET was finally returned, but when I first discovered HttpModules I remember almost passing out at the level of power and control you get over the ASP.NET request pipeline.  Since then, I've used HttpHandlers, HttpHandlerFactories, and HttpModules in many projects, but I'm noticing that while many people have heard of them, many have no idea what you would ever use them for.  So, I would like to give a few examples.

The first example is really simple.  On my blog, I didn't want anyone to alias my web site or access it in any other way than by going to www.netfxharmonics.com.  Since HttpModules allow you to plug into the ASP.NET request pipeline, I was able to quickly write an HttpModule to do exactly what I wanted:

public class FixDomainHttpModule : IHttpModule
{
    public void Dispose( ) {
    }

    public void Init(HttpApplication context) {
        context.BeginRequest += delegate(Object sender, EventArgs ea) {
            HttpApplication ha = sender as HttpApplication;
            if (ha != null) {
                // Only dereference ha after the null check.
                String absoluteUrl = ha.Context.Request.Url.ToString( ).ToLower( );
                if (MinimaConfiguration.ForceSpecifiedDomain) {
                    if (!absoluteUrl.StartsWith(MinimaConfiguration.Domain.ToLower( ))) {
                        context.Response.Redirect(MinimaConfiguration.Domain);
                    }
                }
            }
        };
    }
}

...with this web.config:

<httpModules>
  <add name="FixDomainHttpModule" type="Minima.Web.HttpExtensions.FixDomainHttpModule" />
</httpModules>

By doing this I don't need to put a check in each page or in my MasterPage.  Since HttpModules are for the entire application, even URLs accessing my images are forced to be done by www.netfxharmonics.com.

Another example is a simple authentication system for a system I was working on a while back.  The application allowed anyone logged into Active Directory to access its resources, but only certain people logged into Active Directory would be authorized to use the application (i.e. anyone could access images and CSS, but only a few people could use the system).  Knowing that the .NET framework is the model for all .NET development, I looked at the machine's web.config to see how ASP.NET implemented its Windows and Forms authentication.  As it turns out, it does so via HttpModules.  So, I figured that the best way to solve this problem was by creating an HttpModule, not by throwing a hack into each of my WebForms or my MasterPages.  Furthermore, since ASP.NET uses the web.config for its configuration, including authentication configuration, I wanted to allow configuration of my authentication module to be via the web.config.  The general way I wanted to configure my HttpModule would be by a custom configuration section like this:

<Jampad>
    <Security RegistrationPage="~/Pages/Register.aspx" />
</Jampad>
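Backing that section is a configuration section handler; a minimal sketch of one way to do it with IConfigurationSectionHandler might look like this (everything except the SecurityConfigSection name, which the module code below uses, is just illustrative):

using System;
using System.Configuration;
using System.Xml;

namespace Jampad.Security
{
    // Hypothetical settings objects; only SecurityConfigSection is named in the module code.
    public class SecurityElement
    {
        public String RegistrationPage;
    }

    public class SecurityConfigSection
    {
        public SecurityElement Security = new SecurityElement( );
    }

    // Turns the <Security RegistrationPage="..." /> element into a SecurityConfigSection.
    public class SecuritySectionHandler : IConfigurationSectionHandler
    {
        public Object Create(Object parent, Object configContext, XmlNode section) {
            SecurityConfigSection config = new SecurityConfigSection( );
            XmlAttribute registrationPage = section.Attributes["RegistrationPage"];
            if (registrationPage != null) {
                config.Security.RegistrationPage = registrationPage.Value;
            }
            return config;
        }
    }
}

The handler is then registered in web.config under <configSections> with a <sectionGroup name="Jampad"> containing a <section name="Security" ... /> entry, which is what makes the ConfigurationManager.GetSection("Jampad/Security") call below resolve to a SecurityConfigSection instance.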

The code for the HttpModule was extremely simple and required only a few minutes to throw together.  If the page being accessed is a WebForm and is not the RegistrationPage set in web.config, then the system's Person table is checked to see if the user logged into the machine has an account in the application.  If not, then there is a redirect to the RegistrationPage.  Simple.  Imagine how insane that would have been if you wanted to test for security on each page.

public class JampadSecurityModule : IHttpModule
{
    public void Dispose( ) {
    }

    public void Init(HttpApplication context) {
        context.BeginRequest += delegate(Object sender, EventArgs ea) {
            HttpApplication ha = sender as HttpApplication;

            if (ha != null) {
                CheckSecurity(context);
            }
        };
    }

    private void CheckSecurity(HttpApplication context) {
        SecurityConfigSection cs = (SecurityConfigSection)ConfigurationManager.GetSection("Jampad/Security");

        if (cs == null) {
            return;
        }

        if (String.IsNullOrEmpty(cs.Security.RegistrationPage)) {
            throw new SecurityException("Security RegistrationPage is required.");
        }

        if (!context.Request.Url.AbsoluteUri.Contains(cs.Security.RegistrationPage) &&
            context.Request.Url.AbsoluteUri.EndsWith(".aspx")
            ) {
            PersonCollection pc = new PersonCollection( );
            pc.GetMulti(new PredicateExpression(PersonFields.NTLogin==ActiveDirectoryFacade.NTUserName));

            if(pc.Count < 1){
                context.Response.Redirect(cs.Security.RegistrationPage);
            }
        }
    }
}

Again, plugging this into the web.config makes everything automatically happen:

<httpModules>
    <add name="JampadSecurityModule " type="Jampad.Security.JampadSecurityModule" />
</httpModules>

Recently I had to reuse my implementation in an environment that would not allow me to use LLBLGen Pro, so I had to rewrite the above 2 lines of LLBLGen Pro code into a series of strongly-typed DataSet tricks.  This implementation also had a LoginPage and an AccessDeniedPage in the custom configuration section, but other than that it was the same idea.  You could actually take the idea further by checking if the person is currently authenticated and, if they aren't, doing a check on the Person table.  If they have access to the application, then update the PersonLastLoginTime column to the current time.  You could do many things with this implementation that would be rather crazy to do in a WebForm or MasterPage.

Another example of an HttpModule would be the custom access control system I built into my blog.  I'm not going to paste the code here as it's just the same idea as the other examples, but I will explain the concept.  Basically I created a series of tables in my blog's SQL Server database that held information on access control.  In the Access table I put columns such as AccessDirection (allow, deny), AccessType (IP Address, UserAgent, HTTP Referral), AccessText, AccessMessage, and AccessRedirect.  My HttpModule would filter every ASP.NET request through this table to figure out what to do with it.  For example, I could block an IP address by creating a table record for 'deny', 'IP Address', '10.2.1.9', 'Your access has been denied.', NULL.  Immediately upon inserting the row, that IP address is blocked.  I could also block certain UserAgents (actually this was the original point of this HttpModule--bots these days have little respect for the robots.txt file).  I could also block requests that were from a different web site.  This would allow me to stop people from leeching images off my web site for use on their own.  With a simple HttpModule I was able to do all this in about an hour.  By the way, one record I was tempted to create was the following: 'deny', 'UserAgent', 'MSIE', NULL, 'http://www.getfirefox.com/'.  But I didn't :)

Now when would you use an HttpModule versus an HttpHandler?  Well, just think about what the difference is.  An HttpHandler handles a specific address pattern for a specific set of HTTP verbs, while every request in an application goes through an HttpModule.  So, if you wanted to have an image creator at /ImageCreator.imgx, then you need to register .imgx with IIS and then register your image creation HttpHandler in your web.config to handle that address (in case you forgot, web browsers care about the Content-Type, not the file extension.  In this example, your HttpHandler would set the Content-Type as 'image/png' or whatever your image type is.  That's how a web browser will know what to do with a file.  It has nothing to do with the file extension; that's just for IIS.)  On the other hand, if you wanted to block all traffic from a specific web site, then you would create an HttpModule, because HttpModules handle all traffic in an application.  So, if you just remember this fundamental difference in purpose between the two, then you shouldn't have any problems in the future.
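To make that concrete, a minimal sketch of such an image HttpHandler might look like this (the ImageCreatorHttpHandler name and the image-generation details are just illustrative; IHttpHandler itself only requires IsReusable and ProcessRequest):

using System;
using System.Web;

// Hypothetical handler that would be registered in web.config for the *.imgx extension.
public class ImageCreatorHttpHandler : IHttpHandler
{
    public Boolean IsReusable {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context) {
        // The Content-Type header, not the file extension, tells the browser what it's getting.
        context.Response.ContentType = "image/png";

        // Generate or load the image bytes here and write them to the response.
        Byte[] imageBytes = CreateImage(context.Request.QueryString["text"]);
        context.Response.OutputStream.Write(imageBytes, 0, imageBytes.Length);
    }

    private Byte[] CreateImage(String text) {
        // Placeholder for whatever image generation you actually need (e.g. System.Drawing).
        throw new NotImplementedException( );
    }
}

It would then be registered under <httpHandlers> with a path of *.imgx (after mapping the extension in IIS), much like the <httpModules> entries shown earlier.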

Simplified Universal HttpHandlerFactory Technique

A few months ago I wrote about the Universal HttpHandlerFactory Technique, where you have one HttpHandlerFactory that all your ASP.NET processing goes through, and in that HttpHandlerFactory you choose what HttpHandler is returned based upon what directory or file is accessed.  I still like this approach for certain scenarios, but my blog got to the point where I was managing 11 different HttpHandlers for about 25 different path and file patterns.  So, it was time to simplify.

What I came up with was basically my go-to card for everything: put it in SQL Server.  From there I just checked the URL being accessed against the patterns in a database table and then looked up what HttpHandler to use for that particular request.  Then, of course, I cached the HttpHandler for future file access.

Here are my SQL Server tables (yes, I do all my design in T-SQL-- SQL GUI tricks are for kids):

create table dbo.HttpHandlerMatchType  (
HttpHandlerMatchTypeId int primary key identity not null,
HttpHandlerMatchTypeName varchar(200) not null
) 

insert HttpHandlerMatchType select 'Contains'
insert HttpHandlerMatchType select 'Starts With'
insert HttpHandlerMatchType select 'Ends With'
insert HttpHandlerMatchType select 'Default'

create table dbo.HttpHandler (
HttpHandlerId int primary key identity not null,
HttpHandlerMatchTypeId int foreign key references HttpHandlerMatchType(HttpHandlerMatchTypeId),
HttpHandlerName varchar(300),
HttpHandlerMatchText varchar(200)
) 

Here is the data in the HttpHandler table:

(Image: the data in the HttpHandler table)

Looking at this image and the SQL Server code, you can see that I'm matching the URL in different ways.  Sometimes I want to use a certain HttpHandler if the URL simply contains the text in HttpHandlerMatchText, and other times I'll want to see if the URL ends with it.  I included an option for "starts with" as well, which I may use in the future.  This will allow me to have better control of how paths and files are processed.  Also, notice that one is "base".  This is a special one that basically means that the following HttpHandler will be used (keep in mind we are in a class that inherits from the PageHandlerFactory class; please see my original blog entry):

base.GetHandler(context, requestType, url, pathTranslated);

Now in my HttpHandlerFactory's GetHandler method I'm doing something like this (also note how LLBLGen Pro helps me simplify my database access):

HttpHandlerCollection hc = new HttpHandlerCollection( );
hc.GetMulti(new PredicateExpression(HttpHandlerFields.HttpHandlerMatchTypeId != 4));

IHttpHandler hh = null;
foreach (HttpHandlerEntity h in hc) {
    hh = MatchHttpHandler(absoluteUrl, h.HttpHandlerName.ToLower( ), h.HttpHandlerMatchTypeId, h.HttpHandlerMatchText.ToLower( ));
    if (hh != null) {
        break;
    }
}

This basically just loops through all the HttpHandlers in the table which are not the "default" handler (which will be used when there is no match).  The MatchHttpHandler method basically just passes the buck to another method depending on whether I'm matching the URL based on Contains, StartsWith, or EndsWith.

private IHttpHandler MatchHttpHandler(String url, String name, Int32 typeId, String text) {
    IHttpHandler h = null;
    switch (typeId) {
        case 1:
            h = MatchContains(url, name, text);
            break;

        case 2:
            h = MatchStartsWith(url, name, text);
            break;

        case 3:
            h = MatchEndsWith(url, name, text);
            break;

        default:
            throw new ArgumentOutOfRangeException("Invalid HttpHandlerTypeId");
    }

    return h;
}

Here is an example of one of these methods; the others are similar:

private IHttpHandler MatchContains(String url, String name, String text) {
    if (url.Contains(text)) {
        return GetHttpHandler(name);
    }
    return null;
}

As you can see, it's nothing fancy.  The last method in the chain is the GetHttpHandler, which is basically a factory method that converts text into an HttpHandler object:

private IHttpHandler GetHttpHandler(String text) {
    switch (text) {
        case "base":
            return new MinimaBaseHttpHandler( );

        case "defaulthttphandler":
            return new DefaultHttpHandler( );

        case "minimaapihttphandler":
            return new MinimaApiHttpHandler( );

        case "minimafeedhttphandler":
            return new MinimaFeedHttpHandler( );

        case "minimafileprocessorhttphandler":
            return new MinimaFileProcessorHttpHandler( );

        case "minimapingbackhttphandler":
            return new MinimaPingbackHttpHandler( );

        case "minimasitemaphttphandler":
            return new MinimaSiteMapHttpHandler( );

        case "minimatrackbackhttphandler":
            return new MinimaTrackBackHttpHandler( );

        case "minimaurlprocessinghttphandler":
            return new MinimaUrlProcessingHttpHandler( );

        case "projectsyntaxhighlighterhttphandler":
            return new ProjectSyntaxHighlighterHttpHandler( );

        case "xmlrpcapi":
            return new XmlRpcApi( );

        default:
            throw new ArgumentOutOfRangeException("Unknown HttpHandler in HttpHandlerMatchText");
    }
}

There is one thing in this that stands out:

case "base":
    return new MinimaBaseHttpHandler( );

If the "base" is simply a call to base.GetHandler, then why am I doing this?  Honestly, I just didn't want to pass around all the required parameters for that method call.  So, to make things a bit more elegant I created a blank HttpHandler called MinimaBaseHttpHandler that did absolutely nothing.  After the original iteration through the HttpHandlerCollection is finished, I then do the following (it's just a trick to the logic more consistent):

if (hh is MinimaBaseHttpHandler) {
    return base.GetHandler(context, requestType, url, pathTranslated);
}
else if(hh != null){
    if (!handlerCache.ContainsKey(absoluteUrl)) {
        handlerCache.Add(absoluteUrl, hh);
    }
    return hh;
}

One thing I would like to mention is something that sample alludes to: I'm not constantly having everything run through this process; I am caching the URL to HttpHandler mappings.  To accomplish this, I simply set up a simple cached dictionary to map URLs to their appropriate HttpHandlers:

static Dictionary<String, IHttpHandler> handlerCache = new Dictionary<String, IHttpHandler>( );

Before ANY of the above happens, I check to see if the URL to HttpHandler mapping exists and if it does, then return it:

if (handlerCache.ContainsKey(absoluteUrl)) {
    return handlerCache[absoluteUrl];
}

This way, URLs can be processed without having to touch the database (of course ASP.NET caching helps with performance as well).

Related Links

All my work in this area has been rolled-up into my Themelia ASP.NET Framework, which is freely available on CodePlex. See that for another example of this technique.

HnD Customer Support and Forum System

If you are curious about LLBLGen Pro or have been using it for a while and want to further your skills with it, then you simply must check out Frans Bouma's HnD (Help and Discuss), an ASP.NET-based customer support and forum system.  It was actually released back in December 2006, but I only just now got around to checking it out and I have to say it's really, really nice.  But what else would you expect from the person who created the world's most powerful database abstraction system (LLBLGen Pro)?

You can actually see an example of HnD by going to LLBLGen Pro's support forum.  I've been using their support forum for a while now and I can seriously tell a difference in usability with their own system.  One feature of HnD I definitely want to mention is something that many forums don't have, but all need: when replying, it's critical to see the message you are replying to as you are typing (this is one of the reasons I switched to Gmail from Yahoo!)  Frans thought of this and HnD has that ability.  HnD even allows for attachments and has an attachment approval system for moderators, which is really nice.  The feature list goes on and on.

Not only is the end product nice and easy to use, it's released with full source code (just as LLBLGen Pro is, when you buy it).  However, unlike the commercial product LLBLGen Pro, HnD is released under the GPLv2 license, so... we can have all kinds of fun messing with it.  From my perspective, this is one of the greatest things about it and is exactly why I released Minima (a minimalistic ASP.NET 2.0 blog engine built using LLBLGen Pro).  Simple, to the point, source is provided, and the source is actually easy to navigate.

The solution is split into an SD.HnD.Utility project, which contains very base-level functionality (much like Minima's General project); an SD.HnD.DAL project, which contains the LLBLGen Pro DAL (much like Minima's MinimaDAL project); an SD.HnD.BL project, which contains "business logic" for HnD (much like Minima's MinimaLibrary project); and finally the web site.

This is an incredible project for anyone who wants to strengthen their LLBLGen Pro skills.  I can tell you that it has personally already helped me with my own LLBLGen Pro skills.  So, whether you want a really nice ASP.NET-based forum system, want to learn more about ASP.NET data binding, want to learn LLBLGen Pro for the first time, or just want to enhance your LLBLGen Pro skills, you should seriously consider nabbing the source code for HnD.

As a postscript, if you are unfamiliar with Frans Bouma's work, then you should check out his blog at the link below.  His work is always great and he definitely deserves his MVP award many times over.

Related Links

