2005 2006 2007 2008 2009 2010 2011 2015 2016 2017 aspnet azure csharp debugging elasticsearch exceptions firefox javascriptajax linux llblgen mongodb powershell projects python security services silverlight training videos wcf wpf xag xhtmlcss

SQL Server Database Model Optimization for Developers

It's my assessment that most developers have no idea how much a poor database model implementation or implementation by a DBA unfamiliar with the data semantics can affect a system. Furthermore, most developers whom I have worked don't really understand the internals of SQL Server enough to be able to make informed decisions for their project. Suggestions concerning the internals of SQL Server are often met with extremely reluctance from developers. This is unfortunate, because it is only when we understand a system’s mechanics that we can fully optimize our usage of it. Some familiar with the history of Physics will recall the story of when Einstein "broke" space by his special theory of relativity. Before Einstein was able to "fix" space, he had to spend nearly a deciding trying to decipher how space worked. Thus was born the general theory of relativity.

It's not a universal rule, but I would have to say that the database model is the heart of any technical solution. Yet, in reality, the database implementation often seems to be one of the biggest bottle necks of a solution. Sometimes it’s a matter of poorly maintained databases, but from my experience it seems to mostly be a matter of a poorly designed implementation. More times than not, the SQL Server database model implementation has been designed by someone with either only a cursory knowledge of database modeling or by someone who is an expert in MySQL or Oracle, not SQL Server.

Database modeling does not end with defining entities and their respective relations. Rather, it extends completely into the implementation. What good is an amazing plan, if it is going to be implemented poorly? The implementation phase to which I am referring comes before the actual implementation, yet after what most people refer to as “modeling”. It’s actually not even a phase with a temporal separation, but is rather a phase that requires continual thought and input from the start about the semantic understanding of the real world solution. This phase includes things like data-type management, index design, and security. This phase is the job of the resident architect or senior level developer, not the job of the DBA. It needs to be overseen by someone who deeply understanding both SQL Server and the semantics of the solution. Most of the time the DBA needs to completely stay away from the original data model and focus more on the server specific tasks like monitoring backups and tweaking existing data models based on the specifications that an architect has previously documented. Having said this, I often find that it's not only not the architect or senior developer optimizing a project, often nobody even cares!

Developers need to start understanding that designing a proper data model based on the real world representation includes minimizing data usage, optimizing performance, and increasing usability (for the solution’s O/R mapping). These are not jobs for a DBA. Someone with close knowledge to the project needs to make these decisions. More times than not, a DBA simply does not have the understanding of the project required to make these important decisions. They should stay away from the requirements of the system, leaving this to the architect and senior-level developers. Despite what many well intentioned DBAs think, they do not own the data. They are merely assistants to the development team leaders.

Let's start off by looking at storage optimization. Developers should be able to look at their data model and notice certain somewhat obvious flaws. For example, suppose you have a table with a few million rows with each row containing multiple char(36) columns (a guid), two datatime columns (8-bytes each), six int columns (4-bytes each)-- two of which are foreign keys to reference/look-up/enumeration tables, and an int (4-bytes) column which is also table's primary key and identity. To optimize this table, you absolutely must know the semantics of the solution. For example, if we don't care about recording the seconds of a time, then the two datetime columns should be set to be smalldatetime columns (4-bytes each). Also, how many possible values could there be in the non-foreign key int columns? Under 32,727? If so, then these could easily be smallint columns (2-bytes each).

What about the primary key? The architect or senior-level developer should have a fairly good estimate on how large a table will ever become. If this table is simply a list of categories, then what should be do? Often the common response is to convert it to a tinyint (1-byte). In reality, however, we shouldn't even care about size of the primary key. It’s completely negligible; even if there were only 100 rows, switching it to a tinyint could cause all kinds of problems. The table would only be marginally smaller and all your O/R mappers are now using an Int16 instead of an Int32, which could potentially cause casting problems in your solution. However, if this table tracks transactions, then perhaps you need to make it a bigint (8-bytes). In this case, you need to put force a strong effort to making sure that you have optimized this table down to its absolutely raw core as those bigint values can add up.

Now, what about the foreign keys? If they are simply for reference, then the range of values probably isn't really that wide. Perhaps there are only 5 different values against which the data is to be constrained. In this case, the column could probably be a tinyint (1-byte). Since a primary key and foreign key must be the same data type, the primary key must also become a tinyint (1-byte). This small change alone could cut your database size by a few hundred MB. It wasn't just the foreign key table that dropped in size, but the references between the two tables are now smaller as well (-- I hope every now understand why you need to have a very good reason before you even think about using a bigint foreign key!) There's something else to notice here as well. Reference tables are very helpful for the developer to look at the raw data, but does there really need to be a constraint in the database? If the table simply contains an Id and Text column with only 8 possible values, then, while the table may be tremendously helpful for documentation purposes, you could potentially break the foreign key constraint and put the constraint logic in your application. However, keep in mind that this is for millions or possibility billions of rows. If the referencing table contains only a few thousand rows or if space doesn’t have a high opportunity cost, which may be the case if the solution is important enough to actually have that many rows in the first place, then this optimization could cause more problems than it solves. First off, your O/R mapper wouldn’t be able to detect the relation. Secondly, obviously, you wouldn’t have the database level constraint for applications not using the solution’s logic.

Another optimization that’s important is performance optimization. Sometimes a table will be used in many joins and will be used heavily by each of the CRUD (Create, Retrieve, Update, Delete) operations. Depending on how important the column is, you may be able to switch a varchar(10) to a char(10) . The column will allocate more space, but your operations may be more efficient. Also, try to avoid using variable length columns (varchar) as foreign keys. In fact, try to keep your keys as the smallest integer type you possibly can. This is both a space and performance optimization. It's also important to think very carefully about how the database will be accessed. Perhaps certain columns need extra indexes and others need less. Yes, less. Indexes are great for speeding up read access, but slow down insert operations. If you add too many indexes, your database inserts could run your system to a crawl and any index defragmentation could leave you with a painfully enormous transaction log or a non-functioning SQL Server.

This is exactly what happened to a company I worked with in 2005. Every Saturday night for several weeks in a row, the IT team would get an automated page from their monitoring service telling them that all their e-commerce web sites were down. Having received the phone call about 2AM, I looked into a few things and noticed that the transaction log had reached over 80GB for the 60GB database. Being a scientist who refuses fall into the post hoc ergo proctor hoc fallacy, I needed measurements and evidence. The first thing I did was write a stored procedure that would do some basic analysis on the transaction log by pulling data from the fn_dblog( ) function and doing a simple cube and save the results into a table for later review. Then I told them that the next time the problem occurred they were to run the stored procedure and call me the next Monday (a polite way of telling them that I’m sleeping at 2AM on Saturdays). Exactly one week later the same thing happened and the IT department ran the stored procedure as instructed (and, yes, waited to Monday to call me, for which I am grateful). Looking over the stored analysis data, I noticed that there were a tremendous number of operations on various table indexes. That gave me the evidence that I needed to look more closely at the indexes of each of the 5,000+ tables (yes, that’s four digits—now you know why I needed more information). After looking at the indexes, I realized that the database was implemented by someone who didn’t really understand the purpose of indexing and who probably had an itchy trigger finger on the Index Tuning Wizard. There were anywhere from 6 to 24 indexes on each table. This explained everything. When the weekly (Saturday at 2AM) SQL Server maintenance plan would run, each of the indexes were defragmented to clean up the work done by high volume of sales that occurred during the week. This, therefore, caused a very large number of index optimizations to occur. Each index defragmentation operation would be documented in the transaction log, filling the transaction log’s 80GB hard drive, thereby functionally disabling SQL Server.

In your index design, be sure to also optimize your index fill factors. Too full and you will cause a page split and bring your system to a crawl. Too empty and you're wasting space. Do not let a DBA do this. Every bit of optimization requires that a person deeply knowledgeable about the system to implement a complete database design. After the specifications have been written, then the DBA can get involved so that he or she can then run routine maintenance. It is for this reason that DBAs exist. For more information on the internals of SQL Server, see the book Inside SQL Server 2005: The Storage Engine by Kalen Delaney (see also Inside SQL Server 2000). This is a book which should be close to everyone who works with SQL Server at all times. Buy it. Read it. Internalized it.

There’s still more to database modeling. You want to also be sure to optimize for usability. Different O/R mapping solutions will have different specific guidelines, but some of the guidelines are rather general. One such guideline is fairly well known: use singular table names. It's so incredibly annoying to see code like "Entries entry = new Entries( );" The grammar just doesn't agree. Furthermore, LINQ automatically inflects certain tables. For example, a table called “BlogEntry” will be related to the LINQ entity “BlogEntry” as well as “BlogEntries” in the LINQ data context. Also, be sure to keep in mind that your O/R mapper may have special properties that you’ll want to work around. For example, if your O/R mapper creates an entity for every table and in each created entity there is a special "Code" property for internal O/R mapper use, then you want to make sure to avoid having any columns named "Code". O/R mappers will often work around this for you, but "p.Code_" can get really confusing. You should also consider using Latin-style database naming (where you prefix each column with its table name—so-named because Latin words are typically inflected with their sentence meaning thereby allowing isolated subject/object identification), this is not only a world of help in straight SQL joins, but just think about how Intellisense works: alphabetically. If you prefix all your columns with the table name, then when you hit Ctrl-J you’ll have all your mapped properties grouped together. Otherwise, you’ll see the "Id" property and it could be be 10-15 internal O/R mapper properties before you find the next actual entity column. Doing this prefixing also usually alleviates conflicts with existing O/R mapper internal properties. This isn't quite as important for LINQ, but not having table-name prefixed columns in LLBLGen Pro can lead to some headaches.

Clearly, there's more to think about for database design than the entity relation diagramming we learn in Databases 101. It should also be clear that your design will become more optimized the more specific it becomes. A data model designed for any database server probably won’t be as efficient as a well thought out design for your specific database server and for your specific solution requirements. Just keep in mind that your database model is your applications view of reality and is often the heart of the system and it therefore should be optimized for space, speed, and usability. If you don't do this, you could be wasting many gigabytes of space (and therefore also hundreds of dollars of needless backups), have a painfully inefficient system, and have a hard time figuring out how to access the data in your application.

The Wandering Developer

This has been an interesting week.  I did an experiment to help prove something that deep down we all know anyway: YOU DON'T NEED TO BE AT THE OFFICE TO WORK.  Last weekend I drove to Chicago (from Kansas City) to fix a few problems caused by overworking in the previous week and while on the trip, I started my 4-Hour Workweek ("4HWW") training.  The trip was only a Saturday, Sunday, and Monday trip and I had to be back Tuesday for work.  However, on the way back the 4HWW training made me realize the obvious: I work remotely and I am remote.  DUH!  When I realized that, I immediately turned NORTH (home is south) away from Kansas City heading towards Minneapolis.  I also called my client telling him that I'm going to remotely call in for the meeting as there was absolutely no reason for me to physically be there.  While in Minneapolis I stayed with a relative and worked from an office in their house.  Since there was no boss, no client, and no coworkers to bother me, I was able to have PURE productivity just as the 4HWW book said I would have.

It never really made ANY sense to me why, living in the 21st century, we developers need to physically go to an office to have a boss fight our productivity at every turn.  People just work better when people aren't watching.  DUH!  Therefore, as of right now... I'm done working on site and am extending my consultant business ("Jampad Technology, Inc.") to from coast to coast (possibly global soon).  I am no longer going to work at any particular location, but will work from a different city in the United States at various intervals for the next few years (until I get sick of that and change careers completely).  Since I don't own a house, don't have kids, am not married and since my car is completely paid off and have the lowest rent in the world, I can do this without affecting anything.  Why didn't I do this soon?  Well, I only did the 4HWW training last weekend.  Phenomenal training!  I'm sick and tired of living out Office Space every day of my life and, as it turns out, my Seminary work isn't going to do itself.  Last year I instituted by quarterly vacation policy (I take a 3-9 day vacation every 3 months) and the success of that naturally lead to this next step.  It was either that or continue to be on the lame 100 Hour Work Week program that most people are on.  Forget that.  I'm sick of working in an office.  Period.

One thing that I realized recently was something that makes me feel stupid for not thinking of sooner.  As a right-brained (as opposed to left-brained) developer, architect, minimalist, and purist I always try to increase the level of abstraction for my life.  I'm always trying to make things more logically manageable instead of simply physically manageable.  The other day when I handed my drivers license to a cashier at a grocery store and she responded "Wow, you're a long way from home".  I immediately got to thinking what a strange thing that is to say.  First of all, what ever happened to the saying "home is where the heart is".  Is this something people hang on their kitchen wall, but don't ACTUALLY believe?  Is society so bad that people have bumper stickers and plaques of cute little saying, but don't actually believe them? (obviously, yes)  Secondly, this person was making a statement about my physical, not logical representation.  When I realized this, it dawned on me that much of the technology world (including myself) is living in a painful contradiction.  We are trying to making everything logically management (i.e. active directory, the Internet, web server farms), but we just can't seem to have a logical representation of the most important thing of all: people.  There's no reason for me to be in an office every single day just like there's no reason my web server needs to be with me.  Furthermore, what's with those awesome science fiction scenes in movies where people are remotely (logically) present in meetings via 3D projection from all over the world?  We dream of this stuff, but I'm taking it now.

So, I'm now available to help on projects nation-wide project.  If you need .NET (C#), ASP.NET, JavaScript/AJAX, LLBLGen Pro/LINQ, Firefox, or XHTML/CSS development , porting, auditing, architecture, or training, all based on best practices, please drop me a line using my i-name account.  My rate varies from project to project and for certain organizations my rate is substantially discounted.  Also, please note that I will never, ever work with the counter productive technologies VB or Visual Source Safe (if you want me to setup web-based Subversion on your site, drop me a line!)

Comment Rules

Below are my filters for comments; I've made them as simple as possible.

  1. If your comment resembles the immature and nonsensical gibberish on YouTube, then it won't ever see my web site.
  2. If your comment is simply hate mail, then it would be unprofessional for me to post it.  No one needs to read about someone else's insecurities.
  3. If you ask me an in depth question or bring up a conversation topic, then I will, of course, answer the question through the appropriate channel of e-mail.  This isn't a forum or a discussion board, it's a blog that allows for one-way intelligent statements.  My blog and comment system are designed after the idea of a scholarly lecture: there will be no questions in class except for clarification with further conversation being in private sessions.  Responses will be posted to the blog when appropriate, otherwise they will be sent via e-mail (see next rule).
  4. If you don't include your real e-mail and ask a question, then I obviously can't post it or answer your question.  Again, this isn't a forum.   I don't like forums and avoid them whenever I can.
  5. If I have to hire a professional linguist to parse your comment, then I'm not going to read it let alone post it.
  6. If you are going to object to a point, you must obviously cite your reference and/or give precedence.
  7. If you in any way suggest that a rule, standard, law, regulation, code, specification, or guideline isn't a "good one" (not sure what that means), then I cannot post your comment.  We cannot come to the law to judge the law; rather, are judged by the law.

In the past two weeks I've read some really insane comments ranging from people saying that memorization is bad (weird!) to calling "home made" techniques something "hackers" do (this was very much a YouTube-style comment).  Honestly, I have little time for comments like these (or the hate mail ones), so please be aware that I have a very well defined process I follow when considering a comment (FYI, this process was adapted from my process for filtering recruiters-- getting 8 calls a day for jobs is a bit excessive when you already despise the industry).

NetFXHarmonics .NET SolutionTemplate

I've had a number of requests for my SolutionTemplate without the e-book lessons, so below is its Subversion repository:

You may still access the e-book version as well in the related links section, but if all you need is a new project to get started then you can use the above; this repository shares the same code base with the e-book version.  I will be very careful to make sure that the two versions are kept in sync to minimize confusion.

I should also mention that since the initial release, SolutionTemplate has had various substantial updates based on my more recent research.  They will both continue to be updated as I think of more foundational portions that should be put in the template.  This SolutionTemplate has helped me time and time again when starting new projects and I hope it helps you too; you can use it for any of your production projects as you please.

Lastly, in case you're wondering why this isn't a Visual Studio template: Subversion is just a better way to work with code that regularly updates.  It's an extremely versatile lightweight, transaction, distributed file system that allows for extremely efficient updates.  I would pay for each of those things, but Subversion is FREE.  Can't beat that!

Related Links

Brainbench AJAX Exam

Well, it's official: I took the position as role as principal author of the Brainbench AJAX Exam.  Now I need to turn my usual AJAX curriculum into something worthy of an exam.  Basically I need to create a suitable outline with about 7-9 topics and 3-5 subtopics and put 4-6 questions into each subtopic to come up with a grand total of 160 questions.  Since I've done this already with the C# 2.0 exam, it should be fairly straight forward!  Err, maybe...

What will the exam cover?  Well, the fundamentals of AJAX.  I'm working on a video series right now that will cover what I refer to as the three pillars of AJAX: Modern JavaScript, Browser Dynamics (a.k.a. "DOM Manipulation" or "DHTML"), and AJAX communication.  Modern JavaScript topics that will be covered are JavaScript namespaces, closures, multi-cast events.  The browser dynamics include topics such as DOM tree navigation, node creation and removal, interpreting node manipulation (e.g. moving a box, changing a color) as well as architecture decisions (e.g. "should this be a div or a span?").  Finally, AJAX communication topics will include XMLHttpRequest usage, result interpretation, performance concerns, JSON translation, and callback creation.  These are of course not all the topics, but just a sampling.  The point is that the exam will basically be the exam for the video series.

To be clear, I will not have anything vendor specific near the exam.  This is one of the reasons I took the position.  The last thing we need is an exam which tests you on two or three completely different frameworks.  Java developers won't have a clue about ASP.NET AJAX and ASP.NET developers won't have a clue about the other 100 or so frameworks in existence.  I also have absolutely no intention of asking about obscure AJAX techniques that almost no one would ever know (e.g. request queuing, animation).  So, really, my video series will cover more than the exam as I have every intention of relying fairly heavily on Firebug in the video series and , but that can't be on the exam.

Powered by
Python / Django / Elasticsearch / Azure / Nginx / CentOS 7

Mini-icons are part of the Silk Icons set of icons at famfamfam.com