Friday, September 11, 2009

The case of the non-enforced foreign key relationship

While working with the TicketDesk 2.0 MVC project, I came across a really unusual situation within the relational database that TicketDesk uses.

The resulting voyage of discovery lead to me finding my first legitimate use of a "non-enforced foreign key relationship" in a relational database... The situation required an explicitly defined foreign key relationship, but also prohibited the server being able to enforce the relationship.

Here is how it happened...

I'm trying to make as few changes to the TicketDesk 1.x data model as possible. The focus for the 2.0 project is the shift to the new MVC platform and sticking with the old data model makes it easier to support upgrades. Plus the existing model is pretty decent as it is.

In TicketDesk, attachments are stored in the database rather than the file system. This prevents the need for write access to the web server's file system. It also allows you to copy or move ALL of the data by just moving the database. Simpler all-around.

We could have a huge discussion about the drawbacks and advantages of this design. The wisdom of using a RDBMS to store binary file data is very debatable. But the end-analysis here is that TicketDesk isn't expected to handle a lot of files, they are not likely to be large files, and the advantages of having all the ticket data in the database outweighs the disadvantages in most environments.

Anyway, the big problem in TicketDesk 1.x is that the client-side file upload mechanism is terrible. It does gets the job done, but it requires that files be uploaded together in a single HTTP POST. This can be slow, the process is subject to request timeouts and size limits, and it uses a LOT of resources on the server. Additionally, if the server rejects the submission then the user has to re-upload everything all over again.

For TicketDesk 2.x I wanted to switch to using a flash based uploader. I personally HATE flash, but when it comes to file uploading plain HTML and javascript fail to provide any good solutions.

The flash uploader will send the files up to the server using separate out-of-band connections to the web server. This means that the server has to "put" the files somewhere until the user has finished submitting the rest of the form.

For new tickets, the files will get uploaded before there is a ticket in the database to associate them with. For existing tickets, the uploads need to be held in a pending status until the user has finished supplying the meta-data  (descriptions, file names, and comments for the activity history log).

I really didn't want the server to dump pending attachments to the file system or hold them in memory, so I needed to modify the database to provide a place for the server to put the uploaded files as soon as they arrived.

The original data model looked like this:



As you can see, there are two tables. Tickets and TicketAttachments. The contents of the file are directly stored in TicketAttachments along with meta-data (file size, name, description, file type, etc). The old primary key was on both TicketId and FileId... not my best primary key definition ever, but it made sense at the time. TicketId is foreign keyed to the TicketId column in parent Tickets table.

Not a remarkable design really.

At first I toyed with creating a new "PendingFiles" table. But there was already a table with the right kind of structure in the database. It seemed to be a waste to duplicate table structure and I've never been a fan of "staging" tables anyway.

So I started by figuring out how to use the existing TicketAttachments table to store the "pending" files as well as "real" files.

For pending attachments to an existing ticket, the table is already pretty good. All that is needed is a column to flag files as pending instead of real. So I added an "IsPending" bit column to the table. New attachments have IsPending flagged. Once the user commits the changes, we just flip the bit to make the file a real attachment.

Here is the revision.


Now, to handle attachments for new tickets. I figured that I could put the files in the same table, but we'd need to leave the TicketId column null... we don't know the TicketId yet.

I didn't like the primary key as it was anyway. FileId is an identity column and can serve as a primary key all by itself.

So, I changed the primary key to FileId then marked TicketId column to allow nulls and got this:


Great! Except...

Because of the foreign key relationship, we can't add a row to TicketAttachments with a null TicketId value unless there is a row in the Tickets table that also has a null TicketId. We can't make TicketId in the Tickets table nullable because it is the primary key (and an identity field to boot).

Ok fine... the DBA in me screamed bloody murder, but I deleted the foreign key relationship. I've been doing web development long enough to know when the advantages of denormalization outweigh good relational design. I hate having to manage relationships purely in code, but sometimes that really is the best way.

So we get to this:



But! LINQ to SQL needs the foreign key to generate the relationship between the entities. I could manually define the relationship in the LINQ to SQL mapping file, but I *know* that this is going to bite me in the ass later.... it always does. Combine that with the uncomfortable absence of a foreign key in the database itself and I just couldn't bring myself to let that slide.

Then I remembered that there is this odd setting in the SQL Management Tools. The designer for editing foreign keys has a true/false flag for "enforce relationship". I never paid it much attention... I mean, what crazy bastard would go to all the effort to define a foreign key, then tell the server to ignore it?

That just never made any sense to me. Until now...

So I did some testing and research. If you set a foreign key up in SQL, but tell it not to enforce it (this is called the NOCHECK option) then the SQL server will behave as if the foreign key doesn't exist. Thus, we can add rows to TicketAttachments that don't correspond to a row in the parent table in direct violation of the foreign key.

Tools that read the database schema do see that there is a relationship though, and so LINQ to SQL will still generate a relationship in the mapping file automatically. From the point of view of external tools it is as if there actually is a regular foreign key relationship in place.

So the final design ended up looking like this:


The foreign key shown here is setup with the NOCHECK option and is not enforced by the server.

So now I know why you'd use the NOCHECK option... I don't expect to need it often, but now I know why such a bizarre option exists and have finally, after years of working with databases, had a good reason to use it.

No comments:

Post a Comment