Friday, August 22, 2008

ADO.NET Entity Framework: Impressive! Powerful! Useless!

The new Microsoft Entity Framework is the latest in a long line of very impressive, yet tragic failures in Microsoft's data access strategy...

The basics of ADO.NET are great. SqlCommand, SqlConnection and their relatives for other platforms... awesome. But almost every single version of ADO.NET has failed when it comes to useful higher level abstractions. To be sure, they each demo well and they each have uses with simple applications (like those you'd be shown a demo of).

But whenever it comes to complex applications, the abstractions tend to become cumbersome, restrictive, an inflexible. The result is that most serious application end up using 3rd party frameworks or custom abstraction layers instead and under-the-hood they tend to stick to the basic ADO.NET SqlConnections and SqlCommands to do the dirty work.

In .NET 1.x it was DataSets and SqlAdapters. In .NET 2.x it was DataTables and TableAdapters. And now we have the ADO.NET Entity Framework (EF).

I had high hopes for EF. MS had clearly recognized that a radical new approach would be needed if they were to achieve a useful abstraction without having to re-invent the same wheel over and over again every few years. They had also set some pretty good goals in terms of making EF useful in support of other data stores outside the classic relational database.

Sadly, what has been delivered in the 1.x version of EF is hopelessly crippled by the deliberate lack of implicit lazy loading.

Here is an example of what this means. Assume we have two logical EF entities that map more-or-less directly to physical tables in a database. One is the Order object and the other the Customer. These entities have a navigation relationship between each other (which is analogous to the physical database's foreign key).

If you get a reference to an Order object it will have a property called Customer. This property is how you'd navigate the relationship between the entities.

So you'd expect that if you look at MyOrder.Customer you'd get back a reference to an instance of the Customer entity... But you would be fucking wrong!

The Customer property on the instance of Order may not have been fetched from the database automatically when you obtained your reference to the Order...

Instead of implicit lazy loading, EF has "Deferred Loading"... or if you prefer you can call it "Explicit Lazy Loading". The idea is that you can check to see if the Customer has been loaded for an Order, and if not then you can explicitly load it when and if you need it. But it will not automatically load the data for these properties and related entities unless you explicitly tell the framework to do so (which is unlike most ORM frameworks, LINQ to SQL, etc.).

What happens in real applications is that you never know what has and has not been loaded. So your code is chock-full of bullshit like this:

if(!someOrder.CustomerReference.IsLoaded)
{
 someOrder.CustomerReference.Load();
}
string customerLastName = someOrder.Customer.LastName;

This allows you to do what you need to be doing in your code, but the price is that you HAVE to do this all over the fucking place... every time you want to access a property that traverses a relationship. You end up with more checks for data than you do data in the first place.

Even worse than that though, when you code something to use your EF model, now you now have to somehow magically "know" which properties on your entities are going to traverse a relationship and which don't.

If you know in advance that you are going need the related data later on, then when you fetch your entity you can a technique known as "Eager Loading" to tell EF to go ahead and load up the related data in advance.

This looks like this:

var x = entities.Orders.Include("Customer");

Again, this allows you to do what needs doing, but if you are making the fetched order entity available to other classes (like as a return value from a public method)... then the caller isn't going to know if you pre-loaded the relationships or not... so they'll still have to do the whole "IsLoaded" anti-tard checking shit anyway.

In his response to a nasty online petition called the "Vote of No Confidence", Tim Mallalieu defended the lack of implicit lazy loading with this statement:

"We took a fairly conservative approach in v1.0, because we wanted developers to be aware of when they were asking the framework to make a roundtrip to the database... our take on 'boundaries are explicit'."

That has to be the most depressing statement that I've ever read regarding ANY data access technology ever!

Hey guys!

The entire point of an abstraction layer is so that developers using that layer DON'T have to be aware of the damned internal workings under the abstraction layer!!!!

But most offensive to me is the overall fact that I cannot "trust" the EF model. For example... if I have a Customer entity then I have no way to know if Orders property contains an empty collection because there aren't any orders of if it is empty because the framework hasn't loaded data. Instead of being able to trust the entity model to be accurate I have to baby sit it and constantly ask "are you sure you loaded data for this already?".

Fuck that!

Then we get into the other side effect. All the mechanisms needed to do the paranoid checking-up after EF use some counter-intuitive techniques. Before I check the Customer property on an Order entity, I have to first check up with a CustomerReference property on my Order to get information about the state of the contents of the actual Customer property? Huh?

Yeah.... that's really slick there!

The eager load technique pisses me off even more!

So I can tell the EF to go ahead and load relationships... but to load them I have to use a method that takes a fucking string as an argument?!

So now I also have to be an expert on exactly what each navigation property in my EF model is specifically named... and that without strong type checking or intellisense? Sure... I can do that, but it slows me down and is just begging for a runtime bug (typoed the name or messed up the capitalization). That means I am constantly having to refer to the damned diagram all the time too which just slows me down and annoys the shit out of me at the same time!

Using EF without lazy loading is a good way to drive yourself into becoming so paranoid you'll need to remember to take your anti-psychotic meds before you even open Visual Studio!

There are a few 3rd party attempts out there to get implicit lazy loading features with the current version of EF. These are clever hacks, and I even tested out one of those. Overall, the hacks give you a much better experience than using the stock Entity Framework as is, but this is also code that will be hard to update to use any future releases of EF too, and these impose other limitations on your code too. I suppose though that if you HAD to use EF, you'd still be smart to use one of these 3rd party techniques to get the implicit lazy loading anyway.

LINQ to SQL may lack the ability to provide a true logical abstraction for your physical data model, or do fancy inheritance, or even handle some of the more unusual data mappings...  but at least you can TRUST that properties in a LINQ to SQL Entity will actually contain data that reflects what is in the real database. Plus the overall usage pattern of LINQ to SQL is much clearer and simpler.

Until EF gets built-in implicit lazy loading, screw it... I'll just use LINQ to SQL.


No comments:

Post a Comment