Software Design: Why CRUD is crud (technical)

(This is going to be a very technical post, so if you are not a programmer, feel free to ignore it.)

One thing that bothers me about many modern OOP software libraries and frameworks is the reliance on CRUD (Create, Read, Update, Delete) when handling persistent data, particularly those that are stored in a relational database. Most ORMs to varying degrees automatically set up your data objects to have these operations, and that leads many developers to think that that is the Right Way to think about data objects.

This is fundamentally flawed: The point of using OOP is to connect in your program a set of related data points with the associated operations on that data. Creating an object to represent a row in a database table makes sense, since the row is (in theory at least) a set of related data, but what doesn't make any sense at all is to limit the definition of that object to its ability to be persisted in a database. In fact, calling them "data objects" implies that they are essentially just a bunch of fields a la the struct{} in C without significant related functionality, which defeats the purpose of having an object in the first place.

How this gets you into trouble

(Since this issue is language-agnostic, I'm using a Python-like pseudocode. Insert whatever language-appropriate syntax you like.)

Consider a simplistic online commenting system, with two kinds of persistent data, Article and Comments. These are represented by two classes:

class Article:
id
title
article_text
...
class Comment:
id
article_id
comment_text
submitter_nick
time_submitted

You set up a web form with a submit button that posts data that looks like this:

{article_id: 5
comment_text: "blah blah blah"
submitter_nick: "Graham Chapman"}

This trickles into some kind of controller function, where you do something like this:

...
comment = new Comment()
comment.submitter_nick = post_data.submitter_nick
comment.article_id = post_data.article_id
comment.comment_text = post_data.comment_text
comment.time_submitted = now()
comment.save()
...

Now, this seems perfectly fine for a small application. Comments get posted, and life is good. But let's say this application grows a bit, and you add on a moderation system, and a threading system, and a way to edit and correct former comments, and a user system so people don't have to enter in the same nickname every time, and so on. At this point, your Comment class is getting referenced in dozens of places all over your program, and they all do a pretty similar kind of operation:

...
comment = Comment.load(comment_id)
comment.[field 1] = new_value_1
comment.[field 2] = new_value_2
comment.save()
...

Time goes by, and like most programmers you forget the details of the (seemingly) working code you wrote weeks ago.

And then something goes wrong. The only evidence you have for it is wildly wrong article_id values in your Comment table in the database that might not even be pointing to a real article (because some database engines don't check for this). Quick: When can the article_id be set, so we can make sure it's not getting modified incorrectly?

And now you have a problem, because the Comment class tells you nothing about where, how, and why article_id actually gets set. Your only option, really, is to run grep or your IDE search tool, locate every instance of the word "article_id", and go through each one individually.

Staying out of trouble

You can easily prevent this problem. We're going to move the operations on the Article and Comment into their classes:

class Article:
...
post_comment(submitter_nick, comment_text):
comment = new Comment()
comment.post_comment(this.id, submitter_nick, comment_text)
class Comment:
...
post_comment(article_id, submitter_nick, comment_text):
this.submitter_nick = submitter_nick
this.article_id = article_id
this.comment_text = comment_text
this.time_submitted = now()
this.save()

And now your code that handles the post looks like this:

...
article = Article(post_data.article_id)
if article.not_found:
fail
article.post_comment(post_data.submitter_nick, post_data.comment_text)
...

Now I'm looking in one class to find out what can happen to a Comment, rather than through the entire code base.

Some nuts and bolts

If you are using an ORM that will autogenerate code and clobber additional functionality that you added, then this is a bit trickier. There are 2 ways to deal with this:

  1. Switch ORMs if possible to one that will allow you to do this without clobbering your code. This is a known problem and your ORM should not be making your life harder.
  2. If you can't switch ORMs, then subclass your ORM's autogenerated class, put your functionality there, and use the subclass in the rest of your application rather than the autogenerated class. You can create an entire tree of subclassed ORM-generated objects if necessary.

How this changes your mindset

What I've found is that those who think in terms of CRUD operations often structure their entire application around those concepts, all the way up to the UI level. You end up with screens with titles like "Edit Customer". That is lazy thinking - you are demanding that your users to translate business concepts like "Customer changed their address" to technical concepts like "Edit Customer".

When you find yourself with an "Edit Customer" page, it's worth asking yourself "Why would a user want to change this data? What procedures is it attached to?" After a little bit of thinking, and discussions with your users if they're available, you will end up with a UI that more closely matches how users are trying to do. You can then build your application, all the way down to the data object level, around the real-world operations that are going on: Change address. Schedule a call. Adjust pricing level. And now your code reflects that reality from top to bottom, so when your users come to you and say "We want to change how we handle pricing adjustments", it's easy to know where to start.

Current rating: 1