
Monday, 21 October, 2024

Why it's time to retire CRUD

By Jason Bloomberg, Managing Director, Intellyx (guest author)

Create, read, update, and delete – the four fundamental database operations we lovingly refer to as CRUD – have represented the central paradigm for interacting with databases since databases were invented in the middle of the last century. These operations found their way into SQL as INSERT, SELECT, UPDATE, and DELETE.
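
To make the mapping concrete, here is a minimal sketch using a hypothetical people table (the table and column names are illustrative only):

```sql
-- CRUD mapped onto SQL, using a hypothetical 'people' table
INSERT INTO people (id, name) VALUES (1, 'Ada');    -- Create
SELECT name FROM people WHERE id = 1;               -- Read
UPDATE people SET name = 'Ada L.' WHERE id = 1;     -- Update (overwrites the old value)
DELETE FROM people WHERE id = 1;                    -- Delete (removes the row entirely)
```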

CRUD has been so ingrained in the way we think about database interactions that it’s hard to imagine life without it. It’s quite a shock, therefore, that CRUD is fatally flawed.

The problem lies with the update and delete operations: both destroy data. Delete explicitly removes data from the database, while update replaces the values in one or more records with new values – thus destroying any record of the old values.

Why is such destruction so problematic? What have organizations done over the years to address the problem? And perhaps most importantly, what should you do now?

The Problem of Data Destruction

There are many reasons to keep data around even after it no longer represents the current state of the business.

Data compliance is perhaps the most important reason. Regulations may require us to keep track of data, perhaps for years.

You never know when someone will need to audit an organization’s data or conduct some kind of forensic query that asks what some value was at some point in the past.

Compliance aside, there are also numerous business reasons why someone would want to know what a value was at a particular point in the past.

What was the price of a stock on a particular day? What services was a customer paying for before they closed their account? The list goes on and on.

Our data compliance requirements, furthermore, may go beyond ‘what was the value at a particular time?’. We may also want to know ‘when did someone change the value?’ – especially when a change was recorded at a different time from when it took effect.

Logical Updates and Deletes with Timestamps and Flags

The basic approach to resolving the problems of data destruction is to keep historic values of data around even after some operation updates or deletes a record.

For updates to existing data, the first step is to maintain time information about each update with a timestamp column in each table. Instead of performing an UPDATE, the application performs an INSERT, recording the time of the operation in the timestamp column. We can think of this type of operation as a ‘logical update.’
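
As a minimal sketch, assuming a hypothetical prices table (names are illustrative only), the pattern looks like this:

```sql
-- Hypothetical table: every change becomes a new row, keyed by item and update time
CREATE TABLE prices (
  item_id    INT           NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  updated_at TIMESTAMP     NOT NULL,
  PRIMARY KEY (item_id, updated_at)
);

-- A 'logical update': INSERT a new row instead of overwriting the old one
INSERT INTO prices (item_id, price, updated_at)
VALUES (42, 2.00, CURRENT_TIMESTAMP);
```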

Subsequent SELECT statements always check the timestamp and return only the most recent value – unless, of course, the query is for a historical value.
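
Continuing the sketch above, a query for the current price must pick out the most recent row, and a historical query filters by the timestamp of interest:

```sql
-- Current price: only the most recent row counts
SELECT price
FROM prices p
WHERE p.item_id = 42
  AND p.updated_at = (SELECT MAX(updated_at)
                      FROM prices
                      WHERE item_id = p.item_id);

-- Historical price: the most recent row as of a given point in time
SELECT price
FROM prices p
WHERE p.item_id = 42
  AND p.updated_at = (SELECT MAX(updated_at)
                      FROM prices
                      WHERE item_id = p.item_id
                        AND updated_at <= TIMESTAMP '2024-01-01 00:00:00');
```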

Delete flags work in a similar fashion. With a delete flag, no DELETEs are actually performed on the operational database. Instead, an additional column in each table includes a flag that indicates the record should be considered to have been deleted.
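
A minimal sketch of the flag-based approach, again with hypothetical names:

```sql
-- Hypothetical customers table with a delete flag
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name        VARCHAR(100) NOT NULL,
  is_deleted  BOOLEAN NOT NULL DEFAULT FALSE
);

-- A 'logical delete': set the flag rather than removing the row
UPDATE customers SET is_deleted = TRUE WHERE customer_id = 7;
```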

In other situations, admins set up separate tables to keep track of such DELETEs. In this case, apps delete rows from the operational database, while the second table records the deleted rows.
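
A sketch of that variant, assuming the customers table above and a hypothetical deleted_customers audit table, records the row before removing it, ideally inside a single transaction:

```sql
-- Hypothetical audit table mirroring the operational table, plus a deletion timestamp
CREATE TABLE deleted_customers (
  customer_id INT,
  name        VARCHAR(100),
  deleted_at  TIMESTAMP NOT NULL
);

BEGIN;  -- transaction syntax varies by engine

INSERT INTO deleted_customers (customer_id, name, deleted_at)
SELECT customer_id, name, CURRENT_TIMESTAMP
FROM customers
WHERE customer_id = 7;

DELETE FROM customers WHERE customer_id = 7;

COMMIT;
```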

Delete flags thus provide for ‘logical’ deletes that enable applications to treat rows as having been deleted, even though no data was actually removed entirely from the database.

There are several problems with both logical updates and deletes. Most significantly, all application logic must support them – not only upon attempted updates and deletions, but also for every subsequent SELECT.

One minor coding error can lead to a superseded or logically deleted row being erroneously returned in a result set.
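
To make the failure mode concrete, using the sketches above: every read now needs a predicate like the one below, and omitting it once quietly resurrects ‘deleted’ rows.

```sql
-- Correct: excludes logically deleted rows
SELECT name FROM customers WHERE is_deleted = FALSE;

-- Buggy: one forgotten predicate and 'deleted' customers come back
SELECT name FROM customers;
```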

Secondly, logical deletes become enormously complex when JOINs are involved. For example, if a business entity like ‘customer’ depends upon data from several tables and then the organization deletes that customer, does that mean each of the individual tables should have a delete flag set? Probably not.
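
A sketch of why JOINs get awkward, assuming hypothetical customers, accounts, and orders tables that each carry their own flag:

```sql
-- Every table in the join needs its own flag check,
-- and someone must decide which flags a 'deleted customer' actually sets
SELECT c.name, o.order_id
FROM customers c
JOIN accounts  a ON a.customer_id = c.customer_id
JOIN orders    o ON o.account_id  = a.account_id
WHERE c.is_deleted = FALSE
  AND a.is_deleted = FALSE
  AND o.is_deleted = FALSE;
```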

Temporal Tables

To address the challenges with logical updates and deletes, some database engines offer temporal tables as a built-in part of their database.

These ‘system-versioned’ temporal tables have two explicitly defined timestamp columns that keep a full history of data changes for that table. The system (aka the database engine) manages the period of validity for each row in the table automatically.

Temporal tables also contain references to a ‘history’ table – a second table with a schema that matches the temporal table. Whenever a row in the temporal table is updated or deleted, the database engine automatically stores the previous version of that row in the history table.
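
As one concrete example (the exact syntax varies by engine; this sketch follows SQL Server’s dialect), a system-versioned temporal table might be declared like this:

```sql
-- SQL Server-style system-versioned temporal table; other engines differ in syntax
CREATE TABLE Product (
  ProductId INT PRIMARY KEY,
  Price     DECIMAL(10,2) NOT NULL,
  ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
  ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
  PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductHistory));
```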

With temporal tables, therefore, the database engine automatically handles the heavy lifting of supporting logical updates and deletes without requiring a rework of the database access logic.
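
Querying then stays simple; using the sketch above, a point-in-time lookup uses a FOR SYSTEM_TIME clause instead of hand-written timestamp logic:

```sql
-- Current rows: superseded versions are filtered out automatically
SELECT Price FROM Product WHERE ProductId = 42;

-- Point-in-time query: what did this row look like at the start of 2024?
SELECT Price
FROM Product FOR SYSTEM_TIME AS OF '2024-01-01'
WHERE ProductId = 42;
```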

From Temporal to Bitemporal

Temporal tables are good for tracking state changes of objects represented in the database – what we call ‘valid’ times. For example, if the price of an item changes from $1.00 to $2.00 at a point in time, a temporal table will track when the $1.00 price was valid and then when the $2.00 price was valid.

What it doesn’t track, however, is when the transaction took place that changed that value.

After all, someone might run an UPDATE today on the table that changes the price a week ago. A temporal table will keep track of the price change itself, but not when the UPDATE took place.

Bitemporal databases like XTDB solve this problem. Such databases automatically keep track of both the ‘valid time’ and the ‘transaction time,’ thus fully supporting queries like ‘when was the price equal to $1.00?’ as well as queries like ‘when did somebody change the price?’.
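
As a hedged sketch, and assuming a hypothetical product table, bitemporal queries in a SQL:2011-style dialect such as XTDB’s look roughly like this (consult the XTDB documentation for the exact syntax):

```sql
-- Valid-time question: over which periods was the price equal to $1.00?
SELECT *
FROM product FOR ALL VALID_TIME
WHERE price = 1.00;

-- Transaction-time question: what did the database itself record as of a week ago?
SELECT *
FROM product FOR SYSTEM_TIME AS OF DATE '2024-10-14'
WHERE product_id = 42;
```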

The Intellyx Take

The moral of this story is that over the years, the various approaches to resolving the problem of destructive data operations have been more complicated than people expected.

One fix leads to another problem and so on, and before long, developers have written a mass of convoluted code that impacts maintainability and performance.

Bitemporal databases resolve these issues by automatically keeping track of both valid and transaction times.

Operational queries – the day-to-day queries involved in running the business – take place normally.

Queries about some value in the past, whether it be for business, compliance, or forensic purposes, are also straightforward.


Copyright © Intellyx B.V. Intellyx is editorially responsible for this document. No AI bots were used to write this content. At the time of writing, JUXT is an Intellyx client.