SQLite NULLs are Weird!
Yes, you read it right, SQLite does treat all NULL values differently, therefore, a column with a UNIQUE constraint can have multiple NULL values. It can be confusing coming from a database like Postgres, MySQL, or even MongoDB. And I’ll prove it with some examples! I learnt a while back while working on LiteQueue: a Golang a queueing library.
First we create our schema we’ll use to test throughout this post. Then we’ll insert two records which normally should not be allowed to be inserted.
drop table if exists sample;
create table if not exists sample (
id TEXT primary key,
email TEXT,
deleted_at TEXT,
UNIQUE(email, deleted_at)
) strict;
insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null);
-- This will not fail because the constraint doesn't hold
insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);
-- check the content of the sample table
select * from sample;
That’s odd right? Ok, first let’s confirm that our table actually has the constraint.
drop table if exists sample;
create table if not exists sample (
id TEXT primary key,
email TEXT,
deleted_at TEXT,
UNIQUE(email, deleted_at)
) strict;
-- check to see if our constraint was actually defined as part of the table
SELECT sql FROM sqlite_schema WHERE name = 'sample';
If you ran the snippets you can see that both rows were actually inserted into the table and that the constraint was actually defined in the table. So now we have two questions:
- Why does this happen?
- How can we ensure uniqueness when using two keys in SQLite?
Why are nulls like this?
blw bkw bkw why? ref: https://www.sqlite.org/nulls.html
Ensuring Uniqueness
We’ll explore two ways to mitigate this.
Using a generate column
drop table if exists sample;
CREATE TABLE sample (
id TEXT PRIMARY KEY,
email TEXT,
deleted_at TEXT,
deleted_at_coalesced TEXT GENERATED ALWAYS AS (COALESCE(deleted_at, '1970-01-01T00:00:00.000Z')) STORED,
UNIQUE(email, deleted_at_coalesced)
) STRICT;
insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null);
-- This will fail due to because of the constraint on the email and the generated column. Uncomment it to test it out
-- insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);
insert into sample (id, email, deleted_at) values ('3', 'ray@mail.com', '2024-11-12T00:00:00.000Z');
insert into sample (id, email, deleted_at) values ('4', 'ray@mail.com', '2024-11-11T01:00:00.000Z');
insert into sample (id, email, deleted_at) values ('6', 'different@mail.com', null);
update sample set deleted_at = '2024-11-11T02:00:00.000Z' where deleted_at is null;
insert into sample (id, email, deleted_at) values ('7', 'different@mail.com', null);
update sample set deleted_at = '2024-11-11T03:00:00.000Z' where deleted_at is null;
insert into sample (id, email, deleted_at) values ('8', 'different@mail.com', null);
select * from sample;
Using an index
We’re going to be using an index on both the email
and deleted_at
fields.
1 | CREATE UNIQUE INDEX if not exists idx_sample_null_deleted_at |
Let’s test it now, we’ll insert some records, update them and insert similar conflicting records. We’ll be using strftime('%Y-%m-%dT%H:%M:%fZ')
as our now()
function. he value of strftime('%Y-%m-%dT%H:%M:%fZ')
is the same in the same transaction
drop table if exists sample;
create table if not exists sample (
id TEXT primary key,
email TEXT,
deleted_at TEXT
) strict;
CREATE UNIQUE INDEX if not exists idx_sample_null_deleted_at
ON sample(email) WHERE deleted_at IS NULL;
insert into sample (id, email, deleted_at) values ('1', 'ray@mail.com', null);
-- This will fail due to idx_sample_null_deleted_at, uncomment it to test it out
-- insert into sample (id, email, deleted_at) values ('2', 'ray@mail.com', null);
insert into sample (id, email, deleted_at) values ('3', 'ray@mail.com', '2024-11-12T00:00:00.000Z');
insert into sample (id, email, deleted_at) values ('4', 'ray@mail.com', '2024-11-11T01:00:00.000Z');
insert into sample (id, email, deleted_at) values ('6', 'different@mail.com', null);
update sample set deleted_at = strftime('%Y-%m-%dT%H:%M:%fZ') where deleted_at is null;
insert into sample (id, email, deleted_at) values ('7', 'different@mail.com', null);
update sample set deleted_at = strftime('%Y-%m-%dT%H:%M:%fZ') where deleted_at is null;
insert into sample (id, email, deleted_at) values ('8', 'different@mail.com', null);
select * from sample;
As seen in the results using an index is the best way to ensure the unique constraint is held.
Acknowledgements
In my last post I wanted to add interactive SQL scripts to it so users could run the tests in their browser as they were reading th post. Fortunately, I had been following Anton for a while and figured it was time to finally integrate Codapi. This post won’t have been possible without Codapi.