In today's inaugural edition of "We review a random article I saw on Hacker News, clicked on, and read," we have You Probably Don't Need Query Builders by Matt Righetti. He appears to work at Cloudflare, which may suggest why his writing is good, or vice versa! (I love their voluminous blog.)
Basically, Matt demonstrates a series of approaches one may take
to dynamically construct SQL queries according to some inputs.
Most of these involve conditional filters: e.g.
if a max age is provided, then append "AND age < $maxAge
".
Matt advocates for moving the conditional logic from your wrapping language (e.g. Kotlin, Rust) to the SQL. Create one SQL statement that behaves correctly under all filter conditions.
SELECT * FROM users WHERE ($maxAge is NULL OR age < $maxAge)
(In all cases, Matt uses a library to bind parameters rather than concatenation/interpolation to avoid SQL Injection, as you should too! Don't get pwned!)
My key observation about Matt's proposal is in my word choice for describing it: This moves the conditional logic from the wrapping language to the SQL.
The conditional logic is still there. If you're equally capable in Kotlin and SQL, then I doubt moving from the former to the latter reduces the risk of error. Furthermore, your wrapping langauge, Kotlin, benefits from myriad assistance with the compiler and your IDE, which track nullity issues, exhaustive enum value consumption in enums, etc., whereas you will only detect an issue with your SQL if you happen to test the condition that causes the bad behavior.
This may be a wash in the relatively simple examples (as must be)
in Matt's blog post
(though, without syntax highlighting, my confidence in accuracy on first read
is low, despite working in exactly this sort of SQL daily,
and I would have to look up the functions used in
id IN (ARRAY_REMOVE(STRING_TO_ARRAY($1, ','), ''))
every time I saw it to think about its behavior,
and of course the IDE doesn't provide Cmd+Click hints on it as a raw string)...
... but in the queries I find myself building dynamically, the fully SQL-ized version of that logic would be monstrous.
My more common reason for breaking the SQL query into a series of conditionals or functions is to reduce the cognitive complexity of each component.
For example, I'll have a structure like:
SELECT id, ${otherColumns()} FROM ${source()} WHERE ${filters()}
The otherColumns()
function may return any combination of the
columns available depending on context (e.g. if 30 columns, then 2^30
possible column sets, which cannot be pre-written).
The source()
function depends on the originating user,
the state of that user's assets, the day of the month, or the state of the machine.
Sometimes it is the name of a table, the path to a local file, a subquery, or
the result of some Common-Table-Expression (CTE) prepended above the SELECT
.
The filters()
function can range from examples like Matt's to some dozens
of potential filters, many involving subqueries and arrays and structs.
The sheer length of the query may be huge to begin with,
stretching into hundreds of kilobytes.
Enumerating some NULL
case for all the conditions not used
would stretch into ridiculous situations like multi-megabyte queries.
Obviously, Matt is merely suggesting to the reader that some cases may be simpler to write and maintain with more of the work in SQL, not suggesting that one should always do in SQL what one could do in Kotlin.
Likewise, I'm merely suggesting the reader remember that one can do in the wrapping language what would be horrifying to do in SQL.