Regular expressions and PostgreSQL have been a great team for many many years. The same is true for PostgreSQL arrays, which have been around for a long time as well. However, what people rarely do is combine those two technologies into something more powerful that can be used for various purposes.
Table of Contents
One of the most widely adopted ways of dealing with arrays is the idea of ANY
and ALL
, which has been supported by PostgreSQL since God knows when. Those two keywords allow us to figure out if a certain value and an array are a match.
Here are some examples:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
sql test=# SELECT array_agg(x) FROM generate_series(1, 5) AS x; array_agg ------------- {1,2,3,4,5} (1 row) test=# SELECT 3 = ANY(array_agg(x)) FROM generate_series(1, 5) AS x; ?column? ---------- t (1 row) test=# SELECT 3 = ALL(array_agg(x)) FROM generate_series(1, 5) AS x; ?column? ---------- f (1 row) |
The first statement simply generates an array of numbers, which can be used in my example to demonstrate how ANY
and ALL
work. The idea is simple: ANY
will check if one of the values in the array matches - ALL
will check if all of the values are a match. So far this is quite common.
However, what happens if we try to apply this concept to regular expressions?
Many readers might be surprised to learn that combining those two techniques is indeed possible and actually fairly straight forward. Note that in the example above ANY
and ALL
were essentially used in combination with the =
operator. However, we can also apply the ~
operator, which is the PostgreSQL way of handling regular expressions:
1 2 3 4 5 6 |
sql test=# SELECT 'my fancy string' ~ '.*ancy.*ri.+$'; ?column? ---------- t (1 row) |
What it essentially does is matching the regular expression on the right hand side of the operator with the string on the left. So far so good, but what happens if we expand on this a bit?
Here is what we can do:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
sql test=# SELECT array_agg(exp) FROM ( SELECT '.*SELECT.*' AS exp UNION ALL SELECT '.*HAVING.*' UNION ALL SELECT '.*GROUP\s+BY.*' ) AS x; array_agg ------------------------------------------- {.*SELECT.*,.*HAVING.*,".*GROUP\\s+BY.*"} (1 row) |
What we have just created is an array of regular expressions that we can easily apply:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
sql test=# SELECT 'SELECT name, count(*) FROM tab GROUP BY 1 HAVING count(*) > 2' ~ALL(array_agg(exp)) FROM ( SELECT '.*SELECT.*' AS exp UNION ALL SELECT '.*HAVING.*' UNION ALL SELECT '.*GROUP BY.*' ) AS x; ?column? ---------- t (1 row) |
Voilà, we can see that all expressions have matched successfully. All we had to do was utilize ~ALL
instead of =ALL
to do the trick. While this might be obvious to some, it does not seem to be common knowledge out there.
If you want to learn more about pattern matching and fuzzy search, consider reading my blog post dealing with this topic and hopefully learn more about some of those techniques.
Too bad IS [NOT] DISTINCT FROM is not an operator
ERROR: syntax error at or near "any"
LINE 1: select 'a' is not distinct from any ('{a,b,c}'::text[]);
I have been wishing for that too. In particular, an operator could at least potentially use an index.