Saturday, February 14, 2015

Statistics and ordering operations on JSON fields

The JSON type is not provided with any comparison operators built in, unlike the new JSONB type. One reason for this is that it's not really clear how to do the comparison. In particular, comparing the text values seems wrong, because the white space should not matter, and arguably the order of object keys should not matter either. JSONB doesn't have these problems because it dissolves all the white space and stores object keys in a canonical order, so we have come up with sane if slightly counter-intuitive comparison operations.

This limitation on JSON is somewhat irksome, however. It restricts you from doing some operations on JSON such as DISTINCT, GROUP BY, and ORDER BY.

Another issue is that it causes ANALYZE not to create any rows at all in pg_statistic for JSON columns, so even if all you want to know is the average column width, you can't find it. That makes doing things like measuring table bloat just about impossible.

If you have PLV8 available, you can create operators that work fairly sanely on JSON and that let you generate stats, use DISTINCT etc. The steps are outlined here.

But what if you don't want to load PLV8 just for this? Or what of you can't, like say on a managed service that doesn't provide it? All is not lost.  Here is a version which uses text comparison instead of a PLV8 function.  That means you don't need to have PLV8 loaded. Since it uses text comparison, it is subject to the caveats mentioned about about white space and object keys. But it will work, and you will see rows for the column in pg_statistic. Just be careful using ordering operations or creating indexes, as the results, while consistent, might be surprising.

No comments:

Post a Comment