Thursday, March 24, 2016

Weird stuff happens

Five days ago, my buildfarm animal jacana suddenly started getting an error while trying to extract typedefs. It had been happily doing this for ages, and suddenly the quite reproducible error started. For now I have disabled its typedef analysis, but I will need to get to the bottom of it. It's bad enough to crash the buildfarm client leaving the build directory dirty and not able to process further builds until I clean it up. I'm assuming it's probably something that changed in the source code, as nothing else has changed at all. These are the commits that took place between a good run and the first appearance of the error.

  • 9a83564 Allow SSL server key file to have group read access if owned by root
  • 6eb2be1 Fix stupid omission in c4901a1e.
  • 07aed46 Fix missed update in _readForeignScan().
  • ff0a7e6 Use yylex_init not yylex_init_extra().
  • a3e39f8 Suppress FLEX_NO_BACKUP check for psqlscanslash.l.
  • 0ea9efb Split psql's lexer into two separate .l files for SQL and backslash cases.
 I don't know for dead certain that any of these has caused an issue, but finding out what the problem is is just one more way for me to spend my copious free time.

Saturday, March 19, 2016

GIN indexing an array of enums

GIN indexes of arrays can be pretty useful. And arrays of enums can also be pretty useful. Unfortunately, there is no built in GIN indexing of arrays of enums, and in fact there is no pseudo-type for arrays of enums. So, while we can declare an operator class for supporting a type of "anyenum", which I did in my recent work on btree_gin and btree_gist support for enums, we can't declare one for support of "anyenum[]".

However, we can declare one for support of a concrete enum type's array type. And all the pieces are already there.

After a lot of reading and experimentation, Here's what I found that seems to work just like the builtin GIN array operator classes.

Given an enum type of animal, you can declare an operator class for its array type like this (note how almost completely generic this is):

create operator class _animal_ops 
    default for type public.animal[] 
    using gin 
    family array_ops as  
        function 1 enum_cmp(anyenum,anyenum), 
        function 2 pg_catalog.ginarrayextract(anyarray, internal), 
        function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, 
                                        internal, internal, internal), 
        function 4 ginarrayconsistent(internal, smallint, anyarray, integer, 
                                      internal, internal, internal, internal),
        function 6 ginarraytriconsistent(internal, smallint, anyarray, integer, 
                                         internal, internal, internal), 
    storage oid ;

and here it is working:
andrew=# \d animal_classes
 Table "public.animal_classes"
 Column  |   Type   | Modifiers 
---------+----------+-----------
 class   | integer  | 
 animals | animal[] | 

andrew=# select * from animal_classes where '{lion}' <@ (animals);
 class |   animals    
-------+--------------
     1 | {lion,zebra}

andrew=# set enable_seqscan = off;
SET
andrew=# create index animal_classes_animals_idx on animal_classes using gin(animals);
CREATE INDEX
andrew=# explain (costs off) select * from animal_classes where '{lion}' <@ (animals);
                      QUERY PLAN                       
-------------------------------------------------------
 Bitmap Heap Scan on animal_classes
   Recheck Cond: ('{lion}'::animal[] <@ animals)
   ->  Bitmap Index Scan on animal_classes_animals_idx
         Index Cond: ('{lion}'::animal[] <@ animals)
(4 rows)

andrew=# select * from animal_classes where '{lion}' <@ (animals);
 class |   animals    
-------+--------------
     1 | {lion,zebra}
(1 row)

andrew=# 

Gist and Gin support for enums

Recently I submitted a patch (too late for release 9.6, unfortunately) to add support for enum data types to the btree_gist and btree_gin additional modules. This came out of a desire to be able to use an enum field in an exclusion constraint. That really only requires GiST support, but since I was adding enum support to btree_gist it seemed a good idea to add it to btree_gin as well, so that, for example, enum fields can be used in multi-column GIN indexes.

Here's an exclusion constraint example using enums, adapted from the Postgres docs.

andrew=# create extension btree_gist;
CREATE EXTENSION
andrew=# \dx
                           List of installed extensions
    Name    | Version |   Schema   |                  Description                  
------------+---------+------------+-----------------------------------------------
 btree_gist | 1.2     | public     | support for indexing common datatypes in GiST
 plpgsql    | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

andrew=# create type animal as enum ('zebra','lion');
CREATE TYPE
andrew=# create table zoo (cage int, exhibit animal, exclude using gist(cage with =, exhibit with <>));
CREATE TABLE
andrew=# INSERT INTO zoo VALUES(123, 'zebra');
INSERT 0 1
andrew=# INSERT INTO zoo VALUES(123, 'zebra');
INSERT 0 1
andrew=# INSERT INTO zoo VALUES(123, 'lion');
ERROR:  conflicting key value violates exclusion constraint "zoo_cage_exhibit_excl"
DETAIL:  Key (cage, exhibit)=(123, lion) conflicts with existing key (cage, exhibit)=(123, zebra).
andrew=# 

Tuesday, March 8, 2016

Json dates, times, and binary data

If I had to pick one thing I would change about the JSON spec it's this: they should allow timestamps and dates as primitive values. It's a very common complaint. Another thing they should have provided for is binary data. This gets a bit annoying if you convert a record containing a bytea to json and hand it to the client. They see a string like '\x71951ce4c84c0b4de861377f27a4938a'. If you can, it's probably nicer to encode your bytea before converting it to json, using encode(mybytea,'base64') or encode(mybytea,'hex'). Then your client might see a string for which they have a standard routine to convert to binary, rather than having to strip off the the initial '\x' before converting it.