Monday, August 8, 2016

Using EXPLAIN json format output in plpgsql

It's possible to use EXPLAIN output in plpgsql. In the default text format, the result comes back as a set of text values, so you can process them in a loop like this:


declare
   exp text;
begin
   for exp in explain myquery
   loop
      raise notice '%', exp;
   end loop;
end;


If you use json format output, however, the result comes back as a single json document rather than a set of lines. You still might need a loop - I haven't found another way yet of getting the output from EXPLAIN into a variable - but the loop will only have one iteration. Here is an example taken from a function I wrote the other day that lets you get the estimated number of rows from the plan:



declare
   exp json;
begin
   for exp in explain (format json) myquery
   loop
      raise notice 'rows: %', exp#>>'{0,Plan,Plan Rows}';
   end loop;
end;
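
Something like the following minimal sketch puts the pieces together as a complete function. The function name (estimated_rows) and the use of EXECUTE on a query string passed in as text are just illustrative choices here, not necessarily what the original function looked like:

create or replace function estimated_rows(query text) returns bigint
language plpgsql as
$$
declare
   exp json;
begin
   -- the json explain output comes back as a single row,
   -- so this loop runs exactly once
   for exp in execute 'explain (format json) ' || query
   loop
      return (exp#>>'{0,Plan,Plan Rows}')::bigint;
   end loop;
end;
$$;

and calling it is as simple as:

select estimated_rows('select * from pg_class');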

Thursday, June 9, 2016

Removing a Buildfarm client/server perl dependency

With one very small exception, there is nothing in the buildfarm server that requires the client to be running perl, even though both the server and the standard client are written in perl.

Andreas Scherbaum has written a new specialized client in python, and he asked me about removing that dependency. Essentially what this requires is that an alternative mechanism be allowed for transmitting the serialized configuration in the client's web transaction. Currently the client uses perl's Storable module to serialize the data, and the server uses the same module to de-serialize it. The obvious alternative candidate is to serialize it as JSON.

The first problem was to determine if we can sanely distinguish between data serialized by the two mechanisms. And we can. JSON is not allowed to contain any control characters, and a structure serialized using Storable's nfreeze() method is pretty much guaranteed to contain such characters. So I've added a test to the receiving program that looks for such characters and if it doesn't find them assumes that the data is JSON and decodes it accordingly. This has been tested using the client nightjar.

There are a few wrinkles, though. The main one is that we can't assume that every client has a module available to encode the data as JSON. In fact, nightjar didn't until I installed one. So in order to build this into the client I'll need to find a way to determine at runtime if it's available and fall back to using Storable otherwise.

We should also look at actually storing the data as JSON rather than as a serialized blob. When the buildfarm was created we didn't have any database support for JSON, but now the sensible thing to do would be to store the data as jsonb, and make it searchable.
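
Just as a rough sketch of the idea (the table and column names here are invented for illustration, not the actual buildfarm schema), it might look something like:

-- hypothetical schema, for illustration only
alter table build_status add column conf jsonb;

-- a GIN index makes containment searches on the stored config cheap
create index build_status_conf_idx on build_status using gin (conf);

-- e.g. find submissions built with a particular configure option
select sysname, snapshot
from build_status
where conf @> '{"config_opts": ["--enable-cassert"]}';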

But that's a project for another day.

Saturday, June 4, 2016

Using wal-e with Postgres on a non-standard port

I just discovered after a great deal of puzzlement and far too much time that if you run multiple instances of Postgres on a machine, and you're using wal-e for continuous backup, you'll need an entry for PGPORT or PGSERVICE in the environment directory for any instance that's not running on port 5432. Otherwise, wal-e will happily start and stop a backup on your port 5432 server and your non-5432 backups will be completely useless since they won't have a backup label.

Yet another example of the fact that any untested backup is not worth anything. If I hadn't tested this and found the problem we could have been left without a backup when we needed it.

Tuesday, May 31, 2016

Indiscriminate use of CTEs considered harmful

Common Table Expressions are a wonderful thing. Not only are they indispensable for creating recursive queries, but they can be a powerful tool in creating complex queries that are comprehensible. It's very easy to get lost in a fog of sub-sub-sub-queries, so using CTEs as building blocks can make things a lot nicer.

However, there is one aspect of the current implementation of CTEs that should make you pause. Currently CTEs are in effect materialized before they can be used. That is, Postgres runs the query and stashes the data in a temporary store before it can be used in the larger query. There are a number of consequences of this.

First, this can be a good thing. I have on a number of occasions used this fact to good effect to get around problems with poorly performing query plans. It's more or less the same effect as putting "offset 0" on a subquery.

However, it can also result in some very inefficient plans. In particular, if CTEs return a lot of rows they can result in some spectacularly poorly performing plans at the point where you come to use them. Note also that any indexes on the underlying tables will be of no help to you at all at this point, since you are no longer querying against those tables but against the intermediate result mentioned above, which has no indexes at all.
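
To make that concrete, here is a purely illustrative sketch (the table and columns are invented). The first two forms fence off the inner query, so the outer customer_id condition can't make use of an index on big_table; the third leaves the planner free to combine the conditions and use whatever indexes exist:

-- 1. CTE form: "recent" is materialized, then filtered
with recent as (
   select * from big_table
   where created_at > now() - interval '30 days'
)
select * from recent where customer_id = 42;

-- 2. the "offset 0" trick mentioned above has much the same fencing effect
select *
from (select * from big_table
      where created_at > now() - interval '30 days'
      offset 0) recent
where customer_id = 42;

-- 3. inlined form: the planner can push customer_id = 42 down
--    and use an index on big_table (customer_id)
select *
from big_table
where created_at > now() - interval '30 days'
  and customer_id = 42;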

This was brought home to me forcefully on Friday and Saturday when I was looking at a very poorly performing query. After some analysis and testing, the simple act of inlining two CTEs in the query in question resulted in the query running in 4% of the time it had previously taken. Indiscriminate use of CTEs had made the performance of this query 25 times worse.

So the moral is: be careful in using CTEs. They are not just a convenient tool for abstracting away subqueries.

There has been some discussion about removing this aspect of the implementation of CTEs. It's not something that is inherent in CTEs, it's simply a part of the way we have implemented them in PostgreSQL. However, for now, you need to be familiar with the optimization effects when using them, or you might face the same problem I was dealing with above.


Friday, April 29, 2016

Windows XP - PostgreSQL's long goodbye.

I just committed a patch to allow building with Visual Studio 2015. Due to a change in the way we need to detect locales with this compiler, it will not be possible to use binaries built with it on Windows XP. Despite the fact that Microsoft declared Windows XP to be completely out of support well over a year ago, it still lives on in a huge number of installations. My own instance still gets occasional updates from Microsoft. And you can still build and run the very latest PostgreSQL on Windows XP. But you can't use it on Windows XP if it's built with Visual Studio 2015 or later.

I will keep my instance (whose only job is to run several buildfarm members) running as long as it doesn't require any significant tweaks. But a day will eventually come when it does require such tweaks, or just won't build and run successfully any more, and at that stage I'll shut it down.


Sunday, April 17, 2016

Building an extension with VS2015

I needed to get a DLL prepared of my tiny closed format ranges extension, and so I tried following the excellent instructions Craig Ringer gave a couple of years ago. I was using Visual Studio 2015, on a Windows 10 box, targeting PostgreSQL 9.4, 64bit.

But I ran into some problems. I could compile the thing quite cleanly, but Postgres couldn't find the functions when I tried to create the extension. I had a short and very helpful email exchange with Craig, and came up with this solution: the exported functions needed to be declared with a prototype before the call to PG_FUNCTION_INFO_V1, and both the prototype and the function definition needed to be decorated with PGDLLEXPORT. Once I had done that, the DLL loaded when I ran "create extension", and the types worked as expected.

I don't know if this is required with any earlier versions of Visual Studio, or even if it works with them. When I get a chance I will see if I can find out.

Based on this, I think it's probably worth trying at some stage to put together a script to build the project and solution files for extensions. I don't have time to do that right now, however.

Thursday, March 24, 2016

Weird stuff happens

Five days ago, my buildfarm animal jacana suddenly started getting an error while trying to extract typedefs. It had been happily doing this for ages, and then this quite reproducible error appeared. For now I have disabled its typedef analysis, but I will need to get to the bottom of it. The error is bad enough that it crashes the buildfarm client, leaving the build directory dirty and unable to process further builds until I clean it up. I'm assuming it's probably something that changed in the source code, as nothing else has changed at all. These are the commits that took place between a good run and the first appearance of the error.

  • 9a83564 Allow SSL server key file to have group read access if owned by root
  • 6eb2be1 Fix stupid omission in c4901a1e.
  • 07aed46 Fix missed update in _readForeignScan().
  • ff0a7e6 Use yylex_init not yylex_init_extra().
  • a3e39f8 Suppress FLEX_NO_BACKUP check for psqlscanslash.l.
  • 0ea9efb Split psql's lexer into two separate .l files for SQL and backslash cases.
I don't know for dead certain that any of these caused the issue, but tracking down the problem is just one more way for me to spend my copious free time.

Saturday, March 19, 2016

GIN indexing an array of enums

GIN indexes of arrays can be pretty useful. And arrays of enums can also be pretty useful. Unfortunately, there is no built-in GIN indexing of arrays of enums, and in fact there is no pseudo-type for arrays of enums. So, while we can declare an operator class supporting the type "anyenum", which I did in my recent work on btree_gin and btree_gist support for enums, we can't declare one supporting "anyenum[]".

However, we can declare one for support of a concrete enum type's array type. And all the pieces are already there.

After a lot of reading and experimentation, here's what I found that seems to work just like the built-in GIN array operator classes.

Given an enum type of animal, you can declare an operator class for its array type like this (note how almost completely generic this is):

create operator class _animal_ops 
    default for type public.animal[] 
    using gin 
    family array_ops as  
        function 1 enum_cmp(anyenum,anyenum), 
        function 2 pg_catalog.ginarrayextract(anyarray, internal), 
        function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, 
                                        internal, internal, internal), 
        function 4 ginarrayconsistent(internal, smallint, anyarray, integer, 
                                      internal, internal, internal, internal),
        function 6 ginarraytriconsistent(internal, smallint, anyarray, integer, 
                                         internal, internal, internal), 
    storage oid ;

and here it is working:
andrew=# \d animal_classes
 Table "public.animal_classes"
 Column  |   Type   | Modifiers 
---------+----------+-----------
 class   | integer  | 
 animals | animal[] | 

andrew=# select * from animal_classes where '{lion}' <@ (animals);
 class |   animals    
-------+--------------
     1 | {lion,zebra}

andrew=# set enable_seqscan = off;
SET
andrew=# create index animal_classes_animals_idx on animal_classes using gin(animals);
CREATE INDEX
andrew=# explain (costs off) select * from animal_classes where '{lion}' <@ (animals);
                      QUERY PLAN                       
-------------------------------------------------------
 Bitmap Heap Scan on animal_classes
   Recheck Cond: ('{lion}'::animal[] <@ animals)
   ->  Bitmap Index Scan on animal_classes_animals_idx
         Index Cond: ('{lion}'::animal[] <@ animals)
(4 rows)

andrew=# select * from animal_classes where '{lion}' <@ (animals);
 class |   animals    
-------+--------------
     1 | {lion,zebra}
(1 row)

andrew=# 

Gist and Gin support for enums

Recently I submitted a patch (too late for release 9.6, unfortunately) to add support for enum data types to the btree_gist and btree_gin additional modules. This came out of a desire to be able to use an enum field in an exclusion constraint. That really only requires GiST support, but since I was adding enum support to btree_gist it seemed a good idea to add it to btree_gin as well, so that, for example, enum fields can be used in multi-column GIN indexes.

Here's an exclusion constraint example using enums, adapted from the Postgres docs.

andrew=# create extension btree_gist;
CREATE EXTENSION
andrew=# \dx
                           List of installed extensions
    Name    | Version |   Schema   |                  Description                  
------------+---------+------------+-----------------------------------------------
 btree_gist | 1.2     | public     | support for indexing common datatypes in GiST
 plpgsql    | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

andrew=# create type animal as enum ('zebra','lion');
CREATE TYPE
andrew=# create table zoo (cage int, exhibit animal, exclude using gist(cage with =, exhibit with <>));
CREATE TABLE
andrew=# INSERT INTO zoo VALUES(123, 'zebra');
INSERT 0 1
andrew=# INSERT INTO zoo VALUES(123, 'zebra');
INSERT 0 1
andrew=# INSERT INTO zoo VALUES(123, 'lion');
ERROR:  conflicting key value violates exclusion constraint "zoo_cage_exhibit_excl"
DETAIL:  Key (cage, exhibit)=(123, lion) conflicts with existing key (cage, exhibit)=(123, zebra).
andrew=# 

Tuesday, March 8, 2016

Json dates, times, and binary data

If I had to pick one thing I would change about the JSON spec it's this: they should allow timestamps and dates as primitive values. It's a very common complaint. Another thing they should have provided for is binary data. This gets a bit annoying if you convert a record containing a bytea to json and hand it to the client. They see a string like '\x71951ce4c84c0b4de861377f27a4938a'. If you can, it's probably nicer to encode your bytea before converting it to json, using encode(mybytea,'base64') or encode(mybytea,'hex'). Then your client might see a string for which they have a standard routine to convert to binary, rather than having to strip off the initial '\x' before converting it.
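
For instance, with a hypothetical table docs (id int, body bytea), the difference looks like this:

-- raw bytea comes through in the json as the '\x...' hex form
select row_to_json(d) from docs d;

-- encoding it first gives the client a plain base64 string instead
select row_to_json(t)
from (select id, encode(body, 'base64') as body from docs) t;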

Saturday, February 20, 2016

Release 4.17 of the PostgreSQL Buildfarm client

I have just cut release 4.17 of the PostgreSQL Buildfarm client. It is available at http://www.pgbuildfarm.org/downloads/latest-client.tgz.

Changes of note:
  • use PGCTLTIMEOUT instead of hardcoded timeout settings
  • shipped config file is renamed to build-farm.conf.sample to avoid overwriting a default config file
  • use configured make command in make_bin_installcheck
  • restrict restart sleeps to Windows platforms
  • fix trigger_exclude pattern in build-farm.conf
  • protect git checkout with a lock if using git_use_workdirs

For the most part, the thing of significance is the PGCTLTIMEOUT change, which follows up on some changes in core code. The new code defaults this setting to 120, but it can be set in the config file. If you don't need any of the other changes (most people probably won't) then just adding something like

PGCTLTIMEOUT => '120',

to the build_env stanza of your config file should do the trick, and you can happily wait for the next release.

Monday, January 25, 2016

Fixed length record files

I've had a few requests recently for support for the Fixed Length File FDW that I created as an experiment about 5 years ago. Apparently people are still getting data from COBOL programs or some such source. The code has bitrotted somewhat in the meantime, so I'll try to grab a little time to update it, add some test cases and docco, etc.