Saturday, November 21, 2015

Announcing Release 4.16 of the PostgreSQL Buildfarm client

I have just released version 4.16 of the PostgreSQL Buildfarm client.

It can be downloaded from

Several bugs have been fixed, and there are these non-trivial changes:

  • capturing of TAP test logs
  • bin check enabled on Windows
  • rm_worktrees feature
  • an experimental module to run Sepgsql tests
  • try database shutdown following a start failure
  • report all PGBuild module versions to server

rm_worktrees is enabled by putting this in your config file:

rm_worktrees => 1,

The effect is that at the end of a run the checked-out work tree is removed, leaving just the git repository, which in turn might be mostly linked to your HEAD branch if you use the git_use_workdirs option. Before a run, the work tree is checked out again. The net effect is a significant saving in space. With these two options, the additional space for each branch, except while it's actually building, is reduced to less than 30MB. On crake, the git mirror, branch trees and cache now come in at about 1.5GB, which is a lot less than it used to be. The additional cost of checking out the work tree each time is very modest.
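For example, to get the full space saving you would have both options in your config file, something like this:

git_use_workdirs => 1,
rm_worktrees => 1,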

Shutdown after a start failure tries to remedy a situation where we try to start the server and fail to detect that it has started, even though it actually has. So now, if we get a start failure, we try to shut down any server that might have started, just in case. This change is possibly redundant given the recent change whereby postgres detects that its data directory has disappeared and shuts itself down, but it's probably worth having anyway.

Saturday, November 14, 2015

A record with all defaults

Fresh from yesterday's help file:

Say you want to insert a record into your table with all the default values set. Later on you'll update some of the values. You might want to do this so you can get the row's ID field. So you try this:
insert into mytable values () returning my_id;
but you find it doesn't work. VALUES isn't allowed to have an empty list. Here's what you do:
insert into mytable values (default) returning my_id;
This will insert the default value explicitly into the first column in the table, and implicitly into all the other columns. It doesn't matter if some of the columns don't have an explicit default value set: in that case the default is NULL. Of course, that means this won't work if you have NOT NULL columns without a default set.
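Here's a small worked example (the table definition is made up for illustration):

create table mytable (
    my_id serial primary key,
    created_at timestamptz default now(),
    note text          -- no default set, so it defaults to NULL
);

insert into mytable values (default) returning my_id;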

Sunday, November 8, 2015

Latest adventures with a buildfarm animal on Raspberry Pi 2

My buildfarm member axolotl, which runs (or ran) on a Raspberry Pi 2 running a Fedora 21 remix, died the other day, apparently with a corrupted file system.

While I'm trying to revive it, I thought I would try running the stock Fedora spin built for ARM, instead of a remix. This is pretty easy to do using the script found here. It looks like this is a bad idea, though. It's running so slowly just doing a "dnf update" that I wouldn't think of running a buildfarm member on it.

So, if I can't resuscitate the file system on the other SD card, I'll try a respin with a custom kernel again. There is someone maintaining a set here, which you can read about on the Raspberry Pi forums.


fsck on the root partition managed to resuscitate it without apparent problems, so in the end I haven't had to jump through more hoops.

Tuesday, November 3, 2015

Keeping config files under version control - reprise

I recently blogged about the virtue of putting PostgreSQL config files under version control. My attention was drawn to etckeeper, and a day or so ago I started playing with it, and it seems pretty nice. If you keep your configs in /etc, as Debian/Ubuntu do, then just deploying etckeeper is probably all you need to do. But if you are using Fedora, Red Hat, CentOS, or anything else where the config files aren't in /etc, it's a little bit harder. But not too much. I found a nice little script at serverfault that you can put in etckeeper's commit.d directory that seems to do the trick and happily pulls files outside /etc into the repository. This seems like quite a good solution.
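The gist of such a script is simply to copy the config files into a directory under /etc before each commit, so etckeeper versions them along with everything else. A minimal sketch of the idea, with assumed paths (the serverfault script is more thorough):

#!/bin/sh
# sketch of an etckeeper commit.d hook: mirror PostgreSQL configs into /etc
PGDATA=/var/lib/pgsql/data
MIRROR=/etc/postgresql-configs

mkdir -p "$MIRROR"
for f in postgresql.conf pg_hba.conf pg_ident.conf; do
    [ -f "$PGDATA/$f" ] && cp -p "$PGDATA/$f" "$MIRROR/"
done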

Friday, October 23, 2015

A little permissions problem

So you're on Fedora 22 and you want to run the latest stable PostgreSQL release. You install the PGDG repository package and then install postgresql94-server. You decide to test it out first, so as your normal login user you run initdb and then try to start the server using pg_ctl, something you've done a hundred times before. It should "just work," right? Not this time. You examine the log file and you see it complaining like this:
could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Permission denied
Ouch! I've never needed that before. These files have always gone in /tmp. OK, so why is it different now that we are putting them elsewhere? It turns out that the directory has been created by the RPM with permissions different from /tmp's, and only the postgres user can write there.

My solution (which I hope our RPM maintainers adopt) was to change the permissions on the directory like this, making them the same as those of /tmp:
sudo chmod 1777 /var/run/postgresql
After that when I try to start the database everything is as expected.
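One caveat: on these systems /var/run is a tmpfs recreated at boot, so a one-off chmod won't survive a reboot. One way to make it persistent is a tmpfiles.d override; a sketch (the packaged file this overrides may have a different name):

# /etc/tmpfiles.d/postgresql.conf
# type  path                 mode  user      group     age
d       /var/run/postgresql  1777  postgres  postgres  -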

Friday, October 16, 2015

How to find what's causing that error

Tom Lane just posted a terrific piece of advice about how to find what's causing an error you don't know the source of, which is worth quoting:
What I would recommend is that you get the data onto a non-production
machine where you can play around a bit more. One thing you could do
then is run a build with debug symbols, attach to the backend process
with gdb, and set a breakpoint at "errfinish". Then provoke the error,
and capture a backtrace from the call to errfinish. That would greatly
narrow things down, though it might not be enough to isolate the bug
I normally like to have debuginfo packages installed even on production machines so I can do this sort of thing quickly.
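Concretely, a session along those lines looks something like this (the PID here is made up; use the target backend's):

gdb -p 12345              # attach to the backend process
(gdb) break errfinish
(gdb) continue
# ... provoke the error from your client session ...
(gdb) backtrace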

Another thing you can try in a non-production environment is to run your test case against a build with assertions enabled and see if it trips an assertion failure. That's often a quick way of finding where problems are occurring.

Thursday, October 8, 2015

Testing on Fedora

One of the things that is nice about Fedora is how easy it is to test something very quickly. They provide a considerable number of cloud images suitable for spinning up instances fast. In my case I have a machine with Vagrant and VirtualBox installed, and when we got a complaint recently about PostgreSQL misbehaving with some glibc locales in Fedora 23, I wanted to test it. After downloading the image I wanted, I was able to get running with this:

vagrant box add --name=f23 /path/to/downloaded-image.box   # the image file you downloaded
mkdir f23
cd f23
vagrant init f23
vagrant up
vagrant ssh

It took me about 5 minutes from discovering the image I wanted to test to starting to set up a PostgreSQL build on it, and most of that was download time.

Redis FDW doesn't support Redis Cluster

Redis Cluster is a way of sharding data across a group of federated Redis nodes. I was asked recently if this is supported by the Redis Foreign Data Wrapper. Unfortunately, the answer is "No." The Redis site coyly states that "At this stage one of the problems with Redis Cluster is the lack of client libraries implementations." What they don't mention is that their own C client library, hiredis, has no support for it at all, happily reporting back redirects to the client rather than following them. It's rather as if a web client library refused to follow an HTTP redirect. The Redis FDW is based on hiredis, so as the library doesn't support Redis Cluster, neither do we. Of course, at the expense of very large amounts of code we could make the FDW handle it, but this seems quite wrong architecturally. The FDW should not have to know or care that the Redis server it's connected to is sharded.

Friday, October 2, 2015

Keep your configs under version control

A conversation yesterday with a client: "What changes were made in the PostgreSQL configs?" "I don't recall exactly." This is why you should keep all the configs under version control, and have a regular monitoring check that the controlled version is the same as the operating version, so that if someone makes a manual change without checking it in, the monitoring alarm will go off.

If you keep your config in the data directory, in the default PostgreSQL style, probably the simplest way to do this is to create a git repository with a symbolic link to the data directory and a .gitignore that ignores everything in the data directory except the config files. Then add the config files and you're done. If you keep the config files separately, as Debian does, then you can probably just create a git repository right in the config directory if you don't already have one at a higher level.
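For that approach, the .gitignore might look something like this, assuming the link is called data and you use the stock config file names:

data/*
!data/postgresql.conf
!data/pg_hba.conf
!data/pg_ident.conf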

I have been guilty of not doing this in the past, but I have come to the strong conclusion that this should be a part of standard best practice.


My attention has been drawn to etckeeper, which makes handling this super simple when you're using Debian-style config setups. It's available both for Debian and for RedHat-ish systems, as well as possibly others.

Tuesday, September 15, 2015

When to use json, and when to use jsonb

As well as being able to store JSON in the database, either as plain text (json type) or as a decomposed binary format (jsonb type), PostgreSQL has quite a number of useful utility functions for creating JSON from non-JSON data. In many cases clients don't want to store the JSON or process it in any way; they just want to get hold of it outside the database. There are subtle differences between the two types: jsonb eliminates whitespace and duplicate object keys, and mangles the order of object keys.
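A quick way to see the difference, using the 9.5 functions (output shown as comments):

select json_build_object('b', 1, 'a', 2, 'b', 3);
-- {"b" : 1, "a" : 2, "b" : 3}    order and duplicates preserved

select jsonb_build_object('b', 1, 'a', 2, 'b', 3);
-- {"a": 2, "b": 3}               keys sorted, last duplicate wins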

PostgreSQL 9.5 will introduce jsonb equivalents of pretty much all the json building functions, namely json_agg, json_object_agg, json_build_array, json_build_object, json_object and to_json. The exceptions are the legacy functions array_to_json and row_to_json, for which we provided no jsonb equivalents. These are covered by to_jsonb.

If object key order preservation is important to you, or you want to be able to have duplicate keys, then you definitely don't want to be using the jsonb functions. But beyond that, if you don't want to store the JSON or process it in any way, the jsonb functions are generally overkill, and they are definitely slower. A recent benchmark I did for a client showed a query using the jsonb functions to be slower by a factor of about 4. When you think about it, this isn't entirely surprising: not only is it more work to turn data into jsonb, but when we're done we have to turn it back into a piece of text to deliver to the client. With json, once you've constructed it it's already text, so there's no extra work.

So don't assume that the jsonb functions are always going to be faster - in many cases they are not. Other things being equal, you will often want to be using the json generating functions.

On the other hand, if you want to store the data for later processing, e.g. by examining the contents of a field inside the json, jsonb is probably the way to go.

And of course, if speed is an issue you need to benchmark your application.

Thursday, August 27, 2015

Quick buildfarm recipe for CentOS and friends

I've been writing a little provisioning script for a Vagrant CentOS 7 machine that will be a buildfarm client.

Stripped of the vagrant stuff and some stuff that's special for the machine, here is what I did to get it running:

sudo yum install -y wget gcc make flex bison ccache git \
  perl-libwww-perl perl-Digest-SHA perl-devel perl-ExtUtils-Embed \
  zlib-devel openssl-devel readline-devel python-devel tcl-devel \
  libxml2-devel libxslt-devel openldap-devel

mkdir bf
cd bf
HERE=$(pwd)    # used below to point build_root at this directory
# download latest-client.tgz into this directory, then:
mv latest-client.tgz buildfarm-latest-client.tgz
tar -z --strip-components=1 -xf buildfarm-latest-client.tgz
mkdir root
sed -i "s!build_root =>.*!build_root => '$HERE/root',!" build-farm.conf
At this stage the buildfarm client will run quite successfully, as can be tested using:
perl run_build.pl --test

Monday, August 17, 2015

Quick tips

Found in the last couple of days.

1. If you have more than one active database, and you're using text format logs, make sure you have %d (the database name) in your log line prefix. Not knowing which database a log line comes from can be confusing and makes life more difficult. (There's an example setting after these tips.)

2. If you report on a possible database error, report from the Postgres logs, not from your client's logs. We need to see things from the database's point of view. In the case in point, the Postgres logs contained no relevant errors at all, i.e. this wasn't a Postgres error.

3. Found when trying to resize a data volume on Amazon: certain Linux disk partitioning utilities, notably sfdisk and cfdisk, don't optimally align partitions by default. Stay away from those, or make sure you explicitly choose proper alignment for partitions. Programs that do the right thing include fdisk and parted. In general you are probably best off sticking to parted.
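For the first tip, the setting might look something like this (the exact format is a matter of taste):

log_line_prefix = '%t [%p] %d '   # timestamp, PID, database name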