Wednesday, May 20, 2015

Placeholder blues

There has been some debate going on on the hackers list about problems certain drivers have with operators that contain the ? character. Since drivers such as the JDBC driver consider any such character outside of quote marks to be a parameter placeholder, its use as part of an operator makes life difficult for them. Part of the problem has been that we allow almost every available non-alphanumeric ASCII character to be a part of an operator. Why, for example, do we allow `? I've never seen it that I can recall, but the fact means we can't use it for something else (such as marking an operator). That makes doing anything about the problem hard. Some people want to deprecate the use of ? in operators. I think it's far too late for that - its use is far too widespread.

Sunday, May 17, 2015

Yet another data recovery job

One of the things I like about data recovery jobs is that each one is different. Right now I'm wrestling with this problem: how can I recover whatever is possible from a backup that's missing a handful of xlog files. We know this will result in a probably corrupt database, but we still want to see what data is available.

My first two approaches didn't work too well. I tried mangling the control file, and creating dummy xlog files to replace the handful that are missing.

Right now I'm trying to follow a suggestion from Andres Freund to drive the process via the debugger, moving it on to the next available log file when it gets to where it would want the first missing log file.  The trouble is it's hard to find what to set exactly and where to set it (this is Postgres 9.3). Every time I try something I get errors like "unexpected page_addr". If I were more familiar with this section of the code it might be easier, but it's something I've pretty much never dabbled in.

It all makes for an exciting life.

Friday, May 15, 2015

Set types

Every so often I've wanted a SET metatype in Postgres.

We do have bit fields, which can be used as sets with a little work, and this is how MySQL does it's SET types as I understand it. But what if you want a set of enums (our enums can have very large lists of values), or a set of text values, or of integers? Bit fields won't work for these - we would need some sort of sparse representation.

One obvious way would be to store the set as a one-dimensional array, kept in sorted order for efficient searching. For text values, we could probably use some smart form of sorting such as is used by jsonb and hstore to store object keys, to reduce the need for string comparisons.

Obviously, I have done no work on this, but it seems like it would be a worthwhile project.

In the meantime, you can use hstore more or less in this way. Just treat it as a mapping from whatever values you have to 1.  It's a bit of a hack, but it should work. You'll just need to cast your values to text.

Wednesday, May 13, 2015

New jsonb features for 9.5.

Yesterday I committed some new features for jsonb. These are:
  • jsonb_pretty(jsonb) produces nicely indented json output.
  • jsonb || jsonb concatenates two jsonb values.
  • jsonb - text removes a key and its associated value, or a matching array element, from the json
  • jsonb - int removes the designated array element
  • jsonb - text[] removes a key and associated value or array element at the designated path
  • jsonb_replace(jsonb,text[],jsonb) replaces the array element designated by the path or the value associated with the key designated by the path with the given value.
This was based Dmitry Dolgov's jsonbx extension for 9.4, which in turn was based on earlier work by Oleg Bartunov, Teodor Sigaev and Alexander Korotkov on nested hstore. Petr Jelinek also did some useful review and tidy up work on the core features. So, a good team effort.

Tuesday, May 5, 2015

Reminder: checking Windows builds is incredibly easy.

Most PostgreSQL developers don't use Windows, and many fairly openly regard the Windows port as a nuisance. But we've had it for more than a decade, now, and it's long since time we got over that. And checking that at least your changes compile is easy. It doesn't need a Windows box at all - Postgres builds for Windows quite happily via a cross-compiler. Here's exactly what I did on a Fedora 20 machine this morning:
sudo yum install mingw64-gcc mingw64-zlib
git clone git://git.postgresql.org/git/postgresql.git
cd postgresql
./configure --host=x86_64-w64-mingw32
make world
And it just worked. I tested it on Windows, and it might even be possible to test it using WINE, but even without testing it this procedure is useful to see if there are compilation errors.

I assume all this could be (and possibly has been) built for other platforms such as OSX and FreeBSD. If not, it might be worth putting together some packages for those platforms - it shouldn't be too hard.

Saturday, April 18, 2015

Buildfarm bug fix for MSVC builds

Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_15_1.tgz or you can just pick up the fixed version of run_build.pl (the only thing changed) at https://raw.githubusercontent.com/PGBuildFarm/client-code/b80efc68c35ef8a1ced37b57b3d19a98b8ae5dd2/run_build.pl

Sorry for the inconvenience.

Friday, April 17, 2015

New PostgreSQL Buildfarm Client Release

I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running testmodules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings.
  • allow for using linked git directories in non-master branches
The last item might need a little explanation.  Essentially this can reduce quite dramatically the amount of space required if you are building on more than one branch. Instead of keeping, say, 6 checked out repos for the current six tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). This doesn't work in a couple of situations: if you are using Windows or if you are using git-reference. In these cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in extra_config stanza.

Enjoy!

Wednesday, April 15, 2015

Hot Fix for buildfarm client, currently broken by pg_upgrade move

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of those buildfarm members which check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can just be dropped in place on existing installations.

Tuesday, April 7, 2015

Dynamically disabling triggers without locks

Recently Simon Riggs committed a patch by himself and Andreas Karlsson to reduce the lock strength required by certain ALTER TABLE commands, including those to enable or disable triggers. Now the lock level required is SHARE ROW EXCLUSIVE instead of ACCESS EXCLUSIVE. That means it doesn't block SELECT commands any more, and isn't blocked by them, although it will still block and be blocked by INSERT, UPDATE and DELETE operations. Very nice.

However, without formally disabling a trigger you can tell it dynamically not to do anything in the current session without taking any locks at all. Here's a little bit of PLpgsql code I wrote recently for this sort of operation in an INSERT trigger:
    begin
        disabled := current_setting('mypackage.foo_trigger_disabled');
    exception
        when others then disabled := 'false';
    end;
    if disabled = 'true' then
       return NEW;
    end if;
Note that this will only block the trigger from doing anything in sessions where this variable is set. But that's often exactly what you want. In the case this was written for, the trigger is redundant (and expensive) for certain bulk operations, but required for normal operations.  So in a session where we are performing the bulk operation, we can simply set this and avoid taking out a heavy lock on the table, and do this instead, before running our bulk operation:
    set mypackage.foo_trigger_disabled = 'true';
The code above is a bit ugly because of the requirement for the exception handler. There's a cure for that coming, too. David Christensen has submitted a patch to provide a form of current_setting() which will return NULL for unset variables instead of raising an exception.

Note, too, that you could use a value in a one-row one-column table if you wanted something that could apply in all sessions, not just the current session. It would be a bit less efficient, though. This mechanism is pretty light-weight.

Monday, April 6, 2015

Fun with Raspberry Pi 2 and the buildfarm

Here's a picture of my two Raspberry Pi 2 boxes, both running headless and wireless.


One is running Raspbian, installed via NOOBS, and the other Fidora, a remix of Fedora 21 for Raspberry Pi 2. It turned out that Pidora doesn't work on the Raspberry Pi 2, a fact that is extremely well hidden on the Raspberry Pi web site.

I have set up test buildfarm animals on both of these. But something odd is happening. They are both getting intermittent failures of the stats regression test. Sometimes it happens during "make check", sometimes during "make installcheck" and sometimes during testing of pg_upgrade (which in turn runs "make installcheck").

These machines are not anything like speed demons. Far from it. But we also run other slow machines without getting this happening all the time. So I'm a bit perplexed about what might be going on.

Incidentally, if you want to play with one of these, I do recommend getting a starter kit from Amazon or elsewhere. It's probably cheaper than buying everything separately, and gets you everything you need to get started. Well worth the $69.99.

Monday, March 30, 2015

Testing patches with a couple of commands using a buildfarm animal

I've blogged before about how the buildfarm client software can be useful for developers amd reviewers. Yesterday was a perfect example. I was testing a set of patches for a bug fix for pg_upgrade running on Windows, and they go all the way back to the 9.0 release. The simplest way to test these was using a buildfarm animal. On jacana, I applied the relevant patch in each branch repo, and then simply did this to build and test them all:

for f in root/[RH]* ; do 
  br=`basename $f`
  perl ./run_build.pl --from-source=`pwd`/$f/pgsql --config=jacana.conf --verbose $br
done

After it was all done and everything worked, I cleaned up the git repositories so they were ready for more buildfarm runs:

for f in root/[RH]* ; do 
  pushd $f/pgsql
  git reset --hard
  git clean -dfxq
  popd
done

Pretty simple! The commands are shown here on multiple lines for clarity, but in fact I wrote each set on one line, so after applying the patches the whole thing took 2 lines. (Because jacana only builds back to release 9.2, I had to repeat the process on frogmouth for 9.0 and 9.1, using the same process).

Wednesday, March 4, 2015

new pg_partman release

Keith Fiske's pg_partman is a pretty nice tool for managing partitions of tables. I've recommended it recently to a couple of clients, and it's working well for them.

Based on that I have made a couple of suggestions for improvement, and today he's made a release including one of them. Previously, the script to rebuild the child indexes essentially removed them all and rebuilt them. Now it only removes those that are no longer on the parent table, and only adds those that are on the parent but not on the child, so if you just add or delete one index on the parent that's all that gets done on the children too.

I'm happy to say that he's also working on my other, more significant suggestion, which is to have a hybrid partitioning mode where the trigger has static inserts for the likely common tables and dynamic inserts for the rest. That will mean you don't have to make nasty choices between flexibility and speed. I'm looking forward to it.