Saturday, April 18, 2015

Buildfarm bug fix for MSVC builds

Unfortunately there was a small bug in yesterday's buildfarm client release. The bug only affects MSVC builds, which would fail silently on the HEAD (master) branch.

There is a bug fix release available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_15_1.tgz or you can just pick up the fixed version of run_build.pl (the only thing changed) at https://raw.githubusercontent.com/PGBuildFarm/client-code/b80efc68c35ef8a1ced37b57b3d19a98b8ae5dd2/run_build.pl

Sorry for the inconvenience.

Friday, April 17, 2015

New PostgreSQL Buildfarm Client Release

I have just released version 4.15 of the PostgreSQL Buildfarm Client. Here's what's changed:
  • support the new location for pg_upgrade
  • support running tests of client programs
  • support building, installing and running test modules
  • use a default ccache directory
  • improve logging when running pg_upgrade tests
  • handle odd location of Python3 regression tests
  • add timestamp to default log_line_prefix
  • make qw() errors in the config file fatal (helps detect errors)
  • minor bug fixes for web script settings
  • allow for using linked git directories in non-master branches
The last item might need a little explanation. Essentially, it can dramatically reduce the amount of disk space required if you are building more than one branch. Instead of keeping, say, six checked-out repos for the six currently tested branches, we keep one and link all the others to it. This works almost exactly the way git-new-workdir does (I stole the logic from there). It doesn't work in a couple of situations: if you are running on Windows, or if you are using the git_reference setting. In those cases the new setting is simply ignored.

To enable this new setting in an existing installation, do the following after installing the new release:
  • in your config file, add this setting:
    git_use_workdirs => 1,
  • remove the pgsql directory in each branch directory other than HEAD
Another good thing to do in existing installations would be to add "%m" to the beginning of the log_line_prefix setting in the extra_config stanza.
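For example, if your config file has an extra_config stanza along these lines (a sketch only; yours may be keyed by branch name rather than DEFAULT, and will have its own entries), just prepend %m to the prefix:

    extra_config => {
        DEFAULT => [
            q(log_line_prefix = '%m [%c:%l] '),
            "log_connections = 'true'",
        ],
    },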

Enjoy!

Wednesday, April 15, 2015

Hot Fix for buildfarm client, currently broken by pg_upgrade move

Yesterday the pg_upgrade program was moved from contrib to bin in the source tree. Unfortunately this broke most of the buildfarm members that check pg_upgrade. There is a hot fix for the TestUpgrade buildfarm module that can be downloaded from github. I will work on cutting a new buildfarm release in the next few days, but this file can simply be dropped in place on existing installations.

Tuesday, April 7, 2015

Dynamically disabling triggers without locks

Recently Simon Riggs committed a patch by himself and Andreas Karlsson to reduce the lock strength required by certain ALTER TABLE commands, including those to enable or disable triggers. Now the lock level required is SHARE ROW EXCLUSIVE instead of ACCESS EXCLUSIVE. That means it doesn't block SELECT commands any more, and isn't blocked by them, although it will still block and be blocked by INSERT, UPDATE and DELETE operations. Very nice.
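For example (with hypothetical table and trigger names), commands like these now take only SHARE ROW EXCLUSIVE, so concurrent SELECTs proceed unimpeded:

    -- concurrent SELECTs are no longer blocked,
    -- but concurrent INSERT/UPDATE/DELETE still are
    alter table orders disable trigger orders_audit_trg;
    -- ... bulk operation here ...
    alter table orders enable trigger orders_audit_trg;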

However, you can dynamically tell a trigger not to do anything in the current session, without formally disabling it and without taking any locks at all. Here's a little bit of PL/pgSQL code I wrote recently for this sort of operation in an INSERT trigger:
    begin
        -- the custom setting does not exist unless it has been set
        -- in this session, in which case an error is raised
        disabled := current_setting('mypackage.foo_trigger_disabled');
    exception
        when others then disabled := 'false';
    end;
    -- do nothing if this session has disabled the trigger
    if disabled = 'true' then
       return NEW;
    end if;
Note that this will only stop the trigger from doing anything in sessions where this variable is set. But that's often exactly what you want. In the case this was written for, the trigger is redundant (and expensive) for certain bulk operations, but required for normal operations. So in a session performing the bulk operation, rather than formally disabling the trigger and taking out a heavy lock on the table, we can simply run this before starting:
    set mypackage.foo_trigger_disabled = 'true';
The code above is a bit ugly because of the requirement for the exception handler. There's a cure for that coming, too. David Christensen has submitted a patch to provide a form of current_setting() which will return NULL for unset variables instead of raising an exception.

Note, too, that you could use a value in a one-row one-column table if you wanted something that could apply in all sessions, not just the current session. It would be a bit less efficient, though. This mechanism is pretty light-weight.
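For completeness, here is a minimal sketch, with hypothetical names, of how the whole thing might be put together. (If the current_setting() patch mentioned above lands, the inner begin/exception/end block could become a single call.)

    create or replace function foo_insert_trg() returns trigger
    language plpgsql as $$
    declare
        disabled text;
    begin
        begin
            disabled := current_setting('mypackage.foo_trigger_disabled');
        exception
            when others then disabled := 'false';
        end;
        if disabled = 'true' then
            return NEW;
        end if;
        -- ... the expensive work the trigger normally does goes here ...
        return NEW;
    end;
    $$;

    create trigger foo_insert_trg before insert on foo
        for each row execute procedure foo_insert_trg();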

Monday, April 6, 2015

Fun with Raspberry Pi 2 and the buildfarm

Here's a picture of my two Raspberry Pi 2 boxes, both running headless and wireless.


One is running Raspbian, installed via NOOBS, and the other Fidora, a remix of Fedora 21 for Raspberry Pi 2. It turned out that Pidora doesn't work on the Raspberry Pi 2, a fact that is extremely well hidden on the Raspberry Pi web site.

I have set up test buildfarm animals on both of these. But something odd is happening. They are both getting intermittent failures of the stats regression test. Sometimes it happens during "make check", sometimes during "make installcheck" and sometimes during testing of pg_upgrade (which in turn runs "make installcheck").

These machines are not anything like speed demons. Far from it. But we run other slow machines without seeing these failures regularly. So I'm a bit perplexed about what might be going on.

Incidentally, if you want to play with one of these, I do recommend getting a starter kit from Amazon or elsewhere. It's probably cheaper than buying everything separately, and gets you everything you need to get started. Well worth the $69.99.

Monday, March 30, 2015

Testing patches with a couple of commands using a buildfarm animal

I've blogged before about how the buildfarm client software can be useful for developers and reviewers. Yesterday was a perfect example. I was testing a set of patches for a bug fix for pg_upgrade running on Windows, and they go all the way back to the 9.0 release. The simplest way to test these was using a buildfarm animal. On jacana, I applied the relevant patch in each branch repo, and then simply did this to build and test them all:

# loop over the branch directories (HEAD and REL*)
for f in root/[RH]* ; do
  br=`basename $f`
  perl ./run_build.pl --from-source=`pwd`/$f/pgsql --config=jacana.conf --verbose $br
done

After it was all done and everything worked, I cleaned up the git repositories so they were ready for more buildfarm runs:

# put each branch's repo back into a pristine state
for f in root/[RH]* ; do
  pushd $f/pgsql
  git reset --hard
  git clean -dfxq
  popd
done

Pretty simple! The commands are shown here on multiple lines for clarity, but in fact I wrote each set on one line, so after applying the patches the whole thing took two lines. (Because jacana only builds back to release 9.2, I repeated the process on frogmouth for 9.0 and 9.1.)

Wednesday, March 4, 2015

new pg_partman release

Keith Fiske's pg_partman is a pretty nice tool for managing partitions of tables. I've recommended it recently to a couple of clients, and it's working well for them.

Based on that I have made a couple of suggestions for improvement, and today he's made a release including one of them. Previously, the script to rebuild the child indexes essentially dropped them all and recreated them. Now it only drops those that are no longer on the parent table, and only adds those that are on the parent but not on the child, so if you just add or drop one index on the parent, that's all that gets done on the children too.

I'm happy to say that he's also working on my other, more significant suggestion, which is to have a hybrid partitioning mode where the trigger has static inserts for the likely common tables and dynamic inserts for the rest. That will mean you don't have to make nasty choices between flexibility and speed. I'm looking forward to it.

Wednesday, February 25, 2015

Stopping expensive queries before they start

Today we're releasing the code for a small PostgreSQL module called plan_filter that lets you stop queries from even starting if they meet certain criteria. Currently the module implements one such criterion: the estimated cost of the query.

After you have built and installed it, you add a couple of settings to the postgresql.conf file, like this:
shared_preload_libraries = 'plan_filter'
plan_filter.statement_cost_limit = 100000.0
Then, if the planner estimates the cost of a query to be higher than plan_filter.statement_cost_limit, the module will raise an error rather than allowing the query to run.
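The number being compared against the limit is the planner's overall cost estimate for the statement, the same figure EXPLAIN shows for the top plan node, so you can get a feel for a sensible limit by examining your expensive queries. A hypothetical example:

    explain select * from orders o cross join order_lines l;
    --                           QUERY PLAN
    -- --------------------------------------------------------------
    --  Nested Loop  (cost=0.00..125628.00 rows=10000000 width=72)
    --    ...
    -- With statement_cost_limit = 100000.0 this query would be rejected.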

This module follows an idea from a discussion on the postgresql-hackers mailing list some time ago. It was developed by PostgreSQL Experts Inc for our client Twitch.TV, who have generously allowed us to make it publicly available.

Raspberry Pi 2 coming to the buildfarm

Yesterday I ordered a Raspberry Pi 2 Model B, and it should be delivered in a few days. I'm intending to set it up as a buildfarm member. The fact that you can purchase a general purpose computer the size of a credit card with a quad-core processor and 1GB of memory (I remember when RAM was counted in kilobytes), all for USD 35.00, is amazing, even when you remember Moore's Law.

Saturday, February 14, 2015

Statistics and ordering operations on JSON fields

The JSON type is not provided with any comparison operators built in, unlike the new JSONB type. One reason for this is that it's not really clear how to do the comparison. In particular, comparing the text values seems wrong, because the white space should not matter, and arguably the order of object keys should not matter either. JSONB doesn't have these problems because it dissolves all the white space and stores object keys in a canonical order, so we have come up with sane if slightly counter-intuitive comparison operations.

This limitation on JSON is somewhat irksome, however. It prevents you from using JSON columns in operations such as DISTINCT, GROUP BY, and ORDER BY.

Another issue is that it causes ANALYZE not to create any rows at all in pg_statistic for JSON columns, so even if all you want to know is the average column width, you can't find it. That makes doing things like measuring table bloat just about impossible.
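By contrast, for column types that do get analyzed, the average width is easy to get at (table and column names here are hypothetical):

    -- avg_width is the average stored width of the column in bytes;
    -- for a json column this query returns no row at all
    select tablename, attname, avg_width
    from pg_stats
    where tablename = 'mytable' and attname = 'payload';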

If you have PLV8 available, you can create operators that work fairly sanely on JSON and that let you generate stats, use DISTINCT etc. The steps are outlined here.

But what if you don't want to load PLV8 just for this? Or what if you can't, say on a managed service that doesn't provide it? All is not lost. Here is a version which uses text comparison instead of a PLV8 function. That means you don't need to have PLV8 loaded. Since it uses text comparison, it is subject to the caveats mentioned above about white space and object keys. But it will work, and you will see rows for the column in pg_statistic. Just be careful using ordering operations or creating indexes, as the results, while consistent, might be surprising.
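The linked code isn't reproduced here, but the idea looks something like the following sketch of my own (the names are mine, and a production version should also declare commutators, negators and selectivity functions on the operators):

    -- comparison support for json based on its text representation
    create function json_cmp(json, json) returns integer
        language sql immutable strict
        as $$ select bttextcmp($1::text, $2::text) $$;

    create function json_lt(json, json) returns boolean
        language sql immutable strict
        as $$ select $1::text < $2::text $$;

    create function json_le(json, json) returns boolean
        language sql immutable strict
        as $$ select $1::text <= $2::text $$;

    create function json_eq(json, json) returns boolean
        language sql immutable strict
        as $$ select $1::text = $2::text $$;

    create function json_ge(json, json) returns boolean
        language sql immutable strict
        as $$ select $1::text >= $2::text $$;

    create function json_gt(json, json) returns boolean
        language sql immutable strict
        as $$ select $1::text > $2::text $$;

    create operator < (leftarg = json, rightarg = json, procedure = json_lt);
    create operator <= (leftarg = json, rightarg = json, procedure = json_le);
    create operator = (leftarg = json, rightarg = json, procedure = json_eq);
    create operator >= (leftarg = json, rightarg = json, procedure = json_ge);
    create operator > (leftarg = json, rightarg = json, procedure = json_gt);

    -- a default btree operator class makes ORDER BY, DISTINCT,
    -- GROUP BY and ANALYZE statistics work for json columns
    create operator class json_ops
        default for type json using btree as
            operator 1 <,
            operator 2 <=,
            operator 3 =,
            operator 4 >=,
            operator 5 >,
            function 1 json_cmp(json, json);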

Monday, February 9, 2015

Moving PLV8 to ECMAScript6

Taras Mitran has just published a blog post about using some of the advanced features of ECMAScript version 6 to make programming with SQL in JavaScript nicer.

He notes that Template Strings allow for multiline strings, which is in itself a significant advance, and that you can also have Tagged Template Strings which can transform the Template String automagically in many interesting ways. His example turns a Template String with embedded variables into a preparable query with a list of parameters. It's very nifty, quite clever in fact.

The other feature that I would really like is proper lexically scoped variables. Perl got these donkey's years ago, and their absence this long in JavaScript has been reprehensible. They are provided in ES6 via the new let declaration.

My understanding is that to get these features we need to provide for PLV8 to build against the latest version of V8 (and possibly provide some initialization flags too.) Unfortunately, the V8 API seems to have changed significantly since the 3.14.5.10 that's available on my Fedora 20 workstation, so enabling it to build with, say, V8 version 4.1.0.14, which is what io.js is using, will take quite a bit of work, probably by someone whose C++-fu is greater than mine.