Monday, September 8, 2014

PLV8 and harmony scoping

The other day I complained on the PostgreSQL hackers list about a couple of aspects of Javascript that make it quite bothersome for large scale programming, namely the old style variable scoping rules and the very limited string literals, which can't stretch across lines (don't mention the awful backslash trick, please) and don't have any provision for interpolation. If you're used as I am to Perl, which has had lexically scoped variables for about 20 years and awsome string literal support for a good deal longer than that, these things are really quite painful.

The good news if that the forthcoming standard, ECMAScript6, also known as "harmony", contains features to deal with both of these issues.

The latest versions of the V8 engine actually support harmony scoping rules, with one restriction, namely that it's only supported in contexts that are in "strict" mode. I believe that this restriction will go away in due course.

Petr Jelinek dropped me a note that other day to tell me how to set V8 flags, and based in that I have developed a patch for PLV8 that allows for harmony scoping. It requires a new GUC setting that is applied during the module's initialization code.

This is available in my clone of the plv8 code, and you can see what it's doing at https://code.google.com/r/amdunstan-harmony-scoping/source/detail?r=8acdcdabcd0c2b9ad99f66a5258920db805efdc3#

I'll do a bit more testing and then talk to the other PLV8 developers about merging it in.

Things are less rosy on the string handling front, I'm afraid. I have no idea when V8 will get the "template strings" feature that will address the string literal deficiencies. As far as I can tell nobody is working on it.

Friday, August 22, 2014

Hoist on my own PLV8 petard

I mentioned to a client that I was going to write a function they had asked me to write using PLPerl, because it has good dynamic string handling capabilities that make it a better fit for this particular task than PLPgsql. "Oh," said the client, "we don't do much Perl. But ever since you hooked us up with PLV8 we do everything in JavaScript. We all know that, and use it every day." So I'm now writing it in Javascript. Proof once again that no good deed goes unpunished. It remains to be seen if it's going to be quite such a good fit as Perl would be, but at least it will give me a thorough refresher in JS.

Thursday, August 21, 2014

Looking for emacs mixed mode editing for functions

I've been looking for a way to edit mixed mode files in emacs, so I can edit a plperl function, and have the create statement part (and everything except the body) use standard SQL mode and the body use CPerl mode or whatever mode some magic marker tells it to use. I've taken a bit of a look at mmm-mode, but haven't been able to get it to work, and don't have more time to spend on it. If someone has a good recipe for this please let me know.

Tuesday, August 19, 2014

New PostgreSQL buildfarm client release 4.14 - bug fix for MSVC

There is a new release - version 4.14 - of the buildfarm client, now available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_14.tgz

The only change of note is that a bug which only affects MSVC clients (such that the client will not complete a run) and is present in releases 4.12 and 4.13 is fixed. Clients on other platforms do not need to upgrade.

Monday, July 21, 2014

Code size

Someone was just talking about the size of some source files in PostgreSQL. The source code (.c, .h, .y and .l files) weighs in at a bit over 1 million lines of code. The documentation source has another roughly 300,000 lines. That's a large project, but by no means enormous by today's standards. The biggest source code file is pg_dump.c, at around 15,700 lines. The biggest documentation file is funcs.sgml, at around 17,600 lines. Both of these might well do with a bit of reorganization.

Wednesday, June 18, 2014

Buildfarm Client version 4.13 released

I have released version 4.13 of the PostgreSQL Buildfarm client.

This can be downloaded from http://www.pgbuildfarm.org/downloads/releases/build-farm-4_13.tgz

Changes in this release (from the git log):
  • fcc182b Don't run TestCollateLinuxUTF8 on unsupported branches.
  • 273af50 Don't run FileTextArrayFDW tests if not wanted.
  • 9944a4a Don't remove the ccache directory on failure, unless configured to.
  • 73e4187 Make ccache and vpath builds play nicely together.
  • 9ff8c22 Work around path idiocy in msysGit.
  • 0948ac7 revert naming change for log files
  • ca68525 Exclude ecpg/tests from find_typedefs code.

If you are using ccache, please note that there are adjustments to the recommended use pattern. The sample config file no longer suggests that the ccache directory have the branch name at the end. It is now recommended that you use a single cache for all branches for a particular member. To do this remove "/$branch" from the relevant line in your config file, if you have it, and remove those directories in the cache root. Your first run on each branch will rebuild some or all of the cache. My unifoed cache on crake is now at 615 Mb, rather than the multiples of Gb it had been previously.

It is recommended that this release be deployed by all users fairly quickly because of the fix in log file names that was discarding some that were quite important.

Sunday, June 15, 2014

Sunday, June 8, 2014

buildfarm vs vpath vs ccache

I think we've got more or less to the bottom of the ccache mystery I wrote about recently. It turns out that the problem of close to 100% of cache misses occurs only when the buildfarm is doing a vpath build, and then only because the buildfarm script sets up a build directory that is different each run ("pgsql.$$"). There is actually no need for this. The locking code makes sure that we can't collide with ourselves, so a hardcoded name would do just as well. This was simple an easy choice I made, I suspect without much thought, 10 years ago or so, before the buildfarm even supported vpath builds.

It also turns out there is no great point in keeping a separate cache per branch. That was a bit of a thinko on my part.

So, in my lab machine ("crake") I have made these changes: the build directory is hard coded with a ".build" suffix rather than using the PID. And it keeps a single cache, not one per branch. After making these changes, warming the new cache, and zeroing the stats, I did fresh builds on each branch. Here's what the stats looked like (cache compression is turned on):
cache directory                     ccache
cache hit (direct)                  5988
cache hit (preprocessed)             132
cache miss                             0
called for link                     1007
called for preprocessing             316
compile failed                       185
preprocessor error                    69
bad compiler arguments                 6
autoconf compile/link                737
no input file                         25
files in cache                     12201
cache size                         179.8 Mbytes
max cache size                       1.0 Gbytes

So I will probably limit this cache to, say, 300MB or so. That will be a whole lot better than the gigabytes I was using previously.

As for the benefits: on HEAD "make -j 4" now runs in 13 seconds on crake, as opposed to 90 seconds or more previously.

If we have a unified cache, it makes sense to disable the removal of the cache in failure cases, which is what started me looking at all this. We will just need to be a bit vigilant about failures, as many years ago there was at least some suspicion of persistent failures due to ccache problems.

All this should be coming to a buildfarm release soon, after I have let this test setup run for a week or so.

Friday, June 6, 2014

ccache mysteries

ccache is a nifty little utility for speeding up compilations by caching results. It's something we have long had support for in the buildfarm.

Tom Lane pinged me a couple of days ago about why, when there's a build failure, we remove the ccache. The answer is that a long time ago (about 8 years), we had some failures that weren't completely explained but where suspicion arose that ccache was returning stale compilations when it shouldn't have been. I didn't have a smoking gun then, and I certainly don't have one now. Eight years ago we just used this rather elephant-gun approach and moved on.

But Now I've started looking at the whole use of ccache. And the thing I find most surprising is that the hit rate is so low. Here, for example, are the stats from my FreeBSD animal nightjar, after a week since a failure:

cache directory                     HEAD
cache hit (direct)                  2540
cache hit (preprocessed)              45
cache miss                         32781
called for link                     5571
called for preprocessing            1532
compile failed                       899
preprocessor error                   248
bad compiler arguments                31
autoconf compile/link               3990
no input file                        155
files in cache                     25114
cache size                         940.9 Mbytes
max cache size                       1.0 Gbytes
So I'm a bit puzzled. Most changes that trigger a build leave most of the files intact. Surely we should have a higher hit rate than 7.3%. If that's the best we can do It seems like there is little value in using ccache for the buildfarm. If it's not the best we can do I need to find out what I need to change to get that best. But so far nothing stands out.

Tom also complained that we keep a separate cache per branch. The original theory was that we would be trading disk space for a higher hit rate, but that seems less tenable now, with some hindsight.

Friday, May 23, 2014

Pgcon

The jsquery stuff from Oleg and Teodor looks awesome. I will be exploring it very soon. Meanwhile, here are my conference slides: http://www.slideshare.net/amdunstan/94json where I cover mostly 9.4 Json features that aren't about indexing.

This has been one of the better pgcons. Well done Dan and the rest of the team.

Friday, May 2, 2014

pgbouncer enhancements

A couple of customers have recently asked for enhancements of pgbouncer, and I have provided them.

One that's been working for a while now, puts the address and port of the actual client (i.e. the program that connects to the proxy) into the session's application_name setting. That means that if you want to see where the client is that's running some query that's gone rogue, it's no longer hidden from you by the fact that all connections appear to be coming from the pgbouncer host.You can see it appearing in places like pg_stat_activity.

It only works when a client connects, so if the client itself sets application_name then the setting gets overridden. But few clients do this, and the original requester has found it useful. I've submitted this to the upsteam repo, as can be seen at  https://github.com/markokr/pgbouncer-dev/pull/23.

The other enhancement is the ability to include files in the config file. This actually involves a modification to the library pgbouncer uses as a git submodule, libusual. With this enhancement, a line that has "%include filename" causes the contents of that file to be included in place of the directive. Includes can be nested up to 10 deep. The pull request for this is at  https://github.com/markokr/libusual/pull/7. This one too sems to be working happily at the client's site.

There is one more enhancement on the horizon, which involves adding in host based authentication control similar to that used by Postgres. That's a rather larger bit of work, but I hope to get to it in the next month or two.

Saturday, April 5, 2014

Version 4.12 of the PostgreSQL Buildfarm client released.

I have released version 4.12 of the buildfarm client.

In addition to numerous bug fixes, it has the following:

  • the global option branches_to_build can now be 'HEADPLUSLATESTn' for any single digit n
  • there is a new module TestCollateLinuxUTF8
  • there is a new module TestDecoding which is enabled by default, (but does nothing on MSVC systems, where we can't yet run these tests.) This runs the new contrib test_decoding module, which can't run under "make installcheck".
  • running "perl -cw" on the scripts will now give you failures for missing perl modules on almost every platform. The only exception should now be on older Msys systems.
  • improvements in the sample config file to make it better organized and better reflecting of best practice.
  • find_typdefs is now supported on OSX

In addition I recently enhanced the HOWTO at http://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto covering especially best current git practice.

Thanks to Tom Lane for suggestions and complaints which are behind a number of the improvements and fixes, and for some code for OSX find_typedefs.

The release is available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_12.tgz