Wednesday, December 31, 2014
pg_repack redux
I got an idea the other day about how to get over the recent problem I had encountered with pg_repack colliding with a lock request on the table made while it was copying data, but before it tried to rebuild indexes. After some considerable time with the debugger, I found that the problem is that pg_repack tries to get the index definition only at the time it builds the index, and that pg_get_indexdef() requires an Access Share lock on the table. The end result is that the lock request gets blocked waiting for pg_repack, and pg_repack gets blocked waiting for the lock requester, and there they stay.
My initial solution was to move the index definition fetching to just after the place where the table definition is fetched. However, that ran into problems with existing strong locks, because it's before the brief time near the start of pg_repack's run where it obtains a strong lock on the table and cancels anything else holding such a lock. So I then moved it inside the section where it already holds a strong lock on the table. With that change, it now works both when there is an existing strong lock on the table and when there is a subsequent strong lock request on the table during the copy data phase.
I've sent a pull request for this fix to the repo.
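To make the failure mode concrete, here is a rough three-session illustration of the lock queue involved (the table is invented, and sessions 1 and 3 merely stand in for pg_repack's copy phase and its late pg_get_indexdef() call, as described above):

    -- setup
    create table t (id int primary key);

    -- session 1: stands in for pg_repack's copy phase, which keeps a weak lock
    begin;
    select count(*) from t;                      -- takes ACCESS SHARE on t and holds it

    -- session 2: an application asks for a strong lock and queues behind session 1
    begin;
    lock table t in access exclusive mode;       -- waits

    -- session 3: stands in for pg_repack fetching the index definition late;
    -- the ACCESS SHARE it needs queues behind session 2's waiting request
    select pg_get_indexdef('t_pkey'::regclass);  -- waits too, and now everyone is stuck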
Friday, December 19, 2014
Waiting for extra statement stats
One of the things I was hoping would make it into PostgreSQL 9.4 and didn't was some extra statistics in pg_stat_statements. Specifically, it was proposed to add minimum, maximum and (after I proposed it and pressed the case) the standard deviation for execution times. Keeping these stats is very cheap, so there should be close to zero performance impact. They will give you a much better idea of the spread of execution times than a simple average, which might be significantly dominated by outlier times. Unfortunately, these got left out for 9.4, which is rather disappointing. I'm going to see if I can revive the proposal for 9.5.
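To show why the spread matters, here is the sort of query the extra columns would enable. This is only a sketch: min_time, max_time and stddev_time are my guesses at the proposed column names and do not exist in 9.4, while query, calls and total_time do.

    select query,
           calls,
           total_time / calls as mean_time,
           min_time,
           max_time,
           stddev_time
    from pg_stat_statements
    order by stddev_time desc
    limit 10;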
Thursday, December 18, 2014
PostgreSQL 9.4 and beyond
PostgreSQL 9.4 is released today, and once again my main input has been in the JSON area. We've introduced a new spiffy JSON type called "jsonb" that makes performing operations on JSON much, much faster, because it's not stored as plain text, but rather decomposed into the structures and scalar values that go to make up JSON. Much of the credit for this goes to Oleg Bartunov and Teodor Sigaev, the original authors of the hstore module, who created a "nested hstore" module that we eventually based jsonb on. Credit also goes to Alexander Korotkov and Peter Geoghegan, especially for their indexing work, which will make many operations super fast. My role was to make sure that all (or almost all) of the operations that are available for the original json type are also available for the jsonb type.
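As a small illustration of why the decomposed storage and the indexing work matter, a containment query against a GIN-indexed jsonb column can avoid reparsing the text entirely (the table and document shape here are invented):

    create table docs (id serial primary key, body jsonb);
    create index docs_body_idx on docs using gin (body);

    -- containment tests like this one can use the GIN index
    select id from docs where body @> '{"customer": "acme", "status": "open"}';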
There have been some benchmark tests that suggest that our jsonb is faster in many respects than other database engines that use json as their native storage.
The nice thing about this is that you can now have a highly performing mixture of traditional relational data and semi-structured treeish data with proper transactional semantics, all in one engine. That's a very attractive proposition for a lot of people looking to design new data stores.
My other JSON contribution was to add a number of functions that allow the construction of arbitrarily complex json documents, with data from the database. This will be particularly useful for clients that expect json in a particular, non-regular shape, and is based on the needs of some actual json-using customers.
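For example, json_build_object() and json_agg() can be combined to produce a nested document in whatever shape the client wants (the tables and columns here are invented):

    select json_build_object(
             'name', c.name,
             'orders', json_agg(json_build_object('id', o.id, 'total', o.total))
           ) as customer_doc
    from customers c
    join orders o on o.customer_id = c.id
    group by c.id, c.name;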
Already committed for 9.5, to be released some time next year, are jsonb versions of those json functions we didn't provide jsonb versions of in 9.4, and functions to allow the stripping of object fields with 'null' as the value from json and jsonb values (this can result in much more compact storage in some cases, with no significant semantic loss).
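The null-stripping functions work along these lines (json_strip_nulls(), with a jsonb counterpart); note that only object fields are removed, not null array elements:

    select json_strip_nulls('{"a": 1, "b": null, "c": {"d": null, "e": 2}}');
    -- yields {"a":1,"c":{"e":2}} (give or take whitespace)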
I made a few other contributions to 9.4, but they are not worth mentioning in detail.
I'm now looking for new PostgreSQL projects, not involving JSON. I'm not sure what those should be. For the most part my contributions over the last 12 years have tended not to be in the area that involves the planning and execution of queries, so maybe it's time for me to dive into that pool. But to do what I don't yet know.
Monday, November 3, 2014
Assignment beats SELECT INTO
While working on some customer code, I noticed that they have a lot of code that reads like this:
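Schematically, it was a field-at-a-time pattern like the one below. This is a reconstruction, not the client's actual code; the composite type is also the one used in the timings further down.

    -- reconstruction only: a composite type matching the r.x/r.y/r.z fields used below
    create type abc as (x text, y text, z int);

    do $$
    declare foo abc;
    begin
      -- one SELECT ... INTO per field, instead of a single assignment
      select 'a' into foo.x;
      select 'b' into foo.y;
      select 1   into foo.z;
    end;
    $$;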
I wondered why they were doing it that way, and if it might be easier to read if it was just:

    SELECT a, b, c INTO foo.x, foo.y, foo.z;
Now, these aren't quite the same, especially if foo has more than three fields. But even that could be got around:

    foo := (a, b, c);
But before I tried this out I decided to see how they performed. Here's what happened:
    andrew=# do $x$
    declare
      r abc;
    begin
      for i in 1 .. 10000000 loop
        select 'a', 'b', i into r.x, r.y, r.z;
      end loop;
    end;
    $x$;
    DO
    Time: 63731.434 ms
    andrew=# do $x$
    declare
      r abc;
    begin
      for i in 1 .. 10000000 loop
        r := ('a', 'b', i);
      end loop;
    end;
    $x$;
    DO
    Time: 18744.151 ms

That's a very big difference! Direct assignment takes less than 30% of the time that SELECT INTO takes.
I'm going to dig into why this happens, but meanwhile, I have quite a lot of low hanging performance fruit to pick as a result of this.
Tuesday, October 28, 2014
One more time: Replication is no substitute for good backups.
I don't know how many times I have had to try to drum this into clients' heads. Having an up-to-date replica won't protect you against certain kinds of failures. If you really want to protect your data, you need to use a proper backup solution - preferably a continuous backup solution. The ones I prefer to use are barman and wal-e. Both have strengths and weaknesses, but both are incredibly useful, and fairly well documented and simple to set up. If you're not using one of them, or something similar, your data is at risk.
(In case you haven't guessed, today is another of those days when I'm called in to help someone where the master and the replica are corrupted and the last trusted pg_dump backup is four days old and rolling back to it would cost a world of pain. I like these jobs. They can stretch your ingenuity, and no two are exactly alike. But I'd still rather be paid for something more productive.)
Monday, October 6, 2014
pg_repack pitfalls
pg_repack is a terrific tool for allowing you to reorganize a table without needing to hold long running strong locks on the table. That means that your normal inserts, updates and deletes can continue to run against the table while the reorganization is proceeding.
I have had clients who have run into problems with it, however. In particular, it is possible to get it wedged so that the table is inaccessible and nothing can proceed, unless you either kill the repack operation or kill what is blocking it. Here is a simple example of how to cause problems.
In session 1, do:
    pg_repack -e -t foo dbname
and in session 2 in psql do:
    begin;
    select pg_sleep(10);
    lock table foo;
    rollback;

The sleep gets us past the time when pg_repack is setting up, and happens while it is doing its CREATE TABLE ... AS SELECT .... When that CREATE TABLE statement finishes, both sessions will be wedged. Session 2 will be hung because it is unable to lock the table, since pg_repack's other session will hold a weak lock on the table. And nothing, including pg_repack, will be able to do anything with the table.
The solution is to make sure that nothing holds or even tries to obtain any strong long running locks on the table.
One useful thing is to use the check_postgres.pl monitor script to look for things like long running transactions and processes waiting for locks.
Or you can create a more customized test to look for this exact situation.
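For instance, a check along these lines will catch both waiting lock requests and long-open transactions (a sketch for the 9.x era, before pg_stat_activity grew wait_event columns; the five-minute threshold is arbitrary):

    select a.pid, a.datname, a.waiting, now() - a.xact_start as xact_age, a.query
    from pg_stat_activity a
    where a.waiting
       or a.xact_start < now() - interval '5 minutes'
    order by a.xact_start;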
Most importantly, you need to be aware that problems can occur, and to protect against them happening in the first place.
Friday, October 3, 2014
Towards a PostgreSQL Benchfarm
For years I have been wanting to set up a farm of machines, modelled after the buildfarm, that will run some benchmarks and let us see performance regressions. Today I'm publishing some progress on that front, namely a recipe for vagrant to set up an instance on AWS of the client I have been testing with. All this can be seen on the PostgreSQL Buildfarm Github page, in a repo called aws-vagrant-benchfarm-client. The README explains how to set it up. The only requirement is that you have vagrant installed and the vagrant-aws provider set up (and, of course, an Amazon AWS account to use).
Of course, we don't want to run members of the benchfarm on smallish AWS instances. But this gives me (and you, if you want to play along) something to work on, and the provisioning script documents all the setup steps rather than relying on complex instructions.
The provisioner installs a bleeding edge version of the buildfarm client's experimental Pgbench module, which currently only exists on the "benchfarm" topic branch. This module essentially runs Greg Smith's pgbench-tools suite, gets the results from the results database's "tests" table, and bundles it as a CSV for upload to the server.
Currently the server does nothing with it. This will just look like another buildfarm step. So the next thing to do is to get the server to start producing some pretty and useful graphs. Also, we need to decide what else we might want to capture.
Friday, September 26, 2014
Big O playing catchup.
I see that a new release of MySQL has been made, and they are touting the fact that they are allowing the omission of unaggregated items in a SELECT list from a GROUP BY clause, if they are functionally dependent on the items in the GROUP BY clause. This would happen, for example, where the items in the GROUP BY list form a primary key. It's a nice feature.
It's also a feature that PostgreSQL has had for three years.
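For example, this has been legal in Postgres since 9.1 (the tables are invented): name can be left out of the GROUP BY clause because it is functionally dependent on the primary key id.

    select c.id, c.name, count(o.id) as order_count
    from customers c
    left join orders o on o.customer_id = c.id
    group by c.id;              -- c.name is allowed in the select list anyway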
Importing JSON data
Say you have a file that consists of one JSON document per line. How can you import it into a table easily? This is a problem I was called on to help a colleague with yesterday. Using COPY is the obvious answer, but this turns out not to be quite so simple to do.
In text mode, COPY will be simply defeated by the presence of a backslash in the JSON. So, for example, any field that contains an embedded double quote mark, or an embedded newline, or anything else that needs escaping according to the JSON spec, will cause failure. And in text mode you have very little control over how it works - you can't, for example, specify a different ESCAPE character. So text mode simply won't work.
CSV mode is more flexible, but poses different problems. Here, instead of backslash causing a problem, QUOTE characters can cause a problem. First, JSON itself uses the default QUOTE character (double quote) to quote all string values. But if we use an alternative like single quote, then the presence of any single quote in the JSON leads us into difficulties. Second, JSON also uses the default DELIMITER (comma) extensively. So, clearly we need to use something else for the QUOTE and DELIMITER options. (By default, in CSV mode, the ESCAPE character is the same as the QUOTE character, so we don't need to worry about it separately.)
What we in fact want is to specify QUOTE and DELIMITER characters that can't appear at all in the JSON. Then the whole line will be seen as a single unquoted datum, which is exactly what we want. There is a small set of single-byte characters that happen to be illegal in JSON, so we can be sure that choosing them for these options should do the right thing with any legal JSON. These are the control characters. So the solution we came up with looks like this:
    copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02';

Of course, if the JSON has embedded newlines as punctuation, this won't work. So it's important that you configure whatever is producing the JSON not to insert newlines anywhere but at the end of each JSON document.
Now this solution is a bit of a hack. I wonder if there's a case for a COPY mode that simply treats each line as a single datum. I also wonder if we need some more specialized tools for importing JSON, possibly one or more Foreign Data Wrappers. Such things could handle, say, embedded newline punctuation.
Note too that files produced by PostgreSQL's COPY ... TO command will be properly quoted and escaped and won't need to be handled like this to read them back. Of course, if you want them to be readable by other non-CSV processors, then you might need to use similar options to those above to avoid unwanted quoting and escaping.
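Once loaded this way, the documents respond to the usual JSON operators directly; for example (the field names are obviously data-dependent):

    select jsonfield->>'id' as id, jsonfield->'payload' as payload
    from the_table
    limit 10;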
Monday, September 8, 2014
PLV8 and harmony scoping
The other day I complained on the PostgreSQL hackers list about a couple of aspects of Javascript that make it quite bothersome for large scale programming, namely the old style variable scoping rules and the very limited string literals, which can't stretch across lines (don't mention the awful backslash trick, please) and don't have any provision for interpolation. If you're used, as I am, to Perl, which has had lexically scoped variables for about 20 years and awesome string literal support for a good deal longer than that, these things are really quite painful.
The good news is that the forthcoming standard, ECMAScript 6, also known as "harmony", contains features to deal with both of these issues.
The latest versions of the V8 engine actually support harmony scoping rules, with one restriction, namely that it's only supported in contexts that are in "strict" mode. I believe that this restriction will go away in due course.
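To illustrate what this buys you inside a plv8 function, something like the following should work once harmony scoping is enabled (a sketch only; note the "use strict" required by the current restriction):

    create or replace function sum_below(n int) returns int language plv8 as $$
      "use strict";               // current V8 builds only allow let in strict mode
      let total = 0;
      for (let i = 0; i < n; i++) {
        total += i;               // i is scoped to the loop block, not hoisted
      }
      return total;
    $$;

    select sum_below(10);         -- 45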
Petr Jelinek dropped me a note the other day to tell me how to set V8 flags, and based on that I have developed a patch for PLV8 that allows for harmony scoping. It requires a new GUC setting that is applied during the module's initialization code.
This is available in my clone of the plv8 code, and you can see what it's doing at https://code.google.com/r/amdunstan-harmony-scoping/source/detail?r=8acdcdabcd0c2b9ad99f66a5258920db805efdc3#
I'll do a bit more testing and then talk to the other PLV8 developers about merging it in.
Things are less rosy on the string handling front, I'm afraid. I have no idea when V8 will get the "template strings" feature that will address the string literal deficiencies. As far as I can tell nobody is working on it.
Friday, August 22, 2014
Hoist on my own PLV8 petard
I mentioned to a client that I was going to write a function they had asked me to write using PLPerl, because it has good dynamic string handling capabilities that make it a better fit for this particular task than PLPgsql. "Oh," said the client, "we don't do much Perl. But ever since you hooked us up with PLV8 we do everything in JavaScript. We all know that, and use it every day." So I'm now writing it in Javascript. Proof once again that no good deed goes unpunished. It remains to be seen if it's going to be quite such a good fit as Perl would be, but at least it will give me a thorough refresher in JS.
Thursday, August 21, 2014
Looking for emacs mixed mode editing for functions
I've been looking for a way to edit mixed mode files in emacs, so I can edit a plperl function, and have the create statement part (and everything except the body) use standard SQL mode and the body use CPerl mode or whatever mode some magic marker tells it to use. I've taken a bit of a look at mmm-mode, but haven't been able to get it to work, and don't have more time to spend on it. If someone has a good recipe for this please let me know.
Tuesday, August 19, 2014
New PostgreSQL buildfarm client release 4.14 - bug fix for MSVC
There is a new release - version 4.14 - of the buildfarm client, now available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_14.tgz
The only change of note is a fix for a bug, present in releases 4.12 and 4.13, which affects only MSVC clients and prevents the client from completing a run. Clients on other platforms do not need to upgrade.
Monday, July 21, 2014
Code size
Someone was just talking about the size of some source files in PostgreSQL. The source code (.c, .h, .y and .l files) weighs in at a bit over 1 million lines of code. The documentation source has another roughly 300,000 lines. That's a large project, but by no means enormous by today's standards. The biggest source code file is pg_dump.c, at around 15,700 lines. The biggest documentation file is funcs.sgml, at around 17,600 lines. Both of these might well do with a bit of reorganization.
Wednesday, June 18, 2014
Buildfarm Client version 4.13 released
I have released version 4.13 of the PostgreSQL Buildfarm client.
This can be downloaded from http://www.pgbuildfarm.org/downloads/releases/build-farm-4_13.tgz
Changes in this release (from the git log):
- fcc182b Don't run TestCollateLinuxUTF8 on unsupported branches.
- 273af50 Don't run FileTextArrayFDW tests if not wanted.
- 9944a4a Don't remove the ccache directory on failure, unless configured to.
- 73e4187 Make ccache and vpath builds play nicely together.
- 9ff8c22 Work around path idiocy in msysGit.
- 0948ac7 revert naming change for log files
- ca68525 Exclude ecpg/tests from find_typedefs code.
If you are using ccache, please note that there are adjustments to the recommended use pattern. The sample config file no longer suggests that the ccache directory have the branch name at the end. It is now recommended that you use a single cache for all branches for a particular member. To do this, remove "/$branch" from the relevant line in your config file, if you have it, and remove those directories in the cache root. Your first run on each branch will rebuild some or all of the cache. My unified cache on crake is now at 615 MB, rather than the multiples of GB it had been previously.
It is recommended that this release be deployed by all users fairly quickly, because of the fix for the log file naming bug that was discarding some quite important log files.
Sunday, June 15, 2014
JSON in PostgreSQL 9.4 Video
Here is the video presentation of my pgcon talk on JSON in PostgreSQL 9.4. All the other pgcon presentations can be seen nearby, too.
Sunday, June 8, 2014
buildfarm vs vpath vs ccache
I think we've got more or less to the bottom of the ccache mystery I wrote about recently. It turns out that the problem of close to 100% of cache misses occurs only when the buildfarm is doing a vpath build, and then only because the buildfarm script sets up a build directory that is different each run ("pgsql.$$"). There is actually no need for this. The locking code makes sure that we can't collide with ourselves, so a hardcoded name would do just as well. This was simply an easy choice I made, I suspect without much thought, 10 years ago or so, before the buildfarm even supported vpath builds.
It also turns out there is no great point in keeping a separate cache per branch. That was a bit of a thinko on my part.
So, in my lab machine ("crake") I have made these changes: the build directory is hard coded with a ".build" suffix rather than using the PID. And it keeps a single cache, not one per branch. After making these changes, warming the new cache, and zeroing the stats, I did fresh builds on each branch. Here's what the stats looked like (cache compression is turned on):
    cache directory                     ccache
    cache hit (direct)                    5988
    cache hit (preprocessed)               132
    cache miss                               0
    called for link                       1007
    called for preprocessing               316
    compile failed                         185
    preprocessor error                      69
    bad compiler arguments                   6
    autoconf compile/link                  737
    no input file                           25
    files in cache                       12201
    cache size                    179.8 Mbytes
    max cache size                  1.0 Gbytes
So I will probably limit this cache to, say, 300MB or so. That will be a whole lot better than the gigabytes I was using previously.
As for the benefits: on HEAD "make -j 4" now runs in 13 seconds on crake, as opposed to 90 seconds or more previously.
If we have a unified cache, it makes sense to disable the removal of the cache in failure cases, which is what started me looking at all this. We will just need to be a bit vigilant about failures, as many years ago there was at least some suspicion of persistent failures due to ccache problems.
All this should be coming to a buildfarm release soon, after I have let this test setup run for a week or so.
Friday, June 6, 2014
ccache mysteries
ccache is a nifty little utility for speeding up compilations by caching results. It's something we have long had support for in the buildfarm.
Tom Lane pinged me a couple of days ago about why, when there's a build failure, we remove the ccache. The answer is that a long time ago (about 8 years), we had some failures that weren't completely explained but where suspicion arose that ccache was returning stale compilations when it shouldn't have been. I didn't have a smoking gun then, and I certainly don't have one now. Eight years ago we just used this rather elephant-gun approach and moved on.
But now I've started looking at the whole use of ccache. And the thing I find most surprising is that the hit rate is so low. Here, for example, are the stats from my FreeBSD animal nightjar, after a week since a failure:
    cache directory                       HEAD
    cache hit (direct)                    2540
    cache hit (preprocessed)                45
    cache miss                           32781
    called for link                       5571
    called for preprocessing              1532
    compile failed                         899
    preprocessor error                     248
    bad compiler arguments                  31
    autoconf compile/link                 3990
    no input file                          155
    files in cache                       25114
    cache size                    940.9 Mbytes
    max cache size                  1.0 Gbytes

So I'm a bit puzzled. Most changes that trigger a build leave most of the files intact. Surely we should have a higher hit rate than 7.3%. If that's the best we can do, it seems like there is little value in using ccache for the buildfarm. If it's not the best we can do, I need to find out what I need to change to get that best. But so far nothing stands out.
Tom also complained that we keep a separate cache per branch. The original theory was that we would be trading disk space for a higher hit rate, but that seems less tenable now, with some hindsight.
Friday, May 23, 2014
Pgcon
The jsquery stuff from Oleg and Teodor looks awesome. I will be exploring it very soon. Meanwhile, here are my conference slides: http://www.slideshare.net/amdunstan/94json where I cover mostly 9.4 Json features that aren't about indexing.
This has been one of the better pgcons. Well done Dan and the rest of the team.
Friday, May 2, 2014
pgbouncer enhancements
A couple of customers have recently asked for enhancements of pgbouncer, and I have provided them.
One that's been working for a while now puts the address and port of the actual client (i.e. the program that connects to the proxy) into the session's application_name setting. That means that if you want to see where the client is that's running some query that's gone rogue, it's no longer hidden from you by the fact that all connections appear to be coming from the pgbouncer host. You can see it appearing in places like pg_stat_activity.
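So a quick look at pg_stat_activity is now enough to trace a connection back to its real origin (the exact format of the value, something like address:port, is up to the patch):

    select pid, application_name, state, query
    from pg_stat_activity
    order by application_name;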
It only works when a client connects, so if the client itself sets application_name then the setting gets overridden. But few clients do this, and the original requester has found it useful. I've submitted this to the upstream repo, as can be seen at https://github.com/markokr/pgbouncer-dev/pull/23.
The other enhancement is the ability to include files in the config file. This actually involves a modification to the library pgbouncer uses as a git submodule, libusual. With this enhancement, a line that has "%include filename" causes the contents of that file to be included in place of the directive. Includes can be nested up to 10 deep. The pull request for this is at https://github.com/markokr/libusual/pull/7. This one, too, seems to be working happily at the client's site.
There is one more enhancement on the horizon, which involves adding in host based authentication control similar to that used by Postgres. That's a rather larger bit of work, but I hope to get to it in the next month or two.
Saturday, April 5, 2014
Version 4.12 of the PostgreSQL Buildfarm client released.
I have released version 4.12 of the buildfarm client.
In addition to numerous bug fixes, it has the following:
- the global option branches_to_build can now be 'HEADPLUSLATESTn' for any single digit n
- there is a new module TestCollateLinuxUTF8
- there is a new module TestDecoding which is enabled by default (but does nothing on MSVC systems, where we can't yet run these tests). This runs the new contrib test_decoding module, which can't run under "make installcheck".
- running "perl -cw" on the scripts will now give you failures for missing perl modules on almost every platform. The only exception should now be on older Msys systems.
- improvements in the sample config file to make it better organized and a better reflection of best practice.
- find_typedefs is now supported on OSX
In addition I recently enhanced the HOWTO at http://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto covering especially best current git practice.
Thanks to Tom Lane for suggestions and complaints which are behind a number of the improvements and fixes, and for some code for OSX find_typedefs.
The release is available at http://www.pgbuildfarm.org/downloads/releases/build-farm-4_12.tgz
Tuesday, April 1, 2014
Almost a crosstab using JSON
Most of the recent focus on the new json facilities has been on jsonb. This is understandable - it shows great promise, and a lot of people put in a lot of work on it. Besides myself, there were Teodor Sigaev, Oleg Bartunov, Alexander Korotkov and Peter Geoghegan as principal authors, so it was quite a large effort.
But there are some other new facilities for JSON that shouldn't be forgotten. I was reminded today of some of the things we could do. In particular, we can generate json in all sorts of interesting and useful ways.
A client was finding the crosstab functionality of the tablefunc module unwieldy. The trouble is you need to know in advance what the categories are. But we can construct something with the JSON functions that doesn't need any such knowledge. It's not quite a crosstab, but it will give you very similar functionality. Instead of rolling up the values into SQL fields named after the categories, we roll them up as values in a JSON object where the fields are named for the categories.
It's fairly simple.
Consider the following table:
    andrew=# select * from xx;
     a | b | c
    ---+---+----
     1 | b | x
     1 | c | y
     1 | d | z
     2 | d | dz
     2 | c | cz
     2 | b | bz
     2 | e | ez
    (7 rows)

To get a crosstab we do something like this:

    andrew=# select * from crosstab('select * from xx') a(r int, b text, c text, d text, e text);
     r | b  | c  | d  | e
    ---+----+----+----+----
     1 | x  | y  | z  |
     2 | dz | cz | bz | ez
    (2 rows)

To get the JSON object, instead we do this:

    andrew=# select a as r, json_object(cols, vals) as pivot
    from (select a, array_agg(b order by b) as cols, array_agg(c order by b) as vals from xx group by a) x;
     r |                      pivot
    ---+--------------------------------------------------
     1 | {"b" : "x", "c" : "y", "d" : "z"}
     2 | {"b" : "bz", "c" : "cz", "d" : "dz", "e" : "ez"}
    (2 rows)

Note how we didn't need to supply a column list reflecting the category values. The new json_object() function takes one or two arrays and outputs the corresponding JSON object.
It's also possible to do this with hstore today:
    andrew=# select a as r, hstore(cols, vals) as pivot
    from (select a, array_agg(b order by b) as cols, array_agg(c order by b) as vals from xx group by a) x;
     r |                    pivot
    ---+---------------------------------------------
     1 | "b"=>"x", "c"=>"y", "d"=>"z"
     2 | "b"=>"bz", "c"=>"cz", "d"=>"dz", "e"=>"ez"
    (2 rows)
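For anyone wanting to play along, a setup along these lines reproduces the examples above (crosstab() comes from the tablefunc extension, and the last example needs the hstore extension):

    create extension if not exists tablefunc;
    create extension if not exists hstore;

    create table xx (a int, b text, c text);
    insert into xx values
      (1, 'b', 'x'),  (1, 'c', 'y'),  (1, 'd', 'z'),
      (2, 'd', 'dz'), (2, 'c', 'cz'), (2, 'b', 'bz'), (2, 'e', 'ez');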
Monday, March 31, 2014
Try the personal touch, please
One of the laziest recruiter tools in use today is Linkedin contact requests. I use Linkedin as a consumer because it helps me keep in contact with various people in a way that would otherwise be difficult. But when I get contact requests from recruiters I almost always ignore them. If recruiters can't be bothered to dig out my email address and send me something personal, then I'm not interested in them. The other day was a new low. I got a completely generic contact request from a new recruiter at the most persistent recruiter of all - Google (they usually contact me about once a year). I once admired Google a lot, but this just added to the list of reasons I don't admire them so much now - one of the other reasons is their participation in a non-compete cartel among software companies. But in any case, I refuse to engage with cookie cutter recruiters. They tend to be lazy and incompetent.
Friday, March 28, 2014
Watch all the little new features
Sometimes it's quite difficult to keep on top of all the new features that we implement. One I had missed that I just found out about yesterday goes all the way back to 9.1, when we got advisory locks with transaction scope. These locks work just like the normal Postgres locks, in that they go away when the transaction in which they are obtained commits or rolls back, and they can't be released prior to that.
What makes this so cool is that it makes these locks a whole lot safer to use. I'll be looking to use them now in places where I haven't previously used advisory locks, because making sure you release the older style advisory locks in all the possible code paths can sometimes be quite tricky, and getting it wrong can have nasty consequences.
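Concretely, the transaction-scoped calls look like this; the lock simply evaporates at commit or rollback, so there is no unlock call to forget in an error path:

    begin;
    select pg_advisory_xact_lock(42);         -- blocks until the lock on key 42 is available
    -- ... do the work that must be serialized on key 42 ...
    commit;                                   -- the lock is released here automatically

    -- or, to avoid waiting:
    begin;
    select pg_try_advisory_xact_lock(42);     -- returns true or false immediately
    rollback;                                 -- any lock obtained goes away here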
Friday, March 7, 2014
New JSON standard passes the buck
There is a new JSON standard out. It contains a few interesting things. For example, here is one change noted in the changelog:
Changed the definition of "JSON text" so that it can be any JSON value, removing the constraint that it be an object or array.This is consistent with what Postgres does, but we actually had a bit of a debate back when we implemented JSON about whether or not it was the right thing to do. Maybe it wasn't then but is now :-)
And then there is this delightful couple of new paragraphs:
    An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.
    JSON parsing libraries have been observed to differ as to whether or not they make the ordering of object members visible to calling software. Implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences.

How is that for having it both ways? You might not be incorrect if you allow duplicate names or if your application relies on object member ordering, just not "interoperable." Once, RFCs laid down the law; now they just seem to codify existing, possibly "non-interoperable" practice.
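For what it's worth, Postgres sits on the permissive side of both points: any JSON value is accepted as JSON text, and jsonb resolves duplicate keys by keeping the last one, while the plain json type, being just validated text, preserves its input as given:

    select '42'::json, '"scalar"'::json;    -- both are valid JSON text values
    select '{"a": 1, "a": 2}'::json;        -- stored verbatim, duplicates and all
    select '{"a": 1, "a": 2}'::jsonb;       -- {"a": 2}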
Tuesday, January 28, 2014
New Json functions
I've been pretty busy, and neglected this blog for a while, but I'm very happy that today I have been able to commit a parcel of new JSON functions that will be in the next PostgreSQL release. These include functions to create very complex JSON tree structures, and a function to aggregate pairs of values as JSON objects.
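The aggregate in question is json_object_agg(), which rolls key/value pairs up into a single object. For example, using a catalog view so the query runs anywhere:

    select json_object_agg(name, setting) as logging_config
    from pg_settings
    where name like 'log%';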
You can see some documentation in the development version of the docs.
Thanks to Marko Tiikkaja for his timely review.
Stay tuned, there's more JSON goodness coming!
Monday, January 13, 2014
SSPI auth on Windows
Does anyone have experience setting up SSPI authentication for PostgreSQL on a Windows machine that's not part of a domain? If so, I'd like to pick your brains a bit, so please let me know.