Andrew Dunstan's PostgreSQL and Technical blog: May 2013

Saturday, May 25, 2013

Redis FDW Singleton Key tables

I recently mentioned the possibility of tables being drawn from a single Redis object, so you would get one row per key/value pair in the named hash, or one row per element in the named scalar, set, list or zset. This has now been committed, for use with Release 9.2 and later. There are examples in the regression test files. This is going to be particularly important when we get to writable tables, which is the next order of business.

For those of you who were in my talk yesterday, the breakage I mentioned has now been fixed.

Friday, May 24, 2013

Redis talk slides

Here are the slides from my talk on PostgreSQL and Redis.

Thursday, May 23, 2013

Blackhole FDW

My Blackhole FDW talk seemed to go well. The line about no unintentional data loss got the best laugh. Here are the slides.

Besides being a bit of fun, this did have a serious purpose - creating a skeleton for building an FDW, including the writable API. The code has the contents of the docs on writing an FDW as comments in the appropriate functions, to help a new FDW writer.

The code is on bitbucket.

Buildfarm download location

It was just pointed out to me that the download link on the buidfarm server front page wasn't updated when I fixed the other links after switching to publishing them on the buildfarm server itself. That's been fixed now. The only valid link for downloading the client is http://www.pgbuildfarm.org/downloads/. Sorry for any confusion.

Developer meeting went well

There seems to be a consensus, which I share, that the annual PostgreSQL Developers Meeting went much better this year that in the previous couple of years.

One item of note: the commit fest managers are going to be much more vigilant about making sure that if you have signed up for a review you will work on it right away, and about removing reviewers who are not producing reviews. So you will be able to have much more confidence that if someone is signed up as a reviewer for a patch they will actually be doing the work.

After the meeting and the obligatory visit to the Royal Oak, a number of us went out and had a pleasant Indian meal, and then I came back to the hotel, fixed a customer problem, and wrote up some slides for my proposed lightning talk. More on this later.

Now, on to the conference proper!

Thursday, May 16, 2013

Our commitment to quality is second to none.

If anyone doubts the total commitment of the Postgres project to quality and correctness, let them be reassured by this completely correct but decidedly pedantic commit. I dips me lid to Tom Lane and Thom Brown.

Wednesday, May 15, 2013

Time down the drain

Annoyingly large amount of non-productive time today. First I went and renovated the Makefiles for all my Postgres extensions, having originally copied a piece of bad logic from somewhere into almost all of them. This was exposed by a change in the HEAD branch of Postgres that's about to be reverted, but it seemed best to remove the bogus logic anyway.

Then, for no apparent reason, Xfce forgot to load xfwm4 on one of my machines, and suddenly I had lost all my window decorations - no buttons to close or minimize windows, no icons in the window button bar, etc. That took me an hour or more to diagnose and fix.

All this is delaying my slide preparation for pgcon :-(

PostgreSQL tables for single Redis objects

After playing a bit with the Redis FDW and the Redis command extension, Josh Berkus was slightly dissatisfied. He was looking for a way to map the possibly huge set of values in a single Redis object to a set of rows on a PostgresSQL table, and the Redis FDW currently maps each object to a single row - as an array if it's appropriate. Of course, we could call unnest () on the array, but it seems roundabout to construct an array only to have to unpack it immediately. These aren't terribly cheap operations.

I've been thinking a bit about his complaint, and I think this will be doable. What we'll need is a new table option that specifies the key and designates the table as one sourced from a single key rather than a range of keys. Say we call this option singleton_key. Then we might do something like:

CREATE FOREIGN TABLE hugeset (value text)
SERVER localredis
OPTIONS (tabletype 'list', singleton_key 'myhugelist');

This option would be incompatible with the tablekeyset and tablekeyprefix options. If given, the key won't be looked up at all. We would simply use the given key and return the corresponding list of values. That would make selecting from such a table faster - possibly lots faster. For scalars, sets and lists, the table would have one column. In the case of a scalar there would only be one row. For zsets, it would have two columns - one for the value and one for the score. Finally, for hashes it would have two, one for the property and one for the value.

This looks like a useful possible enhancement.

Tuesday, May 14, 2013

Redis FDW gets a slightly new home

To avoid a few administrative difficulties imposed by github, Dave Page and I have moved the repo for the Redis FDW to https://github.com/pg-redis-fdw/redis_fdw. If you have a clone of the old repo, it should be sufficient to change the remote setting in the .git/config file, replacing "dpage" with "pg-redis-fdw". The commits are identical up to the point where we removed everything from the old repo.

Sorry for any inconvenience.

Sometimes you just get hit by a huge application load

I love emergency calls. They give me a chance to get into something new, very quickly, and get them fixed. A few days ago a company called SpaceInch made a new app called Say The Same Thing, which became an instant hit, possibly due to the quite awesome video by OK Go, which you should watch even if you don't look at anything else in this post.

Since the launch they have been hit by a huge flood of users and last night things started to drag, and they asked PostgreSQL Experts for some emergency help. The app is backed by a PostgreSQL database running on Heroku. During a phone call of two hours, I was able to get in, examine the database, and devise some steps that we hope will have the effect of making things run smoother.

The solution is going to involve some partitioning of one or two tables. This isn't being done so much for the benefit of constraint exclusion as because it lets them drop large amounts of data that is no longer needed, cheaply. Previously they had been cleaning one of these tables by deleting rows, but abandoned that as it adversely affected performance. But the consequence of that was that the table just grew and grew to several million rows in a few days. Dropping child tables that contain data that is of no further interest will work well to tame the data size without all the overhead of row deletion.

Troubles sometimes don't come singly. I actually had to deal with two emergencies last night, which is quite unusual - the other was also for a client running on Amazon. For good or ill, we are seeing more and more use of Postgres instances running in the cloud - whether managed by the client or on a managed service such as Heroku.

Friday, May 10, 2013

Only GROUP BY what you really need to

The old rule used to be that if you have a query that contained aggregated columns, you have to GROUP BY every other column selected. These days you are allowed to omit columns that are provably functionally dependent on one or more of the other grouped by columns. In practice, that means you can omit any columns that are not in the table's primary key if all the primary key columns are grouped by.

Sometimes people, often including me, just do this fairly mindlessly, but sometimes it bites you. Consider this simple query:

SELECT a.id as a_id, a.properties, sum(b.amount) as expenses
FROM people a
   JOIN expenses b on a.id = b.person_id
GROUP BY a.id, a.properties

We don't really want the expenses grouped by the person's properties. We just put that in because the parser complains if we don't. And if people turns out to be a view which joins a couple of tables, we probably can't leave it out either. This can increase the amount of sorting that the GROUP BY requires, which can sometime have dramatic effects on performance. But even worse, there are cases where this can actually cause the query to be unrunnable. One such case is if properties is a JSON column.

That might surprise you. It has certainly surprised a couple of people I know. The reason is that there is no equality operator for JSON.

So, how can we write this so we only GROUP BY what we really need to? One way is to pick up the extra column later in the query, after we have done the grouping, like this:

WITH exp as 
(
  SELECT a.id as a_id, sum(b.amount) as expenses
  FROM people a
     JOIN expenses b on a.id = b.person_id
  GROUP BY a.id
)
SELECT exp.*, p.properties
FROM exp 
   JOIN people p ON p.id = exp.a_id

This might look a bit silly. We're adding in an extra join to people that we shouldn't need. But it turns out in my experience that this actually often works pretty well, and what you pay by way of the extra join is often paid for by the fact that you're simplifying the GROUP BY, and that it is processing smaller rows, uncluttered by the extra columns you want carried through. And, in the case of a JSON column, it has the virtue that it will work.

I often get called in to look at queries that run slowly and have huge GROUP BY clauses (I have seen them with 50 or so columns). I almost always start by reducing the GROUP BY to the smallest set possible, and this almost always results in a performance gain.

Redis commands from Postgres

A while ago a client wanted a way to call Redis commands from Postgres, for example to push values to Redis, or perform other Redis housekeeping that can't be done by use of the Redis FDW. I put together a quick Postgres interface using the official hiredis client library, and I have been polishing it up a bit in preparation for my talk at PgCon, and the package is now available for public use.

The basic interface contains functions to provide persistent connection handles, to disconnect, and to call the library functions redisCommand() and redisCommandArgv(). Yesterday, I added functions to push a record to a Redis “table”, and to drop a Redis “table”. I am planning to add a function to push a whole Postgres table or view, but for now this can be achieved by pushing the records - tests yesterday on very modest hardware managed to push 1 million rows in 130 seconds and drop the same table in 45 seconds. Of course, in Redis dropping a table means in effect dropping its component objects, so it's still doing several million operations here.

Of course, Redis doesn't really have tables in the PostgreSQL sense. It has a single global namespace which contains keys, which are strings, and values, which are various sorts of things (strings, integers, lists, hashes, sets, ordered sets). Hence my use of inverted commas above. For this purpose, a Redis table consists of a set of objects whose keys have a common prefix, or whose keys are the elements of a Redis set object. These uses are critical to the effective use of the Redis Foreign Data Wrapper, which I will also be expanding upon in my talk.

I also added some regression tests which demonstrate the abilities of the interface package. I'll be working on improving the documentation shortly.

Monday, May 6, 2013

Reminder: you can't set high shared buffers on 32 bit systems.

Today I got a brutal reminder that using lots of memory on 32 bit systems is dangerous. One of my clients had followed all the usual advice and set shared_buffers at about 3Gb on a 32 bit 16Gb system. They started getting out of memory errors for no apparent reason. It was quite puzzling, as their system wasn't at all stressed. Then I remembered this excellent blog post by Robert Haas, who analyzes the situation perfectly. The client lowered shared buffers to 2Gb and the problem stopped.

Bottom line: no matter how much memory you have, and no matter whether or not you have a PAE kernel, there is a hard 4Gb limit on the address space of a single process on 32 bit systems, and thus the practical limit of shared buffers on ALL 32 bit systems is around 2Gb.

Utilities for building JSON

As we get more experience in using JSON in Postgres we get a better idea of what we need to make its use simple and effective. This work is the result of some such experience.

Today I'm releasing a new extension, json_build, which provides a couple of utility functions for building JSON, build_json_array() and build_json_object(). Both of these functions can take any number of arguments of any type, and they will build a JSON value of the appropriate kind. Note that JSON arrays, unlike Postgres arrays, can be heterogenous, and the values do not all need to be of the same type. When building an object, the arguments are treated pairwise, so it's an error if build_json_object() gets an odd number of arguments. The first item in each pair must be a non-null scalar, (i.e. not an array or record) and must not be a JSON value.

Any JSON value is simply passed through, either as an object field value or as an array element, so, since these functions return JSON, you can nest calls to them and thus build up your JSON values in a fairly natural way.

Examples:

SELECT build_json_object( 
       'a', build_json_object('b',false,'c',99), 
       'd', build_json_object('e',array[9,8,7]::int[],
           'f', (select row_to_json(r) from ( select relkind, oid::regclass as name 
                                              from pg_class where relname = 'pg_class') r)));
                                    build_json_object                                        
-------------------------------------------------------------------------------------------------
 {"a" : {"b" : false, "c" : 99}, "d" : {"e" : [9,8,7], "f" : {"relkind":"r","name":"pg_class"}}}
(1 row)

 SELECT build_json_array('a',1,'b',1.2,'c',true,'d',null,'e',json '{"x": 3, "y": [1,2,3]}');
                           build_json_array                            
-----------------------------------------------------------------------
 ["a", 1, "b", 1.2, "c", true, "d", null, "e", {"x": 3, "y": [1,2,3]}]
(1 row)

Enjoy!