Sunday, March 24, 2013

Parallel pg_dump committed

Today I committed Joachim Wieland's patch to add a parallel option to pg_dump when producing directory-format output. Barring some unexpected and evil occurrence, this should be in the forthcoming 9.3 release. This is the culmination of several years of work. I started talking about this shortly after I did the parallel pg_restore work for release 8.4.
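
For anyone who wants to try it, here is a minimal Python sketch that only builds the command line for such a dump. It assumes the option spellings from the pg_dump documentation (`--format=directory`, `--jobs`, `--file`). The helper name `parallel_dump_command` and the database and directory names are made up for illustration.

```python
import shlex


def parallel_dump_command(dbname: str, outdir: str, jobs: int) -> list[str]:
    """Build a pg_dump argument list for a parallel, directory-format dump.

    --jobs is only accepted together with the directory format, because
    that format writes a separate file per table, so several workers can
    write at the same time.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return [
        "pg_dump",
        "--format=directory",  # same as -Fd
        f"--jobs={jobs}",      # same as -j N: number of parallel workers
        f"--file={outdir}",    # output directory, created by pg_dump
        dbname,
    ]


cmd = parallel_dump_command("mydb", "mydb.dump", 4)
print(shlex.join(cmd))  # pg_dump --format=directory --jobs=4 --file=mydb.dump mydb
```

Passing `cmd` to `subprocess.run` would actually run the dump. On the restore side, pg_restore has accepted a matching `-j`/`--jobs` option since 8.4.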

This was a lot more difficult and complex than parallel pg_restore. Three pieces were required to make it work: an archive format that could sensibly be written to in parallel, a way of ensuring that a dump done in parallel was consistent, and the actual code that uses those features to dump in parallel. The first piece was the directory archive format, which was introduced in release 9.1. The second was snapshot cloning, which was introduced in release 9.2. This final piece builds on those two earlier pieces of work to make parallel dumps a reality.
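
Snapshot cloning is what keeps the parallel workers consistent. One connection exports its snapshot and the others import it, so every connection sees exactly the same data. The Python sketch below only produces the SQL each side would send. `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT` are real PostgreSQL 9.2+ features, but the helper names, the ID check and the sample snapshot ID are hypothetical. This is not how pg_dump's C code is organized.

```python
import re

# Leader connection: the exported ID is only valid while this transaction
# stays open, so the leader must keep it open until every worker has
# imported the snapshot.
LEADER_SQL = [
    "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY",
    "SELECT pg_export_snapshot()",  # returns the snapshot ID as text
]

# Hypothetical sanity check: exported snapshot IDs are groups of hex
# digits separated by dashes, so anything else is rejected before it is
# interpolated into SQL.
_SNAPSHOT_ID = re.compile(r"[0-9A-Fa-f]+(-[0-9A-Fa-f]+)+")


def worker_sql(snapshot_id: str) -> list[str]:
    """Return the SQL a worker connection runs to adopt the leader's snapshot.

    SET TRANSACTION SNAPSHOT must come before any query in the
    transaction, and the transaction must be REPEATABLE READ or
    SERIALIZABLE.
    """
    if not _SNAPSHOT_ID.fullmatch(snapshot_id):
        raise ValueError(f"unexpected snapshot id: {snapshot_id!r}")
    return [
        "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY",
        f"SET TRANSACTION SNAPSHOT '{snapshot_id}'",
        # From here on, the worker's queries see the same data as the leader.
    ]


for stmt in worker_sql("000003A1-1"):  # made-up ID for illustration
    print(stmt + ";")
```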

This development throws away some of my earlier work. We now have a common parallel processing infrastructure that is used by both pg_dump and pg_restore. The new code pre-creates the required number of workers (threads on Windows, processes on Unix) and farms out multiple pieces of work to each of them. This is quite different from the way pg_restore used to work, which was to create workers on demand and allocate exactly one piece of work to each.
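
To make the difference between the two scheduling styles concrete, here is a small Python sketch in which threads stand in for the real worker processes or threads. It illustrates the idea under stated assumptions and is not the actual C infrastructure. The function names and the "table" work items are hypothetical, and real workers would be dumping or restoring data rather than doing nothing.

```python
import queue
import threading
from collections import Counter


def run_with_pool(items, n_workers, work):
    """New style: pre-create n_workers threads; each keeps taking items until none are left.

    Returns a Counter mapping worker index -> number of items it handled.
    """
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)
    handled = Counter()
    lock = threading.Lock()

    def worker(idx):
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is finished
            work(item)
            with lock:
                handled[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


def run_on_demand(items, max_workers, work):
    """Old style: start a fresh thread for every item, at most max_workers at once.

    Returns the number of threads created (always one per item).
    """
    slots = threading.BoundedSemaphore(max_workers)

    def one(item):
        try:
            work(item)
        finally:
            slots.release()  # free the slot for the next item

    threads = []
    for item in items:
        slots.acquire()  # block until fewer than max_workers are running
        t = threading.Thread(target=one, args=(item,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return len(threads)


tables = [f"table_{i}" for i in range(20)]  # hypothetical work items
per_worker = run_with_pool(tables, 4, lambda table: None)
print(sum(per_worker.values()), "items handled by", len(per_worker), "workers")
print(run_on_demand(tables, 4, lambda table: None), "workers created on demand")
```

With a pool, the setup cost of a worker is paid once per worker rather than once per item. For pg_dump, that setup plausibly includes opening a connection and importing the shared snapshot, which is one reason this design seems a natural fit.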

Congratulations to Joachim on this milestone.

3 comments:

  1. ...and there was much rejoicing!

  2. Planning on upgrading soon from 8.4 to 9.3. Can the newer 9.3 pg_dump binary, using parallel jobs, safely create a directory-format backup of my 8.4 instance, which I would then restore with pg_restore into the latest 9.3 version?
