Friday, September 26, 2014

Big O playing catchup.

I see that a new release of MySQL has been made, and they are touting the fact that they are allowing the omission of unaggregated items in a SELECT list from a GROUP BY clause, if they are functionally dependent on the items in the GROUP BY clause. This would happen, for example, where the items in the GROUP BY list form a primary key. It's a nice feature.

It's also a feature that PostgreSQL has had for three years.


  1. MySQL permits a query to omit fields from GROUP BY even if they are not functionally dependent on a field included in the GROUP BY. The results are not deterministic unless you have an ORDER BY, much like PostgreSQL's DISTINCT ON. To prevent that, you need an sql_mode that prevents ONLY_FULL_GROUP_BY, such as STRICT mode.

    It looks like they've done their functional dependency detection better than what's currently supported in PostgreSQL as well.


  2. You are being economical with the truth. The 1992 SQL standard required that if a query included a GROUP BY clause then every column in the SELECT clause had to be either specified in the GROUP BY or appeared as part of an aggregate. This was changed in the 1999 and 2003 standards to allow columns in the SELECT clause to be omitted from the GROUP BY if they were functionally dependent on any columns that were included in the GROUP BY. So if a table had columns called ID and NAME with ID being the primary key then NAME could be included in the SELECT clause without having to be included in the GROUP BY clause because it was functionally dependent on the ID column.

    MySQL has always followed the latest SQL standard for GROUP BY, but PostgreSQL has stuck with the 1992 standard until fairly recently. The only change that is being added to MySQL is that they are implementing a better definition of "functionally dependent" to avoid those edge cases where a query could return confusing results.

    In that respect PostgreSQL is still lagging behind.

    1. You should be careful about throwing around terms like "economical with the truth." There is no intention to mislead here, and if I have made a mistake I will be happy to correct it. My understanding has been that until now MySQL in its default settings has allowed you to omit any unaggregated item from the GROUP BY clause, whether or not it is functionally dependent on the remaining items, and if it is not a functional dependency essentially an arbitrary value is returned. That seems to be what's happening here:

      mysql> create table abc (a int, b int, c int);
      Query OK, 0 rows affected (0.05 sec)

      mysql> insert into abc values (1,1,3),(1,2,4);
      Query OK, 2 rows affected (0.03 sec)
      Records: 2 Duplicates: 0 Warnings: 0

      mysql> select a,b,sum(c) from abc group by a;
      | a | b | sum(c) |
      | 1 | 1 | 7 |
      1 row in set (0.01 sec)

    2. My argument is that you are saying that MySQL have only just got round to implementing the SQL1999 standard by not forcing the GROUP BY clause to contain every column that is identified in the SELCT clause while PostgreSQL has had this for several years. MySQL has allows this for decades (far longer than PostgreSQL), but the idea of "functional dependencies" was open to interpretation and therefore could lead to confusing results. All that MySQL have done is to tighten up the implementation of "functional dependencies" in order to eliminate the confusion. Is the implementation in PostgreSQL as good?

  3. No, the idea that "functional dependencies" is something open to interpretation in the way you suggest is nonsense. This is a well understood expression. And in fact it's something that historically MySQL has not given a damn about. In the example I showed, column b is clearly not functionally dependent on column a, by any possible definition, and yet MySQL happily allowed me to omit it from the GROUP BY clause. Saying that this is in any sense an implementation of the SQL1999 standard is just not true.

    1. Yes, it is true. The SQL1999 standard removed the need for all columns in the SELECT clause to be also included in the GROUP BY clause. While it was possible to construct queries which produced unexpected results it was also possible to construct perfectly sensible queries with predictable results. The change that MySQL has made is to validate the functional dependencies so as to avoid any misunderstandings. Can Postgres do that?

      Postgres did not implement the SQL1999 standard until fairly recently. Before then it always insisted that EVERY column in the SELECT clause was also included in the GROUP BY.

    2. Now who's being economical with the truth? The functional dependency requirement is not a trivial optional detail. As a statement of fact, your claim that "The SQL1999 standard removed the need for all columns in the SELECT clause to be also included in the GROUP BY clause." without mentioning the functional dependency requirement is at best extremely misleading. And any claim that in ignoring that requirement MySQL was still nevertheless compliant with the standard is just nonsense.

  4. The simple fact is that the 1992 SQL standard contained this statement concerning GROUP BY:
    "If T is a grouped table, then each ; in each ; that references a column of T shall reference a grouping column or be specified within a ." which means that every column in the SELECT clause must also appear in the GROUP BY clause.

    This requirement was removed in the 1999 standard and was implemented by MySQL over a decade ago, yet in Postgres only recently. The only fault in MySQL's implementation is that it allowed badly written queries which could produce ambiguous results. The recent announcement from Oracle simply removes the ambiguities.

    Your entire article tries to give the impression that Postgres implemented the 1999 standard regarding GROUP BY several years before MySQL when in fact the truth is completely the opposite.

    1. No, The simple fact is that your statement that this requirement was removed in SQL 99 is wrong. I have pointed out above how and why it's wrong, so at this stage I can only assume you're not interested in the actual truth. The requirement was replaced with a more liberal one, not removed. MySQL in its default configuration has not been compliant with those more liberal requirements until now, as the example I gave illustrated.

      Here is the actual requirement from SQL 2003, the terms of which are pretty much the same as those in SQL 99:

      If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list>, each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.

      Now MySQL has leapfrogged PostgtreSQL some, in that it now detects some functional dependencies that PostgreSQL is not yet able to. That's what Craig pointed out in his comment and I have not disagreed with him. But that's quite different from claiming that MySQL has been fully standards compliant in this respect all along. It hasn't.

      This is the last comment I'll allow on this posting. I'm simply not interested in rehashing the same argument over and over.