Andrew Dunstan's PostgreSQL and Technical blog: Big O playing catchup.

Friday, September 26, 2014

Big O playing catchup.

I see that a new release of MySQL has been made, and they are touting the fact that they are allowing the omission of unaggregated items in a SELECT list from a GROUP BY clause, if they are functionally dependent on the items in the GROUP BY clause. This would happen, for example, where the items in the GROUP BY list form a primary key. It's a nice feature.

It's also a feature that PostgreSQL has had for three years.
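For anyone who wants to see the shape of the query in question, here is a minimal sketch using Python's built-in sqlite3 module. The `people` and `visits` tables and their contents are hypothetical, made up purely for illustration. One caveat: SQLite accepts bare columns in a GROUP BY query whether or not they are functionally dependent, so this only demonstrates why the primary-key case is unambiguous, not how any server enforces the rule. The first query is the form PostgreSQL has accepted since 9.1, because `p.id` is the primary key of `people`.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE visits (person_id INTEGER REFERENCES people(id), visited_on TEXT);
    INSERT INTO people VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Oslo'), (3, 'Cy', 'Rome');
    INSERT INTO visits VALUES (1, '2014-09-01'), (1, '2014-09-02'), (3, '2014-09-03');
""")

# name is not aggregated and not listed in GROUP BY, but it is functionally
# dependent on p.id (the primary key), so each group has exactly one name.
by_key = conn.execute("""
    SELECT p.id, p.name, COUNT(v.person_id) AS visit_count
    FROM people p
    LEFT JOIN visits v ON v.person_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# city does not determine name: a single city group can hold several names,
# so "SELECT city, name ... GROUP BY city" would have no single right answer.
names_per_city = conn.execute("""
    SELECT city, COUNT(DISTINCT name)
    FROM people
    GROUP BY city
    ORDER BY city
""").fetchall()

print(by_key)
print(names_per_city)
```

Any city with more than one distinct name in the second result is a group where an unaggregated `name` would be ambiguous, which is the case a functional dependency check is meant to reject.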

10 comments:

Craig Ringer, September 28, 2014 at 8:08 AM
MySQL permits a query to omit fields from GROUP BY even if they are not functionally dependent on a field included in the GROUP BY. The results are not deterministic unless you have an ORDER BY, much like PostgreSQL's DISTINCT ON. To prevent that, you need an sql_mode that includes ONLY_FULL_GROUP_BY.

It looks like they've done their functional dependency detection better than what's currently supported in PostgreSQL as well.

See http://rpbouman.blogspot.com.au/2014/09/mysql-575-group-by-respects-functional.html
Tony Marston, October 5, 2014 at 6:25 AM
You are being economical with the truth. The 1992 SQL standard required that if a query included a GROUP BY clause then every column in the SELECT clause had to either be specified in the GROUP BY or appear as part of an aggregate. This was changed in the 1999 and 2003 standards to allow columns in the SELECT clause to be omitted from the GROUP BY if they were functionally dependent on any columns that were included in the GROUP BY. So if a table had columns called ID and NAME, with ID being the primary key, then NAME could be included in the SELECT clause without having to be included in the GROUP BY clause, because it was functionally dependent on the ID column.

MySQL has always followed the latest SQL standard for GROUP BY, but PostgreSQL has stuck with the 1992 standard until fairly recently. The only change that is being added to MySQL is that they are implementing a better definition of "functionally dependent" to avoid those edge cases where a query could return confusing results.

In that respect PostgreSQL is still lagging behind.
Andrew Dunstan, October 11, 2014 at 9:18 AM
No, the idea that "functional dependencies" is something open to interpretation in the way you suggest is nonsense. This is a well understood expression. And in fact it's something that historically MySQL has not given a damn about. In the example I showed, column b is clearly not functionally dependent on column a, by any possible definition, and yet MySQL happily allowed me to omit it from the GROUP BY clause. Saying that this is in any sense an implementation of the SQL1999 standard is just not true.
Tony Marston, October 19, 2014 at 6:37 AM
The simple fact is that the 1992 SQL standard contained this statement concerning GROUP BY:
"If T is a grouped table, then each <column reference> in each <value expression> that references a column of T shall reference a grouping column or be specified within a <set function specification>." which means that every column in the SELECT clause must also appear in the GROUP BY clause.

This requirement was removed in the 1999 standard, and the change was implemented by MySQL over a decade ago, yet by Postgres only recently. The only fault in MySQL's implementation is that it allowed badly written queries which could produce ambiguous results. The recent announcement from Oracle simply removes the ambiguities.

Your entire article tries to give the impression that Postgres implemented the 1999 standard regarding GROUP BY several years before MySQL when in fact the truth is completely the opposite.