I have a table that looks like this caller 'makerar'
cname | wmname | avg
--------+-------------+------------------------
canada | zoro | 2.0000000000000000
spain | luffy | 1.00000000000000000000
spain | usopp | 5.0000000000000000
And I want to select the maximum avg for each cname.
SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;
but I will get an error,
ERROR: column "makerar.wmname" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;
so i do this
SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname, wmname;
however this will not give the intented results, and the incorrect output below is shown.
cname | wmname | max
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | luffy | 1.00000000000000000000
spain | usopp | 5.0000000000000000
Actual Results should be
cname | wmname | max
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000
How can I go about fixing this issue?
Note: This table is a VIEW created from a previous operation.
wmname="usopp"
expected and not for example wmname="luffy"
?
Yes, this is a common aggregation problem. Before SQL3 (1999), the selected fields must appear in the GROUP BY
clause[*].
To workaround this issue, you must calculate the aggregate in a sub-query and then join it with itself to get the additional columns you'd need to show:
SELECT m.cname, m.wmname, t.mx
FROM (
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname
) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;
cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000
But you may also use window functions, which looks simpler:
SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;
The only thing with this method is that it will show all records (window functions do not group). But it will show the correct (i.e. maxed at cname
level) MAX
for the country in each row, so it's up to you:
cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | luffy | 5.0000000000000000
spain | usopp | 5.0000000000000000
The solution, arguably less elegant, to show the only (cname, wmname)
tuples matching the max value, is:
SELECT DISTINCT /* distinct here matters, because maybe there are various tuples for the same max value */
m.cname, m.wmname, t.avg AS mx
FROM (
SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY avg DESC) AS rn
FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;
cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000
[*]: Interestingly enough, even though the spec sort of allows to select non-grouped fields, major engines seem to not really like it. Oracle and SQLServer just don't allow this at all. Mysql used to allow it by default, but now since 5.7 the administrator needs to enable this option (ONLY_FULL_GROUP_BY
) manually in the server configuration for this feature to be supported...
In Postgres, you can also use the special DISTINCT ON (expression)
syntax:
SELECT DISTINCT ON (cname)
cname, wmname, avg
FROM
makerar
ORDER BY
cname, avg DESC ;
BY cname
?
The problem with specifying non-grouped and non-aggregate fields in group by
selects is that engine has no way of knowing which record's field it should return in this case. Is it first? Is it last? There is usually no record that naturally corresponds to aggregated result (min
and max
are exceptions).
However, there is a workaround: make the required field aggregated as well. In postgres, this should work:
SELECT cname, (array_agg(wmname ORDER BY avg DESC))[1], MAX(avg)
FROM makerar GROUP BY cname;
Note that this creates an array of all wnames, ordered by avg, and returns the first element (arrays in postgres are 1-based).
array_agg
I didn't know you can sort right in the params
array_agg
is what I was looking for to return all the items that the GROUP BY
matches
For me, it is not about a "common aggregation problem", but just about an incorrect SQL query. The single correct answer for "select the maximum avg for each cname..." is
SELECT cname, MAX(avg) FROM makerar GROUP BY cname;
The result will be:
cname | MAX(avg)
--------+---------------------
canada | 2.0000000000000000
spain | 5.0000000000000000
This result in general answers the question "What is the best result for each group?". We see that the best result for spain is 5 and for canada the best result is 2. It is true, and there is no error. If we need to display wmname also, we have to answer the question: "What is the RULE to choose wmname from resulting set?" Let's change the input data a bit to clarify the mistake:
cname | wmname | avg
--------+--------+-----------------------
spain | zoro | 1.0000000000000000
spain | luffy | 5.0000000000000000
spain | usopp | 5.0000000000000000
Which result do you expect on runnig this query: SELECT cname, wmname, MAX(avg) FROM makerar GROUP BY cname;
? Should it be spain+luffy
or spain+usopp
? Why? It is not determined in the query how to choose "better" wmname if several are suitable, so the result is also not determined. That's why SQL interpreter returns an error - the query is not correct.
In the other word, there is no correct answer to the question "Who is the best in spain
group?". Luffy is not better than usopp, because usopp has the same "score".
SELECT cname, id, MAX(avg) FROM makerar GROUP BY cname;
, which did give this misleading error.
SELECT t1.cname, t1.wmname, t2.max
FROM makerar t1 JOIN (
SELECT cname, MAX(avg) max
FROM makerar
GROUP BY cname ) t2
ON t1.cname = t2.cname AND t1.avg = t2.max;
Using rank()
window function:
SELECT cname, wmname, avg
FROM (
SELECT cname, wmname, avg, rank()
OVER (PARTITION BY cname ORDER BY avg DESC)
FROM makerar) t
WHERE rank = 1;
Note
Either one will preserve multiple max values per group. If you want only single record per group even if there is more than one record with avg equal to max you should check @ypercube's answer.
This seems to work as well
SELECT *
FROM makerar m1
WHERE m1.avg = (SELECT MAX(avg)
FROM makerar m2
WHERE m1.cname = m2.cname
)
I recently run into this problem, when trying to count using case when
, and found that changing the order of the which
and count
statements fixes the problem:
SELECT date(dateday) as pick_day,
COUNT(CASE WHEN (apples = 'TRUE' OR oranges 'TRUE') THEN fruit END) AS fruit_counter
FROM pickings
GROUP BY 1
Instead of using - in the latter, where I got errors that apples and oranges should appear in aggregate functions
CASE WHEN ((apples = 'TRUE' OR oranges 'TRUE') THEN COUNT(*) END) END AS fruit_counter
which
statement?
Success story sharing
MAX
(see answer by @ypercube, there is also another solution in my answer) but not the way you do it. Check expected output.avg
percname
) but it doesn't restrict the rows of the result (as the OP wants). See the Actual Results should be paragraph in the question.ONLY_FULL_GROUP_BY
in MySQL 5.7 does not activate the way the SQL standard specifies when columns can be omitted from thegroup by
(or makes MySQL behave like Postgres). It just reverts to the old behavior where MySQL returns random (= "indeterminate") results instead.