
Get records with max value for each group of grouped SQL results

How do you get the rows that contain the max value for each grouped set?

I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:

Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)

Person | Group | Age
-------|-------|----
Bob    | 1     | 32
Jill   | 1     | 34
Shawn  | 1     | 42
Jake   | 2     | 29
Paul   | 2     | 36
Laura  | 2     | 39

Desired result set:

Shawn | 1     | 42    
Laura | 2     | 39  
Caution: The Accepted Answer worked in 2012 when it was written. However, it no longer works for multiple reasons, as given in the Comments.
@RickJames - Found a solution on your page here: mysql.rjweb.org/doc.php/groupwise_max#using_variables. 'Using "windowing functions"' for MySQL 8+. Thank you!
@kJamesy - Yes, but this is the pointer directly to "windowing functions" for that use: mysql.rjweb.org/doc.php/…
I find it amazing that SQLite has always just got this right by assuming that when you group by you automatically want other fields in the same record to tag along. It boggles my mind that this is not just standard practice!

axiac

The correct solution is:

SELECT o.*
FROM `Persons` o                    # 'o' from 'oldest person in group'
  LEFT JOIN `Persons` b             # 'b' from 'bigger age'
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL                 # bigger age not found

How it works:

It matches each row from o with all the rows from b having the same value in column Group and a bigger value in column Age. Any row from o not having the maximum value of its group in column Age will match one or more rows from b.

The LEFT JOIN pairs the oldest person in each group (including anyone who is alone in their group) with a row full of NULLs from b ('no bigger age in the group').
With an INNER JOIN these rows would have no match and would be dropped.

The WHERE clause keeps only the rows having NULLs in the fields extracted from b. They are the oldest persons from each group.
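A minimal sketch of the two steps described above, run on the question's sample data. It assumes SQLite through Python's sqlite3 module instead of MySQL, purely so the example is self-contained; the column is quoted as "Group" for that engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Persons (Person TEXT, "Group" INTEGER, Age INTEGER)')
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Step 1: the bare LEFT JOIN. Each row of o is paired with every older row
# of b in the same group; the oldest row has no partner and gets NULL.
pairs = conn.execute(
    'SELECT o.Person, b.Person FROM Persons o '
    'LEFT JOIN Persons b ON o."Group" = b."Group" AND o.Age < b.Age '
    'ORDER BY o."Group", o.Age, b.Age'
).fetchall()

# Step 2: keep only the rows for which no older partner was found.
oldest = conn.execute(
    'SELECT o.* FROM Persons o '
    'LEFT JOIN Persons b ON o."Group" = b."Group" AND o.Age < b.Age '
    'WHERE b.Age IS NULL ORDER BY o."Group"'
).fetchall()

print(pairs)
print(oldest)
```

In pairs, Shawn and Laura are the only people paired with None, which is exactly what the WHERE clause selects.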

Further reading

This solution and many others are explained in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming


BTW this can return two or more rows for a same group if o.Age = b.Age, e.g. if Paul from group 2 is on 39 like Laura. However if we do not want such behavior we can do: ON o.Group = b.Group AND (o.Age < b.Age or (o.Age = b.Age and o.id < b.id))
Incredible! For 20M records it's like 50 times faster than "naive" algorithm (join against a subquery with max())
Works perfectly with @Todor's comment. I would add that if there are further query conditions they must be added in both the FROM and the LEFT JOIN. Something like: FROM (SELECT * FROM Person WHERE Age != 32) o LEFT JOIN (SELECT * FROM Person WHERE Age != 32) b - if you want to dismiss people who are 32
@AlainZelink wouldn't these "further query conditions" be better placed in the final WHERE condition list, so as not to introduce subqueries - which were not needed in the original @axiac answer?
This solution worked; however, it started getting reported in the slow query log when attempted with 10,000+ rows sharing same ID. Was JOINing on indexed column. A rare case, but figured it's worth mentioning.
Community

There's a super-simple way to do this in mysql:

select * 
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`

This works because in mysql you're allowed to not aggregate non-group-by columns, in which case mysql just returns the first row. The solution is to first order the data such that for each group the row you want is first, then group by the columns you want the value for.

You avoid complicated subqueries that try to find the max() etc, and also the problems of returning multiple rows when there are more than one with the same maximum value (as the other answers would do)

Note: This is a mysql-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution uses undocumented behavior, the more cautious may want to include a test to assert that it remains working should a future version of MySQL change this behavior.

Version 5.7 update:

Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option (edit the option file for the server to remove this setting).


"mysql just returns the first row." - maybe this is how it works but it is not guaranteed. The documentation says: "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.". The server doesn't select rows but values (not necessarily from the same row) for each column or expression that appears in the SELECT clause and is not computed using an aggregate function.
This behaviour changed in MySQL 5.7.5: by default, it rejects this query because the columns in the SELECT clause are not functionally dependent on the GROUP BY columns. If it is configured to accept it (ONLY_FULL_GROUP_BY is disabled), it works like the previous versions (i.e. the values of those columns are indeterminate).
I am surprised this answer got so many upvotes. It is wrong and it is bad. This query is not guaranteed to work. Data in a subquery is an unordered set in spite of the ORDER BY clause. MySQL may really order the records now and keep that order, but it wouldn't break any rule if it stopped doing so in some future version. Then the GROUP BY condenses to one record, but all fields will be arbitrarily picked from the records. It may be that MySQL currently simply always picks the first row, but it could just as well pick any other row or even values from different rows in a future version.
Okay, we disagree here. I don't use undocumented features that just happen to work currently and rely on some tests that will hopefully cover this. You know that you are just lucky that the current implementation gets you the complete first record where the docs clearly state that you might get any indeterminate values instead, but you still use it. Some simple session or database setting may change this anytime. I'd consider this too risky.
This answer seems wrong. Per the doc, the server is free to choose any value from each group ... Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses.
Tim Biegeleisen

You can join against a subquery that pulls the Group and MAX(Age) for each group. This method is portable across most RDBMS.

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT `Group`, MAX(Age) AS max_age
    FROM yourTable
    GROUP BY `Group`
) t2
    ON t1.`Group` = t2.`Group` AND t1.Age = t2.max_age;
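A minimal sketch of the subquery join on the question's data, again assuming SQLite via Python's sqlite3 module as a stand-in engine. The helper name oldest_per_group and the tied variant of the data (Paul aged 39) are hypothetical, added only to show how this approach behaves when two people share the maximum age.

```python
import sqlite3

QUERY = '''
SELECT t1.Person, t1."Group", t1.Age
FROM yourTable t1
INNER JOIN (
    SELECT "Group", MAX(Age) AS max_age
    FROM yourTable
    GROUP BY "Group"
) t2 ON t1."Group" = t2."Group" AND t1.Age = t2.max_age
ORDER BY t1."Group", t1.Person
'''

def oldest_per_group(rows):
    """Load rows into a fresh in-memory table and run the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE yourTable (Person TEXT, "Group" INTEGER, Age INTEGER)')
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()

sample = [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
          ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)]
no_ties = oldest_per_group(sample)

# Hypothetical tie: Paul is now 39, the same age as Laura.
tied = [("Paul", 2, 39) if row[0] == "Paul" else row for row in sample]
with_tie = oldest_per_group(tied)

print(no_ties)
print(with_tie)
```

With the tie, group 2 comes back twice (Laura and Paul), so a further tie-breaker is needed if exactly one row per group is required.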

Michael, thanks for this- but do you have an answer for the issue of returning multiple rows on ties, per Bohemian's comments?
@Yarin If there were 2 rows for example where Group = 2, Age = 20, the subquery would return one of them, but the join ON clause would match both of them, so you would get 2 rows back with the same group/age though different vals for the other columns, rather than one.
So are we saying it's impossible to limit results to one per group unless we go Bohemians MySQL-only route?
@Yarin no not impossible, just requires more work if there are additional columns - possibly another nested subquery to pull the max associated id for each like pair of group/age, then join against that to get the rest of the row based on id.
This should be the accepted answer (the currently accepted answer will fail on most other RDBMS, and in fact would even fail on many versions of MySQL).
Igor Kulagin

My simple solution for SQLite (and probably MySQL):

SELECT *, MAX(age) FROM mytable GROUP BY `Group`;

However it doesn't work in PostgreSQL and maybe some other platforms.

In PostgreSQL you can use DISTINCT ON clause:

SELECT DISTINCT ON ("group") * FROM "mytable" ORDER BY "group", "age" DESC;
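For the SQLite half of this answer, a small check on the question's data, run through Python's sqlite3 module. SQLite documents a special case for queries with a single MIN() or MAX() aggregate: bare columns take their values from the row holding that extreme. The sketch relies on that SQLite-specific rule, so it says nothing about MySQL, and on a tie SQLite may pick any of the tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (Person TEXT, "Group" INTEGER, age INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Person is a "bare" column: it is neither aggregated nor in the GROUP BY.
# With exactly one MAX() in the query, SQLite reads it from the max-age row.
rows = conn.execute(
    'SELECT Person, "Group", MAX(age) FROM mytable '
    'GROUP BY "Group" ORDER BY "Group"'
).fetchall()
print(rows)
```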

@IgorKulagin - Doesn't work in Postgres- Error message: column "mytable.id" must appear in the GROUP BY clause or be used in an aggregate function
The MySQL query may only work by accident on many occasions. The "SELECT *" may return information that does not correspond to the belonging MAX(age). This answer is wrong. This is probably also the case for SQLite.
This fits the case where only the grouped column and the max column are needed. It does not fit the requirement above: it could return ('Bob', 1, 42), whereas the expected result is ('Shawn', 1, 42).
Good for postgres
This is a wrong answer as mysql "randomly" chooses values from columns that are not GROUP or AGE. This is fine only when you need only these columns.
user130268

Not sure if MySQL has a ROW_NUMBER function. If so, you can use it to get the desired result. On SQL Server you can do something similar to:

CREATE TABLE p
(
 person NVARCHAR(10),
 gp INT,
 age INT
);
GO
INSERT  INTO p
VALUES  ('Bob', 1, 32);
INSERT  INTO p
VALUES  ('Jill', 1, 34);
INSERT  INTO p
VALUES  ('Shawn', 1, 42);
INSERT  INTO p
VALUES  ('Jake', 2, 29);
INSERT  INTO p
VALUES  ('Paul', 2, 36);
INSERT  INTO p
VALUES  ('Laura', 2, 39);
GO

SELECT  t.person, t.gp, t.age
FROM    (
         SELECT *,
                ROW_NUMBER() OVER (PARTITION BY gp ORDER BY age DESC) row
         FROM   p
        ) t
WHERE   t.row = 1;
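The same ROW_NUMBER() idea can be tried without a SQL Server instance. This sketch assumes SQLite 3.25 or later (the first version with window functions) via Python's sqlite3 module. Two things differ from the query above and are my additions: person is added to the ORDER BY so that ties resolve alphabetically as the question asks, and Paul is given a hypothetical age of 39 to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (person TEXT, gp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO p VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 39), ("Laura", 2, 39)],  # Paul tied with Laura
)

# Number the rows of each group from oldest to youngest; among equal ages
# the alphabetically first person gets the lower number.
rows = conn.execute("""
    SELECT t.person, t.gp, t.age
    FROM (
        SELECT p.*,
               ROW_NUMBER() OVER (
                   PARTITION BY gp ORDER BY age DESC, person ASC
               ) AS rn
        FROM p
    ) t
    WHERE t.rn = 1
    ORDER BY t.gp
""").fetchall()
print(rows)
```

Because ROW_NUMBER() never repeats a number within a partition, this returns exactly one row per group even when ages tie.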

It does, since 8.0.
David

Using ranking method.

SELECT @rn :=  CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END AS rn,  
   @prev_grp :=groupa,
   person,age,groupa  
FROM   users,(SELECT @rn := 0) r        
HAVING rn=1
ORDER  BY groupa,age DESC,person

This SQL can be explained as follows:

select * from users, (select @rn := 0) r order by groupa, age desc, person

@prev_grp is null at the start

@rn := CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END is a ternary expression, equivalent to: rn = 1 if prev_grp != groupa else rn = rn + 1

having rn=1 keeps only the row you need


sel - need some explanation - I've never even seen := before - what is that?
:= is assignment operator. You could read more on dev.mysql.com/doc/refman/5.0/en/user-variables.html
I'll have to dig into this- I think the answer overcomplicates our scenario, but thanks for teaching me something new..
Giacomo1968

Improving on axiac's solution to avoid selecting multiple rows per group while also allowing for use of indexes

SELECT o.*
FROM `Persons` o 
  LEFT JOIN `Persons` b 
      ON o.Group = b.Group AND o.Age < b.Age
  LEFT JOIN `Persons` c 
      ON o.Group = c.Group AND o.Age = c.Age and o.id < c.id
WHERE b.Age is NULL and c.id is null
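A minimal sketch of the second LEFT JOIN as a tie-breaker, assuming SQLite via Python's sqlite3 module. The id values, the tied ages for Laura and Paul, and the helper name pick are hypothetical. The o.id < c.id condition keeps the tied row with the highest id; the alphabetical variant (comparing Person) is not from the answer and is shown only because the question asks for the first name alphabetically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Persons (id INTEGER PRIMARY KEY, Person TEXT, "Group" INTEGER, Age INTEGER)'
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?)",
    [(1, "Bob", 1, 32), (2, "Jill", 1, 34), (3, "Shawn", 1, 42),
     (4, "Jake", 2, 29), (5, "Laura", 2, 39), (6, "Paul", 2, 39)],  # tie in group 2
)

TEMPLATE = '''
SELECT o.id, o.Person, o."Group", o.Age
FROM Persons o
  LEFT JOIN Persons b ON o."Group" = b."Group" AND o.Age < b.Age
  LEFT JOIN Persons c ON o."Group" = c."Group" AND o.Age = c.Age AND {tie}
WHERE b.Age IS NULL AND c.id IS NULL
ORDER BY o."Group"
'''

def pick(tie_condition):
    """Run the double anti-join with the given tie-break condition on o and c."""
    return conn.execute(TEMPLATE.format(tie=tie_condition)).fetchall()

by_id = pick("o.id < c.id")            # as in the answer: highest id wins
by_name = pick("o.Person > c.Person")  # assumption: first name alphabetically wins

print(by_id)
print(by_name)
```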

Bae Cheol Shin

I would not use Group as a column name since it is a reserved word. However, the following SQL would work.

SELECT a.Person, a.Group, a.Age FROM [TABLE_NAME] a
INNER JOIN 
(
  SELECT `Group`, MAX(Age) AS oldest FROM [TABLE_NAME] 
  GROUP BY `Group`
) b ON a.Group = b.Group AND a.Age = b.oldest

Thanks, though this returns multiple records for an age when there is a tie
@Yarin how would you decide which is the correct oldest person? Returning multiple rows seems to be the most correct answer; otherwise use LIMIT and ORDER BY.
Arthur C

axiac's solution is what worked best for me in the end. I had an additional complexity however: a calculated "max value", derived from two columns.

Let's use the same example: I would like the oldest person in each group. If there are people that are equally old, take the tallest person.

I had to perform the left join two times to get this behavior:

SELECT o1.* FROM
    (SELECT o.*
    FROM `Persons` o
    LEFT JOIN `Persons` b
    ON o.Group = b.Group AND o.Age < b.Age
    WHERE b.Age is NULL) o1
LEFT JOIN
    (SELECT o.*
    FROM `Persons` o
    LEFT JOIN `Persons` b
    ON o.Group = b.Group AND o.Age < b.Age
    WHERE b.Age is NULL) o2
ON o1.Group = o2.Group AND o1.Height < o2.Height 
WHERE o2.Height is NULL;

Hope this helps! I guess there should be a better way to do this though...


Antonio Giovanazzi

My solution works only if you need to retrieve a single column; however, for my needs it was the best solution I found in terms of performance (it uses only one query!):

SELECT SUBSTRING_INDEX(GROUP_CONCAT(column_x ORDER BY column_y),',',1) AS xyz,
   column_z
FROM table_name
GROUP BY column_z;

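It uses GROUP_CONCAT to build an ordered, concatenated list, and then takes the substring up to the first separator. As written, the list is sorted ascending by column_y; use ORDER BY column_y DESC to pick the value from the row with the maximum.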


Can confirm that you can get multiple columns by sorting on the same key inside the group_concat, but need to write a separate group_concat/index/substring for each column.
Bonus here is that you can add multiple columns to the sort inside the group_concat and it would resolve the ties easily and guarantee only one record per group. Well done on the simple and efficient solution!
Marvin

Using CTEs - Common Table Expressions:

WITH MyCTE(MaxPKID, SomeColumn1)
AS(
SELECT MAX(a.MyTablePKID) AS MaxPKID, a.SomeColumn1
FROM MyTable1 a
GROUP BY a.SomeColumn1
  )
SELECT b.MyTablePKID, b.SomeColumn1, b.SomeColumn2, MAX(b.NumEstado)
FROM MyTable1 b
INNER JOIN MyCTE c ON c.MaxPKID = b.MyTablePKID
GROUP BY b.MyTablePKID, b.SomeColumn1, b.SomeColumn2

--Note: MyTablePKID is the PrimaryKey of MyTable

Ritwik

You can also try

SELECT * FROM mytable WHERE age IN (SELECT MAX(age) FROM mytable GROUP BY `Group`) ;

Thanks, though this returns multiple records for an age when there is a tie
Also, this query would be incorrect in the case that there is a 39-year-old in group 1. In that case, that person would also be selected, even though the max age in group 1 is higher.
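The cross-group problem described in the comment above can be reproduced with a small sketch, assuming SQLite via Python's sqlite3 module. "Zed" is a hypothetical extra row: a 39-year-old in group 1. The correlated variant is not from the answer; it is one possible way to tie each age back to its own group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (Person TEXT, "Group" INTEGER, age INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42), ("Zed", 1, 39),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# The IN list is {42, 39} regardless of group, so Zed (39, group 1) matches
# group 2's maximum and is wrongly returned.
uncorrelated = conn.execute(
    'SELECT * FROM mytable WHERE age IN '
    '(SELECT MAX(age) FROM mytable GROUP BY "Group") '
    'ORDER BY "Group", Person'
).fetchall()

# Correlating the subquery on the group compares each row only with the
# maximum of its own group.
correlated = conn.execute(
    'SELECT * FROM mytable m1 WHERE age = '
    '(SELECT MAX(age) FROM mytable m2 WHERE m2."Group" = m1."Group") '
    'ORDER BY "Group", Person'
).fetchall()

print(uncorrelated)
print(correlated)
```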
DataScientYst

This is how I'm getting the N max rows per group in mysql

SELECT co.id, co.person, co.country
FROM person co
WHERE (
SELECT COUNT(*)
FROM person ci
WHERE  co.country = ci.country AND co.id < ci.id
) < 1
;

how it works:

self join to the table

groups are done by co.country = ci.country

N elements per group are controlled by ) < 1, so for 3 elements use ) < 3

whether you get the max or the min depends on the comparison:

co.id < ci.id - max

co.id > ci.id - min
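The points above can be sketched as follows, assuming SQLite via Python's sqlite3 module. The five rows and the helper name top_n_ids are hypothetical; N is passed as a bound parameter in place of the literal 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, person TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Ann", "US"), (2, "Ben", "US"), (3, "Cid", "US"),
     (4, "Dee", "UK"), (5, "Eli", "UK")],
)

def top_n_ids(n):
    """Return ids of the n rows with the highest id in each country."""
    rows = conn.execute(
        """
        SELECT co.id
        FROM person co
        WHERE (
            -- count the rows of the same country that have a larger id
            SELECT COUNT(*)
            FROM person ci
            WHERE co.country = ci.country AND co.id < ci.id
        ) < ?
        ORDER BY co.id
        """,
        (n,),
    ).fetchall()
    return [row[0] for row in rows]

print(top_n_ids(1))
print(top_n_ids(2))
```

A row is kept when fewer than N rows in its group beat it, which is why N = 1 yields the single maximum per group.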

Full example here:

mysql select n max values per group


slfan

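In Oracle, the query below gives the desired result.

SELECT person, grp, age   -- grp: the group column, assumed renamed because GROUP is reserved
FROM (
  SELECT person, grp, age,
    ROW_NUMBER() OVER (PARTITION BY grp ORDER BY age DESC, person ASC) AS rankForEachGroup
  FROM tablename
) WHERE rankForEachGroup = 1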

Rajesh

with CTE as 
(select Person, 
[Group], Age, RN= Row_Number() 
over(partition by [Group] 
order by Age desc) 
from yourtable)
select Person, Age from CTE where RN = 1

Ray Foss

This method has the benefit of allowing you to rank by a different column, and not trashing the other data. It's quite useful in a situation where you are trying to list orders with a column for items, listing the heaviest first.

Source: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

SELECT person, `group`,
    GROUP_CONCAT(
        DISTINCT age
        ORDER BY age DESC SEPARATOR ', follow up: '
    )
FROM sql_table
GROUP BY `group`;

user3475425

let the table name be people

select O.*              -- > O for oldest table
from people O , people T
where O.grp = T.grp and 
O.Age = 
(select max(T.age) from people T where O.grp = T.grp
  group by T.grp)
group by O.grp; 

Faisal

If the ID (and all other columns) is needed from mytable:

SELECT
    *
FROM
    mytable
WHERE
    id NOT IN (
        SELECT
            A.id
        FROM
            mytable AS A
        JOIN mytable AS B ON A.`Group` = B.`Group`
        AND A.age < B.age
    )

Andrew Kin Fat Choi
SELECT o.*
FROM `Persons` o                   
  LEFT JOIN `Persons` b            
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL  
group by o.Group 

Please explain what your answer is doing.