ChatGPT解决这个技术问题 Extra ChatGPT

MySQL JOIN the most recent row only?

I have a table customer that stores a customer_id, email and reference. There is an additional table customer_data that stores a historical record of the changes made to the customer, i.e. when there's a change made a new row is inserted.

In order to display the customer information in a table, the two tables need to be joined, however only the most recent row from customer_data should be joined to the customer table.

It gets a little more complicated in that the query is paginated, so has a limit and an offset.

How can I do this with MySQL? I think I'm wanting to put a DISTINCT in there somewhere...

The query at the minute is like this-

SELECT *, CONCAT(title,' ',forename,' ',surname) AS name
FROM customer c
INNER JOIN customer_data d on c.customer_id=d.customer_id
WHERE name LIKE '%Smith%' LIMIT 10, 20

Additionaly, am I right in thinking I can use CONCAT with LIKE in this way?

(I appreciate that INNER JOIN might be the wrong type of JOIN to use. I actually have no clue what the difference is between the different JOINs. I'm going to look into that now!)

How does the customer history table look like? How is the most recent row determined? Is there a timestamp field?
Most recent is simply the last row inserted - so its primary key is the highest number.
Why not a trigger? take a look at this answer: stackoverflow.com/questions/26661314/…
Most/all of the answers were taking too long with millions of rows. There are some solutions with a better performance.
Does this answer your question? Fetch the row which has the Max value for a column

D
Daniel Vassallo

You may want to try the following:

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
JOIN      (
              SELECT    MAX(id) max_id, customer_id 
              FROM      customer_data 
              GROUP BY  customer_id
          ) c_max ON (c_max.customer_id = c.customer_id)
JOIN      customer_data cd ON (cd.id = c_max.max_id)
WHERE     CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%' 
LIMIT     10, 20;

Note that a JOIN is just a synonym for INNER JOIN.

Test case:

CREATE TABLE customer (customer_id int);
CREATE TABLE customer_data (
   id int, 
   customer_id int, 
   title varchar(10),
   forename varchar(10),
   surname varchar(10)
);

INSERT INTO customer VALUES (1);
INSERT INTO customer VALUES (2);
INSERT INTO customer VALUES (3);

INSERT INTO customer_data VALUES (1, 1, 'Mr', 'Bobby', 'Smith');
INSERT INTO customer_data VALUES (2, 1, 'Mr', 'Bob', 'Smith');
INSERT INTO customer_data VALUES (3, 2, 'Mr', 'Jane', 'Green');
INSERT INTO customer_data VALUES (4, 2, 'Miss', 'Jane', 'Green');
INSERT INTO customer_data VALUES (5, 3, 'Dr', 'Jack', 'Black');

Result (query without the LIMIT and WHERE):

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
JOIN      (
              SELECT    MAX(id) max_id, customer_id 
              FROM      customer_data 
              GROUP BY  customer_id
          ) c_max ON (c_max.customer_id = c.customer_id)
JOIN      customer_data cd ON (cd.id = c_max.max_id);

+-----------------+
| name            |
+-----------------+
| Mr Bob Smith    |
| Miss Jane Green |
| Dr Jack Black   |
+-----------------+
3 rows in set (0.00 sec)

In a long run this approach might create performance problems as it would need to create a temporary table. So another solution (if possible) is to add a new boolean field (is_last) in customer_data which you would have to update every time a new entry is added. The last entry will have is_last=1, all other for this customer - is_last=0.
People should (please) also read the following answer (from Danny Coulombe), because this answer (sorry Daniel) is terribly slow with longer queries / more data. Made my page "wait" for 12 seconds to load; So please also check stackoverflow.com/a/35965649/2776747 . I didn't notice it till after a lot of other changes so it took me very long to find out.
D
Danny Coulombe

If you are working with heavy queries, you better move the request for the latest row in the where clause. It is a lot faster and looks cleaner.

SELECT c.*,
FROM client AS c
LEFT JOIN client_calling_history AS cch ON cch.client_id = c.client_id
WHERE
   cch.cchid = (
      SELECT MAX(cchid)
      FROM client_calling_history
      WHERE client_id = c.client_id AND cal_event_id = c.cal_event_id
   )

Wow I'm almost in disbelief at how much of a performance difference this is. Not sure why that was so drastic yet, but so far it was so much quicker it that it feels like I messed up somewhere else...
I really wish I could +1 this more than once so it gets seen more. I've tested this quite a bit and somehow it makes my queries virtually instantaneous (WorkBench literally says 0.000 seconds, even with sql_no_cache set), whereas doing the search in the join took multiple seconds to complete. Still baffled, but I mean you can't argue with results like that.
You are directly joining 2 tables first and then filtering with WHERE. I think it is a massive performance issue if you have a million client and tens of millions of calling history. Because SQL will try to join 2 tables first and then filter down to the single client. I would rather filter the clients and related calling histories from the tables first in a sub-query and then join the tables.
I agree with @NickCoons. NULL values won't be returned because they're excluded by the where clause. How would you go about to include the NULL values and still keep the excellent performance of this query?
I think you can get the NULL records like this SELECT c.*, FROM client AS c LEFT JOIN client_calling_history AS cch ON cch.client_id = c.client_id WHERE ( SELECT MAX(cchid) FROM client_calling_history WHERE client_id = c.client_id AND cal_event_id = c.cal_event_id ) IS NULL OR cch.cchid = ( SELECT MAX(cchid) FROM client_calling_history WHERE client_id = c.client_id AND cal_event_id = c.cal_event_id )
J
John Magnolia

Presuming the autoincrement column in customer_data is named Id, you can do:

SELECT CONCAT(title,' ',forename,' ',surname) AS name *
FROM customer c
    INNER JOIN customer_data d 
        ON c.customer_id=d.customer_id
WHERE name LIKE '%Smith%'
    AND d.ID = (
                Select Max(D2.Id)
                From customer_data As D2
                Where D2.customer_id = D.customer_id
                )
LIMIT 10, 20

p
payne8

For anyone who must work with an older version of MySQL (pre-5.0 ish) you are unable to do sub-queries for this type of query. Here is the solution I was able to do and it seemed to work great.

SELECT MAX(d.id), d2.*, CONCAT(title,' ',forename,' ',surname) AS name
FROM customer AS c 
LEFT JOIN customer_data as d ON c.customer_id=d.customer_id 
LEFT JOIN customer_data as d2 ON d.id=d2.id
WHERE CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%'
GROUP BY c.customer_id LIMIT 10, 20;

Essentially this is finding the max id of your data table joining it to the customer then joining the data table to the max id found. The reason for this is because selecting the max of a group doesn't guarantee that the rest of the data matches with the id unless you join it back onto itself.

I haven't tested this on newer versions of MySQL but it works on 4.0.30.


This is exquisite in its simplicity. Why is this the first time I have ever seen this approach? Note that EXPLAIN indicates that this uses a temporary table and filesort. Adding ORDER BY NULL at the end weeds out the filesort.
To my regret, my own, not-as-beautiful solution is 3.5 times as fast for my data. I used a subquery to select the main table plus the most recent IDs of the joined tables, and then an outer query that selects the subquery and reads actual data from the joined tables. I am joining 5 tables onto the main table, and testing with a where condition that selects 1000 records. Indexes are optimal.
I was using your solution with SELECT *, MAX(firstData.id), MAX(secondData.id) [...]. Logically, by changing to SELECT main.*, firstData2.*, secondData2.*, MAX(firstData.id), MAX(secondData.id), [...] I was able to make it significantly faster. This allows the first joins to read only from the index, rather than also having to read all data from the primary index. Now the pretty solution takes only 1.9 times as long as the subquery-based solution.
It does not work anymore in MySQL 5.7. Now d2.* will return data for the first row in the group, not the last. SELECT MAX(R1.id), R2.* FROM invoices I LEFT JOIN responses R1 ON I.id=R1.invoice_id LEFT JOIN responses R2 ON R1.id=R2.id GROUP BY I.id LIMIT 0,10
B
BenMorel

I know this question is old, but it's got a lot of attention over the years and I think it's missing a concept which may help someone in a similar case. I'm adding it here for completeness sake.

If you cannot modify your original database schema, then a lot of good answers have been provided and solve the problem just fine.

If you can, however, modify your schema, I would advise to add a field in your customer table that holds the id of the latest customer_data record for this customer:

CREATE TABLE customer (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  current_data_id INT UNSIGNED NULL DEFAULT NULL
);

CREATE TABLE customer_data (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   customer_id INT UNSIGNED NOT NULL, 
   title VARCHAR(10) NOT NULL,
   forename VARCHAR(10) NOT NULL,
   surname VARCHAR(10) NOT NULL
);

Querying customers

Querying is as easy and fast as it can be:

SELECT c.*, d.title, d.forename, d.surname
FROM customer c
INNER JOIN customer_data d on d.id = c.current_data_id
WHERE ...;

The drawback is the extra complexity when creating or updating a customer.

Updating a customer

Whenever you want to update a customer, you insert a new record in the customer_data table, and update the customer record.

INSERT INTO customer_data (customer_id, title, forename, surname) VALUES(2, 'Mr', 'John', 'Smith');
UPDATE customer SET current_data_id = LAST_INSERT_ID() WHERE id = 2;

Creating a customer

Creating a customer is just a matter of inserting the customer entry, then running the same statements:

INSERT INTO customer () VALUES ();

SET @customer_id = LAST_INSERT_ID();
INSERT INTO customer_data (customer_id, title, forename, surname) VALUES(@customer_id, 'Mr', 'John', 'Smith');
UPDATE customer SET current_data_id = LAST_INSERT_ID() WHERE id = @customer_id;

Wrapping up

The extra complexity for creating/updating a customer might be fearsome, but it can easily be automated with triggers.

Finally, if you're using an ORM, this can be really easy to manage. The ORM can take care of inserting the values, updating the ids, and joining the two tables automatically for you.

Here is how your mutable Customer model would look like:

class Customer
{
    private int id;
    private CustomerData currentData;

    public Customer(String title, String forename, String surname)
    {
        this.update(title, forename, surname);
    }

    public void update(String title, String forename, String surname)
    {
        this.currentData = new CustomerData(this, title, forename, surname);
    }

    public String getTitle()
    {
        return this.currentData.getTitle();
    }

    public String getForename()
    {
        return this.currentData.getForename();
    }

    public String getSurname()
    {
        return this.currentData.getSurname();
    }
}

And your immutable CustomerData model, that contains only getters:

class CustomerData
{
    private int id;
    private Customer customer;
    private String title;
    private String forename;
    private String surname;

    public CustomerData(Customer customer, String title, String forename, String surname)
    {
        this.customer = customer;
        this.title    = title;
        this.forename = forename;
        this.surname  = surname;
    }

    public String getTitle()
    {
        return this.title;
    }

    public String getForename()
    {
        return this.forename;
    }

    public String getSurname()
    {
        return this.surname;
    }
}

I combined this approach with @payne8's solution (above) to get my desired result without any subqueries.
P
Pramendra Gupta
SELECT CONCAT(title,' ',forename,' ',surname) AS name * FROM customer c 
INNER JOIN customer_data d on c.id=d.customer_id WHERE name LIKE '%Smith%' 

i think you need to change c.customer_id to c.id

else update table structure


a
alexandre-rousseau

You can also do this

SELECT    CONCAT(title, ' ', forename, ' ', surname) AS name
FROM      customer c
LEFT JOIN  (
              SELECT * FROM  customer_data ORDER BY id DESC
          ) customer_data ON (customer_data.customer_id = c.customer_id)
GROUP BY  c.customer_id          
WHERE     CONCAT(title, ' ', forename, ' ', surname) LIKE '%Smith%' 
LIMIT     10, 20;

B
Burçin

It's a good idea that logging actual data into "customer_data" table. With this data you can select all data from "customer_data" table as you wish.


this answer is irrelevent to the quetsion.
S
Saghachi

Easy solution to left join the 1 most/least recent row is using select over ON phrase

SELECT *
FROM A
LEFT JOIN B
ON A.id = (SELECT MAX(id) FROM B WHERE id = A.id)

Where A.id is the auto-incremental primary key.