
How to select all records from one table that do not exist in another table?

table1 (id, name)
table2 (id, name)

Query:

SELECT name   
FROM table2  
-- that are not in table1 already

Note: see the UNION-based solution at the bottom, which its author reports to be orders of magnitude faster than the other solutions listed here.

Kris
SELECT t1.name
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.name
WHERE t2.name IS NULL

Q: What is happening here?

A: Conceptually, we select all rows from table1, and for each row we attempt to find a row in table2 with the same value in the name column. If there is no such row, we leave the table2 portion of the result empty (NULL) for that row. Then we constrain the selection by picking only those rows in the result where the matching row does not exist. Finally, we ignore all fields in the result except the name column from table1 (the one we are sure exists).

While it may not be the most performant method in all cases, it should work in basically every database engine that attempts to implement ANSI SQL-92.
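
The two conceptual steps above can be tried out in a small sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; the sample rows are hypothetical, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 (name) VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table2 (name) VALUES ('bob'), ('dave');
""")

# Step 1: the bare LEFT JOIN keeps every table1 row; rows without a match
# get NULL (None in Python) for the table2 columns.
joined = conn.execute("""
    SELECT t1.name, t2.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    ORDER BY t1.name
""").fetchall()

# Step 2: keep only the rows where the table2 side is NULL, i.e. no match.
missing = conn.execute("""
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
""").fetchall()
conn.close()

print(joined)   # [('alice', None), ('bob', 'bob'), ('carol', None)]
print(missing)  # [('alice',), ('carol',)]
```

Note that this returns the names in table1 that are missing from table2; to get the direction asked for in the question, swap the two table names.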


@z-boss: It's also the least performant on SQL Server: explainextended.com/2009/09/15/…
@BunkerBoy: A left join allows rows on the right to not exist without that affecting the inclusion of rows on the left. An inner join requires rows on both the left and the right to be present. What I'm doing here is applying some logic to basically get the reverse selection of an inner join.
This made it very easy to visualize; others had explained it five different ways, but this one helped. Simply put: first you do the left join, which gives you everything in A plus whatever in B matches A. As always with a left join, the fields that don't join are just NULL. Then you say you only want the rows where those fields are NULL. That leaves all rows in A that had no match in B.
It should be noted that this solution (accepted and voted up) is, I think, the only one that could be edited for a scenario where more than one field comes into play. Specifically, I am returning field1, field2 and field3 from table one where the combination of field1 and field2 is not in the second table. Other than modifying the join in this answer, I do not see a way to do it with some of the other "more efficient" answers argued for below.
Just make sure you use "WHERE t2.name IS NULL" and not "AND t2.name IS NULL", because "AND" will not give correct results. I don't really understand why, but it's a fact; I tested it.
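
The WHERE-versus-AND difference in the last comment comes from when each condition is evaluated: a condition in the ON clause is part of deciding whether a table2 row matches, while WHERE filters the rows after the join has been built. With AND, "t2.name = t1.name AND t2.name IS NULL" can never be true, so no table2 row ever matches and the left join returns every table1 row. A minimal sketch with hypothetical sample data, run through Python's sqlite3 module:

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 (name) VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table2 (name) VALUES ('bob'), ('dave');
""")

# IS NULL in WHERE: applied after the join, so it filters out matched rows.
with_where = [row[0] for row in conn.execute("""
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
""")]

# IS NULL in ON: becomes part of the match condition, which is then never
# satisfied, so every table1 row is kept (with NULLs on the table2 side).
with_and = [row[0] for row in conn.execute("""
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name AND t2.name IS NULL
    ORDER BY t1.name
""")]
conn.close()

print(with_where)  # ['alice', 'carol']
print(with_and)    # ['alice', 'bob', 'carol']
```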
txemsukr

You can either do

SELECT name
FROM table2
WHERE name NOT IN
    (SELECT name 
     FROM table1)

or

SELECT name 
FROM table2 
WHERE NOT EXISTS 
    (SELECT * 
     FROM table1 
     WHERE table1.name = table2.name)

See this question for 3 techniques to accomplish this
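
As a quick check that the two forms above return the same rows when the name column contains no NULLs, here is a small sketch using Python's sqlite3 module. The sample rows are hypothetical, and it says nothing about relative speed, which depends on the engine and the data.

```python
import sqlite3

# Hypothetical sample data without NULLs in the name columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 (name) VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table2 (name) VALUES ('bob'), ('dave');
""")

# Names in table2 that are not in table1, using NOT IN.
not_in = [row[0] for row in conn.execute("""
    SELECT name
    FROM table2
    WHERE name NOT IN (SELECT name FROM table1)
    ORDER BY name
""")]

# The same question asked with a correlated NOT EXISTS subquery.
not_exists = [row[0] for row in conn.execute("""
    SELECT name
    FROM table2
    WHERE NOT EXISTS (SELECT * FROM table1 WHERE table1.name = table2.name)
    ORDER BY name
""")]
conn.close()

print(not_in)      # ['dave']
print(not_exists)  # ['dave']
```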


This is incredibly slow with large amounts of data.
Yeah, indeed it is very slow.
Shouldn't it be "from table1" in the subquery of the NOT EXISTS query?
Very confused at how this got so many upvotes. I find it very hard to think of a reason to ever use this, when there is an approach to this problem that is incredibly faster with roughly the same number of keystrokes.
@searchengine27 Is it really that slow when we have query optimizers?
Felipe Saldanha

I don't have enough rep points to vote up froadie's answer. But I have to disagree with the comments on Kris's answer. The following answer:

SELECT name
FROM table2
WHERE name NOT IN
    (SELECT name 
     FROM table1)

Is FAR more efficient in practice. I don't know why, but I'm running it against 800k+ records and the difference is tremendous with the advantage given to the 2nd answer posted above. Just my $0.02.


In the NOT IN query the subquery is performed only once; in the EXISTS query the subquery is performed for every row.
You are awesome :) This way I cut my 25-second left-join query down to just 0.1 seconds.
Answers are not in any specific order, so "second answer" does not mean what you thought it meant.
I think this also may be the only solution if you're looking to add some extra filters/criteria to the sub query.
Anuraj
SELECT <column_list>
FROM TABLEA a
LEFT JOIN TABLEB b 
ON a.Key = b.Key 
WHERE b.Key IS NULL;

https://i.stack.imgur.com/mjS7g.png

https://www.cloudways.com/blog/how-to-join-two-tables-mysql/


Too bad Join diagrams are much less clear and much harder to understand intuitively than Venn diagrams.
Thank you for the diagrams
zkanoca

This is pure set theory which you can achieve with the minus operation.

select id, name from table1
minus
select id, name from table2

Do you think this is much more efficient than a left join?
It should be. The minus command is designed for this exact situation. Of course the only way to judge for any particular data set is to try it both ways and see which runs faster.
In T-SQL, the set operator is "except". This is very convenient for me and has not caused any slowdown.
In SQLite, the "minus" operator is also "except".
MySQL does not support the MINUS operator.
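
Since the comments note that SQLite spells this operator EXCEPT, the set-difference approach can be sketched with Python's sqlite3 module. The sample rows are hypothetical. One thing the sketch makes visible is that the operator compares whole rows: a row counts as present in table2 only if both id and name match.

```python
import sqlite3

# Hypothetical sample data with explicit ids, because EXCEPT compares
# every selected column, not just name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO table2 (id, name) VALUES (2, 'bob'), (3, 'carl');
""")

# Rows of table1 that do not appear, as complete (id, name) rows, in table2.
difference = conn.execute("""
    SELECT id, name FROM table1
    EXCEPT
    SELECT id, name FROM table2
    ORDER BY id
""").fetchall()
conn.close()

# (3, 'carol') survives because table2 holds (3, 'carl'), a different row.
print(difference)  # [(1, 'alice'), (3, 'carol')]
```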
Diligent Key Presser

Here's what worked best for me.

SELECT *
FROM @T1
EXCEPT
SELECT a.*
FROM @T1 a
JOIN @T2 b ON a.ID = b.ID

This was more than twice as fast as any other method I tried.


Thanks, this works well with large amounts of data too! But I'm just wondering about the term "EXCEPT".
767ms for me on 5k records across 200k records. Everything else took minutes.
dekkard

Watch out for pitfalls. If the name field in table1 contains NULLs, you are in for surprises. Better is:

SELECT name
FROM table2
WHERE name NOT IN
    (SELECT ISNULL(name ,'')
     FROM table1)

COALESCE > ISNULL (ISNULL is a useless T-SQL addition to the language that does nothing new or better than COALESCE)
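
The surprise can be reproduced in a few lines. The sketch below uses Python's sqlite3 module with hypothetical sample rows; it uses COALESCE rather than ISNULL because SQLite has no two-argument ISNULL function. When the subquery returns a NULL, "name NOT IN (...)" evaluates to NULL rather than true for every non-matching name, so the plain query returns no rows at all.

```python
import sqlite3

# Hypothetical sample data: table1 contains one NULL name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 (name) VALUES ('bob'), (NULL);
    INSERT INTO table2 (name) VALUES ('bob'), ('dave');
""")

def names(sql):
    """Run a single-column query and return the values as a list."""
    return [row[0] for row in conn.execute(sql)]

# Plain NOT IN: 'dave' NOT IN ('bob', NULL) is NULL, so 'dave' is dropped.
naive = names(
    "SELECT name FROM table2 WHERE name NOT IN (SELECT name FROM table1)"
)

# Replacing NULL with '' in the subquery, as in the answer above.
coalesced = names(
    "SELECT name FROM table2 "
    "WHERE name NOT IN (SELECT COALESCE(name, '') FROM table1)"
)

# NOT EXISTS is not affected by the NULL, since NULL = 'dave' never matches.
not_exists = names(
    "SELECT name FROM table2 "
    "WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE table1.name = table2.name)"
)
conn.close()

print(naive)       # []
print(coalesced)   # ['dave']
print(not_exists)  # ['dave']
```

One caveat with the empty-string substitution: a genuinely empty name in table2 would then be treated as present in table1, so filtering the subquery with "WHERE name IS NOT NULL" may be the safer variant.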
Izzy

You can use EXCEPT in MS SQL Server or MINUS in Oracle; they are identical according to:

http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/


OhBeWise

This worked well for me:

SELECT * 
FROM [dbo].[table1] t1
LEFT JOIN [dbo].[table2] t2 ON t1.[t1_ID] = t2.[t2_ID]
WHERE t2.[t2_ID] IS NULL

Kaiser

See query:

SELECT * FROM Table1 WHERE
id NOT IN (SELECT 
        e.id
    FROM
        Table1 e
            INNER JOIN
        Table2 s ON e.id = s.id);

Conceptually: the subquery fetches the matching records, and the main query then fetches the records that are not in the subquery's result.


Tomerikoo

First define aliases for the tables, such as t1 and t2. Then select the records of the second table, and check each one against the first table in the WHERE condition:

SELECT name FROM table2 as t2
WHERE NOT EXISTS (SELECT * FROM table1 as t1 WHERE t1.name = t2.name)

Yours is the same as that answer. Please read all the answers, especially before answering old questions.
the professional of the others answers replication!
w.Daya

You can use the following query structure:

SELECT t1.name FROM table1 t1 LEFT JOIN table2 t2 ON t2.fk_id = t1.id WHERE t2.fk_id IS NULL;

table1:

id  name
1   Amit
2   Sagar

table2:

id  fk_id  email
1   1      amit@ma.com

Output:

name
Sagar


Adrian K

All of the queries above are incredibly slow on big tables, so a change of strategy is needed. Here is the code I used for one of my databases; you can adapt it by changing the field and table names.

The strategy: build two derived (implicit temporary) tables and take their union.

The first derived table selects all the rows of the first original table, the one whose values you want to check for absence from the second original table. The second derived table contains the rows for which the two original tables have identical values in the column you want to check. In the union, a value that has a match appears in more than one row (one from the first select, the rest from the second select), while a value from the first table with no match in the second appears in exactly one row. You then group by that column and count: a count of 1 means there is no match, so you select only the groups whose count equals 1.

It does not look elegant, but in my case it was orders of magnitude faster than all the solutions above.

IMPORTANT NOTE: make sure the columns being compared are indexed.

SELECT name, MIN(source) AS source, MIN(id) AS id  -- each surviving group has exactly one row
FROM 
(
    -- every row of the table being checked
    SELECT name, 'active_ingredients' AS source, active_ingredients.id AS id 
    FROM active_ingredients

    UNION ALL

    -- one extra row for every name that has a match in the other table
    SELECT active_ingredients.name AS name, 'UNII_database' AS source, temp_active_ingredients_aliases.id AS id 
    FROM active_ingredients
    INNER JOIN temp_active_ingredients_aliases ON temp_active_ingredients_aliases.alias_name = active_ingredients.name

) tbl
GROUP BY name
HAVING COUNT(*) = 1
ORDER BY name
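
To make the group-and-count idea easier to follow, here is a reduced sketch of the same strategy on the question's table1/table2, run through Python's sqlite3 module. The sample rows and index names are hypothetical, and the sketch only checks that the result agrees with the LEFT JOIN answer; it makes no claim about speed.

```python
import sqlite3

# Hypothetical sample data; names in table1 are assumed to be unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX idx_table1_name ON table1 (name);
    CREATE INDEX idx_table2_name ON table2 (name);
    INSERT INTO table1 (name) VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table2 (name) VALUES ('bob'), ('dave');
""")

# First branch: every table1 name once. Second branch: one extra row per
# match in table2. A name seen exactly once therefore has no match.
union_result = [row[0] for row in conn.execute("""
    SELECT name
    FROM (
        SELECT name FROM table1
        UNION ALL
        SELECT t1.name
        FROM table1 t1
        INNER JOIN table2 t2 ON t2.name = t1.name
    ) AS tbl
    GROUP BY name
    HAVING COUNT(*) = 1
    ORDER BY name
""")]

# The LEFT JOIN / IS NULL form from the accepted answer, for comparison.
left_join_result = [row[0] for row in conn.execute("""
    SELECT t1.name
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.name = t1.name
    WHERE t2.name IS NULL
    ORDER BY t1.name
""")]
conn.close()

print(union_result)      # ['alice', 'carol']
print(left_join_result)  # ['alice', 'carol']
```

The count test relies on the checked column being unique in the first table: a name that appears twice there would have a count of 2 and be dropped even if it has no match.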

Adrian Roth

I'm going to repost the accepted answer (since I'm not cool enough yet to comment on it), in case anyone else thought it needed a better explanation.

SELECT temp_table_1.name
FROM original_table_1 temp_table_1
LEFT JOIN original_table_2 temp_table_2 ON temp_table_2.name = temp_table_1.name
WHERE temp_table_2.name IS NULL

I've also seen FROM syntax that needs commas between table names in MySQL, but SQLite seemed to prefer a space.

The bottom line is when you use bad variable names it leaves questions. My variables should make more sense. And someone should explain why we need a comma or no comma.


Nauman Bashir

I tried all the solutions above, but they did not work in my case. The following query worked for me.

SELECT name FROM table_1 WHERE name NOT IN (SELECT a.name FROM table_1 AS a 
LEFT JOIN table_2 AS b ON a.name = b.name WHERE b.name IS NOT NULL /* AND any further condition */);