ChatGPT解决这个技术问题 Extra ChatGPT

What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?

sql

It seems to me that you can do the same thing in a SQL query using either NOT EXISTS, NOT IN, or LEFT JOIN WHERE IS NULL. For example:

SELECT a FROM table1 WHERE a NOT IN (SELECT a FROM table2)

SELECT a FROM table1 WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.a = table2.a)

SELECT a FROM table1 LEFT JOIN table2 ON table1.a = table2.a WHERE table1.a IS NULL

I'm not sure if I got all the syntax correct, but these are the general techniques I've seen. Why would I choose to use one over the other? Does performance differ...? Which one of these is the fastest / most efficient? (If it depends on implementation, when would I use each one?)

Many common SQL engines give you the ability to see an execution plan. You can often spot significant differences in efficiency for logically equivalent queries in this way. The success of any method depends on factors such as table size, what indexes are present, and others.
@wich: no database cares about what exactly you return inside the EXISTS clause. You may return *, NULL or whatever: all this will be optimized away.
@wich - why? Both here: techonthenet.com/sql/exists.php and here: msdn.microsoft.com/en-us/library/ms188336.aspx seem to use *...
@wich: this is not about "expressing interest". This is about the query parser demands you to put something between SELECT and FROM. And * is just easier to type. Yes, SQL does bear some resemblance to a natural language, but it is parsed and executed by a machine, a programmed machine. It's not that it will ever suddenly break into your cubicle and shout "stop demanding for the extra fields in an EXISTS query because I'm f**g sick of parsing them and then throwing them off!". It's OK with a computer, really.
@Quassnoi if you wrote code for the sole purpose of a machine interpreting it the code would look horrible, and unfortunately quite a few people work like that. If however you write code in another optic, writing code to express what you want the machine to do as a communiqué to your peers you will write better and more maintainable code. Be smart, write code for people, not for the computer.

Q
Quassnoi

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

In a nutshell:

NOT IN is a little bit different: it never matches if there is but a single NULL in the list.

In MySQL, NOT EXISTS is a little bit less efficient

In SQL Server, LEFT JOIN / IS NULL is less efficient

In PostgreSQL, NOT IN is less efficient

In Oracle, all three methods are the same.


Thanks for the links! And thanks for the quick overview... My office is blocking the link for some reason :P but I'll check it out as soon as I get to a regular computer.
Another point is that if table1 .a contains NULL the EXISTS query will not return this row but the NOT IN query will if table2 is empty. NOT IN vs. NOT EXISTS Nullable Columns: SQL Server
@MartinSmith: NULL NOT IN () evaluates to true (not NULL), just as NOT EXISTS (NULL = column)
@Quassnoi - er, Good point, got that the wrong way round. The NOT EXISTS will always return the row but NOT IN will only do so if the sub query returns no rows.
G
Guffa

If the database is good at optimising the query, the two first will be transformed to something close to the third.

For simple situations like the ones in you question, there should be little or no difference, as they all will be executed as joins. In more complex queries, the database might not be able to make a join out of the not in and not exists queryes. In that case the queries will get a lot slower. On the other hand, a join may also perform badly if there is no index that can be used, so just because you use a join doesn't mean that you are safe. You would have to examine the execution plan of the query to tell if there may be any performance problems.


o
onedaywhen

Assuming you are avoiding nulls, they are all ways of writing an anti-join using Standard SQL.

An obvious omission is the equivalent using EXCEPT:

SELECT a FROM table1
EXCEPT
SELECT a FROM table2

Note in Oracle you need to use the MINUS operator (arguably a better name):

SELECT a FROM table1
MINUS
SELECT a FROM table2

Speaking of proprietary syntax, there may also be non-Standard equivalents worth investigating depending on the product you are using e.g. OUTER APPLY in SQL Server (something like):

SELECT t1.a
  FROM table1 t1
       OUTER APPLY 
       (
        SELECT t2.a
          FROM table2 t2
         WHERE t2.a = t1.a
       ) AS dt1
 WHERE dt1.a IS NULL;

j
j0k

When need to insert data in table with multi-field primary key, consider that it will be much faster (I tried in Access but I think in any Database) not to check that "not exists records with 'such' values in table", - rather just insert into table, and excess records (by the key) will not be inserted twice.


L
Lahiru Cooray

Performance perspective always avoid using inverse keywords like NOT IN, NOT EXISTS, ... Because to check the inverse items DBMS need to runs through all the available and drop the inverse selection.


And what do you propose as workaround when you actually need NOT?
Well when there's no option of cause we need to use NOT operations and thats why they are exists. Best practise is it avoid them when we have any other alternative solutions.
@onedaywhen, if an optimizer transforms a query and it returns the wrong result then it is a bug
@DuduMarkovitz: yes and if you contact the SQL Server team and they acknowledge the bug but refuse to fix it because they say doing so may make queries run slower, then it it is a bug you need to deal with.
@onedaywhen - This was not a hypothetical scenario I presume :-) Do you by any chance remember the bug details?