ChatGPT解决这个技术问题 Extra ChatGPT

SQL JOIN and different types of JOINs

What is a SQL JOIN and what are different types?


T
TylerH

An illustration from W3schools:

https://i.stack.imgur.com/eJhrt.png

https://i.stack.imgur.com/fZaT4.png

https://i.stack.imgur.com/n9Bps.png

https://i.stack.imgur.com/aFHpq.png


@KNU The w3fools should give credit from where they have taken the idea for the pictures. See A visualization of SQL joins by Jeff Atwood (yes, the one who co-authored SO) and the linked article by Ligaya Turmelle where Jeff got the idea and explanded it.
@avi left and right joins are similar, if you are not bothered which is the primary table the join is based on.
@philipxy: That's a weird definition (even if you're correct). But I'd rather go the other way round and start with cross join and then "build" inner join on top of it. After all, the mere concept of cross join invalidates these informal and inaccurate Venn diagram visualisations...
These pictures seem to imply that union is same as full outer join and intersection is same as inner join which is not correct as far as I know.
@DevDave, because contrary to popular belief - a picture is not worth a thousand words. See next answer.
p
pgSystemTester

What is SQL JOIN ?

SQL JOIN is a method to retrieve data from two or more database tables.

What are the different SQL JOINs ?

There are a total of five JOINs. They are :

  1. JOIN or INNER JOIN
  2. OUTER JOIN

     2.1 LEFT OUTER JOIN or LEFT JOIN
     2.2 RIGHT OUTER JOIN or RIGHT JOIN
     2.3 FULL OUTER JOIN or FULL JOIN

  3. NATURAL JOIN
  4. CROSS JOIN
  5. SELF JOIN

1. JOIN or INNER JOIN :

In this kind of a JOIN, we get all records that match the condition in both tables, and records in both tables that do not match are not reported.

In other words, INNER JOIN is based on the single fact that: ONLY the matching entries in BOTH the tables SHOULD be listed.

Note that a JOIN without any other JOIN keywords (like INNER, OUTER, LEFT, etc) is an INNER JOIN. In other words, JOIN is a Syntactic sugar for INNER JOIN (see: Difference between JOIN and INNER JOIN).

2. OUTER JOIN :

OUTER JOIN retrieves

Either, the matched rows from one table and all rows in the other table Or, all rows in all tables (it doesn't matter whether or not there is a match).

There are three kinds of Outer Join :

2.1 LEFT OUTER JOIN or LEFT JOIN

This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there are no columns matching in the right table, it returns NULL values.

2.2 RIGHT OUTER JOIN or RIGHT JOIN

This JOIN returns all the rows from the right table in conjunction with the matching rows from the left table. If there are no columns matching in the left table, it returns NULL values.

2.3 FULL OUTER JOIN or FULL JOIN

This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It returns rows from either table when the conditions are met and returns NULL value when there is no match.

In other words, OUTER JOIN is based on the fact that: ONLY the matching entries in ONE OF the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.

Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.

3. NATURAL JOIN :

It is based on the two conditions :

the JOIN is made on all the columns with the same name for equality. Removes duplicate columns from the result.

This seems to be more of theoretical in nature and as a result (probably) most DBMS don't even bother supporting this.

4. CROSS JOIN :

It is the Cartesian product of the two tables involved. The result of a CROSS JOIN will not make sense in most of the situations. Moreover, we won't need this at all (or needs the least, to be precise).

5. SELF JOIN :

It is not a different form of JOIN, rather it is a JOIN (INNER, OUTER, etc) of a table to itself.

JOINs based on Operators

Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are

Equi JOIN Theta JOIN

1. Equi JOIN :

For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then we say that the JOIN is an EQUI JOIN.

2. Theta JOIN :

This is same as EQUI JOIN but it allows all other operators like >, <, >= etc.

Many consider both EQUI JOIN and Theta JOIN similar to INNER, OUTER etc JOINs. But I strongly believe that its a mistake and makes the ideas vague. Because INNER JOIN, OUTER JOIN etc are all connected with the tables and their data whereas EQUI JOIN and THETA JOIN are only connected with the operators we use in the former. Again, there are many who consider NATURAL JOIN as some sort of "peculiar" EQUI JOIN. In fact, it is true, because of the first condition I mentioned for NATURAL JOIN. However, we don't have to restrict that simply to NATURAL JOINs alone. INNER JOINs, OUTER JOINs etc could be an EQUI JOIN too.


There are relatively new LATERAL JOIN .. SELECT * FROM r1, LATERAL fx(r1)
While this seems reasonable, I don't think answers "what is an SQL join" in any way that conveys useful information. The answer as a whole is a reference written for people who already understand joins, not for the sorts of people who are asking those questions. It also omits references, both to support its claims (as is appropriate if making an authorative answer) and to provide additional explanation via external resources. If you're trying to write an authoritative answer to link new SQL users to, it might be worth filling in the blanks a bit, especially the "what is a join" part.
can you provide some examples?
D
Dev

Definition:

JOINS are way to query the data that combined together from multiple tables simultaneously.

Types of JOINS:

Concern to RDBMS there are 5-types of joins:

Equi-Join: Combines common records from two tables based on equality condition. Technically, Join made by using equality-operator (=) to compare values of Primary Key of one table and Foreign Key values of another table, hence result set includes common(matched) records from both tables. For implementation see INNER-JOIN.

Natural-Join: It is enhanced version of Equi-Join, in which SELECT operation omits duplicate column. For implementation see INNER-JOIN

Non-Equi-Join: It is reverse of Equi-join where joining condition is uses other than equal operator(=) e.g, !=, <=, >=, >, < or BETWEEN etc. For implementation see INNER-JOIN.

Self-Join:: A customized behavior of join where a table combined with itself; This is typically needed for querying self-referencing tables (or Unary relationship entity). For implementation see INNER-JOINs.

Cartesian Product: It cross combines all records of both tables without any condition. Technically, it returns the result set of a query without WHERE-Clause.

As per SQL concern and advancement, there are 3-types of joins and all RDBMS joins can be achieved using these types of joins.

INNER-JOIN: It merges(or combines) matched rows from two tables. The matching is done based on common columns of tables and their comparing operation. If equality based condition then: EQUI-JOIN performed, otherwise Non-EQUI-Join. OUTER-JOIN: It merges(or combines) matched rows from two tables and unmatched rows with NULL values. However, can customized selection of un-matched rows e.g, selecting unmatched row from first table or second table by sub-types: LEFT OUTER JOIN and RIGHT OUTER JOIN. 2.1. LEFT Outer JOIN (a.k.a, LEFT-JOIN): Returns matched rows from two tables and unmatched from the LEFT table(i.e, first table) only. 2.2. RIGHT Outer JOIN (a.k.a, RIGHT-JOIN): Returns matched rows from two tables and unmatched from the RIGHT table only. 2.3. FULL OUTER JOIN (a.k.a OUTER JOIN): Returns matched and unmatched from both tables. CROSS-JOIN: This join does not merges/combines instead it performs Cartesian product.

https://i.stack.imgur.com/rhqY7.jpg

For more information:

Examples:

1.1: INNER-JOIN: Equi-join implementation

SELECT  *
FROM Table1 A 
 INNER JOIN Table2 B ON A.<Primary-Key> =B.<Foreign-Key>;

1.2: INNER-JOIN: Natural-JOIN implementation

Select A.*, B.Col1, B.Col2          --But no B.ForeignKeyColumn in Select
 FROM Table1 A
 INNER JOIN Table2 B On A.Pk = B.Fk;

1.3: INNER-JOIN with NON-Equi-join implementation

Select *
 FROM Table1 A INNER JOIN Table2 B On A.Pk <= B.Fk;

1.4: INNER-JOIN with SELF-JOIN

Select *
 FROM Table1 A1 INNER JOIN Table1 A2 On A1.Pk = A2.Fk;

2.1: OUTER JOIN (full outer join)

Select *
 FROM Table1 A FULL OUTER JOIN Table2 B On A.Pk = B.Fk;

2.2: LEFT JOIN

Select *
 FROM Table1 A LEFT OUTER JOIN Table2 B On A.Pk = B.Fk;

2.3: RIGHT JOIN

Select *
 FROM Table1 A RIGHT OUTER JOIN Table2 B On A.Pk = B.Fk;

3.1: CROSS JOIN

Select *
 FROM TableA CROSS JOIN TableB;

3.2: CROSS JOIN-Self JOIN

Select *
 FROM Table1 A1 CROSS JOIN Table1 A2;

//OR//

Select *
 FROM Table1 A1,Table1 A2;

Labels "Table 1" & "Table 2" & the labels underneath are inappropriate, they are from illustrations of intersect/except/union; here the circles are the rows returned by left & right join, as the numbered labels say. The A X B picture is nonsense. cross join = inner join on 1=1 & is a special case of the first diagram.
It's worth mentioning SQL-92 defines the UNION JOIN. Now made obsolete in SQL:2003.
L
Lukas Eder

Interestingly most other answers suffer from these two problems:

They focus on basic forms of join only

They (ab)use Venn diagrams, which are an inaccurate tool for visualising joins (they're much better for unions).

I've recently written an article on the topic: A Probably Incomplete, Comprehensive Guide to the Many Different Ways to JOIN Tables in SQL, which I'll summarise here.

First and foremost: JOINs are cartesian products

This is why Venn diagrams explain them so inaccurately, because a JOIN creates a cartesian product between the two joined tables. Wikipedia illustrates it nicely:

https://i.stack.imgur.com/yrWin.png

The SQL syntax for cartesian products is CROSS JOIN. For example:

SELECT *

-- This just generates all the days in January 2017
FROM generate_series(
  '2017-01-01'::TIMESTAMP,
  '2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
  INTERVAL '1 day'
) AS days(day)

-- Here, we're combining all days with all departments
CROSS JOIN departments

Which combines all rows from one table with all rows from the other table:

Source:

+--------+   +------------+
| day    |   | department |
+--------+   +------------+
| Jan 01 |   | Dept 1     |
| Jan 02 |   | Dept 2     |
| ...    |   | Dept 3     |
| Jan 30 |   +------------+
| Jan 31 |
+--------+

Result:

+--------+------------+
| day    | department |
+--------+------------+
| Jan 01 | Dept 1     |
| Jan 01 | Dept 2     |
| Jan 01 | Dept 3     |
| Jan 02 | Dept 1     |
| Jan 02 | Dept 2     |
| Jan 02 | Dept 3     |
| ...    | ...        |
| Jan 31 | Dept 1     |
| Jan 31 | Dept 2     |
| Jan 31 | Dept 3     |
+--------+------------+

If we just write a comma separated list of tables, we'll get the same:

-- CROSS JOINing two tables:
SELECT * FROM table1, table2

INNER JOIN (Theta-JOIN)

An INNER JOIN is just a filtered CROSS JOIN where the filter predicate is called Theta in relational algebra.

For instance:

SELECT *

-- Same as before
FROM generate_series(
  '2017-01-01'::TIMESTAMP,
  '2017-01-01'::TIMESTAMP + INTERVAL '1 month -1 day',
  INTERVAL '1 day'
) AS days(day)

-- Now, exclude all days/departments combinations for
-- days before the department was created
JOIN departments AS d ON day >= d.created_at

Note that the keyword INNER is optional (except in MS Access).

(look at the article for result examples)

EQUI JOIN

A special kind of Theta-JOIN is equi JOIN, which we use most. The predicate joins the primary key of one table with the foreign key of another table. If we use the Sakila database for illustration, we can write:

SELECT *
FROM actor AS a
JOIN film_actor AS fa ON a.actor_id = fa.actor_id
JOIN film AS f ON f.film_id = fa.film_id

This combines all actors with their films.

Or also, on some databases:

SELECT *
FROM actor
JOIN film_actor USING (actor_id)
JOIN film USING (film_id)

The USING() syntax allows for specifying a column that must be present on either side of a JOIN operation's tables and creates an equality predicate on those two columns.

NATURAL JOIN

Other answers have listed this "JOIN type" separately, but that doesn't make sense. It's just a syntax sugar form for equi JOIN, which is a special case of Theta-JOIN or INNER JOIN. NATURAL JOIN simply collects all columns that are common to both tables being joined and joins USING() those columns. Which is hardly ever useful, because of accidental matches (like LAST_UPDATE columns in the Sakila database).

Here's the syntax:

SELECT *
FROM actor
NATURAL JOIN film_actor
NATURAL JOIN film

OUTER JOIN

Now, OUTER JOIN is a bit different from INNER JOIN as it creates a UNION of several cartesian products. We can write:

-- Convenient syntax:
SELECT *
FROM a LEFT JOIN b ON <predicate>

-- Cumbersome, equivalent syntax:
SELECT a.*, b.*
FROM a JOIN b ON <predicate>
UNION ALL
SELECT a.*, NULL, NULL, ..., NULL
FROM a
WHERE NOT EXISTS (
  SELECT * FROM b WHERE <predicate>
)

No one wants to write the latter, so we write OUTER JOIN (which is usually better optimised by databases).

Like INNER, the keyword OUTER is optional, here.

OUTER JOIN comes in three flavours:

LEFT [ OUTER ] JOIN: The left table of the JOIN expression is added to the union as shown above.

RIGHT [ OUTER ] JOIN: The right table of the JOIN expression is added to the union as shown above.

FULL [ OUTER ] JOIN: Both tables of the JOIN expression are added to the union as shown above.

All of these can be combined with the keyword USING() or with NATURAL (I've actually had a real world use-case for a NATURAL FULL JOIN recently)

Alternative syntaxes

There are some historic, deprecated syntaxes in Oracle and SQL Server, which supported OUTER JOIN already before the SQL standard had a syntax for this:

-- Oracle
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)

-- SQL Server
SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id *= fa.actor_id
AND fa.film_id *= f.film_id

Having said so, don't use this syntax. I just list this here so you can recognise it from old blog posts / legacy code.

Partitioned OUTER JOIN

Few people know this, but the SQL standard specifies partitioned OUTER JOIN (and Oracle implements it). You can write things like this:

WITH

  -- Using CONNECT BY to generate all dates in January
  days(day) AS (
    SELECT DATE '2017-01-01' + LEVEL - 1
    FROM dual
    CONNECT BY LEVEL <= 31
  ),

  -- Our departments
  departments(department, created_at) AS (
    SELECT 'Dept 1', DATE '2017-01-10' FROM dual UNION ALL
    SELECT 'Dept 2', DATE '2017-01-11' FROM dual UNION ALL
    SELECT 'Dept 3', DATE '2017-01-12' FROM dual UNION ALL
    SELECT 'Dept 4', DATE '2017-04-01' FROM dual UNION ALL
    SELECT 'Dept 5', DATE '2017-04-02' FROM dual
  )
SELECT *
FROM days 
LEFT JOIN departments 
  PARTITION BY (department) -- This is where the magic happens
  ON day >= created_at

Parts of the result:

+--------+------------+------------+
| day    | department | created_at |
+--------+------------+------------+
| Jan 01 | Dept 1     |            | -- Didn't match, but still get row
| Jan 02 | Dept 1     |            | -- Didn't match, but still get row
| ...    | Dept 1     |            | -- Didn't match, but still get row
| Jan 09 | Dept 1     |            | -- Didn't match, but still get row
| Jan 10 | Dept 1     | Jan 10     | -- Matches, so get join result
| Jan 11 | Dept 1     | Jan 10     | -- Matches, so get join result
| Jan 12 | Dept 1     | Jan 10     | -- Matches, so get join result
| ...    | Dept 1     | Jan 10     | -- Matches, so get join result
| Jan 31 | Dept 1     | Jan 10     | -- Matches, so get join result

The point here is that all rows from the partitioned side of the join will wind up in the result regardless if the JOIN matched anything on the "other side of the JOIN". Long story short: This is to fill up sparse data in reports. Very useful!

SEMI JOIN

Seriously? No other answer got this? Of course not, because it doesn't have a native syntax in SQL, unfortunately (just like ANTI JOIN below). But we can use IN() and EXISTS(), e.g. to find all actors who have played in films:

SELECT *
FROM actor a
WHERE EXISTS (
  SELECT * FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)

The WHERE a.actor_id = fa.actor_id predicate acts as the semi join predicate. If you don't believe it, check out execution plans, e.g. in Oracle. You'll see that the database executes a SEMI JOIN operation, not the EXISTS() predicate.

https://i.stack.imgur.com/u0oe2.png

ANTI JOIN

This is just the opposite of SEMI JOIN (be careful not to use NOT IN though, as it has an important caveat)

Here are all the actors without films:

SELECT *
FROM actor a
WHERE NOT EXISTS (
  SELECT * FROM film_actor fa
  WHERE a.actor_id = fa.actor_id
)

Some folks (especially MySQL people) also write ANTI JOIN like this:

SELECT *
FROM actor a
LEFT JOIN film_actor fa
USING (actor_id)
WHERE film_id IS NULL

I think the historic reason is performance.

LATERAL JOIN

OMG, this one is too cool. I'm the only one to mention it? Here's a cool query:

SELECT a.first_name, a.last_name, f.*
FROM actor AS a
LEFT OUTER JOIN LATERAL (
  SELECT f.title, SUM(amount) AS revenue
  FROM film AS f
  JOIN film_actor AS fa USING (film_id)
  JOIN inventory AS i USING (film_id)
  JOIN rental AS r USING (inventory_id)
  JOIN payment AS p USING (rental_id)
  WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
  GROUP BY f.film_id
  ORDER BY revenue DESC
  LIMIT 5
) AS f
ON true

It will find the TOP 5 revenue producing films per actor. Every time you need a TOP-N-per-something query, LATERAL JOIN will be your friend. If you're a SQL Server person, then you know this JOIN type under the name APPLY

SELECT a.first_name, a.last_name, f.*
FROM actor AS a
OUTER APPLY (
  SELECT f.title, SUM(amount) AS revenue
  FROM film AS f
  JOIN film_actor AS fa ON f.film_id = fa.film_id
  JOIN inventory AS i ON f.film_id = i.film_id
  JOIN rental AS r ON i.inventory_id = r.inventory_id
  JOIN payment AS p ON r.rental_id = p.rental_id
  WHERE fa.actor_id = a.actor_id -- JOIN predicate with the outer query!
  GROUP BY f.film_id
  ORDER BY revenue DESC
  LIMIT 5
) AS f

OK, perhaps that's cheating, because a LATERAL JOIN or APPLY expression is really a "correlated subquery" that produces several rows. But if we allow for "correlated subqueries", we can also talk about...

MULTISET

This is only really implemented by Oracle and Informix (to my knowledge), but it can be emulated in PostgreSQL using arrays and/or XML and in SQL Server using XML.

MULTISET produces a correlated subquery and nests the resulting set of rows in the outer query. The below query selects all actors and for each actor collects their films in a nested collection:

SELECT a.*, MULTISET (
  SELECT f.*
  FROM film AS f
  JOIN film_actor AS fa USING (film_id)
  WHERE a.actor_id = fa.actor_id
) AS films
FROM actor

As you have seen, there are more types of JOIN than just the "boring" INNER, OUTER, and CROSS JOIN that are usually mentioned. More details in my article. And please, stop using Venn diagrams to illustrate them.


Equijoin is the special case of theta-join where theta is equality. Theta-join is analogous to a special case of inner join where the on is a theta comparison on of a column from each. Some decades after Codd defined them some textbook(s) misdefined theta join as a generalization that is the analogue of inner join.
@philipxy: Anything specific I should change in my answer? You could suggest an edit...
T
TylerH

https://i.stack.imgur.com/iKBL4.png


@Niraj The circles A & B do not contain the rows of A & B. They are blindly copied from elsewhere without credit. Cross join is included in the inner join case, it's inner join on 1=1. In what way are these parts of the picture "perfect"?
@philipxy Sorry but I am not bothered about if it is copied from elsewhere. and I am not sure which is not correct in the above picture. for me this is ok. Cross join is not described here. It is not included in an inner join..
T
TylerH

I'm going to push my pet peeve: the USING keyword.

If both tables on both sides of the JOIN have their foreign keys properly named (ie, same name, not just "id) then this can be used:

SELECT ...
FROM customers JOIN orders USING (customer_id)

I find this very practical, readable, and not used often enough.


This does not answer the question. It belongs as a comment either under the question or one of the answers.