What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

sql postgresql subquery lateral-join

Since Postgres came out with the ability to do LATERAL joins, I've been reading up on it, since I currently do complex data dumps for my team with lots of inefficient subqueries that make the overall query take four minutes or more.

I understand that LATERAL joins may be able to help me, but even after reading articles like this one from Heap Analytics, I still don't quite follow.

What is the use case for a LATERAL join? What is the difference between a LATERAL join and a subquery?

blog.heapanalytics.com/… and explainextended.com/2009/07/16/inner-join-vs-cross-apply (SQL Server's apply is the same as the lateral from the SQL standard)

The LATERAL keyword belongs to its following derived table (subquery), i.e. it's not a JOIN type.

Erwin Brandstetter

What is a LATERAL join?

The feature was introduced with PostgreSQL 9.3. The manual:

Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.) Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.

Basic code examples are given there.

More like a correlated subquery

A LATERAL join is more like a correlated subquery, not a plain subquery, in that expressions to the right of a LATERAL join are evaluated once for each row left of it - just like a correlated subquery - while a plain subquery (table expression) is evaluated once only. (The query planner has ways to optimize performance for either, though.)
Related answer with code examples for both side by side, solving the same problem:

Optimize GROUP BY query to retrieve latest row per user

For returning more than one column, a LATERAL join is typically simpler, cleaner and faster.
Also, remember that the equivalent of a correlated subquery is LEFT JOIN LATERAL ... ON true:

Call a set-returning function with an array argument multiple times

Things a subquery can't do

There are things that a LATERAL join can do, but a (correlated) subquery cannot (easily). A correlated subquery can only return a single value, not multiple columns and not multiple rows - with the exception of bare function calls (which multiply result rows if they return multiple rows). But even certain set‑returning functions are only allowed in the FROM clause. Like unnest() with multiple parameters in Postgres 9.4 or later. The manual:

This is only allowed in the FROM clause;

So this works, but cannot (easily) be replaced with a subquery:

CREATE TABLE tbl (a1 int[], a2 int[]);
SELECT * FROM tbl, unnest(a1, a2) u(elem1, elem2);  -- implicit LATERAL

The comma (,) in the FROM clause is short notation for CROSS JOIN.
LATERAL is assumed automatically for table functions.
About the special case of UNNEST( array_expression [, ... ] ):

How do you declare a set-returning-function to only be allowed in the FROM clause?

Set-returning functions in the SELECT list

You can also use set-returning functions like unnest() in the SELECT list directly. This used to exhibit surprising behavior with more than one such function in the same SELECT list up to Postgres 9.6. But it has finally been sanitized with Postgres 10 and is a valid alternative now (even if not standard SQL). See:

What is the expected behaviour for multiple set-returning functions in SELECT clause?

Building on above example:

SELECT *, unnest(a1) AS elem1, unnest(a2) AS elem2
FROM   tbl;

Comparison:

dbfiddle for pg 9.6 here
dbfiddle for pg 10 here

Clarify misinformation

The manual:

For the INNER and OUTER join types, a join condition must be specified, namely exactly one of NATURAL, ON join_condition, or USING (join_column [, ...]). See below for the meaning. For CROSS JOIN, none of these clauses can appear.

So these two queries are valid (even if not particularly useful):

SELECT *
FROM   tbl t
LEFT   JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t ON TRUE;

SELECT *
FROM   tbl t, LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;

While this one is not:

SELECT *
FROM   tbl t
LEFT   JOIN LATERAL (SELECT * FROM b WHERE b.t_id = t.t_id) t;

That's why Andomar's code example is correct (the CROSS JOIN does not require a join condition) and Attila's is was not.

There are some things a subquery can do a LATERAL JOIN can't do. Like window functions. As here

@EvanCarroll: I couldn't find any correlated subqueries in the link. But I added another answer to demonstrate a window function in a LATERAL subquery: gis.stackexchange.com/a/230070/7244

Cleaner and faster? Like magnitudes faster in some cases. I had a query that went from days to seconds after switching to LATERAL.

Andomar

The difference between a non-lateral and a lateral join lies in whether you can look to the left hand table's row. For example:

select  *
from    table1 t1
cross join lateral
        (
        select  *
        from    t2
        where   t1.col1 = t2.col1 -- Only allowed because of lateral
        ) sub

This "outward looking" means that the subquery has to be evaluated more than once. After all, t1.col1 can assume many values.

By contrast, the subquery after a non-lateral join can be evaluated once:

select  *
from    table1 t1
cross join
        (
        select  *
        from    t2
        where   t2.col1 = 42 -- No reference to outer query
        ) sub

As is required without lateral, the inner query does not depend in any way on the outer query. A lateral query is an example of a correlated query, because of its relation with rows outside the query itself.

how does select * from table1 left join t2 using (col1) compare? It's unclear to me when a join using / on condition is insufficient and it would make more sense to use lateral.

Simply stated. Easy on the eyes. Thank you!

Vlad Mihalcea

Database table

Having the following blog database table storing the blogs hosted by our platform:

https://i.stack.imgur.com/5MZLr.png

And, we have two blogs currently hosted:

id created_on title url 1 2013-09-30 Vlad Mihalcea's Blog https://vladmihalcea.com 2 2017-01-22 Hypersistence https://hypersistence.io

Getting our report without using the SQL LATERAL JOIN

We need to build a report that extracts the following data from the blog table:

the blog id

the blog age, in years

the date for the next blog anniversary

the number of days remaining until the next anniversary.

If you're using PostgreSQL, then you have to execute the following SQL query:

SELECT
  b.id as blog_id,
  extract(
    YEAR FROM age(now(), b.created_on)
  ) AS age_in_years,
  date(
    created_on + (
      extract(YEAR FROM age(now(), b.created_on)) + 1
    ) * interval '1 year'
  ) AS next_anniversary,
  date(
    created_on + (
      extract(YEAR FROM age(now(), b.created_on)) + 1
    ) * interval '1 year'
  ) - date(now()) AS days_to_next_anniversary
FROM blog b
ORDER BY blog_id

As you can see, the age_in_years has to be defined three times because you need it when calculating the next_anniversary and days_to_next_anniversary values.

And, that's exactly where LATERAL JOIN can help us.

Getting the report using the SQL LATERAL JOIN

The following relational database systems support the LATERAL JOIN syntax:

Oracle since 12c

PostgreSQL since 9.3

MySQL since 8.0.14

SQL Server can emulate the LATERAL JOIN using CROSS APPLY and OUTER APPLY.

LATERAL JOIN allows us to reuse the age_in_years value and just pass it further when calculating the next_anniversary and days_to_next_anniversary values.

The previous query can be rewritten to use the LATERAL JOIN, as follows:

SELECT
  b.id as blog_id,
  age_in_years,
  date(
    created_on + (age_in_years + 1) * interval '1 year'
  ) AS next_anniversary,
  date(
    created_on + (age_in_years + 1) * interval '1 year'
  ) - date(now()) AS days_to_next_anniversary
FROM blog b
CROSS JOIN LATERAL (
  SELECT
    cast(
      extract(YEAR FROM age(now(), b.created_on)) AS int
    ) AS age_in_years
) AS t
ORDER BY blog_id

And, the age_in_years value can be calculated one and reused for the next_anniversary and days_to_next_anniversary computations:

blog_id age_in_years next_anniversary days_to_next_anniversary 1 7 2021-09-30 295 2 3 2021-01-22 44

Much better, right?

The age_in_years is calculated for every record of the blog table. So, it works like a correlated subquery, but the subquery records are joined with the primary table and, for this reason, we can reference the columns produced by the subquery.

Community

First, Lateral and Cross Apply is same thing. Therefore you may also read about Cross Apply. Since it was implemented in SQL Server for ages, you will find more information about it then Lateral.

Second, according to my understanding, there is nothing you can not do using subquery instead of using lateral. But:

Consider following query.

Select A.*
, (Select B.Column1 from B where B.Fk1 = A.PK and Limit 1)
, (Select B.Column2 from B where B.Fk1 = A.PK and Limit 1)
FROM A

You can use lateral in this condition.

Select A.*
, x.Column1
, x.Column2
FROM A LEFT JOIN LATERAL (
  Select B.Column1,B.Column2,B.Fk1 from B  Limit 1
) x ON X.Fk1 = A.PK

In this query you can not use normal join, due to limit clause. Lateral or Cross Apply can be used when there is not simple join condition.

There are more usages for lateral or cross apply but this is most common one I found.

Exactly, I wonder why PostgreSQL uses lateral instead of apply. Perhaps Microsoft patented the syntax?

@Andomar AFAIK lateral is in the SQL standard but apply isn't.

The LEFT JOIN requires a join condition. Make it ON TRUE unless you want to restrict somehow.

Erwin is right, you'll get an error unless you use a cross join or an on condition

@Andomar: Spurred by this misinformation I added another answer to clarify.

Theodore R. Smith

One thing no one has pointed out is that you can use LATERAL queries to apply a user-defined function on every selected row.

For instance:

CREATE OR REPLACE FUNCTION delete_company(companyId varchar(255))
RETURNS void AS $$
    BEGIN
        DELETE FROM company_settings WHERE "company_id"=company_id;
        DELETE FROM users WHERE "company_id"=companyId;
        DELETE FROM companies WHERE id=companyId;
    END; 
$$ LANGUAGE plpgsql;

SELECT * FROM (
    SELECT id, name, created_at FROM companies WHERE created_at < '2018-01-01'
) c, LATERAL delete_company(c.id);

That's the only way I know how to do this sort of thing in PostgreSQL.

What is the difference between LATERAL JOIN and a subquery in PostgreSQL?

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Contact US