ChatGPT解决这个技术问题 Extra ChatGPT

When to use STRAIGHT_JOIN with MySQL

I just had a fairly complex query I was working with and it was taking 8 seconds to run. EXPLAIN was showing a weird table order and my indexes were not all being used even with the FORCE INDEX hint. I came across the STRAIGHT_JOIN join keyword and started replacing some of my INNER JOIN keywords with it. I noticed considerable speed improvement. Eventually I just replaced all my INNER JOIN keywords with STRAIGHT_JOIN for this query and it now runs in .01 seconds.

My question is when do you use STRAIGHT_JOIN and when do you use INNER JOIN? Is there any reason to not use STRAIGHT_JOIN if you are writing good queries?

upvoting for the liono avatar

n
nathan

I wouldn't recommend using STRAIGHT_JOIN without a good reason. My own experience is that the MySQL query optimizer chooses a poor query plan more often than I'd like, but not often enough that you should just bypass it in general, which is what you would be doing if you always used STRAIGHT_JOIN.

My recommendation is to leave all queries as regular JOINs. If you discover that one query is using a sub-optimal query plan, I would suggest first trying to rewrite or re-structure the query a bit to see if the optimizer will then pick a better query plan. Also, for innodb at least, make sure it's not just that your index statistics are out-of-date (ANALYZE TABLE). That can cause the optimizer to choose a poor query plan. Optimizer hints should generally be your last resort.

Another reason not to use query hints is that your data distribution may change over time, or your index selectivity may change, etc. as your table grows. Your query hints that are optimal now, may become sub-optimal over time. But the optimizer will be unable to adapt the query plan because of your now outdated hints. You stay more flexible if you allow the optimizer to make the decisions.


This answer doesn't actually explain when to use straight_join.
I think if one is happy with predictable execution times and STRAIGHT_JOINs, they should be using it.
For me, running ANALYZE TABLE correct the query plan and no longer need to use query hint.
B
Barry Kelly

Here's a scenario that came up just recently at work.

Consider three tables, A, B, C.

A has 3,000 rows; B has 300,000,000 rows; and C has 2,000 rows.

Foreign keys are defined: B(a_id), B(c_id).

Suppose you had a query that looks like this:

select a.id, c.id
from a
join b on b.a_id = a.id
join c on c.id = b.c_id

In my experience, MySQL may choose to go C -> B -> A in this case. C is smaller than A and B is enormous, and they're all equijoins.

The trouble is MySQL doesn't necessarily take into account the size of the intersection between (C.id and B.c_id) vs (A.id and B.a_id). If the join between B and C returns just as many rows as B, then it's a very poor choice; if starting with A would have filtered down B to as many rows as A, then it would have been a much better choice. straight_join could be used to force this order like this:

select a.id, c.id
from a
straight_join b on b.a_id = a.id
join c on c.id = b.c_id

Now a must be joined on before b.

Generally you want to do your joins in an order that minimizes the number of rows in the resulting set. So starting with a small table and joining such that the resulting join will also be small, is ideal. Things go pear-shaped if starting with a small table and joining it to a bigger table ends up just as large as the big table.

It's stats dependent though. If the data distribution changes, the calculation may change. It's also dependent on the implementation details of the join mechanism.

The worst cases that I've seen for MySQL that all but required straight_join or aggressive index hinting are queries that paginate over a lot of data in a strict sort order with light filtering. MySQL strongly prefers to use indexes for any filters and joins over sorts; this makes sense because most people aren't trying to sort the whole database but rather have a limited subset of rows that are responsive to the query, and sorting a limited subset is much faster than filtering the whole table, no matter whether it's sorted or not. In this case, putting straight join immediately after the table that had the indexed column I wanted to sort on fixed things.


How would you use straight join to fix the problem?
@Hannele straight_join evaluates left table before right. So if you want to go from A -> B -> C in my example, the first join keyword could be replaced with straight_join.
Ah neat. It would be useful to include that as an example in your answer :)
Fantastic, thank you for providing an idiomatic example data set.
Just to add, in my experience, STRAIGHT_JOIN is not simply always joining from the left table to the right, but from any of the tables on the left side of it to the given table.
j
jjclarkson

From MySQL JOIN reference:

"STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order."


Thanks, but I already read the MySQL manual on it. Hoping for some further explaination.
I
IAdapter

MySQL isn’t necessarilly good at choosing the join order in complex queries. By specifying a complex query as a straight_join the query executes the joins in the order they’re specified. By placing the table to be the least common denominator first and specifying straight_join you are able to improve the query performance.


H
Himanshu

STRAIGHT_JOIN, using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.


What are outer loop and inner loop ?
@IstiaqueAhmed tables are joined by nested loops (take first row from table A and loop throw table B then take second row ... and so on . Here table A is at the outer loop)
N
Nicolas Thery

I will tell you why I had to use STRAIGHT_JOIN :

I had a performance issue with a query.

Simplifying the query, the query was suddently more efficient

Trying to figure out which specific part was bringing the issue, I just couldn't. (2 left joins together were slow, and each one was independently fast)

I then executed the EXPLAIN with both slow and fast query (addind one of the left joins)

Surprisingly, MySQL changed entirely the JOIN orders between the 2 queries.

Therefore I forced one of the joins to be straight_join to FORCE the previous join to be read first. This prevented MySQL to change the execution order and worked like a charm !


A
Accountant م

In my short experience, one of the situations that STRAIGHT_JOIN has reduced my query from 30 seconds to 100 milliseconds is that the first table in the execution plan was not the table that has the order by columns

-- table sales (45000000) rows
-- table stores (3) rows
SELECT whatever
FROM 
    sales 
    INNER JOIN stores ON sales.storeId = stores.id
ORDER BY sales.date, sales.id 
LIMIT 50;
-- there is an index on (date, id)

IF the optimizer chooses to hit stores first it will cause Using index; Using temporary; Using filesort because

if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

source

here the optimizer needs a little help by telling him to hit sales first using

sales STRAIGHT_JOIN stores

(I embellished on your Answer.)
R
Rick James

If your query ends with ORDER BY... LIMIT..., it may be optimal to reformulate the query to trick the optimizer into doing the LIMIT before the JOIN.

(This Answer does not apply only to the original question about STRAIGHT_JOIN, nor does it apply to all cases of STRAIGHT_JOIN.)

Starting with the example by @Accountantم, this should run faster in most situations. (And it avoids needing hints.)

SELECT  whatever
    FROM  ( SELECT id FROM sales
                ORDER BY  date, id
                LIMIT  50
          ) AS x
    JOIN  sales   ON sales.id = x.id
    JOIN  stores  ON sales.storeId = stores.id
    ORDER BY  sales.date, sales.id;

Notes:

First, 50 ids are fetched. This will be especially fast with INDEX(date, id).

Then the join back to sales lets you get only 50 "whatevers" without hauling them around in a temp table.

since a subquery is, by definition, unordered, the ORDER BY must be repeated in the outer query. (The Optimizer may find a way to avoid actually doing another sort.)

Yes, it is messier. But it is usually faster.

I am opposed to using hits because "Even if it is faster today, it may fail to be faster tomorrow."


r
rai

I know it's a bit old but here's a scenario, I've been doing batch script to populate a certain table. At some point, the query ran very slow. It appears that the join order was incorrect on particular records:

In correct order

https://i.stack.imgur.com/oIOsX.png

Incrementing the id by 1 messes up the order. Notice the 'Extra' field

https://i.stack.imgur.com/Vfpvv.png

Using straight_join fixes the issue

https://i.stack.imgur.com/BdIaU.png

Incorrect order runs for about 65 secs while using straight_join runs in milliseconds


l
lhs295988029
--use 120s, 18 million data
    explain SELECT DISTINCT d.taid
    FROM tvassist_recommend_list_everyday_diverse d, tvassist_taid_all t
    WHERE d.taid = t.taid
      AND t.client_version >= '21004007'
      AND t.utdid IS NOT NULL
      AND d.recommend_day = '20170403'
    LIMIT 0, 10000

--use 3.6s repalce by straight join
 explain SELECT DISTINCT d.taid
    FROM tvassist_recommend_list_everyday_diverse d
    STRAIGHT_JOIN 
      tvassist_taid_all t on d.taid = t.taid 
    WHERE 
     t.client_version >= '21004007'
       AND d.recommend_day = '20170403'

      AND t.utdid IS NOT NULL  
    LIMIT 0, 10000

This doesn't give you nearly enough information to figure out when straight joins are appropriate.