ChatGPT解决这个技术问题 Extra ChatGPT

Why is LINQ JOIN so much faster than linking with WHERE?

I've recently upgraded to VS 2010 and am playing around with LINQ to Dataset. I have a strong typed dataset for Authorization that is in HttpCache of an ASP.NET WebApplication.

So i wanted to know what actually is the fastest way to check if a user is authorized to do something. Here is my datamodel and some other informations if somebody is interested.

I have checked 3 ways:

direct database LINQ query with Where conditions as "Join" - Syntax LINQ query with Join - Syntax

These are the results with 1000 calls on each function:

1.Iteration:

4,2841519 sec. 115,7796925 sec. 2,024749 sec.

2.Iteration:

3,1954857 sec. 84,97047 sec. 1,5783397 sec.

3.Iteration:

2,7922143 sec. 97,8713267 sec. 1,8432163 sec.

Average:

Database: 3,4239506333 sec. Where: 99,5404964 sec. Join: 1,815435 sec.

Why is the Join-version so much faster than the where-syntax which makes it useless although as a LINQ newbie it seems to be the most legible. Or have i missed something in my queries?

Here are the LINQ queries, i skip the database:

Where:

Public Function hasAccessDS_Where(ByVal accessRule As String) As Boolean
    Dim userID As Guid = DirectCast(Membership.GetUser.ProviderUserKey, Guid)
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule, _
                roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule, _
                role In Authorization.dsAuth.aspnet_Roles, _
                userRole In Authorization.dsAuth.aspnet_UsersInRoles _
                Where accRule.idAccessRule = roleAccRule.fiAccessRule _
                And roleAccRule.fiRole = role.RoleId _
                And userRole.RoleId = role.RoleId _
                And userRole.UserId = userID And accRule.RuleName.Contains(accessRule)
                Select accRule.idAccessRule
    Return query.Any
End Function

Join:

Public Function hasAccessDS_Join(ByVal accessRule As String) As Boolean
    Dim userID As Guid = DirectCast(Membership.GetUser.ProviderUserKey, Guid)
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule _
                Join roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule _
                On accRule.idAccessRule Equals roleAccRule.fiAccessRule _
                Join role In Authorization.dsAuth.aspnet_Roles _
                On role.RoleId Equals roleAccRule.fiRole _
                Join userRole In Authorization.dsAuth.aspnet_UsersInRoles _
                On userRole.RoleId Equals role.RoleId _
                Where userRole.UserId = userID And accRule.RuleName.Contains(accessRule)
                Select accRule.idAccessRule
    Return query.Any
End Function

Thank you in advance.

Edit: after some improvements on both queries to get more meaningful perfomance-values, the advantage of the JOIN is even many times greater than before:

Join:

Public Overloads Shared Function hasAccessDS_Join(ByVal userID As Guid, ByVal idAccessRule As Int32) As Boolean
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule _
                   Join roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule _
                   On accRule.idAccessRule Equals roleAccRule.fiAccessRule _
                   Join role In Authorization.dsAuth.aspnet_Roles _
                   On role.RoleId Equals roleAccRule.fiRole _
                   Join userRole In Authorization.dsAuth.aspnet_UsersInRoles _
                   On userRole.RoleId Equals role.RoleId _
                   Where accRule.idAccessRule = idAccessRule And userRole.UserId = userID
             Select role.RoleId
    Return query.Any
End Function

Where:

Public Overloads Shared Function hasAccessDS_Where(ByVal userID As Guid, ByVal idAccessRule As Int32) As Boolean
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule, _
           roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule, _
           role In Authorization.dsAuth.aspnet_Roles, _
           userRole In Authorization.dsAuth.aspnet_UsersInRoles _
           Where accRule.idAccessRule = roleAccRule.fiAccessRule _
           And roleAccRule.fiRole = role.RoleId _
           And userRole.RoleId = role.RoleId _
           And accRule.idAccessRule = idAccessRule And userRole.UserId = userID
           Select role.RoleId
    Return query.Any
End Function

Result for 1000 calls (on a faster computer)

Join | 2. Where

1.Iteration:

0,0713669 sec. 12,7395299 sec.

2.Iteration:

0,0492458 sec. 12,3885925 sec.

3.Iteration:

0,0501982 sec. 13,3474216 sec.

Average:

Join: 0,0569367 sec. Where: 12,8251813 sec.

Join is 225 times faster

Conclusion: avoid WHERE to specify relations and use JOIN whenever possible(definitely in LINQ to DataSet and Linq-To-Objects in general).

For others who read this and are using LinqToSQL and think that it might be good to change all your WHEREs to JOINs, please make sure you read the comment by THomas Levesque where he says "there is such an optimisation when you use Linq to SQL or Linq to Entities, because the generated SQL query is treated as a join by the DBMS. But in that case you're using Linq to DataSet, there is no translation to SQL". In other words, don't bother changing anything when you are using linqtosql as the WHERE's translate to joins.
@JonH: it doesn't hurt to use Join anywhy, why rely on an optimizer if you can write the opimized code from the beginning? It also makes your intentions clearer. So the same reasons why you should prefer JOIN in sql.
Am I correct to assume that this would not be the case with EntityFramework?

T
Thomas Levesque

Your first approach (SQL query in the DB) is quite efficient because the DB knows how to perform a join. But it doesn't really make sense to compare it with the other approaches, since they work directly in memory (Linq to DataSet) The query with multiple tables and a Where condition actually performs a cartesian product of all the tables, then filters the rows that satisfy the condition. This means the Where condition is evaluated for each combination of rows (n1 * n2 * n3 * n4) The Join operator takes the rows from the first tables, then takes only the rows with a matching key from the second table, then only the rows with a matching key from the third table, and so on. This is much more efficient, because it doesn't need to perform as many operations


Thank you for clarifying the background. The db approach was not really part of this question, but it was interesting to me to see if the memory approach is really faster. I assumed that .net would optimize the where-query in some way just as a dbms. Actually the JOIN was even 225 times faster than the WHERE(last edit).
G
Guffa

The Join is much faster, because the method knows how to combine the tables to reduce the result to the relevant combinations. When you use Where to specify the relation, it has to create every possible combination, and then test the condition to see which combinations are relevant.

The Join method can set up a hash table to use as an index to quicky zip two tables together, while the Where method runs after all the combinations are already created, so it can't use any tricks to reduce the combinations beforehand.


Thank you. Are there no implicit optimizations from compiler/runtime like in dbms? It should not be impossible to see that the where-relation actually is a join.
A good RDBMS should indeed spot that the WHERE condition is a test for equality on two UNIQUE columns and treat it as a JOIN.
@Tim Schelter, there is such an optimisation when you use Linq to SQL or Linq to Entities, because the generated SQL query is treated as a join by the DBMS. But in that case you're using Linq to DataSet, there is no translation to SQL
@Tim: LINQ to DataSets actually uses LINQ to Objects. As a result, true joins can only be captured with the join keyword, as there's no runtime analysis of the query to produce anything analogous to an execution plan. You'll also notice that LINQ-based joins can only accommodate single-column equijoins.
@Adam, that not exactly true: you can do equijoins with multiple keys, using anonymous types: ... on new { f1.Key1, f1.Key2 } equals new { f2.Key1, f2.Key2 }
p
phillip

what you really need to know is the sql that was created for the two statements. There are a few ways of getting to it but the simplest is to use LinqPad. There are several buttons right above the query results that will change to the sql. That will give you a lot more information than anything else.

Great information you shared there though.


Thank for the LinqPad-hint. Actually my two queries are linQ to Dataset in memory queries, hence i assume that there is no SQL generated. Normally it would be optimized by the dbms.