
Fastest Way of Inserting in Entity Framework

I'm looking for the fastest way of inserting into Entity Framework.

I'm asking this because of the scenario where you have an active TransactionScope and the insertion is huge (4000+ records). It can potentially last more than 10 minutes (the default maximum timeout of transactions), and this will lead to an incomplete transaction.
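For reference, a minimal sketch of creating a TransactionScope with an explicit timeout rather than relying on the default. The class name TransactionScopeFactory and the ReadCommitted isolation level are assumptions made for illustration, not something from the question. Whatever timeout is requested, the effective value is still capped by TransactionManager.MaximumTimeout (10 minutes unless reconfigured), so this only helps within that limit.

```csharp
using System;
using System.Transactions;

// Hypothetical helper, for illustration only.
public static class TransactionScopeFactory
{
    // Creates a scope with an explicit timeout instead of the default one.
    // The runtime still caps the value at TransactionManager.MaximumTimeout.
    public static TransactionScope Create(TimeSpan timeout)
    {
        var options = new TransactionOptions
        {
            // Assumption: ReadCommitted is acceptable for the insert workload.
            IsolationLevel = IsolationLevel.ReadCommitted,
            Timeout = timeout
        };
        return new TransactionScope(TransactionScopeOption.Required, options);
    }
}
```

A caller would wrap the insert work in `using (var scope = TransactionScopeFactory.Create(TimeSpan.FromMinutes(5))) { ... scope.Complete(); }`.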

How are you currently doing it?
Creating the TransactionScope, instantiating the DbContext, opening the connection, and in a for-each statement doing the insertions and SavingChanges (for each record). NOTE: TransactionScope and DbContext are in using statements, and I'm closing the connection in a finally block.
Another answer for reference: stackoverflow.com/questions/5798646/…
The fastest way of inserting into a SQL database does not involve EF. AFAIK it's BCP, then TVP + MERGE/INSERT.
For those who will read comments: Most applicable, modern answer is here.

Shimmy Weitzhandler

To your remark in the comments to your question:

"...SavingChanges (for each record)..."

That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts down enormously. I would do a few simple tests which will very likely improve the performance:

Call SaveChanges() once after ALL records.

Call SaveChanges() after for example 100 records.

Call SaveChanges() after for example 100 records and dispose the context and create a new one.

Disable change detection

For bulk inserts I am working and experimenting with a pattern like this:

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;            
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}

I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code it works in less than 3 minutes.

For the performance it is important to call SaveChanges() after "many" records ("many" being around 100 or 1000). It also improves the performance to dispose the context after SaveChanges and create a new one. This clears the context of all entities; SaveChanges doesn't do that, and the entities remain attached to the context in state Unchanged. It is the growing number of attached entities in the context that slows down the insertion step by step. So it is helpful to clear it after some time.
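The save-every-N-records and recreate-the-context idea can also be sketched without any EF dependency. This is only an illustration under assumptions: BatchedInsert and its delegate parameters are hypothetical names, and the context is anything disposable, created, filled and saved through caller-supplied delegates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class BatchedInsert
{
    // Adds items through a short-lived context, saving every batchSize items
    // and replacing the context after each save so tracked entities never pile up.
    // Returns the number of save calls that were made.
    public static int Run<TContext, TEntity>(
        IEnumerable<TEntity> items,
        int batchSize,
        Func<TContext> createContext,
        Action<TContext, TEntity> add,
        Action<TContext> save)
        where TContext : IDisposable
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int saves = 0;
        int pending = 0;
        TContext context = createContext();
        try
        {
            foreach (TEntity item in items)
            {
                add(context, item);
                if (++pending == batchSize)
                {
                    save(context);
                    saves++;
                    pending = 0;

                    // Throw the full context away and start with an empty one.
                    context.Dispose();
                    context = createContext();
                }
            }

            // Save the remainder that did not fill a whole batch.
            if (pending > 0)
            {
                save(context);
                saves++;
            }
        }
        finally
        {
            context.Dispose();
        }
        return saves;
    }
}
```

With EF 6, createContext would presumably return a new MyDbContext with AutoDetectChangesEnabled switched off, add would call Set<Entity>().Add, and save would call SaveChanges.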

Here are a few measurements for my 560,000 entities:

commitCount = 1, recreateContext = false: many hours (That's your current procedure)

commitCount = 100, recreateContext = false: more than 20 minutes

commitCount = 1000, recreateContext = false: 242 sec

commitCount = 10000, recreateContext = false: 202 sec

commitCount = 100000, recreateContext = false: 199 sec

commitCount = 1000000, recreateContext = false: out of memory exception

commitCount = 1, recreateContext = true: more than 10 minutes

commitCount = 10, recreateContext = true: 241 sec

commitCount = 100, recreateContext = true: 164 sec

commitCount = 1000, recreateContext = true: 191 sec

In the first test above, performance is highly non-linear and degrades sharply over time. ("Many hours" is an estimate; I never finished this test and stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is much less pronounced in all other tests.


@Bongo Sharp: Don't forget to set AutoDetectChangesEnabled = false; on the DbContext. It also has a big additional performance effect: stackoverflow.com/questions/5943394/…
Yeah, the problem is that I'm using Entity Framework 4, and AutoDetectChangesEnabled is part of 4.1. Nevertheless, I did the performance test and I had AMAZING RESULTS: it went from 00:12:00 to 00:00:22. Calling SaveChanges on each entity was causing the overload... THANKS so much for your answer! This is what I was looking for.
Thank you for the context.Configuration.AutoDetectChangesEnabled = false; tip, it makes a huge difference.
@dahacker89: Are you using the correct version EF >= 4.1 and DbContext, NOT ObjectContext?
@dahacker89: I suggest that you create a separate question for your problem with perhaps more details. I'm not able to figure out here what is wrong.
arkhivania

This combination increases speed well enough.

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

Don't blindly disable ValidateOnSaveEnabled: you may be depending on that behavior and not realize it until it's too late. Then again, you may be performing validation elsewhere in code, in which case having EF validate yet again is completely unnecessary.
In my test, saving 20,000 rows went down from 101 seconds to 88 seconds. Not a lot, and what are the implications?
@JeremyCook I think what you're trying to get at is this answer would be much better if it explained the possible implications of changing these properties from their default values (aside from performance improvement). I agree.
This worked for me, although if you're updating records in the context you will need to call DetectChanges() explicitly
These can be disabled and then re-enabled with a try-finally block: msdn.microsoft.com/en-us/data/jj556205.aspx
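A minimal sketch of the disable-then-restore idea mentioned in the comment above, written against plain delegates so it does not depend on EF. TemporarySetting is a hypothetical helper name; with EF 6 the getter and setter would presumably wrap context.Configuration.AutoDetectChangesEnabled or ValidateOnSaveEnabled.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class TemporarySetting
{
    // Sets a boolean option to 'value', runs 'work', then restores the
    // previous value even if 'work' throws.
    public static void Run(Func<bool> get, Action<bool> set, bool value, Action work)
    {
        bool previous = get();
        set(value);
        try
        {
            work();
        }
        finally
        {
            set(previous);
        }
    }
}
```

The try/finally guarantees the option goes back to whatever it was before, so later code that updates tracked entities on the same context does not silently run with change detection or validation turned off.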
Adam Rackis

You should look at using the System.Data.SqlClient.SqlBulkCopy for this. Here's the documentation, and of course there are plenty of tutorials online.

Sorry, I know you were looking for a simple answer to get EF to do what you want, but bulk operations are not really what ORMs are meant for.


I have run into SqlBulkCopy a couple of times while researching this, but it seems to be more oriented to table-to-table inserts. Sadly, I was not expecting easy solutions, but rather performance tips, like for example managing the state of the connection manually instead of letting EF do it for you.
I've used SqlBulkCopy to insert large amounts of data right from my application. You basically have to create a DataTable, fill it up, then pass that to BulkCopy. There are a few gotchas as you're setting up your DataTable (most of which I've forgotten, sadly), but it should work just fine
I did the proof of concept and, as promised, it works really fast. But one of the reasons why I'm using EF is because the insertion of relational data is easier, e.g. if I insert an entity that already contains relational data, it will also insert it. Have you ever run into this scenario? Thanks!
Unfortunately inserting a web of objects into a DBMS is not really something BulkCopy will do. That's the benefit of an ORM like EF, the cost being that it won't scale to do hundreds of similar object graphs efficiently.
SqlBulkCopy is definitely the way to go if you need raw speed or if you will be re-running this insert. I've inserted several million records with it before and it is extremely fast. That said, unless you will need to re-run this insert, it might be easier to just use EF.
maxlego

The fastest way would be using bulk insert extension, which I developed

note: this is a commercial product, not free of charge

https://i.stack.imgur.com/AdWD3.png

usage is extremely simple

context.BulkInsert(hugeAmountOfEntities);

Fast but only does the top layer of a hierarchy.
It is not free.
Ads are getting smarter... this is paid product and very expensive for a freelance. Be warned!
USD600 for 1-year support and upgrades? Are you out of your mind?
I'm not the owner of the product any longer.
Manfred Wippel

As it was never mentioned here, I want to recommend EFCore.BulkExtensions:

context.BulkInsert(entitiesList);                 context.BulkInsertAsync(entitiesList);
context.BulkUpdate(entitiesList);                 context.BulkUpdateAsync(entitiesList);
context.BulkDelete(entitiesList);                 context.BulkDeleteAsync(entitiesList);
context.BulkInsertOrUpdate(entitiesList);         context.BulkInsertOrUpdateAsync(entitiesList);         // Upsert
context.BulkInsertOrUpdateOrDelete(entitiesList); context.BulkInsertOrUpdateOrDeleteAsync(entitiesList); // Sync
context.BulkRead(entitiesList);                   context.BulkReadAsync(entitiesList);

I second this suggestion. After trying many homebrew solutions this cut my insert down to 1 second from over 50 seconds. And, it's MIT license so easy to incorporate.
Is this available for EF 6.x?
this is only more performant than using AddRange if it's over 10 entities
10 000 inserts went from 9 minutes to 12 seconds. This deserves more attention!
If there's any way to change accepted answers, this should be the modern accepted answer now. And I wish EF team provided this out of box.
akjoshi

I agree with Adam Rackis. SqlBulkCopy is the fastest way of transferring bulk records from one data source to another. I used this to copy 20K records and it took less than 3 seconds. Have a look at the example below.

public static void InsertIntoMembers(DataTable dataTable)
{           
    using (var connection = new SqlConnection(@"data source=;persist security info=True;user id=;password=;initial catalog=;MultipleActiveResultSets=True;App=EntityFramework"))
    {
        SqlTransaction transaction = null;
        connection.Open();
        try
        {
            transaction = connection.BeginTransaction();
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = "Members";
                sqlBulkCopy.ColumnMappings.Add("Firstname", "Firstname");
                sqlBulkCopy.ColumnMappings.Add("Lastname", "Lastname");
                sqlBulkCopy.ColumnMappings.Add("DOB", "DOB");
                sqlBulkCopy.ColumnMappings.Add("Gender", "Gender");
                sqlBulkCopy.ColumnMappings.Add("Email", "Email");

                sqlBulkCopy.ColumnMappings.Add("Address1", "Address1");
                sqlBulkCopy.ColumnMappings.Add("Address2", "Address2");
                sqlBulkCopy.ColumnMappings.Add("Address3", "Address3");
                sqlBulkCopy.ColumnMappings.Add("Address4", "Address4");
                sqlBulkCopy.ColumnMappings.Add("Postcode", "Postcode");

                sqlBulkCopy.ColumnMappings.Add("MobileNumber", "MobileNumber");
                sqlBulkCopy.ColumnMappings.Add("TelephoneNumber", "TelephoneNumber");

                sqlBulkCopy.ColumnMappings.Add("Deleted", "Deleted");

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
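        catch (Exception)
        {
            // Roll back only if the transaction was actually started, then rethrow
            // so the caller learns that the bulk copy failed.
            transaction?.Rollback();
            throw;
        }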

    }
}

I tried many of the solutions provided in this post and SqlBulkCopy was by far the fastest. Pure EF took 15min, but with a mix of the solution and SqlBulkCopy I was able to get down to 1.5 min! This was with 2 million records! Without any DB index optimization.
List is easier than DataTable. There's an AsDataReader() extension method, explained in this answer: stackoverflow.com/a/36817205/1507899
But it's only for the top-level entity, not related ones.
@ZahidMustafa: yeah. It's doing BulkInsert, not Bulk-Analysis-And-Relation-Tracing-On-Object-Graphs.. if you want to cover relations, you have to analyze and determine insertion order and then bulk-insert individual levels and maybe update some keys as needed, and you will get speedy custom tailored solution. Or, you can rely on EF to do that, no work on your side, but slower at runtime.
ShaTin

I would recommend this article on how to do bulk inserts using EF.

Entity Framework and slow bulk INSERTs

He explores these areas and compares performance:

Default EF (57 minutes to complete adding 30,000 records)

Replacing with ADO.NET Code (25 seconds for those same 30,000)

Context Bloat - Keep the active Context Graph small by using a new context for each Unit of Work (same 30,000 inserts take 33 seconds)

Large Lists - Turn off AutoDetectChangesEnabled (brings the time down to about 20 seconds)

Batching (down to 16 seconds)

DbTable.AddRange() - (performance is in the 12 range)


Admir Tuzović

I've investigated Slauma's answer (which is awesome, thanks for the idea man), and I've reduced batch size until I've hit optimal speed. Looking at the Slauma's results:

commitCount = 1, recreateContext = true: more than 10 minutes

commitCount = 10, recreateContext = true: 241 sec

commitCount = 100, recreateContext = true: 164 sec

commitCount = 1000, recreateContext = true: 191 sec

There is a visible speed increase when moving from 1 to 10, and from 10 to 100, but from 100 to 1000 the insert speed drops again.

So I've focused on what's happening when you reduce batch size to value somewhere in between 10 and 100, and here are my results (I'm using different row contents, so my times are of different value):

Quantity | Batch size | Interval
1000     | 1          | 3
10000    | 1          | 34
100000   | 1          | 368

1000     | 5          | 1
10000    | 5          | 12
100000   | 5          | 133

1000     | 10         | 1
10000    | 10         | 11
100000   | 10         | 101

1000     | 20         | 1
10000    | 20         | 9
100000   | 20         | 92

1000     | 27         | 0
10000    | 27         | 9
100000   | 27         | 92

1000     | 30         | 0
10000    | 30         | 9
100000   | 30         | 92

1000     | 35         | 1
10000    | 35         | 9
100000   | 35         | 94

1000     | 50         | 1
10000    | 50         | 10
100000   | 50         | 106

1000     | 100        | 1
10000    | 100        | 14
100000   | 100        | 141

Based on my results, the actual optimum is a batch size of around 30. It's faster than both 10 and 100. The problem is, I have no idea why 30 is optimal, nor have I found any logical explanation for it.
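Since the best batch size seems to depend on the environment, it may be easier to measure it than to guess. Below is a rough sketch of such a probe; BatchSizeProbe and the insertWithBatchSize delegate are hypothetical names, and the delegate is assumed to run one complete insert of the same data with the given batch size (against a table that is reset between runs).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper, for illustration only.
public static class BatchSizeProbe
{
    // Runs 'insertWithBatchSize' once per candidate and records the elapsed time.
    public static IDictionary<int, TimeSpan> Measure(
        IEnumerable<int> candidates, Action<int> insertWithBatchSize)
    {
        var results = new SortedDictionary<int, TimeSpan>();
        foreach (int batchSize in candidates)
        {
            var watch = Stopwatch.StartNew();
            insertWithBatchSize(batchSize);
            watch.Stop();
            results[batchSize] = watch.Elapsed;
        }
        return results;
    }

    // Returns the batch size with the smallest elapsed time, or -1 if there are no results.
    public static int Fastest(IDictionary<int, TimeSpan> results)
    {
        int best = -1;
        TimeSpan bestTime = TimeSpan.MaxValue;
        foreach (KeyValuePair<int, TimeSpan> pair in results)
        {
            if (pair.Value < bestTime)
            {
                best = pair.Key;
                bestTime = pair.Value;
            }
        }
        return best;
    }
}
```

A single run per candidate is noisy, so repeating the measurement a few times before trusting the result is probably wise.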


I found the same with Postgres and pure SQL (it depends on SQL, not on EF): 30 is optimal.
My experience is that optimum differs for different connection speed and size of row. For fast connection and small rows optimum can be even >200 rows.
julianstark999

[2019 Update] EF Core 3.1

Following what has been said above, disabling AutoDetectChangesEnabled in EF Core worked perfectly: the insertion time was divided by 100 (from many minutes to a few seconds, 10k records with cross-table relationships).

The updated code is :

context.ChangeTracker.AutoDetectChangesEnabled = false;
foreach (IRecord record in records) {
    //Add records to your database        
}
context.ChangeTracker.DetectChanges();
context.SaveChanges();
context.ChangeTracker.AutoDetectChangesEnabled = true; //do not forget to re-enable

Mikael Eliasson

As other people have said SqlBulkCopy is the way to do it if you want really good insert performance.

It's a bit cumbersome to implement, but there are libraries that can help you with it. There are a few out there, but I will shamelessly plug my own library this time: https://github.com/MikaelEliasson/EntityFramework.Utilities#batch-insert-entities

The only code you would need is:

 using (var db = new YourDbContext())
 {
     EFBatchOperation.For(db, db.BlogPosts).InsertAll(list);
 }

So how much faster is it? Very hard to say, because it depends on so many factors: computer performance, network, object size, etc. The performance tests I've made suggest 25k entities can be inserted in around 10s the standard way on localhost IF you optimize your EF configuration as mentioned in the other answers. With EFUtilities that takes about 300ms. Even more interesting is that I have saved around 3 million entities in under 15 seconds using this method, averaging around 200k entities per second.

The one problem is of course if you need to insert related data. This can be done efficiently into SQL Server using the method above, but it requires you to have an ID generation strategy that lets you generate IDs in the app code for the parent so you can set the foreign keys. This can be done using GUIDs or something like HiLo ID generation.
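To make the HiLo idea concrete, here is a small sketch of a generator that hands out IDs in application code. It is not taken from EFUtilities: HiLoGenerator is a hypothetical class, and the nextHi delegate stands in for whatever fetches the next block number (for example a database sequence).

```csharp
using System;

// Hypothetical HiLo generator, for illustration only.
public sealed class HiLoGenerator
{
    private readonly Func<long> _nextHi;   // fetches the next block number ("hi")
    private readonly int _blockSize;       // how many IDs one block covers
    private readonly object _gate = new object();
    private long _current;                 // next ID to hand out
    private long _limit;                   // exclusive upper bound of the current block

    public HiLoGenerator(Func<long> nextHi, int blockSize)
    {
        if (nextHi == null) throw new ArgumentNullException(nameof(nextHi));
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        _nextHi = nextHi;
        _blockSize = blockSize;
    }

    // Returns a new ID; only asks for a new block when the current one is used up.
    public long NextId()
    {
        lock (_gate)
        {
            if (_current >= _limit)
            {
                long hi = _nextHi();
                _current = hi * _blockSize;
                _limit = _current + _blockSize;
            }
            return _current++;
        }
    }
}
```

With this, a parent can get its ID from NextId() before anything is written, the children's foreign key properties can be set to that value, and both levels can then be bulk-inserted one after the other. If the key column is an identity column, the bulk copy would additionally have to be told to keep the supplied values.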


Works well. The syntax is a bit verbose though. Think it would be better if EFBatchOperation had a constructor which you pass in the DbContext to rather than passing to every static method. Generic versions of InsertAll and UpdateAll which automatically find the collection, similar to DbContext.Set<T>, would be good too.
Just a quick comment to say thanks! This code allowed me to save 170k records in 1.5 seconds! Completely blows any other method I've tried out of the water.
@Mikael One issue is dealing with identity fields. Do you have a way to enable identity insert yet?
In contrast to EntityFramework.BulkInsert, this library remained free. +1
Is it applicable for EF Core?
RobJan

Disposing the context creates problems if the entities you Add() rely on other preloaded entities (e.g. navigation properties) in the context.

I use a similar concept to keep my context small and achieve the same performance.

But instead of calling Dispose() on the context and recreating it, I simply detach the entities that have already been saved with SaveChanges().

public void AddAndSave<TEntity>(List<TEntity> entities) where TEntity : class
{
    const int CommitCount = 1000; //set your own best performance number here
    int currentCount = 0;

    while (currentCount < entities.Count)
    {
        //make sure it doesn't commit more than the entities you have
        int commitCount = CommitCount;
        if ((entities.Count - currentCount) < commitCount)
            commitCount = entities.Count - currentCount;

        //e.g. Add entities [ i = 0 to 999, 1000 to 1999, ... , n to n+999... ] to context
        for (int i = currentCount; i < (currentCount + commitCount); i++)
            _context.Entry(entities[i]).State = System.Data.EntityState.Added;
            //same as calling _context.Set<TEntity>().Add(entities[i]);

        //commit entities[n to n+999] to database
        _context.SaveChanges();

        //detach all entities in the context that were committed to the database
        //so it won't overload the context
        for (int i = currentCount; i < (currentCount + commitCount); i++)
            _context.Entry(entities[i]).State = System.Data.EntityState.Detached;

        currentCount += commitCount;
    }
}

Wrap it with try/catch and TransactionScope() if you need; they are not shown here to keep the code clean.


That slowed down the insert (AddRange) using Entity Framework 6.0. Inserting 20.000 rows went up from about 101 seconds to 118 seconds.
@Stephen Ho: I am also trying to avoid disposing my context. I can understand this is slower than recreating the context, but I want to know if you found this faster enough than not recreating the context but with a commitCount set.
@Learner: I think it was faster than recreate the context. But I don't really remember now cos I switched to use SqlBulkCopy at last.
I ended up having to use this technique because, for some weird reason, there was some left over tracking occurring on the second pass through the while loop, even though I had everything wrapped in a using statement and even called Dispose() on the DbContext. When I would add to the context (on the 2nd pass) the context set count would jump to 6 instead of just one. The other items that got arbitrarily added had already been inserted in the first pass through the while loop so the call to SaveChanges would fail on the second pass (for obvious reasons).
Guilherme

I know this is a very old question, but one guy here said that he developed an extension method to use bulk insert with EF, and when I checked, I discovered that the library costs $599 today (for one developer). Maybe it makes sense for the entire library, but for just the bulk insert this is too much.

Here is a very simple extension method I made. I use it with database first (I have not tested it with code first, but I think it works the same). Replace YourEntities with the name of your context:

public partial class YourEntities : DbContext
{
    public async Task BulkInsertAllAsync<T>(IEnumerable<T> entities)
    {
        using (var conn = new SqlConnection(Database.Connection.ConnectionString))
        {
            await conn.OpenAsync();

            Type t = typeof(T);

            var bulkCopy = new SqlBulkCopy(conn)
            {
                DestinationTableName = GetTableName(t)
            };

            var table = new DataTable();

            var properties = t.GetProperties().Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string));

            foreach (var property in properties)
            {
                Type propertyType = property.PropertyType;
                if (propertyType.IsGenericType &&
                    propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
                {
                    propertyType = Nullable.GetUnderlyingType(propertyType);
                }

                table.Columns.Add(new DataColumn(property.Name, propertyType));
            }

            foreach (var entity in entities)
            {
                table.Rows.Add(
                    properties.Select(property => property.GetValue(entity, null) ?? DBNull.Value).ToArray());
            }

            bulkCopy.BulkCopyTimeout = 0;
            await bulkCopy.WriteToServerAsync(table);
        }
    }

    public void BulkInsertAll<T>(IEnumerable<T> entities)
    {
        using (var conn = new SqlConnection(Database.Connection.ConnectionString))
        {
            conn.Open();

            Type t = typeof(T);

            var bulkCopy = new SqlBulkCopy(conn)
            {
                DestinationTableName = GetTableName(t)
            };

            var table = new DataTable();

            var properties = t.GetProperties().Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string));

            foreach (var property in properties)
            {
                Type propertyType = property.PropertyType;
                if (propertyType.IsGenericType &&
                    propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
                {
                    propertyType = Nullable.GetUnderlyingType(propertyType);
                }

                table.Columns.Add(new DataColumn(property.Name, propertyType));
            }

            foreach (var entity in entities)
            {
                table.Rows.Add(
                    properties.Select(property => property.GetValue(entity, null) ?? DBNull.Value).ToArray());
            }

            bulkCopy.BulkCopyTimeout = 0;
            bulkCopy.WriteToServer(table);
        }
    }

    public string GetTableName(Type type)
    {
        var metadata = ((IObjectContextAdapter)this).ObjectContext.MetadataWorkspace;
        var objectItemCollection = ((ObjectItemCollection)metadata.GetItemCollection(DataSpace.OSpace));

        var entityType = metadata
                .GetItems<EntityType>(DataSpace.OSpace)
                .Single(e => objectItemCollection.GetClrType(e) == type);

        var entitySet = metadata
            .GetItems<EntityContainer>(DataSpace.CSpace)
            .Single()
            .EntitySets
            .Single(s => s.ElementType.Name == entityType.Name);

        var mapping = metadata.GetItems<EntityContainerMapping>(DataSpace.CSSpace)
                .Single()
                .EntitySetMappings
                .Single(s => s.EntitySet == entitySet);

        var table = mapping
            .EntityTypeMappings.Single()
            .Fragments.Single()
            .StoreEntitySet;

        return (string)table.MetadataProperties["Table"].Value ?? table.Name;
    }
}

You can use it against any collection that implements IEnumerable, like this:

await context.BulkInsertAllAsync(items);

Please complete your example code. Where is bulkCopy?
It is already here: await bulkCopy.WriteToServerAsync(table);
Maybe I wasn't clear. In your write-up, you suggest you made an extension, which I took to mean that no third-party lib was needed, when in fact both methods use the SqlBulkCopy lib. This relies entirely on SqlBulkCopy, which is why I asked where bulkCopy comes from: it's an extension lib which you wrote an extension on top of. It would have made more sense to just say "here is how I used the SqlBulkCopy lib".
should use conn.OpenAsync in async version
@guiherme Am I correct that the SqlBulkCopy in your code is really the SqlClient.SqlBulkCopy class built into .NET?
Jonathan Magnan

I'm looking for the fastest way of inserting into Entity Framework

There are some third-party libraries supporting Bulk Insert available:

Z.EntityFramework.Extensions (Recommended)

EFUtilities

EntityFramework.BulkInsert

See: Entity Framework Bulk Insert library

Be careful when choosing a bulk insert library. Only Entity Framework Extensions supports all kinds of associations and inheritance, and it's the only one still supported.

Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

Bulk SaveChanges

Bulk Insert

Bulk Delete

Bulk Update

Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

this is a great extension but not free.
This answer is pretty good and EntityFramework.BulkInsert performs a bulk insertion of 15K rows in 1.5 seconds, works pretty nice for an internal process like a Windows Service.
Yeah, $600 for bulk insert. Totally worth it.
@eocron Yeah, it's worth it if you use it commercially. I don't see any problem with $600 for something that I don't have to spend hours building myself, which would cost me a lot more than $600. Yes, it costs money, but looking at my hourly rate it is money well spent!
Reza Jenabi

One of the fastest ways to save a list is to apply the following code:

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

AutoDetectChangesEnabled = false

Add, AddRange & SaveChanges: Doesn't detect changes.

ValidateOnSaveEnabled = false;

Doesn't validate the entities when SaveChanges is called

You must add the NuGet package

Install-Package Z.EntityFramework.Extensions

Now you can use the following code

var context = new MyContext();

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

context.BulkInsert(list);
context.BulkSaveChanges();

Can I use your sample code for bulk update?
Z library is not free
Thanks @reza-jenabi. It saved me
Michal Hosala

Yes, SqlBulkCopy is indeed the fastest tool for this type of task. I wanted to find a "least effort" generic way for me in .NET Core, so I ended up using a great library from Marc Gravell called FastMember and writing one tiny extension method for the Entity Framework DB context. It works lightning fast:

using System.Collections.Generic;
using System.Linq;
using FastMember;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;

namespace Services.Extensions
{
    public static class DbContextExtensions
    {
        public static void BulkCopyToServer<T>(this DbContext db, IEnumerable<T> collection)
        {
            var messageEntityType = db.Model.FindEntityType(typeof(T));

            var tableName = messageEntityType.GetSchema() + "." + messageEntityType.GetTableName();
            var tableColumnMappings = messageEntityType.GetProperties()
                .ToDictionary(p => p.PropertyInfo.Name, p => p.GetColumnName());

            using (var connection = new SqlConnection(db.Database.GetDbConnection().ConnectionString))
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                foreach (var (field, column) in tableColumnMappings)
                {
                    bulkCopy.ColumnMappings.Add(field, column);
                }

                using (var reader = ObjectReader.Create(collection, tableColumnMappings.Keys.ToArray()))
                {
                    bulkCopy.DestinationTableName = tableName;
                    connection.Open();
                    bulkCopy.WriteToServer(reader);
                    connection.Close();
                }
            }
        }
    }
}

The more effort less generic way would be to follow something like this (which again uses SqlBulkCopy): codingsight.com/…
Maxim

Try to use a Stored Procedure that will get an XML of the data that you want to insert.


Passing data as XML is not needed if you don't want to store it as XML. In SQL 2008 you can use a table-valued parameter.
I didn't clarify this, but I also need to support SQL 2005.
Sgedda

I have made a generic extension of @Slauma's example above:

public static class DataExtensions
{
    public static DbContext AddToContext<T>(this DbContext context, object entity, int count, int commitCount, bool recreateContext, Func<DbContext> contextCreator)
    {
        context.Set(typeof(T)).Add((T)entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                context.Dispose();
                context = contextCreator.Invoke();
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }
        return context;
    }
}

Usage:

public void AddEntities(List<YourEntity> entities)
{
    using (var transactionScope = new TransactionScope())
    {
        DbContext context = new YourContext();
        try
        {
            int count = 0;
            foreach (var entity in entities)
            {
                ++count;
                // AddToContext may dispose the context and return a new one
                context = context.AddToContext<YourEntity>(entity, count, 100, true,
                    () => new YourContext());
            }
            context.SaveChanges();
        }
        finally
        {
            // dispose whichever context instance is current
            context.Dispose();
        }
        transactionScope.Complete();
    }
}

Philip Johnson

SqlBulkCopy is super quick

This is my implementation:

// at some point in my calling code, I will call:
var myDataTable = CreateMyDataTable();
myDataTable.Rows.Add(Guid.NewGuid(), tableHeaderId, theName, theValue); // e.g. - need this call for each row to insert

var efConnectionString = ConfigurationManager.ConnectionStrings["MyWebConfigEfConnection"].ConnectionString;
var efConnectionStringBuilder = new EntityConnectionStringBuilder(efConnectionString);
var connectionString = efConnectionStringBuilder.ProviderConnectionString;
BulkInsert(connectionString, myDataTable);

private DataTable CreateMyDataTable()
{
    var myDataTable = new DataTable { TableName = "MyTable"};
// this table has an identity column - don't need to specify that
    myDataTable.Columns.Add("MyTableRecordGuid", typeof(Guid));
    myDataTable.Columns.Add("MyTableHeaderId", typeof(int));
    myDataTable.Columns.Add("ColumnName", typeof(string));
    myDataTable.Columns.Add("ColumnValue", typeof(string));
    return myDataTable;
}

private void BulkInsert(string connectionString, DataTable dataTable)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        SqlTransaction transaction = null;
        try
        {
            transaction = connection.BeginTransaction();

            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = dataTable.TableName;
                foreach (DataColumn column in dataTable.Columns) {
                    sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
        catch (Exception)
        {
            transaction?.Rollback();
            throw;
        }
    }
}

Z
Zoran Horvat

Here is a performance comparison between Entity Framework and the SqlBulkCopy class on a realistic example: How to Bulk Insert Complex Objects into SQL Server Database

As others already emphasized, ORMs are not meant to be used in bulk operations. They offer flexibility, separation of concerns and other benefits, but bulk operations (except bulk reading) are not one of them.


A
Amir Saniyan

Use SqlBulkCopy:

void BulkInsert(GpsReceiverTrack[] gpsReceiverTracks)
{
    if (gpsReceiverTracks == null)
    {
        throw new ArgumentNullException(nameof(gpsReceiverTracks));
    }

    DataTable dataTable = new DataTable("GpsReceiverTracks");
    dataTable.Columns.Add("ID", typeof(int));
    dataTable.Columns.Add("DownloadedTrackID", typeof(int));
    dataTable.Columns.Add("Time", typeof(TimeSpan));
    dataTable.Columns.Add("Latitude", typeof(double));
    dataTable.Columns.Add("Longitude", typeof(double));
    dataTable.Columns.Add("Altitude", typeof(double));

    for (int i = 0; i < gpsReceiverTracks.Length; i++)
    {
        dataTable.Rows.Add
        (
            new object[]
            {
                    gpsReceiverTracks[i].ID,
                    gpsReceiverTracks[i].DownloadedTrackID,
                    gpsReceiverTracks[i].Time,
                    gpsReceiverTracks[i].Latitude,
                    gpsReceiverTracks[i].Longitude,
                    gpsReceiverTracks[i].Altitude
            }
        );
    }

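    // Dispose the context once the connection string has been read
    string connectionString;
    using (var context = new TeamTrackerEntities())
    {
        connectionString = context.Database.Connection.ConnectionString;
    }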
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = dataTable.TableName;
                foreach (DataColumn column in dataTable.Columns)
                {
                    sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
    }

    return;
}

a
anishMarokey

As far as I know, Entity Framework has no built-in BulkInsert to speed up very large inserts.

In this scenario you can use SqlBulkCopy from ADO.NET to solve your problem.


I was taking a look at that class, but it seems to be more oriented to table-to-table insertions, isn't it?
Not sure what you mean, it has an overloaded WriteToServer that takes a DataTable.
No, you can also insert from .NET objects into SQL. What are you looking for?
A way to insert potentially thousands of records in the database within a TransactionScope block
You can use the .NET TransactionScope: technet.microsoft.com/en-us/library/bb896149.aspx
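
As a minimal sketch of the "insert from .NET objects" point in the comments above: a list of plain objects can be copied into a DataTable via reflection and then handed to the WriteToServer overload that takes a DataTable. The helper name ToDataTable and the Track class are hypothetical, invented only for illustration, and the sketch assumes the property names match the destination column names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public static class DataTableSketch
{
    // Hypothetical helper: one DataTable column per public instance property of T.
    public static DataTable ToDataTable<T>(IEnumerable<T> items, string tableName)
    {
        var table = new DataTable(tableName);
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo prop in props)
        {
            // DataTable columns cannot be typed as Nullable<T>, so use the underlying type.
            Type columnType = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
            table.Columns.Add(prop.Name, columnType);
        }

        foreach (T item in items)
        {
            var values = new object[props.Length];
            for (int i = 0; i < props.Length; i++)
            {
                // A null property value is stored as DBNull.Value in the row.
                values[i] = props[i].GetValue(item, null) ?? DBNull.Value;
            }
            table.Rows.Add(values);
        }

        return table;
    }
}

// Hypothetical entity used only for this example.
public class Track
{
    public int Id { get; set; }
    public double? Altitude { get; set; }
    public string Name { get; set; }
}
```

The resulting table could then be passed to sqlBulkCopy.WriteToServer(table), with DestinationTableName set to table.TableName as in the answers above.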
A
Aleksa

None of the solutions written here helps, because when you call SaveChanges(), the insert statements are sent to the database one by one. That's how Entity Framework works.

So if a round trip to the database and back takes 50 ms, for instance, the time needed for the insert is the number of records x 50 ms.
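
As a rough worked example of that estimate (assuming the 50 ms round trip above and the 4000 records mentioned in the question; the class and method names are hypothetical and real timings will vary):

```csharp
using System;

public static class RoundTripEstimate
{
    // Estimated total time when every record costs one database round trip.
    public static TimeSpan Estimate(int recordCount, TimeSpan roundTrip)
    {
        return TimeSpan.FromTicks(roundTrip.Ticks * recordCount);
    }
}
```

Under these assumptions, RoundTripEstimate.Estimate(4000, TimeSpan.FromMilliseconds(50)) comes to 200 seconds, i.e. a little over three minutes spent on round trips alone.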

You have to use BulkInsert, here is the link: https://efbulkinsert.codeplex.com/

Using it, I reduced my insert time from 5-6 minutes to 10-12 seconds.


G
Greg R Taylor

Another option is to use SqlBulkTools, available from NuGet. It's very easy to use and has some powerful features.

Example:

var bulk = new BulkOperations();
var books = GetBooks();

using (TransactionScope trans = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection(ConfigurationManager
    .ConnectionStrings["SqlBulkToolsTest"].ConnectionString))
    {
        bulk.Setup<Book>()
            .ForCollection(books)
            .WithTable("Books") 
            .AddAllColumns()
            .BulkInsert()
            .Commit(conn);
    }

    trans.Complete();
}

See the documentation for more examples and advanced usage. Disclaimer: I am the author of this library and any views are my own.


This project has been deleted from both NuGet and GitHub.
M
Michał Pilarek

[NEW SOLUTION FOR POSTGRESQL] Hey, I know this is quite an old post, but I recently ran into a similar problem while using PostgreSQL. I wanted an efficient bulk insert, which turned out to be pretty difficult: I couldn't find any proper free library for it on this database. The only thing I found was this helper: https://bytefish.de/blog/postgresql_bulk_insert/ which is also on NuGet. I wrote a small mapper that auto-maps properties the way Entity Framework does:

public static PostgreSQLCopyHelper<T> CreateHelper<T>(string schemaName, string tableName)
        {
            // Use the schemaName argument instead of a hard-coded "dbo"
            var helper = new PostgreSQLCopyHelper<T>(schemaName, "\"" + tableName + "\"");
            var properties = typeof(T).GetProperties();
            foreach(var prop in properties)
            {
                var type = prop.PropertyType;
                if (Attribute.IsDefined(prop, typeof(KeyAttribute)) || Attribute.IsDefined(prop, typeof(ForeignKeyAttribute)))
                    continue;
                switch (type)
                {
                    case Type intType when intType == typeof(int) || intType == typeof(int?):
                        {
                            helper = helper.MapInteger("\"" + prop.Name + "\"",  x => (int?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type stringType when stringType == typeof(string):
                        {
                            helper = helper.MapText("\"" + prop.Name + "\"", x => (string)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type dateType when dateType == typeof(DateTime) || dateType == typeof(DateTime?):
                        {
                            helper = helper.MapTimeStamp("\"" + prop.Name + "\"", x => (DateTime?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type decimalType when decimalType == typeof(decimal) || decimalType == typeof(decimal?):
                        {
                            helper = helper.MapMoney("\"" + prop.Name + "\"", x => (decimal?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type doubleType when doubleType == typeof(double) || doubleType == typeof(double?):
                        {
                            helper = helper.MapDouble("\"" + prop.Name + "\"", x => (double?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type floatType when floatType == typeof(float) || floatType == typeof(float?):
                        {
                            helper = helper.MapReal("\"" + prop.Name + "\"", x => (float?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type guidType when guidType == typeof(Guid):
                        {
                            helper = helper.MapUUID("\"" + prop.Name + "\"", x => (Guid)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                }
            }
            return helper;
        }

I use it the following way (I had entity named Undertaking):

var undertakingHelper = BulkMapper.CreateHelper<Model.Undertaking>("dbo", nameof(Model.Undertaking));
undertakingHelper.SaveAll(transaction.UnderlyingTransaction.Connection as Npgsql.NpgsqlConnection, undertakingsToAdd);

I showed an example with a transaction, but it can also be done with a normal connection retrieved from the context. undertakingsToAdd is an enumerable of normal entity records that I want to bulk insert into the database.

This solution, which I arrived at after a few hours of research and experimenting, is as you would expect much faster, and it is also easy to use and free. I really advise you to use it, not only for the reasons mentioned above, but also because it's the only one that gave me no problems with PostgreSQL itself; many other solutions work flawlessly only with, for example, SQL Server.


S
Simon Hughes

The secret is to insert into an identical blank staging table. Inserts are lightning quick. Then run a single insert from that into your main large table. Then truncate the staging table ready for the next batch.

i.e.

-- Insert into some_staging_table using Entity Framework.

-- Single insert into main table (this could be a tiny stored proc call)
insert into some_main_already_large_table (columns...)
   select (columns...) from some_staging_table;
truncate table some_staging_table;

Using EF, add all your records to an empty staging table. Then use SQL to insert into the main (large and slow) table in a single SQL instruction. Then empty your staging table. It's a very fast way of inserting a lot of data into an already large table.
When you say using EF, add the records to the staging table, did you actually try this with EF? Since EF issues a separate call to the database with each insert, I suspect you are going to see the same perf hit that the OP is trying to avoid. How does the staging table avoid this issue?
R
Rafael A. M. S.

Have you ever tried to insert through a background worker or task?

In my case, I'm inserting 7,760 records spread across 182 different tables with foreign key relationships (through navigation properties).

Without the task, it took two and a half minutes. Within a Task (Task.Factory.StartNew(...)), it took 15 seconds.

I only call SaveChanges() after adding all the entities to the context (to ensure data integrity).


I am pretty sure that the context isn't thread safe. Do you have tests to ensure that all the entities were saved?
I know Entity Framework isn't thread safe at all, but I'm just adding the objects to the context and saving at the end... It's working perfectly here.
So, you are calling DbContext.SaveChanges() in the main thread, but adding entities to the context is performed in a background thread, right?
Yes: add data inside the threads, wait for all of them to finish, and call SaveChanges in the main thread.
Although I think this way is dangerous and prone to mistakes, I find it very interesting.
L
Leandro Bardelli

Drawing on several notes, this is my implementation, with improvements of my own and from other answers and comments.

Improvements:

Getting the SQL connection string from my Entity

Using SqlBulkCopy only in some parts; the rest uses only Entity Framework

Using the same DataTable column names as the SQL database, so no column needs to be mapped by hand

Using the same DataTable name as the SQL table

public void InsertBulkDatatable(DataTable dataTable)
{
    EntityConnectionStringBuilder entityBuilder = new EntityConnectionStringBuilder(ConfigurationManager.ConnectionStrings["MyDbContextConnectionName"].ConnectionString);
    string cs = entityBuilder.ProviderConnectionString;
    using (var connection = new SqlConnection(cs))
    {
        SqlTransaction transaction = null;
        connection.Open();
        try
        {
            transaction = connection.BeginTransaction();
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                // The C# DataTable takes its name from the SQL table
                sqlBulkCopy.DestinationTableName = dataTable.TableName;
                // Mapping columns
                foreach (DataColumn column in dataTable.Columns)
                {
                    sqlBulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }
                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
        catch (Exception)
        {
            // Roll back, then rethrow so the caller sees the failure
            transaction?.Rollback();
            throw;
        }
    }
}


N
Nadeem

You can use the Bulk package library. Bulk Insert version 1.0.0 is intended for projects using Entity Framework >= 6.0.0.

More details can be found here: Bulkoperation source code


C
Ciro Corvino

TL;DR: I know this is an old post, but I have implemented a solution that extends one of those proposed here and fixes some of its problems. Having read the other solutions as well, I believe this one is much better suited to the requirements in the original question.

In this solution I extend Slauma's approach, which I would say is perfect for the case in the original question: using Entity Framework and TransactionScope for an expensive write operation on the database.

Slauma's solution, which incidentally was a draft meant only to gauge the speed of EF with a bulk-insert strategy, had two problems:

the timeout of the transaction (1 minute by default, extendable via code to a maximum of 10 minutes);

the duplication, at the end of the transaction, of the first block of data, with a width equal to the commit size used (this problem is quite weird and is circumvented by means of a workaround).

I also extended Slauma's case study with an example that inserts several dependent entities in the same operation.

The performance I measured was about 10K records/min when inserting a block of 200K wide records of approximately 1 KB each. The speed was constant, there was no degradation in performance, and the test took about 20 minutes to run successfully.

The solution in detail

The method that drives the bulk-insert operation, placed in an example repository class:

abstract class SomeRepository { 

    protected MyDbContext myDbContextRef;

    public void ImportData<TChild, TFather>(List<TChild> entities, TFather entityFather)
            where TChild : class, IEntityChild
            where TFather : class, IEntityFather
    {

        using (var scope = MyDbContext.CreateTransactionScope())
        {

            MyDbContext context = null;
            try
            {
                context = new MyDbContext(myDbContextRef.ConnectionString);

                context.Configuration.AutoDetectChangesEnabled = false;

                entityFather.BulkInsertResult = false;
                var fileEntity = context.Set<TFather>().Add(entityFather);
                context.SaveChanges();

                int count = 0;

                //avoids an issue with recreating context: EF duplicates the first commit block of data at the end of transaction!!
                context = MyDbContext.AddToContext<TChild>(context, null, 0, 1, true);

                foreach (var entityToInsert in entities)
                {
                    ++count;
                    entityToInsert.EntityFatherRefId = fileEntity.Id;
                    context = MyDbContext.AddToContext<TChild>(context, entityToInsert, count, 100, true);
                }

                entityFather.BulkInsertResult = true;
                context.Set<TFather>().Add(fileEntity);
                context.Entry<TFather>(fileEntity).State = EntityState.Modified;

                context.SaveChanges();
            }
            finally
            {
                if (context != null)
                    context.Dispose();
            }

            scope.Complete();
        }

    }

}

Interfaces used for example purposes only:

public interface IEntityChild {

    //some properties ...

    int EntityFatherRefId { get; set; }

}

public interface IEntityFather {

    int Id { get; set; }
    bool BulkInsertResult { get; set; }
}

The db context, where I implemented the various elements of the solution as static methods:

public class MyDbContext : DbContext
{

    public string ConnectionString { get; set; }


    public MyDbContext(string nameOrConnectionString)
    : base(nameOrConnectionString)
    {
        Database.SetInitializer<MyDbContext>(null);
        ConnectionString = Database.Connection.ConnectionString;
    }


    /// <summary>
    /// Creates a TransactionScope raising timeout transaction to 30 minutes
    /// </summary>
    /// <param name="_isolationLevel"></param>
    /// <param name="timeout"></param>
    /// <remarks>
    /// It is possible to set isolation-level and timeout to different values. Pay close attention managing these 2 transactions working parameters.
    /// <para>Default TransactionScope values for isolation-level and timeout are the following:</para>
    /// <para>Default isolation-level is "Serializable"</para>
    /// <para>Default timeout ranges from 1 minute (the default when no timeout is specified) to a maximum of 10 minutes (unless changed by code or by updating the max-timeout value in machine.config)</para>
    /// </remarks>
    public static TransactionScope CreateTransactionScope(IsolationLevel _isolationLevel = IsolationLevel.Serializable, TimeSpan? timeout = null)
    {
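        // Overrides private static fields of TransactionManager via reflection;
        // these field names are implementation details and may differ between runtime versions.
        SetTransactionManagerField("_cachedMaxTimeout", true);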
        SetTransactionManagerField("_maximumTimeout", timeout ?? TimeSpan.FromMinutes(30));

        var transactionOptions = new TransactionOptions();
        transactionOptions.IsolationLevel = _isolationLevel;
        transactionOptions.Timeout = TransactionManager.MaximumTimeout;
        return new TransactionScope(TransactionScopeOption.Required, transactionOptions);
    }

    private static void SetTransactionManagerField(string fieldName, object value)
    {
        typeof(TransactionManager).GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Static).SetValue(null, value);
    }


    /// <summary>
    /// Adds a generic entity to a given context allowing commit on large block of data and improving performance to support db bulk-insert operations based on Entity Framework
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="context"></param>
    /// <param name="entity"></param>
    /// <param name="count"></param>
    /// <param name="commitCount">defines the block of data size</param>
    /// <param name="recreateContext"></param>
    /// <returns></returns>
    public static MyDbContext AddToContext<T>(MyDbContext context, T entity, int count, int commitCount, bool recreateContext) where T : class
    {
        if (entity != null)
            context.Set<T>().Add(entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                var contextConnectionString = context.ConnectionString;
                context.Dispose();
                context = new MyDbContext(contextConnectionString);
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }

        return context;
    }
}

S
SelcukBah

Configuration.LazyLoadingEnabled = false; Configuration.ProxyCreationEnabled = false;

These two settings also affect speed, in addition to AutoDetectChangesEnabled = false. I also advise using a table schema other than dbo; I generally use names like nop, sop, tbl, etc.
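
A minimal sketch of where such flags could be set, assuming Entity Framework 6 (System.Data.Entity). The context name ImportContext is hypothetical, and setting the flags in the constructor is only one possible placement:

```csharp
using System.Data.Entity;

// Hypothetical context used only for bulk imports.
public class ImportContext : DbContext
{
    public ImportContext(string nameOrConnectionString)
        : base(nameOrConnectionString)
    {
        // No lazy loading of navigation properties.
        Configuration.LazyLoadingEnabled = false;
        // No dynamic proxy types generated for entities.
        Configuration.ProxyCreationEnabled = false;
        // No automatic change detection on Add/SaveChanges.
        Configuration.AutoDetectChangesEnabled = false;
    }
}
```

With automatic change detection off, edits to entities that are already tracked are not picked up on their own, so a context configured like this is probably best kept for insert-only work.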