Anyone having acceptable performance with SQL Server + ODBC?
from kSPvhmTOlwvMd7Y7E@programming.dev to programming@programming.dev on 06 Sep 2024 20:38
https://programming.dev/post/19073795

Omg it’s sooo daammmn slooow it takes around 30 seconds to bulk-insert 15000 rows

Disabling indices doesn’t help. Database log is at SIMPLE. My table is 50 columns wide, and from what I understand the main reason is the stupid limit of 2100 parameters per query in the ODBC driver. I am using the .NET SqlBulkCopy. I only open the connection + transaction once per ~15000 inserts.

I have 50 million rows to insert, it takes literally days, please send help, I can fucking write with a pen and paper faster than the damned Microsoft driver inserts rows

LainTrain@lemmy.dbzer0.com on 06 Sep 2024 21:03

Hobbyist here. Is it normal for businesses to have 50 million rows to insert into a 50-column-wide database via a query with 2100+ parameters, 15000 inserts at a time, to a single DB?

kSPvhmTOlwvMd7Y7E@programming.dev on 06 Sep 2024 21:23

Oh buddy, enjoy your life & don’t touch Microsoft even with a 10-meter stick

transientpunk@sh.itjust.works on 06 Sep 2024 21:23

It definitely seems unusual and poorly optimized…

deegeese@sopuli.xyz on 07 Sep 2024 02:20

Inserting 15k rows of 50 columns into a 50M table is something we do every day.

2100 params on a query sounds like spaghetti code.

I suspect OP is using single row insert statements when they need a bulk insert to be performant.

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 06:14

I am using SqlBulkCopy. Given how bad MS is with naming things, that might as well be row inserts instead of bulk ones

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 11:41

2100 parameters is a documented ODBC limitation (which applies to all statements in a batch)

This means that an “insert into (c1, c2) values (?,?), (?,?)…” can only have 2100 bound parameters. That has nothing to do with the code, and even less with the surrounding code being “spaghetti”
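
For a sense of scale, here is a minimal sketch of the arithmetic behind that limit. It takes the 2100-parameter ceiling and the 50-column width from this thread at face value; the function names are made up for illustration and are not part of any driver API.

```python
MAX_PARAMS = 2100  # per-statement limit as quoted in the thread (assumption)


def rows_per_statement(columns: int, max_params: int = MAX_PARAMS) -> int:
    """How many full rows fit in one multi-row INSERT with bound parameters."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    return max_params // columns


def statements_needed(total_rows: int, columns: int,
                      max_params: int = MAX_PARAMS) -> int:
    """How many such INSERT statements it takes to load total_rows."""
    per_statement = rows_per_statement(columns, max_params)
    if per_statement == 0:
        raise ValueError("a single row already exceeds the parameter limit")
    return -(-total_rows // per_statement)  # ceiling division


def multi_row_insert(table: str, column_names: list[str], n_rows: int) -> str:
    """Build the parameterised statement text for n_rows rows."""
    one_row = "(" + ", ".join("?" for _ in column_names) + ")"
    values = ", ".join(one_row for _ in range(n_rows))
    return f"INSERT INTO {table} ({', '.join(column_names)}) VALUES {values}"


print(rows_per_statement(50))          # 42
print(statements_needed(15_000, 50))   # 358
print(multi_row_insert("t", ["c1", "c2"], 2))
```

With 50 columns that works out to 42 rows per statement, so a 15000-row batch turns into 358 separate statements if it goes through parameterised INSERTs rather than a real bulk-load path.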

The tables ARE normalised. There are 50 columns because the underlying market-data calibration functions expect dozens of parameters and return dozens of other results, such as volatility, implied durations, forward duration and more

The amount of immaturity, inexperience, and ignorance coming from 2 people here is astounding

Blocked

GetOffMyLan@programming.dev on 07 Sep 2024 06:10

No. This seems like a poorly designed system. Definitely sounds like a nosql database would be a much better fit for this task.

And that many parameters seems like madness haha

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 06:37

Please enlighten us? You barely know anything about the system or usage, and you have deduced nosql is better? Lol

GetOffMyLan@programming.dev on 07 Sep 2024 08:57

A flat 50 column table is usually an indicator of bad design and lack of normalization.

Nosql is absolutely ideal for flat data with lots of columns and huge amounts of rows. It’s like one of its main use cases.

That many parameters is an indicator of poorly structured queries and spaghetti code. There is no way that’s the best way the data can be structured.

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 10:14

You should take a break from trolling

stalker@lemmy.ml on 06 Sep 2024 21:55

What is your latency? Can you move the data closer to where the DB is (cloud)? Did you change the isolation level? Or the recovery model? Did you try bcp? Should any indexes on the table be dropped first?

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 06:39

Will try bcp & report back. EDIT: I can’t install bcp because it is only distributed with SQL Server itself, and I cannot install it on my corporate laptop.

aMockTie@beehaw.org on 06 Sep 2024 21:57

Been a little while since I worked on ODBC stuff, but I have a couple of thoughts:

  • Would it be possible to use something like a table function on the DB side to simplify the query from the ODBC side?

  • I could be misremembering, but I feel like looping through individual inserts with an open connection was faster than trying to submit data in bulk when inserting that much data in one shot. Might be worth doing a benchmark in a test DB and table to confirm.

I know I was able to insert more than 50M rows in a matter of single-digit hours, but unfortunately don’t have access to that codebase anymore to double check the specifics.

deegeese@sopuli.xyz on 07 Sep 2024 02:23

Looping single inserts over an open connection is far far slower than a bulk insert because every row is another transaction.

Only thing it’s faster than is if you opened and closed a connection for each row.
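
A rough way to see the per-row transaction cost for yourself, sketched with Python's built-in sqlite3 as a stand-in since no SQL Server code was posted. The table and function names are hypothetical, and an in-memory SQLite database will probably understate the gap you would see against a networked server, so treat the timings as illustrative only.

```python
import sqlite3
import time


def new_connection() -> sqlite3.Connection:
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so transactions only exist where we start them explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    return conn


def insert_row_by_row(conn, rows) -> float:
    """Each INSERT runs (and commits) as its own implicit transaction."""
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", row)
    return time.perf_counter() - start


def insert_in_one_transaction(conn, rows) -> float:
    """All rows go in under a single explicit transaction."""
    start = time.perf_counter()
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", rows)
    conn.execute("COMMIT")
    return time.perf_counter() - start


def row_count(conn) -> int:
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


rows = [(i, i * 2) for i in range(15_000)]
slow_conn, fast_conn = new_connection(), new_connection()
print(f"row by row:      {insert_row_by_row(slow_conn, rows):.3f}s")
print(f"one transaction: {insert_in_one_transaction(fast_conn, rows):.3f}s")
```

Both paths end up with the same 15000 rows; only the number of commits differs.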

RagingHungryPanda@lemm.ee on 06 Sep 2024 21:57

I’ve done a lot of work and no, that is not normal.

A few things: First - SQL Server has tools for migrating data that are pretty fast. SQL bulk copy can use some of these. Check to see if the built-in DB tools are better for this.

SQL bulk copy can handle way more than 15,000 records

Why are you wrapping a data dump in a transaction? That will slow things down for sure.

You generally shouldn’t be doing huge queries like that to where you’re nearing the parameter limit.

Can you share the code?

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 06:44

I timed the transaction and the opening of the connection; it takes maybe 100 milliseconds, which absolutely doesn’t explain the abysmal performance

The transaction is needed because 2 tables are touched, and I don’t want to deal with partially inserted data

Cannot share the code, but it’s python calling .NET through “clr”, and using SqlBulkCopy

What do you suggest I use instead? It’s either a prepared query with thousands of parameters, or a plain text string with parameters inside (which, admittedly, I didn’t try; might be faster lol)

RagingHungryPanda@lemm.ee on 07 Sep 2024 11:54

One thing to know about transactions is that they track data and then write it. It’s not the opening that slows it down. I have a question though: what is your source data? Do you have a big CSV or something? Can you do a DB-to-DB transfer instead? There’s another tool called the BCP utility.

Edit: SQL Server/SSMS have tools for doing migrations and batch imports

cccrontab@lemmy.world on 06 Sep 2024 22:57

Try BCP. I’m fairly new to the Microsoft landscape too, but found using BCP really helped with efficiency on loading.

kSPvhmTOlwvMd7Y7E@programming.dev on 07 Sep 2024 06:38

I will try bcp. Somehow, I was convinced I had to have access to the machine running the SQL Server to use it, but from the docs I see I can specify a remote host… Will report back! EDIT: I can’t install bcp because it is only distributed with SQL Server itself, and I cannot install it on my corporate laptop.

cccrontab@lemmy.world on 07 Sep 2024 08:51

No, it’s a standalone utility that you can download and install separate from SQL Server. It just adds BCP.exe to your command line.

Docs

Look for the link that says “Download Microsoft Command Line Utilities 15 for SQL Server (x64)”.
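
Once bcp is installed, a character-mode import from Python might look roughly like the sketch below. The table, file, server and database names are placeholders, and the helper function is hypothetical; it only assembles the command line, using the standard bcp switches for server, database, trusted authentication, character format, field terminator and batch size.

```python
import shlex


def bcp_import_command(table: str, data_file: str, server: str,
                       database: str, batch_size: int = 5000) -> list[str]:
    """Build the argument list for a character-mode bcp import."""
    return [
        "bcp", table, "in", data_file,
        "-S", server,            # server to connect to
        "-d", database,          # target database
        "-T",                    # trusted (Windows) authentication
        "-c",                    # character data format
        "-t", ",",               # field terminator
        "-b", str(batch_size),   # rows per committed batch
    ]


# Placeholder names, not taken from the thread.
cmd = bcp_import_command("dbo.MyTable", "rows.csv", "db-host.example", "MyDb")
print(shlex.join(cmd))
```

To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)`. Check how your data's embedded commas and quotes are handled before relying on a plain comma terminator.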

RonSijm@programming.dev on 07 Sep 2024 12:11

Omg it’s sooo daammmn slooow it takes around 30 seconds to bulk-insert 15000 rows

Do you have any measurements on how long it takes when you just ‘do it raw’? Like trying to do the same insert through SQL Server Management Studio or something?

Because to me it’s not really clear what’s slow. Like you’re complaining specifically about the Microsoft ODBC driver - but do you base that on anything? Can you insert faster from Linux or through other means?

Like if it’s just ‘always slow’ it might just be the SQL Server. If you can better pinpoint when it’s slow, and when it’s fast(er) that probably helps to tell how to speed it up
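
One low-tech way to pinpoint it is to time each stage separately. A minimal sketch; the stage names and the stand-in work inside them are placeholders, since the actual pipeline wasn't posted.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(name: str, timings: dict):
    """Add the wall-clock time spent inside the with-block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


timings: dict = {}

with stage("build rows", timings):
    rows = [tuple(range(50)) for _ in range(15_000)]  # stand-in for real data

with stage("convert", timings):
    converted = [[str(v) for v in row] for row in rows]  # stand-in for marshalling

with stage("send to server", timings):
    pass  # the real bulk-insert call would go here

for name, seconds in timings.items():
    print(f"{name:>15}: {seconds:.3f}s")
```

If nearly all of the 30 seconds lands in the last stage, the server or network is the place to look; if it lands earlier, the time is going into preparing the data on the client.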

Randelung@lemmy.world on 07 Sep 2024 12:26

A friend of a friend found that exporting to csv and importing is the fastest route. Honestly crazy, but I recreated a test and it’s actually a little faster (when dumping and recreating the whole table, ymmv when inserting).

I’m not 100% sure if it was MSSQL, though.
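
The export half of that route is easy to sketch with Python's standard csv module. This is only an illustration under assumptions: the function name and sample rows are invented, and how the import side treats quoting and empty fields depends on the tool you load the file with.

```python
import csv
import io


def write_rows_as_csv(rows, fileobj) -> int:
    """Write an iterable of row tuples as CSV and return the row count."""
    writer = csv.writer(fileobj, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)  # None is written as an empty field
        count += 1
    return count


# Demo with an in-memory buffer; for a real dump use
# open("rows.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
written = write_rows_as_csv([(1, "a", 0.25), (2, "b,c", None)], buffer)
print(written)
print(buffer.getvalue())
```

Because the writer takes any iterable, rows can be streamed to disk without holding all 50 million in memory.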

kyoji@programming.dev on 07 Sep 2024 13:00

Just used SqlBulkCopy via C# and .NET a few weeks ago to insert 5-7 million rows into multiple tables in a matter of seconds.

I don’t think any of my tables had 50 columns, but one had maybe half of that. Reading your other posts, my experience was different in these ways:

  • Not using Python but C#
  • The machine performing the insert was physically close to the SQL server and did not utilize WAN. (Not sure if this applies to you as well, I don’t recall you saying)
  • I don’t remember putting a transaction on the insert. I just followed Microsoft’s examples from the documentation. A transaction I think has a chance of nullifying the speed you gain from using bulk insert.

Lastly, I think you should consider being more respectful in some of your replies. We all get being frustrated with technology, but you don’t need to extend that to people who are helping you for free.