01-12-2017 07:38 AM
I need to load 8 billion records of data from SAS to SQL Server on the Windows platform. I tried the options below.
1. Using BULKLOAD, it took around 9 hours 20 minutes.
2. Using INSERTBUFF and DBCOMMIT, it took more than 18 hours.
3. Using BULKLOAD with the BL_USE_PIPE=YES option, it took around 9 hours.
However, I cannot afford this much time in my process just for loading. Please suggest any better ideas for doing this.
Your help would be much appreciated. Thanks in advance.
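For reference, a minimal sketch of roughly what the three attempts might look like in code; the engine, connection string, and table names below are placeholders, not the poster's actual setup (BL_USE_PIPE= applies to the OLE DB engine):

```sas
/* Placeholder connection - provider, server, and database are assumptions */
libname sqlsrv oledb
    init_string="Provider=SQLNCLI11;Data Source=myserver;Initial Catalog=mydb;Integrated Security=SSPI;";

/* 1. Plain bulk load */
data sqlsrv.big_table (bulkload=yes);
    set work.big_table;
run;

/* 2. Buffered inserts with periodic commits */
data sqlsrv.big_table (insertbuff=32767 dbcommit=100000);
    set work.big_table;
run;

/* 3. Bulk load through a named pipe (OLE DB engine only) */
data sqlsrv.big_table (bulkload=yes bl_use_pipe=yes);
    set work.big_table;
run;
```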
01-12-2017 07:44 AM
I guess you are shoveling the data across a network connection. How large (physical file size) is the SAS dataset? And what is the bandwidth/latency on the network?
Try to run a proper system diagnostic tool (Windows task manager or UNIX topas/nmon) on the SAS server to display CPU consumption and network throughput during the load process.
01-12-2017 07:57 AM
Thanks for your reply.
Dataset size is 40 GB.
Could you please explain how to check the bandwidth/latency details you mentioned in your previous reply?
01-12-2017 08:06 AM
Use the diagnostic tools mentioned to get a feel for how much data is sent at any given time. Also do a ping to the SQL Server; this will show you the turnaround time (latency) for the network packets.
On top of that, I'd try to create a similar dataset within SQL Server itself, so you get a feel for what the SQL Server can do performance-wise.
01-12-2017 08:38 AM - edited 01-17-2017 04:03 PM
40GB and especially 8 billion rows are serious volumes. Why are 8 hours not acceptable? Are these volumes you have to load regularly?
I haven't that much experience with loading into SQL Server so just asking some "general" questions:
1. Do you know where the bottleneck is?
2. Are you sure it's really 40GB that you're transferring over the network? Or is 40GB possibly just the size of the compressed SAS table?
3. Are you loading into an empty table or not?
4. Are there indexes and constraints on the target table? And if yes, do you know how much time (re-) creation of these indexes and constraints takes up?
5. Are your target table and possibly its indexes partitioned? If yes, would it be possible to run multiple SAS sessions, each loading a data sub-set directly into a partition?
6. Consider running a test where you simply load external data with SQL Server tools. Is SAS the bottleneck, or Network, or the SQL Server?
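Point 5 could be sketched like this, assuming a hypothetical numeric key to split on; each SAS session would run one such step against its own slice or partition:

```sas
/* Session 1 of 4: load only the rows for slice 0 (split key is hypothetical) */
data sqlsrv.target_slice0 (bulkload=yes);
    set work.source_table;
    where mod(record_id, 4) = 0;
run;
```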
I would have to research how exactly SAS transfers the data to SQL Server, but I do remember that with Oracle it's in a "rectangular" form: a text file with all the character variables fully expanded to their defined SAS length (i.e., padded with blanks). This can of course expand the size of the text file to be transferred over the network quite significantly.
That the text file is in such a fixed record format for Oracle is not primarily a SAS thing but how the Oracle SQL*Loader expects such a file to be "by default" - at least this was the case in the past.
If you don't have such "oversized" SAS character variables, with defined lengths of up to 32KB that mostly go unused, then this is not much of a concern for you. Otherwise: there are, at least with Oracle, ways around it (variable record format), but it requires quite a bit of extra coding and a "deep dive" into how things work.
I'm also not sure if such "oversized" SAS character variables are an issue with SQL Server but if you've got such variables then this would certainly be an area for further investigation.
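One quick way to check whether you have such oversized character variables is to query the column metadata; the library, table name, and the 200-byte threshold below are just assumptions to adapt:

```sas
/* Character variables whose defined length is suspiciously large */
proc sql;
    select name, length
        from dictionary.columns
        where libname = 'WORK'
          and memname = 'SOURCE_TABLE'
          and type    = 'char'
          and length  > 200      /* arbitrary threshold - adjust to taste */
        order by length desc;
quit;
```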
If you have to load such volumes on a regular basis, then also consider if you could load earlier in multiple batches - at least into some SQL Server staging tables.
It would also be interesting to better understand your end-to-end process, like: Where and in what form does the source data come from? What's the role of SAS in the process?
I hope the above is of some use to you.
01-12-2017 08:44 AM
I didn't know that some DBMSs expand all the character variables. If your SAS dataset is compressed and contains character variables, you might be better off exporting from SAS to an external file with variable record length (CSV), moving the file with sftp/WinSCP/whatever, and loading the file locally into SQL Server.
You would see the "expanded variable effect" by watching the network throughput during the load. If throughput * time is massively higher than dataset size, you have your probable culprit.
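On the SAS side, that export could be as simple as the following sketch (the output path is a placeholder); the CSV would then be copied over and loaded locally, e.g. with SQL Server's BULK INSERT or bcp:

```sas
/* Write a variable-record-length CSV instead of pushing rows over the DB connection */
proc export data=work.source_table
    outfile="D:\staging\source_table.csv"
    dbms=csv
    replace;
run;
```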
01-12-2017 09:38 AM
It was one of my very savvy workmates who went down this path with bulk loading into Oracle, in a case where performance and insufficient network throughput were the main concerns.
I remember having read in some Oracle documentation some years ago that fixed record format is preferred because variable length adds overhead when loading; but of course if the network is the bottleneck, then the overhead on the Oracle side can be insignificant, and it's also possible that this is no longer true for the most recent Oracle versions.
As already written: I have no clue whether this is of any relevance for SQL Server, but if there are "oversized" SAS character variables then it's certainly worth investigating.
01-13-2017 02:51 AM
The bottleneck is network issues; we are checking on those currently.
- We need to transfer the complete 40 GB of data every day.
- Also, this is incremental data day on day, so it grows every day.
- We are writing to an empty table.
- We have an existing application in SAS which we need to migrate to SSIS. So here, only the process needs to be changed to SSIS; however, the output should be written back to SAS datasets, because these datasets will be used by another team.
01-13-2017 03:15 AM
If you have incremental data where most of the data stays the same from one day to the next, I would suggest creating a delta and transferring only that. Integrating it on the receiving side should be much quicker than the "everything including the kitchen sink" approach.
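A minimal sketch of building such a delta in SAS, assuming today's and yesterday's snapshots are available as datasets (the names are hypothetical); only the delta would then be transferred and appended on the SQL Server side:

```sas
/* Rows that are new or changed since yesterday's snapshot */
proc sql;
    create table work.delta as
        select * from work.snapshot_today
        except
        select * from work.snapshot_yesterday;
quit;
```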
01-13-2017 04:09 AM
Yes, I hope this is the right way to do it: instead of loading everything at once daily, I could use a delta and append. Earlier we did the same thing with a DI Studio delta load, which is very dynamic. This is the first time I am working on a migration to SSIS, which I have no experience with.
But I need to check with the team about this approach.
01-17-2017 04:19 PM
Are you using SAS/ACCESS Interface to Microsoft SQL Server or SAS/ACCESS Interface to ODBC? Can you provide a code example and tell us which ODBC driver or SQL Server client you are using?
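For context, the connection typically looks something like one of these; the server, database, DSN, and credentials below are placeholders, not the poster's actual setup:

```sas
/* SAS/ACCESS Interface to ODBC via a configured DSN */
libname mssql odbc dsn=my_sql_dsn user=myuser password=XXXXXXXX;

/* ...or a DSN-less connection string */
libname mssql odbc noprompt="Driver={ODBC Driver 17 for SQL Server};
Server=myserver;Database=mydb;Trusted_Connection=yes;";
```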