Vish33
Lapis Lazuli | Level 10

Hi,

 

I need to load 8 billion records of data from SAS to SQL Server on a Windows platform. I tried the options below.

 

1. Using BULKLOAD, it took around 9 hours 20 minutes.

 

2. Using INSERTBUFF and DBCOMMIT, it took more than 18 hours.

 

3. Using BULKLOAD with the BL_USE_PIPE=YES option, it took around 9 hours.
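For reference, a minimal sketch of the kind of code behind the options above (the ODBC connection string, schema, and table/dataset names are placeholders, and BL_USE_PIPE availability depends on the SAS/ACCESS engine in use):

/* Hypothetical connection - replace DSN/credentials with your own */
libname sqlsrv odbc noprompt="dsn=MySqlServer;" schema=dbo;

/* Option 1: bulk load */
data sqlsrv.target_table (bulkload=yes);
   set work.source_ds;
run;

/* Option 2: buffered inserts with periodic commits */
data sqlsrv.target_table (insertbuff=32767 dbcommit=100000);
   set work.source_ds;
run;

/* Option 3: bulk load through a named pipe (engine-dependent) */
data sqlsrv.target_table (bulkload=yes bl_use_pipe=yes);
   set work.source_ds;
run;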

 

However, I cannot afford this much time in my process just for loading. Please suggest any better ideas for doing this.

 

Your help would be much appreciated. Thanks in advance.

 

10 REPLIES
Kurt_Bremser
Super User

I guess you are shoveling the data across a network connection. How large (physical file size) is the SAS dataset? And what is the bandwidth/latency on the network?

Try running a proper system diagnostic tool (Windows Task Manager or UNIX topas/nmon) on the SAS server to display CPU consumption and network throughput during the load process.

Vish33
Lapis Lazuli | Level 10

Hi,

Thanks for your reply.

 

Dataset size is 40 GB.

 

Also, could you please explain how to check the details for the second point you mentioned in your previous reply?

 

Kurt_Bremser
Super User

Use the diagnostic tools mentioned to get a feel for how much data is sent at any given time. Also, ping the SQL Server; this will show you the turnaround time (latency) for the network packets.

 

On top of that, I'd try to create a similar dataset in SQL Server alone, so you get a feel for what the SQL Server can do performance-wise.

Patrick
Opal | Level 21

40 GB, and especially 8 billion rows, are serious volumes. Why are 9 hours not acceptable? Are these volumes you have to load regularly?

 

I don't have that much experience with loading into SQL Server, so I'm just asking some "general" questions:

 

1. Do you know where the bottleneck is? 

2. Are you sure it's only 40 GB that you're transferring over the network? Or is this possibly just the size of the compressed SAS table?

3. Are you loading into an empty table or not? 

4. Are there indexes and constraints on the target table? And if yes, do you know how much time (re-)creation of these indexes and constraints takes up?

5. Are your target table and possibly its indexes partitioned? If yes, would it be possible to run multiple SAS sessions, each loading a data subset directly into a partition?

6. Consider running a test where you simply load external data with SQL Server tools. Is SAS the bottleneck, or the network, or the SQL Server?

 

I would have to research how exactly SAS transfers the data to SQL Server, but I do remember that with Oracle it's in a "rectangular" form - a text file with all the character variables fully expanded to their full length as defined in SAS (i.e., padded with blanks). This can of course expand the size of the text file to be transferred over the network quite significantly.

 

That the text file is in such a fixed record format for Oracle is not primarily a SAS thing, but rather how the Oracle SQL*Loader expects such a file "by default" - at least this was the case in the past.

If you don't have "oversized" SAS character variables with lengths of up to 32KB that mostly go unused, then this is not much of a concern for you. Otherwise: there are, at least with Oracle, ways around it (variable record format), but it requires quite a bit of extra coding and a "deep dive" into how things work.

 

I'm also not sure whether such "oversized" SAS character variables are an issue with SQL Server, but if you have such variables then this would certainly be an area for further investigation (see the sketch below for one way to check).
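A minimal sketch of such a check, assuming a hypothetical source dataset WORK.BIGDATA (the 500-slot temporary array just needs to be at least the number of character variables; against 8 billion rows you'd probably run it on a sample via the OBS= option):

data _null_;
   set work.bigdata end=last;
   array cvars{*} _character_;        /* all character variables    */
   array maxused{500} _temporary_;    /* longest value seen so far  */
   do i = 1 to dim(cvars);
      maxused{i} = max(maxused{i}, lengthn(cvars{i}));
   end;
   if last then
      do i = 1 to dim(cvars);
         varname = vname(cvars{i});   /* variable name              */
         defined = vlength(cvars{i}); /* defined length             */
         used    = maxused{i};        /* actual max length used     */
         put varname= defined= used=;
      end;
run;

Variables where DEFINED is far larger than USED would be candidates for shortening before the transfer.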

 

If you have to load such volumes on a regular basis, then also consider whether you could load earlier in multiple batches - at least into some SQL Server staging tables.

 

It would also be interesting to better understand your end-to-end process: Where, and in what form, does the source data come from? What is the role of SAS in the process?

 

I hope the above is of some use to you.

 

Patrick

Kurt_Bremser
Super User

Didn't know that some DBMSs expand all the character variables. If your SAS dataset is compressed and contains character variables, you might be better off exporting from SAS to an external file with variable record length (CSV), moving the file with sftp/WinSCP/whatever, and loading the file locally into SQL Server (a sketch follows below).
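A minimal sketch of the export step (the dataset name and output path are hypothetical; the CSV would then be moved and bulk-loaded on the SQL Server side, e.g. with BULK INSERT or bcp):

/* Write a variable-record-length CSV; only the bytes actually */
/* used by each value go over the wire when the file is moved. */
proc export data=work.bigdata
            outfile="C:\transfer\bigdata.csv"   /* placeholder path */
            dbms=csv
            replace;
run;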

 

You would see the "expanded variable effect" by watching the network throughput during the load. If throughput × time is massively higher than the dataset size, you have your probable culprit.
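As a rough, hypothetical example: 9 hours at a sustained 30 MB/s is about 9 × 3600 s × 30 MB/s ≈ 970 GB on the wire - roughly 24 times the 40 GB dataset size, which would strongly suggest expanded (padded) records rather than raw dataset bytes.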

Patrick
Opal | Level 21

@Kurt_Bremser

It was one of my very savvy workmates who went down this path with bulk loading into Oracle, in a case where performance and insufficient network throughput were the main concerns.

 

I remember reading in some Oracle documentation years ago that fixed record is the preferred format, as variable length adds overhead when loading - but of course, if the network is the bottleneck, then the overhead on the Oracle side can be insignificant; it's also possible that this is no longer true for the most recent Oracle versions.

 

As already written: I have no clue whether this is of any relevance for SQL Server, but if there are "oversized" SAS character variables then it's certainly worth investigating.

Vish33
Lapis Lazuli | Level 10

The bottleneck is network issues; we are currently checking on those.

 

- We need to transfer the complete 40 GB of data every day.

- Also, this is incremental data day on day, so it grows every day.

- We are writing to an empty table.

- We have an existing application in SAS that we need to migrate to SSIS. Only the process needs to be changed to SSIS; however, the output should be written back to SAS datasets because these datasets will be used by another team.

Kurt_Bremser
Super User

If you have incremental data where most of the data stays the same from one day to the next, I would suggest creating a delta and transferring that (see the sketch below). Integrating it on the receiving side should be much quicker than the "everything including the kitchen sink" approach.
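A minimal sketch of building such a delta in SAS (dataset names WORK.TODAY and WORK.YESTERDAY are hypothetical daily snapshots with identical structure):

/* Rows present in today's snapshot but not in yesterday's - */
/* i.e. new or changed rows - become the delta to transfer.  */
proc sql;
   create table work.delta as
   select * from work.today
   except
   select * from work.yesterday;
quit;

The delta could then be appended or merged into the SQL Server table (e.g. via a staging table), which should involve far less data than the full 40 GB.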

Vish33
Lapis Lazuli | Level 10

Yes, I hope this is the right way to do it. Instead of loading everything in one go daily, I could have used a delta and appended it. Earlier, we did the same thing in DI (a delta load, which is very dynamic). This is the first time I am working on this migration to SSIS, which I have no experience with.

 

But I need to check with the team about this approach.

 

Thanks @Kurt_Bremser and @Patrick for your valuable suggestions.

 

 

--Vish

JBailey
Barite | Level 11

Hi @Vish33

 

Are you using SAS/ACCESS Interface to Microsoft SQL Server or SAS/ACCESS Interface to ODBC? Can you provide a code example and tell us which ODBC driver or SQL Server client you are using?

