BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

There are over 5 Million records in SQL table and those records being loaded to SQL table from SAS. So everyday the DI Job is accessing the SQL Server table to feed the data and it is taking over 15-20 minutes to feed the data.

 

So we decided to do the performace tuning to quick access the SQL Server table and feed the data. Not sure if we can achieve this by Partition technique or similar technique. Appericiate your help here.

 

We already explored about the bulk loading mechanism and it is helpless though.

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Bulk load is the recommended way for what you are doing.

If it is not working (and no one here can help), you might want to ask SAS tech support for tips. 

s_lassen
Meteorite | Level 14

15 to 20 minutes for 5 million records is not that bad. If you need to speed it up, you may get away with running the process in parallel. I have tried something like that once, but it's been a while, I will try to see what I can remember. 

 

First thing is to find out how many processors you have available on your SQL Server. My experience was that you should not run more than that number minus 2, so if the server has 10 processors, you will probably get the best result by running 8 processes in parallel.

 

Then you should partition your data to that number of partitions. I would normally create a dummy byte variable to partition, it is OK to create all the possible partitions on your output table, but only use some of them, so that the table in theory has e.g. 256 partitions, but only e.g. 8 partitions are actually active.

 

If your SAS server also has multiple processors, you may get good results by using the SYSTASK statement, writing a master SAS program which starts other SAS tasks, each of which bulk load a specific partition. Something like

%let noof_partitions=8;
filename tempsas temp;
data _null_;
  if 0 then set <input data> nobs=nobs;
  file tempsas;
  do x=1 to &noof_partitions;
    first_obs=floor((x-1)*nobs/&noof_partitions)+1;
	last_obs=floor(x*nobs/&noof_partitions);
	put 'systask command "SAS -initstmt %load_data(partition=' x ',' first_obs= ',' last_obs= ')" taskname=sas' x ';';
	end;
  stop;
run;
%include tempsas;
waitfor _ALL_;

You should then make sure that the SAS invocation from SYSTASK refers to a valid macro in the INITSTMT option. You will probably have to tweak my initial code quite a bit, read the information on the SYSTASK statement carefully.

 

Your LOAD_DATA macro could be something like

%macro load_data(partition=,first_obs=,last_obs=);
data work.temp/view=work.temp;
  set <input data> (firstobs=&first_obs obs=&last_obs);
  retain <partitioning variable> &partition;
run;
libname SQLServ <your SQL server database>;
proc sql;
insert into SQLServ.<output table>(<variables>) select(<variables>) from work.temp;
quit; %mend;

If possible, I would still use bulk loading to SQL Server for the individual load processes. For this to work, your SAS server must be up to the task as well, of course.

 

David_Billa
Rhodochrosite | Level 12
I have two questions.

A. How to identify the partitioning variable?
B. How to make sure that SYSTASK refers to a valid macro in the INITSTMT
option?
s_lassen
Meteorite | Level 14

A: You must set up the partitioning in SQL server, and give the variable a name there. 

 

B: Your macro should be in the standard autocall library, and your SAS invocation should match that.

Another possibility is to write a standalone program, which uses to 3 global macro variables, and use that as the SYSIN parameter in the SAS invocation, using INITSTMT to assign values to the 3 variables, e.g.:

%let noof_partitions=8;
filename tempsas temp;
data _null_;
  if 0 then set <input data> nobs=nobs;
  file tempsas;
  do x=1 to &noof_partitions;
    first_obs=floor((x-1)*nobs/&noof_partitions)+1;
    last_obs=floor(x*nobs/&noof_partitions);
    Assign=cats('%let partition=',x,';%let first_obs=',first_obs,'%let last_obs=',last_obs,';');
    put 'systask command "SAS -sysin c:\load_db.sas -initstmt ''' Assign "' taskname=sas" x ';';
    end;
  stop;
run;
%include tempsas;
waitfor _ALL_;

Your SAS program should then be similar to the macro, only without the %macro and %mend statements.

 

Parallel loading is not so simple, but if performance really matters, it may be worth it.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 726 views
  • 1 like
  • 3 in conversation