BookmarkSubscribeRSS Feed
prad001
Obsidian | Level 7

Hi All,

I am using SAS 9.4M4 on Unix platform.

 

I have a situation where in a SAS program has nested loops. The input data is not huge, it has only 20-30 mil observations but due to the nested loops its taking almost 20-30 hours.

 

Below is the sample code:

 

data _null_;
set ds1;
call symput('a',compress(_n_));
run;

 

%do j=1 %to &a; 

data _null_;
set ds(firstobs=&j obs=&j);
call symput('a',compress(a));
call symput('b',compress(b));
run;

 

data a;
set ds2(where=(a="&a" and b="&b"));
run;

 

proc sort data=a nodupkey out=c; by person; run;

 

data _null_;
set c;
call symput('c',compress(_n_));
run;

 

%do i=1 %to &c; 

****inside this loop there are bunch of manipulation steps which are not time consuming as individual but the overall time related to the nested loops is alot. At the end it appends into one final DS.

 

Hence I was wondering if there is a way to parallelize the nested loop so that it can spawn at the same time and later, I can append. Appreciate your help and advice.

 

Thank you

 

6 REPLIES 6
ballardw
Super User

Why?

What is this actually supposed to do?This looks like a terribly over-complicated way to count how many records have some values.

Provide some small examples of the data sets and the desired result, including the manipulation rules you didn't show.

 

One of the most time intensive activities is reading from a data set and then writing to a data set. Here you are opening the same data set many times (Just how large is that not shown value for &a?) then selecting and overwriting another data set many times.

 

One suspects you might be using macro language where perhaps Proc SQL would be appropriate to match values from one data set to another.

Tom
Super User Tom
Super User

What is it the inner steps are doing?

Why not just combine the two dataset and do the calculation using BY groups?

proc sql;
create table both as 
  select ds1.a,ds1.b,ds2.person,ds2.*
  from ds1 inner join ds2
 on ds1.a = ds2.a and ds1.b=ds2.b
 order by 1,2,3
;
quit;

data want;
  set both;
  by a b person;
.... calculations ...
run;
Kurt_Bremser
Super User

This is the least efficient way to retrieve the number of observations into a macro variable:

data _null_;
set ds1;
call symput('a',compress(_n_));
run;

You read the whole dataset and call the routine CALL SYMPUT in every observation. Use DICTIONARY.TABLES instead:

proc sql noprint;
select nobs into :a
from dictionary.tables
where libname = "WORK" and memname = "DS1";
quit;

Technically, this only reads the first dataset page and retrieves the value from there.

See this for a test, run on ODA:

data big;
set sashelp.cars;
do i = 1 to 10000;
  output;
end;
run;

data _null_;
set big;
call symput("a",_n_);
run;

proc sql noprint;
select nobs into :a from dictionary.tables
where libname ="WORK" and memname = "BIG";
quit;

Log excerpts:

 76         data _null_;
 77         set big;
 78         call symput("a",_n_);
 79         run;
 
 NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
       78:17   
 NOTE: There were 4280000 observations read from the data set WORK.BIG.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           1.44 seconds
       user cpu time       1.29 seconds


 81         proc sql noprint;
 82         select nobs into :a from dictionary.tables
 83         where libname ="WORK" and memname = "BIG";
 84         quit;
 NOTE:  Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       user cpu time       0.01 seconds

But that's just one little and IMO unnecessary part, as you're much better off joining the tables and running the calculations in one step with BY processing. But for that we need to know what you actually do in the innermost loop.

prad001
Obsidian | Level 7

Hi @ballardw @Tom  @Kurt_Bremser 

 

Please find the complete code for your evaluation... 

 

%MACRO TEST;
LIBNAME test 'path/location/';

DATA DS1;
SET test.inp_1;
RUN;

DATA DS2;
SET tes.inp_2;
RUN;

PROC SORT DATA=DS1;
By var;
run;

PROC SORT DATA=DS2;
By var;
run;

DATA ds3;
MERGE ds1(in=a) ds2(in=b);
by var;

if a and b;
RUN;

PROC SORT DATA=ds3 nodupkey out=ds4;
By var2;
RUN;

DATA _null_;
SET ds4;
CALL SYMPUT('var3',compress(_n_));
RUN;

%DO j=1 %TO &var3;**********50 times;
DATA _null_;
set ds3(firstobs=&j obs=&j);
CALL SYMPUT('var4',compress(var4));
CALL SYMPUT('var5',compress(var5));
RUN;

DATA ds5;
SET ds3;
WHERE var4="var4" and var5="var5";
RUN;

PROC SORT DATA=ds5 nodupkey out=ds6;
BY var4 var5 var6;
RUN;

DATA _null_;
SET ds6;
CALL SYMPUT('var7',compress(_n_));
RUN;

%DO i=1 %TO &var7;***************100 times;
DATA _null_;
SET ds6;
CALL SYMPUT('var4',compress(var4));
CALL SYMPUT('var5',compress(var5));
CALL SYMPUT('var8',compress(var8));
RUN;

DATA ds7;
set ds5;
WHERE var4="&var4" and var5="&var5" and var8="&var5";
RUN;

DATA ds7;
set ds7;

do i=1 to count;
output;
end;

if a > . then
call symput ('a',compress(a));

if b > . then
call symput ('b',compress(b));
RUN;

PROC SUMMARY data=ds7;
var date;
output out=ds8 min=mindate max=maxdate;
RUN;

******
ODS FOR GRAPH
PROC CAPABILITY
PROC APPEND;

******;
%END;
%END;
%MEND;

%TEST;

Kurt_Bremser
Super User
PROC SUMMARY data=ds7;
var date;
output out=ds8 min=mindate max=maxdate;
RUN;

******
ODS FOR GRAPH
PROC CAPABILITY

Look into BY group processing here before engaging in complicated and inefficient macro programming.

ChrisNZ
Tourmaline | Level 20

Your code might be more legible if you pasted it using the appropriate icon.

As it is, it is hard to understand what the purpose of this incredible succession of data steps and macro variable assignment is. 

 

Maybe you just need 2 steps?
1. Run proc sort to remove dupkeys in DS3 by VAR4 VAR5 VAR8

2. Run proc summary by VAR4 VAR5 VAR8?

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 702 views
  • 3 likes
  • 5 in conversation