Hi All,
I am using SAS 9.4M4 on a Unix platform.
I have a situation where a SAS program has nested loops. The input data is not huge, only 20-30 million observations, but due to the nested loops it is taking almost 20-30 hours.
Below is the sample code:
data _null_;
set ds1;
call symput('a',compress(_n_));
run;
%do j=1 %to &a;
data _null_;
set ds(firstobs=&j obs=&j);
call symput('a',compress(a));
call symput('b',compress(b));
run;
data a;
set ds2(where=(a="&a" and b="&b"));
run;
proc sort data=a nodupkey out=c; by person; run;
data _null_;
set c;
call symput('c',compress(_n_));
run;
%do i=1 %to &c;
****Inside this loop there are a bunch of manipulation steps which are not time-consuming individually, but the overall time spent in the nested loops is a lot. At the end it appends everything into one final DS.
Hence I was wondering if there is a way to parallelize the nested loops so that the iterations can be spawned at the same time and I can append the results later. Appreciate your help and advice.
Thank you
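(One common way to spawn independent SAS sessions in parallel on the same machine is MP CONNECT, which requires SAS/CONNECT to be licensed. A minimal sketch, with hypothetical task names and the per-chunk work left as comments:)

```sas
/* Sketch only: assumes SAS/CONNECT is licensed.   */
/* Each RSUBMIT block runs in its own SAS session. */
options sascmd="!sascmd" autosignon;  /* spawn child sessions on this host */

rsubmit task1 wait=no;
  /* process the first half of the iterations here */
endrsubmit;

rsubmit task2 wait=no;
  /* process the second half of the iterations here */
endrsubmit;

waitfor _all_ task1 task2;  /* block until both sessions finish */
signoff _all_;              /* close the remote sessions */

/* afterwards, PROC APPEND the per-session results into one final DS */
```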
Why?
What is this actually supposed to do? This looks like a terribly over-complicated way to count how many records have some values.
Provide some small examples of the data sets and the desired result, including the manipulation rules you didn't show.
One of the most time-intensive activities is reading from a data set and then writing to a data set. Here you are opening the same data set many times (just how large is that not-shown value of &a?), then selecting and overwriting another data set many times.
One suspects you might be using the macro language where Proc SQL would perhaps be more appropriate to match values from one data set to another.
What is it the inner steps are doing?
Why not just combine the two datasets and do the calculation using BY groups?
proc sql;
create table both as
select ds1.a,ds1.b,ds2.person,ds2.*
from ds1 inner join ds2
on ds1.a = ds2.a and ds1.b=ds2.b
order by 1,2,3
;
quit;
data want;
set both;
by a b person;
.... calculations ...
run;
This is the least efficient way to retrieve the number of observations into a macro variable:
data _null_;
set ds1;
call symput('a',compress(_n_));
run;
You read the whole dataset and call the CALL SYMPUT routine in every observation. Use DICTIONARY.TABLES instead:
proc sql noprint;
select nobs into :a
from dictionary.tables
where libname = "WORK" and memname = "DS1";
quit;
Technically, this only reads the first dataset page and retrieves the value from there.
See this for a test, run on ODA:
data big;
set sashelp.cars;
do i = 1 to 10000;
output;
end;
run;
data _null_;
set big;
call symput("a",_n_);
run;
proc sql noprint;
select nobs into :a from dictionary.tables
where libname ="WORK" and memname = "BIG";
quit;
Log excerpts:
76   data _null_;
77   set big;
78   call symput("a",_n_);
79   run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      78:17
NOTE: There were 4280000 observations read from the data set WORK.BIG.
NOTE: DATA statement used (Total process time):
      real time           1.44 seconds
      user cpu time       1.29 seconds

81   proc sql noprint;
82   select nobs into :a from dictionary.tables
83   where libname ="WORK" and memname = "BIG";
84   quit;

NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
But that's just one little and IMO unnecessary part, as you're much better off joining the tables and running the calculations in one step with BY processing. But for that we need to know what you actually do in the innermost loop.
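Another common idiom that avoids reading any data at all is the OPEN/ATTRN function pair; NLOBS is retrieved from the dataset header and counts non-deleted observations. A sketch (the macro name is made up):

```sas
/* Sketch: get the observation count from the dataset header only. */
%macro nobs(ds);
  %local dsid n rc;
  %let dsid = %sysfunc(open(&ds));          /* open the dataset          */
  %let n    = %sysfunc(attrn(&dsid,nlobs)); /* non-deleted observations  */
  %let rc   = %sysfunc(close(&dsid));       /* release the handle        */
  &n
%mend nobs;

%let a = %nobs(work.ds1);
```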
Hi @ballardw @Tom @Kurt_Bremser
Please find the complete code for your evaluation...
%MACRO TEST;
LIBNAME test 'path/location/';

DATA DS1;
  SET test.inp_1;
RUN;

DATA DS2;
  SET test.inp_2;
RUN;

PROC SORT DATA=DS1;
  BY var;
RUN;

PROC SORT DATA=DS2;
  BY var;
RUN;

DATA ds3;
  MERGE ds1(in=a) ds2(in=b);
  BY var;
  IF a AND b;
RUN;

PROC SORT DATA=ds3 nodupkey out=ds4;
  BY var2;
RUN;

DATA _null_;
  SET ds4;
  CALL SYMPUT('var3',compress(_n_));
RUN;

%DO j=1 %TO &var3; **********50 times;
  DATA _null_;
    SET ds3(firstobs=&j obs=&j);
    CALL SYMPUT('var4',compress(var4));
    CALL SYMPUT('var5',compress(var5));
  RUN;

  DATA ds5;
    SET ds3;
    WHERE var4="&var4" and var5="&var5";
  RUN;

  PROC SORT DATA=ds5 nodupkey out=ds6;
    BY var4 var5 var6;
  RUN;

  DATA _null_;
    SET ds6;
    CALL SYMPUT('var7',compress(_n_));
  RUN;

  %DO i=1 %TO &var7; ***************100 times;
    DATA _null_;
      SET ds6(firstobs=&i obs=&i);
      CALL SYMPUT('var4',compress(var4));
      CALL SYMPUT('var5',compress(var5));
      CALL SYMPUT('var8',compress(var8));
    RUN;

    DATA ds7;
      SET ds5;
      WHERE var4="&var4" and var5="&var5" and var8="&var8";
    RUN;

    DATA ds7;
      SET ds7;
      DO i=1 TO count;
        OUTPUT;
      END;
      IF a > . THEN
        CALL SYMPUT('a',compress(a));
      IF b > . THEN
        CALL SYMPUT('b',compress(b));
    RUN;

    PROC SUMMARY data=ds7;
      VAR date;
      OUTPUT out=ds8 min=mindate max=maxdate;
    RUN;

    ******
    ODS FOR GRAPH
    PROC CAPABILITY
    PROC APPEND;
    ******;
  %END;
%END;
%MEND;
%TEST;
PROC SUMMARY data=ds7;
  VAR date;
  OUTPUT out=ds8 min=mindate max=maxdate;
RUN;
******
ODS FOR GRAPH
PROC CAPABILITY
Look into BY group processing here before engaging in complicated and inefficient macro programming.
Your code might be more legible if you pasted it using the appropriate icon.
As it is, it is hard to understand what the purpose of this incredible succession of data steps and macro variable assignments is.
Maybe you just need 2 steps?
1. Run proc sort to remove dupkeys in DS3 by VAR4 VAR5 VAR8
2. Run proc summary by VAR4 VAR5 VAR8?
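A sketch of those two steps, using the variable names from the posted code (the output dataset names are assumptions):

```sas
/* Step 1: sort once so BY-group processing can be used */
proc sort data=ds3 out=ds3_sorted;
  by var4 var5 var8;
run;

/* Step 2: one pass computes every group's date range,
   replacing the 50 x 100 macro-loop iterations */
proc summary data=ds3_sorted;
  by var4 var5 var8;
  var date;
  output out=ranges min=mindate max=maxdate;
run;
```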