Re: Parallel processing with Loops

prad001 · Posted 10-12-2021 04:58 PM

Hi Patrick,

Please find the code below. I know this is not the best way to write a SAS program; it is legacy code, and I want to see whether parallel processing makes any difference. I want to process the "J" loop in parallel.

%MACRO TEST;
LIBNAME test 'path/location/';

DATA DS1;
SET test.inp_1;
RUN;

DATA DS2;
SET test.inp_2;
RUN;

PROC SORT DATA=DS1;
By var;
run;

PROC SORT DATA=DS2;
By var;
run;

DATA ds3;
MERGE ds1(in=a) ds2(in=b);
by var;

if a and b;
RUN;

PROC SORT DATA=ds3 nodupkey out=ds4;
By var2;
RUN;

DATA _null_;
SET ds3;
CALL SYMPUT('var3',compress(_n_));
RUN;

%DO j=1 %TO &var3;**********50 times;
DATA _null_;
set ds3(firstobs=&j obs=&j);
CALL SYMPUT('var4',compress(var4));
CALL SYMPUT('var5',compress(var5));
RUN;

DATA ds5;
SET ds3;
WHERE var4="&var4" and var5="&var5";
RUN;

PROC SORT DATA=ds5 nodupkey out=ds6;
BY var4 var5 var6;
RUN;

DATA _null_;
SET ds6;
CALL SYMPUT('var7',compress(_n_));
RUN;

%DO i=1 %TO &var7;***************100 times;
DATA _null_;
SET ds6(firstobs=&i obs=&i);
CALL SYMPUT('var4',compress(var4));
CALL SYMPUT('var5',compress(var5));
CALL SYMPUT('var8',compress(var8));
RUN;

DATA ds7;
set ds5;
WHERE var4="&var4" and var5="&var5" and var8="&var8";
RUN;

DATA ds7;
set ds7;

do i=1 to count;
output;
end;

if a > . then
call symput ('a',compress(a));

if b > . then
call symput ('b',compress(b));
RUN;

PROC SUMMARY data=ds7;
var date;
output out=ds8 min=mindate max=maxdate;
RUN;

******
ODS FOR GRAPH
PROC CAPABILITY
PROC APPEND;

******;
%END;
%END;
%MEND;

%TEST;

Patrick · Posted 10-12-2021 06:21 PM

These are macro %DO loops, which only generate SAS code; the loops themselves won't take long to run. What takes the time is the generated code: roughly 50*100 nearly identical sets of steps, each making several passes through the data.
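To make that concrete, here is a minimal sketch. The macro name `demo` is made up for illustration, and it assumes the `sashelp.class` sample table is available. With `options mprint` the log shows each data step the loop generates, which is where the run time goes.

```sas
/* Hypothetical macro, for illustration only */
options mprint;   /* write the generated SAS statements to the log */

%macro demo(n);
  %do j = 1 %to &n;
    /* each iteration emits one complete data step */
    data _null_;
      set sashelp.class(firstobs=&j obs=&j);   /* read only the j-th row */
      put name=;
    run;
  %end;
%mend demo;

%demo(3)
```

The `%do` loop finishes almost instantly; the three generated data steps are what SAS actually executes, one after another.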

Looking at the code you've shared, I'm fairly certain you could get rid of all the macro processing and do this in plain SAS using BY-group processing. That will also perform much better. Fixing the code is where you should spend your time.

It's a bit hard to provide fixed code without representative sample data and the desired result. I've mocked up something below, but it will likely not fully match what you need. It should show you the way to go.

data inp_1;
  infile datalines truncover dlm=',' dsd;
  input var (var4 var5) ($);
  datalines;
1,x,y
2,x,y
3,a,b
;
data inp_2;
  infile datalines truncover dlm=',' dsd;
  input var var2 $ date :date9.;
  format date date9.;
  datalines;
1,a,01jan2021
1,a,01feb2021
1,b,01mar2021
1,a,01apr2021
2,a,01jan2021
2,c,01feb2021
2,c,01mar2021
4,a,01jan2021
;

libname test "%sysfunc(pathname(work))";
data ds1;
  set test.inp_1;
run;

data ds2;
  set test.inp_2;
run;

proc sort data=ds1;
  by var;
run;

proc sort data=ds2;
  by var;
run;

data ds3;
  merge ds1(in=a) ds2(in=b);
  by var;
  if a and b;
run;

proc summary data=ds3 ;
  class var2;
  var date;
  ways 1;
  output out=ds8 min=mindate max=maxdate;
run;
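If the nested loops really correspond to combinations of several grouping variables, one possible extension is to list all of them in the CLASS statement. The sketch below is only an assumption about what is needed: `ds3_demo` and `ds8_demo` are made-up names, the values are invented, and `var2` stands in for whatever variable drives the inner loop.

```sas
/* Hypothetical stand-in for the merged ds3 (values made up) */
data ds3_demo;
  infile datalines truncover dlm=',' dsd;
  input var4 $ var5 $ var2 $ date :date9.;
  format date date9.;
  datalines;
x,y,a,01jan2021
x,y,a,01feb2021
x,y,b,01mar2021
x,y,c,01feb2021
a,b,a,01jan2021
;

/* NWAY keeps only the full var4*var5*var2 combinations, so each output
   row plays the role of one pass through the innermost macro loop */
proc summary data=ds3_demo nway;
  class var4 var5 var2;
  var date;
  output out=ds8_demo(drop=_type_) min=mindate max=maxdate;
run;
```

One pass through the data produces the min and max date for every combination, instead of one filtered data step plus one PROC SUMMARY per combination.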

prad001 · Posted 10-13-2021 09:38 AM

Hi Patrick, thank you for the code. But if you look at my code, ds3 and ds6 are made unique (nodupkey), and both loops run up to the last unique value in the full dataset. If you use BY processing, I am not sure how the nested loops would work. This is not very complex code at all, but I can't see how to write it another way.

Kurt_Bremser · Posted 10-13-2021 02:32 PM

Nesting is done in BY by using multiple variables.

by a b;

will do a group change whenever b or a changes, and because of the preceding sort with the same BY, all b groups within the first a group will be dealt with first, then all b groups within the second a group, and so on.
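A minimal sketch of that nesting, using a made-up dataset `have` with a hypothetical numeric variable `value` (none of these names come from the code posted above). `first.b` and `last.b` mark the boundaries of each inner group, and `first.b` is also set whenever `a` changes.

```sas
/* Hypothetical sample data: a is the outer group, b the inner group */
data have;
  input a $ b $ value;
  datalines;
x p 1
x p 2
x q 3
y p 4
y r 5
y r 6
;

/* The sort must use the same BY variables as the data step below */
proc sort data=have;
  by a b;
run;

data want;
  set have;
  by a b;
  /* reset the counters at the start of every a/b combination */
  if first.b then do;
    n = 0;
    total = 0;
  end;
  n + 1;            /* sum statements retain their values across rows */
  total + value;
  /* one output row per inner group, like one pass of the inner macro loop */
  if last.b then output;
  keep a b n total;
run;
```

With this sample data, `want` should hold one row for each of the four a/b combinations (x/p, x/q, y/p, y/r), all produced in a single pass without any macro code.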

You really need to get an understanding of BY first before you engage in such unwieldy and inefficient macro coding.
