BookmarkSubscribeRSS Feed
Matt3
Quartz | Level 8

Hi, is there a way to make if statement more efficient? I ve got about 80 mln obs with over 80 if statments in loop, processing this dataset takes couple of hours? Belowe you will find scrap of the code:

 

%Macro test(x_date)
data want;
set have;
		%do i=3 %to 12 %by 3; 
			if datepart(date)>=intnx('month', "&x_date"d,%eval(-&i+1), 'B') and datepart(data_danych)<=intnx('month', "&x_date"d,0,'E') and kod_3 = '1001' then do;
		  var1_&i.m=round(amount*exch_rete,0.01);
			end;

			else if datepart(date)>=intnx('month', "&x_date"d,%eval(-&i+1), 'B') and datepart(data_danych)<=intnx('month', "&x_date"d,0,'E') and kod_3 = '1201' then do;
			var2_w_UP_&i.m=round(amount*exch_rate,0.01);
			end;

			else if datepart(date)>=intnx('month', "&x_date"d,%eval(-&i+1), 'B') and datepart(data_danych)<=intnx('month', "&x_date"d,0,'E') and kod_3 in ('1041','1241') then do;
			 var3_&i.m=round(amount*exch_rate,0.01);
			end;

........
%end;

run;
%mend;
test(31AUG2017);

 

 

 

Thank you.

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

Why do you want to do this in a macro? Why not do it with just the data step?

 

You say you have 80 different if statements, but are the two first conditions in your if statements always the same like they are below, so only the third condition differ?

 

 

Matt3
Quartz | Level 8

All conditions are differ but variables are always the same, actualy I do this in macro, due to &i macro variable.

PeterClemmensen
Tourmaline | Level 20

In my opinion, you should not use macro programming if it is not necessary 🙂

 

If you have the same pattern in all your if statements, where the two first conditions are the same, there is no need to test these two conditions in every if statement. Instead do something like this

 

%do i=3 %to 12 %by 3; 
	if datepart(date)>=intnx('month', "&x_date"d,%eval(-&i+1), 'B') and datepart(data_danych)<=intnx('month', "&x_date"d,0,'E') then do;
		if kod_3 = '1001' then var1_&i.m=round(amount*exch_rete,0.01);
		else if kod_3 = '1201' then var2_w_UP_&i.m=round(amount*exch_rate,0.01);
		else if kod_3 in ('1041','1241') then var3_&i.m=round(amount*exch_rate,0.01);
	end;
%end;
Kurt_Bremser
Super User

And this:

if kod_3 = '1001' then var1_&i.m=round(amount*exch_rete,0.01);
else if kod_3 = '1201' then var2_w_UP_&i.m=round(amount*exch_rate,0.01);
else if kod_3 in ('1041','1241') then var3_&i.m=round(amount*exch_rate,0.01);

looks much better when coded as

select (kod_3);
  when ('1001') var1_&i.m=round(amount*exch_rete,0.01);
  when ('1201') var2_w_UP_&i.m=round(amount*exch_rate,0.01);
  when ('1041','1241') var3_&i.m=round(amount*exch_rate,0.01);
  otherwise;
end;

and might even perform better.

ShiroAmada
Lapis Lazuli | Level 10

Try this.....

data KOD_3_VALUES;
  infile datalines;
  input cond $30.;
datalines;
='1001'
='1201'
in ('1041','1241')
;
run;


data _null_;
  set KOD_3_VALUES end=obs;
  call symput(compress("cond"||_n_), cond);
  if obs then call symput('totobs',_n_);
run;

%macro TestMyConditions;
  %do i=1 %to &totobs.;
    %if %eval(&i=1) %then %put if kod_3 %bquote(&&cond&i);
       %else  %if %eval(&i>1) %then %put if kod_3 %bquote(&&cond&i);
  %end;
%mend;

%TestMyConditions;

 

Hope this helps.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

It would help a lot if you actually showed some test data (in the form of a datastep using the code window {i}).  It looks to me like using arrays, i.e. a reference to a groups of variables, would be far simpler than doing lots of if statements.  The code you have shown doesn't really show any differences??  One other option, which will likely make your life a lot easier is to normalise the data, i.e. have the various variables going down the page rather than across, you would need a lot of loops and arrays that way.  Again, post test data so we can work on it, and show what you want out.  Otherwise we are just guessing.

Astounding
PROC Star

@Matt3,

 

IF / THEN runs pretty quickly.  The CPU hog here is the number of functions you are calling.  You can reduce them considerably by making a few changes;

 

data want;

set have;

dp = datepart(date);

 

Now you never need another DATEPART function in the rest of your code ... just refer to the variable DP.

 

%do i=3 %to 12 %by 3;

   range_begins = intnx('month', "&x_date"d, %eval(-&i+1), 'B');

   range_ends = intnx('month', "&x_date"d, 0, 'E');

   ...

%end;

 

Again, you don't need any additional INTNX functions.  Just refer to the variables RANGE_BEGINS and RANGE_ENDS.

 

A much smaller savings could then be achieved by nesting the IF / THEN conditions.  For example, you could try:

 

if range_begins <= dp <= range_ends then do;

   if kod_3 = '1001' then ...

   else if kod_3 = '1201' then ...

   else etc. ...

end; 

Tom
Super User Tom
Super User

Hard to tell what the logic is without either specifications or data, but couldn't you turn this loop around.

 

%do i=3 %to 12 %by 3; 
if datepart(date)>=intnx('month', "&x_date"d,%eval(-&i+1), 'B')
 and datepart(data_danych)<=intnx('month', "&x_date"d,0,'E') then do;
  if  kod_3 = '1001' then do;
    var1_&i.m=round(amount*exch_rete,0.01);
  end;
...
end;
%end;

 It looks like you are dividing things into quarters and updating a series of quarter based varaibles.

Couldn't you just calculate the quarter from the date and then generate the index?

 

 

Tom
Super User Tom
Super User

Don't call functions to recalculate on every observations values that are constants.

Don't call functions you don't need.

So you would change something like this:

%do i=3 %to 12 %by 3; 
if datepart(date)>=intnx('month', "&x_date"d,%eval(-&i+1), 'B')
 and datepart(data_danych)<=intnx('month', "&x_date"d,0,'E') then do;

to:

%do i=3 %to 12 %by 3; 
 %let lower=%sysfunc(intnx(dtmonth,"&x_date:00:00"dt,-&i+1,b));
 %let upper=%sysfunc(intnx(dtmonth,"&x_date:00:00"dt,0,e));
if date >= &lower and date_danych <= &upper then do;

So calculate the constants that are based on macro variables since they are know before the data step is even compiled.

Also calculate the constants as datetime values instead of date values so that you no longer need to call the DATEPART() function multiple times on each observation.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2296 views
  • 2 likes
  • 7 in conversation