BookmarkSubscribeRSS Feed
sjb1
Fluorite | Level 6

I am wondering if there are any rules about how SAS handles constants and if having the macro language resolve macro variables before they reach the data-step can improve processing time. 

 

Take the following toy example. (I know there are more efficient ways to achieve the same outcome in the code below, this is purely for illustrative purposes.)

 

%macro macro_vs_datastep();

	%let macro_date = 01JAN1980;
	%let quarters_to_process = 10;

	data data_step;
		set sashelp.gnp;
		%do h = 1 %to &quarters_to_process;
			if intnx("QUARTER", Date, 0, "E") = intnx("QUARTER", "&macro_date"d, &h, "E") then output;
		%end;
	run;

	data macro;
		set sashelp.gnp;

		%do h = 1 %to &quarters_to_process;
			if intnx("QUARTER", Date, 0, "E") = %sysfunc(intnx(QUARTER, "&macro_date"d, &h, E)) then output;
		%end;
	run;

%mend;

%macro_vs_datastep();

My intuition is that that resolving the macro variables 'macro_date' and 'h' inside of %sysfunc() should reduce I/O time as in the 2nd data-step, as SAS only has to resolve intnx() with each loop of 'h'. Compare that to the first data-step where intnx() must be resolved with each loop of 'h' along with each record in the dataset. 

 

Are there any rules-of-thumb around best practices in these sorts of situations? 

4 REPLIES 4
Quentin
Super User

I don't think there is a good general rule.  Personally I would go with option 2:

%sysfunc(intnx(QUARTER, "&macro_date"d, &h, E))

I would do that for logical coherence, rather than run-time efficiency.  The expression doesn't have  any DATA step variables, so it makes sense to me to use the macro language.

 

The first option:

 intnx("QUARTER", "&macro_date"d, &h, "E")

looks weird to me, because you're using the DATA step INTNX function on constants.  Which isn't a bad thing or a thing I've never done, but feels odd.

 

Efficiency-wise, I think it's going to come down to how many macro loops you have, and how many data step loops you have.

 

In option 1 you have two DATA step invocations of INTNX.  I'm assuming the DATA step compiler is not smart enough to recognize that the second one is a constant (but I could be wrong, which would destroy my argument here).

 

In option 2 you have one DATA step invocation of INTNX, and one macro language invocation of INTNX.  

 

So if the number of records in your data is N, I think the difference in time between the two is  N*10*(speed of DATA step to execute INTNX one time) - 10*speed of %sysfunc to execute INTNX once.

 

The macro language is generally slow, for for small N it could lose, but I guess for big N the macro language might win.  Essentially in the macro language the expression:

%sysfunc(intnx(QUARTER, "&macro_date"d, &h, E))

Executes in constant time, independent of N.   I think that is O(1) in big O notation.

 

You should try testing it out.  Fun question.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
sjb1
Fluorite | Level 6

You were right! It seems that the Data-Step Compiler does not recognize:

 

intnx("QUARTER", "&macro_date", &h, "E")

as a constant.

 

%macro macro_vs_datastep();

	options fullstimer;

	
	data input_dta;
		length Date t 8;

		do i = 1 to 100;
			Date = intnx("MONTH", "01JAN1980"d, i, "E");
			do t = 1 to 10000;
				output;
			end;
		end;
	run;	

	%let macro_date = 01JAN1980;
	%let months_to_process = 100;

	data data_step;
		set input_dta;
		%do h = 1 %to &months_to_process;
			if Date = intnx("MONTH", "&macro_date"d, &h, "E") then output;
		%end;
	run;

	data macro;
		set input_dta;
		%do h = 1 %to &months_to_process;
			if Date = %sysfunc(intnx(MONTH, "&macro_date"d, &h, E)) then output;
		%end;
	run;

%mend;

%macro_vs_datastep();

Here's the log:

 

NOTE: There were 1000000 observations read from the data set WORK.INPUT_DTA.
NOTE: The data set WORK.DATA_STEP has 1000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 18.73 seconds
user cpu time 18.70 seconds
system cpu time 0.03 seconds
memory 1497.43k
OS Memory 21272.00k
Timestamp 04/29/2024 05:05:39 PM
Step Count 59 Switch Count 9


NOTE: There were 1000000 observations read from the data set WORK.INPUT_DTA.
NOTE: The data set WORK.MACRO has 1000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.23 seconds
user cpu time 0.21 seconds
system cpu time 0.01 seconds
memory 1195.31k
OS Memory 21088.00k
Timestamp 04/29/2024 05:05:40 PM
Step Count 60 Switch Count 5

This could be specific to intnx(), other functions might be optimized to recognize this behavior, but good to know!

data_null__
Jade | Level 19

Seems like you don't need to ask 100 questions but just one with end date of 100 months from start.

 

48         %let macro_date = 01JAN1980;
49         %let months_to_process = 100;
50         
51         %let x = %sysfunc(intnx(MONTH,"&macro_date"d,100,E),date9);
52         %put &=x;
X=31MAY1988
Patrick
Opal | Level 21

@sjb1 I believe the difference in runtime you observe is solely due to the SAS datastep code your macro generates.

 

To easily see the macro generated code the SAS compiler will get:

filename mprint temp;
options fullstimer mprint mfile;
%macro_vs_datastep();
data _null_;
  infile mprint;
  input;
  put _infile_;
run;
filename mprint clear;

 

With 5 loops (%let months_to_process = 5;) the code the macro generates is as below:

data input_dta;
  length Date t 8;
  do i = 1 to 100;
    Date = intnx("MONTH", "01JAN1980"d, i, "E");
    do t = 1 to 10000;
      output;
    end;
  end;
run;
data data_step;
  set input_dta;
  if Date = intnx("MONTH", "01JAN1980"d, 1, "E") then output;
  if Date = intnx("MONTH", "01JAN1980"d, 2, "E") then output;
  if Date = intnx("MONTH", "01JAN1980"d, 3, "E") then output;
  if Date = intnx("MONTH", "01JAN1980"d, 4, "E") then output;
  if Date = intnx("MONTH", "01JAN1980"d, 5, "E") then output;
run;
data macro;
  set input_dta;
  if Date = 7364 then output;
  if Date = 7395 then output;
  if Date = 7425 then output;
  if Date = 7456 then output;
  if Date = 7486 then output;
run;

The step "data_step" needs to execute the intnx() function once per loop while step "macro" only needs to do a simple comparison per loop. I believe that's the reason for the performance difference you observe.

As you mentioned already it does look like the intnx() function needs to execute for every single iteration of the data step and doesn't already resolve to a SAS date value during compilation time.

 

I guess if you would generate below data step code then the intnx() function would also only execute once.

if Date = %sysfunc(intnx(MONTH, "01JAN1980"d, 1, E)) then output;

 

...and of course both code versions would profit from change for a real implementation. 

 

 

 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 917 views
  • 3 likes
  • 4 in conversation