Solved: Re: How to avoid using if as much as I can?

Phil_NZ · Posted 03-05-2021 04:03 AM

Hi all SAS Users,

After reading a couple of discussion from the forum, people normally try to replace if by where or else because if consumes quite a bit cpu or else.

I want to ask how to replace if in my case here, and is there any general rule to do so. At least I know in some case we can use where instead of if.

data dollar_volume;
	set work.filter_;
	If raw_return ne . then
	raw_return_=abs(raw_return);
	by gviidkey;
		Comment trading volume <share oustanding =>trading volume as missing;
		if cshtrd > cshoc then cshtrd=.;
		Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
		if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
		then dollar_vol=.; /*This code is still under suspicious*/
		else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
	label 
		dollar_vol=daily dollar volume
		raw_return_= abs of raw_return
	;
run;

And I am wondering is there any specific document that mentions ways to replace if ?

Many thanks and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

PhilC · Posted 03-05-2021 09:52 AM

A thought, I don't know this -- but, is it that the BY statement implicitly assigns values, and also performs "IF" operations that if you did not use the BY statement you would increase performance on large sets?

The only two uses of the BY statement I can see here is that it 1) asserts a sort on GVIIDKEY and 2) conveys the properties of being sorted by GVIIDKEY in the metadata for DOLLAR_VOLUME. If one can assert that the datasets is already sorted prior to the run of the data step, would it be better (faster) to not use the BY statement, but to relay the property of being sorted, explicitly, via the SORTEDBY dataset option on DOLLAR_VOLUME?

data dollar_volume (sortedby=gviidkey);
	set work.filter_;
	If raw_return ne . then
	raw_return_=abs(raw_return);
/*	by gviidkey; */
		Comment trading volume <share oustanding =>trading volume as missing;
		if cshtrd > cshoc then cshtrd=.;
		Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
		if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
		then dollar_vol=.; /*This code is still under suspicious*/
		else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
	label 
		dollar_vol=daily dollar volume
		raw_return_= abs of raw_return
	;
run;

View solution in original post

Shmuel · Posted 03-05-2021 04:20 AM

WHERE is used only to select observations from the input and skip the not required.

IF can be used to select observations for output but in most cases to select observations for more specific programing.

In some cases it may be easier to code and use: SELECT statement with WHEN. Check documentation for examples.

Using SQL you have CASE clause - similar to the sas datastep SELECT.

PhilC · Posted 03-05-2021 09:08 AM

Consider this code:

data _NULL_;
  X=.;  Y=.;  Z=X*y; putlog "NOTE: " _ALL_;
  X=2;  Y=.;  Z=X*y; putlog "NOTE: " _ALL_;
  X=.;  Y=3;  Z=X*y; putlog "NOTE: " _ALL_;
  stop;
run;

Because of the way numbers multiply with missing values, I believe there is no need to use the n() function.

Also you could avoid multiplying twice, consider the difference in these cases:

data;
/*Case 1*/
if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /* 1 multiplication */
  then dollar_vol=.;
  else dollar_vol=cshtrd*prccd_abs_; /* 2nd multiplication, conditional on if*/

/*Case 2*/
dollar_vol=cshtrd*prccd_abs_; /* only 1 multiplication */
if dollar_vol <100 
  then dollar_vol=.; 
run;

PhilC · Posted 03-05-2021 09:12 AM

You are also you leaving your self open for misinterpretation since you have not prepared example data. We cannot help you as well as we might if you withhold the structure of the dataset.

Kurt_Bremser · Posted 03-05-2021 09:14 AM

WHERE selects observations for processing; observations that do not match the condition never make it into the data step.

IF works while an observation is processed.

A Subsetting IF terminates processing of the current observation, without performing the implicit OUTPUT.

Your IFs are all non-subsetting, so they can't be replaced by a WHERE.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PhilC · Posted 03-05-2021 09:52 AM

A thought, I don't know this -- but, is it that the BY statement implicitly assigns values, and also performs "IF" operations that if you did not use the BY statement you would increase performance on large sets?

The only two uses of the BY statement I can see here is that it 1) asserts a sort on GVIIDKEY and 2) conveys the properties of being sorted by GVIIDKEY in the metadata for DOLLAR_VOLUME. If one can assert that the datasets is already sorted prior to the run of the data step, would it be better (faster) to not use the BY statement, but to relay the property of being sorted, explicitly, via the SORTEDBY dataset option on DOLLAR_VOLUME?

data dollar_volume (sortedby=gviidkey);
	set work.filter_;
	If raw_return ne . then
	raw_return_=abs(raw_return);
/*	by gviidkey; */
		Comment trading volume <share oustanding =>trading volume as missing;
		if cshtrd > cshoc then cshtrd=.;
		Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
		if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
		then dollar_vol=.; /*This code is still under suspicious*/
		else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
	label 
		dollar_vol=daily dollar volume
		raw_return_= abs of raw_return
	;
run;

How to avoid using if as much as I can?

Re: How to avoid using if as much as I can?

Re: How to avoid using if as much as I can?

Re: How to avoid using if as much as I can?

Re: How to avoid using if as much as I can?

Re: How to avoid using if as much as I can?

Re: How to avoid using if as much as I can?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away