Hi all SAS Users,
After reading a couple of discussion from the forum, people normally try to replace if by where or else because if consumes quite a bit cpu or else.
I want to ask how to replace if in my case here, and is there any general rule to do so. At least I know in some case we can use where instead of if.
data dollar_volume;
set work.filter_;
If raw_return ne . then
raw_return_=abs(raw_return);
by gviidkey;
Comment trading volume <share oustanding =>trading volume as missing;
if cshtrd > cshoc then cshtrd=.;
Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
then dollar_vol=.; /*This code is still under suspicious*/
else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
label
dollar_vol=daily dollar volume
raw_return_= abs of raw_return
;
run;
And I am wondering is there any specific document that mentions ways to replace if ?
Many thanks and warm regards.
A thought, I don't know this -- but, is it that the BY statement implicitly assigns values, and also performs "IF" operations that if you did not use the BY statement you would increase performance on large sets?
The only two uses of the BY statement I can see here is that it 1) asserts a sort on GVIIDKEY and 2) conveys the properties of being sorted by GVIIDKEY in the metadata for DOLLAR_VOLUME. If one can assert that the datasets is already sorted prior to the run of the data step, would it be better (faster) to not use the BY statement, but to relay the property of being sorted, explicitly, via the SORTEDBY dataset option on DOLLAR_VOLUME?
data dollar_volume (sortedby=gviidkey);
set work.filter_;
If raw_return ne . then
raw_return_=abs(raw_return);
/* by gviidkey; */
Comment trading volume <share oustanding =>trading volume as missing;
if cshtrd > cshoc then cshtrd=.;
Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
then dollar_vol=.; /*This code is still under suspicious*/
else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
label
dollar_vol=daily dollar volume
raw_return_= abs of raw_return
;
run;
WHERE is used only to select observations from the input and skip the not required.
IF can be used to select observations for output but in most cases to select observations for more specific programing.
In some cases it may be easier to code and use: SELECT statement with WHEN. Check documentation for examples.
Using SQL you have CASE clause - similar to the sas datastep SELECT.
Consider this code:
data _NULL_;
X=.; Y=.; Z=X*y; putlog "NOTE: " _ALL_;
X=2; Y=.; Z=X*y; putlog "NOTE: " _ALL_;
X=.; Y=3; Z=X*y; putlog "NOTE: " _ALL_;
stop;
run;
Because of the way numbers multiply with missing values, I believe there is no need to use the n() function.
Also you could avoid multiplying twice, consider the difference in these cases:
data;
/*Case 1*/
if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /* 1 multiplication */
then dollar_vol=.;
else dollar_vol=cshtrd*prccd_abs_; /* 2nd multiplication, conditional on if*/
/*Case 2*/
dollar_vol=cshtrd*prccd_abs_; /* only 1 multiplication */
if dollar_vol <100
then dollar_vol=.;
run;
You are also you leaving your self open for misinterpretation since you have not prepared example data. We cannot help you as well as we might if you withhold the structure of the dataset.
WHERE selects observations for processing; observations that do not match the condition never make it into the data step.
IF works while an observation is processed.
A Subsetting IF terminates processing of the current observation, without performing the implicit OUTPUT.
Your IFs are all non-subsetting, so they can't be replaced by a WHERE.
A thought, I don't know this -- but, is it that the BY statement implicitly assigns values, and also performs "IF" operations that if you did not use the BY statement you would increase performance on large sets?
The only two uses of the BY statement I can see here is that it 1) asserts a sort on GVIIDKEY and 2) conveys the properties of being sorted by GVIIDKEY in the metadata for DOLLAR_VOLUME. If one can assert that the datasets is already sorted prior to the run of the data step, would it be better (faster) to not use the BY statement, but to relay the property of being sorted, explicitly, via the SORTEDBY dataset option on DOLLAR_VOLUME?
data dollar_volume (sortedby=gviidkey);
set work.filter_;
If raw_return ne . then
raw_return_=abs(raw_return);
/* by gviidkey; */
Comment trading volume <share oustanding =>trading volume as missing;
if cshtrd > cshoc then cshtrd=.;
Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
then dollar_vol=.; /*This code is still under suspicious*/
else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
label
dollar_vol=daily dollar volume
raw_return_= abs of raw_return
;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.