BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi all SAS Users,

After reading a couple of discussion from the forum, people normally try to replace if by where or else because if consumes quite a bit cpu or else.

I want to ask how to replace if in my case here, and is there any general rule to do so. At least I know in some case we can use where instead of if.

 

data dollar_volume;
	set work.filter_;
	If raw_return ne . then
	raw_return_=abs(raw_return);
	by gviidkey;
		Comment trading volume <share oustanding =>trading volume as missing;
		if cshtrd > cshoc then cshtrd=.;
		Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
		if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
		then dollar_vol=.; /*This code is still under suspicious*/
		else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
	label 
		dollar_vol=daily dollar volume
		raw_return_= abs of raw_return
	;
run;

And I am wondering is there any specific document that mentions ways to replace if ?

 

Many thanks and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
PhilC
Rhodochrosite | Level 12

A thought, I don't know this -- but, is it that the BY statement implicitly assigns values, and also performs "IF" operations that if you did not use the BY statement you would increase performance on large sets?

 

The only two uses of the BY statement I can see here is that it 1) asserts a sort on GVIIDKEY and 2) conveys the properties of being sorted by GVIIDKEY in the metadata for DOLLAR_VOLUME.  If one can assert that the datasets is already sorted prior to the run of the data step, would it be better (faster) to not use the BY statement, but to relay the property of being sorted, explicitly, via the SORTEDBY dataset option on DOLLAR_VOLUME?

 

data dollar_volume (sortedby=gviidkey);
	set work.filter_;
	If raw_return ne . then
	raw_return_=abs(raw_return);
/*	by gviidkey; */
		Comment trading volume <share oustanding =>trading volume as missing;
		if cshtrd > cshoc then cshtrd=.;
		Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
		if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
		then dollar_vol=.; /*This code is still under suspicious*/
		else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
	label 
		dollar_vol=daily dollar volume
		raw_return_= abs of raw_return
	;
run;

 

View solution in original post

5 REPLIES 5
Shmuel
Garnet | Level 18

WHERE is used only to select observations from the input and skip the not required.

IF can be used to select observations for output but in most cases to select observations for more specific programing.

In some cases it may be easier to code and use: SELECT statement with WHEN.  Check documentation for examples.

Using SQL you have CASE clause - similar to the sas datastep SELECT

 

 

PhilC
Rhodochrosite | Level 12

Consider this code:

data _NULL_;
  X=.;  Y=.;  Z=X*y; putlog "NOTE: " _ALL_;
  X=2;  Y=.;  Z=X*y; putlog "NOTE: " _ALL_;
  X=.;  Y=3;  Z=X*y; putlog "NOTE: " _ALL_;
  stop;
run;

Because of the way numbers multiply with missing values, I believe there is no need to use the n() function.

 

Also you could avoid multiplying twice, consider the difference in these cases:

data;
/*Case 1*/
if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /* 1 multiplication */
  then dollar_vol=.;
  else dollar_vol=cshtrd*prccd_abs_; /* 2nd multiplication, conditional on if*/

/*Case 2*/
dollar_vol=cshtrd*prccd_abs_; /* only 1 multiplication */
if dollar_vol <100 
  then dollar_vol=.; 
run;
PhilC
Rhodochrosite | Level 12

You are also you leaving your self open for misinterpretation since you have not prepared example data.  We cannot help you as well as we might if you withhold the structure of the dataset.

Kurt_Bremser
Super User

WHERE selects observations for processing; observations that do not match the condition never make it into the data step.

IF works while an observation is processed.

A Subsetting IF terminates processing of the current observation, without performing the implicit OUTPUT.

 

Your IFs are all non-subsetting, so they can't be replaced by a WHERE.

PhilC
Rhodochrosite | Level 12

A thought, I don't know this -- but, is it that the BY statement implicitly assigns values, and also performs "IF" operations that if you did not use the BY statement you would increase performance on large sets?

 

The only two uses of the BY statement I can see here is that it 1) asserts a sort on GVIIDKEY and 2) conveys the properties of being sorted by GVIIDKEY in the metadata for DOLLAR_VOLUME.  If one can assert that the datasets is already sorted prior to the run of the data step, would it be better (faster) to not use the BY statement, but to relay the property of being sorted, explicitly, via the SORTEDBY dataset option on DOLLAR_VOLUME?

 

data dollar_volume (sortedby=gviidkey);
	set work.filter_;
	If raw_return ne . then
	raw_return_=abs(raw_return);
/*	by gviidkey; */
		Comment trading volume <share oustanding =>trading volume as missing;
		if cshtrd > cshoc then cshtrd=.;
		Comment trading volume*unadjusted price (converted to USD)< 100$ then set missing ;
		if cshtrd*prccd_abs_ <100 and n(cshtrd,prccd_abs_)=2 /*This code is still under suspicious*/
		then dollar_vol=.; /*This code is still under suspicious*/
		else dollar_vol=cshtrd*prccd_abs_; /*This code is still under suspicious*/
	label 
		dollar_vol=daily dollar volume
		raw_return_= abs of raw_return
	;
run;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 607 views
  • 4 likes
  • 4 in conversation